Reddit Inc. has launched lawsuits against startup Perplexity AI Inc. and three data-scraping service providers for trawling the company’s copyrighted content to be used to train AI models.
Reddit compared the data scraping companies — SerpApi, Oxylabs and AWMProxy — to “bank robbers,” adding that one of the firms “will apparently do anything to get the Reddit data it desperately needs to fuel its ‘answer engine’ — that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.”
Several AI have already made deals with Reddit, including OpenAI, which signed on the dotted line last year to use Reddit’s trove of data to train its large language models. Though no number was given, it was reported that the deal was worth $60 million. At the time, Reddit said it hoped to bring in around $200 million from licensing agreements over the next three years, with Google LLC also signing on.
The company later launched a lawsuit against Anthropic PBC, claiming it was scraping content on Reddit to train its Claude family of AI models. That makes this latest lawsuit, filed today in the U.S. District Court for the Southern District of New York, one of a handful ongoing.
Data scraping firms are a fairly new phenomenon that appeared shortly after the generative AI explosion. According to the New York Times, SerpApi is based in Texas and serves a number of companies. Oxylabs is run out of Lithuania, and AWMProxy is Russian.
“AI companies are locked in an arms race for quality human content — and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Ben Lee, the chief legal officer at Reddit, told The Times. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material.”
According to the lawsuit, Reddit claimed it set a trap for Perplexity by publishing a “test post” on its platform that was visible only to Google’s search engine and inaccessible anywhere else on the internet. Within hours, the content of that hidden post appeared in Perplexity’s search results, Reddit said.
Perplexity has said it hasn’t yet received the lawsuit, but told media it will “fight vigorously for users’ rights to freely and fairly access public knowledge.” It added, “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”
Photo: Unsplash
Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.
- 15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
- 11.4k+ theCUBE alumni — Connect with more than 11,400 tech and business leaders shaping the future through a unique trusted-based network.
About News Media
Founded by tech visionaries John Furrier and Dave Vellante, News Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.