In the lawsuit Kadrey v. Meta, Mark Zuckerberg’s company is accused of having used copyrighted works to train its artificial intelligence models. A few weeks ago it was revealed that Zuckerberg had approved the use of pirated books, and now new and powerful evidence of this looting has emerged.
Revealed emails. “Appendix A” of the case includes several email messages from Meta employees which reveal that there were indeed massive downloads of data in the form of copyrighted books. One of the employees, Melanie Kambadur, voiced her objection to that data collection in October 2022.
“Downloading with torrents from a company laptop does not seem like a good idea.” In April 2023 Nikolay Bashlykov, one of those responsible for carrying out this data collection, joked about it, emojis included, and noted that the company would have to be careful about the IP address from which the data was downloaded.
Meta knew the risks. By September of that year Bashlykov had stopped using emoticons and warned that using torrents would mean acting as a “seed” so that others could also download the files, and “that might not be legal.” According to the authors who have sued the company, these discussions are proof that Meta knew this type of activity was illegal.
Covering their tracks. In an internal message, Meta’s Frank Zhang explained that the company avoided using its own servers when downloading this dataset to “avoid” the risk that anyone could trace the seed and identify who had downloaded the data.
81.7 TB of data. As reported by Ars Technica, the evidence shows that Meta downloaded at least 81.7 terabytes of data from various libraries that offer these copyrighted books. A new filing in the case indicated that at least 35.7 TB had been downloaded from sites such as Z-Library or LibGen (which ended up closing last summer).
Meta wants those charges dismissed. Meta has filed a motion to dismiss the accusations, arguing that there is no evidence that any book was downloaded by Meta employees via torrent or that the books were later distributed by Meta. At WorldOfSoftware we have contacted the company, and we will update this story if we receive comments on the case.
Internet looting in full swing. These revelations underline the questionable practices that AI companies are using to train their models. We saw it with Google, and of course also with OpenAI, which used millions of texts to train ChatGPT, many of them copyrighted. Perplexity came under the spotlight after it was discovered to be flouting the rules of the internet to get around paywalls and feed its AI model.
Internet theft is being normalized. The striking thing about all this is that, with every company skirting the rules and violating copyright, the looting of the internet seems to be becoming normal. There is barely time to be scandalized, and we accept it almost as a fait accompli so that we can get on with our own lives.
Is this really “fair use”? All of these companies shelter behind the concept of “fair use”. This concept, developed in Anglo-Saxon law, allows limited use of protected material without the need to ask for permission. Copyright infringement claims keep arriving in the world of generative AI, but they seem to remain in the background while these giants thrive.