A proposed class action lawsuit was filed in federal court in Northern California today, accusing Apple of illegally using copyrighted books to train its AI models. Here are the details.
Authors base the accusation on Apple’s own documents
As reported by Reuters, authors Grady Hendrix and Jennifer Robertson are accusing Apple of training its AI models on a pirated dataset that included their work. From the lawsuit:
“But Apple is building part of this new enterprise using Books3, a dataset of pirated copyrighted books that includes the published works of Plaintiffs and the Class. Apple used Books3 to train its OpenELM language models. Apple also likely trained its Foundation Language Models using this same pirated dataset.”
The accusation is based on details Apple provided in its paper about OpenELM, an open-source model the company made available on Hugging Face last year.
The paper mentions RedPajama as one of the datasets used to train the model. RedPajama, in turn, includes a dataset called Books3, which, according to the lawsuit, is “a known body of pirated books.”
The authors are asking the court to let the lawsuit proceed as a class action against Apple, and are seeking the following remedies after a jury trial:
- Allowing this action to proceed as a class action, with Plaintiffs serving as Class Representatives, and with Plaintiffs’ counsel as Class Counsel;
- Awarding Plaintiffs and the Class statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity;
- Permanently enjoining Defendant from the unlawful, unfair, and infringing conduct alleged herein;
- Ordering destruction under 17 U.S.C. § 503(b) of all Apple Intelligence or other LLM models and training sets that incorporate Plaintiffs’ and Class Members’ works;
- An award of costs, expenses, and attorneys’ fees as permitted by law; and
- Such other or further relief as the Court may deem appropriate, just, and equitable.
The lawsuit follows mixed results in similar cases
Coincidentally (or maybe not), the lawsuit comes on the heels of a record $1.5 billion settlement reached by Anthropic in a very similar case.
Interestingly, Meta recently faced a similar lawsuit, but the case went its way, as the judge decided that its use of copyrighted books to train its AI models fell under fair use, a sentiment recently echoed by President Trump:
“You can’t be expected to have a successful AI program when every single article, book or anything else that you’ve read or studied, you’re supposed to pay for. (…) You just can’t do it, because it’s not doable.”
Do you think authors should be compensated for the use of their books to train AI models? Let us know in the comments.