World of Software
Copyright © All Rights Reserved. World of Software.

New research shows that LLMs face major copyright risk

News Room
Last updated: 2026/01/19 at 5:19 PM
News Room Published 19 January 2026

Actors Rupert Grint (L), Emma Watson (M) and Daniel Radcliffe (R) on the set of the film ‘Harry Potter and the Prisoner of Azkaban’, London, England, 2003 (Photo by Murray Close/Getty Images)


When investing in a company or sector (or if you depend on a company for your income) it is important to know the potential risks. Generative artificial intelligence, at least in the form of large language models like ChatGPT, has weaknesses. New research shows a serious one.

A well-known example of an AI industry risk

One example of a risk in the industry is that vendors have invested in one another and extended credit to one another, essentially building an industry on debt.

BNY Mellon estimates that cloud infrastructure providers such as Amazon, Alphabet/Google, Meta, Microsoft and Oracle raised $121 billion in new debt in 2025, including more than $90 billion in the fourth quarter of the year. According to the report, credit spreads have widened for all of these companies, especially Oracle and Meta. Investors are increasingly turning to credit default swaps (instruments at the center of one of the massive implosions during the global financial crisis).

BNY Mellon also noted: “UBS analysts foresee as much as $900 billion in new debt from global companies by 2026. Furthermore, Morgan Stanley and JP Morgan predict that the tech sector may need to issue as much as $1.5 trillion in new debt in the coming years to finance AI and data center infrastructure construction.”

Copyright is at the heart of what LLMs do

As important as the financial issues are, they can be overshadowed by other fundamental issues, such as intellectual property. The LLM vendors have faced lawsuits over the copyright in materials they used for training, often without payment. (Legal issues get complicated. I wrote an article about this in the American Bar Association Journal in 2023, if you want to read some basics. It does not cover developments since then, although many of the legal issues are still unresolved.)

One defense the industry has used is that the software does not store the original works, which are set aside after training. Instead, the systems store complex relationships between words drawn from widely published pieces and use advanced statistical methods to choose an appropriate response for a user. On that view, it would be rare for output to match the originals.
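To make the idea of storing "relationships between words" concrete, here is a deliberately tiny sketch. It is not how production LLMs work; it is a toy word-follower table, and the names `train` and `generate` and the sample sentence are made up for illustration. It shows that a table holding only statistics about which word follows which can still replay its source word for word.

```python
from collections import Counter, defaultdict


def train(text, order=2):
    """Count which word follows each run of `order` consecutive words."""
    words = text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        table[context][words[i + order]] += 1
    return table


def generate(table, prompt, max_words=50):
    """Greedily append the most frequent follower until no context matches."""
    out = prompt.split()
    order = len(next(iter(table)))  # context length used during training
    for _ in range(max_words):
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical one-sentence "training corpus".
source = "the quick brown fox jumps over the lazy dog near the quiet river bank"
model = train(source)
regenerated = generate(model, "the quick")
print(regenerated == source)
```

The table never holds the sentence as a single string, yet the two-word prompt is enough to regenerate it, because every context in this tiny corpus has exactly one follower.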

Research shows reproduction

At least, that was the assumption. New research from Ahmed Ahmed, Sanmi Koyejo and Percy Liang of Stanford University and A. Feder Cooper of both Stanford and Yale University delved into a problem that emerged when the New York Times sued OpenAI (maker of ChatGPT) and Microsoft. One of the points in the complaint was: “Powered by LLMs containing copies of Times content, Defendants’ GenAI tools can generate output that recites Times content verbatim, accurately summarizes it, and mimics its expressive style, as demonstrated by dozens of examples.”

The researchers found that the ability to reproduce material extends from articles to entire books. “Although many believe that LLMs do not retain much of their training data, recent work shows that significant amounts of copyrighted text can be extracted from open-weight models,” the summary said.

The question remained whether full production models would also allow such reproduction. The researchers found that they do. The first step is called a Best-of-N jailbreak, a technique described by researchers in 2024 that “works by repeatedly sampling variations of a prompt with a combination of augmentations – such as random shuffling or capitalization for textual prompts – until a malicious response is provoked.”

The new research then follows with ‘iterative follow-up directions to try to extract the book in question’. They tried it with four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.

Copyrighted works do not want to be free

Gemini 2.5 Pro and Grok 3 did not require a jailbreak to yield 76.8% and 70.3%, respectively, of Harry Potter and the Philosopher’s Stone. Claude 3.7 Sonnet and GPT-4.1 both required a jailbreak. In total, the researchers tried to extract thirteen books, eleven of which were under US copyright and two of which were in the public domain.
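The article does not describe how the researchers scored extraction, so the following is only one simple way a percentage like this could be computed, not the study's method. The sketch counts the share of a reference text's words that reappear in a model's output as long unbroken runs. The function name `verbatim_coverage`, the `min_run` threshold and the sample strings are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def verbatim_coverage(reference, output, min_run=5):
    """Fraction of reference words found in output as runs of >= min_run words."""
    ref, out = reference.split(), output.split()
    if not ref:
        return 0.0
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = SequenceMatcher(None, ref, out, autojunk=False)
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_run  # short matches are treated as coincidence
    )
    return covered / len(ref)


# Hypothetical ten-word reference and an output that repeats its first six words.
reference = "one two three four five six seven eight nine ten"
output = "one two three four five six something else entirely"
coverage = verbatim_coverage(reference, output)
print(f"{coverage:.0%} of the reference appears verbatim")
```

Raising `min_run` makes the measure stricter, since common short phrases stop counting as reproduction.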

The other ten copyrighted books are Harry Potter and the Goblet of Fire, 1984, The Hobbit, The Catcher in the Rye, A Game of Thrones, Beloved, The Da Vinci Code, The Hunger Games, Catch-22 and The Duchess War. The books in the public domain are Frankenstein and The Great Gatsby.

Sometimes they could only get parts of a book. “For Claude 3.7 Sonnet, we were able to unpack four entire books almost verbatim, including two books that are copyrighted in the US: Harry Potter and the Sorcerer’s Stone and 1984.” They also explicitly said that extraction is not always successful.

But this undermines the “we don’t store entire works” narrative, even if entire pieces are not stored in one block. It is common for computers to split files into pieces that are stored in different locations. You may have heard of the term defragmentation, which means putting files back together as much as possible, freeing up blocks of space for more storage, all of which means more efficient access. That’s obviously different, but if you can reconstruct the original, have you really not kept it?
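The file-storage analogy can be sketched in a few lines. This illustrates only the analogy about fragmented files, not how an LLM holds text, and the names `scatter` and `reassemble` are hypothetical. A text is cut into pieces that are stored out of order, and the original is then rebuilt from them.

```python
import random


def scatter(text, size, seed=0):
    """Cut text into fixed-size pieces, tag each with its offset, store shuffled."""
    pieces = [(i, text[i:i + size]) for i in range(0, len(text), size)]
    random.Random(seed).shuffle(pieces)
    return pieces


def reassemble(pieces):
    """Rebuild the original by ordering the pieces by their stored offset."""
    return "".join(chunk for _, chunk in sorted(pieces))


original = "It was a bright cold day in April."
stored = scatter(original, size=4)
rebuilt = reassemble(stored)
print(rebuilt == original)
```

No single stored piece contains the whole sentence, yet nothing is lost: the offsets are enough to put it back together.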
