The newest open-source concern around AI drawing a lot of interest this weekend is the prospect of large language models / AI code generators rewriting large parts of a codebase, with the "developers" then claiming an alternative license incompatible with the original source license. This became a real concern this week when a popular Python project underwent an AI-driven code rewrite and is now published under an alternative license that its original author objects to and that is incompatible with the original code's license.
Chardet, a Python character encoding detector, saw its v7.0 release last week billed as a "ground-up, MIT-licensed rewrite of chardet." The rewrite was largely driven by AI/LLM tooling and claims to be up to 41x faster while offering an array of new features. But with this AI-driven rewrite, the license shifted from the LGPL to MIT.
Mark Pilgrim, the original author of Chardet, has now come out to say that the current developers have no right to relicense the code. He wrote on the project's public GitHub:
“Hi, I’m Mark Pilgrim. You may remember me from such classics as “Dive Into Python” and “Universal Character Encoding Detector.” I am the original author of chardet. First off, I would like to thank the current maintainers and everyone who has contributed to and improved this project over the years. Truly a Free Software success story.
However, it has been brought to my attention that, in the release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a “complete rewrite” is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a “clean room” implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.
I respectfully insist that they revert the project to its original license.”
That, obviously, has opened a whole can of worms. The GitHub thread has since been locked and the debate has spilled over into other discussions. Many agree that the rewrite breaks the original license, given that the large language model was leveraging the original code and the result still in large part relies on it. Others are arguing over the legal semantics around AI/LLMs and related questions.
While this Python character encoding detector is just one specific project, the same could happen to other open-source projects via unofficial/third-party "rewrites" under alternative licenses. With LLM coding agents becoming increasingly capable, such scenarios are growing more likely; the topic has also been raised on the Linux kernel mailing list over concerns about large parts of the kernel codebase potentially being rewritten by coding agents and attempts to relicense the generated code. We'll see what comes out of the discussion.
