Large language model (LLM) upgrades and rollouts are on a roll this year. Since February, there have been no fewer than five major LLM upgrades, and that’s not counting OpenAI’s September launch of its reasoning-enabled o1-preview model for ChatGPT.
Each new version of these LLMs seems to produce more human-sounding content than the last. That makes it harder for anyone trying to tell whether a piece of writing came from a human soul or a soulless language model.
Tool To Spy On AI’s Most Overused Phrases and Words
The team at AI detection firm GPTZero has launched a new feature that publishes a regularly updated list of the 50 words and phrases LLMs overuse most. GPTZero offers free basic scans and paid advanced scans: users submit material through a query window, and the tool returns the probability that it was written by a bot or by a human.
During a Zoom call, GPTZero CEO Edward Tian said there are two parts to their latest update.
“It’s called AI Vocabulary scan. The first part is that it highlights and lists for registered users the words in a scanned file that match the list of overused AI phrases and words,” Tian said.
“And we’re also launching for everyone, free access to our vocabulary monitor. You can describe it as kind of updating an encyclopedia of AI terms on the internet,” he added.
Tian said the lexicon of retread LLM words will be posted on GPTZero’s homepage and updated monthly; a screenshot is provided below.
Why Does AI Reuse Certain Language Soooo Much?
Tian said the company identifies the most overused AI words based on more than 3 million analyses comparing AI-generated and human-written content. The current top five most frequent AI words and phrases are:
- “objective study aimed” — 269x more frequent in AI
- “research needed to understand” — 235x more frequent in AI
- “despite facing” — 209x more frequent in AI
- “play significant role shaping” — 182x more frequent in AI
- “crucial role in shaping” — 155x more frequent in AI
In other words, the phrase “objective study aimed” appears about 269 times as often in AI-generated content as it does in human-written content.
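The announcement, as described here, doesn’t say how GPTZero calculates these multipliers. One common way to express “N times more frequent” is to compare relative frequencies: how often a phrase occurs per word in AI-generated text versus per word in human-written text. The minimal Python sketch below assumes that approach; the function names and the two tiny sample texts are hypothetical, and real figures like 269x would come from far larger collections of text.

```python
import re

# Illustrative sketch of an ASSUMED approach (relative frequency per word),
# not GPTZero's actual method.


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase (any whitespace between words)."""
    pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def count_words(text: str) -> int:
    """Rough word count: runs of letters, digits, or underscores."""
    return len(re.findall(r"\w+", text))


def frequency_ratio(phrase: str, ai_text: str, human_text: str) -> float:
    """How many times more often the phrase occurs per word in ai_text than in human_text.

    Equivalent to (ai_count / ai_words) / (human_count / human_words),
    rearranged so only one division is needed.
    """
    ai_words, human_words = count_words(ai_text), count_words(human_text)
    if ai_words == 0 or human_words == 0:
        raise ValueError("both samples must contain at least one word")
    ai_count = count_phrase(ai_text, phrase)
    human_count = count_phrase(human_text, phrase)
    if human_count == 0:
        return float("inf")  # never seen in the human sample, so no finite ratio
    return (ai_count * human_words) / (ai_words * human_count)


# Hypothetical 10-word samples: the phrase appears twice in one, once in the other.
ai_sample = "Despite facing delays and despite facing doubts, the team shipped."
human_sample = "Despite facing delays, the team worked hard and finally shipped."
print(frequency_ratio("despite facing", ai_sample, human_sample))  # prints 2.0
```

Dividing by word counts keeps a bigger AI sample from inflating the multiplier just by being bigger; with raw counts alone, doubling the AI sample would double the ratio.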
Tian said identifying overused AI terms is one thing, but understanding why these machines lean on them so heavily is another. Several factors drive the phenomenon, he said, but one reason stands out.
“It’s because these LLMs are statistical parrots, or stochastic [randomly determined] parrots in that sense. They take what they learned from their training data and they overfit it to specific technology because it’s all that they know based on the data they were trained [on],” explained Tian.
Using The AI Vocabulary Scan
An example of the AI Vocabulary tracker at work is illustrated below. Tian stressed that when the new feature highlights words as overused by AI, it isn’t suggesting that those specific words in the uploaded material were AI-generated.
Instead, the highlight is a cautionary heads-up that the words match entries on the complete list of thousands of repeated AI phrases available through the tracker, which is separate from the top 50 posted on the homepage.
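GPTZero hasn’t published how its scan matches uploaded text against the list. As a rough, hypothetical illustration of the basic idea, the Python sketch below finds each listed phrase in a document (ignoring case and extra whitespace) and wraps matches in [[...]] markers. The function names, the marker style, the watch list, and the sample text are all assumptions, not GPTZero’s implementation.

```python
import re
from typing import NamedTuple

# Hypothetical sketch of phrase highlighting; not GPTZero's implementation.


class Match(NamedTuple):
    phrase: str  # the listed phrase that matched
    start: int   # character offset where the match begins
    end: int     # character offset just past the match


def find_overused(text: str, phrases: list[str]) -> list[Match]:
    """Find every case-insensitive, whole-word occurrence of each phrase, in text order."""
    matches = []
    for phrase in phrases:
        # Allow any run of whitespace between the words of a phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append(Match(phrase, m.start(), m.end()))
    # Earliest first; at the same start, prefer the longer match.
    return sorted(matches, key=lambda m: (m.start, -m.end))


def highlight(text: str, matches: list[Match]) -> str:
    """Wrap each match in [[...]], skipping any that overlap one already highlighted."""
    parts, pos = [], 0
    for m in matches:
        if m.start < pos:
            continue
        parts.append(text[pos:m.start])
        parts.append("[[" + text[m.start:m.end] + "]]")
        pos = m.end
    parts.append(text[pos:])
    return "".join(parts)


# Hypothetical watch list (three of the top five phrases) and sample text.
watch_list = ["despite facing", "crucial role in shaping", "research needed to understand"]
doc = "Despite facing setbacks, teachers play a crucial role in shaping young minds. More research is needed."
print(highlight(doc, find_overused(doc, watch_list)))
# prints: [[Despite facing]] setbacks, teachers play a [[crucial role in shaping]] young minds. More research is needed.
```

Sorting matches by position and skipping overlaps keeps a short phrase nested inside a longer one from being marked twice. As Tian points out, a hit only means the wording is on the list; on its own it says nothing about who wrote the text.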
Users get five free AI Vocabulary scans per month before a subscription is required. Tian said one reason the technology is needed is to give people a resource that encourages creativity, rather than the staid phrasing and lazy language of LLMs.
“I would say there is an argument for what is just effective communication, and then there’s a factor of what is original and preserving the space for original writing that’s different from ChatGPT,” he said.
“Because if ChatGPT is always analyzing all of human writing to copy it, humans should have the same tools to analyze ChatGPT writing and kind of plan ahead, almost like a game of chess,” concluded Tian.