A new artificial intelligence (AI) model hits the market, social networks and specialized communities rave about its capabilities, and at the same time a cycle begins in which the users who know the models best start to feel disappointed. In their experience, what until yesterday worked without problems via chatbot or API now ends in a vague attempt.
“Super broken” models. When it launched, Gemini 2.5 Pro reaped huge praise on social networks. The model was very fast, among the cheapest, had a huge context window, and was a beast at programming. However, for a few weeks now, comments have been appearing in communities such as Reddit describing an “unusable” model.
One user described a model that worked incredibly well between March and June but that now, at the end of July, spits out “absolute nonsense.” He showed a conversation with Gemini, summarized by the assistant itself, in which it kept admitting mistakes. Other users also report annoying behaviors such as answers left unfinished.
These are only recent examples concerning Google's AI, but even models as praised as Claude have received similar criticism at different moments, even recently with Claude Code.
Suspicion. Many of the users who have criticized these models speak of cut-down models: “My assumption is that they reduced the size of the model,” said a Claude 3.5 user on Hacker News. The suspicion is that, over time and at moments of peak demand, companies start serving distilled versions of their AI models that are not as intelligent, because fewer resources are dedicated to answering prompts.
The developer Ian Nuttal also observed Claude Code's degradation, and said he would pay for a good version that is never reduced or degraded at peak hours. Alex Finn, also a developer, expressed the same frustration: “This has happened to me with every AI programming tool I have used.”
It's not just a feeling. In 2023, many users felt that GPT-4, OpenAI's most advanced model at the time, was getting dumber. The company claimed that, contrary to what the community denounced, each new version was “smarter than the previous one.”
However, an academic paper ended the speculation: Berkeley and Stanford researchers verified a spectacular drop in GPT-4's accuracy between its March and June 2023 variants. In programming, for example, “the percentage of generated responses that are directly executable dropped from 52.0% in March to 10.0% in June.” Other statistical studies from late 2023 also showed a significant loss of quality between the December and May versions of the model.
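To make the “directly executable” metric concrete, here is a minimal sketch of how such a measurement can be taken: try to compile and run each generated snippet and count the successes. This is an illustrative reconstruction, not the paper's actual evaluation harness, and the sample snippets below are made up.

```python
def directly_executable(snippet: str) -> bool:
    """Return True if the snippet compiles and runs without raising."""
    try:
        code = compile(snippet, "<generated>", "exec")
        exec(code, {})  # run in a throwaway namespace
        return True
    except Exception:
        return False

def executable_rate(snippets: list) -> float:
    """Fraction of snippets that execute without error."""
    if not snippets:
        return 0.0
    return sum(directly_executable(s) for s in snippets) / len(snippets)

# Fabricated examples for illustration, not the study's data:
samples = [
    "print(sum(range(10)))",   # runs fine
    "def f(:\n    pass",       # SyntaxError
    "x = undefined_name + 1",  # NameError at runtime
]
```

On these three samples, only the first executes, so the rate would be about 33%; the study applied the same kind of pass/fail count over a large benchmark of generated answers.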
OpenAI and Anthropic confirmed problems. In December 2023, OpenAI acknowledged on Twitter (twitter.com/ChatGPTapp/status/1732979491071549792) that they had received the feedback about the assistant becoming lazier. They claimed they had not updated the model since a month earlier and that the behavior was not intentional, acknowledging the problem and explaining that “model behavior can be unpredictable.”
Some users even devised methods (successful ones, according to their experience) to nudge the model into doing better, such as the surprising promise of a tip, or telling the chatbot that they had no fingers to write the code themselves.
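The tricks above amount to appending an incentive to the prompt and comparing results. The sketch below is a hypothetical helper (the names and nudge wordings are illustrative, paraphrased from what users reported) for building such variants so they can be A/B tested against any chat API.

```python
# Community-reported "nudges" users appended to their prompts.
# The exact wordings here are illustrative approximations.
NUDGES = {
    "none": "",
    "tip": " I will tip you for a complete, fully working solution.",
    "no_fingers": " I have no fingers, so please write out all of the code for me.",
}

def build_prompt(task: str, nudge: str = "none") -> str:
    """Compose a task with an optional incentive suffix."""
    return task + NUDGES[nudge]

# Example: generate every variant of the same request for comparison.
variants = {
    name: build_prompt("Write a Python function that parses a CSV file.", name)
    for name in NUDGES
}
```

Whether these nudges actually help was never confirmed by OpenAI; users only reported anecdotal improvements.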
More recently, Anthropic acknowledged to TechCrunch that Claude Code was having problems, such as slower response times, following complaints from users whose usage had been limited without open notice. Users who previously performed tasks normally suddenly could not make progress.