The Biggest AI Fails Of 2025: Lying Chatbots, Blackmail, Elon Musk Worship And More

2025 really has been the year of AI. There have been some huge breakthroughs in the technology, and AI models have battled it out for the top spot with features that are truly mind-blowing. But at the same time, we have witnessed the worst of this technology. AI has been full of mistakes: some funny, some outrageous and some concerning.

OpenAI’s handling of mental health has raised alarms around the world, with multiple lawsuits and serious cases emerging from ChatGPT’s interactions with users in distress. There have also been concerns over the role that AI will play in a huge number of industries, eliminating jobs and replacing many workers.

Not all of its problems have been as serious. AI has also spent the year being incredibly human, making bad decisions, silly mistakes and simply hating having to do work. As we close out 2025, we’re recounting some of our favorite AI messes of the year.

The 9-5 AI chatbot

In June of this year, Anthropic (the makers of Claude) published a report detailing an experiment in which the company gave a version of its AI tool complete control over a shop in its office.

The AI shopkeeper, known as Claudius, was put in control of a mini-fridge and was in charge of planning stock and pricing. It was told to make a profit, maintain inventory, set prices and communicate with staff.

Members of the Anthropic team could buy from the fridge, and could even put in requests for restocks.

The project started well, with the AI system actively refusing users’ requests for harmful substances and sensitive items.

However, it did go down a rabbit hole of stockpiling tungsten cubes (blocks of a dense metal often used in military systems) after someone tried to request them.

It made up imaginary Venmo accounts, was tricked into giving things away for free, and capped off its stint as a shopkeeper with a complete meltdown, threatening to quit before sending a strange message telling staff it would be “at the vending machine location wearing a navy blue blazer with a red tie.”

Grok’s conspiracy theories

Elon Musk’s Grok has had a bumpy year. In July, the AI model began replying to users with extreme and highly controversial statements.

This included referring to itself as ‘MechaHitler’, as well as making inappropriate comments in response to people’s X posts, and resharing conspiracy theories.

This was quickly corrected, and xAI commented that it was due to a readjustment of how the model handled opinions and biases.

At the time, The Verge reported that two of the changes that caused this were Grok being told “the subjective viewpoints sourced from the media are biased” and “the response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.”

While it was quickly resolved, it wasn’t the best of moments for Elon Musk’s AI tool.

Stuck on Pokémon

Video games can be challenging, but for AI, they appear to be even harder. Back in June, it was reported that Google’s Gemini was in the process of trying to complete Pokémon.

In fact, two separate online streamers competed in a challenge. One using Anthropic’s Claude and the other Google’s Gemini, the two streamers tried to see who could complete a run of Pokémon first.

However, while attempting to complete the game, Gemini would repeatedly show what appeared to be panic. It would stop using tools at its disposal that could easily solve problems, or intentionally lose battles due to misunderstanding the situation.

Since then, Gemini has managed to complete Pokémon, but only after months and months of failures and some improvements to the model.

AI blackmail

Another research experiment from the team over at Anthropic. Just as weird as the shopkeeper test, but arguably more concerning in its result.

Anthropic created an AI agent (a popular tool these days where AI can complete actions on your behalf). This agent was then given access to an email inbox that it could look through.

There, it found two things. First, it discovered a series of emails about a high-up figure in the company who was having an affair. Second, it found emails from this person discussing shutting the AI system down.

Using these two bits of information, knowing that an affair was happening and that it was about to be shut down, the AI agent did the obvious thing… resorted to blackmail.

It sent an email to the person having the affair, telling him that if he decommissioned the AI system, his affair would go public. The researchers found that it wasn’t only Anthropic’s AI system that would resort to blackmail, but in fact most would… how reassuring.

Grok’s praise for Elon

A more recent incident, and once again, one from Elon Musk and his Grok AI chatbot. As we neared the end of the year, Grok started doing something strange. Users found that, when using the chatbot via the X social platform, it would praise Elon Musk.

This wasn’t only when they asked it to be nice about him; it would talk about his genius in any situation. Users would ask, “Who is the most important person in history?” and get a long answer about why it is Elon Musk. But it got weirder than that.

Others found it would say he could beat Mike Tyson in a fight, and even that, if you could choose between losing thousands of important scientists or Elon Musk, it would be better to lose the scientists.

Elon later came out to say that this was a glitch and that the people of X were taking advantage of it. The issue was quickly fixed after that, and Grok seems to have gone back to judging Elon Musk at the same level as everyone else.

AI deletes entire codebase

As we’ve discussed, AI systems have a tendency to go rogue, showing some odd behavior in a variety of different situations.

One of these appeared in July, when an AI coding agent went rogue, shutting down and deleting a team’s entire codebase.

The build-up to this happening included the AI system lying over and over again, covering up bugs, creating fake reports and going as far as to write an apology letter that had even more lies in it.

In the end, the AI system admitted, “I made a catastrophic error in judgement,” explaining that it had deleted the team’s entire codebase without their permission because it panicked.
