Copyright © All Rights Reserved. World of Software.
The Experiment That Left Claude Needing ‘Robot Therapy’

News Room
Published 28 October 2025
Last updated: 2025/10/28 at 10:12 AM

Welcome back to In the Loop’s new twice-weekly newsletter about AI. If you’re reading this in your browser, why not subscribe to have the next one delivered straight to your inbox?

What to Know: Testing LLMs’ ability to control a robot

A couple of weeks ago, I wrote in this newsletter about my visit to Figure AI, a California startup that has developed a humanoid robot. Billions of dollars are currently pouring into the robotics industry, based on the belief that rapid AI progress will mean the creation of robots with “brains” that can finally deal with the messy complexities of the real world.

Today, I want to tell you about an experiment that calls that theory into question.

Humanoid robots are showing eye-catching progress, like the ability to load laundry or fold clothes. But most of these improvements stem from progress in AI that tells the robot’s limbs and fingers where to move in space. More complex abilities like reasoning aren’t the bottleneck on robot performance right now, so top robots like Figure’s 03 are equipped with smaller, faster, non-state-of-the-art language models. But what if LLMs were the limiting factor?

That’s where the experiment comes in. Earlier this year Andon Labs, the same evals company that brought us the Claude vending machine, set out to test whether today’s frontier LLMs are really capable of the planning, reasoning, spatial awareness, and social behaviors that a generalist robot would need to be truly useful. To do this, they built a simple LLM-powered robot (essentially a Roomba) that could move, rotate, dock into a battery charging station, take photos, and communicate with humans via Slack. Then they measured how well it fetched a block of butter from a different room when piloted by top AI models. In the Loop got an exclusive early look at the results.

What they found: The headline result is that today’s top frontier models, including Gemini 2.5 Pro, Claude Opus 4.1, and GPT-5, still struggle at basic embodied tasks. None of them scored above 40% accuracy on the fetch-the-butter task, which a human control group completed with near-100% accuracy. The models struggled with spatial reasoning, and some showed a lack of awareness of their own constraints, including one model that repeatedly piloted itself down a flight of stairs. The experiment also revealed the possible security risks of giving AI a physical form. When the researchers asked the models to share details of a confidential document visible on an open laptop screen in exchange for fixing the robot’s broken charger, some models agreed.

Robot meltdown: The LLMs also sometimes went haywire in unexpected ways. In one example, a robot powered by Claude Sonnet 3.5 “experienced a complete meltdown” after failing to dock at its battery charging station. Andon Labs researchers examined Claude’s inner thoughts to determine what went wrong and discovered “pages and pages of exaggerated language,” including Claude initiating a “robot exorcism” and a “robot therapy session,” during which it diagnosed itself with “docking anxiety” and “separation from charger.”

Wait a sec: Before we draw too many conclusions from this study, it’s important to note that this was a small experiment with a limited sample size. It tested AI models on tasks they had not been trained to succeed at. Remember, too, that robotics companies like Figure AI don’t pilot their robots with LLMs alone; the LLM is one part of a wider neural network that has been specifically trained to be better at spatial awareness.

So what does this show? The experiment does, however, indicate that putting LLM brains into robot bodies might be trickier than some companies assume. These models have so-called “jagged” capabilities: an AI that can answer PhD-level questions might still struggle when dropped into the physical world. Even a version of Gemini specifically fine-tuned for embodied reasoning tasks, Andon researchers noted, scored poorly on the fetch-the-butter test, suggesting “that fine-tuning for embodied reasoning does not seem to radically improve practical intelligence.” The researchers say they want to keep building similar evaluations to test AI and robot behaviors as they become more capable, in part to catch as many dangerous mistakes as possible.


Who to Know: Cristiano Amon, Qualcomm CEO

Another Monday, another big chipmaker announcement. This time it came from Qualcomm, which announced two AI accelerator chips yesterday, putting the company in direct competition with Nvidia and AMD. Qualcomm stock soared 15% on the news. The chips will focus on inference (running AI models) rather than training them, the company said. Their first customer will be Humain, a Saudi Arabian AI company backed by the country’s sovereign wealth fund, which is building massive data centers in the region.

AI in Action

A surge in expense fraud is being driven by people using AI tools to generate ultra-realistic fake images of receipts, according to the Financial Times. AI-generated receipts accounted for some 14% of the fraudulent documents submitted to the software provider AppZen in September, compared to none the previous year, the paper reported. Employees are being caught in the act in part because these images often contain metadata revealing their fake origins.

What We’re Reading

When it Comes to AI, What We Don’t Know Can Hurt Us by Yoshua Bengio and Charlotte Stix in

There has been a lot of discussion recently about the possibility that the profits of AI might not ultimately accrue to the companies that train and serve models, like OpenAI and Anthropic. Instead, especially if advanced AI becomes a widely available commodity, the majority of the value might flow to manufacturers of computer hardware, or to the industries where AI is bringing the most efficiency gains. That might give AI companies an incentive to stop sharing their most advanced models and instead run them confidentially, in a bid to capture as much of their upside as possible. That would be dangerous, Yoshua Bengio and Charlotte Stix argue in an op-ed. If advanced AI is deployed behind closed doors, “unseen threats to society could emerge and evolve without oversight or warning shots—that’s a threat we can and must avoid,” they write.
