Anthropic Reveals Three Infrastructure Bugs Behind Claude Performance Issues

News Room | Published 3 October 2025

Anthropic recently published a postmortem revealing that three distinct infrastructure bugs intermittently degraded the output quality of its Claude models in recent weeks. While the company says it has resolved the issues and is changing its internal processes to prevent similar disruptions, commentators have highlighted the challenges of running the service across three hardware platforms.

In August and early September 2025, users of Anthropic’s Claude AI began reporting degraded or inconsistent responses. What initially looked like normal performance variation turned out to be three distinct infrastructure bugs affecting Claude’s output quality. None of the issues was caused by heavy load or demand; each emerged in the underlying infrastructure, routing logic, or compilation pipeline. The team explains:

We never reduce model quality due to demand, time of day, or server load. The problems our users reported were due to infrastructure bugs alone (…) Each bug produced different symptoms on different platforms at different rates. This created a confusing mix of reports that didn’t point to any single cause.

The team described three overlapping issues:

  • A context window routing error that, at the worst-impacted hour on August 31, affected 16% of Sonnet 4 requests.
  • Output corruption caused by a misconfiguration of the Claude API TPU servers that triggered an error during token generation, affecting requests to Opus 4.1 and Opus 4 on August 25-28 and requests to Sonnet 4 from August 25 to September 2.
  • An approximate top-k XLA:TPU miscompilation, due to a latent bug in the compiler, that affected requests to Claude Haiku 3.5 for almost two weeks.

Anthropic adds:

We deploy Claude across multiple hardware platforms, namely AWS Trainium, NVIDIA GPUs, and Google TPUs. Each hardware platform has different characteristics and requires specific optimizations. Despite these variations, we have strict equivalence standards for model implementations.

Source: Anthropic blog
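
The postmortem does not reproduce the routing code, but the first bug is easy to picture in the abstract: requests are dispatched to server pools according to their context-window configuration, and a stale flag or wrong threshold can quietly send traffic to a pool whose configuration does not match it. The sketch below is purely hypothetical; the pool names, limits, and route function are illustrative assumptions, not Anthropic’s infrastructure.

```python
# Hypothetical sketch of context-window-based routing. Pool names, limits,
# and the routing rule are illustrative assumptions, not Anthropic's code.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerPool:
    name: str
    max_context_tokens: int  # context window the servers in this pool are configured for

STANDARD_POOL = ServerPool("standard-context", 200_000)
LONG_POOL = ServerPool("long-context", 1_000_000)

def route(prompt_tokens: int, long_context_rollout: bool) -> ServerPool:
    """Pick a server pool for a request based on its prompt length."""
    # A stale rollout flag or a wrong threshold here sends requests to a pool
    # whose configuration does not match them. Nothing errors out; response
    # quality simply degrades, which is what makes this class of bug hard to
    # spot from server-side metrics alone.
    if long_context_rollout and prompt_tokens > STANDARD_POOL.max_context_tokens:
        return LONG_POOL
    return STANDARD_POOL
```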

Todd Underwood, head of reliability at Anthropic, acknowledges the issues on LinkedIn:

It’s been a rough summer for us, reliability wise. Prior to this set of issues we had previously had capacity and reliability problems throughout much of July and August (…) I’m very sorry for the problems and we’re working hard to bring you the best models at the highest level of quality and availability we can.

Clive Chan, member of technical staff at competing OpenAI, comments:

ML infra is really hard. great job to everyone who worked on the debug and writeup.

Anthropic’s goal is for the different hardware platforms to be transparent to end users, with every user receiving the same quality of response regardless of which platform serves their request. That flexibility has a cost: any infrastructure change must be validated across all platforms and configurations. Philipp Schmid, senior AI developer relations engineer at Google DeepMind, writes:

Serving a model at scale is hard. Serving it across three hardware platforms (AWS Trainium, NVIDIA GPUs, Google TPUs) while maintaining strict equivalence is a whole other level. Makes you wonder if the hardware flexibility is truly worth the hit to development speed and customer experience for them.
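
Maintaining strict equivalence across Trainium, NVIDIA, and TPU serving paths implies routinely comparing what each backend actually returns for the same prompt. The sketch below is a hypothetical illustration of such a check, not Anthropic’s validation suite: run_backend_a and run_backend_b are assumed stand-ins for whatever produces next-token logits on each platform, and the tolerance is an arbitrary placeholder.

```python
# Hypothetical cross-backend equivalence check. run_backend_a / run_backend_b
# are assumed interfaces returning next-token logits for a prompt; they are
# illustrative placeholders, not a real Anthropic or vendor API.
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def check_equivalence(prompts, run_backend_a, run_backend_b, k=20, tol=1e-2):
    """Flag prompts where two serving backends of the same model diverge."""
    failures = []
    for prompt in prompts:
        lp_a = log_softmax(run_backend_a(prompt))
        lp_b = log_softmax(run_backend_b(prompt))
        top = np.argsort(lp_a)[-k:]
        # The greedy choice should agree; small numerical drift in the top-k
        # logprobs is tolerated, anything larger is reported.
        if lp_a.argmax() != lp_b.argmax() or np.abs(lp_a[top] - lp_b[top]).max() > tol:
            failures.append(prompt)
    return failures
```

Run continuously against production-shaped prompts, even a crude check like this turns “same quality on every platform” into something measurable rather than a policy statement.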

On Hacker News, Mike Hearn comments:

The most interesting thing about this is the apparent absence of unit tests. The test for the XLA compiler bug just prints the outputs, it’s more like a repro case than a unit test in the sense that it’d be run by a test harness and have coverage tracked. And the action items are simply to lean more aggressively into evals.
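
Hearn’s point is that the published reproduction prints outputs rather than asserting on them. A property-style check of the operation at the centre of the third bug might look like the hypothetical sketch below, which measures the recall of JAX’s jax.lax.approx_max_k (one frontend for TPU approximate top-k) against an exact reference and fails if it drops below the configured target; the batch size, vocabulary size, and slack are assumptions, not Anthropic’s test.

```python
# Hypothetical property test for approximate top-k, in the spirit of Hearn's
# comment: assert on recall against an exact reference instead of printing.
# Sizes, seeds, and slack are illustrative; this is not Anthropic's test suite.
import jax
import jax.numpy as jnp

def approx_top_k_recall(key, batch=64, vocab=32_000, k=40, recall_target=0.95):
    logits = jax.random.normal(key, (batch, vocab), dtype=jnp.float32)
    _, exact_idx = jax.lax.top_k(logits, k)                          # exact reference
    _, approx_idx = jax.lax.approx_max_k(logits, k, recall_target=recall_target)
    # Fraction of the true top-k indices that the approximate kernel recovered.
    hits = (approx_idx[:, :, None] == exact_idx[:, None, :]).any(axis=1)
    return hits.mean()

def test_approx_top_k_meets_recall_target():
    recalls = jnp.stack([approx_top_k_recall(jax.random.PRNGKey(s)) for s in range(8)])
    # A miscompiled kernel that silently drops the highest-probability tokens
    # would pull recall well below the configured target.
    assert recalls.mean() >= 0.95 - 0.02, f"approximate top-k recall degraded: {recalls.mean():.3f}"
```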

Going forward, the artificial intelligence company promises to introduce more sensitive evaluations, add quality evaluations in more places, and develop infrastructure and tooling to better debug community-sourced feedback without sacrificing user privacy.
