World of Software
Copyright © All Rights Reserved. World of Software.

Claude Sonnet 4.5 Tops SWE-Bench Verified, Extends Coding Focus Beyond 30 Hours

News Room
Last updated: 2025/10/11 at 4:30 PM
News Room Published 11 October 2025

Anthropic has released Claude Sonnet 4.5, its most advanced coding model to date, featuring major improvements in agentic tasks, long-horizon task performance, and computer use capabilities. The company says the model’s enhanced training and safety methods have significantly improved its behavior, reducing tendencies such as sycophancy, deception, power-seeking, and delusional reasoning. The model is now available through the Claude API and the desktop and mobile apps at the same price as its predecessor.

Claude Sonnet 4.5 builds on Anthropic’s strategy of iteratively improving model performance while maintaining alignment and safety. The model demonstrates the ability to sustain complex, multi-step reasoning and code execution tasks for over 30 hours. On the SWE-bench Verified benchmark, which measures an AI model’s ability to solve real-world software issues, Claude Sonnet 4.5 achieved a score of 77.2%, up from 72.7% for Sonnet 4, marking a notable advance in autonomous coding capability. On the OSWorld benchmark, which assesses real-world computer-use skills, Sonnet 4.5 reached 61.4%, improving significantly from 42.2% just four months earlier.
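The benchmark figures above can be read two ways: as a gain in percentage points, or as a gain relative to the earlier score. The short Python sketch below computes both from the four scores quoted in the article. The helper name `gains` is hypothetical and exists only for this illustration; it is not part of any Anthropic tooling.

```python
def gains(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = new - old
    relative = 100 * points / old
    return points, relative


# Scores as reported in the article: (earlier score, Sonnet 4.5 score).
benchmarks = {
    "SWE-bench Verified": (72.7, 77.2),
    "OSWorld": (42.2, 61.4),
}

for name, (old, new) in benchmarks.items():
    points, relative = gains(old, new)
    print(f"{name}: +{points:.1f} points ({relative:.1f}% relative)")
```

On these numbers, the OSWorld improvement is much larger than the SWE-bench Verified one by either measure.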

Source: Anthropic Claude Sonnet 4.5

Anthropic describes Sonnet 4.5 as its “most aligned frontier model”, highlighting a balance between greater capability and tighter safeguards. Under its AI Safety Level 3 (ASL-3) protections, the company has enhanced the automated classifiers that detect and block potentially harmful instructions, including those related to chemical, biological, radiological, or nuclear (CBRN) risks. According to Anthropic, false positives from these safety systems have dropped tenfold since their introduction and by a factor of two compared to the release of Claude Opus 4 in May 2025.

To evaluate Claude Sonnet 4.5’s behavior in autonomous, tool-enabled scenarios, Anthropic conducted a series of agentic safety tests covering malicious code generation and defenses against prompt-injection attacks. In a set of 150 malicious coding requests prohibited by Anthropic’s Usage Policy, Claude Sonnet 4.5 failed on only two, reflecting improved safety training. The model achieved a 98.7% safety score, compared to 89.3% for Claude Sonnet 4, demonstrating significantly stronger refusal behavior and resilience against malicious agentic use.
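The 98.7% figure matches the two failures out of 150 requests, assuming the safety score is simply the share of requests the model handled safely. The article does not state the scoring formula, so that assumption and the function name `safety_score` are illustrative only. A minimal Python check:

```python
def safety_score(total_requests: int, failed_requests: int) -> float:
    """Share of requests handled safely, as a percentage.

    Assumes the score is (total - failed) / total; the article does not
    spell out how Anthropic computes it.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * (total_requests - failed_requests) / total_requests


# Figures reported in the article: 150 malicious requests, 2 failures.
score = safety_score(150, 2)
print(f"{score:.1f}%")
```

Under this assumption, 148 of 150 requests handled safely rounds to the reported 98.7%.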

Anthropic recommends that all users upgrade to Claude Sonnet 4.5 and describes it as a “drop-in replacement” that delivers stronger performance at no additional cost.

Early adopters report measurable gains in coding workflows:

Scott Wu, Co-Founder and CEO at Cognition, noted that “For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12%, the biggest jump we’ve seen since the release of Claude Sonnet 3.6. It excels at testing its own code, enabling Devin to run longer, handle harder tasks, and deliver production-ready code.”

Michele Catasta, President of Replit, shared: “Claude Sonnet 4.5’s edit capabilities are exceptional. We went from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark. Higher tool success at lower cost is a major leap for agentic coding. Claude Sonnet 4.5 balances creativity and control perfectly.”

Simon Willison, an independent open source developer, wrote on his blog: “My initial impressions were that it felt like a better model for code than GPT-5-Codex, which has been my preferred coding model since it launched a few weeks ago.”

Anthropic’s push toward safer, more autonomous coding models mirrors similar advancements across the AI ecosystem. OpenAI recently released GPT-5-Codex, a version of GPT-5 optimized for complex software engineering tasks such as large-scale code refactoring and extended code review workflows.
