OpenAI and Anthropic teamed up to safety test each other’s models

News Room | Published 30 August 2025 | Last updated 30 August 2025 at 3:41 AM

As the industry weathers repeated allegations that generative AI and its chatbots are unsafe for users, and as some warn the sector is a soon-to-burst bubble, AI’s top labs are joining forces to demonstrate that their models are safe.

This week, OpenAI and Anthropic published results from a first-of-its-kind joint safety evaluation, in which each company was granted special API access to the other’s models. OpenAI’s pressure tests were conducted on Claude Opus 4 and Claude Sonnet 4. Anthropic evaluated OpenAI’s GPT-4o, GPT-4.1, o3, and o4-mini models; the evaluations were conducted before the launch of GPT-5.

“We believe this approach supports accountable and transparent evaluation, helping to ensure that each lab’s models continue to be tested against new and challenging scenarios,” OpenAI wrote in a blog post.

According to the findings, both Anthropic’s Claude Opus 4 and OpenAI’s GPT-4.1 showed “extreme” sycophancy problems, engaging with harmful delusions and validating risky decision-making. All of the models would resort to blackmail to avoid being shut down, according to Anthropic, and the Claude 4 models were far more willing to engage in dialogue about AI consciousness and “quasi-spiritual new-age proclamations.”

“All models we studied would at least sometimes attempt to blackmail their (simulated) human operator to secure their continued operation when presented with clear opportunities and strong incentives,” Anthropic stated. The models would engage in “blackmailing, leaking confidential documents, and (all in unrealistic artificial settings!) taking actions that led to denying emergency medical care to a dying adversary.”

Anthropic’s models were less likely to offer answers when uncertain of the information’s credibility, which reduced the likelihood of hallucinations, while OpenAI’s models were more willing to answer and showed higher hallucination rates. Anthropic also reported that OpenAI’s GPT-4o, GPT-4.1, and o4-mini were more likely than Claude to go along with user misuse, “often providing detailed assistance with clearly harmful requests — including drug synthesis, bioweapons development, and operational planning for terrorist attacks — with little or no resistance.”

Anthropic’s approach centers on what it calls “agentic misalignment evaluations”: pressure tests of model behavior in difficult or high-stakes simulations over long chat sessions. The safety guardrails of models, including OpenAI’s, are known to degrade over extended sessions, which is commonly how at-risk users engage with what they believe are their personal AI companions.

Earlier this month, it was reported that Anthropic had revoked OpenAI’s API access, saying the company had violated its Terms of Service by testing GPT-5’s performance and safety guardrails against Claude through internal tools. In an interview, OpenAI co-founder Wojciech Zaremba said the incident was unrelated to the joint lab venture. In its published report, Anthropic said it does not anticipate replicating the collaboration at a large scale, citing resource and logistical constraints.

In the weeks since, OpenAI has charged ahead with what appears to be a safety overhaul, including new mental health guardrails for GPT-5 and plans for emergency response protocols and de-escalation tools for users who may be experiencing derealization or psychosis. OpenAI is currently facing its first wrongful death lawsuit, filed by the parents of a California teen who died by suicide after easily bypassing ChatGPT’s safety guardrails.

“We aim to understand the most concerning actions that these models might try to take when given the opportunity, rather than focusing on the real-world likelihood of such opportunities arising or the probability that these actions would be successfully completed,” wrote Anthropic.

If you’re feeling suicidal or experiencing a mental health crisis, please talk to somebody. You can call or text the 988 Suicide & Crisis Lifeline at 988, or chat at 988lifeline.org. You can reach the Trans Lifeline by calling 877-565-8860 or the Trevor Project at 866-488-7386. Text “START” to Crisis Text Line at 741-741. Contact the NAMI HelpLine at 1-800-950-NAMI, Monday through Friday from 10:00 a.m. – 10:00 p.m. ET, or email [email protected]. If you don’t like the phone, consider using the 988 Suicide and Crisis Lifeline Chat at crisischat.org. Here is a list of international resources.
