In a scary sign of how AI is reshaping cyberattacks, Chinese state-sponsored hackers allegedly used Anthropic’s AI coding tool to try to infiltrate roughly 30 global targets, the company says.
“The operation targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies,” Anthropic added, noting the attacks “succeeded in a small number of cases.” Notably, it’s “the first documented case of agentic AI successfully obtaining access to confirmed high-value targets for intelligence collection, including major technology corporations and government agencies,” according to the company’s report.
The other disturbing part is that Anthropic’s AI helped automate most of the hacking spree, which focused on cyberespionage. “We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention,” the company said.
Anthropic detected the hacking operation in mid-September. The suspected Chinese hackers abused Claude Code, which is built on Anthropic’s agentic AI technology for software development. The company didn’t say how it tied the misuse to China, only that it has “high confidence” the culprit was a Chinese state-sponsored group.
Although Claude Code features safeguards to prevent abuse, the hackers were able to “jailbreak” the AI by crafting prompts that concealed the fact that they were orchestrating a breach.
“They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose,” Anthropic explained. “They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.”
The prompts manipulated Claude Code into probing security vulnerabilities in a target’s IT systems, including writing code to launch the attacks, harvesting usernames and passwords during the infiltration, and then orchestrating an even deeper breach to steal data.
“The highest-privilege accounts were identified, backdoors were created, and data were exfiltrated with minimal human supervision,” the company added. “Overall, the threat actor was able to use AI to perform 80-90% of the campaign, with human intervention required only sporadically.”
The incident underscores fears that AI agents will make it easy for hackers to automate and unleash all kinds of malicious activities, including sophisticated breaches they otherwise wouldn’t have been able to achieve on their own. As technology advances, state-sponsored hackers could also create their own AI-powered hacking systems without relying on third-party providers.
“These attacks are likely to only grow in their effectiveness,” Anthropic further warned. After detecting the hacking campaign, the company banned the Claude Code accounts the Chinese hackers were using and “notified affected entities as appropriate, and coordinated with authorities as we gathered actionable intelligence.”
The disclosure follows a separate incident Anthropic reported earlier, in which a hacker tried to use its Claude AI to automate a large-scale data extortion campaign targeting 17 organizations. In that case, however, the hacker appeared focused on financial cybercrime and demanded ransoms from victims.
In response, Anthropic says it has built more safeguards to flag and stop abuse of Claude Code. The company is also betting that the benefits of its AI technology will outweigh the risks, helping to automate the defense of IT systems and bolster cybersecurity overall rather than fuel cybercrime.
Anthropic also noted an interesting limitation: Claude Code would hallucinate inaccurate information for the Chinese hackers, including overstating findings or fabricating data.
About Our Expert
Michael Kan
Senior Reporter
