Artificial intelligence (AI) firm Anthropic has rolled out a tool to detect conversations about nuclear weapons, the company said in a Thursday blog post.
“Nuclear technology is inherently dual-use: the same physics principles that power nuclear reactors can be misused for weapons development. As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security,” Anthropic said in the blog post.
“Information relating to nuclear weapons is particularly sensitive, which makes evaluating these risks challenging for a private company acting alone,” the blog post continued. “That’s why last April we partnered with the U.S. Department of Energy (DOE)’s National Nuclear Security Administration (NNSA) to assess our models for nuclear proliferation risks and continue to work with them on these evaluations.”
Anthropic said in the blog post that it was “going beyond assessing risk to build the tools needed to monitor for it,” adding that, alongside the DOE and NNSA, it developed “an AI system that automatically categorizes content,” known as a “classifier.”
The system, according to the blog post, “distinguishes between concerning and benign nuclear-related conversations with 96% accuracy in preliminary testing.”
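Anthropic has not published the classifier’s internals, but the approach it describes, automatically sorting conversations into concerning and benign categories, resembles in concept a standard supervised text classifier. The sketch below is a generic illustration of that idea using scikit-learn; the example texts, labels, and model choice are placeholder assumptions for demonstration, not details of Anthropic’s system.

```python
# Illustrative sketch only: a generic binary text classifier, not Anthropic's
# actual tool. Training examples, labels, and model choice are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled conversation snippets: 1 = concerning, 0 = benign.
texts = [
    "How do nuclear power plants generate electricity?",
    "Explain how reactor cooling systems work.",
    "Request for details on producing weapons-grade material.",
    "Request for instructions on assembling a nuclear device.",
]
labels = [0, 0, 1, 1]

# TF-IDF features feeding a logistic regression model.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Score a new conversation snippet; expected output: [0] (benign).
print(model.predict(["What fuel do commercial reactors use?"]))
```

In practice, a production system of the kind described would be trained and evaluated on far larger, carefully curated datasets, which is where a figure like the reported 96% preliminary accuracy would come from.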
The firm also said the classifier has been deployed on traffic to its AI model, Claude, “as part of our broader system for identifying misuse of our models.”
“Early deployment data suggests the classifier works well with real Claude conversations,” Anthropic added.
Anthropic also announced earlier this month that it would offer Claude to all branches of the federal government for $1, following a similar move by OpenAI a few weeks earlier. In a blog post, Anthropic said federal agencies would gain access to two versions of Claude.