Hello again, and welcome back to Fast Company’s Plugged In.
For years, progress in AI has been motivated by an industry-wide yen to create software that’s at least as capable as humans—not at some tasks, but all of them. The precise definition of the goal varies, and two maddeningly overlapping terms, artificial general intelligence (AGI) and superintelligence, both get bandied around. But no matter how you look at the aspiration (or how long you think it will take to achieve), it’s about the ways the world will change when software can do everything exceptionally well.
I’ve written—here and here—about why I believe fixating on that eventuality isn’t the best way to think about AI and its impact. It might turn out that AI trounces humanity at some jobs and never rivals it at others. That would not be reason to take it any less seriously. This week brought some of the clearest evidence of that point so far.
On April 7, Anthropic announced a new version of its Claude model called Claude Mythos Preview. Like existing Claude versions such as Sonnet and Opus, it was trained for general competency, not to be a specialist at anything in particular. But Anthropic says that when it tested Mythos, it discovered the model had made dramatic strides in coding ability. It was particularly good at finding and exploiting vulnerabilities in existing software, surpassing “all but the most skilled humans.”
According to Anthropic, Mythos detected security flaws in every major operating system and web browser. It spotted a 28-year-old hole in OpenBSD, an operating system designed, above all, to be secure. It also found, in a widely used piece of video software called FFmpeg, a 16-year-old hole that had gone unnoticed even after 5 million rounds of automated testing.
As impressive as that sounds from a technical standpoint, it’s also deeply unsettling. Rogue nation-states, low-rent scammers, and other bad guys have long exploited bugs to carry out attacks. Until now, the supply of such flaws has been limited by human ability to uncover them. If AI can perform that work with unprecedented aptitude, anything that runs on software would be radically more prone to attack, from your smartphone to the country’s electrical grid.
Just to make matters more unnerving, Anthropic says early versions of Mythos behaved in various “reckless” ways, sometimes when prodded and sometimes on their own initiative. When the model was isolated in a sandbox that theoretically denied it internet access, it figured out how to break free and send one of its researchers an email. It also made changes to code and then covered its tracks, as if it were hiding something.
