Welcome to AI Decoded, Fast Company's weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week via email here.
Is ‘AI slop’ code here to stay?
A few months ago I wrote about the dark side of vibe coding tools: they often generate code that introduces bugs or security vulnerabilities that surface later. They can solve an immediate problem while making a codebase harder to maintain over time. It’s true that more developers are using AI coding assistants, and using them more frequently and for more tasks. But many seem to be weighing the time saved today against the cleanup they may face tomorrow.
When human engineers build projects with lots of moving parts and dependencies, they have to hold a vast amount of information in their heads and then find the simplest, most elegant way to execute their plan. AI models face a similar challenge. Developers have told me candidly that AI coding tools, including Claude Code and Codex, still struggle when they need to account for large amounts of context in complex projects. The models can lose track of key details, misinterpret the meaning or implications of project data, or make planning mistakes that lead to inconsistencies in the code—all things that an experienced software engineer would catch.
The most advanced AI coding tools are only now beginning to add testing and validation features that can proactively surface problematic code. When I asked OpenAI CEO Sam Altman during a recent press call whether Codex is improving at testing and validating generated code, he became visibly excited. Altman said OpenAI likes the idea of deploying agents to work behind developers, validating code and sniffing out potential problems.
Indeed, Codex can run tests on code it generates or modifies, executing test suites in a sandboxed environment and iterating until the code passes or meets acceptance criteria defined by the developer. Anthropic, meanwhile, has built its own testing, validation, and security routines into Claude Code. Some developers say Claude is stronger at higher-level planning and understanding intent, while Codex is better at following specific instructions and matching an existing codebase.
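To make that loop concrete, here is a minimal sketch of the generate-test-iterate pattern described above. It is illustrative only, assuming a pytest-based project; the generate_patch helper is a hypothetical stand-in for a coding model, not an actual Codex or Claude Code API.

```python
import subprocess

MAX_ATTEMPTS = 5  # give up after a fixed number of repair attempts


def run_test_suite() -> tuple[bool, str]:
    """Run the project's tests in the working copy and capture the output."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def generate_patch(task: str, feedback: str | None) -> None:
    """Hypothetical call to a coding model: apply a patch for `task`,
    optionally guided by the failing test output in `feedback`."""
    ...  # placeholder, not a real Codex or Claude Code API


def generate_until_tests_pass(task: str) -> bool:
    """Iterate: generate code, run the tests, feed failures back, repeat."""
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        generate_patch(task, feedback)
        passed, output = run_test_suite()
        if passed:
            return True       # acceptance criteria met
        feedback = output     # the next attempt sees what failed
    return False              # flag for human review instead of merging
```

The point of the sketch is the feedback loop: a failing run is never silently accepted, its output steers the next attempt, and after a fixed number of tries the change is handed back to a human rather than merged.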
The real question may be what developers should expect from these AI coding tools. Should they be held to the standard of a junior engineer whose work may contain errors and requires careful review? Or should the bar be higher? Perhaps the goal should be for these tools not only to avoid generating "slop" code but also to act as a kind of internal auditor, catching and fixing bad code written by humans.
Altman likes that idea. But judging by comments from OpenAI president Greg Brockman, it's not clear the company believes that standard is fully attainable. Brockman suggests in a recently posted set of AI coding guidelines that AI "slop" code isn't something to eliminate so much as a reality to manage. "Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high," Brockman wrote on X.
