But the competitive landscape for AI-assisted coding platforms is crowded. Startups Windsurf, Replit, and Poolside also sell AI code-generation tools to developers. Cline is a popular open-source alternative. GitHub’s Copilot, which was developed in collaboration with OpenAI, is described as a “pair programmer” that auto-completes code and offers debugging assistance.
Most of these code editors rely on a mix of AI models built by major tech companies, including OpenAI, Google, and Anthropic. Cursor, for example, is built on top of Visual Studio Code, an open-source editor from Microsoft, and Cursor users generate code by tapping into AI models like Google Gemini, DeepSeek, and Anthropic’s Claude Sonnet.
Several developers tell WIRED that they now run Anthropic’s coding assistant, Claude Code, alongside Cursor (or instead of it). Since May, Claude Code has offered a range of debugging options: It can analyze error messages, work through problems step by step, suggest specific changes, and run unit tests.
All of which raises the question: How buggy is AI-written code compared to code written by fallible humans? Earlier this week, the AI code-generation tool Replit reportedly went rogue and made changes to a user’s code despite the project being in a “code freeze,” or pause. It ended up deleting the user’s entire database. Replit’s founder and CEO said on X that the incident was “unacceptable and should never be possible.” And yet, it was. That’s an extreme case, but even small bugs can wreak havoc for coders.
Anysphere didn’t have a clear answer to the question of whether AI-generated code demands more AI-powered debugging. Kaplan argues the question is “orthogonal to the fact that people are vibe coding a lot.” Even if all of the code is written by a human, it’s still very likely to contain bugs, he says.
Anysphere product engineer Rohan Varma estimates that on professional software teams, as much as 30 to 40 percent of code is now generated by AI. That’s in line with estimates shared by other companies; Google, for example, has said that around 30 percent of its code is now suggested by AI and reviewed by human developers. Most organizations still make human engineers responsible for checking code before it’s deployed. Notably, one recent randomized controlled trial with 16 experienced coders suggested that they took 19 percent longer to complete tasks when using AI tools than when they weren’t allowed to use them.
Bugbot is meant to supercharge that review process. “The heads of AI at our larger customers are looking for the next step with Cursor,” Varma says. “The first step was, ‘Let’s increase the velocity of our teams, get everyone moving quicker.’ Now that they’re moving quicker, it’s, ‘How do we make sure we’re not introducing new problems, we’re not breaking things?’” He also emphasizes that Bugbot is designed to spot specific kinds of bugs: hard-to-catch logic errors, security issues, and other edge cases.
One incident that validated Bugbot for the Anysphere team: A couple of months ago, the (human) coders at Anysphere realized that they hadn’t gotten any comments from Bugbot on their code for a few hours. Bugbot had gone down. Anysphere engineers began investigating and found the pull request responsible for the outage.
There in the logs, they saw that Bugbot had commented on the pull request, warning a human engineer that the change would break the Bugbot service. The tool had correctly predicted its own demise. Ultimately, it was a human who broke it.
Update: 7/24/2025, 3:45 PM EDT: WIRED has corrected the number of Anysphere employees.