If you’re a programmer who, like many members of the general public, is worried about AI taking your job, Microsoft might have some reassuring news for you.
Microsoft Research, Microsoft’s R&D division, tested a variety of the most popular large language models (LLMs) and found many came up surprisingly short when it came to a common programming task. The study tested nine different models, including Anthropic’s Claude 3.7 Sonnet, OpenAI’s o1, and OpenAI’s o3-mini.
The researchers assessed the ability of these AIs to perform “debugging,” the process where programmers sift through existing code to find flaws that prevent it from working as intended (something that often takes up huge chunks of programmers’ time). Microsoft hooked up the AIs to a debugging assistant it created called Debug Gym and tested the AIs on a common software benchmark known as SWE-bench.
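For readers unfamiliar with the task, here is a minimal, hypothetical Python sketch of what debugging involves. It is not taken from the study, from Debug Gym, or from SWE-bench, and every function name in it is invented for illustration: a function contains a flaw, a simple check exposes the wrong result, and the repaired version passes the same check.

```python
def buggy_average(values):
    # Flaw: divides by one less than the number of items.
    return sum(values) / (len(values) - 1)


def fixed_average(values):
    # Repair: divide by the actual number of items.
    return sum(values) / len(values)


def passes_check(average_fn):
    """Return True if the function gives the expected mean for a known input."""
    try:
        return average_fn([2, 4, 6]) == 4
    except ZeroDivisionError:
        return False


print(passes_check(buggy_average))  # False
print(passes_check(fixed_average))  # True
```

Benchmarks of this kind typically judge a fix the same way, by whether the project's tests pass after the change, though the real tasks involve far larger codebases than this toy case.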
The results were underwhelming: none of the models reached a 50% success rate, even with the help of Debug Gym. Anthropic’s Claude 3.7 Sonnet was the best performer, successfully debugging the faulty code in 48.4% of cases. OpenAI’s o1 succeeded 30.2% of the time, while OpenAI’s o3-mini did so 22.1% of the time.
Microsoft’s team reiterated its belief that AI tools like these can become effective code debuggers, and said it plans “to fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs” in its future research.
The findings may provide some slight relief for worried programmers, as more of the tech world’s largest names pivot toward using AI for coding.
During an earnings call in October 2024, Google announced that it is now using AI to write “a quarter of all new code.” Meanwhile, AI startup Cognition Labs rolled out a new AI tool last year, dubbed Devin AI, that it claims can write code without human intervention, complete engineering jobs on Upwork, and adjust its own AI models.
Meta CEO Mark Zuckerberg is another famous face making big claims about the rise of AI programmers. He told podcaster Joe Rogan that at some point in 2025 his company will “have an AI that can effectively be a sort of mid-level engineer that you have at your company that can write code,” adding that he expected other companies to have similar capabilities.
About Will McCurdy
Contributor
