AI tools don’t always increase productivity. A recent study from Model Evaluation and Threat Research found that 16 software developers took longer to complete tasks when they used AI tools than when they did not, despite their expectation that AI would speed them up. The research challenges the dominant narrative that AI drives efficiency in the workplace.
It’s like a new “Tortoise and the Hare” story: a group of experienced software engineers took part in an experiment in which they completed part of their work using AI tools. Thinking like the hare, the developers expected AI to speed up their work and increase productivity. Instead, the technology slowed them down. In the context of the experiment, the AI-free tortoise approach would have been faster.
The results of this experiment, part of a recent study, came as a surprise to the software developers charged with using AI – and to the study’s authors, Joel Becker and Nate Rush, technical staffers at the nonprofit technology research organization Model Evaluation and Threat Research (METR).
The researchers enlisted 16 software developers, who had an average of five years of experience, to complete 246 tasks, all of which were part of projects they were already working on. For half of the tasks, the developers were allowed to use AI tools (most chose code editor Cursor Pro or Claude 3.5/3.7 Sonnet); for the other half, they worked without AI.
Believing that the AI tools would make them more productive, the software developers predicted that the technology would reduce the time to complete their tasks by an average of 24%. Instead, AI increased their task completion time by 19% compared with not using the technology.
“While I like to believe that my productivity has not suffered as a result of using AI for my tasks, it is not unlikely that it has not helped me as much as I expected, or perhaps even hindered my efforts,” Philipp Burckhardt, a participant in the study, wrote in a blog post about his experiences.
So where did the hares veer off the path? The experienced developers, in the middle of their own projects, likely approached their work with a lot of context that their AI assistants lacked. As a result, they had to fit their own agenda and problem-solving strategies to the AI’s output, which they also spent ample time debugging, according to the study.
“The majority of developers who participated in the survey noted that even when they get AI outputs that are generally useful to them – and talk about the fact that AI can often do chunks of very impressive work, or sort of very impressive work – these developers have to spend a lot of time cleaning up the resulting code to actually make it suitable for the project,” study author Rush told Fortune.
Other developers lost time writing prompts for the chatbots or waiting for the AI to generate results.
The study results contradict lofty promises about AI’s ability to transform the economy and the workforce, including a 15% increase in US GDP by 2035 and ultimately a 25% increase in productivity. In fact, many companies are not yet seeing a return on AI investments. An MIT report published in August found that of 300 AI implementations, only 5% achieved rapid revenue growth. According to a Harvard Business Review Analytic Services research report published last month, only 6% of companies fully trust AI to perform core business operations.
But Rush and Becker hesitate to make sweeping statements about what the results of their research mean for the future of AI.
First, the study’s sample was small and non-generalizable, including only a specialized group of people to whom these AI tools were brand new. The study also measures technology at a specific point in time, the authors said, and does not rule out the possibility that AI tools could be developed in the future that would indeed help developers improve their workflow.
The aim of the study was, broadly, to slow the torrid implementation of AI in the workplace and elsewhere, recognizing that more data on the actual effects of AI must be made known and accessible before more decisions are made about its applications.
“Some of the decisions we’re making now around the development and implementation of these systems have the potential to have very significant consequences,” Rush said. “If we’re going to do that, let’s not just take the obvious answer. Let’s do high-quality measurements.”
Economists have already argued that METR’s research fits into broader narratives about AI and productivity. While AI is beginning to erode entry-level positions, it may provide diminishing returns for skilled workers such as experienced software developers, according to Aneesh Raman, LinkedIn’s chief economic opportunity officer.
“For those people who have 20 years, or in this particular example, five years of experience, it may not be their main job that we should look for and force them to start using these tools if they are already doing well in their jobs with their existing work methods,” Anders Humlum, assistant professor of economics at the University of Chicago Booth School of Business, told Fortune.
Humlum has also conducted research into the impact of AI on productivity. In a working paper published in May, he found that among 25,000 employees across 7,000 workplaces in Denmark – a country with AI adoption similar to that of the US – productivity improved by a modest 3% among workers who used the tools.
Humlum’s research supports MIT economist and Nobel laureate Daron Acemoglu’s claim that markets have overestimated AI’s productivity gains. Acemoglu states that only 4.6% of tasks within the US economy will be made more efficient with AI.
“In a rush to automate everything, even the processes that shouldn’t be automated, companies will waste time and energy and not achieve any of the promised productivity benefits,” Acemoglu previously wrote for Fortune. “The hard truth is that achieving productivity gains from any technology requires organizational adjustments, a series of additional investments and improvements in employee skills, through on-the-job training and learning.”
The case of software developers’ limited productivity points to the need to think critically about the implementation of AI tools, Humlum said. While previous research on AI productivity has looked at self-reported data or specific and narrow tasks, data on the challenges faced by skilled workers using the technology complicates the picture.
“In the real world, many tasks are not as simple as just typing in ChatGPT,” Humlum said. “Many experts have accumulated a lot of experience that is very useful, and we should not simply ignore it and give up the valuable expertise we have built up.
“I would just take this as a good reminder to be very careful when using these tools,” he added.
A version of this story originally published on Fortune.com on July 20, 2025.
