Despite widespread industry recommendations, a new ETH Zurich paper concludes that AGENTS.md files may often hinder AI coding agents. The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to non-inferable details, such as highly specific tooling or custom build commands.
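The kind of file the researchers endorse can be sketched as follows. This is an illustrative example, not one drawn from the paper; the repository details, paths, and commands are hypothetical stand-ins for the "non-inferable" knowledge an agent cannot derive from the code alone:

```markdown
# AGENTS.md

## Build and test
- Build with `make -C build all`; the top-level Makefile is a legacy wrapper and should not be used.
- Run tests via `./scripts/run_tests.sh --fast`. Plain `pytest` misses fixtures that this script generates.

## Non-obvious constraints
- Code in `src/compat/` must remain Python 3.8-compatible for a downstream consumer.
```

Note what the sketch omits: architecture overviews, directory walkthroughs, and coding-style reminders, precisely the inferable content the study found added steps and cost without improving outcomes.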
The team (Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev) justified the research by noting that while 60,000 open-source repositories currently contain context files such as AGENTS.md, and many agent frameworks feature built-in commands to auto-generate them, there has been no rigorous empirical investigation into whether these files actually improve an AI agent’s ability to resolve real-world coding tasks.
The researchers (one of whom contributed to the Humanity's Last Exam benchmark) built AGENTbench, a novel dataset of 138 real-world Python tasks sourced from niche repositories. This setup deliberately avoids the bias of popular benchmarks like SWE-bench, which AI models may have partially memorized. The team tested four agents (Claude 3.5 Sonnet, Codex GPT-5.2 and GPT-5.1 mini, and Qwen Code) across three distinct scenarios: using no context file, an LLM-generated file, and a human-written file. The researchers assessed the real-world impact of repository-level instructions by tracking three proxy indicators: task success rates (as determined by repository unit tests), the number of agent steps, and overall inference costs. All chosen niche repositories featured human-written context files; the first two scenarios were tested by removing or replacing those files.
The researchers found that LLM-generated context files degraded performance, reducing the task success rate by an average of 3% compared to providing no context file at all. These files also consistently increased the number of steps the agents took, driving up inference costs by over 20%.
Human-written files, on the other hand, did offer marginal gains, with a 4% average increase in task success rate on AGENTbench. This gain, however, came with a parallel increase in the number of steps, raising inference costs by up to 19%.
Including information such as an architectural overview or an explanation of the repository structure in AGENTS.md files did not seem to reduce the time the model spent locating relevant files for the task at hand.
To understand why performance dropped while costs increased, the authors conducted a deep trace analysis of the agents’ tool calls and reasoning patterns. Agents generally followed the instructions included in the AGENTS.md file. As a result, they ran more tests, read more files, executed more grep searches, and performed more code-quality checks. While thorough, this behavior was often unnecessary for resolving the specific task at hand. The data points to the extra context forcing reasoning models to “think” harder without yielding better final patches.
The authors concluded by emphasizing the gap between the study’s findings and the current recommendations made to developers using AI code agents:
We find that all context files consistently increase the number of steps required to complete tasks. LLM-generated context files have a marginal negative effect on task success rates, while developer-written ones provide a marginal performance gain.
Our trace analyses show that instructions in context files are generally followed and lead to more testing and a broader exploration; however, they do not function as effective repository overviews. Overall, our results suggest that context files have only a marginal effect on agent behavior and are likely only desirable when manually written. This highlights a concrete gap between current agent-developer recommendations and observed outcomes, and motivates future work on principled ways to automatically generate concise, task-relevant guidance for coding agents.
Developers received the research with interest. One developer noted that the research should actually have developers focus on writing useful AGENTS.md files:
I read the study. I think it does the opposite of what the authors suggest—it’s actually vouching for good AGENTS.md files.
[…] The biggest use case for AGENTS.md files is domain knowledge that the model is not aware of and cannot instantly infer from the project. That is gained slowly over time from seeing the agents struggle due to this deficiency. Exactly the kind of thing very common in closed-source, yet incredibly rare in public GitHub projects that have an AGENTS.md file—the huge majority of which are recent small vibe-coded projects centered around LLMs. If 4% gains are seen on the latter kind of project, which will have a very mixed quality of AGENTS files in the first place, then for bigger projects with high-quality .md‘s they’re invaluable when working with agents.
Another developer noted that context files may just be more useful to developers than to AI harnesses:
I’ve maintained a CLAUDE.md file for about 3 months now across two projects and the improvement is noticeable but not for the reasons you’d expect. The actual token-level context it provides matters less than the fact that writing it forces you to articulate things about your codebase that were previously just in your head. Stuff like “we use this weird pattern for X because of a legacy constraint in Y.” Once that’s written down, the agent picks it up, but so does every new human on the team.
Developers can review the paper online. The use of context files, such as AGENTS.md, CLAUDE.md, or .cursorrules, grew in importance in the second half of 2025, coinciding with a larger push by AI coding agent providers.
