OpenAI has released GPT-5-Codex, a version of GPT-5 optimized for complex software engineering tasks such as large-scale code refactoring and extended code review workflows. Purpose-built for the Codex CLI, IDE extension, and cloud environment, the model can operate autonomously for more than seven hours to deliver working solutions without human intervention. It is now the default in Codex’s cloud service and available wherever developers use Codex.
A key feature of GPT-5-Codex is adaptive reasoning, which adjusts reasoning time based on task complexity. The model supports both interactive pairing with developers for smaller, well-defined tasks and persistent execution on extended refactoring work. OpenAI notes that while responses are faster in chat-style interactions, the model can allocate additional cycles to handle larger, multi-file code changes.
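OpenAI has not published how adaptive reasoning is implemented; the toy router below is purely illustrative of the general idea the article describes: scaling a reasoning budget up or down with the estimated complexity of a task. All function names, heuristics, and numbers are assumptions for the sketch.

```python
# Illustrative sketch only -- not OpenAI's implementation. Shows the idea of
# routing simple requests to a small, fast reasoning budget and complex
# multi-file changes to a much larger one.

def estimate_complexity(task: dict) -> float:
    """Crude proxy: more files and lines touched -> higher complexity (0.0-1.0)."""
    score = 0.1 * task.get("files_touched", 1) + 0.001 * task.get("lines_changed", 0)
    return min(score, 1.0)

def reasoning_budget(task: dict, base_tokens: int = 2_000, max_tokens: int = 64_000) -> int:
    """Interpolate between a chat-style budget and a long-horizon budget."""
    c = estimate_complexity(task)
    return int(base_tokens + c * (max_tokens - base_tokens))

# A small well-defined fix vs. a refactor on the scale the article cites
quick_fix = {"files_touched": 1, "lines_changed": 12}
big_refactor = {"files_touched": 232, "lines_changed": 3_500}

print(reasoning_budget(quick_fix))     # small budget: fast interactive reply
print(reasoning_budget(big_refactor))  # capped at the full long-horizon budget
```

The design choice this models is the trade-off the article describes: interactive pairing stays responsive because easy requests never pay for the full budget, while extended refactors get all the cycles they need.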
OpenAI employee usage data suggests efficiency gains in how the model allocates computation. For the simplest 10% of requests, GPT-5-Codex used 93.7% fewer tokens than GPT-5. Conversely, for the most complex 10% of requests, the model invested more effort, spending roughly twice as long on reasoning, editing, testing, and iteration.
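To make the headline figure concrete, the arithmetic below works through what a 93.7% token reduction means in absolute terms. The percentages are from the article; the per-request token count is invented for illustration.

```python
# Worked arithmetic for the reported reduction. Only the 93.7% figure comes
# from the article; the 10,000-token baseline is a made-up example value.

def pct_fewer(new: float, old: float) -> float:
    """Percentage reduction of `new` relative to `old`."""
    return (old - new) / old * 100

# If GPT-5 spent 10,000 tokens on a bottom-decile request, a 93.7% reduction
# means GPT-5-Codex would spend roughly 630 tokens on the same request.
gpt5_tokens = 10_000
codex_tokens = gpt5_tokens * (1 - 0.937)

print(round(codex_tokens))                              # ~630 tokens
print(round(pct_fewer(codex_tokens, gpt5_tokens), 1))   # recovers 93.7
```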
In direct evaluations of refactoring tasks, GPT-5-Codex also achieved higher accuracy than GPT-5. The model reached 51.3% accuracy compared to 33.9% for GPT-5 in scenarios requiring systematic, multi-step modifications across a codebase. One benchmark involved a pull request in the Gitea repository that required passing a context variable through multiple layers of the application, spanning 232 files and more than 3,500 lines of code.
Beyond refactoring, the model has also been trained to strengthen code review workflows. GPT-5-Codex can navigate repositories, analyze dependencies, and run tests to validate correctness. OpenAI reports that in evaluations on recent commits from popular open-source repositories, GPT-5-Codex produced review comments that were more accurate and higher-value, reducing noise for developers and highlighting critical issues.
The model was trained using reinforcement learning on real-world coding tasks, such as building full projects from scratch, adding features and tests, debugging, and performing large-scale refactors. OpenAI says this approach helps align behavior with common coding styles and pull request conventions. It can also follow project-specific guidelines defined in AGENTS.md files.
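An AGENTS.md file is plain Markdown placed in the repository that Codex reads for project-specific instructions. The contents below are an illustrative sketch of the kinds of guidelines a team might include, not a format prescribed by OpenAI:

```markdown
# AGENTS.md — guidance for Codex in this repository

## Code style
- Format Python with `black`; keep line length at 88 characters.
- Prefer explicit imports over wildcard imports.

## Testing
- Run `pytest tests/` before proposing a change.
- Add a regression test for every bug fix.

## Pull requests
- Keep each PR focused on a single change.
- Reference the related issue number in the PR description.
```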
For developer access, GPT-5-Codex is available through the Codex CLI and IDE extension. OpenAI recommends this model for agentic coding scenarios, with API key access for CLI integration expected in upcoming releases.
OpenAI highlighted early use cases of GPT-5-Codex from engineering teams. Aaron Wang, Senior Software Engineer at Duolingo, noted that “Codex performed best in our backend Python code-review benchmark. It was the only one to catch tricky backward compatibility issues and consistently found the hard bugs that other bots missed.” A tech lead at Cisco Meraki shared, “I needed to update a codebase owned by another team for a feature release. With Codex, I offloaded the refactoring and test generation while focusing on other priorities. It produced high-quality, fully tested code that I could quickly hand back, keeping the feature on schedule without adding risk.”