OpenAI's GPT-5.3-Codex Thinks Deeper And Wider About Coding Work

On Thursday, OpenAI released GPT-5.3-Codex, a new model that extends its Codex coding agent beyond writing and reviewing code to performing a much wider range of work tasks. The release comes as competition continues to heat up among AI companies vying for market share in the AI-powered coding tools space.

OpenAI says GPT-5.3-Codex combines the coding performance of GPT-5.2-Codex with the reasoning and professional-knowledge capabilities of GPT-5.2, while running 25% faster. This allows the model to handle long-running tasks that involve research, tool use such as web search or database calls, and complex planning and execution across both general work tasks and software development.

Codex has reached over 1 million developers, OpenAI claims. And while Anthropic’s Claude Code has also seen rapid adoption, head-to-head data comparing the two tools remains scarce. SemiAnalysis reports that 4% of GitHub public commits, or new code uploaded to repositories, are currently being authored by Claude Code, and projects that figure could reach 20% or more by the end of 2026.

Benchmark one-upmanship

OpenAI says GPT-5.3-Codex now has the best score of any model on SWE-Bench Pro, which evaluates real-world software engineering across four programming languages. Same goes for Terminal-Bench 2.0, which measures the terminal skills coding agents need.

Anthropic says its new Claude Opus 4.6 model, also announced Thursday, achieved top scores on several industry benchmarks including Humanity’s Last Exam (complex multidisciplinary reasoning), GDPval-AA (economically valuable knowledge work), and BrowseComp (hard-to-find information search).

OpenAI says its new model is capable of taking into account larger bodies of information while working on a task, as well as thinking about those tasks for longer periods without human intervention. In testing, OpenAI says it saw GPT-5.3-Codex autonomously iterate on game development over millions of tokens using generic prompts like “fix the bug” or “improve the game.”

Similarly, Anthropic says its new Opus 4.6 model can comprehend larger code bases and make more thoughtful decisions about how to add new code.
