The Agentica Project and Together AI have released DeepCoder-14B-Preview, an open-source AI coding model based on DeepSeek-R1-Distill-Qwen-14B. The model achieves a 60.6% Pass@1 accuracy on LiveCodeBench, outperforming OpenAI’s o1 model and matching the performance of o3-mini.
DeepCoder-14B-Preview is fine-tuned from the DeepSeek model on a dataset of 24K coding problems using reinforcement learning (RL). The developers modified the verl distributed RL framework to improve end-to-end training efficiency by 2x, and they released all artifacts associated with creating the model: code, data, training logs, and their improvements to verl. They evaluated the model on several coding benchmarks, including LiveCodeBench, Codeforces, and HumanEval, as well as on the math benchmark AIME2024. DeepCoder showed strong performance on all of them, with scores “comparable” to or even better than closed-source reasoning models such as o1 and o3-mini. According to the project team,
Our goal is to democratize RL training for LLMs…By fully sharing our dataset, code, and training recipe, we empower the community to reproduce our work and make RL training accessible to all. We believe advancing RL scaling is a collective, community-driven endeavor, and we welcome open-source contributions and sponsorships. Let’s work together to push the frontiers of RL for LLM reasoning—and beyond!
The DeepCoder team published several details about their training process and the problems they overcame. The first was a lack of “high-quality, verifiable” training data for coding problems: several popular datasets were “noisy or contained unverifiable problems,” or were simply too easy for models to solve. To create a training dataset, the team developed an automated pipeline that keeps only problems with a verifiable solution and at least five unit tests.
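The team's actual pipeline is part of their released code; the sketch below only illustrates the general shape of such a filter. The `Problem` structure and the `run_reference_solution` helper are illustrative assumptions, not the project's API.

```python
# Minimal sketch of a dataset-curation filter like the one DeepCoder describes:
# keep only problems whose reference solution passes all of its unit tests
# (verifiability) and that ship with at least five tests. Problem and
# run_reference_solution are hypothetical stand-ins for illustration.
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Problem:
    statement: str
    reference_solution: str        # known-good solution source code
    tests: list[tuple[str, str]]   # (stdin input, expected stdout) pairs


def run_reference_solution(solution: str, test_input: str, timeout: float = 5.0) -> str:
    """Run the solution as a Python script in a subprocess, feeding it the test input."""
    result = subprocess.run(
        [sys.executable, "-c", solution],
        input=test_input, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()


def is_verifiable(problem: Problem, min_tests: int = 5) -> bool:
    if len(problem.tests) < min_tests:
        return False  # too few tests to compute a reliable reward signal
    try:
        return all(
            run_reference_solution(problem.reference_solution, inp) == expected
            for inp, expected in problem.tests
        )
    except subprocess.TimeoutExpired:
        return False  # non-terminating solutions are not verifiable


def curate(raw_problems: list[Problem]) -> list[Problem]:
    return [p for p in raw_problems if is_verifiable(p)]
```

Filtering this way matters for RL because the unit tests double as the reward function during training: a problem without a verifiable solution and enough tests cannot produce a trustworthy reward signal.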
They also addressed an RL training bottleneck in “sampling,” i.e., running inference on the model being trained. The solution was to pipeline the process: run training and inference in parallel, using the inference output as the next batch of training data. This reduced training iteration time by 1.4x.
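The real implementation lives in the team's released verl changes; the sketch below shows the general pattern under simplifying assumptions, with hypothetical `sample_rollouts` and `train_step` functions standing in for the actual sampler and trainer.

```python
# Minimal sketch of pipelined RL training: while the trainer updates the policy
# on the current batch of rollouts, a background worker is already sampling
# rollouts for the next batch. sample_rollouts, train_step, and policy.snapshot
# are hypothetical stand-ins, not the project's actual API.
from concurrent.futures import ThreadPoolExecutor


def sample_rollouts(policy_snapshot, batch_id):
    """Run inference with a snapshot of the policy to generate rollouts."""
    ...


def train_step(policy, rollouts):
    """Update the policy weights with RL on one batch of rollouts."""
    ...


def pipelined_training(policy, num_iterations):
    with ThreadPoolExecutor(max_workers=1) as sampler:
        # Sample the first batch serially to prime the pipeline.
        future = sampler.submit(sample_rollouts, policy.snapshot(), 0)
        for i in range(num_iterations):
            rollouts = future.result()  # wait for sampling to finish
            if i + 1 < num_iterations:
                # Start sampling the next batch before training begins,
                # so inference and gradient updates overlap in time.
                future = sampler.submit(sample_rollouts, policy.snapshot(), i + 1)
            train_step(policy, rollouts)  # train on the current batch
    return policy
```

Because sampling for the next batch starts before the current update finishes, the rollouts are generated by a slightly stale policy; the speedup comes from accepting this one-step lag instead of alternating sampling and training serially.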
LiveCodeBench Pass@1 Accuracy vs Model Size. Image Source: Together AI Blog
In a Reddit discussion about the model, one user wrote:
I just gave the q4 quant of the 14b version on ollama a try and I have to say that I’m very impressed. It’s definitely the best model I’ve tried in this size. I’d need more testing to conclude if it’s really as good as o3-mini low (particularly as I only have ever tested o3-mini medium), but it definitely feels like it’s beyond 4o in my initial testing on my day-to-day tasks.
Andrew Ng’s newsletter The Batch praised DeepCoder, saying:
Applying reinforcement learning to coding works, but it has two big issues: (i) Training examples of verifiable code are relatively scarce and (ii) computing reward signals for code is time-consuming, since it requires evaluating many test cases. DeepCoder-14B-Preview’s optimizations reduced this complexity, shrinking RL training from months to weeks. Those optimizations are built into Verl-pipeline, an open source RL library from Together.AI and Agentica, giving developers a powerful tool for model training.
Kudos to the DeepCoder team for open sourcing their reasoning recipe! A handful of companies have developed the know-how to execute RL well, but many teams still have trouble implementing successfully. Open recipes for RL training methods and data curation techniques are important to move the field forward.
The DeepCoder-14B-Preview training code is available on GitHub. Model files can be downloaded from Hugging Face.
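For readers who want to try the model, the snippet below shows one way to load it with the Hugging Face transformers library. The repository id `agentica-org/DeepCoder-14B-Preview` matches the released model files; the prompt and generation settings are illustrative defaults, not the team's recommended configuration.

```python
# Sketch: loading DeepCoder-14B-Preview via Hugging Face transformers.
# Sampling settings and the example prompt are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-14B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```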