DeepSeek today released a new large language model family, the R1 series, that’s optimized for reasoning tasks.
The Chinese artificial intelligence developer has made the models' source code available on Hugging Face.
The LLM lineup is headlined by two algorithms called R1 and R1-Zero. According to DeepSeek, the former model outperforms OpenAI’s o1 across several reasoning benchmarks. R1-Zero, meanwhile, is less capable but represents a potentially significant advancement in machine learning research.
Both LLMs feature a mixture-of-experts, or MoE, architecture with 671 billion parameters. An MoE model comprises multiple neural networks that are each optimized for a different set of tasks. When the model receives a prompt, a mechanism known as a router sends the query to the neural network best equipped to process it.
The main benefit of the MoE architecture is that it lowers inference costs. When users enter a prompt into an MoE model, the query doesn’t activate the entire AI but only the specific neural network that will generate the response. As a result, R1 and R1-Zero activate less than one tenth of their 671 billion parameters when answering prompts.
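The sketch below illustrates the routing idea in simplified form: a small router scores a handful of toy experts and only the top-scoring ones run for each token. The layer sizes, expert count and top-k value are illustrative assumptions, not details of DeepSeek's 671-billion-parameter design.

```python
# A minimal sketch of sparse mixture-of-experts routing, not DeepSeek's
# actual implementation. Sizes and top-k are illustrative assumptions.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.router(x)                      # (tokens, n_experts)
        weights, picked = scores.softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run, so most parameters stay inactive
        # for any given token -- the source of MoE's inference savings.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = picked[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```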
DeepSeek trained R1-Zero using a different approach than the one researchers usually take with reasoning models.
Reasoning-optimized LLMs are typically trained using two methods known as reinforcement learning and supervised fine-tuning. The former technique teaches an AI model to perform a task through trial and error. Supervised fine-tuning, in turn, boosts the AI’s output quality by providing it with examples of how to carry out the task at hand.
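In rough terms, the two training signals look quite different in code. The toy example below contrasts them: supervised fine-tuning nudges the model toward a worked example token by token, while reinforcement learning samples an answer, scores it and reweights its probability. The tensor shapes, vocabulary size and reward check are illustrative assumptions, not DeepSeek's training code.

```python
# A toy contrast of the two training signals; not DeepSeek's training code.
# Tensor shapes, vocabulary size and the reward check are illustrative.
import torch
import torch.nn.functional as F

vocab = 100
logits = torch.randn(1, 5, vocab, requires_grad=True)  # (batch, sequence, vocab) from some model

# Supervised fine-tuning: match a worked example token by token.
demonstration = torch.randint(0, vocab, (1, 5))         # tokens of a human-written solution
sft_loss = F.cross_entropy(logits.view(-1, vocab), demonstration.view(-1))

# Reinforcement learning: sample an attempt, score it, reweight its log-probability.
dist = torch.distributions.Categorical(logits=logits)
attempt = dist.sample()                                  # the model's own answer
reward = float(attempt[0, -1] == 42)                     # toy check: "is the final token right?"
rl_loss = -reward * dist.log_prob(attempt).sum()

print(float(sft_loss), float(rl_loss))
```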
While training R1-Zero, DeepSeek skipped the supervised fine-tuning stage. Nevertheless, the company managed to equip the model with reasoning skills such as the ability to break down complex tasks into simpler sub-steps.
“It is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT,” DeepSeek researchers detailed. “This breakthrough paves the way for future advancements in this area.”
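The researchers don't spell out in the article how those reasoning skills were rewarded, but a purely RL-based setup of this kind typically relies on simple programmatic checks rather than human-labeled demonstrations. The function below is one plausible, deliberately simplified reward; the tag format and scoring values are assumptions for illustration only.

```python
# One plausible rule-based reward for RL-only training of a reasoning model.
# The tag format and scoring values are assumptions for illustration; the
# article does not describe DeepSeek's actual reward design.
import re

def reward(completion: str, reference_answer: str) -> float:
    score = 0.0
    # Format check: did the model separate its reasoning from its final answer?
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        score += 0.1
    # Accuracy check: does the final answer match the known reference?
    final = completion.split("</think>")[-1].strip()
    if final == reference_answer.strip():
        score += 1.0
    return score

print(reward("<think>2 + 2 is 4</think> 4", "4"))  # 1.1
```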
Although R1-Zero has an advanced feature set, its output quality is limited. The model's responses sometimes suffer from “endless repetition, poor readability and language mixing,” DeepSeek's researchers wrote. The company created R1 to address those limitations.
R1 is an enhanced version of R1-Zero that was developed using a modified training workflow. This workflow makes use of supervised fine-tuning, the technique that DeepSeek left out during the development of R1-Zero. The company says that this change helped significantly boost output quality.
DeepSeek compared R1 against four popular LLMs using nearly two dozen benchmark tests. According to the company, its model outperformed OpenAI's reasoning-optimized o1 LLM on several of the benchmarks. In most of the benchmarks where o1 scored higher, R1 trailed it by less than 5%.
One of the benchmarks in which R1 outperformed o1 is LiveCodeBench. It’s a collection of programming tasks that is regularly updated with new practice problems. This makes it less likely that AI models will find ready-made answers to the problems on the public web.
Alongside R1 and R1-Zero, DeepSeek today open-sourced a set of less capable but more hardware-efficient models. Those models were “distilled” from R1, which means that some of the LLM’s knowledge was transferred to them during training.
The distilled models range in size from 1.5 billion to 70 billion parameters. They’re based on the Llama and Qwen open-source LLM families. DeepSeek says that one of the distilled models, R1-Distill-Qwen-32B, outperforms the scaled-down OpenAI-o1-mini version of o1 across several benchmarks.
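Conceptually, that distillation step means using R1's own answers as training targets for the smaller models. The sketch below shows the idea with a hypothetical teacher_generate() helper standing in for the full R1 model; DeepSeek's actual data mix and fine-tuning recipe aren't described in the article.

```python
# A minimal sketch of response-based distillation. teacher_generate() is a
# hypothetical stand-in for querying the large R1 teacher model.
def teacher_generate(prompt: str) -> str:
    # Placeholder: in practice this would call R1 and return its full
    # response, reasoning steps included.
    return "<think>step-by-step reasoning...</think> final answer"

def build_distillation_set(prompts: list[str]) -> list[tuple[str, str]]:
    # The teacher's outputs become supervised targets for the smaller student.
    return [(p, teacher_generate(p)) for p in prompts]

pairs = build_distillation_set(["Prove that ...", "Write a function that ..."])
# A Qwen- or Llama-based student would then be fine-tuned on these
# (prompt, target) pairs with an ordinary supervised objective.
for prompt, target in pairs:
    print(prompt, "->", target[:40])
```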