MiniMax has introduced MiniMax-M1, an open-weight language model designed for long-context reasoning and tool use. Built on the earlier MiniMax-Text-01, M1 uses a hybrid Mixture-of-Experts (MoE) architecture and a "lightning attention" mechanism. The model has 456 billion total parameters, of which 45.9 billion are active per token, and supports context lengths of up to 1 million tokens.
M1 distinguishes itself through its efficient use of compute and support for long-context reasoning. Its lightning attention mechanism reduces test-time computation, requiring only 25% of the FLOPs used by DeepSeek R1 for sequences of 100K tokens. The model was trained using large-scale reinforcement learning across a range of domains, including mathematical problem-solving and software engineering environments.
Two versions of the model are available, differing in their maximum generation (thinking) budgets, and both were trained with MiniMax's custom RL scaling approach. Notably, MiniMax introduces CISPO, a novel RL algorithm that clips importance-sampling weights rather than token updates, which reportedly improves stability and performance over traditional clipping variants.
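The clipping idea behind CISPO can be illustrated with a minimal sketch. The function names and clipping thresholds below are illustrative, not MiniMax's implementation; the key contrast with PPO-style clipping is that the importance-sampling ratio itself is clipped (and treated as a constant weight on the update) rather than clipping or zeroing out the token's gradient update:

```python
import numpy as np

def cispo_weights(logp_new, logp_old, eps_low=0.2, eps_high=0.2):
    """Compute clipped importance-sampling weights, CISPO-style (sketch).

    Instead of clipping the per-token policy update as PPO does (which can
    zero out gradients for tokens whose ratio leaves the trust region),
    the IS ratio r_t = pi_new / pi_old is clipped and then used as a
    fixed weight on the REINFORCE-style gradient, so every token still
    contributes to the update.
    """
    r = np.exp(logp_new - logp_old)          # importance-sampling ratios
    return np.clip(r, 1.0 - eps_low, 1.0 + eps_high)

# Example: ratios of 1.0, 2.0, and 0.5 are clipped into [0.8, 1.2]
w = cispo_weights(np.log(np.array([1.0, 2.0, 0.5])), np.zeros(3))
print(w)  # → [1.  1.2 0.8]
```

In a full training loop these weights would multiply the per-token advantage and log-probability gradient, with the weight excluded from differentiation.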
Across benchmarks, MiniMax-M1-80K consistently ranks at or near the top among open-weight models, with strong results in:
- Long-context tasks (OpenAI-MRCR 128K: 73.4%, LongBench-v2: 61.5%)
- Software engineering (SWE-bench Verified: 56.0%)
- Tool use (TAU-bench airline: 62.0%, retail: 63.5%)
- Reasoning-heavy math benchmarks (AIME 2024: 86.0%)
One Reddit user commented on its standout capabilities:
> This looks pretty great. Especially for function calling (Tau-bench) and long context, this seems like SOTA for open-weights. The latter by some big margin, which I don't even find unbelievable because their old non-reasoning model was also great for this.
However, others pointed to limitations in practice. For example, dubesor86 shared:
> It's unusable, though. I had it play chess matches (usually takes a few minutes), and I had to have it run all night, and it still wasn't done by the time I woke up. All the scores in the world mean nothing if the usability is zero.
MiniMax-M1 also supports structured function calling, making it suitable for agent frameworks. The model is available in two versions (40K and 80K) via HuggingFace. For deployment, the team recommends vLLM, offering optimized serving, memory management, and batching performance. Developers can also experiment via the MiniMax MCP Server, which bundles API access and capabilities such as video and image generation, speech synthesis, and voice cloning.
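To give a sense of what structured function calling looks like in practice, here is a sketch of a request body in the common OpenAI-style "tools" format that vLLM's chat endpoint accepts. The tool name, model identifier, and schema details are assumptions for illustration; the exact schema M1 expects should be checked against the model card:

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_flight_status",  # hypothetical tool name
        "description": "Look up the status of a flight by number.",
        "parameters": {
            "type": "object",
            "properties": {"flight_number": {"type": "string"}},
            "required": ["flight_number"],
        },
    },
}]

payload = {
    "model": "MiniMaxAI/MiniMax-M1-40k",  # HuggingFace repo id (assumed)
    "messages": [{"role": "user", "content": "Is flight UA100 on time?"}],
    "tools": tools,
}

# This JSON body would be POSTed to a vLLM OpenAI-compatible endpoint.
print(json.dumps(payload, indent=2))
```

The model's response would then contain a structured tool call (function name plus JSON arguments) that an agent framework can dispatch and feed back into the conversation.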