MiniMax has introduced MiniMax-M1, an open-weight language model designed for long-context reasoning and tool use. Built on the earlier MiniMax-Text-01, M1 uses a hybrid Mixture-of-Experts (MoE) architecture and a "lightning attention" mechanism. The model has 456 billion total parameters, of which 45.9 billion are active per token, and supports context lengths of up to 1 million tokens.
M1 distinguishes itself through its efficient use of compute and support for long-context reasoning. Its lightning attention mechanism reduces test-time computation, requiring only 25% of the FLOPs used by DeepSeek R1 for sequences of 100K tokens. The model was trained using large-scale reinforcement learning across a range of domains, including mathematical problem-solving and software engineering environments.
Two versions of the model are available, differing in their maximum generation (thinking) budgets, and both were trained with MiniMax's custom RL scaling approach. Notably, MiniMax introduces CISPO, a novel RL algorithm that clips importance-sampling weights rather than token updates, which reportedly improves stability and performance over traditional clipping variants.
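The clipping idea behind CISPO can be illustrated with a minimal sketch. The function names and clipping thresholds below are illustrative, not MiniMax's implementation; the key contrast with PPO-style clipping is that the importance-sampling ratio itself is clipped (and treated as a constant weight on the update) rather than clipping or zeroing out the token's gradient update:

```python
import numpy as np

def cispo_weights(logp_new, logp_old, eps_low=0.2, eps_high=0.2):
    """Compute clipped importance-sampling weights, CISPO-style (sketch).

    Instead of clipping the per-token policy update as PPO does (which can
    zero out gradients for tokens whose ratio leaves the trust region),
    the IS ratio r_t = pi_new / pi_old is clipped and then used as a
    fixed weight on the REINFORCE-style gradient, so every token still
    contributes to the update.
    """
    r = np.exp(logp_new - logp_old)          # importance-sampling ratios
    return np.clip(r, 1.0 - eps_low, 1.0 + eps_high)

# Example: ratios of 1.0, 2.0, and 0.5 are clipped into [0.8, 1.2]
w = cispo_weights(np.log(np.array([1.0, 2.0, 0.5])), np.zeros(3))
print(w)  # → [1.  1.2 0.8]
```

In a full training loop these weights would multiply the per-token advantage and log-probability gradient, with the weight excluded from differentiation.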
Across benchmarks, MiniMax-M1-80K consistently ranks at or near the top among open-weight models, with strong results in:
- Long-context tasks (OpenAI-MRCR 128K: 73.4%, LongBench-v2: 61.5%)
- Software engineering (SWE-bench Verified: 56.0%)
- Tool use (TAU-bench airline: 62.0%, retail: 63.5%)
- Reasoning-heavy math benchmarks (AIME 2024: 86.0%)
One Reddit user commented on its standout capabilities:
> This looks pretty great. Especially for function calling (Tau-bench) and long context, this seems like SOTA for open-weights. The latter by some big margin, which I don't even find unbelievable because their old non-reasoning model was also great for this.
However, others pointed to limitations in practice. For example, dubesor86 shared:
> It's unusable, though. I had it play chess matches (usually takes a few minutes), and I had to have it run all night, and it still wasn't done by the time I woke up. All the scores in the world mean nothing if the usability is zero.
MiniMax-M1 also supports structured function calling, making it suitable for agent frameworks. The model is available in two versions (40K and 80K) via HuggingFace. For deployment, the team recommends vLLM, offering optimized serving, memory management, and batching performance. Developers can also experiment via the MiniMax MCP Server, which bundles API access and capabilities such as video and image generation, speech synthesis, and voice cloning.
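To give a sense of what structured function calling looks like in practice, here is a sketch of a request body in the common OpenAI-style "tools" format that vLLM's chat endpoint accepts. The tool name, model identifier, and schema details are assumptions for illustration; the exact schema M1 expects should be checked against the model card:

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_flight_status",  # hypothetical tool name
        "description": "Look up the status of a flight by number.",
        "parameters": {
            "type": "object",
            "properties": {"flight_number": {"type": "string"}},
            "required": ["flight_number"],
        },
    },
}]

payload = {
    "model": "MiniMaxAI/MiniMax-M1-40k",  # HuggingFace repo id (assumed)
    "messages": [{"role": "user", "content": "Is flight UA100 on time?"}],
    "tools": tools,
}

# This JSON body would be POSTed to a vLLM OpenAI-compatible endpoint.
print(json.dumps(payload, indent=2))
```

The model's response would then contain a structured tool call (function name plus JSON arguments) that an agent framework can dispatch and feed back into the conversation.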