Moonshot AI released Kimi K2.5, its latest open-weight multimodal LLM. K2.5 excels at coding tasks, with benchmark scores comparable to those of frontier models such as GPT-5 and Gemini. It also features an Agent Swarm mode, which can direct up to 100 sub-agents to attack problems with parallel workflows.
Kimi K2.5 builds on the previous Kimi K2 MoE LLM, adding vision capability to its text-only predecessor. Combined with its coding ability, this makes it well suited to front-end development tasks. The model supports four modes of operation: Instant, Thinking, Agent, and Agent Swarm. The last is a research preview that can decompose tasks into subtasks which are executed in parallel by a group of sub-agents. Agent mode is designed to support office productivity tasks that involve producing documents and spreadsheets. According to Moonshot AI:
Grounded in advances in coding with vision, agent swarms, and office productivity, Kimi K2.5 represents a meaningful step toward AGI for the open-source community, demonstrating strong capability on real-world tasks under real-world constraints. Looking ahead, we will push further into the frontier of agentic intelligence, redefining the boundaries of AI in knowledge work.
Kimi K2.5 extends the Kimi K2 architecture with Moonshot’s MoonViT-3D vision encoder. The team started from a Kimi K2 checkpoint and continued pre-training on an additional 15T tokens, followed by supervised fine-tuning and reinforcement learning.
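Moonshot has not published the exact wiring of MoonViT-3D into Kimi K2.5, but the sketch below illustrates the general pattern for adding a vision encoder to a text-only decoder: image patch features are projected into the LLM's embedding space and spliced into the token sequence before continued pre-training. All class names and dimensions here are hypothetical.

```python
# Illustrative sketch only, not Moonshot's implementation.
import torch
import torch.nn as nn

class VisionAdapter(nn.Module):
    """Maps vision-encoder patch features into the LLM's embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 7168):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

def build_multimodal_inputs(text_embeds: torch.Tensor,
                            image_embeds: torch.Tensor) -> torch.Tensor:
    # Prepend projected image tokens to the text token embeddings so the
    # decoder attends over both modalities in a single sequence.
    return torch.cat([image_embeds, text_embeds], dim=1)
```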
For the Agent Swarm feature, the Moonshot team developed a new RL technique, Parallel Agent Reinforcement Learning (PARL), to train Kimi K2.5 to decompose complex tasks and execute them in parallel. PARL was developed to address several challenges: training instability; ambiguous credit assignment; and “serial collapse,” where the orchestrator simply runs a single agent. In PARL, the sub-agents are frozen and only the orchestrator is trained. The reward function incentivizes sub-agent creation and successful completion of sub-tasks.
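Moonshot has not released the actual PARL reward formulation; the sketch below only illustrates the two incentives described above, rewarding decomposition into multiple sub-agents (to avoid serial collapse) and rewarding successful sub-task completion. The function names and weights are made up for illustration.

```python
# Hypothetical sketch of a PARL-style reward signal for the orchestrator.
from dataclasses import dataclass

@dataclass
class SubTaskResult:
    succeeded: bool

def orchestrator_reward(subtask_results: list[SubTaskResult],
                        task_solved: bool,
                        parallelism_bonus: float = 0.1,
                        success_weight: float = 1.0) -> float:
    """Reward for the trainable orchestrator; sub-agents stay frozen."""
    if not subtask_results:
        return 0.0
    # Encourage decomposition into more than one sub-agent.
    decomposition_term = parallelism_bonus * (len(subtask_results) - 1)
    # Encourage sub-tasks that actually finish successfully.
    completion_term = sum(r.succeeded for r in subtask_results) / len(subtask_results)
    # The final task outcome dominates the signal.
    return success_weight * task_solved + completion_term + decomposition_term
```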
The Moonshot team evaluated Kimi K2.5 on a wide array of benchmarks. For Agent Swarm in particular, they used BrowseComp and WideSearch, which measure research and information-retrieval capability. On BrowseComp, Kimi K2.5 outperformed GPT-5.2 Pro, and on WideSearch it outperformed Claude Opus 4.5. It also achieved “substantial wall-clock time reductions” thanks to parallel execution. The Moonshot team also noted that Agent Swarm exhibits “proactive context control,” which reduces the risk of context overflow and effectively scales overall context length without the need for context summarization.
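The wall-clock savings follow from running independent sub-tasks concurrently rather than sequentially. The toy example below is not Moonshot's implementation; it simply shows that total latency approaches the slowest sub-task rather than the sum of all sub-tasks, with `run_sub_agent` standing in for a real sub-agent call.

```python
# Conceptual illustration of wall-clock reduction from parallel sub-agents.
import asyncio
import time

async def run_sub_agent(subtask: str) -> str:
    await asyncio.sleep(2)  # stand-in for a long-running agent rollout
    return f"result for {subtask!r}"

async def agent_swarm(subtasks: list[str]) -> list[str]:
    # All sub-agents run concurrently; elapsed time ~= slowest sub-task.
    return await asyncio.gather(*(run_sub_agent(t) for t in subtasks))

async def main():
    start = time.perf_counter()
    results = await agent_swarm(["search docs", "scan changelog", "check issues"])
    print(results, f"elapsed ~{time.perf_counter() - start:.1f}s")  # ~2s, not ~6s

asyncio.run(main())
```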
Andrew Ng’s The Batch newsletter discussed Kimi K2.5, saying:
Building an agentic workflow can improve a model’s performance on a particular task. Unlike predefined agentic workflows, Kimi K2.5 decides when a new subagent is necessary, what it should do, and when to delegate work to it. This automated agentic orchestration improves performance in tasks that are easy to perform in parallel…Kimi K2.5 shifts task execution from chain-of-thought reasoning to agentic teamwork. Instead of responding to prompts sequentially, it acts as a manager of separate workflows/models that execute different parts of the job in parallel.
Kimi K2.5 is available on the web via a chat interface or through Moonshot’s API. The model weights are also available on Hugging Face.
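For developers, a minimal sketch of calling the model is shown below, assuming the OpenAI-compatible endpoint Moonshot provides for its Kimi models. The base URL and the `kimi-k2.5` model identifier are assumptions for illustration; consult Moonshot's platform documentation for the exact values.

```python
# Minimal sketch of calling Kimi K2.5 via Moonshot's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",        # issued from Moonshot's platform
    base_url="https://api.moonshot.ai/v1",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="kimi-k2.5",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize what Agent Swarm mode does."}],
)
print(response.choices[0].message.content)
```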
