Kimi’s K2 Open-Source Language Model Supports Dynamic Resource Availability and New Optimizer

News Room
Published 17 November 2025, last updated 4:22 AM

Kimi released K2, a Mixture-of-Experts large language model with 32 billion activated parameters and 1.04 trillion total parameters, trained on 15.5 trillion tokens. The release introduces MuonClip, a new optimizer that builds on the Muon optimizer by adding a QK-Clip technique designed to address training instability, which the team reports resulted in “zero loss spike” during pre-training. The model comes in two variants: a base version and K2 Thinking, with the latter claiming state-of-the-art results on benchmarks testing reasoning, coding, and agent capabilities, including 44.9% on Humanity’s Last Exam (HLE) with tools, 60.2% on BrowseComp, and 71.3% on SWE-Bench Verified. The release positions K2 as a contender in the open-source model space, particularly for software engineering and agentic tasks, where the model aims to demonstrate strong generalization capabilities.

The team validated MuonClip through a series of scaling experiments. They first trained a mid-scale model with 9 billion activated parameters and 53 billion total parameters using the standard Muon optimizer. The researchers then tested whether QK-Clip affects model performance, finding that MuonClip maintains the optimization characteristics of Muon without negatively impacting the loss trajectory. For the full-scale Kimi K2 model, the team applied MuonClip with a clipping threshold of τ = 100 and tracked maximum attention logits throughout training. The maximum logits gradually decreased to a normal operating range during training without requiring manual adjustments, which the team presents as evidence of the optimizer’s stability improvements.
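
To make the mechanism concrete, here is a minimal, hypothetical sketch of the QK-Clip step as described above. Because attention logits are dot products between queries and keys, scaling both projection matrices by the square root of τ divided by the observed maximum logit pulls the maximum back under the threshold. The function name and the bookkeeping are illustrative, not Kimi’s actual implementation, which applies the rescaling inside the Muon update and may operate per attention head:

```python
import torch

def qk_clip_(w_q: torch.Tensor, w_k: torch.Tensor,
             max_logit: float, tau: float = 100.0) -> None:
    """Rescale query/key projection weights in place when the observed
    maximum attention logit exceeds the threshold tau.

    Logits are dot products q . k, so scaling both projections by
    sqrt(tau / max_logit) scales every logit by tau / max_logit,
    capping the maximum at roughly tau. Illustrative sketch only.
    """
    if max_logit > tau:
        gamma = tau / max_logit            # overall shrink factor, < 1
        with torch.no_grad():              # weight surgery, not a gradient step
            w_q.mul_(gamma ** 0.5)         # split sqrt(gamma) onto queries
            w_k.mul_(gamma ** 0.5)         # ... and keys
```

Applied after each optimizer step, a rescaling of this form bounds the attention logits, which is consistent with the team’s observation that maximum logits drifted back into a normal range without manual intervention.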

Source: Kimi K2 Benchmark Results

Kimi trained K2 on a cluster of NVIDIA H800 GPUs, with each node containing 2 TB of RAM and 8 GPUs connected through NVLink and NVSwitch. The cluster uses 8×400 Gbps RoCE interconnects for cross-node communication. The team designed a flexible parallelism strategy that allows training on any number of nodes that is a multiple of 32, addressing what they describe as dynamic resource availability during large language model training.
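
The “any multiple of 32 nodes” constraint suggests a layout in which one model replica occupies a fixed block of nodes and additional blocks add data-parallel replicas. The helper below is a hypothetical illustration of that reading; the article states only the constraint, not the internal parallel degrees:

```python
def plan_layout(num_nodes: int, gpus_per_node: int = 8,
                nodes_per_replica: int = 32) -> dict:
    """Hypothetical layout: one model replica spans a fixed 32-node
    (256-GPU) block, so any node count that is a multiple of 32 simply
    changes the number of data-parallel replicas."""
    if num_nodes % nodes_per_replica != 0:
        raise ValueError("node count must be a multiple of 32")
    return {
        "total_gpus": num_nodes * gpus_per_node,
        "gpus_per_replica": nodes_per_replica * gpus_per_node,  # 256
        "data_parallel_replicas": num_nodes // nodes_per_replica,
    }

# e.g. plan_layout(96) -> 768 GPUs: 3 replicas of 256 GPUs each
```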

To manage memory usage, the team applied selective recomputation to specific operations including LayerNorm, SwiGLU, and multi-head latent attention (MLA) up-projections, choosing what they characterize as inexpensive but high-footprint stages. The training process also recomputes MoE down-projections to further reduce activation memory requirements.
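
In PyTorch-style frameworks, this kind of selective recomputation is implemented with activation checkpointing applied only to the targeted submodules. A minimal sketch, assuming standard torch.utils.checkpoint (the article does not say which framework Kimi uses):

```python
import torch
from torch.utils.checkpoint import checkpoint

class SwiGLU(torch.nn.Module):
    """Gated feed-forward stage: cheap to recompute, large activation footprint."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, d_ff, bias=False)
        self.up = torch.nn.Linear(d_model, d_ff, bias=False)
        self.down = torch.nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.nn.functional.silu(self.gate(x)) * self.up(x))

class Recomputed(torch.nn.Module):
    """Drops the wrapped module's activations during the forward pass and
    recomputes them in backward, trading a second forward pass for memory."""
    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return checkpoint(self.inner, x, use_reentrant=False)

# Checkpoint only the selected stage; other layers keep their activations.
ffn = Recomputed(SwiGLU(d_model=1024, d_ff=4096))
```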

The model can execute 200 to 300 sequential tool calls driven by long-horizon planning and adaptive reasoning. K2 Thinking performs cycles that follow a pattern of think → search → browser use → think → code, generating and refining hypotheses while verifying evidence and constructing answers. This approach allows the model to break down ambiguous, open-ended problems into actionable subtasks.
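
That cycle can be pictured as a plain agent loop: the model emits either a tool call or a final answer, and tool results are appended back into the context. The sketch below is illustrative only; the `model.step` interface, tool registry, and stopping condition are hypothetical stand-ins, not Moonshot’s API:

```python
MAX_TOOL_CALLS = 300  # upper end of the sequential-call range reported for K2 Thinking

def run_agent(model, tools: dict, task: str) -> str:
    context = [{"role": "user", "content": task}]
    for _ in range(MAX_TOOL_CALLS):
        action = model.step(context)           # think: plan the next move
        if action.kind == "final_answer":      # model decides it is done
            return action.content
        # search / browser use / code: dispatch to the requested tool
        result = tools[action.tool](**action.arguments)
        context.append({"role": "tool", "name": action.tool,
                        "content": result})    # feed evidence back in
    raise RuntimeError("tool-call budget exhausted")
```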

For deployment, the team addressed inference efficiency challenges specific to thinking models. While low-bit quantization reduces inference latency and GPU memory usage, thinking models generate long output sequences that typically cause performance degradation when quantized. Kimi applied Quantization-Aware Training (QAT) during the post-training phase, using INT4 weight-only quantization on the MoE components. This implementation enables K2 Thinking to run native INT4 inference with approximately 2x generation speed improvement.
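
Weight-only QAT typically means simulating the 4-bit rounding in the forward pass while keeping full-precision master weights, so the model learns to tolerate quantization error before deployment. A minimal fake-quantization sketch; the group size, symmetric rounding, and straight-through estimator are common choices assumed here, as Kimi’s exact recipe is not specified in this article:

```python
import torch

def fake_quant_int4(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Simulate symmetric, group-wise INT4 weight-only quantization.

    Weights are grouped, scaled so each group fits the INT4 range,
    rounded, and dequantized. The straight-through estimator makes the
    rounding invisible to the backward pass, so gradients still update
    the full-precision weights. Assumes w.numel() is divisible by
    group_size.
    """
    g = w.reshape(-1, group_size)
    scale = g.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(g / scale), -8, 7)   # INT4 range [-8, 7]
    deq = (q * scale).reshape(w.shape)
    return w + (deq - w).detach()  # forward: deq; backward: identity w.r.t. w
```

At deployment the rounded weights can be stored directly in INT4, which is where the memory saving and the reported ~2x generation-speed improvement come from.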

The Kimi K2 license includes a commercial use requirement. Organizations using the model or its derivatives for commercial products or services that exceed 100 million monthly active users or generate more than 20 million US dollars in monthly revenue must prominently display “Kimi K2” on the user interface of such products or services. This attribution requirement differentiates K2’s license from standard open-source licenses that typically do not mandate user-facing acknowledgments for high-scale commercial deployments.

Awni Hannun tested K2 Thinking on Apple Silicon, reporting results that demonstrate the model’s accessibility beyond datacenter infrastructure. Hannun stated:

The new 1 Trillion parameter Kimi K2 Thinking model runs well on 2 M3 Ultras in its native format – no loss in quality! The model was quantization aware trained (qat) at int4. Here it generated ~3500 tokens at 15 toks/sec using pipeline-parallelism in mlx-lm.

Artificial Analysis, which provides independent analysis of AI models, stated that

Kimi K2 Thinking is the new leading open weights model: it demonstrates particular strength in agentic contexts but is very verbose, generating the most tokens of any model in completing our Intelligence Index evals.

One commenter on Hacker News noted that

the ultimate competition between models will eventually become a competition over energy. China’s open-source models have major advantages in energy consumption, and China itself has a huge advantage in energy resources. They may not necessarily outperform the U.S., but they probably won’t fall too far behind either.

Kimi K2 enters a competitive open-source model landscape that includes DeepSeek-R1, which also focuses on extended reasoning, Alibaba’s Qwen models with QwQ for reasoning tasks, Mistral’s Mixtral MoE series, and Meta’s Llama 3 family. 

The K2 Thinking variant is available on kimi.com and through the Moonshot API platform. The team has released the model weights on Hugging Face, where technical details and implementation guidance are accessible. Complete API documentation is available on the Moonshot platform, providing integration specifications for developers looking to incorporate K2 into their applications.
