Microsoft Native 1-Bit LLM Could Bring Efficient genAI to Everyday CPUs

News Room | Published 23 April 2025, last updated 8:32 AM

In a recent paper, Microsoft researchers described BitNet b1.58 2B4T, the first LLM to be natively trained using “1-bit” (technically, 1-trit) weights, rather than being quantized from a model trained with floating point weights. According to Microsoft, the model delivers performance comparable to full-precision LLMs of similar size at a fraction of the computation cost and hardware requirements.

While LLMs have shown impressive performance, there are still barriers to their broader adoption:

State-of-the-art open LLMs typically require large memory footprints, consume considerable energy, and exhibit notable inference latency, rendering them impractical for many edge devices, resource-constrained environments, and real-time applications.

To overcome these limitations, the LLM community has been exploring quantized models, which are derived from full-precision models by converting their weights to a lower-bit format.

Microsoft trained BitNet b1.58 2B4T from scratch on a 4 trillion token corpus using 1-bit weights, aiming to avoid the precision loss typically caused by quantizing a model originally trained in full precision, while retaining the benefits of smaller weights in terms of memory footprint and computational cost.
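The headline saving is easy to sanity-check with back-of-envelope arithmetic: a ternary weight carries at most log2(3) ≈ 1.58 bits of information, versus 16 bits for an FP16 weight. A rough estimate for the 2B-parameter weights alone (illustrative only; it ignores activations, the KV cache, and any tensors kept in higher precision):

```python
PARAMS = 2e9                           # ~2B weights, per the model name

fp16_gb    = PARAMS * 16   / 8 / 1e9   # 16 bits per weight   -> 4.0 GB
ternary_gb = PARAMS * 1.58 / 8 / 1e9   # ~1.58 bits per trit  -> ~0.4 GB
```

The released model's actual footprint differs somewhat, since some tensors stay in higher precision, but the roughly 10x gap is the point.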

Indeed, based on Microsoft benchmarks, the new model performs comparably to leading open-weight, full-precision models of similar size across a wide range of tasks, including language understanding and reasoning, world knowledge, reading comprehension, math and code, and instruction following and conversation. The comparative benchmark results are summarized in the chart below:

Where BitNet b1.58 2B4T stands out compared to quantized models of similar or smaller size is in memory footprint, latency, and energy consumption, as shown in the following table.

Architecturally, BitNet b1.58 2B4T replaces standard full-precision linear layers (i.e., torch.nn.Linear) with custom BitLinear layers, which use 1.58-bit representations to encode weights as ternary values (trits) during the forward pass.

This is achieved using an absolute mean (absmean) quantization scheme, which maps weights to ternary values {−1, 0, +1}. This drastically reduces the model size and enables efficient mathematical operations.
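The absmean scheme can be sketched in a few lines: scale each weight matrix by the mean absolute value of its entries, round to the nearest integer, and clip to {−1, 0, +1}. A minimal illustration (not Microsoft's kernel code, which operates on packed tensors):

```python
def absmean_quantize(weights, eps=1e-8):
    """Ternarize full-precision weights with the absmean scheme:
    scale by mean |W|, round to the nearest integer, clip to {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights)  # gamma = mean |W|
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale

W = [0.8, -0.3, 0.05, -0.9, 0.4]   # toy weight vector
q, gamma = absmean_quantize(W)
# q == [1, -1, 0, -1, 1]; a dequantized weight is approximately q * gamma
```

Because every weight is −1, 0, or +1, matrix multiplication reduces to additions and subtractions, which is where the computational saving comes from.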

Two additional techniques used in BitLinear layers, activation quantization and normalization, further contribute to reducing the model’s size and improving training stability.
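Activation quantization in the BitNet line of work typically means per-token absmax quantization of activations to 8-bit integers. A simplified sketch (the function names are mine, not from the paper):

```python
def absmax_quantize_int8(x, eps=1e-8):
    """Quantize a token's activation vector to the int8 range [-127, 127]
    by scaling with its largest absolute value (per-token absmax)."""
    scale = max(abs(v) for v in x) + eps
    return [round(v / scale * 127) for v in x], scale

def dequantize(q, scale):
    """Approximately recover the original activations."""
    return [v * scale / 127 for v in q]

acts = [0.5, -1.2, 0.03, 2.4]
q, s = absmax_quantize_int8(acts)
# the largest-magnitude activation maps to +/-127; the rest scale proportionally
```

With ternary weights and int8 activations, the inner products in a BitLinear layer can be computed entirely in low-precision integer arithmetic.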

In addition to BitLinear layers, BitNet b1.58 2B4T incorporates several established LLM techniques, such as squared ReLU activation functions, rotary positional embeddings, and bias term removal.
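Of these, squared ReLU is the least familiar: it is simply the standard ReLU with its output squared, keeping exact zeros for negative inputs while growing faster for positive ones. A one-line sketch:

```python
def relu_squared(x):
    """Squared ReLU (ReLU^2): max(0, x) ** 2, used in some LLM
    feed-forward layers in place of GELU-style activations."""
    return max(0.0, x) ** 2
```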

For training, BitNet b1.58 2B4T relies on three techniques: large-scale pre-training, supervised fine-tuning, and direct preference optimization. The researchers note that more advanced techniques, such as Proximal Policy Optimization or Group Relative Policy Optimization, will be explored in the future to enhance mathematical capabilities and chain-of-thought reasoning.
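Of the three stages, direct preference optimization (DPO) is the most recent: it fine-tunes directly on preference pairs without a separate reward model, using only log-probabilities from the policy and a frozen reference model. A textbook sketch of the per-pair loss (values and names are illustrative, not from the paper):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities
    of the chosen and rejected responses under the policy (pi_*) and
    the frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy prefers the chosen response more strongly than the reference does, the margin is positive and the loss drops below log(2); minimizing it pushes the policy toward the preferred responses.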

Given the unique quantization scheme of BitNet b1.58 2B4T, the model cannot be used with standard inference libraries such as llama.cpp and requires specialized kernels. To this end, Microsoft has developed bitnet.cpp, a dedicated open-source inference library based on llama.cpp:

bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU (with NPU and GPU support coming next).

The researchers note that current GPU hardware is not optimized for 1-bit models and that further performance gains could come from incorporating dedicated logic for low-bit operations. Future research directions include training larger models, adding multi-lingual capabilities and multi-modal integration, and extending the context window length.
