Huawei’s Zurich Computing Systems Laboratory has released SINQ (Sinkhorn Normalization Quantization), an open-source quantization method that reduces the memory requirements of large language models (LLMs) by up to 70%. The breakthrough allows workloads that once required enterprise GPUs such as Nvidia’s A100 or H100 to run efficiently on consumer-grade cards like the RTX 4090, cutting both hardware and cloud compute costs.
The Apache 2.0–licensed project is now available on GitHub and Hugging Face for free use and commercialization. Huawei says SINQ achieves accuracy close to data-calibrated approaches while outperforming other calibration-free methods such as round-to-nearest (RTN) and Half-Quadratic Quantization (HQQ) in both speed and precision, according to TechNode.
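For intuition on what Sinkhorn-style normalization before quantization can look like, here is a minimal conceptual sketch: per-row and per-column scale factors are found iteratively so the weight matrix's spread is balanced, and the normalized weights are then quantized with plain round-to-nearest. This is an illustration under stated assumptions, not the SINQ repository's actual API; the function names, iteration count, and bit width are all choices made for the example.

```python
import numpy as np

def sinkhorn_normalize(W, n_iters=10):
    """Illustrative Sinkhorn-style normalization: alternately rescale
    rows and columns by their standard deviations so the matrix's
    spread is balanced, accumulating the scales for reconstruction.
    (Hypothetical helper for this example, not SINQ's real code.)"""
    W = W.astype(np.float64).copy()
    row_scale = np.ones(W.shape[0])
    col_scale = np.ones(W.shape[1])
    for _ in range(n_iters):
        r = W.std(axis=1) + 1e-8      # per-row spread
        W /= r[:, None]
        row_scale *= r
        c = W.std(axis=0) + 1e-8      # per-column spread
        W /= c[None, :]
        col_scale *= c
    return W, row_scale, col_scale

def rtn_quantize(W, bits=4):
    """Plain round-to-nearest (RTN) quantization to signed integers,
    using a single per-tensor scale for simplicity."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

# Quantize a random stand-in "weight matrix", then reconstruct it.
W = np.random.randn(128, 256)
W_norm, row_s, col_s = sinkhorn_normalize(W)
q, s = rtn_quantize(W_norm)
W_hat = (q * s) * row_s[:, None] * col_s[None, :]
print("mean abs reconstruction error:", np.abs(W - W_hat).mean())
```

The point of the dual row/column scaling is that a single outlier weight no longer forces a coarse quantization grid on its entire row or column, which is why such normalization can recover accuracy without any calibration data.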