Computing

What Makes Vision Transformers Hard to Quantize? | HackerNoon

News Room | Published 17 November 2025

Table of Links

Abstract and 1. Introduction

  2. Related work

  3. Method

    3.1. Uniform quantizer

    3.2. IGQ-ViT

    3.3. Group size allocation

  4. Experiments

    4.1. Implementation details and 4.2. Results

    4.3. Discussion

  5. Conclusion, Acknowledgements, and References

Supplementary Material

A. More implementation details

B. Compatibility with existing hardware

C. Latency on practical devices

D. Application to DETR

2. Related work

Network quantization. Network quantization aims at reducing the bit-widths of the weights and activations of neural networks. QAT (quantization-aware training) methods simulate the quantization process by applying a round function to the weights and activations of the network. Since the derivative of the round function is either zero or infinite, they approximate the gradients (e.g., using the straight-through estimator [3]) to train the network with backpropagation. These methods also adjust the derivatives of the round function [17, 18] or train quantization parameters jointly with network weights based on task losses [11, 16]. For better convergence of the training process, many heuristics have been introduced, e.g., progressively shrinking bit-widths [44] or freezing parts of the network weights [29, 43]. Quantized networks using QAT show performance comparable to or even better than their full-precision counterparts. However, the quantization process is computationally demanding, requiring a significant amount of training time. PTQ (post-training quantization) offers an alternative approach to quantizing neural networks. Instead of training full-precision models and simulating the quantization process at training time, PTQ methods calibrate quantization parameters (e.g., quantization intervals) using a subset of training samples. Early efforts focus on optimizing the quantization parameters to minimize the difference between floating-point and quantized values [2, 28]. Another line of research proposes to consider the distributions of weights and/or activations when designing quantizers. For instance, the work of [12] observes that network weights follow a bell-shaped distribution. Based on this, it introduces piecewise linear quantizers that assign different quantization intervals according to the magnitudes of activations, performing better than uniform quantizers. Recent PTQ methods learn to round network weights either up or down by using a reconstruction error of layer outputs [27] or by exploiting the Hessian of training losses [19], and they have proven effective on CNN architectures (e.g., ResNet [13], MobileNetV2 [31]).
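
To make the QAT mechanics above concrete, here is a minimal sketch (our own illustration, not code from the paper or any cited work) of a uniform fake-quantizer whose non-differentiable round is handled with a straight-through estimator; the bit-width, scale, and zero-point below are arbitrary placeholder assumptions.

```python
# Minimal sketch of a uniform fake-quantizer trained with a
# straight-through estimator (STE). The bit-width, scale, and zero-point
# are placeholder assumptions, not values from the paper.
import torch

class STERound(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Forward: apply the (non-differentiable) round function.
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: the round's true derivative is zero almost everywhere,
        # so the STE passes the incoming gradient through unchanged.
        return grad_output

def uniform_fake_quantize(x, scale, zero_point, num_bits=8):
    """Quantize x to num_bits and dequantize back to floating point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = STERound.apply(x / scale + zero_point).clamp(qmin, qmax)
    return (q - zero_point) * scale

# Usage: the weights stay trainable because gradients flow through the STE.
w = torch.randn(4, 4, requires_grad=True)
w_q = uniform_fake_quantize(w, scale=torch.tensor(0.05), zero_point=torch.tensor(128.0))
w_q.sum().backward()  # w.grad is now populated
```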

Transformer quantization. While ViTs [10] and their variants [25, 34] have become increasingly popular in computer vision, the unique structure and characteristics of ViT architectures make network quantization challenging. For example, PTQ methods for CNNs [2, 19, 27, 28] do not perform well on quantizing softmax attentions and GELU activations in transformers, suggesting that directly applying them to ViT quantization results in significant performance degradation [26]. To date, only a limited number of PTQ methods have been developed for ViTs. The work of [26] estimates quantization parameters that maximize similarities between full-precision and quantized outputs of linear operations, and proposes to preserve the relative order of attention values after quantization. APQ-ViT [9] introduces a calibration metric to minimize the discrepancies between full-precision and quantized outputs, while maintaining the power-law distribution of softmax attentions. PTQ4ViT [40] introduces twin uniform quantizers to handle the asymmetric distributions of softmax attentions and GELU activations effectively. Most PTQ methods for ViTs exploit a single quantizer for all channels, and thus do not consider the distributions of activation values across channels, which typically exhibit extreme scale variations. Recent works [21, 23] attempt to alleviate this scale variation problem efficiently. FQ-ViT [23] proposes to consider inter-channel scale variations for LayerNorm [1], and exploits channel-wise quantizers with the constraint that the ratios of quantization intervals are power-of-two values. This allows the mean and variance of LayerNorm to be computed with bit-shift operations at the integer level. The scale reparameterization technique introduced by RepQ-ViT [21] allows using layer-wise quantizers, instead of channel-wise ones, by adjusting the affine factors of LayerNorm and the weights of FC layers. However, this technique applies to the activations of LayerNorm only, and does not fully address the inter-channel scale variations in other layers of transformers.
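
The inter-channel scale variation mentioned above can be made concrete with a small synthetic experiment. The sketch below is our own illustration under assumed data (it does not reproduce any cited method): it compares the mean-squared quantization error of a single layer-wise scale against channel-wise scales when a few channels have much larger dynamic ranges.

```python
# Synthetic illustration (assumed data, not a cited method) of how
# inter-channel scale variation degrades a single layer-wise quantizer
# compared with channel-wise quantizers.
import torch

def fake_quant(x, scale, num_bits=8):
    # Symmetric uniform quantization followed by dequantization.
    qmax = 2 ** (num_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

torch.manual_seed(0)
x = torch.randn(1024, 10)   # tokens x channels
x[:, 8:] *= 50.0            # two outlier channels with extreme dynamic range

# Layer-wise quantizer: one scale shared by all channels.
scale_layer = x.abs().max() / 127
err_layer = (fake_quant(x, scale_layer) - x).pow(2).mean()

# Channel-wise quantizers: one scale per channel.
scale_chan = x.abs().amax(dim=0, keepdim=True) / 127
err_chan = (fake_quant(x, scale_chan) - x).pow(2).mean()

print(f"layer-wise MSE:   {err_layer.item():.6f}")
print(f"channel-wise MSE: {err_chan.item():.6f}")  # much lower in this setup
```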

Similar to ours, the works of [4, 7, 32, 36] adopt group quantization techniques for transformers. For instance, Q-BERT [32] and VS-Quant [7] divide consecutive channels uniformly into a number of groups without considering the dynamic range of each channel, and thus the channels assigned to each group do not follow similar distributions. PEG [4] alleviates this issue by sorting the activations across channels w.r.t. their dynamic ranges during calibration, before grouping the channels. Quantformer [36] proposes to use a differentiable search [6, 24] for QAT in order to group channels of activation maps. However, in the group quantization techniques of [4, 7, 32], the channels assigned to particular groups are fixed after calibrating pretrained networks for PTQ, which makes them inappropriate for ViTs, whose channel distributions vary across input instances. In contrast, our approach applies group quantization along the channels of activation maps and the tokens of softmax attentions dynamically at runtime for each input instance, without additional parameters for PTQ.
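
As a rough illustration of grouping channels with similar dynamic ranges at runtime, the sketch below sorts channels by their per-instance range and assigns one shared scale per contiguous group. This is a simplified assumption of the general idea, not the paper's IGQ-ViT quantizer or its group size allocation scheme.

```python
# Simplified sketch of runtime group quantization: channels are sorted by
# their dynamic range for the current input and split into contiguous groups,
# each sharing one quantizer. Illustrative assumption only; not IGQ-ViT.
import torch

def group_fake_quant(x, num_groups=4, num_bits=8):
    """x: (tokens, channels) activations of a single input instance."""
    qmax = 2 ** (num_bits - 1) - 1
    ranges = x.abs().amax(dim=0)             # per-channel dynamic range (runtime)
    order = torch.argsort(ranges)            # sort channels by range
    out = torch.empty_like(x)
    for idx in torch.chunk(order, num_groups):
        group = x[:, idx]
        scale = group.abs().max() / qmax     # one shared scale per group
        out[:, idx] = torch.clamp(torch.round(group / scale), -qmax - 1, qmax) * scale
    return out

x = torch.randn(197, 384)   # e.g., ViT: 197 tokens x 384 channels
x[:, :16] *= 30.0           # a few channels with much larger scales
x_q = group_fake_quant(x)
```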

:::info
Authors:

(1) Jaehyeon Moon, Yonsei University and Articron;

(2) Dohyung Kim, Yonsei University;

(3) Junyong Cheon, Yonsei University;

(4) Bumsub Ham, Yonsei University (corresponding author).

:::


:::info
This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.

:::
