Instance-Aware Group Quantization for Vision Transformers | HackerNoon


:::info
Authors:

(1) Jaehyeon Moon, Yonsei University and Articron;

(2) Dohyung Kim, Yonsei University;

(3) Junyong Cheon, Yonsei University;

(4) Bumsub Ham, Yonsei University (corresponding author).

:::

Table of Links

Abstract and 1. Introduction

2. Related work

3. Method

    3.1. Uniform quantizer

    3.2. IGQ-ViT

    3.3. Group size allocation

4. Experiments

    4.1. Implementation details and 4.2. Results

    4.3. Discussion

5. Conclusion, Acknowledgements, and References

Supplementary Material

A. More implementation details

B. Compatibility with existing hardwares

C. Latency on practical devices

D. Application to DETR

Abstract

Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model using only a small calibration set of unlabeled samples, without retraining. PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to their full-precision counterparts. Directly applying them to vision transformers (ViTs), however, incurs severe performance degradation, mainly due to the architectural differences between CNNs and ViTs. In particular, the distribution of activations for each channel varies drastically according to the input instance, making PTQ methods for CNNs inappropriate for ViTs. To address this, we introduce instance-aware group quantization for ViTs (IGQ-ViT). To this end, we propose to split the channels of activation maps into multiple groups dynamically for each input instance, such that activations within each group share similar statistical properties. We also extend our scheme to quantize softmax attentions across tokens. In addition, the number of groups for each layer is adjusted to minimize the discrepancies between predictions from quantized and full-precision models, under a bit-operation (BOP) constraint. We show extensive experimental results on image classification, object detection, and instance segmentation, with various transformer architectures, demonstrating the effectiveness of our approach.

1. Introduction

Transformers [35] can capture long-range dependencies across sequential inputs, which is of central importance in natural language processing, aggregating contextual information and providing discriminative feature representations. Recently, vision transformers (ViTs) [10] have demonstrated the effectiveness of transformers for images, providing state-of-the-art results on various visual recognition tasks, including image classification [25, 34], object detection [25, 42], and semantic segmentation [25, 33, 39]. However, the series of fully-connected (FC) and self-attention layers in ViTs requires a substantial amount of memory and computation, making it challenging to deploy these models on devices with limited resources (e.g., drones and mobile phones). The growing demand for ViTs that operate on resource-constrained devices has led to increased interest in developing network quantization techniques for ViTs.

Network quantization generally reduces the bit-widths of the weights and activations of a model for efficient inference, and it can be categorized into two groups: quantization-aware training (QAT) and post-training quantization (PTQ). QAT methods [11, 43, 44] train full-precision models while simulating the quantization process by inserting discretizers into the networks to be quantized, such that the discrepancy between the full-precision and quantized models is minimized in terms of accuracy. This means that QAT methods require the entire set of training samples and are computationally expensive, making them impractical for the prompt deployment of neural networks. PTQ methods [19, 27, 37], on the other hand, calibrate quantization parameters (e.g., quantization intervals, zero-points) from pretrained full-precision models, enabling faster quantization of networks compared to QAT methods with only a limited number of training samples (usually fewer than 1k).
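To make the PTQ setup above concrete, here is a minimal sketch of how a uniform quantizer might be calibrated from a small set of unlabeled activations using a simple min/max rule. The function names, tensor shapes, and the min/max calibration rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def calibrate_uniform_quantizer(calib_acts, n_bits=4):
    # Min/max rule over a small calibration set; asymmetric uniform quantizer.
    # Generic PTQ calibration, not the paper's exact calibration objective.
    qmax = 2 ** n_bits - 1
    lo, hi = float(calib_acts.min()), float(calib_acts.max())
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return scale, zero_point

def fake_quantize(x, scale, zero_point, n_bits=4):
    # Simulated quantization: quantize to integers, then de-quantize back to float.
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** n_bits - 1)
    return (q - zero_point) * scale

# Hypothetical ViT activations: ~256 calibration samples, 197 tokens, 384 channels.
calib = np.random.randn(256, 197, 384).astype(np.float32)
scale, zp = calibrate_uniform_quantizer(calib)
x_hat = fake_quantize(np.random.randn(197, 384).astype(np.float32), scale, zp)
```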

Several PTQ methods for transformers [9, 26, 40] apply layer-wise quantization, where a single quantizer is applied to all activation values for efficiency. These methods, however, are not directly applicable to quantizing models at extremely low bit-widths (e.g., 4-bit), due to the significant scale variation of the activations across channels. Exploiting channel-wise quantizers (i.e., applying a different quantizer to each channel) could be a potential solution, but at the expense of computational overhead, due to floating-point summations of channel-wise outputs for matrix multiplication. Group quantization techniques [7, 32] could be an alternative to address this problem: they divide consecutive channels uniformly into multiple groups and apply a single quantizer to each group (Fig. 1a). However, we have observed that the channel-wise distributions of activation values vary largely among different samples, making such conventional approaches inappropriate for ViTs.
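A minimal sketch of the conventional group quantization described above: consecutive channels are partitioned into a fixed number of groups, and each group shares a single min/max-calibrated quantizer. All names and shapes are illustrative assumptions; the key point, contrasted with the instance-aware scheme introduced next, is that the channel-to-group mapping is fixed and reused for every input.

```python
import numpy as np

def static_group_quantize(x, n_groups=8, n_bits=4):
    # x: (tokens, channels). Consecutive channels form fixed groups; each group
    # shares one uniform quantizer. The mapping is identical for all instances.
    tokens, channels = x.shape
    assert channels % n_groups == 0
    size = channels // n_groups
    qmax = 2 ** n_bits - 1
    out = np.empty_like(x)
    for g in range(n_groups):
        sl = slice(g * size, (g + 1) * size)
        lo, hi = x[:, sl].min(), x[:, sl].max()
        scale = (hi - lo) / qmax if hi > lo else 1.0
        q = np.clip(np.round((x[:, sl] - lo) / scale), 0, qmax)
        out[:, sl] = q * scale + lo
    return out

x_hat = static_group_quantize(np.random.randn(197, 384).astype(np.float32))
```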

In this paper, we present instance-aware group quantization for ViTs (IGQ-ViT), which effectively and efficiently addresses the variations of channel-wise distributions across different input instances (Fig. 1b). Specifically, we split the channels of activation maps into multiple groups dynamically, such that the activation values within each group share similar statistical properties, and then quantize the activations within each group using identical quantization parameters. We also apply the instance-aware grouping technique to softmax attentions, since the distributions of attention values vary significantly across tokens. In addition, we present a simple yet effective method to optimize the number of groups for individual layers under a bit-operation (BOP) constraint. IGQ-ViT can be applied to various components in ViTs, including input activations of FC layers and softmax attentions, unlike previous methods [21, 23, 26, 40] that are limited to specific parts of transformer architectures. We demonstrate the effectiveness and efficiency of IGQ-ViT for various transformers, including ViT [10] and its variants [25, 34], and show that IGQ-ViT achieves state-of-the-art results on standard benchmarks. We summarize the main contributions of our work as follows:
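The sketch below illustrates the instance-aware idea at a high level: for each input instance, channels are regrouped so that channels with similar statistics share a quantizer. Sorting channels by their per-instance dynamic range is only an illustrative proxy; the paper's actual grouping criterion and its efficient implementation (Sec. 3.2) are not reproduced here.

```python
import numpy as np

def instance_aware_group_quantize(x, n_groups=8, n_bits=4):
    # x: (tokens, channels) for a single input instance.
    # Rank channels by their dynamic range *for this instance* and let channels
    # with similar ranges share one uniform quantizer. Sorting by range is an
    # illustrative stand-in for the paper's grouping criterion.
    qmax = 2 ** n_bits - 1
    ch_lo, ch_hi = x.min(axis=0), x.max(axis=0)
    order = np.argsort(ch_hi - ch_lo)          # similar-range channels become adjacent
    groups = np.array_split(order, n_groups)   # dynamic, per-instance channel-to-group map
    out = np.empty_like(x)
    for idx in groups:
        lo, hi = ch_lo[idx].min(), ch_hi[idx].max()
        scale = (hi - lo) / qmax if hi > lo else 1.0
        q = np.clip(np.round((x[:, idx] - lo) / scale), 0, qmax)
        out[:, idx] = q * scale + lo
    return out

x_hat = instance_aware_group_quantize(np.random.randn(197, 384).astype(np.float32))
```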

• We introduce a novel PTQ method for ViTs, dubbed IGQ-ViT, that splits the channels of activation maps into a number of groups dynamically according to the input instance. We also apply the instance-aware grouping technique to split softmax attentions across tokens.

• We present a group size allocation technique that searches for an optimal number of groups for each layer under a BOP constraint (see the sketch after this list).

• We set a new state of the art on image classification [8], object detection, and instance segmentation [22], with various ViT architectures [10, 25, 34].
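As a rough illustration of the group size allocation idea in the second contribution, the sketch below greedily upgrades the layer whose additional groups buy the largest reduction in a proxy discrepancy per extra BOP, until the budget is exhausted. The greedy rule, the `err` and `bops` tables, and all numbers are hypothetical; the paper's allocation method may differ.

```python
def allocate_group_sizes(err, bops, candidates, budget):
    # err[l][g]  : proxy discrepancy of layer l when quantized with g groups
    # bops[l][g] : BOP cost of layer l when using g groups
    # candidates : ordered group counts, e.g. [1, 2, 4]
    # Greedy marginal-gain search under a BOP budget; a stand-in for the
    # paper's allocation method, not a reproduction of it.
    n_layers = len(err)
    pick = [candidates[0]] * n_layers
    total = sum(bops[l][pick[l]] for l in range(n_layers))
    while True:
        best = None
        for l in range(n_layers):
            i = candidates.index(pick[l])
            if i + 1 == len(candidates):
                continue
            nxt = candidates[i + 1]
            extra = bops[l][nxt] - bops[l][pick[l]]
            gain = err[l][pick[l]] - err[l][nxt]
            if total + extra <= budget and gain > 0:
                score = gain / max(extra, 1e-9)
                if best is None or score > best[0]:
                    best = (score, l, nxt, extra)
        if best is None:
            break
        _, l, nxt, extra = best
        pick[l], total = nxt, total + extra
    return pick

# Toy example with two layers (all values hypothetical):
err  = [{1: 0.9, 2: 0.5, 4: 0.3}, {1: 0.4, 2: 0.2, 4: 0.15}]
bops = [{1: 10,  2: 12,  4: 16 }, {1: 20,  2: 24,  4: 32 }]
print(allocate_group_sizes(err, bops, candidates=[1, 2, 4], budget=44))  # -> [4, 2]
```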

:::info
This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::
