Can ChatGPT-Style Models Survive Quantization? | HackerNoon


Authors:

(1) Wanyun Cui, Shanghai University of Finance and Economics, with equal contribution;

(2) Qianle Wang, Shanghai University of Finance and Economics, with equal contribution.

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Quantifying the Impact of Parameters on Model Performance & 4 Unified Mixed-Precision Training

5 Prevalence of Parameter Heterogeneity in LLMs

6 Quantization Experiments and 6.1 Implementation Details

6.2 Effect of Base LLM Quantization

6.3 Effect of Chat LLM Quantization

6.4 Comparison of Parameter Selection Criteria, Conclusion, & References

6.3 Effect of Chat LLM Quantization

We conduct experiments on Vicuna-1.5 [5]. We apply 3-bit quantization with a group size of 128 for CherryQ and all baselines.
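To make the setup concrete, here is a minimal sketch of group-wise 3-bit quantization under the stated configuration; the function name, the symmetric round-to-nearest scheme, and the tensor layout are illustrative assumptions, not the authors' exact CherryQ implementation.

```python
import torch

def quantize_groupwise(w: torch.Tensor, bits: int = 3, group_size: int = 128):
    """Illustrative symmetric group-wise quantization (not the exact CherryQ code).

    Every contiguous run of `group_size` weights shares one scale, so the
    stored model is a tensor of 3-bit integer codes plus per-group scales.
    Assumes w.numel() is a multiple of group_size, which holds for typical
    LLM weight matrices.
    """
    groups = w.reshape(-1, group_size)                  # one row per group of 128
    qmax = 2 ** (bits - 1) - 1                          # 3 for signed 3-bit codes
    scale = groups.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(groups / scale), -qmax - 1, qmax)
    dequant = (q * scale).reshape(w.shape)              # what the forward pass sees
    return q.to(torch.int8), scale, dequant
```

The smaller the group, the more tightly each scale fits its weights, at the cost of storing more scales; a group size of 128 is a common middle ground in low-bit LLM quantization.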

Evaluation. To assess the performance of quantized open-ended chat models, we employ pairwise comparison on Vicuna-bench [26], which consists of 80 test samples. We compare the responses generated by each quantized model against those generated by the original 16-bit Vicuna-1.5. The evaluation is performed by GPT-4, which automatically classifies the quantized model's response as a "win", "tie", or "lose" relative to the FP16 model's response. To eliminate the ordering effect of the evaluation, we follow [17] and compare each pair of responses in both orders, yielding 160 trials.
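The order-balanced judging protocol is easy to express in code. Below is a sketch in which `gpt4_judge` is a hypothetical callable standing in for an actual GPT-4 API call; the verdict labels mirror the win/tie/lose classification described above.

```python
from collections import Counter

def evaluate_pairwise(prompts, quantized_answers, fp16_answers, gpt4_judge):
    """Pairwise comparison over Vicuna-bench: 80 prompts x 2 orders = 160 trials.

    `gpt4_judge(prompt, answer_a, answer_b)` is a hypothetical helper that
    returns "A", "B", or "tie" for whichever answer it judges better.
    """
    tally = Counter()
    for prompt, q_ans, fp_ans in zip(prompts, quantized_answers, fp16_answers):
        # Trial 1: the quantized model's answer is shown first.
        v = gpt4_judge(prompt, q_ans, fp_ans)
        tally["win" if v == "A" else "lose" if v == "B" else "tie"] += 1
        # Trial 2: order swapped, so any position bias of the judge cancels out.
        v = gpt4_judge(prompt, fp_ans, q_ans)
        tally["win" if v == "B" else "lose" if v == "A" else "tie"] += 1
    return tally
```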

Figure 3 presents the results of the pairwise comparison for each quantized model against its FP16 counterpart. The results demonstrate that CherryQ consistently outperforms other quantization baselines in preserving the performance of chat models. It achieves the highest number of wins and ties against the FP16 models, while minimizing the number of losses.

Table 3: Performance of different 3-bit quantization methods on Huggingface OpenLLM for LLaMA2-7B and LLaMA2-13B.

Figure 3: Comparison of 3-bit quantized models to FP16 Vicuna-1.5. (Left) Comparisons to Vicuna-1.5-7B. (Right) Comparisons to Vicuna-1.5-13B. CherryQ even shows competitive quality compared to the 16-bit counterpart.

Notably, 3-bit CherryQ achieves a slightly better win-tie-lose ratio against the FP16 Vicuna model, indicating that the 3-bit quantized model performs on par with, or even better than, the FP16 model. Since a quantized model intuitively cannot surpass its 16-bit target, we take this result to suggest that CherryQ preserves almost all of the model's performance even at 3 bits, to the point where GPT-4 can hardly distinguish the quality of the low-bit and FP16 responses.
