Authors:
(1) Wanyun Cui, Shanghai University of Finance and Economics, with equal contribution;
(2) Qianle Wang, Shanghai University of Finance and Economics, with equal contribution.
Table of Links
Abstract and 1 Introduction
2 Related Work
3 Quantifying the Impact of Parameters on Model Performance & 4 Unified Mixed-Precision Training
5 Prevalence of Parameter Heterogeneity in LLMs
6 Quantization Experiments and 6.1 Implementation Details
6.2 Effect of Base LLM Quantization
6.3 Effect of Chat LLM Quantization
6.4 Comparison of Parameter Selection Criteria, Conclusion, & References
5. Prevalence of Parameter Heterogeneity in LLMs
While Figure 1 showcases the heterogeneity of selected parameter matrices in different LLMs, it is crucial to investigate whether this phenomenon is pervasive across the hundreds of parameter matrices within each LLM. In this section, we conduct a comprehensive analysis of parameter heterogeneity from a macro perspective.
To quantify the degree of heterogeneity in a parameter matrix, we introduce a heterogeneity score. Motivated by the observation in Figure 1, where a small subset of parameters has impacts far exceeding the maximum impact of the remaining majority, we define the heterogeneity score as the ratio of the mean impact of the top 1% of parameters to the maximum impact of the bottom 99%, as shown in Equation (4). A higher heterogeneity score indicates a more pronounced disparity in parameter importance within the matrix.
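Equation (4) is not reproduced in this excerpt; the following is a reconstruction from the prose definition above, with I_ij denoting the impact of parameter w_ij in matrix W as quantified in Section 3 (notation assumed):

\[
\mathrm{score}_{\mathrm{impact}}(W) = \frac{\operatorname{mean}\bigl(\{\, I_{ij} : I_{ij} \in \text{top } 1\% \,\}\bigr)}{\max\bigl(\{\, I_{ij} : I_{ij} \in \text{bottom } 99\% \,\}\bigr)} \tag{4}
\]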
For comparison, we also compute heterogeneity scores based on parameter magnitude, a commonly used measure of parameter importance [11]. The magnitude-based heterogeneity score is calculated using Equation (5).
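By the same prose definition, a plausible reconstruction of Equation (5) replaces impacts with absolute magnitudes |w_ij|:

\[
\mathrm{score}_{\mathrm{magnitude}}(W) = \frac{\operatorname{mean}\bigl(\{\, |w_{ij}| : |w_{ij}| \in \text{top } 1\% \,\}\bigr)}{\max\bigl(\{\, |w_{ij}| : |w_{ij}| \in \text{bottom } 99\% \,\}\bigr)} \tag{5}
\]

Both scores share the same form and differ only in the per-parameter statistic. Below is a minimal PyTorch sketch of this shared computation, assuming `values` is a tensor holding one non-negative importance value per parameter (impacts for Eq. (4), magnitudes for Eq. (5)); the function name is ours, not the paper's:

```python
import torch

def heterogeneity_score(values: torch.Tensor, top_frac: float = 0.01) -> float:
    """Mean of the top `top_frac` of values divided by the max of the rest."""
    flat, _ = torch.sort(values.flatten(), descending=True)
    k = max(1, int(top_frac * flat.numel()))
    top, rest = flat[:k], flat[k:]           # top 1% vs. bottom 99%
    return (top.mean() / rest.max()).item()

# Eq. (5) for a weight matrix W: heterogeneity_score(W.abs())
# Eq. (4) additionally needs the per-parameter impact matrix from Section 3.
```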
To provide a comprehensive view of parameter heterogeneity across different matrices, we plot the scatter distribution of heterogeneity scores for all parameter matrices of each model in Figure 2. The scatter plots clearly show that parameter matrices across different LLMs exhibit high impact-based heterogeneity scores, especially when compared with the corresponding magnitude-based scores. This finding strongly suggests that parameter heterogeneity is not an isolated occurrence but rather a widespread phenomenon in LLMs.
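As a rough illustration of this macro-level analysis: the magnitude-based scores of Eq. (5) can be computed directly from any checkpoint's weights, so the magnitude baseline of a Figure 2-style scatter can be sketched in a few lines, reusing `heterogeneity_score()` from above. The model id here is only an example; the impact-based scores would additionally require the Section 3 impact computation, which is not shown in this excerpt.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# One magnitude-based heterogeneity score (Eq. 5) per parameter matrix,
# the raw material for a scatter plot over all matrices of the model.
scores = {
    name: heterogeneity_score(param.detach().abs())
    for name, param in model.named_parameters()
    if param.dim() == 2  # weight matrices only; skip biases and norms
}
```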
The pervasiveness of parameter heterogeneity highlights the need for quantization strategies that can effectively handle the disparate importance of parameters, ensuring that the cherry parameters are preserved with higher precision while allowing for more aggressive quantization of the less influential normal parameters.