Table of Links
- Abstract and 1. Introduction
- Proposed Method: Quantized DyLoRA
- Experiments and Evaluation
- On the semi-sorted behavior of QDyLoRA
- Conclusion, Limitations, and References
- A. Supplementary Material
  - A.1. Hyperparameters
  - A.2. Generated Text Quality
2 Proposed Method: Quantized DyLoRA
Following QLoRA (Dettmers et al., 2023), we use 4-bit NormalFloat (NF4) to store the doubly-quantized pre-trained weights. Since all computations must be carried out in BFloat16 precision, DDequant-NF4 dequantizes the stored data on the fly. Similar to Dettmers et al. (2023), we have:
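A sketch of this computation in QLoRA's notation, extended with DyLoRA's rank truncation (the LoRA factor names $L_{dw}$ and $L_{up}$ and the sampled rank $b$ are our notation here, not necessarily the paper's):

$$\text{DDequant-NF4}\big(c_1^{\text{FP32}}, c_2^{\text{k-bit}}, W^{\text{NF4}}\big) = \text{dequant}\big(\text{dequant}(c_1^{\text{FP32}}, c_2^{\text{k-bit}}),\, W^{\text{NF4}}\big) = W^{\text{BF16}}$$

$$Y^{\text{BF16}} = X^{\text{BF16}}\,\text{DDequant-NF4}\big(c_1^{\text{FP32}}, c_2^{\text{k-bit}}, W^{\text{NF4}}\big) + X^{\text{BF16}}\, L_{dw}^{\text{BF16}}[:, :b]\, L_{up}^{\text{BF16}}[:b, :]$$

where $c_1$ and $c_2$ are the first- and second-level quantization constants and $b \in \{1, \dots, r_{\max}\}$ is the LoRA rank sampled at the current training step.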
Algorithm 1 describes the workflow of our proposed QDyLoRA in detail.
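As a rough illustration of this workflow, the minimal PyTorch sketch below samples a rank `b` at each training step and truncates the LoRA factors accordingly. The class and parameter names are ours, the `alpha / b` scaling is one plausible choice rather than the paper's confirmed one, and the NF4-quantized base weight is stood in for by a frozen BF16 tensor for brevity:

```python
import torch
import torch.nn as nn

class DynamicLoRALinear(nn.Module):
    """Frozen base weight plus a dynamic-rank LoRA branch.

    The base weight is a plain frozen BF16 tensor here; in the actual method it
    would be stored in NF4 and dequantized on the fly via DDequant-NF4.
    """

    def __init__(self, d_in: int, d_out: int, max_rank: int = 64, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(d_in, d_out, dtype=torch.bfloat16), requires_grad=False
        )
        self.lora_down = nn.Parameter(torch.randn(d_in, max_rank, dtype=torch.bfloat16) * 0.01)
        self.lora_up = nn.Parameter(torch.zeros(max_rank, d_out, dtype=torch.bfloat16))
        self.max_rank = max_rank
        self.alpha = alpha

    def forward(self, x: torch.Tensor, b: int) -> torch.Tensor:
        # Base path: x @ W (W would be the dequantized NF4 weight in the real method).
        y = x @ self.weight
        # LoRA path truncated to the sampled rank b: only the first b columns/rows
        # of the down/up factors participate in this step.
        return y + (self.alpha / b) * (x @ self.lora_down[:, :b]) @ self.lora_up[:b, :]

layer = DynamicLoRALinear(d_in=512, d_out=512)
opt = torch.optim.AdamW([layer.lora_down, layer.lora_up], lr=1e-4)

for step in range(3):
    x = torch.randn(4, 512, dtype=torch.bfloat16)
    b = int(torch.randint(1, layer.max_rank + 1, ()))  # sample a rank for this step
    loss = layer(x, b).float().pow(2).mean()           # stand-in loss for illustration
    opt.zero_grad(); loss.backward(); opt.step()
```

Because every rank up to the maximum is trained this way, the same adapter can later be queried at any rank $b \le 64$ without retraining, which is what enables the per-rank evaluations reported below.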
3 Experiments and Evaluation
This section evaluates the efficiency and efficacy of QDyLoRA on several instruction-tuning tasks. The first experiment compares QDyLoRA with QLoRA on the Massive Multitask Language Understanding (MMLU) benchmark (Hendrycks et al., 2020), which consists of more than 50 tasks spanning fundamental mathematics, U.S. history, computer science, law, and more. As shown in Table 1 [1], we fine-tune LLaMA-7b, LLaMA-13b, LLaMA2-13b, and Falcon-40b on several datasets, Alpaca (Taori et al., 2023), OASST1 (Köpf et al., 2023), Self-Instruct (Wang et al., 2022), and FLANv2 (Chung et al., 2022), using the QLoRA and QDyLoRA techniques. We use the same training budget and maximum LoRA rank [2] for each technique. The results consistently show that QDyLoRA achieves superior performance by finding the optimal rank.
The second experiment provides a more in-depth comparison between QLoRA and QDyLoRA. In particular, we fine-tuned Falcon-40b on the WebGLM (Liu et al., 2023) and GSM8k (Cobbe et al., 2021) benchmarks under identical settings and compared test performance across different ranks. As shown in Table 2, QDyLoRA attains superior performance, notably when employing its optimal ranks (rank 2 for WebGLM and rank 8 for GSM8k). Furthermore, QDyLoRA exhibits consistent superiority over QLoRA, particularly at lower ranks. These findings emphasize the adaptive nature of QDyLoRA, which dynamically adjusts its focus during fine-tuning and thereby achieves greater efficiency and efficacy than its static counterpart, QLoRA.

The third experiment compares DyLoRA, QDyLoRA, and QLoRA on GSM8k and TriviaQA (Joshi et al., 2017), using LLaMA2-13b and LLaMA-7b as the underlying LLMs. Table 3 reports the results. As the table illustrates, for the smaller model, LLaMA-7b, both DyLoRA and QDyLoRA outperform QLoRA. For the larger model, LLaMA2-13b, DyLoRA fails with an out-of-memory (OOM) error, while QDyLoRA performs best in this setting.
4 On the semi-sorted behavior of QDyLoRA
As shown in Table 2, QDyLoRA exhibits a semi-sorted performance pattern across ranks. We attribute this behavior to the limited fine-tuning budget. Under a limited budget, QDyLoRA updates its lower ranks more frequently than its higher ranks: whenever a higher rank is selected, all lower ranks are updated along with it, so lower ranks accumulate more updates over the course of training. Hence, lower ranks end up better tuned than higher ranks.
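To make the frequency argument concrete: if the active rank $b$ is drawn uniformly from $\{1, \dots, r\}$ at each step, row $i$ of the LoRA factors is updated exactly when $b \ge i$, so it participates in a fraction $(r - i + 1)/r$ of all updates. The back-of-the-envelope computation below assumes uniform sampling; the paper's actual sampling distribution may differ:

```python
# Fraction of training steps in which LoRA row i receives a gradient update,
# assuming the active rank b is sampled uniformly from {1, ..., r} each step:
# row i is updated whenever b >= i, i.e., with probability (r - i + 1) / r.
r = 64
for i in (1, 2, 8, 32, 64):
    print(i, (r - i + 1) / r)
# row 1 is updated on 100% of steps; row 64 on only ~1.6% of them
```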
This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.
[1] The same settings as the original QLoRA work are applied here.
[2] The maximum LoRA rank is fixed to 64. While QLoRA’s rank is always fixed, QDyLoRA can split training across ranks in the range 1 to 64.