More Than a Feeling: Visualizing Why Filter Atoms Outsmart LoRA in Fine-Tuning | HackerNoon

News Room · Published 1 July 2025

Table of Links

Abstract and 1. Introduction

  2. Preliminary
  3. Methods
  4. Experiments
  5. Related Works
  6. Conclusion and References
  7. Details of Experiments
  8. Additional Experimental Results

7 Details of Experiments

7.1 Details of Datasets

The VTAB dataset is uniquely challenging and well suited for evaluating parameter-efficient tuning methods in the context of few-shot knowledge transfer. VTAB-1k encompasses a diverse range of image domains, including natural, structured, and specialized categories such as medical or satellite imagery. The tasks span various objectives, including object and scene recognition, distance classification, and counting. Consequently, VTAB-1k is a highly valuable resource for both discriminative and generative transfer learning tasks.

In Table 5, we provide information on 19 tasks of the VTAB dataset, including the number of classes and the number of images in each data split of VTAB. Images in the VTAB benchmark encompass three distinct domains: (1) Natural images captured using standard cameras, (2) Specialized images captured using non-standard cameras like those in remote sensing and medical applications, and (3) Structured images generated through simulation environments.

VTAB-1k is a subset of VTAB. It contains only 1,000 training and validation samples per task, designed for few-shot transfer learning.

Table 5: Information of the VTAB dataset.

7.2 Experimental Settings

LoRA Implementation. We adopt the LoRA implementation from https://github.com/microsoft/LoRA.

LoHa and LoKr Implementation. We adopt the LoHa and LoKr implementation from https://github.com/KohakuBlueleaf/LyCORIS.

DiffFit and BitFit Implementation. We adopt the DiffFit and BitFit implementation from https://github.com/mkshing/DiffFit-pytorch.
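
For context, the sketch below illustrates the core idea these repositories implement for linear layers: freeze the pre-trained weight and learn a low-rank residual. It is a minimal illustration under our own naming and hyperparameter choices (the class name, rank r, and scaling alpha are ours), not the API of any repository above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a LoRA-adapted linear layer: the frozen base
    weight W is augmented with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        # A gets a small random init, B starts at zero, so the update
        # is initially a no-op (as in the original LoRA recipe).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r             # standard LoRA scaling factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * x A^T B^T; only A and B receive gradients
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```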

Generative Tasks

Stable diffusion checkpoints. The pre-trained checkpoint we choose for Stable Diffusion is stable-diffusion-v1-4, which can be found at https://huggingface.co/CompVis/stable-diffusion.
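
The paper does not show its loading code; the following is a minimal sketch of how this checkpoint can be loaded with the Hugging Face diffusers library. The prompt and output path are purely illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pre-trained stable-diffusion-v1-4 checkpoint (assumed: the
# Hugging Face `diffusers` pipeline; the paper does not specify its loader).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The UNet weights inside `pipe.unet` are the usual target of
# parameter-efficient fine-tuning methods such as LoRA or filter atoms.
image = pipe("photo of a <castle>").images[0]  # illustrative prompt
image.save("castle.png")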

Text prompts for the few-shot generative task. We use specific text prompts to train Stable Diffusion and to generate images. Example prompts for the <castle> concept are listed below; a short sketch after this list shows how the concept token can be substituted into such templates:

– photo of a <castle>.

– The <castle> stands against a backdrop of snow-capped mountains.

– A <castle> surrounded by a lush, vibrant forest.

– The <castle> overlooks a serene lake.

– The <castle> in the autumn season with colorful foliage.

– The <castle> on a rocky cliff, with crashing waves below.

– The <castle> guarded by mythical elves.

– A <castle> surrounded by a field of grazing sheep.

– A peacock in front of the <castle>.

– The <castle> overlooks a serene lake, where a family of geese swims.

– <castle>, oil painting ghibli inspired.

– <castle> painting by artist claude monet.

– <castle> digital painting 3d render geometric style.

– Georgia O’Keeffe style <castle> painting.

– a watercolor painting of the <castle>.

– The <castle> is surrounded by an otherworldly landscape, with glowing mushrooms and mystical creatures.

– The <castle>, made of crystal, shimmers in the sunlight.

– The <castle>, steampunk aesthetic, adorned with gears and metallic accents.

– The <castle> atop a mystical floating island.

– Top view of the <castle>.
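
As referenced above, the following is a small, hypothetical helper illustrating how such templates can be instantiated by substituting the learned concept token; the function name and the template subset are ours, with the strings taken from the list above.

```python
# Hypothetical helper: expand prompt templates by substituting the learned
# concept token (e.g. "<castle>"). Template strings follow the paper's list.
TEMPLATES = [
    "photo of a {c}.",
    "The {c} stands against a backdrop of snow-capped mountains.",
    "A {c} surrounded by a lush, vibrant forest.",
    "The {c} overlooks a serene lake.",
]

def build_prompts(concept_token: str) -> list[str]:
    """Return the training/generation prompts for one concept."""
    return [t.format(c=concept_token) for t in TEMPLATES]

prompts = build_prompts("<castle>")
```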

Text prompts for the full generative task. We use specific text prompts to train Stable Diffusion and to generate images. We list one example prompt for each dataset as follows:

– Caltech-101: This is a picture of accordion.

– CIFAR-100: This is a picture of apple.

– Clevr: This is a picture from CLEVR dataset.

– Diabetic Retinopathy: This is a retina image with no diabetic retinopathy.

– DMLab: This is a picture from DMLab dataset.

– Dsprites: This is a picture from dSprites dataset.

Fig. 6: The relation between accuracy and the number of fine-tuning parameters, with different numbers of filter atoms (m = 6 and m = 12).

– DTD: This is a picture of banded texture.

– EuroSAT: This is a satellite picture of annual crop.

– Flowers102: This is a picture of pink primrose.

– Kitti: This is a picture from KITTI dataset.

– Patch Camelyon: This is a histopathologic scans without tumor.

– Pet: This is a picture of Abyssinian cat.

– Resisc45: This is a remote sensing picture of airplane.

– Smallnorb: This is a picture from SmallNORB dataset.

– SUN397: This is a picture of abbey.

– SVHN: This is a picture of street view house number 0.

8 Additional Experimental Results

8.1 Validation Experiments

We provide additional experiments with m = 6 and m = 12 in Figure 6. As we increase m from 6 to 12, the accuracy improves from 66.86% to 68.68%.

8.2 Additional Experiments of Discriminative Tasks

Performance Comparisons on Full Dataset Fine-tuning.

Implementation details. For CIFAR-100 and ImageNet-1K, we follow the fine-tuning setting of ConvNeXt in [30]. We employ the AdamW [33] optimizer to fine-tune models for 100 epochs on CIFAR-100 and 30 epochs on ImageNet-1K. The cosine decay strategy is adopted for the learning rate schedule, and linear warm-up is used for the first 10 epochs on CIFAR-100 and the first 5 epochs on ImageNet-1K.
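
A minimal PyTorch sketch of this schedule follows; the learning-rate and weight-decay values are illustrative assumptions, as the paper does not report them in this passage.

```python
import math
import torch

# Sketch of the schedule described above: AdamW with cosine decay and a
# linear warm-up (10 of 100 epochs for CIFAR-100). The base_lr and
# weight_decay values below are assumptions, not the paper's settings.
def make_optimizer_and_scheduler(model, total_epochs=100, warmup_epochs=10,
                                 base_lr=1e-3, weight_decay=0.05):
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad),
        lr=base_lr, weight_decay=weight_decay,
    )

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                      # linear warm-up
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```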

We compare the performance of our approach with other baseline methods; the results on CIFAR-100 and ImageNet-1K are shown in Table 6. With full dataset fine-tuning, full fine-tuning achieves the highest accuracy, outperforming the parameter-efficient fine-tuning methods. One possible reason is that both datasets contain sufficient data to prevent over-fitting. Our method achieves higher accuracy than LoRA while requiring only a small number of parameters (1.2M vs. 21M). In contrast, the VTAB-1k benchmark provides relatively little data (only 1,000 training images per task), which can cause full fine-tuning to over-fit.

Table 6: Performance comparisons on CIFAR-100 and ImageNet-1K with ConvNeXt models pre-trained on ImageNet-21K.

Visualization of Generalization Error. To delve deeper into how various fine-tuning methods impact the generalization capabilities of pre-trained models, we illustrate in Figure 7 the generalization error for a discriminative task trained on the CIFAR-100 and Diabetic Retinopathy datasets, in relation to the number of fine-tuned parameters.

Fig. 7: Generalization error of (a) CIFAR-100 and (b) Diabetic Retinopathy.

8.3 Results of Few-shot Generative Tasks

We provide more experimental results of few-shot generative learning in Tables 7 and 8. In this experiment, we also include LoRA, LoHa, and LoKr with different configurations.

The images generated by the different fine-tuning methods are shown in Figures 8 and 9.

Table 7: Evaluation of different approaches in learning the concept <castle>.

Table 8: Evaluation of different approaches in learning the concept <canal>.

8.4 Visualization of Generated Images

We visualize images generated by the models trained on each of the VTAB tasks in Figures 10 through 25.

8.5 Grad-CAM

To understand the underlying reason for the effectiveness of our approach on convolution-based models, we employ Grad-CAM [9] on the first block of ResNet50, fine-tuned on the CUB dataset [67] using the same experimental setting as above. For our method, we compare the setting with m = 9, i.e., 9 filter atoms ∆D, against the setting with (m, m1) = (9, 4), i.e., 36 atoms ∆D1.

Based on the Grad-CAM visualization in Figure 26, our method exhibits larger active regions compared with LoRA. This observation indicates that our approach benefits from preserving the spatial structure of convolutional layers. When utilizing ∆D1, which expands the number of filter atoms, we observe more active regions in the Grad-CAM heatmap. This suggests that the introduction of extra filter atoms potentially captures a wider range of feature maps.
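
For readers who want to reproduce this kind of heatmap, below is a minimal hook-based Grad-CAM sketch for the first block of ResNet-50. This is the standard Grad-CAM recipe, not necessarily the authors' exact code; the input tensor is a placeholder and the fine-tuned weights are assumed to be loaded separately.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Minimal Grad-CAM sketch on the first block of ResNet-50. Assumes the
# fine-tuned weights are loaded elsewhere; the input is a random placeholder.
model = resnet50(weights=None).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["feat"] = output          # feature maps of the hooked block

def bwd_hook(_, grad_in, grad_out):
    gradients["feat"] = grad_out[0]       # gradients w.r.t. those maps

layer = model.layer1                      # "first block" of ResNet-50
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)           # placeholder image tensor
logits = model(x)
logits[0, logits.argmax()].backward()     # gradient of the top class score

# Grad-CAM: channel weights = global-average-pooled gradients, then a
# weighted, ReLU-ed sum over channels gives the spatial heatmap.
w = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * activations["feat"].detach()).sum(dim=1))  # (1, H, W)
cam = cam / (cam.max() + 1e-8)            # normalize to [0, 1]
```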

We provide more heatmap visualizations of Grad-CAM from the first block of ResNet50 in Figure 27.

Fig. 8: Images sampled from Stable Diffusion [49] checkpoints fine-tuned with different approaches. The text prompts used to generate images, from top to bottom, are: “The <castle> stands against a backdrop of snow-capped mountains”, “A <castle> surrounded by a lush, vibrant forest”, “A peacock in front of the <castle>”, and “The <castle> overlooks a serene lake, where a family of geese swims”.

Fig. 9: Images sampled from Stable Diffusion [49] checkpoints fine-tuned with different approaches. The text prompts used to generate images, from top to bottom, are: “The <castle> stands against a backdrop of snow-capped mountains”, “A <castle> surrounded by a lush, vibrant forest”, “A peacock in front of the <castle>”, and “The <castle> overlooks a serene lake, where a family of geese swims”.

Fig. 10: Images sampled from Stable Diffusion checkpoints fine-tuned on Caltech101.

Fig. 11: Images sampled from Stable Diffusion checkpoints fine-tuned on CIFAR100.

Fig. 12: Images sampled from Stable Diffusion checkpoints fine-tuned on SUN397.

Fig. 13: Images sampled from Stable Diffusion checkpoints fine-tuned on SVHN.

Fig. 14: Images sampled from Stable Diffusion checkpoints fine-tuned on Flowers102.

Fig. 15: Images sampled from Stable Diffusion checkpoints fine-tuned on Pets.

Fig. 16: Images sampled from Stable Diffusion checkpoints fine-tuned on DTD.

Fig. 17: Images sampled from Stable Diffusion checkpoints fine-tuned on EuroSAT.

Fig. 18: Images sampled from Stable Diffusion checkpoints fine-tuned on Resisc45.

Fig. 19: Images sampled from Stable Diffusion checkpoints fine-tuned on Patch Camelyon.

Fig. 20: Images sampled from Stable Diffusion checkpoints fine-tuned on Diabetic Retinopathy.

Fig. 21: Images sampled from Stable Diffusion checkpoints fine-tuned on Kitti.

Fig. 22: Images sampled from Stable Diffusion checkpoints fine-tuned on Smallnorb.

Fig. 23: Images sampled from Stable Diffusion checkpoints fine-tuned on Dsprites.

Fig. 24: Images sampled from Stable Diffusion checkpoints fine-tuned on CLEVR.

Fig. 25: Images sampled from Stable Diffusion checkpoints fine-tuned on DMLab.

Fig. 26: Grad-CAM heatmap comparisons between our method and LoRA reveal that our approach exhibits larger active regions. The heatmaps are generated from ResNet50 [13] fine-tuned on the CUB dataset [67]. Fine-tuning the model with ∆D1 involves additional filter atoms, which leads to larger active regions in the heatmap compared with fine-tuning ∆D only. (a) Grad-CAM from the first block of ResNet50. (b-d) Grad-CAM from blocks 2-4 of ResNet50.

Fig. 27: Additional Grad-CAM heatmap comparisons between our method and LoRA from the first block of ResNet50.

