Here’s Why AI Researchers Are Talking About Sparse Spectral Training | HackerNoon

Table of Links

Abstract and 1. Introduction

  2. Related Work

  3. Low Rank Adaptation

    3.1 LoRA and 3.2 Limitation of LoRA

    3.3 ReLoRA*

  4. Sparse Spectral Training

    4.1 Preliminaries and 4.2 Gradient Update of U, VT with Σ

    4.3 Why SVD Initialization is Important

    4.4 SST Balances Exploitation and Exploration

    4.5 Memory-Efficient Implementation for SST and 4.6 Sparsity of SST

  5. Experiments

    5.1 Machine Translation

    5.2 Natural Language Generation

    5.3 Hyperbolic Graph Neural Networks

  6. Conclusion and Discussion

  7. Broader Impacts and References

Supplementary Information

A. Algorithm of Sparse Spectral Training

B. Proof of Gradient of Sparse Spectral Layer

C. Proof of Decomposition of Gradient of Weight

D. Proof of Advantage of Enhanced Gradient over Default Gradient

E. Proof of Zero Distortion with SVD Initialization

F. Experiment Details

G. Singular Value Pruning

H. Evaluating SST and GaLore: Complementary Approaches to Memory Efficiency

I. Ablation Study

A Algorithm of Sparse Spectral Training

B Proof of Gradient of Sparse Spectral Layer

We can express the differential of W as the sum of differentials:
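The display that followed was not preserved in this copy; a minimal LaTeX reconstruction, assuming the sparse spectral layer is parameterized as W = UΣVᵀ as in Section 4, would read:

\[
\mathrm{d}W = \mathrm{d}U\,\Sigma V^\top + U\,\mathrm{d}\Sigma\,V^\top + U\Sigma\,\mathrm{d}V^\top .
\]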

Applying the chain rule to the gradient of W, we obtain:
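Under the same parameterization, a sketch of the resulting factor gradients (with Σ treated as diagonal, so only the diagonal entries of the last expression are retained) is:

\[
\frac{\partial \mathcal{L}}{\partial U} = \frac{\partial \mathcal{L}}{\partial W}\, V \Sigma^\top, \qquad
\frac{\partial \mathcal{L}}{\partial V^\top} = \Sigma^\top U^\top \frac{\partial \mathcal{L}}{\partial W}, \qquad
\frac{\partial \mathcal{L}}{\partial \Sigma} = U^\top \frac{\partial \mathcal{L}}{\partial W}\, V .
\]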

C Proof of Decomposition of Gradient of Weight

D Proof of Advantage of Enhanced Gradient over Default Gradient

As only the direction of the update matters, the scale of the update can be adjusted by changing the learning rate. We measure similarity using the Frobenius norm of the difference between the SST update and three times the full-rank update.
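As a concrete illustration of this measurement (a minimal sketch, not the authors' code; the tensor names are hypothetical placeholders):

```python
import torch

def update_distance(delta_w_sst: torch.Tensor, delta_w_full: torch.Tensor) -> float:
    # Frobenius norm of the difference between the SST update and
    # three times the full-rank update, as described above.
    return torch.linalg.norm(delta_w_sst - 3.0 * delta_w_full, ord="fro").item()
```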

E Proof of Zero Distortion with SVD Initialization

F Experiment Details

F.1 Implementation Details for SST

F.2 Hyperparameters of Machine Translation

IWSLT’14. The hyperparameters can be found in Table 6. We employ the same codebase and hyperparameters as those used in HyboNet [12], which is derived from OpenNMT-py [54]. The final model checkpoint is utilized for evaluation. Beam search, with a beam size of 2, is employed to optimize the evaluation process. Experiments were conducted on one A100 GPU.

For SST, the number of steps per iteration (T3) is set to 200. Each iteration begins with a warmup phase lasting 20 steps. The number of iterations per round (T2) is determined by the formula T2 = d/r, where d represents the embedding dimension and r denotes the rank used in SST.
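As a schematic of this schedule (an illustrative sketch rather than the released implementation; the function and argument names are hypothetical):

```python
def sst_schedule(d: int, r: int, steps_per_iteration: int = 200, warmup_steps: int = 20):
    """Yield (iteration, step, in_warmup) for one SST round, following the schedule above."""
    iterations_per_round = d // r                 # T2 = d / r
    for iteration in range(iterations_per_round):
        for step in range(steps_per_iteration):   # T3 steps per iteration
            in_warmup = step < warmup_steps       # each iteration starts with a 20-step warmup
            yield iteration, step, in_warmup
```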

Table 6: Hyperparameters on IWSLT’14 for Euclidean and hyperbolic Transformer.

For SST, the number of steps per iteration (T3) is set to 200 for Multi30K and 400 for IWSLT’17. Each iteration begins with a warmup phase lasting 20 steps. The number of iterations per round (T2) is determined by the formula T2 = d/r, where d represents the embedding dimension and r denotes the rank used in SST.

F.3 Hyperparameters of Natural Language Generation

The hyperparameters for our experiments are detailed in Table 8. We employ a linear warmup of 2000 steps followed by a stable learning rate, without decay. A larger learning rate (0.001) is used only for the low-rank parameters (U, VT and Σ for SST; B and A for LoRA and ReLoRA*). The total number of training tokens for each experiment is 19.7B, roughly 2 epochs of OpenWebText. Distributed training is facilitated using the Accelerate [55] library across four A100 GPUs on a Linux server.
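A minimal sketch of this learning-rate setup (the optimizer choice and the `low_rank_params` / `other_params` groupings are assumptions for illustration, not the authors' code):

```python
import torch

def build_optimizer(low_rank_params, other_params, base_lr, warmup_steps=2000):
    optimizer = torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": low_rank_params, "lr": 1e-3},  # larger rate only for U, VT, Sigma (or B, A)
    ])
    # Linear warmup for 2000 steps, then hold the learning rate constant (no decay).
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
    )
    return optimizer, scheduler
```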

For SST, the number of steps per iteration (T3) is set to 200. Each iteration begins with a warmup phase lasting 20 steps. The number of iterations per round (T2) is determined by the formula T2 = d/r, where d represents the embedding dimension and r denotes the rank used in SST.

Table 7: Hyperparameters on Multi30K and IWSLT’17 for vanilla Transformer.

Table 8: Hyperparameters for OPT Models

F.4 Hyperparameters of Hyperbolic Graph Neural Networks

We use HyboNet [12] as the full-rank model, with the same hyperparameters as those used in HyboNet. Experiments were conducted on one A100 GPU.

For SST, the number of steps per iteration (T3) is set to 100. Each iteration begins with a warmup phase lasting 100 steps. The number of iterations per round (T2) is determined by the formula T2 = d/r, where d represents the embedding dimension and r denotes the rank used in SST.

We set the dropout rate to 0.5 for the LoRA and SST methods during the node classification task on the Cora dataset. This is the only deviation from the HyboNet configuration.

:::info
Authors:

(1) Jialin Zhao, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(2) Yingtao Zhang, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(3) Xinghang Li, Department of Computer Science;

(4) Huaping Liu, Department of Computer Science;

(5) Carlo Vittorio Cannistraci, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Computer Science, and Department of Biomedical Engineering Tsinghua University, Beijing, China.

:::


:::info
This paper is available on arXiv under the CC BY 4.0 DEED (Attribution 4.0 International) license.

:::
