By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
World of SoftwareWorld of SoftwareWorld of Software
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Search
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
Reading: Model Promotion: Using EMA to Balance Learning and Forgetting in IIL | HackerNoon
Share
Sign In
Notification Show More
Font ResizerAa
World of SoftwareWorld of Software
Font ResizerAa
  • Software
  • Mobile
  • Computing
  • Gadget
  • Gaming
  • Videos
Search
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Have an existing account? Sign In
Follow US
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
World of Software > Computing > Model Promotion: Using EMA to Balance Learning and Forgetting in IIL | HackerNoon
Computing

Model Promotion: Using EMA to Balance Learning and Forgetting in IIL | HackerNoon

News Room
Last updated: 2025/11/05 at 1:43 PM
News Room Published 5 November 2025
Share
Model Promotion: Using EMA to Balance Learning and Forgetting in IIL | HackerNoon
SHARE

Table of Links

Abstract and 1 Introduction

  1. Related works

  2. Problem setting

  3. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  4. Experimental results and 5.1. Experiment Setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  5. Conclusion and future work and References

Supplementary Material

  1. Details of the theoretical analysis on KCEMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

4.2. Knowledge consolidation

Different from existing IIL methods that only focus on the student model, we propose to consolidate knowledge from student to teacher for better balance between learning and forgetting. The consolidation is not implemented through learning but through model exponential moving average (EMA). Model EMA was initially introduced by Tarvainen et al. [28] to enhance the generalizability of models. In the vanilla model EMA, the model is trained from scratch, and EMA is applied after every iteration. The underlying mechanism of model EMA is not thoroughly explained before. In this work, we leverage model EMA for knowledge consolidation (KC) in the context of IIL task and explain the mechanism theoretically. According to our theoretical analysis, we propose a new KC-EMA for knowledge consolidation. Mathematically, the model EMA can be formulated as

Hence, the teacher model can achieve a minima training loss on both the old task and the new task, which indicates improved generalization on both the old data and new observations. This has been verified by our experiments in Sec. 5. However, since α < 1, it is noteworthy that the gradient of the teacher model, whether on the old task or the new task, is larger than the initial gradient on the old task or the final gradient of the student model on the new task. That is, the obtained teacher model sacrifices some unilateral performance on either the old data or the new data in order to achieve better generalization on both. From this perspective, the mechanism of vanilla EMA could also be partially explained. In vanilla EMA, where the model starts from scratch and only the new task is considered, we only need to focus on the second term in Equation 13. Since the teacher model has larger gradient on the training data than the student model, it is less possible to overfit to the training data. As a result, the teacher model has better generalization as Tarvainen et al. [28] observed.

:::info
Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(8) Chengjie Wang, Tencent Youtu Lab.

:::


:::info
This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.
By signing up, you agree to our Terms of Use and acknowledge the data practices in our Privacy Policy. You may unsubscribe at any time.
Share This Article
Facebook Twitter Email Print
Share
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article Digital democracy is the key to staging wartime elections in Ukraine Digital democracy is the key to staging wartime elections in Ukraine
Next Article New retina e-paper could make e-ink displays sharper than your iPhone screen New retina e-paper could make e-ink displays sharper than your iPhone screen
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

248.1k Like
69.1k Follow
134k Pin
54.3k Follow

Latest News

The Q4 Playbook: What October’s Sales Taught Us About Holiday Creator Marketing
Computing
Musk cheers Trump’s pick to lead NASA Jared Isaacman
Musk cheers Trump’s pick to lead NASA Jared Isaacman
News
New IIL Setting: Enhancing Deployed Models with Only New Data | HackerNoon
New IIL Setting: Enhancing Deployed Models with Only New Data | HackerNoon
Computing
US indicts three cyber pros who moonlit for ransomware gang | Computer Weekly
US indicts three cyber pros who moonlit for ransomware gang | Computer Weekly
News

You Might also Like

The Q4 Playbook: What October’s Sales Taught Us About Holiday Creator Marketing

5 Min Read
New IIL Setting: Enhancing Deployed Models with Only New Data | HackerNoon
Computing

New IIL Setting: Enhancing Deployed Models with Only New Data | HackerNoon

2 Min Read
My #1 Free Content Brief Template (Google Doc & Word Template)
Computing

My #1 Free Content Brief Template (Google Doc & Word Template)

20 Min Read
Medical Image Synthesis: S-CycleGAN for RUSS and Segmentation | HackerNoon
Computing

Medical Image Synthesis: S-CycleGAN for RUSS and Segmentation | HackerNoon

2 Min Read
//

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Topics

  • Computing
  • Software
  • Press Release
  • Trending

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

World of SoftwareWorld of Software
Follow US
Copyright © All Rights Reserved. World of Software.
Welcome Back!

Sign in to your account

Lost your password?