Does the Adam Optimizer Amplify Catastrophic Forgetting? | HackerNoon

News Room
Published 17 March 2026 (last updated 17 March 2026 at 7:13 PM)

:::info
Authors:

  1. Dylan R. Ashley
  2. Sina Ghiassian
  3. Richard S. Sutton

:::

Table of Links

Abstract

1 Introduction

2 Related Work

3 Problem Formulation

4 Measuring Catastrophic Forgetting

5 Experimental Setup

6 Results

7 Discussion

8 Conclusion

9 Future Work and References

Abstract

Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs), yet it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear exactly how the phenomenon should be quantified and, moreover, to what degree all of the choices we make when designing learning systems affect the amount of catastrophic forgetting. We use various testbeds from the reinforcement learning and supervised learning literature to (1) provide evidence that the choice of modern gradient-based optimization algorithm used to train an ANN has a significant impact on the amount of catastrophic forgetting, showing that, surprisingly, in many instances classical algorithms such as vanilla SGD experience less catastrophic forgetting than more modern algorithms such as Adam; and (2) empirically compare four existing metrics for quantifying catastrophic forgetting, showing that the degree to which learning systems experience catastrophic forgetting is sufficiently sensitive to the metric used that a change from one principled metric to another is enough to change the conclusions of a study dramatically. Our results suggest that a much more rigorous experimental methodology is required when studying catastrophic forgetting. Based on our results, we recommend that inter-task forgetting in supervised learning be measured with both retention and relearning metrics concurrently, and that intra-task forgetting in reinforcement learning be measured, at the very least, with pairwise interference.

1 Introduction

In online learning, catastrophic forgetting refers to the tendency of artificial neural networks (ANNs) to forget previously learned information in the presence of new information (French, 1991, p. 173). Catastrophic forgetting presents a severe issue for the broad applicability of ANNs, as many important learning problems, such as reinforcement learning, are online learning problems. Efficient online learning is also core to the continual learning problem, sometimes called lifelong learning (Chen and Liu, 2018, p. 55). The existence of catastrophic forgetting is of particular relevance now, as ANNs have been responsible for a number of major artificial intelligence (AI) successes in recent years (e.g., Taigman et al. (2014), Mnih et al. (2015), Silver et al. (2016), Gatys et al. (2016), Vaswani et al. (2017), Radford et al. (2019), Senior et al. (2020)). Thus, there is reason to believe that methods able to successfully mitigate catastrophic forgetting could lead to new breakthroughs in online learning problems.

The significance of the catastrophic forgetting problem means that it has attracted much attention from the AI community. It was first formally reported in McCloskey and Cohen (1989) and, since then, numerous methods have been proposed to mitigate it (e.g., Kirkpatrick et al. (2017), Lee et al. (2017), Zenke et al. (2017), Masse et al. (2018), Sodhani et al. (2020)). Despite this, it continues to be an unsolved issue (Kemker et al., 2018). This may be partly because the phenomenon itself, and what contributes to it, is poorly understood, with recent work still uncovering fundamental connections (e.g., Mirzadeh et al. (2020)). This paper is offered as a step forward in our understanding of catastrophic forgetting. We revisit the fundamental questions of (1) how catastrophic forgetting should be quantified, and (2) to what degree all of the choices we make when designing learning systems affect the amount of catastrophic forgetting. To answer the first question, we compare several existing measures of catastrophic forgetting: retention, relearning, activation overlap, and pairwise interference. We discuss each of these metrics in detail in Section 4. We show that, despite each of these metrics providing a principled measure of catastrophic forgetting, the relative ranking of algorithms varies wildly between them. This result suggests that catastrophic forgetting is not a phenomenon that any single one of these metrics can effectively describe. As most existing research into methods to mitigate catastrophic forgetting rarely looks at more than one of these metrics, our results imply that a more rigorous experimental methodology is required in the research community. Based on our results, we recommend that work looking at inter-task forgetting in supervised learning should, at the very least, consider both retention and relearning metrics concurrently. For intra-task forgetting in reinforcement learning, our results suggest that pairwise interference may be a suitable metric, but that activation overlap should, in general, be avoided as a singular measure of catastrophic forgetting.
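The four metrics are defined precisely in Section 4 of the paper. As a rough, self-contained illustration only (the function names and toy numbers below are ours, not the paper's exact definitions), a retention-style score and a gradient-alignment notion of pairwise interference can be sketched as:

```python
# Toy sketch of two catastrophic-forgetting measures. These are simplified
# stand-ins, NOT the paper's exact definitions (see Section 4 for those).

def retention(acc_after_task_a, acc_after_task_b):
    """Fraction of task-A performance retained after training on task B."""
    return acc_after_task_b / acc_after_task_a

def pairwise_interference(grad_i, grad_j):
    """Dot product of two per-example gradients: a negative value suggests
    that a step taken for example i undoes progress on example j."""
    return sum(gi * gj for gi, gj in zip(grad_i, grad_j))

# Example: task-A accuracy drops from 0.90 to 0.45 after training on task B,
# so half of the task-A performance was retained.
print(retention(0.90, 0.45))                             # 0.5

# Two examples whose gradients point in opposing directions interfere.
print(pairwise_interference([1.0, -2.0], [-0.5, 1.0]))   # -2.5
```

Measures of this flavour probe different things (retained performance versus per-update conflict), which is one reason the rankings they induce can disagree.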

To address the question of to what degree all the choices we make when designing learning systems affect the amount of catastrophic forgetting, we look at how the choice of modern gradient-based optimizer used to train an ANN impacts the amount of catastrophic forgetting that occurs during training. We empirically compare vanilla SGD, SGD with Momentum (Qian, 1999; Rumelhart et al., 1986), RMSProp (Hinton et al., n.d.), and Adam (Kingma and Ba, 2014) under the different metrics and testbeds. Our results suggest that selecting one of these optimizers over another does indeed result in a significant change in the catastrophic forgetting experienced by the learning system. Furthermore, our results ground previous observations about why vanilla SGD is often favoured in continual learning settings (Mirzadeh et al., 2020, p. 6): namely, that it frequently experiences less catastrophic forgetting than the more sophisticated gradient-based optimizers, with a particularly pronounced reduction when compared with Adam. To the best of our knowledge, this is the first work explicitly providing strong evidence of this. Importantly, in this work we are trying to better understand the phenomenon of catastrophic forgetting itself, not explicitly seeking to understand the relationship between catastrophic forgetting and performance. While that relationship is important, it is not the focus of this work, so we defer all discussion of it to Appendix C of our supplementary material. The source code for our experiments is available at https://github.com/dylanashley/catastrophic-forgetting/tree/arxiv.
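The comparison protocol itself is simple to state: train on one task, switch to another, and report how much the first task's loss rises. The toy sketch below illustrates only that protocol; it is our invention, not the paper's experimental setup (the quadratic "tasks", step counts, and hyperparameters are made up, and a single scalar parameter cannot reproduce the paper's findings about which optimizer forgets more):

```python
# Toy illustration of the train-A-then-B forgetting protocol with two
# optimizers. NOT the paper's setup: the quadratic tasks, 100-step budget,
# and hyperparameters are invented for illustration only.

def sgd_step(w, g, state, lr=0.1):
    return w - lr * g                      # plain gradient descent

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g          # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * g * g      # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (v_hat ** 0.5 + eps)

def loss(w, target):                       # a "task" = reach a target value
    return (w - target) ** 2

def grad(w, target):
    return 2 * (w - target)

def run(step_fn):
    w, state = 0.0, {"m": 0.0, "v": 0.0, "t": 0}
    for _ in range(100):                   # task A: target +1.0
        w = step_fn(w, grad(w, 1.0), state)
    loss_a_before = loss(w, 1.0)
    for _ in range(100):                   # task B: target -1.0
        w = step_fn(w, grad(w, -1.0), state)
    return loss(w, 1.0) - loss_a_before    # rise in task-A loss = forgetting

print("SGD forgetting: ", run(sgd_step))
print("Adam forgetting:", run(adam_step))
```

Note that the optimizer state (Adam's moment estimates) is deliberately carried across the task boundary, as it would be in an online setting; how that state interacts with the task switch is part of what distinguishes the optimizers.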

:::info
This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::
