Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss

Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss | HackerNoon

Last updated: 2025/06/21 at 10:02 PM

News Room · Published 21 June 2025

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

6 Empirical Results

We test the hypothesis about the radius r from Section 5 using a pre-trained GPT-2 medium model. We also train several GPT-2 small models and vanilla Transformer models and analyze their cross-entropy losses.
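The cross-entropy loss tracked for these models is, in the standard language-modeling setup, the mean negative log-probability that the model assigns to each next token. The sketch below illustrates that quantity on toy logits. It is not the paper's training code: the function names `log_softmax` and `mean_cross_entropy` are hypothetical, and it assumes one row of raw logits per position plus one integer target token id per position.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mean_cross_entropy(
    logits_per_pos: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Average negative log-likelihood (in nats) of each position's target token."""
    if len(logits_per_pos) != len(targets) or not targets:
        raise ValueError("need one target per position and at least one position")
    total = 0.0
    for logits, t in zip(logits_per_pos, targets):
        total -= log_softmax(logits)[t]
    return total / len(targets)


# Uniform logits over a 4-token vocabulary give a loss of ln(4) nats at every position.
print(round(mean_cross_entropy([[0.0] * 4, [1.0] * 4], [2, 0]), 3))  # 1.386
```

A model that puts nearly all of its probability mass on the correct token drives this loss toward zero. Uniform guessing over a vocabulary of size V yields ln(V).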

6.1 Empirical evaluation of the radius

Figure 3: Cross-entropy loss of GPT-2 small model trained on (left) 100%, (middle) 1%, and (right) 0.1% of OpenWebText-9B dataset with a typical training time.
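To make the three data regimes in Figure 3 concrete, the sketch below computes subset sizes and how often each subset would be revisited. It rests on two assumptions that do not come from the paper. First, "OpenWebText-9B" is read as roughly 9 × 10^9 tokens. Second, "a typical training time" is taken to mean a comparable token budget for all three runs. The names `subset_tokens`, `epochs_for_budget`, and the per-mille encoding are hypothetical and were chosen so that the arithmetic stays exact in integers.

```python
# Assumption: "9B" is read as about 9 billion tokens (not stated in this excerpt).
TOTAL_TOKENS = 9_000_000_000

# Fractions written as parts per thousand so integer arithmetic stays exact.
FRACTIONS_PER_MILLE = {"100%": 1000, "1%": 10, "0.1%": 1}


def subset_tokens(total: int, per_mille: int) -> int:
    """Number of tokens in a subset holding per_mille/1000 of the data."""
    return total * per_mille // 1000


def epochs_for_budget(budget_tokens: int, subset: int) -> float:
    """Passes over the subset needed to consume a fixed (hypothetical) token budget."""
    return budget_tokens / subset


# Hypothetical budget: one full pass over the whole dataset.
budget = TOTAL_TOKENS
for label, pm in FRACTIONS_PER_MILLE.items():
    n = subset_tokens(TOTAL_TOKENS, pm)
    print(f"{label:>5}: {n:,} tokens, ~{epochs_for_budget(budget, n):g} epochs")
```

Under these assumptions, the 0.1% run sees each token about 1000 times as often as the full-data run. This is the kind of repetition that favors memorization over generalization.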

Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai ([email protected]);

(3) Lei Deng ([email protected]);

(4) Wei Han ([email protected]).

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

1. Available at https://github.com/openai/gpt-2
