LogSumExp Function Properties: Lemmas For Energy Functions

LogSumExp Function Properties: Lemmas for Energy Functions | HackerNoon

Last updated: 2025/06/24 at 6:45 PM

News Room Published 24 June 2025

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

Appendix B. Some Properties of the Energy Functions

We introduce some useful properties of the LogSumExp function, defined for x ∈ ℝ^n as LogSumExp(x) = log Σ_{i=1}^{n} exp(x_i). These properties matter because the softmax function, widely used in Transformer models, is the gradient of the LogSumExp function. As shown by Grathwohl et al. (2019), the LogSumExp of a classifier's logits corresponds, up to sign, to the energy function of that classifier.
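The sketch below illustrates the definition and the softmax-gradient claim. It uses only the Python standard library. The function names `logsumexp`, `softmax` and `numerical_grad` are illustrative and do not come from the paper. The code evaluates LogSumExp stably by factoring out the maximum entry, then checks with central finite differences that its gradient matches softmax.

```python
import math
from typing import Callable, List


def logsumexp(x: List[float]) -> float:
    """log(sum_i exp(x_i)), computed stably by factoring out max(x)."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def softmax(x: List[float]) -> List[float]:
    """softmax(x)_i = exp(x_i - LogSumExp(x)); the entries sum to 1."""
    lse = logsumexp(x)
    return [math.exp(xi - lse) for xi in x]


def numerical_grad(f: Callable[[List[float]], float], x: List[float],
                   h: float = 1e-6) -> List[float]:
    """Central finite-difference estimate of the gradient of a scalar f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x = [1.0, 2.0, 3.0]
print(softmax(x))                    # analytic gradient of LogSumExp at x
print(numerical_grad(logsumexp, x))  # should agree to many decimal places
# Stability: math.exp(1000.0) raises OverflowError, but the shifted form is fine.
print(logsumexp([1000.0, 1000.0]))   # equals 1000 + log(2)
```

Subtracting max(x) before exponentiating leaves the value unchanged, since log Σ exp(x_i) = m + log Σ exp(x_i − m). It also keeps every exponent at or below zero, which is why large logits do not overflow.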

Lemma 1. LogSumExp(x) is convex.

Proof
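One standard way to establish Lemma 1, which may differ from the authors' own argument, is to show that the Hessian is positive semidefinite everywhere. Here p denotes the softmax of x and δ_ij the Kronecker delta:

```latex
\begin{aligned}
p_i &= \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}, \qquad
\frac{\partial\, \mathrm{LogSumExp}(x)}{\partial x_i} = p_i, \qquad
\frac{\partial p_i}{\partial x_j} = p_i\,(\delta_{ij} - p_j), \\
\nabla^2 \mathrm{LogSumExp}(x) &= \operatorname{diag}(p) - p\,p^{\top}, \\
v^{\top}\bigl(\operatorname{diag}(p) - p\,p^{\top}\bigr)v
&= \sum_{i=1}^{n} p_i v_i^2 - \Bigl(\sum_{i=1}^{n} p_i v_i\Bigr)^2 \;\ge\; 0
\quad \text{for all } v \in \mathbb{R}^n.
\end{aligned}
```

The last inequality holds because p is a probability vector. The quadratic form is therefore the variance of v_i under i ~ p, which cannot be negative. A twice-differentiable function whose Hessian is positive semidefinite everywhere is convex.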

Consequently, we have the following smooth approximation for the min function.
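The sketch below assumes the approximation takes the common form smooth_min_β(x) = −β⁻¹ LogSumExp(−βx), with an inverse temperature β > 0. The symbol β and the name `smooth_min` are illustrative and may not match the authors' notation. The bounds follow from max(y) ≤ LogSumExp(y) ≤ max(y) + log n: they give min(x) − (log n)/β ≤ smooth_min_β(x) ≤ min(x), and the code prints both sides for a few values of β.

```python
import math
from typing import List


def logsumexp(x: List[float]) -> float:
    """Stable log(sum_i exp(x_i)), factoring out max(x)."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def smooth_min(x: List[float], beta: float) -> float:
    """Smooth approximation of min(x): -(1/beta) * LogSumExp(-beta * x).

    Since max(y) <= LogSumExp(y) <= max(y) + log(n), this satisfies
    min(x) - log(n)/beta <= smooth_min(x, beta) <= min(x).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    return -logsumexp([-beta * xi for xi in x]) / beta


x = [0.3, 1.2, 0.5, 2.0]
for beta in (1.0, 10.0, 100.0):
    lower = min(x) - math.log(len(x)) / beta
    print(f"beta={beta:>6}: {lower:.4f} <= {smooth_min(x, beta):.4f} <= {min(x)}")
```

The gap to the true minimum shrinks like (log n)/β as β grows. The gradient of smooth_min with respect to x is softmax(−βx), a "softmin" weighting, so the approximation stays differentiable where min itself is not.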

B.1 Proof of Proposition 2

Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai;

(3) Lei Deng;

(4) Wei Han.

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
