Authors:
(1) Jongmin Lee, Department of Mathematical Sciences, Seoul National University;
(2) Ernest K. Ryu, Department of Mathematical Sciences, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.
Abstract and 1 Introduction
1.1 Notations and preliminaries
1.2 Prior works
2 Anchored Value Iteration
2.1 Accelerated rate for Bellman consistency operator
2.2 Accelerated rate for Bellman optimality operator
3 Convergence when γ = 1
4 Complexity lower bound
5 Approximate Anchored Value Iteration
6 Gauss–Seidel Anchored Value Iteration
7 Conclusion, Acknowledgments and Disclosure of Funding and References
A Preliminaries
B Omitted proofs in Section 2
C Omitted proofs in Section 3
D Omitted proofs in Section 4
E Omitted proofs in Section 5
F Omitted proofs in Section 6
G Broader Impacts
H Limitations
Abstract
Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at an O(γ^k)-rate, where γ is the discount factor. Surprisingly, however, the optimal rate in terms of Bellman error for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an anchoring mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits an O(1/k)-rate for γ ≈ 1 or even γ = 1, while standard VI has rate O(1) for γ ≥ 1 − 1/k, where k is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of 4, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss–Seidel VI setups as well.
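Schematically, whereas standard VI updates V^{k+1} = T V^k, an anchored (Halpern-type) iteration updates V^{k+1} = β_{k+1} V^0 + (1 − β_{k+1}) T V^k, pulling each iterate back toward the starting point V^0 with anchor weights β_k that shrink as k grows; the specific choice of β_k is what the analysis tunes to obtain the accelerated rate.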
1 Introduction
Value Iteration (VI) is foundational to the theory and practice of modern dynamic programming (DP) and reinforcement learning (RL). It is well known that when a discount factor γ < 1 is used, (exact) VI is a contractive iteration in the ‖·‖∞-norm and therefore converges. The progress of VI is measured by the Bellman error in practice (as the distance to the fixed point is not computable), and much prior work has been dedicated to analyzing the rates of convergence of VI and its variants.
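To make the setup concrete, the following Python sketch (an illustration, not code from the paper) runs standard VI and a simplified anchored variant on a small random tabular MDP and reports the Bellman error ‖TV − V‖∞. The toy MDP and the Halpern-style anchor weights β_k = 1/(k+1) are assumptions made here for illustration, not the γ-dependent weights used in Anc-VI's analysis.

    # Illustrative sketch: standard VI vs. a Halpern-style anchored iteration
    # on a small random tabular MDP (toy data, not from the paper).
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 20, 4, 0.99

    # Toy transition kernel P[a, s, s'] and rewards R[s, a].
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((n_states, n_actions))

    def bellman_optimality(V):
        # (T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        return Q.max(axis=1)

    def value_iteration(V0, num_iters):
        # Standard VI: V^{k+1} = T V^k.
        V = V0
        for _ in range(num_iters):
            V = bellman_optimality(V)
        return V

    def anchored_value_iteration(V0, num_iters):
        # Anchored iteration: V^{k+1} = beta * V^0 + (1 - beta) * T V^k,
        # with the illustrative Halpern-style weight beta = 1/(k + 2);
        # Anc-VI's analysis uses its own gamma-dependent weights.
        V = V0
        for k in range(num_iters):
            beta = 1.0 / (k + 2)
            V = beta * V0 + (1.0 - beta) * bellman_optimality(V)
        return V

    V0 = np.zeros(n_states)
    for name, method in [("VI", value_iteration), ("anchored VI (sketch)", anchored_value_iteration)]:
        V = method(V0, 200)
        bellman_err = np.max(np.abs(bellman_optimality(V) - V))  # ||T V - V||_inf
        print(f"{name}: Bellman error after 200 iterations = {bellman_err:.2e}")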