Authors:
(1) Jongmin Lee, Department of Mathematical Sciences, Seoul National University;
(2) Ernest K. Ryu, Department of Mathematical Sciences, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.
Abstract and 1 Introduction
1.1 Notations and preliminaries
1.2 Prior works
2 Anchored Value Iteration
2.1 Accelerated rate for Bellman consistency operator
2.2 Accelerated rate for Bellman optimality operator
3 Convergence when γ=1
4 Complexity lower bound
5 Approximate Anchored Value Iteration
6 Gauss–Seidel Anchored Value Iteration
7 Conclusion, Acknowledgments and Disclosure of Funding and References
A Preliminaries
B Omitted proofs in Section 2
C Omitted proofs in Section 3
D Omitted proofs in Section 4
E Omitted proofs in Section 5
F Omitted proofs in Section 6
G Broader Impacts
H Limitations
2.1 Accelerated rate for Bellman consistency operator
First, for general state-action spaces, we present the accelerated convergence rate of Anc-VI for the Bellman consistency operator.
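To make the setting concrete, the following is a minimal sketch of anchored value iteration applied to the Bellman consistency operator on a small finite MDP. It assumes the anchored update U^k = β_k U^0 + (1 − β_k) T^π U^{k−1} with β_k = 1 / Σ_{i=0}^{k} γ^{−2i}; this rule is not stated in this excerpt and is recalled from the definition of Anc-VI in Section 2, so it should be checked against that section. The names `P_pi`, `r_pi`, `bellman_consistency`, and `anchored_policy_evaluation` are hypothetical and chosen only for illustration, and the finite-state restriction is a simplification of the general state-action setting.

```python
import numpy as np


def bellman_consistency(V, P_pi, r_pi, gamma):
    """Apply T^pi V = r_pi + gamma * P_pi V (an affine map in V)."""
    return r_pi + gamma * P_pi @ V


def anchored_policy_evaluation(P_pi, r_pi, gamma, num_iters, V0=None):
    """Sketch of anchored value iteration for policy evaluation.

    Assumed update (to be verified against the paper's Section 2):
        U^k = beta_k * U^0 + (1 - beta_k) * T^pi U^{k-1},
        beta_k = 1 / sum_{i=0}^{k} gamma^{-2i}.
    """
    n = len(r_pi)
    V0 = np.zeros(n) if V0 is None else np.asarray(V0, dtype=float)
    V = V0.copy()
    inv_beta = 1.0  # running sum of gamma^{-2i}, starting at i = 0
    for k in range(1, num_iters + 1):
        inv_beta += gamma ** (-2 * k)
        beta = 1.0 / inv_beta
        # Pull the iterate back toward the anchor V0 with weight beta.
        V = beta * V0 + (1.0 - beta) * bellman_consistency(V, P_pi, r_pi, gamma)
    return V


# Toy 3-state example under a fixed policy (made-up numbers).
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
r_pi = np.array([1.0, 0.0, 0.5])
gamma = 0.9

V_anc = anchored_policy_evaluation(P_pi, r_pi, gamma, num_iters=200)
V_exact = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
bellman_error = np.max(np.abs(bellman_consistency(V_anc, P_pi, r_pi, gamma) - V_anc))
```

Here `V_exact` solves the linear system (I − γ P^π) V = r^π directly, so it serves as a reference for the anchored iterate, and `bellman_error` is the sup-norm residual that the rates in this section bound. The running sum grows like γ^{−2k}, so for very long runs a numerically safer recursion for β_k would be preferable.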
[1] Arguably, T^π is affine, not linear, but we follow the convention of [69] and say T^π is linear.