Authors:
(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;
(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.
Table of Links
Abstract and 1 Introduction
2 Setting and 2.1 Models of behaviorally-biased opponents
3 Preliminaries and Intuition
4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent
4.3 Win-Stay, Lose-Shift Opponent
4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent
5 Generalizing
5.1 Other Behaviorally-Biased Strategies
5.2 Exploiting an Unknown Strategy from a Known Set of Strategies
6 Future Work and References
A Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
A.2 Follow-the-Leader Variant: Limited History
A.3 Ellipsoid Mistake Bounds
A.4 Highest Average Payoff Opponent
6. Future Work
In future work, it would be interesting to further characterize which behaviorally-biased strategies can be exploited to win nearly every round without observing payoffs. In that vein, one clear next step would be to explore the exploitability of behaviorally-motivated probabilistic strategies. In the probabilistic setting it may not be possible to win even most of the time; for example, probabilistic strategies could be regret-minimizing, in which case the best we can hope to achieve is the minimax value of the game. However, we could aim to get bounds on our performance relative to the best possible strategy given full information about the game matrix, payoffs, and the opponent’s strategy. Another direction would be to consider more complex games, such as extensive-form games.
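As a minimal illustration of the regret-minimizing barrier (our sketch, not from the paper), the Python snippet below pits a fixed strategy against a Hedge (multiplicative-weights) opponent in rock-paper-scissors; the learning rate eta and round count are arbitrary choices for the demo. Because Hedge's average payoff approaches that of its best fixed action in hindsight, in a zero-sum game our own average payoff against it is capped near the minimax value (0 for rock-paper-scissors), so such an opponent cannot be made to lose nearly every round the way the biased opponents can.

```python
# Illustrative sketch (assumptions: Hedge opponent, full-information
# feedback, arbitrary eta and round count). Shows that a regret-minimizing
# opponent learns to beat any fixed strategy, unlike the biased opponents
# studied in the paper.
import math
import random

def payoff(a, b):
    """Payoff to action a against action b (rock=0, paper=1, scissors=2)."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def hedge_vs_fixed(rounds=50000, eta=0.05, our_action=1):
    weights = [1.0, 1.0, 1.0]   # opponent's weights over its three actions
    opponent_total = 0.0
    for _ in range(rounds):
        opp_action = random.choices(range(3), weights=weights)[0]
        opponent_total += payoff(opp_action, our_action)
        # Hedge update: exponentially reweight each action by its payoff
        # against our observed play this round.
        for a in range(3):
            weights[a] *= math.exp(eta * payoff(a, our_action))
    return opponent_total / rounds

if __name__ == "__main__":
    # Against the fixed strategy "always paper", the opponent's average
    # payoff climbs toward +1 (it learns to play scissors), so we cannot
    # win even most of the rounds, let alone nearly all of them.
    print(f"opponent's average payoff: {hedge_vs_fixed():.3f}")
```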
References
1 Deepesh Agarwal and Balasubramaniam Natarajan. Tracking and handling behavioral biases in active learning frameworks. Information Sciences, 641:119117, 2023. URL: https://www.sciencedirect.com/science/article/pii/S0020025523007028, doi:10.1016/j.ins.2023.119117.
2 Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002. doi:10.1137/S0097539701398375.
3 Colin F. Camerer. Behavioral Game Theory: Experiments in Strategic Interaction. The Roundtable Series in Behavioral Economics. Princeton University Press, 2011. URL: https://books.google.com/books?id=cr_Xg7cRvdcC.
4 Edward Cartwright. Behavioral economics. Routledge, 2018.
5 Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006. doi:10.1017/CBO9780511546921.
6 Nicolas Christin. Network security games: Combining game theory, behavioral economics, and network measurements. In John S. Baras, Jonathan Katz, and Eitan Altman, editors, Decision and Game Theory for Security, pages 4–6, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. doi:10.1007/978-3-642-25280-8_2.
7 Giovanna Devetag and Massimo Warglien. Playing the wrong game: An experimental analysis of relational complexity and strategic misrepresentation. Games and Economic Behavior, 62(2):364–382, 2008. URL: https://www.sciencedirect.com/science/article/pii/S0899825607001133, doi:10.1016/j.geb.2007.05.007.
8 Michal Feldman, Adam Kalai, and Moshe Tennenholtz. Playing games without observing payoffs. In Innovations in Computer Science – ICS 2010, Tsinghua University, Beijing, China, January 5-7, 2010. Proceedings, pages 106–110. Tsinghua University Press, 2010. URL: http://conference.iiis.tsinghua.edu.cn/ICS2010/content/papers/9.html.
9 Joseph Y. Halpern and Rafael Pass. Algorithmic rationality: Game theory with costly computation. Journal of Economic Theory, 156:246–268, 2015. doi:10.1016/j.jet.2014.04.007.
10 Daniel Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011.
11 Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1988. doi:10.1007/BF00116827.
12 John M. McNamara, Alasdair I. Houston, and Olof Leimar. Learning, exploitation and bias in games. PLOS ONE, 16(2):1–14, February 2021. doi:10.1371/journal.pone.0246588.
13 Martin Nowak and Karl Sigmund. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature, 364(6432):56–58, July 1993. doi:10.1038/364056a0.
14 Martin J. Osborne and Ariel Rubinstein. A course in game theory. The MIT Press, Cambridge, USA, 1994. electronic edition. URL: https://arielrubinstein.tau.ac.il/books/GT.pdf.
15 Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527–535, 1952. doi:10.1090/s0002-9904-1952-09620-8.
16 Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012. doi:10.1561/2200000018.
17 Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 2018.
18 Santosh Vempala. Algorithmic convex geometry. https://faculty.cc.gatech.edu/~vempala/acg/notes.pdf, 2008. [Accessed 06-Sep-2023].
19 Zhijian Wang, Bin Xu, and Hai-Jun Zhou. Social cycling and conditional responses in the rock-paper-scissors game. Scientific Reports, 4(1), July 2014. doi:10.1038/srep05830.