Authors:
(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;
(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.
Table of Links
Abstract and 1 Introduction
2 Setting and 2.1 Models of behaviorally-biased opponents
3 Preliminaries and Intuition
4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent
4.3 Win-Stay, Lose-Shift Opponent
4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent
5 Generalizing
5.1 Other Behaviorally-Biased Strategies
5.2 Exploiting an Unknown Strategy from a Known Set of Strategies
6 Future Work and References
A Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
A.2 Follow-the-Leader Variant: Limited History
A.3 Ellipsoid Mistake Bounds
A.4 Highest Average Payoff Opponent
5.1 Other Behaviorally-Biased Strategies
A natural question to ask is, what kinds of behaviorally-biased strategies can we exploit to win nearly every round of a permissible game (Definition 1)? Clearly, if we can predict the opponent’s actions and learn best responses to those actions the opponent plays, we can achieve this goal.
Predicting the Opponent’s Actions
Learning Best Responses
The more challenging question is which deterministic strategies we can exploit to learn best responses for any action the opponent might play. Note that we cannot do this for every deterministic strategy; consider a very simple opponent who always plays the same action a. In this case, the opponent’s play does not reveal any information about a best response to a.
▶ Observation 7. It is not always possible to exploit a known deterministic strategy to guarantee winning any rounds.
Note that not every reasonable behavioral strategy has this property; several of the biased opponents we discussed earlier do not: the Gambler’s Fallacy opponent, the Tie-Stay variant of Win-Stay Lose-Shift opponent, and the Highest Average Payoff opponent.