Authors:
(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;
(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.
Table of Links
Abstract and 1 Introduction
2 Setting and 2.1 Models of behaviorally-biased opponents
3 Preliminaries and Intuition
4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent
4.3 Win-Stay, Lose-Shift Opponent
4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent
5 Generalizing
5.1 Other Behaviorally-Biased Strategies
5.2 Exploiting an Unknown Strategy from a Known Set of Strategies
6 Future Work and References
A Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
A.2 Follow-the-Leader Variant: Limited History
A.3 Ellipsoid Mistake Bounds
A.4 Highest Average Payoff Opponent
A. Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
Proof. Recall that the Tie-Stay variant of the Win-Stay Lose-Shift opponent plays the same action immediately following a win or a tie, and switches to the next action in its action ordering immediately following a loss. Because each action is beaten by at least one other action, the opponent must switch after one of the actions we play in response to its current action in phase 1 (since we play each action in succession). If the opponent shifts to play a new action in round r, we won round r − 1, so the best response we record is correct. Since the opponent always shifts to the next action in its action ordering, over n shifts, they will shift through every action before finally shifting back to the first action in its action ordering. Through this process we therefore record a best response to every action, and we record the correct action ordering in step 1. We play at most n actions in response to each action the opponent plays to make it shift, and one of them must beat the opponent’s action, so we incur no more than n(n − 1) losses or ties in this phase.
At the end of the above, the opponent will have shifted back to the first action of their action ordering. However, we don’t generally know in advance when this shift will occur, so the action we played (some a) may or may not have been a best response to it and we therefore can’t be sure whether the opponent will stay or shift. By playing a repeatedly, the opponent will eventually play some action twice; this will happen within n rounds since each action is beaten by at least one other action and ties with itself. However, while the opponent is shifting we must be winning, so we only incur 2 losses or ties here, when the opponent finally plays some action b twice in a row.
Since we finished round 2 with the opponent tying or winning, they will repeat their action (b) again in the next round; we win that round by playing the recorded best response to b (which we showed was correct) in phase 3.
At the start of phase 4, we predict correctly because we won the previous round and know the opponent will shift to the next action in its order (which we showed we recorded correctly). We showed earlier that we recorded correct best responses, so we win by playing the recorded best response to the predicted action. The same conditions hold for every remaining round, so we win all remaining rounds.