Authors:
(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;
(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.
Table of Links
Abstract and 1 Introduction
2 Setting and 2.1 Models of behaviorally-biased opponents
3 Preliminaries and Intuition
4 Strategies for Beating Behaviorally Biased Opponents
4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent
4.3 Win-Stay, Lose-Shift Opponent
4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent
5 Generalizing
5.1 Other Behaviorally-Biased Strategies
5.2 Exploiting an Unknown Strategy from a Known Set of Strategies
6 Future Work and References
A Appendix
A.1 Win-Stay Lose-Shift Variant: Tie-Stay
A.2 Follow-the-Leader Variant: Limited History
A.3 Ellipsoid Mistake Bounds
A.4 Highest Average Payoff Opponent
Abstract
Gameplay under various forms of uncertainty has been widely studied. Feldman et al. [8] studied a particularly low-information setting in which one observes the opponent’s actions but no payoffs, not even one’s own, and introduced an algorithm which guarantees one’s payoff nonetheless approaches the minimax optimal value (i.e., zero) in a symmetric zero-sum game. Against an opponent playing a minimax-optimal strategy, approaching the value of the game is the best one can hope to guarantee. However, a wealth of research in behavioral economics shows that people often do not make perfectly rational, optimal decisions. Here we consider whether it is possible to actually win in this setting if the opponent is behaviorally biased. We model several deterministic, biased opponents and show that even without knowing the game matrix in advance or observing any payoffs, it is possible to take advantage of each bias in order to win nearly every round (so long as the game has the property that each action beats and is beaten by at least one other action). We also provide a partial characterization of the kinds of biased strategies that can be exploited to win nearly every round, and provide algorithms for beating some kinds of biased strategies even when we don’t know which strategy the opponent uses.
1. Introduction
Game theory has traditionally assumed all agents are perfectly rational utility maximizers [14], but behavioral economics shows this assumption is often unsound; people often deviate from “optimal” behavior in predictable ways, exhibiting behavioral biases such as loss aversion (greater sensitivity to losses than to equivalent gains) and confirmation bias (interpreting information in a way that favors existing beliefs) [10, 4]. These insights initiated the study of behavioral game theory, which has demonstrated that people employ consistent “irrational” strategies in wide-ranging strategic settings (see [3]). Given the evidence that people often use behavioral strategies in games, it is important to understand the implications of this in algorithmic interactions (see, e.g., [9, 1, 6, 12]).
In this work, we are interested in the extent to which behavioral biases are vulnerable to exploitation in competitive games. To study this, we consider symmetric, repeated, two-player, zero-sum games, and find that even an uninformed player who does not know the game and observes no payoffs can take advantage of a wide variety of biased strategies to beat a biased opponent in nearly every round.
The setting we consider here is based on work by Feldman et al. [8]. Gameplay under various forms of uncertainty is a focus of much research interest; for example, the bandit setting, in which a player observes payoffs after each round but not the opponent’s play, has been widely studied (see [2, 5]). Feldman et al. introduced a complementary setting, in which the player observes the opponent’s play but no payoffs. No assumptions are made about the opponent, who may have full knowledge of the game. In particular, Feldman et al. consider symmetric, two-player, repeated games. They introduce a “Copycat” algorithm which guarantees a payoff approaching the value of the game, which is the best bound one could hope to achieve against an opponent playing a minimax-optimal strategy. For a symmetric zero-sum game, the value of the game is zero, so their algorithm guarantees a payoff approaching zero in that case. The Copycat algorithm works by ensuring that, for any pair of actions a1 and a2, the number of rounds in which the uninformed player plays a1 and the opponent plays a2 is not far from the number of rounds in which the player plays a2 and the opponent plays a1. In this way, the uninformed player achieves roughly the value of the game without knowing (or learning) anything about payoffs.
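To make this intuition concrete, the sketch below (our own illustration, not Feldman et al.’s algorithm) shows the accounting behind the guarantee. In a symmetric zero-sum game the payoff matrix is antisymmetric, so whenever the number of rounds with (our action a1, opponent action a2) equals the number with the roles reversed, the two contributions cancel.

```python
import numpy as np

# Sketch of the accounting behind the Copycat guarantee (not the algorithm
# itself).  In a symmetric zero-sum game the payoff matrix is antisymmetric:
# M[a1, a2] = -M[a2, a1].  If, for every pair of actions, the number of rounds
# in which we play a1 while the opponent plays a2 equals the number of rounds
# with the roles swapped, the contributions cancel and the total payoff is zero.

def total_payoff(M: np.ndarray, counts: np.ndarray) -> int:
    """counts[a1, a2] = number of rounds we played a1 and the opponent played a2."""
    return int(np.sum(M * counts))

# Example: rock-paper-scissors (rows = our action, columns = opponent's action).
M = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

# A perfectly balanced history: counts[a1, a2] == counts[a2, a1] for all pairs.
balanced = np.array([[2, 5, 3],
                     [5, 1, 4],
                     [3, 4, 2]])
print(total_payoff(M, balanced))    # 0: paired rounds cancel exactly

# An unbalanced history can be far from zero.
unbalanced = np.array([[0, 9, 0],
                       [0, 0, 0],
                       [0, 0, 0]])
print(total_payoff(M, unbalanced))  # -9: rock against paper nine times
```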
The setting introduced by Feldman et al. [8] is a particularly interesting one in which to study behavioral bias, because it allows us to consider to what extent one can exploit an opponent’s biases when the opponent’s behavior is essentially the only information one has. We consider the same basic setting, with a few additional restrictions. Like Feldman et al., we consider repeated, two-player, zero-sum, symmetric games in a setting where one does not have a priori knowledge of the game matrix and does not observe payoffs, but does observe the opponent’s actions. We also assume the opponent is deterministic and plays according to some behaviorally-biased strategy. For simplicity, we consider games where payoffs are limited to {1, 0, -1} (just wins, ties, and losses). We assume no action is unbeatable and that each action beats at least one other action.
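The following small check (an illustration of the assumptions, not code from the paper) makes these structural requirements concrete for a game matrix with entries in {1, 0, -1}.

```python
import numpy as np

# Check of the structural assumptions described above: payoffs lie in
# {1, 0, -1}, the game is symmetric zero-sum (M[i, j] = -M[j, i]), every
# action beats at least one other action, and no action is unbeatable.

def satisfies_assumptions(M: np.ndarray) -> bool:
    if not np.isin(M, [-1, 0, 1]).all():
        return False
    if not np.array_equal(M, -M.T):              # symmetric zero-sum game
        return False
    beats_something = (M == 1).any(axis=1)       # each action beats some action
    beaten_by_something = (M == -1).any(axis=1)  # no action is unbeatable
    return bool(beats_something.all() and beaten_by_something.all())

rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]])
print(satisfies_assumptions(rps))            # True

# A matrix with an action that never loses violates the assumption.
with_dominant = np.array([[ 0,  1,  1],
                          [-1,  0,  1],
                          [-1, -1,  0]])
print(satisfies_assumptions(with_dominant))  # False: action 0 is unbeatable
```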
There are two technical components to beating a biased opponent in this setting: one needs to learn to predict the opponent’s actions, and one needs to learn a best response to the actions the opponent plays. We note that it is not sufficient just to be able to predict accurately. For example, one natural idea that does not work (even if one is willing to take exponential time) is to simply find the lexicographically first game matrix consistent with all the opponent’s actions so far and their behavioral strategy, and then play a best response to the action predicted by that game matrix. It is possible that two game matrices, M1 and M2, could both be consistent with the opponent’s play, yet playing best responses with respect to M1 could result in losing most rounds (and tying the remaining rounds) if the true game matrix is M2. We show such an example in Section 3.
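As a hypothetical illustration of this ambiguity (not the example from Section 3), the sketch below assumes a myopic best-responder opponent, enumerates all four-action game matrices satisfying the assumptions above that are consistent with a short observed history, and shows that different consistent matrices recommend different responses; the recommendation of one consistent matrix can lose outright if the true matrix is another consistent one.

```python
import itertools
import numpy as np

# Hypothetical opponent model for this illustration: a myopic best responder
# that, from round 2 on, plays the action maximizing its payoff against our
# previous action (lowest index on ties).

N = 4
OUR_MOVES = [0, 0]   # our actions in rounds 1 and 2
OPP_MOVES = [2, 1]   # observed opponent actions in rounds 1 and 2

def all_matrices(n):
    """All antisymmetric n x n matrices over {1, 0, -1}."""
    pairs = list(itertools.combinations(range(n), 2))
    for vals in itertools.product([-1, 0, 1], repeat=len(pairs)):
        M = np.zeros((n, n), dtype=int)
        for (i, j), v in zip(pairs, vals):
            M[i, j], M[j, i] = v, -v
        yield M

def valid(M):
    """Each action beats at least one other action and is beaten by at least one."""
    return bool((M == 1).any(axis=1).all() and (M == -1).any(axis=1).all())

def consistent(M):
    """Opponent's observed moves match the myopic model from round 2 onward."""
    for t in range(1, len(OPP_MOVES)):
        predicted = int(np.argmax(M[:, OUR_MOVES[t - 1]]))
        if predicted != OPP_MOVES[t]:
            return False
    return True

candidates = [M for M in all_matrices(N) if valid(M) and consistent(M)]

# Each consistent matrix predicts the opponent's next move (its best response
# to our last action) and recommends our best response to that prediction.
recommendations = []
for M in candidates:
    opp_next = int(np.argmax(M[:, OUR_MOVES[-1]]))
    recommendations.append(int(np.argmax(M[:, opp_next])))
print("distinct recommendations:", sorted(set(recommendations)))

# The first consistent matrix may recommend a move that loses outright if the
# true matrix is a different consistent one.
r1 = recommendations[0]
for M2 in candidates[1:]:
    opp_next = int(np.argmax(M2[:, OUR_MOVES[-1]]))
    if M2[r1, opp_next] == -1:
        print("recommendation", r1, "from one consistent matrix loses under another")
        break
```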
We model several deterministic, behaviorally-biased opponents, and show how to exploit each bias to win nearly every round. We also give a partial characterization of the kinds of behavioral strategies we can exploit in general, and show that in some cases, we can also win nearly every round against a biased opponent even if we do not know which behavioral strategy they use.
This work highlights the danger of using behaviorally-biased strategies in competitive settings. We might expect that, while such strategies are not minimax optimal, some seemingly reasonable ones might do well enough, particularly against an uninformed opponent. However, our work shows that some behavioral biases can be exploited even by a player who has no prior knowledge of the game matrix entries and does not observe payoffs.