|
Article Excerpt The minmax in repeated games with imperfect monitoring can differ from the minmax of those games with perfect monitoring when two or more players are able to gain common information known only to themselves, and utilize this information at a later stage. Gossner and Tomala showed that in a class of such games, the minmax is given by a weighted average of the payoffs of two main strategies: one in which the information is gained, and the other in which the information is utilized. However, all examples analyzed to date require only one main strategy in which information is created and utilized simultaneously. We show that two strategies are indeed needed by providing and solving a concrete example of a three-player game.
Key words: repeated games; imperfect monitoring; entropy; correlation system
1. Introduction. The minmax in a three-or-more-player repeated game with imperfect monitoring, if it exists, can differ from the minmax of the same game with perfect monitoring. Specifically, it can differ when some of the players are able to gain common information known only to themselves, using the signaling structure of the game. This additional information enables these players to correlate their strategies in ways unknown to the other players, thereby reducing the minmax. The difficulty in computing the minmax in the case of three or more players is due to the fact that there is a trade-off between using the additional information locally to achieve immediate improvement of payoff, and saving some of the information for use in future stages of the game. Based on this idea, Gossner and Tomala [3] characterized the minmax in a class of repeated games with imperfect monitoring (henceforth the GT class). Their result will be discussed in [section] 2. Meanwhile, we present it informally.
Assume v to be the minmax of one of the players, and let the team be the other players. An [epsilon]-minmax strategy for the team is a strategy that guarantees v + [epsilon] in all sufficiently long games.
The [epsilon]-minmax strategies used in the proof of Gossner and Tomala [3] (see Gossner and Tomala [2] for the detailed proof) are essentially nonstationary: There are different stages for gaining and utilizing information. However, in all the examples analyzed to date, essentially stationary strategies are sufficient. In such strategies, after the initialization step, the team's behavior is stationary; at any stage, the team gains and utilizes the same amount of information. These strategies are simpler than the essentially nonstationary strategies. It is only natural to ask whether there is a need for the nonstationarity property.
The main result of this paper is:
THEOREM 1.1. There are games in the GT class in which the minmax cannot be enforced by essentially stationary strategies.
We now present a three-player game with imperfect monitoring in order to demonstrate how information advantage can be gained using the signaling structure of the game, and how this advantage can be used to improve payoffs. We will later prove that, in this game, the minmax cannot be enforced by essentially stationary strategies.
Consider the following three-player game [GAMMA] with imperfect monitoring. At each stage, player I chooses j [member of] {T, B}, player II chooses k [member of] {L, R}, and player III chooses l [member of] {1, 2}. It is convenient to consider [GAMMA] as a zero-sum game between players I and II, who as teammates try to maximize, and player III, who attempts to minimize. Therefore, in this game we are looking for the maxmin, rather than the minmax, where the min runs over player III's strategies, and the max runs over the other players' mixed strategies. The outcome of the players' choices (j, k, l) is the payoff r(j, k, l), given by the matrices in Figure 1, where r(j, k, l) is the payoff to the team, paid by player III. Players I and II have perfect monitoring; i.e., after each stage they observe the actions of all the players. However, player III observes only the payoff. In other words, his signaling structure is as follows: When players I and II play (T, R) or (B, L), player III is aware of their actions. However, when they play (T, L) or (R, B), player III does not know whether players I and II played (T, L) or whether they played (R, B). Note that this happens exactly when the payoff r(j, k, l) = 1, and it is independent of player III's action.
[FIGURE 1 OMITTED]
Consider the following two-phase strategy. The team plays (1/2, 1/2) [cross product] (1/2, 1/2) ([cross product] denotes the direct product of the probabilities) until r(j, k, l) = 1 (note that this occurs with probability 1/2). When this occurs, the team plays as follows. If the choice in the last stage was (T, L, l), the teammates choose (T, R), and if the choice was (B, R, l), they choose (B, L). The team then returns to (1/2, 1/2) [cross product] (1/2, 1/2) until the next occurrence of r(j, k, 1) = 1.
In the above strategy, the aim of the first phase is to generate common information known only to the team. Each time r(j, k, l) = 1, the team can distinguish between (T, L, l) and (B, R, l), each of which occurs with probability 1/2, while player III cannot. The second phase utilizes all of this common information to reduce the third player's payoff.
In the following section we introduce basic definitions and the theorem proved by Gossner and Tomala [3] regarding the minmax of games with imperfect monitoring. In [section] 3, we discuss the proposed maxmin strategy of the game [GAMMA]. In [section] 4 we prove that the proposed strategy is, in fact, the maxmin strategy of [GAMMA], and that the maxmin can be obtained only by strategies that gain and utilize information in separate stages.
2. Preliminaries. In this section we present the model...
|