|
Article Excerpt We characterize the maximum payoff that a team can guarantee against another in a class of repeated games with imperfect monitoring. Our result relies on the optimal tradeoff for the team between optimization of stage payoffs and generation of signals for future correlation.
Key words: stochastic process; min-max values; signals; entropy; repeated games; secret correlation
1. Introduction. In many strategic situations, a group of players may find it beneficial to coordinate their action plans in a way that is hidden from other players. The manager of a sport team devises coordinated plans for the team members, and generals of allied armies need to keep their coordinated plans secret from enemies. On the Internet, coordinated attacks of systems (e.g., by viruses) are known to be much more dangerous than uncoordinated attacks. The management of a firm coordinates the actions of the units of production in a way that is hidden from the competitors.
Coordination of a group of players needs to rely on the observation of a common signal by its members. This signal can arise from an external correlation device (Aumann [2]), or be the result of communication between the players in the group (Forges [5]). In the model of repeated games with imperfect monitoring, each player observes a signal that depends stochastically on chosen actions (deterministic signals is a particular case); the signals may be correlated. These games feature both correlated signals and communication possibilities since actions may be used as messages.
This article explores the possibilities of secret correlation between team members in a repeated game with imperfect monitoring. In our model, two teams are matched against each other. Each member i of team I has an action set Ai. Team II is viewed as single player with action set B. At each stage, team II observes a (possibly random) signal s about I's action profile a, drawn according to some probability distribution q(s|a). Team I's members are informed of a, s, and possibly of II's actions (our result covers the cases in which team I has perfect, imperfect, or no observation of II's choice). The payoff to team I is a function of both team's action choices. In order to stress the value of secret correlation between team members, we assume that team II's goal is to minimize team I's payoff. Since team I has more information than team II about action choices, this extra information can be used as a correlation device for future actions. Our model allows us to study the optimal tradeoffs for team I between generation of signals for future correlation and use of correlation for present payoffs.
Our main result is a characterization of the best payoff that the team can guarantee against outside players as either the horizon of the game grows to infinity, or the discount factor goes to one. We emphasize three reasons why we believe characterizing the max min value is important:
First, such characterizations are important for the general study of repeated games with imperfect monitoring because they provide the individually rational levels. Some generalizations of the Folk Theorem from the perfect monitoring case to imperfect monitoring, such as Renault and Tomala [20] and Hrrner and Olszewski [12] show that the set of equilibrium payoffs of repeated games is the set of feasible and individually rational payoffs, but do not characterize the individually rational levels. Such a characterization completes these studies, thus providing full descriptions of the sets of equilibrium payoffs of the repeated games.
Second, von Stengel and Koller [22] proved that, in finite games where a team of players is matched against one outside player, the max min payoff is a Nash payoff. Furthermore, it is the most natural Nash payoff to select since team members can guarantee this value. Combined with our result, we know that the maximal Nash payoff to the team in the repeated game with imperfect monitoring is the max min we characterize.
Finally, the max min of the repeated game measures how successful team I is in correlating secretly its actions from outside players. Indeed, when no correlation is possible the max min of the repeated game coincides with the max min in mixed strategies of the stage game. When full correlation is achievable, this max min equals the generally higher max min in correlated strategies of the stage game. In general, only partial correlation may be achievable; the max min of the repeated game may lie between these two values. The study of the endogenous emergence of secret correlation of a group of players is interesting in itself. This article studies secret correlation as arising from monitoring structures. Gossner [8, 9] and Bavly and Neyman [3] studied its emergence through limitations of computational capacities of the players.
The problem faced by the team consists in finding the optimal tradeoff between using previous signals that are unknown to team II as correlation devices, and generating such signals for future use. We measure the amount of secret information contained in past signals by the signals' entropy. Our main result characterizes the team's max min payoff as the best payoff that can be obtained by a convex combination of correlated strategies under the constraint that the average entropy spent by the correlation devices does not exceed the average entropy of secret signals generated.
We motivate the problem by discussing examples in [section]2, present the model and definitions in [section] 3, and the main result in [section]4. We discuss examples in [section]5, and computational aspects in [section]6. The proof of the main results is given in [section]7. For simplicity, the model and the main result are first stated for a simple class of signalling structures; we extend our main result to more general signalling structures in [section]8. Finally, we show consequences for the Folk Theorem in [section]9.
2. Examples. We consider a three-player game where the teams are I = {1,2} and II = {3}. Player 1 chooses rows, Player 2 chooses columns, and Player 3 chooses matrices. The payoffs to the team are given by
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.].
In the repeated game with perfect monitoring, the team guarantees the max min of the one-shot game, where the max runs over the independent probability distributions on [A.sup.1] x [A.sup.2]. That is, the team guarantees 1/4.
Now assume that Player 3 receives blank signals, i.e., has no information on the action profile of I, whereas Players 1 and 2 observe each other's actions. The team can then use the first move of Player 1 as a correlation device, and thus can guarantee the max min of the one-shot game in long repetitions, where the max runs over the set of all probability distributions on [A.sup.1] x [A.sup.2]. That is, from the second stage on, I guarantees 1/2.
Now consider the case where team members observe each other's actions and the signal of Player 3 is given by the following matrix:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.].
Player 3 thus learns at each stage whether Players 1 and 2 played the same action. Consider the following strategy of the team: at Stage 1 each player randomizes between his two actions with equal probabilities. Let [a.sup.1.sub.1] be the random move of Player 1 at Stage 1. At each stage n > 1, play (a, a) if [a.sup.1.sub.1] = a and play (b, b) if [a.sup.1.sub.1] = b. The signal of Player 3 at Stage 1 is uniformly distributed and conditional on this signal; [a.sup.1.sub.1] is also uniformly distributed. Since after Stage 1 the signals will be constant, Player 3 never learns anything about the value of [a.sup.1.sub.1]. Actions of Players 1 and 2 are thus correlated from Stage 2 on and I guarantees 1/2.
Finally, consider the case where team members observe each other's actions and the signal of...
|