Zero-Determinant Strategy
This note aims to explain the parts omitted in this paper:
Press, William H., and Freeman J. Dyson. “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent.” Proceedings of the National Academy of Sciences 109.26 (2012): 10409-10413.
Interesting Facts
- As stated in the title, the Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent. These are the Zero-Determinant (ZD) strategies.
- “The Iterated Prisoner’s Dilemma is an Ultimatum game.”
- “One player can enforce a unilateral claim to an unfair share of rewards”, by the ZD strategy.
- Any evolutionary agent will be exploited by an agent playing a ZD strategy. “An evolutionary player’s best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better.”
- “For any strategy of the longer-memory player Y, shorter-memory X’s score is exactly the same as if Y had played a certain shorter-memory strategy.”
Iterated Prisoner’s Dilemma
At each time step, the two agents play the Prisoner’s Dilemma:
Player1\Player2 | Cooperate (Deny) | Defect (Confess) |
---|---|---|
Cooperate (Deny) | $R,R$ | $S,T$ |
Defect (Confess) | $T,S$ | $P,P$ |
where $T > R > P > S$ and $2R > T+S$, and the meanings are as follows.
- $T$: Temptation
- $R$: Reward
- $P$: Punishment
- $S$: Sucker’s payoff
The two agents play this game repeatedly; the number of rounds (not to be confused with the temptation payoff $T$) may be finite or infinite.
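As a concrete illustration (the values are an assumption; the derivations below never fix specific payoffs), the conventional IPD payoffs $(T, R, P, S) = (5, 3, 1, 0)$ satisfy both constraints:

```python
# Conventional IPD payoffs, assumed here purely for illustration; the note's
# derivations never fix specific values.
T, R, P, S = 5, 3, 1, 0   # temptation, reward, punishment, sucker's payoff

assert T > R > P > S      # defecting against a cooperator pays most; being the sucker pays least
assert 2 * R > T + S      # mutual cooperation beats taking turns exploiting each other
print("payoff ordering OK")
```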
Longer-Memory Strategies Offer No Advantage
- Player $i$:
  - makes decisions based on the short memory $\tau^i$;
  - its strategy is $\pi^i(a^i\mid \tau^i)$.
- Player $j$:
  - makes decisions based on the long memory $(\tau^i,\Delta\tau)$;
  - its strategy is $\pi^j(a^j\mid \tau^i,\Delta\tau)$.
Derivation:
\[\begin{aligned} \mathrm{Pr}(a^i, a^j) =& \sum\limits_{\tau^i,\Delta\tau} \pi^i(a^i\mid \tau^i) \cdot \pi^j(a^j\mid \tau^i,\Delta\tau) \cdot \mathrm{Pr}(\tau^i,\Delta\tau) \\ =& \sum\limits_{\tau^i} \pi^i(a^i\mid \tau^i) \cdot \left[ \sum\limits_{\Delta\tau} \pi^j(a^j\mid \tau^i,\Delta\tau) \cdot \mathrm{Pr}(\Delta\tau\mid \tau^i) \right] \cdot \mathrm{Pr}(\tau^i) \\ =& \sum\limits_{\tau^i} \pi^i(a^i\mid \tau^i) \cdot \left[ \sum\limits_{\Delta\tau} \mathrm{Pr}(a^j, \Delta\tau\mid \tau^i) \right] \cdot \mathrm{Pr}(\tau^i)\\ =& \sum\limits_{\tau^i} \pi^i(a^i\mid \tau^i) \cdot \mathrm{Pr}(a^j \mid \tau^i) \cdot \mathrm{Pr}(\tau^i) \end{aligned}\]Here $\mathrm{Pr}(a^j \mid \tau^i)$ is player $j$’s marginalized (short-memory) strategy:
\[\mathrm{Pr}(a^j \mid \tau^i) = \sum\limits_{\Delta\tau} \mathrm{Pr}(a^j, \Delta\tau\mid \tau^i).\]- After enough plays, $j$ can estimate $\mathrm{Pr}(\Delta\tau\mid \tau^i)$ and switch to this equivalent short-memory strategy (a numerical check of the identity is given below).
- “$j$’s switching between a long- and short-memory strategy is completely undetectable (and irrelevant) to $i$.”
Does this conclusion apply only to repeated games?
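A minimal numerical sketch of the marginalization identity above, using randomly generated finite distributions (the set sizes, variable names, and NumPy usage are my own assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tau, n_delta, n_ai, n_aj = 3, 4, 2, 2        # sizes of tau^i, Delta tau, a^i, a^j (arbitrary)

# Joint distribution over memories, Pr(tau^i, Delta tau).
P_mem = rng.random((n_tau, n_delta))
P_mem /= P_mem.sum()

# Player i's short-memory policy pi^i(a^i | tau^i).
pi_i = rng.random((n_tau, n_ai))
pi_i /= pi_i.sum(axis=1, keepdims=True)

# Player j's long-memory policy pi^j(a^j | tau^i, Delta tau).
pi_j = rng.random((n_tau, n_delta, n_aj))
pi_j /= pi_j.sum(axis=2, keepdims=True)

# Left-hand side: marginalize over both memory components directly.
lhs = np.einsum('ta,tdb,td->ab', pi_i, pi_j, P_mem)

# Right-hand side: build j's marginalized short-memory strategy Pr(a^j | tau^i) first.
P_tau = P_mem.sum(axis=1)                                    # Pr(tau^i)
P_delta_given_tau = P_mem / P_tau[:, None]                   # Pr(Delta tau | tau^i)
marg_j = np.einsum('tdb,td->tb', pi_j, P_delta_given_tau)    # Pr(a^j | tau^i)
rhs = np.einsum('ta,tb,t->ab', pi_i, marg_j, P_tau)

assert np.allclose(lhs, rhs)
print("Pr(a^i, a^j) is identical under the full and the marginalized strategy")
```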
Zero-Determinant Strategy
Some tedious parts were automatically filled in with the help of ChatGPT-4.
Notation of 4 outcomes
Player1\Player2 | Cooperate (c) | Defect (d) |
---|---|---|
Cooperate (c) | $\mathrm{cc}$ (1) | $\mathrm{cd}$ (2) |
Defect (d) | $\mathrm{dc}$ (3) | $\mathrm{dd}$ (4) |
Notation of strategies
- There are two players, $i$ and $j$, with memory-one strategies.
- Strategies are based on the outcome of the last play.
$p_1, p_2, p_3, p_4$ (shorthand for $p_{\mathrm{cc}}, p_{\mathrm{cd}}, p_{\mathrm{dc}}, p_{\mathrm{dd}}$ below) are chosen independently, and each lies in $[0,1].$
\[\begin{cases} p_{\mathrm{cc}} = \pi^i(a_t^i=\mathrm{Cooperate}\mid a_{t-1}^i =\mathrm{Cooperate},a_{t-1}^j=\mathrm{Cooperate}) \\ p_{\mathrm{cd}} = \pi^i(a_t^i=\mathrm{Cooperate}\mid a_{t-1}^i=\mathrm{Cooperate},a_{t-1}^j=\mathrm{Defect}) \\ p_{\mathrm{dc}} = \pi^i(a_t^i=\mathrm{Cooperate}\mid a_{t-1}^i=\mathrm{Defect},a_{t-1}^j=\mathrm{Cooperate}) \\ p_{\mathrm{dd}} = \pi^i(a_t^i=\mathrm{Cooperate}\mid a_{t-1}^i=\mathrm{Defect},a_{t-1}^j=\mathrm{Defect}) \end{cases}\]$\mathbf{q} = (q_1, q_2, q_3, q_4)$ is defined analogously for player $j$, i.e., $\pi^j(a_t^j=\mathrm{Cooperate}\mid a_{t-1}^j,a_{t-1}^i)$, conditioned on the last outcome as seen from $j$’s own perspective ($j$’s action listed first). Consequently, $j$’s $\mathrm{cd}$ is $i$’s $\mathrm{dc}$ and vice versa, which is why $q_3$ appears in the $\mathrm{cd}$ row and $q_2$ in the $\mathrm{dc}$ row of the transition matrix below.
Markov transition matrix: $\mathbf{M}(\mathbf{p}, \mathbf{q})$
- It is the transition matrix of a Markov chain over the four outcomes (once both strategies are fixed, no decisions remain to be made).
- The rows indicate the current state.
- The columns indicate the next state.
- Each entry is the probability of transitioning from the row state to the column state.
Current\Next | $\mathrm{cc}$ | $\mathrm{cd}$ | $\mathrm{dc}$ | $\mathrm{dd}$ |
---|---|---|---|---|
$\mathrm{cc}$ | $\mathrm{Pr}(\mathrm{cc}\mid \mathrm{cc})$ | $\mathrm{Pr}(\mathrm{cd}\mid \mathrm{cc})$ | $\mathrm{Pr}(\mathrm{dc}\mid \mathrm{cc})$ | $\mathrm{Pr}(\mathrm{dd}\mid \mathrm{cc})$ |
$\mathrm{cd}$ | $\mathrm{Pr}(\mathrm{cc}\mid \mathrm{cd})$ | $\mathrm{Pr}(\mathrm{cd}\mid \mathrm{cd})$ | $\mathrm{Pr}(\mathrm{dc}\mid \mathrm{cd})$ | $\mathrm{Pr}(\mathrm{dd}\mid \mathrm{cd})$ |
$\mathrm{dc}$ | $\mathrm{Pr}(\mathrm{cc}\mid \mathrm{dc})$ | $\mathrm{Pr}(\mathrm{cd}\mid \mathrm{dc})$ | $\mathrm{Pr}(\mathrm{dc}\mid \mathrm{dc})$ | $\mathrm{Pr}(\mathrm{dd}\mid \mathrm{dc})$ |
$\mathrm{dd}$ | $\mathrm{Pr}(\mathrm{cc}\mid \mathrm{dd})$ | $\mathrm{Pr}(\mathrm{cd}\mid \mathrm{dd})$ | $\mathrm{Pr}(\mathrm{dc}\mid \mathrm{dd})$ | $\mathrm{Pr}(\mathrm{dd}\mid \mathrm{dd})$ |
Current\Next | $\mathrm{cc}$ | $\mathrm{cd}$ | $\mathrm{dc}$ | $\mathrm{dd}$ |
---|---|---|---|---|
$\mathrm{cc}$ | $p_1\cdot q_1$ | $p_1\cdot (1-q_1)$ | $(1-p_1)\cdot q_1$ | $(1-p_1)\cdot(1-q_1)$ |
$\mathrm{cd}$ | $p_2\cdot q_3$ | $p_2\cdot (1-q_3)$ | $(1-p_2)\cdot q_3$ | $(1-p_2)\cdot(1-q_3)$ |
$\mathrm{dc}$ | $p_3\cdot q_2$ | $p_3\cdot (1-q_2)$ | $(1-p_3)\cdot q_2$ | $(1-p_3)\cdot(1-q_2)$ |
$\mathrm{dd}$ | $p_4\cdot q_4$ | $p_4\cdot (1-q_4)$ | $(1-p_4)\cdot q_4$ | $(1-p_4)\cdot(1-q_4)$ |
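A minimal sketch that builds $\mathbf{M}(\mathbf{p}, \mathbf{q})$ exactly as in the table above; the helper name `build_M` and the example values of $\mathbf{p}$ and $\mathbf{q}$ are arbitrary assumptions for illustration:

```python
import numpy as np

def build_M(p, q):
    """Transition matrix over outcomes (cc, cd, dc, dd), labelled from player i's view.

    q is indexed from player j's view, which swaps cd and dc; hence q[2] (= q_3)
    appears in the cd row and q[1] (= q_2) in the dc row, matching the table above.
    """
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return np.array([
        [p1 * q1, p1 * (1 - q1), (1 - p1) * q1, (1 - p1) * (1 - q1)],
        [p2 * q3, p2 * (1 - q3), (1 - p2) * q3, (1 - p2) * (1 - q3)],
        [p3 * q2, p3 * (1 - q2), (1 - p3) * q2, (1 - p3) * (1 - q2)],
        [p4 * q4, p4 * (1 - q4), (1 - p4) * q4, (1 - p4) * (1 - q4)],
    ])

# Arbitrary example cooperation probabilities (assumptions for illustration only).
p = [0.8, 0.3, 0.6, 0.2]
q = [0.7, 0.4, 0.5, 0.1]

M = build_M(p, q)
assert np.allclose(M.sum(axis=1), 1.0)   # every row is a probability distribution
print(M)
```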
$\mathbf{M}$ has a unit eigenvalue
This part requires some knowledge of stochastic processes. Check my other note.
- $\mathbf{M}$ is irreducible. It means that the entire state space is one communicating class, i.e., for every two states $i$ and $j$, it holds that $i\leftrightarrow j.$ (Here this holds whenever all entries of $\mathbf{p}$ and $\mathbf{q}$ lie strictly in $(0,1)$, since then every entry of $\mathbf{M}$ is positive.)
- $\mathbf{M}$ is finite. It means that $\mathbf{M}$ has a finite number of states.
$\Rightarrow$ $\mathbf{M}$ is positive recurrent. It means that every state returns to itself in finite expected time. Formally, $\mathbb{E}[T_i \mid X_0 = i] = m_i < \infty,$ where $T_i = \min \{ n > 0 : X_n = i \}$ is the first return time. (If the state space of an irreducible Markov chain is finite, then all its states are positive recurrent.)
$\Rightarrow$ $\mathbf{M}$ has a stationary distribution $\mathbf{v}^\intercal = \mathbf{v}^\intercal \mathbf{M}.$ (If an irreducible Markov chain is positive recurrent, then a stationary distribution $\mathbf{v}$ exists, is unique, and is given by $\mathbf{v}_i = 1/m_i$, where $m_i$ is the expected return time to state $i$.)
$\Rightarrow$ $\mathbf{M}$ has a unit eigenvalue: $1\cdot \mathbf{v}^\intercal = \mathbf{v}^\intercal \mathbf{M}.$
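A quick numerical check of the unit eigenvalue and the stationary distribution, reusing the same example $\mathbf{p}$ and $\mathbf{q}$ (still arbitrary assumptions); the compact construction below just reproduces the matrix from the previous table:

```python
import numpy as np

# Same illustrative p and q as in the earlier sketch.
p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.7, 0.4, 0.5, 0.1])

# Rebuild M row by row: row k is the outer product of (p_k, 1-p_k) and (q_k, 1-q_k),
# with q re-indexed to player i's outcome labels (cd and dc are swapped for j).
qi = q[[0, 2, 1, 3]]
M = np.stack([np.outer([pk, 1 - pk], [qk, 1 - qk]).ravel() for pk, qk in zip(p, qi)])

# Left eigenvectors of M are right eigenvectors of M^T.
eigvals, eigvecs = np.linalg.eig(M.T)
k = int(np.argmin(np.abs(eigvals - 1.0)))
assert np.isclose(eigvals[k], 1.0)

v = np.real(eigvecs[:, k])
v /= v.sum()                     # normalize to a probability distribution
assert np.allclose(v @ M, v)     # v^T M = 1 * v^T
print("stationary distribution v =", v)
```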
$\operatorname{det}(\mathbf{M'}) = 0$
$\mathbf{M}$ has a unit eigenvalue: $1\cdot \mathbf{v}^\intercal = \mathbf{v}^\intercal \mathbf{M}.$
$\Rightarrow$ $\mathbf{v}^\intercal \mathbf{M'} = \boldsymbol{0},$ where $\mathbf{M'} := \mathbf{M} - \mathbf{I}.$
$\Rightarrow$ $\mathbf{M'}$ is singular, because $\mathbf{v}$ is not a zero vector, so there is a non-zero vector in the left null space of $\mathbf{M'}.$
$\Rightarrow$ $\operatorname{det}(\mathbf{M'}) = 0,$ since a singular matrix has zero determinant.
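Correspondingly, a one-line numerical check that $\operatorname{det}(\mathbf{M'}) = 0$ for the same example values (again an illustrative assumption, not a proof):

```python
import numpy as np

# Same illustrative p and q; rebuild M as in the previous sketch.
p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.7, 0.4, 0.5, 0.1])
qi = q[[0, 2, 1, 3]]   # re-index q to player i's outcome labels (swap cd/dc)
M = np.stack([np.outer([pk, 1 - pk], [qk, 1 - qk]).ravel() for pk, qk in zip(p, qi)])

M_prime = M - np.eye(4)
print("det(M') =", np.linalg.det(M_prime))    # zero up to floating-point error
assert abs(np.linalg.det(M_prime)) < 1e-12
```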
Every row of $\operatorname{Adj}(\mathbf{M'})$ is proportional to $\mathbf{v}$
\[\mathbf{M} = \left[\begin{array}{llll} p_1 q_1 & p_1\left(1-q_1\right) & \left(1-p_1\right) q_1 & \left(1-p_1\right)\left(1-q_1\right) \\ p_2 q_3 & p_2\left(1-q_3\right) & \left(1-p_2\right) q_3 & \left(1-p_2\right)\left(1-q_3\right) \\ p_3 q_2 & p_3\left(1-q_2\right) & \left(1-p_3\right) q_2 & \left(1-p_3\right)\left(1-q_2\right) \\ p_4 q_4 & p_4\left(1-q_4\right) & \left(1-p_4\right) q_4 & \left(1-p_4\right)\left(1-q_4\right) \end{array}\right]\] \[\mathbf{M'} = \left[\begin{array}{llll} p_1 q_1 - 1 & p_1\left(1-q_1\right) & \left(1-p_1\right) q_1 & \left(1-p_1\right)\left(1-q_1\right) \\ p_2 q_3 & p_2\left(1-q_3\right) - 1 & \left(1-p_2\right) q_3 & \left(1-p_2\right)\left(1-q_3\right) \\ p_3 q_2 & p_3\left(1-q_2\right) & \left(1-p_3\right) q_2 - 1 & \left(1-p_3\right)\left(1-q_2\right) \\ p_4 q_4 & p_4\left(1-q_4\right) & \left(1-p_4\right) q_4 & \left(1-p_4\right)\left(1-q_4\right) - 1 \end{array}\right]\]The cofactors of $\mathbf{M'}$:
\[C_{ij} = (-1)^{i+j} \cdot \operatorname{det}(\mathbf{M'}_{ij}),\]where $\mathbf{M'}_{ij}$ is the $3 \times 3$ submatrix obtained by deleting row $i$ and column $j$ from $\mathbf{M'}.$
\[\operatorname{Adj}(\mathbf{M'}) = \begin{bmatrix} C_{11} & C_{21} & C_{31} & C_{41} \\ C_{12} & C_{22} & C_{32} & C_{42} \\ C_{13} & C_{23} & C_{33} & C_{43} \\ C_{14} & C_{24} & C_{34} & C_{44} \end{bmatrix}\]A basic lemma in Linear Algebra:
\[\mathbf{A} \cdot \operatorname{Adj}(\mathbf{A}) = \operatorname{Adj}(\mathbf{A}) \cdot \mathbf{A} = \operatorname{det}(\mathbf{A}) \cdot \mathbf{I}.\]Here:
\[\operatorname{Adj}(\mathbf{M'}) \cdot \mathbf{M'} = \operatorname{det}(\mathbf{M'}) \cdot \mathbf{I} = 0 \cdot \mathbf{I} = \mathbf{0}.\]So both $\operatorname{Adj}(\mathbf{M'}) \cdot \mathbf{M'} = \mathbf{0}$ and $\mathbf{v}^\intercal \mathbf{M'} = \boldsymbol{0}$ hold, i.e., every row of $\operatorname{Adj}(\mathbf{M'})$, as well as $\mathbf{v}^\intercal$, lies in the left null space of $\mathbf{M'}.$ Since the stationary distribution is unique, the left null space of $\mathbf{M'}$ is one-dimensional (equivalently, $\mathbf{M'}$ has rank $3$). Thus every row of $\operatorname{Adj}(\mathbf{M'})$ is proportional to $\mathbf{v}.$
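A numerical sketch of this proportionality for the same example $\mathbf{p}$ and $\mathbf{q}$; NumPy has no built-in adjugate, so the `adjugate` helper below (my own name) computes it from the $3\times 3$ cofactors:

```python
import numpy as np

# Same illustrative p and q; rebuild M and M' as before.
p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.7, 0.4, 0.5, 0.1])
qi = q[[0, 2, 1, 3]]   # re-index q to player i's outcome labels (swap cd/dc)
M = np.stack([np.outer([pk, 1 - pk], [qk, 1 - qk]).ravel() for pk, qk in zip(p, qi)])
Mp = M - np.eye(4)

def adjugate(A):
    """Adjugate of A: transpose of the matrix of cofactors."""
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

adj = adjugate(Mp)
assert np.allclose(adj @ Mp, np.zeros((4, 4)), atol=1e-12)   # Adj(M') M' = det(M') I = 0

# Stationary left eigenvector of M, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(M.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v /= v.sum()

# Every row of Adj(M'), rescaled to sum to 1, coincides with v.
for row in adj:
    assert np.allclose(row / row.sum(), v)
print("every row of Adj(M') is proportional to v")
```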
Adding the first column of $\mathbf{M'}$ into the second and third columns
\[\mathbf{M}^{\prime\prime}=\left[\begin{array}{llll} p_1 q_1 - 1 & p_1-1 & q_1-1 & \left(1-p_1\right)\left(1-q_1\right) \\ p_2 q_3 & p_2-1 & q_3 & \left(1-p_2\right)\left(1-q_3\right) \\ p_3 q_2 & p_3 & q_2 - 1 & \left(1-p_3\right)\left(1-q_2\right) \\ p_4 q_4 & p_4 & q_4 & \left(1-p_4\right)\left(1-q_4\right) - 1 \end{array}\right]\]The fourth row of $\operatorname{Adj}(\mathbf{M'})$ is $(C_{14}, C_{24}, C_{34}, C_{44}).$ Each of these cofactors deletes the fourth column, and adding one of the remaining columns to another leaves a determinant unchanged, so these four cofactors have the same values whether computed from $\mathbf{M'}$ or from $\mathbf{M}^{\prime\prime}.$
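A small numerical check that the fourth-column cofactors are indeed unchanged by this column operation (same illustrative $\mathbf{p}$ and $\mathbf{q}$; the helper name is mine):

```python
import numpy as np

# Same illustrative p and q; rebuild M and M' as before.
p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.7, 0.4, 0.5, 0.1])
qi = q[[0, 2, 1, 3]]   # re-index q to player i's outcome labels (swap cd/dc)
M = np.stack([np.outer([pk, 1 - pk], [qk, 1 - qk]).ravel() for pk, qk in zip(p, qi)])
Mp = M - np.eye(4)

# M'': add the first column of M' into the second and third columns.
Mpp = Mp.copy()
Mpp[:, 1] += Mpp[:, 0]
Mpp[:, 2] += Mpp[:, 0]

def fourth_column_cofactors(A):
    """(C_14, C_24, C_34, C_44): cofactors along the fourth column of a 4x4 matrix A."""
    return np.array([(-1) ** ((i + 1) + 4) * np.linalg.det(np.delete(A[:, :3], i, axis=0))
                     for i in range(4)])

assert np.allclose(fourth_column_cofactors(Mp), fourth_column_cofactors(Mpp))
print("C_14, C_24, C_34, C_44 agree for M' and M''")
```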
$\mathbf{v'} \cdot \mathbf{f} \equiv \operatorname{D}(\mathbf{p}, \mathbf{q}, \mathbf{f})$
\[\mathbf{v'} \cdot \mathbf{f} \equiv \operatorname{D}(\mathbf{p}, \mathbf{q}, \mathbf{f}) = \operatorname{det} \left[\begin{array}{llll} p_1 q_1 - 1 & p_1-1 & q_1-1 & f_1 \\ p_2 q_3 & p_2-1 & q_3 & f_2 \\ p_3 q_2 & p_3 & q_2 - 1 & f_3 \\ p_4 q_4 & p_4 & q_4 & f_4 \end{array}\right]\]Here $\mathbf{v'} = k\cdot\mathbf{v} = (C_{14}, C_{24}, C_{34}, C_{44}),$ where $k$ is a scalar proportionality constant (from the previous section). The identity holds by the Laplace (cofactor) expansion of this determinant along its fourth column: $\operatorname{D}(\mathbf{p}, \mathbf{q}, \mathbf{f}) = f_1 C_{14} + f_2 C_{24} + f_3 C_{34} + f_4 C_{44} = \mathbf{v'} \cdot \mathbf{f}.$
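A numerical sketch of this identity: for two different choices of $\mathbf{f}$ (random, purely illustrative), the ratio $\operatorname{D}(\mathbf{p}, \mathbf{q}, \mathbf{f}) / (\mathbf{v}\cdot\mathbf{f})$ is the same constant $k$:

```python
import numpy as np

# Same illustrative p and q; rebuild M and M' as before.
p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.7, 0.4, 0.5, 0.1])
qi = q[[0, 2, 1, 3]]   # re-index q to player i's outcome labels (swap cd/dc)
M = np.stack([np.outer([pk, 1 - pk], [qk, 1 - qk]).ravel() for pk, qk in zip(p, qi)])
Mp = M - np.eye(4)

def D(f):
    """det of M'' with its last column replaced by f, i.e., D(p, q, f) above."""
    A = Mp.copy()
    A[:, 1] += A[:, 0]        # second column += first column
    A[:, 2] += A[:, 0]        # third column  += first column
    A[:, 3] = f               # replace the last column with f
    return np.linalg.det(A)

# Stationary left eigenvector of M, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(M.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v /= v.sum()

rng = np.random.default_rng(0)
f1, f2 = rng.random(4), rng.random(4)

# D(p, q, f) is linear in f and equals k * (v . f), so the ratio is independent of f.
assert np.isclose(D(f1) / (v @ f1), D(f2) / (v @ f2))
print("D(p, q, f) / (v . f) =", D(f1) / (v @ f1))
```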
The rest of the paper is not hard to parse.