Alternating Prisoner's Dilemma: Strategies & Dynamics
- The alternating Prisoner's Dilemma model is a sequential game-theoretic framework in which players take turns to move, reshaping both strategic interactions and evolutionary outcomes.
- It utilizes finite automata and memory-N strategies to model decision-making, enabling rigorous analysis of cooperation and defection under noise.
- Recent findings show that forgiveness-based strategies, such as Forgiver and WSLS, robustly stabilize cooperation in noisy, dynamic environments.
The alternating Prisoner’s Dilemma (APD) generalizes the classical simultaneous Prisoner’s Dilemma by imposing strictly sequential moves: in each round, only one player acts while the other observes. This modification—originally modeled both as a strictly alternating sequence of moves and as a leader-follower (P-Q) dynamic—profoundly affects the structure of possible strategies, evolutionary stability, and the mechanisms by which cooperation can emerge and persist. Recent studies on arXiv, including formal automata-theoretic analyses and adaptive-dynamics treatments with finite memory, provide a rigorous mathematical framework for the APD, revealing both qualitative and quantitative distinctions from the simultaneous version (Zagorsky et al., 2013, Balabanova et al., 26 Jan 2026).
1. Formal Definition and Game Structure
In the strictly alternating model, two players, A and B, repeatedly play for $T$ rounds. At each round $t$, only one player moves—choosing to Cooperate (C) or Defect (D)—after which the roles switch. In the leader-follower formulation, player P (leader) moves first, then player Q (follower) responds. The payoffs associated with each action follow the donation game parametrization:
- Cooperation: the mover pays a cost $c$; the recipient gains a benefit $b$.
- Defection: no cost, no benefit transferred.
The resulting payoffs over any pair of consecutive moves mirror the standard Prisoner’s Dilemma incentive order $T > R > P > S$, which holds whenever $b > c > 0$. The long-run average payoff per round is computable, using Markov chain theory, over the space of $2^{2N}$ possible history blocks for memory-$N$ strategies (Zagorsky et al., 2013, Balabanova et al., 26 Jan 2026).
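For concreteness (with illustrative values, not parameters taken from the cited papers), setting $b = 3$ and $c = 1$ gives per-exchange payoffs
$$R = b - c = 2, \qquad T = b = 3, \qquad S = -c = -1, \qquad P = 0,$$
so the required ordering $T > R > P > S$ indeed holds.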
2. Strategy Representation and Memory
Deterministic strategies in the APD are represented as finite automata or, in the general case, as memory-$N$ vector functions:
- Finite Automata (memory-1): A strategy is coded by automaton states labeled "C" or "D." One-state automata yield always cooperate (ALLC: $1,1,1,1$) and always defect (ALLD: $0,0,0,0$). Two-state automata cover 24 distinct strategies, each characterized by a quadruple ($p_1,p_2,p_3,p_4$) specifying transition probabilities conditional on the player's own last move and the opponent's most recent action (Zagorsky et al., 2013).
- Memory-$N$ Strategies: Formally, a memory-$N$ strategy for player P is a vector $\mathbf{p} \in [0,1]^{2^{2N}}$, where each index corresponds to a length-$2N$ block of past moves (alternating between P and Q). The strategy assigns the probability to cooperate based on the observed history (Balabanova et al., 26 Jan 2026).
Important canonical strategies include:
- Grim ($1,0,0,0$): Cooperate until opponent defects, then defect forever.
- Tit-for-Tat (TFT) ($1,0,1,0$): Copy opponent’s previous move.
- Win-Stay Lose-Shift (WSLS) ($1,0,0,1$): Repeat previous move if rewarded, switch otherwise.
- Forgiver ($1,1,0,1$): Defects once after opponent’s defection, then immediately resumes cooperation, regardless of opponent’s action.
The space of all memory-$N$ deterministic strategies has cardinality $2^{2^{2N}}$.
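To make the quadruple encoding concrete, the following Python sketch fixes one possible index convention (own last move first, then the opponent's). This convention is an assumption for illustration; it reproduces TFT, WSLS, and Grim as described above, though the cited papers' automaton coding may order states differently.

```python
# Memory-1 strategies as quadruples (p_CC, p_CD, p_DC, p_DD): the probability
# of cooperating given (own last move, opponent's last move).
STRATEGIES = {
    "ALLC":     (1, 1, 1, 1),
    "ALLD":     (0, 0, 0, 0),
    "Grim":     (1, 0, 0, 0),
    "TFT":      (1, 0, 1, 0),
    "WSLS":     (1, 0, 0, 1),
    "Forgiver": (1, 1, 0, 1),  # quadruple copied from the text; its behavioral
                               # reading may follow the papers' automaton convention
}

def coop_prob(strategy, own_last, opp_last):
    """Cooperation probability given the last pair of moves (1 = C, 0 = D)."""
    index = (1 - own_last) * 2 + (1 - opp_last)  # CC -> 0, CD -> 1, DC -> 2, DD -> 3
    return strategy[index]
```

Under this convention, for example, `coop_prob(STRATEGIES["TFT"], 0, 1)` returns 1: TFT copies the opponent's last move regardless of its own.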
3. Payoff Structure, Noise, and Markov Analysis
APD with noise incorporates an implementation error rate $\epsilon$, such that intended moves are flipped with probability $\epsilon$. The resulting Markov process has $2^{2N}$ states (length-$2N$ history blocks). For the automaton-based model with memory-1, the set of joint player states is of size 16.
The transition matrix $M$, a left-stochastic matrix, governs the probabilistic updates of history blocks, given strategies $\mathbf{p}$ and $\mathbf{q}$. The unique invariant distribution $\pi$ gives the stationary frequencies of histories. The long-run average payoff to P playing $\mathbf{p}$ against Q playing $\mathbf{q}$ is then
$$A(\mathbf{p}, \mathbf{q}) = \pi \cdot u,$$
where $u$ assigns the per-stage payoffs. Under Markov ergodicity, there is a unique stationary expected payoff for each strategy pair. For the memory-1 case, the transition matrix and payoffs can be written explicitly, and key determinants facilitate analytical calculation (Balabanova et al., 26 Jan 2026).
A payoff matrix is constructed over all automaton-implementable strategies in the one- and two-state model (Zagorsky et al., 2013).
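The stationary-payoff computation above can be sketched numerically. The following Python code is an illustrative reconstruction under stated assumptions (the state is the pair of most recent moves, each transition covers one full P-then-Q cycle, payoffs are the donation-game values averaged per move, and `b`, `c`, `eps` are example values); it is not the papers' implementation.

```python
import numpy as np

def noisy(p, eps):
    """Intended cooperation probability p, flipped with probability eps."""
    return (1 - eps) * p + eps * (1 - p)

def long_run_payoffs(p, q, b=3.0, c=1.0, eps=0.05):
    """Average per-move payoffs (to P, to Q) for memory-1 quadruples p, q."""
    idx = lambda own, opp: (1 - own) * 2 + (1 - opp)   # CC,CD,DC,DD -> 0..3
    states = [(1, 1), (1, 0), (0, 1), (0, 0)]          # (P's last, Q's last)
    M = np.zeros((4, 4))                               # left-stochastic: M[t, s] = Pr(s -> t)
    u_P = np.zeros(4)                                  # expected per-move payoff to P from state s
    u_Q = np.zeros(4)
    for s, (a, bq) in enumerate(states):
        pc = noisy(p[idx(a, bq)], eps)                 # P moves first, conditioning on (a, bq)
        for a2, pa in ((1, pc), (0, 1 - pc)):
            qc = noisy(q[idx(bq, a2)], eps)            # Q responds to P's realized move a2
            for b2, pb in ((1, qc), (0, 1 - qc)):
                t = states.index((a2, b2))
                M[t, s] += pa * pb
                u_P[s] += pa * pb * (b * b2 - c * a2) / 2   # P gains b if Q cooperates, pays c to cooperate
                u_Q[s] += pa * pb * (b * a2 - c * b2) / 2
    # Invariant distribution pi with M pi = pi (eigenvector for eigenvalue 1).
    w, v = np.linalg.eig(M)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    return float(pi @ u_P), float(pi @ u_Q)
```

For example, `long_run_payoffs(STRATEGIES["WSLS"], STRATEGIES["ALLD"])` quantifies how badly WSLS fares against unconditional defection at these parameter values.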
4. Evolutionary and Adaptive Dynamics
Two approaches to evolutionary dynamics in the APD are prominent:
- Replicator Dynamics (Finite Discrete Strategy Sets): Given population frequencies $x_i$ for each strategy $i$, the system evolves according to
$$\dot{x}_i = x_i\left(f_i - \bar{f}\right),$$
where $f_i = \sum_j A_{ij} x_j$ and $\bar{f} = \sum_i x_i f_i$. Mutation introduces a uniform perturbation:
$$\dot{x}_i = x_i\left(f_i - \bar{f}\right) + \mu\left(\tfrac{1}{n} - x_i\right),$$
with $n$ the number of strategies and $\mu$ the mutation rate (see the numerical sketch after this list).
- Adaptive Dynamics (Continuous Trait Space): For memory-$N$ strategies with continuous entries, mutations are modeled as small trait perturbations, yielding the canonical ODE
$$\dot{p}_k \propto \left.\frac{\partial}{\partial p_k'} A(\mathbf{p}', \mathbf{p})\right|_{\mathbf{p}' = \mathbf{p}}.$$
Alternatively, for a finite set of candidate strategies, one recovers the replicator equation in the trait space (Balabanova et al., 26 Jan 2026).
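A numerical sketch of the replicator-mutator dynamics over the canonical strategy set, reusing `STRATEGIES` and `long_run_payoffs` from the earlier sketches (the step size, horizon, and mutation rate are illustrative choices, not values from the cited papers):

```python
import numpy as np

def replicator_mutator_step(x, A, mu=1e-3, dt=0.01):
    """One Euler step of dx_i/dt = x_i (f_i - fbar) + mu (1/n - x_i)."""
    f = A @ x                          # f_i = sum_j A_ij x_j
    fbar = float(x @ f)                # mean population fitness
    x = x + dt * (x * (f - fbar) + mu * (1.0 / len(x) - x))
    x = np.clip(x, 0.0, None)
    return x / x.sum()                 # project back onto the simplex

# Payoff matrix A_ij: long-run payoff of strategy i against strategy j.
names = list(STRATEGIES)
A = np.array([[long_run_payoffs(STRATEGIES[i], STRATEGIES[j])[0]
               for j in names] for i in names])

x = np.full(len(names), 1.0 / len(names))   # uniform initial population
for _ in range(100_000):
    x = replicator_mutator_step(x, A)
print(dict(zip(names, np.round(x, 3))))
```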
The invariant measures, recurrences, and flow degeneracies for higher memory settings are analyzed via Markov chain theory and bifurcation techniques.
5. Equilibrium Selection and Stability Analysis
The APD exhibits multiple types of attractors in the evolutionary dynamics:
- ALLD-Equilibrium: Monomorphic defection population (pure Nash equilibrium).
- Grim-Equilibrium: Population plays Grim (cooperation until the first defection, then perpetual defection).
- Forgiver-Alliance Mixed Equilibrium: A stable coalition dominated by Forgiver (typically 82% frequency), with contributions from ALLC, TFT, WSLS, and certain sink automata (Zagorsky et al., 2013).
Forgiver’s error-correcting properties make it dominant in the mixed attractor for typical parameters. Exact Nash stability conditions depend on model specifics: ALLD and Grim are strict Nash equilibria for the one-shot matrix but only constitute global attractors when the benefit-to-cost ratio $b/c$ is low or the noise level is high. The mixed equilibrium becomes globally attracting for larger $b/c$ and moderate noise.
For the memory-1 APD donation game, WSLS is often a strict ESS when the benefit-to-cost ratio $b/c$ exceeds a threshold. TFT is neutrally stable for some parameters, while ALLC and ALLD are never ESSs (Balabanova et al., 26 Jan 2026).
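As a quick numerical check of resident stability (illustrative only, not a proof of ESS), one can test whether each canonical mutant earns strictly less against a WSLS resident than WSLS earns against itself, using the functions from the earlier sketches:

```python
wsls = STRATEGIES["WSLS"]
resident = long_run_payoffs(wsls, wsls)[0]           # WSLS's payoff in a WSLS population
for name, mutant in STRATEGIES.items():
    if name == "WSLS":
        continue
    invader = long_run_payoffs(mutant, wsls)[0]      # mutant's payoff against the resident
    verdict = "resisted" if invader < resident else "not resisted"
    print(f"{name:9s} invading WSLS: {invader:.3f} vs resident {resident:.3f} -> {verdict}")
```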
6. Structural and Symmetry Properties
Higher-memory APD models retain several structural features:
- Quadratic Invariants: For memory-1, the adaptive-dynamics flow admits two independent quadratic first integrals, implying that the strategy space foliates into invariant 2-tori and reducing the effective dimensionality of the flow (Balabanova et al., 26 Jan 2026).
- Symmetry: Under involutions swapping "cooperate" $\leftrightarrow$ "defect" for either player, the payoff is invariant.
As memory depth increases, the alternating model converges toward the standard simultaneous PD, supporting the intuition that move order becomes irrelevant with sufficiently rich memory.
7. Numerical Observations and Main Theoretical Implications
For low memory ($N=1$), numerical phase portraits exhibit saddle and source networks, repelling cycles, and singular perturbation phenomena on invariant circles. With increased memory ($N=2$ and higher), the bifurcation locus becomes richer, but symmetries persist and cooperation is further stabilized.
The central result is that forgiveness, operationalized via the Forgiver automaton, robustly supports cooperative alliances in evolutionary play under noise and exploitation. Forgiver’s fast error-correction capabilities allow recovery from mutually destructive defection spirals, ensuring a higher long-run payoff with fellow cooperators than exploiters can extract. Meanwhile, in finite-memory adaptive-dynamics, structural invariants constrain the evolutionary landscape, often favoring WSLS or TFT-like strategies over unconditional forms (Zagorsky et al., 2013, Balabanova et al., 26 Jan 2026).
The APD thus serves as both a rigorous model for real-world, sequential reciprocity settings and a testbed for the analysis of memory, strategy complexity, and robustness of cooperation in noisy, evolutionary environments.