
Bayesian-CFR for Incomplete Info Games

Updated 1 February 2026
  • Bayesian-CFR is a framework that integrates Bayesian belief updates with counterfactual regret minimization to approximate Bayesian Nash equilibria in extensive-form games.
  • It employs conditional kernel density estimation to nonparametrically recover type distributions, enabling efficient posterior updates and improved learning performance.
  • Empirical evaluations, especially in Texas hold’em, show that Bayesian-CFR and its extensions achieve superior exploitability metrics compared to traditional CFR methods.

Bayesian Counterfactual Regret Minimization (Bayesian-CFR) is a computational framework for solving extensive-form Bayesian games in which each player holds incomplete information about the underlying game, including payoffs and private opponent data. The algorithm combines Bayesian belief updates with counterfactual regret minimization to approximate Bayesian Nash equilibria, outperforming prior approaches in both learning rate and exploitability on challenging incomplete-information games such as Texas hold'em poker (Zhang et al., 2024).

1. Formal Setting: Extensive-Form Bayesian Games

An extensive-form Bayesian game is specified by

$$\Gamma = (N, H, Z, P, \sigma_c, I, \Theta, Pr, u)$$

with the following components:

  • $N = \{1, \ldots, n\}$: finite set of players.
  • $H$: set of histories in the game tree; $Z \subset H$: terminal histories.
  • $P(h) \in N \cup \{c\}$: active player at node $h$ (chance denoted $c$); $\sigma_c$ is chance's action distribution.
  • $I = \{I_i\}_i$: set of information sets; each $I_i$ partitions the histories at which player $i$ acts.
  • $\Theta$: prior type space, where $\theta \in \Theta$ encodes private parameters such as risk preferences or payoff functions.
  • $Pr(\theta)$: prior probability distribution over types.
  • $u_{i,\theta}: Z \to \mathbb{R}$: utility function for player $i$ given type $\theta$.

Each player maintains a posterior belief over types, $Pr_\chi(\theta \mid O_\chi)$, updated from their local observation history $O_\chi$ during play. A type-contingent behavioral strategy profile $\sigma_\Theta = \{\sigma_{i,\theta}\}$ assigns, for each player and type, a mapping from information sets to probability distributions over actions. The probability of reaching terminal node $z$ is $\pi^{\sigma_\Theta}(z)$, and the ex-ante expected utility is

$$u_i(\sigma_\Theta) = \sum_{\theta \in \Theta} Pr_i(\theta) \sum_{z \in Z} \pi^{\sigma_\Theta}(z)\, u_{i,\theta}(z).$$

A Bayesian Nash equilibrium (BNE) is a strategy profile from which no player can benefit by unilaterally deviating given their beliefs about the game and opponents.
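The ex-ante expected utility above is just a weighted double sum, which can be checked numerically. A minimal sketch (function name and the toy numbers are invented for illustration; `utilities[t, z]` plays the role of $u_{i,\theta_t}(z)$ and `reach[z]` the role of $\pi^{\sigma_\Theta}(z)$):

```python
import numpy as np

def ex_ante_utility(prior, reach, utilities):
    """Ex-ante expected utility for player i:
    sum over types theta of Pr_i(theta) times the
    reach-weighted terminal utilities for that type."""
    # utilities @ reach gives each type's expected payoff;
    # prior @ (...) averages over the type distribution.
    return float(prior @ (utilities @ reach))
```

With two types and two terminal nodes, `ex_ante_utility([0.5, 0.5], [0.5, 0.5], [[1, -1], [2, 0]])` averages each type's expected payoff (0 and 1) under the uniform prior.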

2. Bayesian Belief Updates via Conditional Kernel Density Estimation

Agents update their posterior beliefs over types using conditional kernel density estimation (CKDE). For a player $\chi$ and $m$ reference samples $\{(h_j', \theta_j')\}$, the history-likelihood for type $\theta$ is estimated as

$$\widehat{Pr}_\chi(h \mid \theta) = \frac{\sum_{j=1}^m K\!\left(\frac{d_s(h, h_j')}{w}\right) K'\!\left(\frac{d_r(\theta, \theta_j')}{w'}\right)}{\sum_{\ell=1}^m K'\!\left(\frac{d_r(\theta, \theta_\ell')}{w'}\right)},$$

where $K$, $K'$ are smoothing kernels, $w$, $w'$ are bandwidths, and $d_s$, $d_r$ are distance metrics over histories and the type space. Given observations $O_\chi = (h_1, \ldots, h_n)$, the CKDE posterior is

$$\widehat{Pr}_m^n(\theta \mid O_\chi) \propto Pr(\theta) \prod_{i=1}^n \widehat{Pr}_\chi(h_i \mid \theta).$$

Provided sufficient smoothness and support, the CKDE posterior converges in $L_1$ to the true posterior as the numbers of reference samples and observations grow (see Lemma 3.1 and Theorem 3.2 in the source). This enables nonparametric recovery of type distributions in high-dimensional incomplete-information games.
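The estimator can be sketched in a few lines. The toy version below uses Gaussian kernels for both $K$ and $K'$ and assumes scalar histories and types with caller-supplied distances; all names, distances, and bandwidths are hypothetical, not from the paper:

```python
import numpy as np

def gaussian(x):
    # Standard Gaussian smoothing kernel.
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def ckde_likelihood(h, theta, samples, d_s, d_r, w, w_prime):
    """CKDE history-likelihood Pr(h | theta) from m reference
    (history, type) samples, per the formula above."""
    num = sum(gaussian(d_s(h, hj) / w) * gaussian(d_r(theta, tj) / w_prime)
              for hj, tj in samples)
    den = sum(gaussian(d_r(theta, tj) / w_prime) for _, tj in samples)
    return num / den

def ckde_posterior(observations, types, prior, samples, d_s, d_r, w, w_prime):
    """Posterior over a finite type set: prior times the product of
    per-observation likelihoods, then normalised."""
    post = np.array([
        prior[k] * np.prod([ckde_likelihood(h, th, samples, d_s, d_r, w, w_prime)
                            for h in observations])
        for k, th in enumerate(types)
    ])
    return post / post.sum()
```

With reference samples clustered around two (history, type) pairs, an observation near one cluster shifts posterior mass toward the corresponding type.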

3. Bayesian Regret and Counterfactual Regret

Classical counterfactual regret minimization computes instantaneous regrets at information sets to drive strategy updates. In the Bayesian-CFR setting, the agent's payoff depends on unknown types, weighted by the posterior distribution; regret is therefore defined as follows.

  • Overall Bayesian Regret:

$$R_{i,\Theta}^T = \frac{1}{T} \max_{\sigma^*_{i,\Theta}} \sum_{t=1}^T \sum_{\theta \in \Theta} Pr(\theta \mid O_\chi^t) \big[ u_{i,\theta}(\sigma^*_{i,\theta}, \sigma_{-i,\theta}^t) - u_{i,\theta}(\sigma_\Theta^t) \big]$$

  • Instantaneous Bayesian Counterfactual Regret at $I \in I_i$:

$$R_{i,\Theta,\mathrm{imm}}^T(I) = \frac{1}{T} \max_{a \in A(I)} \sum_{t=1}^T \sum_{\theta} Pr(\theta \mid O_\chi^t)\, \pi_{-i,\theta}^{\sigma^t}(I) \big[ u_{i,\theta}(\sigma_\Theta^t|_{I \to a}) - u_{i,\theta}(\sigma_\Theta^t) \big]$$

Driving the immediate regrets to zero at every information set suffices for the overall Bayesian regret to vanish and ensures convergence to a BNE (Theorem 3.3).

4. Bayesian-CFR Algorithm

Bayesian-CFR is architecturally similar to classical CFR with the following modifications per iteration:

  • Posterior Sampling: Draw a type $\theta \sim Pr_t(\cdot)$ from the current posterior.
  • Game-Tree Traversal: For each player $p$, recursively traverse information sets, accumulating regret:

$$r_t(I, a) \mathrel{+}= Pr_t(\theta \mid h) \cdot \pi_{-p}(h) \big[ v_\sigma(h \cdot a) - v_\sigma(h) \big]$$

  • Regret-Matching Update:

$$\sigma_{t+1}(I, a) \propto \max\{ R_t^+(I, a), 0 \}$$

  • Observation Collection: Obtain new observations $O_\chi^t$ from simulated play.
  • Posterior Update: Re-estimate CKDE likelihoods and update $Pr_{t+1}(\theta \mid O_\chi)$.

Queue-based storage of (history, type) samples, together with the global prior, accumulates data for progressive belief improvement. Pseudocode (Algorithms 1 and 2 in (Zhang et al., 2024)) details this workflow.
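The regret-accumulation and regret-matching steps above can be sketched directly; this is a simplified per-infoset view, not the paper's full traversal, and the function names and toy numbers are invented:

```python
import numpy as np

def regret_matching(cum_regrets):
    """sigma(I, a) proportional to max{R(I, a), 0};
    falls back to uniform when no action has positive regret."""
    pos = np.maximum(cum_regrets, 0.0)
    total = pos.sum()
    if total > 0.0:
        return pos / total
    return np.full(len(cum_regrets), 1.0 / len(cum_regrets))

def accumulate_regret(cum_regrets, posterior_theta, reach_opp,
                      action_values, node_value):
    """One traversal step at an infoset: the counterfactual difference
    [v(h·a) - v(h)] is weighted by the posterior type probability and
    the opponents' reach probability, as in the update above."""
    return cum_regrets + posterior_theta * reach_opp * (action_values - node_value)
```

Iterating `accumulate_regret` over sampled types and then applying `regret_matching` yields the next strategy $\sigma_{t+1}$ at each infoset.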

5. Theoretical Regret Bounds

Bayesian-CFR inherits the regret-matching convergence rate, with bounds modified for the Bayesian belief updates. For each infoset $I$,

$$R_{i,\Theta,\mathrm{imm}}^T(I) \leq \Delta_{u,i,\Theta}^T \sqrt{|A(I)|} / \sqrt{T}$$

and in aggregate,

$$R_{i,\Theta}^T \leq |I_i|\, \Delta_{u,i,\Theta}^T \sqrt{\max_I |A(I)|} / \sqrt{T}$$

where

$$\Delta_{u,i,\Theta}^T = \sum_\theta Pr(\theta \mid O_\chi^T) \max_{a,a'} \big[ u_{i,\theta}(a) - u_{i,\theta}(a') \big].$$

These bounds follow from the per-infoset regret decomposition, classical regret-matching rates, and summation over the finite set of information sets (Theorem 3.4).
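To see the $1/\sqrt{T}$ decay concretely, the aggregate bound can be evaluated for illustrative values (the function name and numbers are hypothetical):

```python
import math

def aggregate_regret_bound(num_infosets, delta_u, max_actions, T):
    """Evaluates |I_i| * Delta_{u,i,Theta}^T * sqrt(max_I |A(I)|) / sqrt(T)."""
    return num_infosets * delta_u * math.sqrt(max_actions) / math.sqrt(T)
```

Multiplying the iteration count by 100 shrinks the bound by a factor of 10, as the $1/\sqrt{T}$ rate predicts.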

6. Algorithmic Extensions: Bayesian-CFR+ and Deep Bayesian CFR

Two notable extensions generalize Bayesian-CFR by leveraging accelerated updates and network approximators:

  • Bayesian-CFR+: Implements regret-matching-plus, updating

$$R^{+,t}(I, a) = \max\{ R^{+,t-1}(I, a) + \text{immediate regret}_t(I, a),\, 0 \}$$

which empirically accelerates convergence, mirroring classical CFR+.
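The plus-update is a one-line change from plain regret accumulation; a minimal sketch (the function name is invented):

```python
import numpy as np

def regret_matching_plus_update(cum_regret_plus, instant_regret):
    """R^{+,t}(I, a) = max{R^{+,t-1}(I, a) + r_t(I, a), 0}: clipping at
    zero each iteration discards accumulated negative regret, which is
    what accelerates convergence in practice."""
    return np.maximum(cum_regret_plus + instant_regret, 0.0)
```

An action whose cumulative regret would go negative is reset to zero, so it can regain probability mass quickly once its instantaneous regrets turn positive.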

  • Deep Bayesian CFR: Employs neural networks to approximate per-infoset regrets and the average strategy, encoding type $\theta$ as a one-hot vector injected at the second layer of the regret network. Training minimizes mean-squared error over value-memory ($M_{r,p}$) and policy-memory ($M_{\pi,p}$) tuples. The final strategy network is likewise fit via regression on average-policy labels. Under standard assumptions on Lipschitz continuity and replay buffers, Theorem 4.1 gives a regret bound:

$$R_{p,\Theta}^T \leq \left(1 + \frac{\sqrt{2}}{\sqrt{\rho K}}\right) \Delta_\Theta^T |I_p| \sqrt{|A| T} + 4\, T\, |I_p| \sqrt{|A|\, \Delta_\Theta^T\, \varepsilon_L}$$

where $\varepsilon_L$ is the worst-case network approximation error, $K$ the batch size, and $\rho$ a probability parameter.
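The type-injection architecture can be illustrated with a toy forward pass; this is a sketch only, with invented layer sizes and random weights, and real training would fit the MSE objectives on $M_{r,p}$ and $M_{\pi,p}$ described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def regret_net(infoset_feats, theta_onehot, params):
    """Toy regret network: the one-hot type vector is concatenated
    into the input of the second layer, mirroring the injection
    point in Deep Bayesian CFR."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(infoset_feats @ W1 + b1, 0.0)                    # first hidden layer
    h2 = np.maximum(np.concatenate([h1, theta_onehot]) @ W2 + b2, 0.0)  # type injected here
    return h2 @ W3 + b3                                              # per-action regret estimates

# Hypothetical sizes: 8 infoset features, 3 types, 16 hidden units, 4 actions.
params = (rng.normal(size=(8, 16)), np.zeros(16),
          rng.normal(size=(16 + 3, 16)), np.zeros(16),
          rng.normal(size=(16, 4)), np.zeros(4))
```

Conditioning on the type at a hidden layer, rather than the input, lets the early feature layers be shared across all types.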

7. Empirical Evaluation: Texas Hold’em Benchmarks

The framework is empirically validated on a two-player heads-up Texas hold’em variant from RLCard, featuring three latent payoff types (normal, conservative, aggressive) controlling pot splitting. Opponents are sampled either from pure types or mixed distributions over these payoffs.

  • Baselines: CFR, CFR+, Deep CFR, Monte Carlo CFR (MCCFR), and a DQN-style RL agent are compared.
  • Performance Metric: Exploitability measured in milli-big-blinds per game (mbb/g); lower is better.

Baseline          | Pure-Type Opponent (mbb/g) | Mixed-Type Opponent (mbb/g)
------------------|----------------------------|----------------------------
Bayesian-CFR      | ≈ 0.17                     | ≈ 0.17
Bayesian-CFR+     | ≈ 0.02                     | ≈ 0.02
Deep Bayesian CFR | ≈ 0.08                     | ≈ 0.08
CFR               | ≈ 0.28                     | ≈ 0.31
CFR+              | ≈ 0.07                     | ≈ 0.07
Deep CFR          | ≈ 0.36                     | ≈ 0.36
MCCFR             | ≈ 0.35                     | ≈ 0.35
DQN               | ≈ 1.34                     | ≈ 1.34

Ablation tests demonstrate the necessity of belief updates: models omitting posterior updates degrade to ≈ 0.27 mbb/g exploitability, approaching the CFR baseline. In contrast, Bayesian-CFR-based models reliably approach an "ideal" complete-information value of ≈ 0.16 mbb/g, demonstrating efficient recovery of the missing information via nonparametric belief tracking.

8. Synthesis and Significance

Bayesian-CFR integrates nonparametric Bayesian belief tracking with the counterfactual regret framework, preserving the optimal $O(1/\sqrt{T})$ convergence rate to Bayesian Nash equilibrium. Extensions using regret-matching-plus and deep function approximation further improve empirical convergence and exploitability. A plausible implication is that Bayesian-CFR provides a general path for equilibrium approximation in application domains where type uncertainty and nonparametric priors are central, including market modeling, security games, and strategic learning under asymmetric information.

For full derivations, algorithmic details, and proofs see "Modeling Other Players with Bayesian Beliefs for Games with Incomplete Information" (Zhang et al., 2024).
