
Shapley-Bellman Backward Recursion

Updated 30 December 2025
  • Shapley–Bellman backward recursion is a backward dynamic-programming method that recovers the output of a ReLU neural network as the saddle-point value of a zero-sum stopping game.
  • It applies the Shapley equations — coordinate-wise maximization and minimization — enabling exact output recovery, robustness certification, and verification through game-theoretic constructs.
  • The framework extends to Softplus networks via entropic regularization, and reformulates network learning as an inverse game identification problem under optimization constraints.

Shapley-Bellman backward recursion refers to the process by which the output of a feed-forward ReLU neural network is retrieved as the saddle-point value of a zero-sum, turn-based, stopping game known as the ReLU-net game. In this framework, each layer and neuron of the network corresponds to a state in the game, with the network’s input interpreted as a terminal reward and the feed-forward calculation expressed as a backward recursion over the game's structure. The recursion relies on coordinate-wise maximization and minimization (the Shapley equations), akin to dynamic programming in turn-based zero-sum games, and generalizes to other activation functions such as Softplus via entropic regularization. This approach enables alternative perspectives on evaluation, verification, and learning in neural networks through game-theoretic constructs, path-integral representation, and monotonicity arguments (Gaubert et al., 23 Dec 2025).

1. Construction of the ReLU-Net Game

A ReLU network of depth $L$ is associated with a zero-sum, turn-based, stopping game with horizon $L+1$. Each neuron $(\ell,i)$ in layer $\ell$ gives rise to two game states: $(\ell,i,+)$ (“Max’s turn”) and $(\ell,i,-)$ (“Min’s turn”). There is also an absorbing cemetery state $\perp$ with zero reward. The player whose turn it is chooses either to stop (transitioning immediately to $\perp$) or to continue to the next layer. Transitions are defined probabilistically using the weight matrices $W^\ell \in \mathbb{R}^{k_\ell \times k_{\ell+1}}$ and a "discount factor" $\gamma^\ell_i = \sum_j |W^\ell_{i,j}|$. The transition probability from $(\ell,i,\epsilon)$ to $(\ell+1,j,\eta)$ is determined by the sign decomposition of $W^\ell_{i,j}$, normalized by $\gamma^\ell_i$, specifically:

  • $P^\ell_{i\epsilon, j\eta} = (W^\ell_{i,j})^+ / \gamma^\ell_i$ for $(\epsilon,\eta) = (+,+)$ or $(-,-)$,
  • $P^\ell_{i\epsilon, j\eta} = (W^\ell_{i,j})^- / \gamma^\ell_i$ for $(\epsilon,\eta) = (+,-)$ or $(-,+)$,
  • where $(a)^+ = \max(a,0)$ and $(a)^- = \max(-a,0)$.

Immediate rewards are $b^\ell_i$ at $(\ell,i,+)$ and $-b^\ell_i$ at $(\ell,i,-)$. Terminal rewards at layer $L+1$ are $x_j$ at $(L+1,j,+)$ and $-x_j$ at $(L+1,j,-)$ (Gaubert et al., 23 Dec 2025).
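As an illustrative sketch (my own code, not from the paper), the game data of a single layer — discount factors $\gamma^\ell$ and the sign-split transition probabilities $P^\ell$ — can be built from a weight matrix as follows:

```python
import numpy as np

def layer_game(W):
    """Build (gamma, P) for one layer of the ReLU-net game.

    gamma[i] = sum_j |W[i, j]| is the discount factor of neuron i.
    P[i, eps, j, eta] is the transition probability from state (i, eps)
    to (j, eta), with eps, eta in {0: '+', 1: '-'}: same-sign pairs use
    (W)^+ / gamma, opposite-sign pairs use (W)^- / gamma.
    """
    Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)   # (a)^+ and (a)^-
    gamma = np.abs(W).sum(axis=1)
    g = np.where(gamma > 0, gamma, 1.0)[:, None]       # guard dead rows (0/0)
    k_in, k_out = W.shape
    P = np.empty((k_in, 2, k_out, 2))
    P[:, 0, :, 0] = Wp / g    # (+,+)
    P[:, 1, :, 1] = Wp / g    # (-,-)
    P[:, 0, :, 1] = Wm / g    # (+,-)
    P[:, 1, :, 0] = Wm / g    # (-,+)
    return gamma, P
```

For each state $(i,\epsilon)$ with $\gamma^\ell_i > 0$, the probabilities over $(j,\eta)$ sum to one, since $(W^+ + W^-)_{i,\cdot}$ sums to $\gamma^\ell_i$.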

2. Shapley–Bellman Backward Recursion Equations

The game value is computed via Shapley–Bellman backward recursion. Define $V^\ell_{i+}(x)$ (Max’s value) and $V^\ell_{i-}(x)$ (Min’s value) for the states $(\ell,i,+)$ and $(\ell,i,-)$ respectively:

  • Terminal condition: $V^{L+1}_{j+}(x) = x_j$, $V^{L+1}_{j-}(x) = -x_j$
  • Recursive equations for $\ell = L,\dots,1$:

$$V^\ell_{i+}(x) = \max\left\{ 0,\ \gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i+,j+}\, V^{\ell+1}_{j+}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i+,j-}\, V^{\ell+1}_{j-}(x)\right] + b^\ell_i \right\}$$

$$V^\ell_{i-}(x) = \min\left\{ 0,\ \gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i-,j-}\, V^{\ell+1}_{j-}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i-,j+}\, V^{\ell+1}_{j+}(x)\right] - b^\ell_i \right\}$$

By induction, $V^\ell_{i+}(x) = y^\ell_i$ and $V^\ell_{i-}(x) = -y^\ell_i$, where $y^\ell_i$ is the feed-forward network output at neuron $(\ell,i)$ (Gaubert et al., 23 Dec 2025). The max/min recursion directly reproduces the coordinate-wise ReLU.
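A minimal numerical sketch of this induction (my own illustration, under the paper's indexing where the input $x$ enters as the terminal reward at layer $L+1$ and the output is read off at layer 1): since $\gamma^\ell_i P^\ell_{i\epsilon,j\eta}$ recombines into $(W^\ell_{i,j})^\pm$, the backward recursion can be run directly with the sign-split weights, and it coincides with the feed-forward ReLU pass.

```python
import numpy as np

def relu_net(x, Ws, bs):
    """Feed-forward pass; per the paper's convention, W[l] maps layer l+1
    activations to layer l, so the input is consumed at the deepest layer."""
    y = x
    for W, b in zip(reversed(Ws), reversed(bs)):
        y = np.maximum(W @ y + b, 0.0)
    return y

def shapley_bellman(x, Ws, bs):
    """Shapley-Bellman backward recursion: Vp are Max's values, Vm are Min's.
    gamma_i * (P-weighted sums) are recombined into W^+ and W^- products."""
    Vp, Vm = x.copy(), -x.copy()                 # terminal rewards at layer L+1
    for W, b in zip(reversed(Ws), reversed(bs)):
        Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
        Vp, Vm = (np.maximum(0.0, Wp @ Vp + Wm @ Vm + b),
                  np.minimum(0.0, Wp @ Vm + Wm @ Vp - b))
    return Vp, Vm
```

Running both on a random network confirms $V^1_{i+}(x) = y^1_i$ and $V^1_{i-}(x) = -y^1_i$.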

3. Discrete Feynman–Kac Path-Integral Formula

The game’s output admits a discrete path-integral (Feynman–Kac-type) representation. Given any pair of policies $\pi$ (for Max) and $\sigma$ (for Min), a trajectory $\alpha$ starts at $(\ell,i,+)$ and evolves according to the chosen actions and transition probabilities. For each $\alpha$,

  • Transition probability: $P^{\pi,\sigma}(\alpha) = \prod_{t=\ell}^{\ell+k-1} P^t_{\alpha(t),\alpha(t+1)}$
  • Reward aggregate:

$$r(\alpha) = \sum_{t=\ell}^{\ell+k-1}\left(\prod_{u=\ell}^{t-1}\gamma^u\right) r(\alpha(t)) + \left(\prod_{u=\ell}^{\ell+k-1}\gamma^u\right)\varphi(\alpha(\ell+k))$$

with the game value expressed as

$$V^{\ell,\pi,\sigma}_{i+}(x) = \sum_{\alpha} P^{\pi,\sigma}(\alpha)\, r(\alpha).$$

The saddle-point value is $V^\ell_{i+}(x) = \max_\pi \min_\sigma V^{\ell,\pi,\sigma}_{i+}(x)$, and at optimal policies $(\pi^*,\sigma^*)$, $y^\ell_i = V^\ell_{i+}(x)$ (Gaubert et al., 23 Dec 2025). This duality connects the dynamic-programming (backward recursion) and expectation (path-integral) perspectives.
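The path-integral formula can be checked numerically for a fixed policy pair. The sketch below (my own choice of policies, not from the paper) fixes the "always continue" pair, for which the value is the linear recursion without the outer max/min, enumerates every trajectory from $(1, i_0, +)$ to the terminal layer, and accumulates $P^{\pi,\sigma}(\alpha)\, r(\alpha)$:

```python
import numpy as np
from itertools import product

def path_sum(x, Ws, bs, i0):
    """Brute-force Feynman-Kac sum for the 'always continue' policy pair,
    starting at state (1, i0, +). Encodes eps as +1 (Max) or -1 (Min)."""
    L = len(Ws)
    widths = [W.shape[0] for W in Ws] + [Ws[-1].shape[1]]   # layers 1..L+1
    # all (neuron, sign) states for layers 2..L+1
    layers = [list(product(range(n), (+1, -1))) for n in widths[1:]]
    total = 0.0
    for tail in product(*layers):
        alpha = [(i0, +1)] + list(tail)
        prob, disc, r = 1.0, 1.0, 0.0
        for t, ((i, e), (j, f)) in enumerate(zip(alpha, alpha[1:])):
            W, b = Ws[t], bs[t]
            gamma = np.abs(W[i]).sum()
            # gamma * P = sign-split weight: (W)^+ if signs agree, else (W)^-
            w = max(W[i, j], 0.0) if e == f else max(-W[i, j], 0.0)
            r += disc * e * b[i]          # discounted running reward at alpha(t)
            prob *= w / gamma
            disc *= gamma
        jT, fT = alpha[-1]
        r += disc * fT * x[jT]            # discounted terminal reward phi
        total += prob * r
    return total
```

The result matches the corresponding linear backward recursion $V_+ \leftarrow W^+ V_+ + W^- V_- + b$, $V_- \leftarrow W^+ V_- + W^- V_+ - b$, confirming the expectation/recursion duality for this policy pair.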

4. Monotonicity, Robustness Bounds, and Certification

Replacing the terminal map $x \mapsto \pm x$ with a general terminal reward $\varphi$ generalizes the recursion while maintaining coordinate-wise monotonicity:

  • If $x_{\min} \leq x \leq x_{\max}$ pointwise, then

$$y = V^1(x,-x) \in \left[\, V^1(x_{\min},-x_{\max}),\ V^1(x_{\max},-x_{\min}) \,\right]$$

  • This establishes explicit bounds on outputs from input bounds.

For fixed policies, the following piecewise-linear bounds arise:

  • Lower bound: $f^\pi(x) = \min_\sigma V^{1,\pi,\sigma}_{1+}(x)$ is concave and piecewise linear; its super-level set $\{x \mid f^\pi(x) \geq \alpha\}$ is a polyhedral certificate that $y(x) \geq \alpha$.
  • Upper bound: for $\sigma$ fixed, $g^\sigma(x) = \max_\pi V^{1,\pi,\sigma}_{1+}(x)$ is convex and piecewise linear; its sub-level set certifies $y(x) \leq \beta$.

This mechanism provides robustness certification against input perturbations and adversarial scenarios (Gaubert et al., 23 Dec 2025).
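The interval bound above is straightforward to implement. In this sketch (helper name `relu_bounds` is mine), the backward recursion is run from the two terminal pairs $(x_{\min}, -x_{\max})$ and $(x_{\max}, -x_{\min})$; since the recursion is monotone nondecreasing in both arguments, the two runs bracket the true output:

```python
import numpy as np

def relu_bounds(x_lo, x_hi, Ws, bs):
    """Coordinate-wise output bounds for a ReLU network from input box
    [x_lo, x_hi], via monotonicity of the Shapley-Bellman recursion."""
    def V(p, m):
        # backward recursion with terminal Max-values p and Min-values m;
        # W^+, W^- are entrywise nonnegative, so V is monotone in (p, m)
        for W, b in zip(reversed(Ws), reversed(bs)):
            Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
            p, m = (np.maximum(0.0, Wp @ p + Wm @ m + b),
                    np.minimum(0.0, Wp @ m + Wm @ p - b))
        return p
    return V(x_lo, -x_hi), V(x_hi, -x_lo)
```

Sampling inputs inside the box and running the plain feed-forward pass confirms that every output lies between the two returned vectors.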

5. Inverse Game Formulation for Network Learning

Under supervised learning, given sample pairs $\{(x^{(m)}, y^{(m)})\}$, training the ReLU-net game is reformulated as an "inverse game" problem. The unknowns are the transition probabilities $P^\ell$ and rewards $b^\ell$, constrained to reflect the network weights and normalization. The Shapley equations become equality constraints:

$$V^\ell_{i+}(x^{(m)}; P, b) = y^{\ell,(m)}_i \qquad \forall\, \ell, i, m$$

subject to

$$P^\ell_{i\epsilon,\cdot} \geq 0, \qquad \sum_{j,\eta} P^\ell_{i\epsilon, j\eta} = 1,$$

with $P^\ell_{(i,\pm),(j,\pm)} \propto (W^\ell_{i,j})^\pm$. This yields a feasibility or least-squares optimization:

$$\min_{P,\,b} \sum_m \left\| y^{(m)} - V^1(x^{(m)}; P, b) \right\|^2$$

subject to the transition/reward constraints (Gaubert et al., 23 Dec 2025). This framework interprets neural network learning as identification of the game structure from observed outputs.
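The least-squares objective is simple to evaluate once the game data are parameterized by the weights: since $(\gamma^\ell, P^\ell, b^\ell)$ are determined by $(W^\ell, b^\ell)$, the forward model $V^1(x; P, b)$ is the backward recursion from Section 2. The sketch below (an illustrative objective evaluation, not the paper's identification algorithm) uses that parameterization:

```python
import numpy as np

def inverse_game_loss(Ws, bs, samples):
    """Sum of squared residuals || y^(m) - V^1(x^(m); P, b) ||^2, with
    (P, b) induced from candidate weights (Ws, bs) via sign-splitting."""
    loss = 0.0
    for x, y in samples:
        Vp, Vm = x.copy(), -x.copy()
        for W, b in zip(reversed(Ws), reversed(bs)):
            Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
            Vp, Vm = (np.maximum(0.0, Wp @ Vp + Wm @ Vm + b),
                      np.minimum(0.0, Wp @ Vm + Wm @ Vp - b))
        loss += float(np.sum((y - Vp) ** 2))
    return loss
```

At the true game data the Shapley equality constraints hold exactly and the loss vanishes; perturbing the weights generically makes it positive, which is what a feasibility or least-squares solver would exploit.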

6. Entropic Regularization and Extension to Softplus Networks

For neural networks with the Softplus activation, the game-theoretic model is modified via entropic regularization. At state $s = (\ell,i,\epsilon)$, the reward is augmented by $-\tau\log p$ (for Max, where $p$ is the probability of choosing "continue") or $+\tau\log q$ (for Min). The recursion then yields:

  • For Max:

$$V^\ell_{i+,\tau}(x) = \tau\log\left[1 + \exp\!\left(\frac{\gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i+,j+}\, V^{\ell+1}_{j+,\tau}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i+,j-}\, V^{\ell+1}_{j-,\tau}(x)\right] + b^\ell_i}{\tau}\right)\right]$$

  • For Min:

$$V^\ell_{i-,\tau}(x) = -\tau\log\left[1 + \exp\!\left(-\frac{\gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i-,j-}\, V^{\ell+1}_{j-,\tau}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i-,j+}\, V^{\ell+1}_{j+,\tau}(x)\right] - b^\ell_i}{\tau}\right)\right]$$

which reproduces the Softplus activation, $\mathrm{softplus}_\tau(z) = \tau\log(1+e^{z/\tau})$. The path-integral formula generalizes to a log–sum–exp over all trajectories, representing the free-energy interpretation of the output (Gaubert et al., 23 Dec 2025). This suggests a fundamental equivalence between neural activation smoothing and entropy-regularized stopping games.
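The equivalence can be verified numerically. In this sketch (my own illustration of the section's two formulas), the entropic recursion replaces $\max\{0,\cdot\}$ with $\mathrm{softplus}_\tau$ and $\min\{0,\cdot\}$ with $-\mathrm{softplus}_\tau(-\cdot)$, and reproduces the plain Softplus feed-forward pass:

```python
import numpy as np

def softplus_net(x, Ws, bs, tau):
    """Feed-forward Softplus network, softplus_tau(z) = tau * log(1 + e^{z/tau})."""
    y = x
    for W, b in zip(reversed(Ws), reversed(bs)):
        y = tau * np.log1p(np.exp((W @ y + b) / tau))
    return y

def shapley_bellman_entropic(x, Ws, bs, tau):
    """Entropy-regularized backward recursion; Vp, Vm are Max's/Min's values."""
    Vp, Vm = x.copy(), -x.copy()
    for W, b in zip(reversed(Ws), reversed(bs)):
        Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
        zp = Wp @ Vp + Wm @ Vm + b        # Max's continuation value
        zm = Wp @ Vm + Wm @ Vp - b        # Min's continuation value
        Vp = tau * np.log1p(np.exp(zp / tau))       # softplus_tau(zp)
        Vm = -tau * np.log1p(np.exp(-zm / tau))     # -softplus_tau(-zm)
    return Vp, Vm
```

As in the ReLU case, $V_{-,\tau} = -V_{+,\tau}$ propagates by induction, since $z_- = -z_+$ whenever the values entering a layer are antisymmetric.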

7. Connections, Implications, and Future Directions

The Shapley–Bellman backward recursion in the context of neural networks provides a rigorous bridge between dynamic programming, game theory, and deep learning architectures. It enables path-integral interpretations of network outputs, monotonicity-based verification, and robustness analysis, and recasts neural network training as constrained inverse game identification. The entropic regularization mechanism elucidates the link between the classic ReLU activation and smooth variants such as Softplus. A plausible implication is a broader class of game-theoretic neural architectures in which backward recursion and optimal policy pairs systematically determine both evaluation and training regimes, potentially extending robustness and verification capabilities across a wider array of network types (Gaubert et al., 23 Dec 2025).

