
Shapley-Bellman Backward Recursion

Updated 30 December 2025
  • Shapley–Bellman backward recursion is a backward dynamic-programming method that recovers the output of a ReLU neural network as the saddle-point value of a zero-sum stopping game.
  • It applies the Shapley equations — coordinate-wise maximization and minimization — enabling exact output recovery, robustness certification, and verification through game-theoretic constructs.
  • The framework extends to Softplus networks via entropic regularization, and reformulates network learning as an inverse game identification problem under optimization constraints.

Shapley-Bellman backward recursion refers to the process by which the output of a feed-forward ReLU neural network is retrieved as the saddle-point value of a zero-sum, turn-based, stopping game known as the ReLU-net game. In this framework, each layer and neuron of the network corresponds to a state in the game, with the network’s input interpreted as a terminal reward and the feed-forward calculation expressed as a backward recursion over the game's structure. The recursion relies on coordinate-wise maximization and minimization (the Shapley equations), akin to dynamic programming in turn-based zero-sum games, and generalizes to other activation functions such as Softplus via entropic regularization. This approach enables alternative perspectives on evaluation, verification, and learning in neural networks through game-theoretic constructs, path-integral representation, and monotonicity arguments (Gaubert et al., 23 Dec 2025).

1. Construction of the ReLU-Net Game

A ReLU network of depth $L$ is associated with a zero-sum, turn-based, stopping game with horizon $L+1$. Each neuron $(\ell,i)$ in layer $\ell$ gives rise to two game states: $(\ell,i,+)$ (“Max’s turn”) and $(\ell,i,-)$ (“Min’s turn”). There is also an absorbing cemetery state $\perp$ with zero reward. The player whose turn it is chooses either to stop (transitioning immediately to $\perp$) or to continue to the next layer. Transitions are defined probabilistically using the weight matrices $W^\ell \in \mathbb{R}^{k_\ell \times k_{\ell+1}}$ and a "discount factor" $\gamma^\ell_i = \sum_j |W^\ell_{i,j}|$. The transition probability from $(\ell,i,\epsilon)$ to $(\ell+1,j,\eta)$ is determined by the sign decomposition of $W^\ell_{i,j}$, normalized by $\gamma^\ell_i$, specifically:

  • $P^\ell_{i\epsilon, j\eta} = (W^\ell_{i,j})^+ / \gamma^\ell_i$ for $(\epsilon,\eta) = (+,+)$ or $(-,-)$,
  • $P^\ell_{i\epsilon, j\eta} = (W^\ell_{i,j})^- / \gamma^\ell_i$ for $(\epsilon,\eta) = (+,-)$ or $(-,+)$,
  • where $(a)^+ = \max(a,0)$ and $(a)^- = \max(-a,0)$.

Immediate rewards are $b^\ell_i$ at $(\ell,i,+)$ and $-b^\ell_i$ at $(\ell,i,-)$. Terminal rewards at layer $L+1$ are $x_j$ at $(L+1,j,+)$ and $-x_j$ at $(L+1,j,-)$ (Gaubert et al., 23 Dec 2025).
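As an illustrative sketch (my own code, not from the paper), the game data of a single layer — discount factors $\gamma^\ell$ and the sign-split transition probabilities $P^\ell$ — can be built from a weight matrix as follows:

```python
import numpy as np

def layer_game(W):
    """Build (gamma, P) for one layer of the ReLU-net game.

    gamma[i] = sum_j |W[i, j]| is the discount factor of neuron i.
    P[i, eps, j, eta] is the transition probability from state (i, eps)
    to (j, eta), with eps, eta in {0: '+', 1: '-'}: same-sign pairs use
    (W)^+ / gamma, opposite-sign pairs use (W)^- / gamma.
    """
    Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)   # (a)^+ and (a)^-
    gamma = np.abs(W).sum(axis=1)
    g = np.where(gamma > 0, gamma, 1.0)[:, None]       # guard dead rows (0/0)
    k_in, k_out = W.shape
    P = np.empty((k_in, 2, k_out, 2))
    P[:, 0, :, 0] = Wp / g    # (+,+)
    P[:, 1, :, 1] = Wp / g    # (-,-)
    P[:, 0, :, 1] = Wm / g    # (+,-)
    P[:, 1, :, 0] = Wm / g    # (-,+)
    return gamma, P
```

For each state $(i,\epsilon)$ with $\gamma^\ell_i > 0$, the probabilities over $(j,\eta)$ sum to one, since $(W^+ + W^-)_{i,\cdot}$ sums to $\gamma^\ell_i$.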

2. Shapley–Bellman Backward Recursion Equations

The game value is computed via Shapley–Bellman backward recursion. Define $V^\ell_{i+}(x)$ (Max’s value) and $V^\ell_{i-}(x)$ (Min’s value) for the states $(\ell,i,+)$ and $(\ell,i,-)$ respectively:

  • Terminal condition: $V^{L+1}_{j+}(x) = x_j$, $V^{L+1}_{j-}(x) = -x_j$
  • Recursive equations for $\ell = L,\dots,1$:

$$V^\ell_{i+}(x) = \max\left\{ 0,\ \gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i+,j+}\, V^{\ell+1}_{j+}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i+,j-}\, V^{\ell+1}_{j-}(x)\right] + b^\ell_i \right\}$$

$$V^\ell_{i-}(x) = \min\left\{ 0,\ \gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i-,j-}\, V^{\ell+1}_{j-}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i-,j+}\, V^{\ell+1}_{j+}(x)\right] - b^\ell_i \right\}$$

By induction, $V^\ell_{i+}(x) = y^\ell_i$ and $V^\ell_{i-}(x) = -y^\ell_i$, where $y^\ell_i$ is the feed-forward network output at neuron $(\ell,i)$ (Gaubert et al., 23 Dec 2025). The max/min recursion directly reproduces the coordinate-wise ReLU.
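A minimal numerical sketch of this induction (my own illustration, under the paper's indexing where the input $x$ enters as the terminal reward at layer $L+1$ and the output is read off at layer 1): since $\gamma^\ell_i P^\ell_{i\epsilon,j\eta}$ recombines into $(W^\ell_{i,j})^\pm$, the backward recursion can be run directly with the sign-split weights, and it coincides with the feed-forward ReLU pass.

```python
import numpy as np

def relu_net(x, Ws, bs):
    """Feed-forward pass; per the paper's convention, W[l] maps layer l+1
    activations to layer l, so the input is consumed at the deepest layer."""
    y = x
    for W, b in zip(reversed(Ws), reversed(bs)):
        y = np.maximum(W @ y + b, 0.0)
    return y

def shapley_bellman(x, Ws, bs):
    """Shapley-Bellman backward recursion: Vp are Max's values, Vm are Min's.
    gamma_i * (P-weighted sums) are recombined into W^+ and W^- products."""
    Vp, Vm = x.copy(), -x.copy()                 # terminal rewards at layer L+1
    for W, b in zip(reversed(Ws), reversed(bs)):
        Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
        Vp, Vm = (np.maximum(0.0, Wp @ Vp + Wm @ Vm + b),
                  np.minimum(0.0, Wp @ Vm + Wm @ Vp - b))
    return Vp, Vm
```

Running both on a random network confirms $V^1_{i+}(x) = y^1_i$ and $V^1_{i-}(x) = -y^1_i$.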

3. Discrete Feynman–Kac Path-Integral Formula

The game’s output admits a discrete path-integral (Feynman–Kac-type) representation. Given any pair of policies $\pi$ (for Max) and $\sigma$ (for Min), a trajectory $\alpha$ starts at $(\ell,i,+)$ and evolves according to the chosen actions and transition probabilities. For each $\alpha$,

  • Transition probability: $P^{\pi,\sigma}(\alpha) = \prod_{t=\ell}^{\ell+k-1} P^t_{\alpha(t),\alpha(t+1)}$
  • Reward aggregate:

$$r(\alpha) = \sum_{t=\ell}^{\ell+k-1}\left(\prod_{u=\ell}^{t-1}\gamma^u\right) r(\alpha(t)) + \left(\prod_{u=\ell}^{\ell+k-1}\gamma^u\right)\varphi(\alpha(\ell+k))$$

with the game value expressed as

$$V^{\ell,\pi,\sigma}_{i+}(x) = \sum_{\alpha} P^{\pi,\sigma}(\alpha)\, r(\alpha).$$

The saddle-point value is $V^\ell_{i+}(x) = \max_\pi \min_\sigma V^{\ell,\pi,\sigma}_{i+}(x)$, and at optimal policies $(\pi^*,\sigma^*)$, $y^\ell_i = V^\ell_{i+}(x)$ (Gaubert et al., 23 Dec 2025). This duality connects the dynamic-programming (backward recursion) and expectation (path-integral) perspectives.
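The path-integral formula can be checked numerically for a fixed policy pair. The sketch below (my own choice of policies, not from the paper) fixes the "always continue" pair, for which the value is the linear recursion without the outer max/min, enumerates every trajectory from $(1, i_0, +)$ to the terminal layer, and accumulates $P^{\pi,\sigma}(\alpha)\, r(\alpha)$:

```python
import numpy as np
from itertools import product

def path_sum(x, Ws, bs, i0):
    """Brute-force Feynman-Kac sum for the 'always continue' policy pair,
    starting at state (1, i0, +). Encodes eps as +1 (Max) or -1 (Min)."""
    L = len(Ws)
    widths = [W.shape[0] for W in Ws] + [Ws[-1].shape[1]]   # layers 1..L+1
    # all (neuron, sign) states for layers 2..L+1
    layers = [list(product(range(n), (+1, -1))) for n in widths[1:]]
    total = 0.0
    for tail in product(*layers):
        alpha = [(i0, +1)] + list(tail)
        prob, disc, r = 1.0, 1.0, 0.0
        for t, ((i, e), (j, f)) in enumerate(zip(alpha, alpha[1:])):
            W, b = Ws[t], bs[t]
            gamma = np.abs(W[i]).sum()
            # gamma * P = sign-split weight: (W)^+ if signs agree, else (W)^-
            w = max(W[i, j], 0.0) if e == f else max(-W[i, j], 0.0)
            r += disc * e * b[i]          # discounted running reward at alpha(t)
            prob *= w / gamma
            disc *= gamma
        jT, fT = alpha[-1]
        r += disc * fT * x[jT]            # discounted terminal reward phi
        total += prob * r
    return total
```

The result matches the corresponding linear backward recursion $V_+ \leftarrow W^+ V_+ + W^- V_- + b$, $V_- \leftarrow W^+ V_- + W^- V_+ - b$, confirming the expectation/recursion duality for this policy pair.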

4. Monotonicity, Robustness Bounds, and Certification

Replacing the terminal map $x \mapsto \pm x$ with a general terminal reward $\varphi$ generalizes the recursion while maintaining coordinate-wise monotonicity:

  • If $x_{\min} \leq x \leq x_{\max}$ pointwise, then

$$y = V^1(x,-x) \in \left[\, V^1(x_{\min},-x_{\max}),\ V^1(x_{\max},-x_{\min}) \,\right]$$

  • This establishes explicit bounds on outputs from input bounds.

For fixed policies, the following piecewise-linear bounds arise:

  • Lower bound: $f^\pi(x) = \min_\sigma V^{1,\pi,\sigma}_{1+}(x)$ is concave and piecewise linear; its super-level set $\{x \mid f^\pi(x) \geq \alpha\}$ is a polyhedral certificate that $y(x) \geq \alpha$.
  • Upper bound: for $\sigma$ fixed, $g^\sigma(x) = \max_\pi V^{1,\pi,\sigma}_{1+}(x)$ is convex and piecewise linear; its sub-level set certifies $y(x) \leq \beta$.

This mechanism provides robustness certification against input perturbations and adversarial scenarios (Gaubert et al., 23 Dec 2025).
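The interval bound above is straightforward to implement. In this sketch (helper name `relu_bounds` is mine), the backward recursion is run from the two terminal pairs $(x_{\min}, -x_{\max})$ and $(x_{\max}, -x_{\min})$; since the recursion is monotone nondecreasing in both arguments, the two runs bracket the true output:

```python
import numpy as np

def relu_bounds(x_lo, x_hi, Ws, bs):
    """Coordinate-wise output bounds for a ReLU network from input box
    [x_lo, x_hi], via monotonicity of the Shapley-Bellman recursion."""
    def V(p, m):
        # backward recursion with terminal Max-values p and Min-values m;
        # W^+, W^- are entrywise nonnegative, so V is monotone in (p, m)
        for W, b in zip(reversed(Ws), reversed(bs)):
            Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
            p, m = (np.maximum(0.0, Wp @ p + Wm @ m + b),
                    np.minimum(0.0, Wp @ m + Wm @ p - b))
        return p
    return V(x_lo, -x_hi), V(x_hi, -x_lo)
```

Sampling inputs inside the box and running the plain feed-forward pass confirms that every output lies between the two returned vectors.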

5. Inverse Game Formulation for Network Learning

Under supervised learning, given sample pairs $\{(x^{(m)}, y^{(m)})\}$, training the ReLU-net game is reformulated as an "inverse game" problem. The unknowns are the transition probabilities $P^\ell$ and rewards $b^\ell$, constrained to reflect the network weights and normalization. The Shapley equations become equality constraints:

$$V^\ell_{i+}(x^{(m)}; P, b) = y^{\ell,(m)}_i \qquad \forall\, \ell, i, m$$

subject to

$$P^\ell_{i\epsilon,\cdot} \geq 0, \qquad \sum_{j,\eta} P^\ell_{i\epsilon, j\eta} = 1,$$

with $P^\ell_{(i,\pm),(j,\pm)} \propto (W^\ell_{i,j})^\pm$. This yields a feasibility or least-squares optimization:

$$\min_{P,\,b} \sum_m \left\| y^{(m)} - V^1(x^{(m)}; P, b) \right\|^2$$

subject to the transition/reward constraints (Gaubert et al., 23 Dec 2025). This framework interprets neural network learning as identification of the game structure from observed outputs.
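The least-squares objective is simple to evaluate once the game data are parameterized by the weights: since $(\gamma^\ell, P^\ell, b^\ell)$ are determined by $(W^\ell, b^\ell)$, the forward model $V^1(x; P, b)$ is the backward recursion from Section 2. The sketch below (an illustrative objective evaluation, not the paper's identification algorithm) uses that parameterization:

```python
import numpy as np

def inverse_game_loss(Ws, bs, samples):
    """Sum of squared residuals || y^(m) - V^1(x^(m); P, b) ||^2, with
    (P, b) induced from candidate weights (Ws, bs) via sign-splitting."""
    loss = 0.0
    for x, y in samples:
        Vp, Vm = x.copy(), -x.copy()
        for W, b in zip(reversed(Ws), reversed(bs)):
            Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
            Vp, Vm = (np.maximum(0.0, Wp @ Vp + Wm @ Vm + b),
                      np.minimum(0.0, Wp @ Vm + Wm @ Vp - b))
        loss += float(np.sum((y - Vp) ** 2))
    return loss
```

At the true game data the Shapley equality constraints hold exactly and the loss vanishes; perturbing the weights generically makes it positive, which is what a feasibility or least-squares solver would exploit.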

6. Entropic Regularization and Extension to Softplus Networks

For neural networks with the Softplus activation, the game-theoretic model is modified via entropic regularization. At state $s = (\ell,i,\epsilon)$, the reward is augmented by $-\tau\log p$ (for Max, where $p$ is the probability of choosing "continue") or $+\tau\log q$ (for Min). The recursion then yields:

  • For Max:

$$V^\ell_{i+,\tau}(x) = \tau\log\left[1 + \exp\!\left(\frac{\gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i+,j+}\, V^{\ell+1}_{j+,\tau}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i+,j-}\, V^{\ell+1}_{j-,\tau}(x)\right] + b^\ell_i}{\tau}\right)\right]$$

  • For Min:

$$V^\ell_{i-,\tau}(x) = -\tau\log\left[1 + \exp\!\left(-\frac{\gamma^\ell_i\left[\sum_{j:\,W^\ell_{i,j}>0} P^\ell_{i-,j-}\, V^{\ell+1}_{j-,\tau}(x) + \sum_{j:\,W^\ell_{i,j}<0} P^\ell_{i-,j+}\, V^{\ell+1}_{j+,\tau}(x)\right] - b^\ell_i}{\tau}\right)\right]$$

which reproduces the Softplus activation, $\mathrm{softplus}_\tau(z) = \tau\log(1+e^{z/\tau})$. The path-integral formula generalizes to a log–sum–exp over all trajectories, representing the free-energy interpretation of the output (Gaubert et al., 23 Dec 2025). This suggests a fundamental equivalence between neural activation smoothing and entropy-regularized stopping games.
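The equivalence can be verified numerically. In this sketch (my own illustration of the section's two formulas), the entropic recursion replaces $\max\{0,\cdot\}$ with $\mathrm{softplus}_\tau$ and $\min\{0,\cdot\}$ with $-\mathrm{softplus}_\tau(-\cdot)$, and reproduces the plain Softplus feed-forward pass:

```python
import numpy as np

def softplus_net(x, Ws, bs, tau):
    """Feed-forward Softplus network, softplus_tau(z) = tau * log(1 + e^{z/tau})."""
    y = x
    for W, b in zip(reversed(Ws), reversed(bs)):
        y = tau * np.log1p(np.exp((W @ y + b) / tau))
    return y

def shapley_bellman_entropic(x, Ws, bs, tau):
    """Entropy-regularized backward recursion; Vp, Vm are Max's/Min's values."""
    Vp, Vm = x.copy(), -x.copy()
    for W, b in zip(reversed(Ws), reversed(bs)):
        Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
        zp = Wp @ Vp + Wm @ Vm + b        # Max's continuation value
        zm = Wp @ Vm + Wm @ Vp - b        # Min's continuation value
        Vp = tau * np.log1p(np.exp(zp / tau))       # softplus_tau(zp)
        Vm = -tau * np.log1p(np.exp(-zm / tau))     # -softplus_tau(-zm)
    return Vp, Vm
```

As in the ReLU case, $V_{-,\tau} = -V_{+,\tau}$ propagates by induction, since $z_- = -z_+$ whenever the values entering a layer are antisymmetric.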

7. Connections, Implications, and Future Directions

The Shapley–Bellman backward recursion in the context of neural networks provides a rigorous bridge between dynamic programming, game theory, and deep learning architectures. It enables path-integral interpretations of network outputs, monotonicity-based verification, and robustness analysis, and recasts neural network training as constrained inverse game identification. The entropic regularization mechanism elucidates the link between the classic ReLU activation and smooth variants such as Softplus. A plausible implication is a broader class of game-theoretic neural architectures in which backward recursion and optimal policy pairs systematically determine both evaluation and training regimes, potentially extending robustness and verification capabilities across a wider array of network types (Gaubert et al., 23 Dec 2025).

