Shapley-Bellman Backward Recursion
- Shapley-Bellman Backward Recursion is a backward dynamic programming method that represents the output of a feed-forward ReLU neural network as the saddle-point value of a zero-sum stopping game.
- It employs the Shapley equations (coordinate-wise maximization and minimization), enabling exact output recovery, robustness certification, and verification through game-theoretic constructs.
- The framework extends via entropic regularization to Softplus networks and reformulates network learning as an inverse game identification problem under optimization constraints.
Shapley-Bellman backward recursion refers to the process by which the output of a feed-forward ReLU neural network is retrieved as the saddle-point value of a zero-sum, turn-based, stopping game known as the ReLU-net game. In this framework, each layer and neuron of the network corresponds to a state in the game, with the network’s input interpreted as a terminal reward and the feed-forward calculation expressed as a backward recursion over the game's structure. The recursion relies on coordinate-wise maximization and minimization (the Shapley equations), akin to dynamic programming in turn-based zero-sum games, and generalizes to other activation functions such as Softplus via entropic regularization. This approach enables alternative perspectives on evaluation, verification, and learning in neural networks through game-theoretic constructs, path-integral representation, and monotonicity arguments (Gaubert et al., 23 Dec 2025).
1. Construction of the ReLU-Net Game
A ReLU network of depth $L$ is associated with a zero-sum, turn-based, stopping game with horizon $L$. Each neuron $i$ in layer $k$ gives rise to two game states: $(k,i)^{\max}$ (“Max’s turn”) and $(k,i)^{\min}$ (“Min’s turn”). There is also an absorbing cemetery state $\dagger$ with zero reward. The player whose turn it is chooses either to stop (transitioning immediately to $\dagger$) or to continue to the next layer. Transitions are defined probabilistically using the weight matrices $W^{(k)}$ and a "discount factor" $\gamma_k$. The transition probabilities are determined by the sign decomposition of $W^{(k)}$ and normalization by $\gamma_k$; schematically, writing $W^{(k)} = W^{(k)+} - W^{(k)-}$ with $W^{(k)\pm} \ge 0$ entrywise:
- an entry of $W^{(k)+}$ yields a transition, with probability proportional to $W^{(k)+}_{ij}/\gamma_k$, that preserves the current player’s role;
- an entry of $W^{(k)-}$ yields a transition, with probability proportional to $W^{(k)-}_{ij}/\gamma_k$, that hands the turn to the opponent;
- $\gamma_k$ is chosen so that the outgoing probabilities of each state sum to at most one.
Immediate rewards encode the bias terms (with opposite signs at Max and Min states), and the terminal rewards at layer $0$ are given by the network input: Max receives $x_j$, Min receives $-x_j$ (Gaubert et al., 23 Dec 2025).
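The sign decomposition and normalization of the weights can be sketched numerically as follows. The specific weight matrix and the choice of `gamma` (largest absolute row sum) are illustrative assumptions, not necessarily the paper's exact constants:

```python
import numpy as np

# Hypothetical 2x2 weight matrix of one layer.
W = np.array([[1.5, -2.0],
              [0.5,  1.0]])

# Sign decomposition W = W_plus - W_minus with nonnegative parts.
W_plus, W_minus = np.maximum(W, 0.0), np.maximum(-W, 0.0)

# One possible discount factor: the largest absolute row sum, so that
# each state's outgoing transition probabilities sum to at most one.
gamma = np.abs(W).sum(axis=1).max()

# Transition block: positive-part entries keep the current player's
# role, negative-part entries hand the turn to the opponent.
P = np.hstack([W_plus, W_minus]) / gamma

assert (P >= 0).all()
assert (P.sum(axis=1) <= 1.0 + 1e-12).all()  # substochastic rows
```

The leftover probability mass in a row corresponds to the stop action, which moves the play to the cemetery state.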
2. Shapley–Bellman Backward Recursion Equations
The game value is computed via Shapley–Bellman backward recursion. Define $v^{\max}_k(i)$ (Max’s value) and $v^{\min}_k(i)$ (Min’s value) for the states $(k,i)^{\max}$ and $(k,i)^{\min}$ respectively:
- Terminal condition: $v^{\max}_0(j) = x_j$, $v^{\min}_0(j) = -x_j$;
- Recursive equations for $k = 1, \dots, L$ (schematically, with $W^{(k)} = W^{(k)+} - W^{(k)-}$ as above):
$$v^{\max}_k(i) = \max\Big(0,\; b^{(k)}_i + \sum_j W^{(k)+}_{ij}\, v^{\max}_{k-1}(j) + \sum_j W^{(k)-}_{ij}\, v^{\min}_{k-1}(j)\Big),$$
$$v^{\min}_k(i) = \min\Big(0,\; -b^{(k)}_i + \sum_j W^{(k)+}_{ij}\, v^{\min}_{k-1}(j) + \sum_j W^{(k)-}_{ij}\, v^{\max}_{k-1}(j)\Big).$$
By induction, $v^{\max}_k(i) = x_{k,i}$ and $v^{\min}_k(i) = -x_{k,i}$, where $x_{k,i} = \mathrm{ReLU}\big((W^{(k)} x_{k-1} + b^{(k)})_i\big)$ is the feed-forward network output at neuron $i$ of layer $k$ (Gaubert et al., 23 Dec 2025). The max/min recursion directly reproduces the coordinate-wise ReLU.
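A minimal numerical check of the recursion, under the sign-decomposition convention described above (an illustrative assumption): the pair of Shapley–Bellman values propagated backward coincides with the feed-forward ReLU pass.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 3
Ws = [rng.standard_normal((4, 4)) for _ in range(L)]
bs = [rng.standard_normal(4) for _ in range(L)]
x = rng.standard_normal(4)

# Feed-forward ReLU network.
h = x
for W, b in zip(Ws, bs):
    h = np.maximum(W @ h + b, 0.0)
forward_out = h

# Shapley–Bellman backward recursion: Max's value v_max and Min's
# value v_min = -v_max, with positive weights coupling like roles
# and negative weights swapping Max and Min.
v_max, v_min = x, -x
for W, b in zip(Ws, bs):
    Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
    new_max = np.maximum(b + Wp @ v_max + Wm @ v_min, 0.0)   # max(0, ...)
    new_min = np.minimum(-b + Wp @ v_min + Wm @ v_max, 0.0)  # min(0, ...)
    v_max, v_min = new_max, new_min

assert np.allclose(v_max, forward_out)
assert np.allclose(v_min, -forward_out)
```

The induction is visible in the invariant `v_min == -v_max` maintained at every layer.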
3. Discrete Feynman–Kac Path-Integral Formula
The game’s output admits a discrete path-integral (Feynman–Kac-type) representation. Given any pair of policies $\sigma$ (Max) and $\tau$ (Min), a trajectory initiates at an output state and evolves according to the chosen actions and transition probabilities. For each trajectory $\omega$:
- Transition probability: the product of the per-step transition probabilities along $\omega$, $\mathbb{P}^{\sigma,\tau}(\omega) = \prod_k p(\omega_k \to \omega_{k+1})$;
- Reward aggregate: $R(\omega)$, the sum of the immediate rewards collected along $\omega$ up to the stopping time, plus the terminal reward there;
with the game value expressed as the expectation $V^{\sigma,\tau} = \mathbb{E}^{\sigma,\tau}[R]$ over random trajectories.
The saddle-point identity $\sup_\sigma \inf_\tau V^{\sigma,\tau} = \inf_\tau \sup_\sigma V^{\sigma,\tau}$ holds, and the value is attained at optimal policies $(\sigma^*, \tau^*)$ (Gaubert et al., 23 Dec 2025). This duality connects the dynamic-programming/backward-recursion and expectation/path-integral perspectives.
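The path-integral identity can be checked in the simplest regime: with all-nonnegative weights, biases, and inputs, every preactivation is positive, the ReLUs act as the identity, and the output equals an explicit sum over paths of products of weights times terminal inputs, plus bias ("early stop") contributions. The all-positive setup is an assumption chosen so that the optimal policies are trivial.

```python
import numpy as np

rng = np.random.default_rng(1)
# All-nonnegative data so every preactivation is positive.
W1 = np.abs(rng.standard_normal((3, 2)))
b1 = np.abs(rng.standard_normal(3))
w2 = np.abs(rng.standard_normal(3))   # single output neuron
b2 = 0.7

x = np.abs(rng.standard_normal(2))

# Forward pass (ReLU is the identity here since everything is positive).
out = w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Path sum: the output bias, plus paths stopping at the hidden layer
# (collecting a hidden bias), plus full paths reaching the input.
path_sum = b2
for j in range(3):
    path_sum += w2[j] * b1[j]                 # stop at hidden neuron j
    for m in range(2):
        path_sum += w2[j] * W1[j, m] * x[m]   # reach input coordinate m

assert abs(out - path_sum) < 1e-10
```

Each monomial in the expansion corresponds to one trajectory of the game, weighted by its transition probabilities (here folded back into the unnormalized weights).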
4. Monotonicity, Robustness Bounds, and Certification
Replacing the terminal reward map $x$ with a general map $g$ generalizes the recursion, whose value $V_g$ maintains coordinate-wise monotonicity:
- If $g \le h$ pointwise, then $V_g \le V_h$.
- This establishes explicit bounds on outputs from input bounds.
For fixed policies, the following piecewise-linear bounds arise:
- Lower bound: for a fixed Max policy $\sigma$, $\inf_\tau V^{\sigma,\tau}$ is concave and piecewise-linear in the input; its super-level sets are polyhedral certificates of lower bounds on the output.
- Upper bound: for a fixed Min policy $\tau$, $\sup_\sigma V^{\sigma,\tau}$ is convex and piecewise-linear; its sub-level sets certify upper bounds on the output.
This mechanism provides robustness certification against input perturbations and adversarial scenarios (Gaubert et al., 23 Dec 2025).
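The monotonicity mechanism underlies simple certified output bounds: propagating an input box through the network with the sign-split weights yields coordinate-wise lower and upper bounds. The sketch below is standard interval bound propagation, used here to illustrate the monotonicity argument rather than the paper's exact polyhedral certificates:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def interval_bounds(Ws, bs, lo, hi):
    """Propagate an input box [lo, hi] through a ReLU network,
    splitting each weight matrix as W = W_plus - W_minus so that
    bounds combine monotonically."""
    for W, b in zip(Ws, bs):
        Wp, Wm = np.maximum(W, 0.0), np.maximum(-W, 0.0)
        lo, hi = (relu(Wp @ lo - Wm @ hi + b),
                  relu(Wp @ hi - Wm @ lo + b))
    return lo, hi

rng = np.random.default_rng(2)
Ws = [rng.standard_normal((3, 3)) for _ in range(2)]
bs = [rng.standard_normal(3) for _ in range(2)]
lo0, hi0 = -np.ones(3), np.ones(3)

lo, hi = interval_bounds(Ws, bs, lo0, hi0)

# Any point in the input box must land inside the certified bounds.
for _ in range(100):
    x = rng.uniform(lo0, hi0)
    y = x
    for W, b in zip(Ws, bs):
        y = relu(W @ y + b)
    assert np.all(lo - 1e-9 <= y) and np.all(y <= hi + 1e-9)
```

The lower bound pairs each positive weight with the input's lower end and each negative weight with its upper end, exactly the $g \le h \Rightarrow V_g \le V_h$ monotonicity applied coordinate-wise.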
5. Inverse Game Formulation for Network Learning
Under supervised learning, given sample pairs $(x^{(s)}, y^{(s)})$, $s = 1, \dots, S$, ReLU-net game training is reformulated as an "inverse game" problem. The unknowns are the transition probabilities $P$ and rewards $r$, constrained so as to be realizable by network weights and the normalization. The Shapley equations become equality constraints:
$$v^{\max}_L\big(x^{(s)}\big) = y^{(s)}, \qquad s = 1, \dots, S,$$
subject to the Shapley–Bellman recursion holding at every game state,
with $(P, r)$ ranging over admissible game data. This yields a feasibility or least-squares optimization:
$$\min_{P,\,r}\; \sum_{s} \big\| v^{\max}_L\big(x^{(s)}\big) - y^{(s)} \big\|^2$$
subject to the transition/reward constraints (Gaubert et al., 23 Dec 2025). This framework interprets neural network learning as identification of game structure from observed outputs.
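The identification idea can be caricatured with a single ReLU neuron: on samples whose observed value is strictly positive, the Shapley equation is linear in the unknown weight and reward, so those equality constraints can be solved by least squares. This noise-free sketch is an illustration of the inverse-game viewpoint, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)
w_true, b_true = np.array([1.0, -2.0]), 0.5

X = rng.standard_normal((200, 2))
y = np.maximum(X @ w_true + b_true, 0.0)   # observed "game values"

# On samples with strictly positive value, the Shapley equation is
# linear: w . x + b = y.  Solve these equality constraints in the
# least-squares sense to identify the unknown weights and reward.
active = y > 0
A = np.hstack([X[active], np.ones((active.sum(), 1))])
sol, *_ = np.linalg.lstsq(A, y[active], rcond=None)
w_hat, b_hat = sol[:-1], sol[-1]

# The identified parameters reproduce the observed values everywhere,
# including the samples clamped to zero.
assert np.allclose(np.maximum(X @ w_hat + b_hat, 0.0), y)
```

In the full inverse-game problem the active/inactive pattern is itself unknown, which is what makes the general identification a constrained (rather than plain linear) optimization.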
6. Entropic Regularization and Extension to Softplus Networks
For neural networks with Softplus activation, the game-theoretic model is modified via entropic regularization. At each state, the acting player now randomizes between stopping and continuing, and the reward is augmented by an entropy term: $+\theta H(p)$ for Max and $-\theta H(p)$ for Min, where $p$ is the probability of "continue", $H(p) = -p \log p - (1-p)\log(1-p)$ is the binary entropy, and $\theta > 0$ is a temperature. With preactivation $z$, the recursion then yields:
- For Max: $\max_{p \in [0,1]} \big( p\,z + \theta H(p) \big) = \theta \log\big(1 + e^{z/\theta}\big)$,
- For Min: $\min_{p \in [0,1]} \big( -p\,z - \theta H(p) \big) = -\theta \log\big(1 + e^{z/\theta}\big)$,
which reproduces the Softplus activation, $\mathrm{Softplus}_\theta(z) = \theta \log(1 + e^{z/\theta})$. The path-integral formula generalizes to a log–sum–exp over all trajectories, representing a free-energy interpretation of the output (Gaubert et al., 23 Dec 2025). This suggests a fundamental equivalence between neural activation smoothing and entropy-regularized stopping games.
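The entropic identity can be verified numerically: maximizing $p\,z + \theta H(p)$ over the continue-probability $p$ (here $H$ is the binary entropy and $\theta$ a temperature, notation assumed for illustration) reproduces $\theta \log(1 + e^{z/\theta})$, with maximizer $p^* = \sigma(z/\theta)$, the logistic sigmoid.

```python
import numpy as np

def softplus(z, theta=1.0):
    return theta * np.log1p(np.exp(z / theta))

def entropic_value(z, theta=1.0, grid=100001):
    """max over p in (0,1) of p*z + theta*H(p), H = binary entropy."""
    p = np.linspace(1e-9, 1 - 1e-9, grid)
    H = -p * np.log(p) - (1 - p) * np.log(1 - p)
    return np.max(p * z + theta * H)

# The entropy-regularized stop/continue choice equals Softplus.
for z in [-3.0, -0.5, 0.0, 1.0, 2.5]:
    assert abs(entropic_value(z) - softplus(z)) < 1e-6
```

As $\theta \to 0$ the entropy term vanishes and the maximum over $p \in \{0, 1\}$ recovers $\max(0, z)$, i.e. the ReLU case.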
7. Connections, Implications, and Future Directions
The Shapley–Bellman backward recursion provides a rigorous bridge between dynamic programming, game theory, and deep learning architectures. It enables path-integral interpretations of network outputs, monotonicity-based verification and robustness analysis, and a view of neural network training as constrained inverse-game identification. The entropic-regularization mechanism elucidates the link between the classic ReLU activation and smooth variants such as Softplus. A plausible implication is a broader class of game-theoretic neural architectures in which backward recursion and optimal policy pairs systematically determine both evaluation and training regimes, potentially extending robustness and verification capabilities to a wider array of network types (Gaubert et al., 23 Dec 2025).