Bridging Stochastic Control and Deep Hedging: Structural Priors for No-Transaction Band Networks

Published 31 Mar 2026 in q-fin.PR, q-fin.CP, q-fin.PM, and q-fin.RM | (2603.29994v1)

Abstract: This paper studies the problem of hedging and pricing a European call option under proportional transaction costs, from two complementary perspectives. We first derive the optimal hedging strategy under CARA utility, following the stochastic control framework of Davis et al. (1993), characterising the no-transaction band via the Hamilton-Jacobi-Bellman Quasi-Variational Inequality (HJBQVI) and the Whalley-Wilmott asymptotic approximation. We then adopt a deep hedging approach, proposing two architectures that build on the No-Transaction Band Network of Imaki et al. (2023): NTBN-Delta, which makes delta-centring explicit, and WW-NTBN, which incorporates the Whalley-Wilmott formula as a structural prior on the bandwidth and replaces the hard clamp with a differentiable soft clamp. Numerical experiments show that WW-NTBN converges faster, matches the stochastic control no-transaction bands more closely, and generalises well across transaction cost regimes. We further apply both frameworks to the bull call spread, documenting the breakdown of price linearity under transaction costs.


Authors (2)

Summary

The paper introduces a novel framework coupling stochastic control and deep hedging to derive optimal no-transaction bands through CARA utility and HJBQVI formulation.
It leverages Whalley-Wilmott asymptotics to calibrate band widths as functions of risk aversion, option gamma, and transaction cost parameters.
Empirical results show that the WW-NTBN model outperforms baseline architectures by achieving faster convergence, lower tail risk, and realistic indifference pricing.

Structural Priors for No-Transaction Band Networks: Bridging Stochastic Control and Deep Hedging

Introduction and Theoretical Framework

The paper "Bridging Stochastic Control and Deep Hedging: Structural Priors for No-Transaction Band Networks" (2603.29994) systematically investigates option hedging under proportional transaction costs, emphasizing the structural synthesis between the stochastic control approach and deep hedging methods. The authors provide a unified treatment of the CARA utility maximization framework for European-style derivatives, detailed derivations for the HJBQVI formulation, and connections to classical and band-based deep hedging architectures.

In the frictionless regime, the Black-Scholes model yields complete replication via continuous-time delta hedging; however, proportional frictions render continuous rebalancing suboptimal, because the cumulative cost of ever-finer rebalancing grows without bound. Instead, optimal strategies become singular controls characterized by inaction regions, the so-called no-transaction bands: trading occurs only when the position drifts far enough from the optimum to justify the cost of adjusting it. These bands are characterized exactly by the HJBQVI and, in the small transaction cost limit, by the Whalley-Wilmott asymptotics, which give closed-form expressions for the optimal band width as a function of risk aversion, local option gamma, and the transaction cost parameter.
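To make the cost argument concrete, the following is a minimal simulation sketch, not taken from the paper. It assumes zero interest rate, GBM dynamics, and a hedger who rebalances to the Black-Scholes delta at every step; the function names and default parameters (`mean_hedging_cost`, `cost_rate`, 200 paths) are illustrative choices. It estimates the average proportional cost paid as the rebalancing frequency increases.

```python
import math
import random


def bs_call_delta(spot, strike, vol, tau):
    """Black-Scholes call delta, assuming a zero interest rate."""
    if tau <= 0.0:
        return 1.0 if spot > strike else 0.0
    d1 = (math.log(spot / strike) + 0.5 * vol * vol * tau) / (vol * math.sqrt(tau))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))


def mean_hedging_cost(n_steps, cost_rate, n_paths=200, spot0=100.0,
                      strike=100.0, vol=0.2, maturity=0.25, seed=0):
    """Average proportional cost of rebalancing to delta n_steps times."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    total = 0.0
    for _ in range(n_paths):
        spot, position, cost = spot0, 0.0, 0.0
        for step in range(n_steps):
            target = bs_call_delta(spot, strike, vol, maturity - step * dt)
            # Proportional cost: rate times the traded notional.
            cost += cost_rate * spot * abs(target - position)
            position = target
            # Exact GBM step with zero drift.
            z = rng.gauss(0.0, 1.0)
            spot *= math.exp(-0.5 * vol * vol * dt + vol * math.sqrt(dt) * z)
        total += cost
    return total / n_paths
```

Calling `mean_hedging_cost(n, cost_rate=0.001)` for increasing `n` should show the average cost rising with the number of rebalancing dates, which is the motivation for trading only when the position leaves a band.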

Stochastic Control and Asymptotic Structure

The paper establishes the theoretical scaffolding: an agent with CARA utility trades a risky asset following geometric Brownian motion (GBM) and a riskless bond, paying proportional transaction costs on every trade. The problem is first posed without frictions, leading to the Merton policy and, under a short option liability, to an explicit decomposition of the hedge into speculative and hedging demands. Once costs are introduced, the portfolio dynamics combine finite-variation (trading) terms with diffusive terms, and the agent's value function satisfies a four-dimensional HJBQVI. For CARA utility the cash dimension factors out, so the core problem can be solved efficiently over time, position, and price.

Crucially, the Whalley-Wilmott expansion characterizes the no-transaction region: the optimal inaction band has a half-width that scales as $\lambda^{1/3}$, where $\lambda$ is the proportional transaction cost rate, and the band is centered on the local Black-Scholes delta. This analytic result holds up well in small-to-moderate friction regimes, which the authors verify through numerical dynamic programming.
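As a sketch of the scaling described above, the band can be written in the form commonly quoted from Whalley and Wilmott, with half-width $\left(3 \lambda S e^{-r(T-t)} \Gamma^2 / (2\gamma)\right)^{1/3}$ around the Black-Scholes delta. The symbol $\gamma$ for risk aversion, the function names, and the exact normalisation are assumptions here; the paper's notation may differ.

```python
import math


def bs_call_delta_gamma(spot, strike, vol, tau, rate=0.0):
    """Black-Scholes delta and gamma of a European call (tau > 0)."""
    sqrt_tau = math.sqrt(tau)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / (vol * sqrt_tau)
    delta = 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    gamma = pdf / (spot * vol * sqrt_tau)
    return delta, gamma


def ww_half_width(spot, gamma, cost_rate, risk_aversion, tau, rate=0.0):
    """Asymptotic half-width: (3*lam*S*exp(-r*tau)*Gamma^2 / (2*a))^(1/3)."""
    inner = (3.0 * cost_rate * spot * math.exp(-rate * tau) * gamma ** 2
             / (2.0 * risk_aversion))
    return inner ** (1.0 / 3.0)


def ww_band(spot, strike, vol, tau, cost_rate, risk_aversion, rate=0.0):
    """Lower and upper edges of the no-transaction band, centred on delta."""
    delta, gamma = bs_call_delta_gamma(spot, strike, vol, tau, rate)
    half = ww_half_width(spot, gamma, cost_rate, risk_aversion, tau, rate)
    return delta - half, delta + half
```

Multiplying `cost_rate` by 8 doubles the half-width, which is the $\lambda^{1/3}$ scaling; the band also widens where gamma is large, for example near the strike close to expiry.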

Deep Hedging Architectures

The deep hedging paradigm recasts the hedging task as direct policy optimization over neural network parameterizations, targeting convex risk measures (here the entropic risk measure, the counterpart of CARA utility). This brings three core architectures into focus:

MLP Baseline: A standard feedforward network outputs the next hedge, conditioned on market features and the prior holding. Because the prior holding is just another input, the network must learn the inaction behaviour from scratch, and training is particularly unstable in high transaction cost (TC) regimes.
NTBN-Delta (Delta-Centered): Builds on the No-Transaction Band Network (NTBN) architecture, making the centering on the Black-Scholes delta explicit and theoretically grounded. By parameterizing only the band widths and clamping the previous hedge to the resulting bounds, the architecture aligns with the control-theoretic structure, improving data efficiency and stability.
WW-NTBN (WW-Guided): Uses the Whalley-Wilmott asymptotics directly as a structural prior for the band width, so the network learns only a small residual correction. In addition, the hard clamp is replaced by a differentiable soft clamp, mitigating the vanishing gradients that impair training for actions outside the learned band. This architecture gives up some model-agnosticism (it requires delta and gamma) in exchange for faster convergence, better generalization across TC regimes, and close quantitative agreement with the optimal band structure.
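The difference between the two clamping mechanisms can be sketched as follows. The hard clamp is the standard projection onto the band. The soft clamp shown here is one possible smooth surrogate built from softplus functions; it is an assumption for illustration, and the paper's actual soft clamp may take a different form. The `sharpness` parameter and all function names are hypothetical.

```python
import math


def softplus(x, sharpness):
    """Numerically stable log(1 + exp(sharpness * x)) / sharpness."""
    z = sharpness * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / sharpness


def sigmoid(z):
    """Logistic function written via tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def hard_clamp(prev_hedge, lower, upper):
    """Project the previous hedge onto [lower, upper]."""
    return min(max(prev_hedge, lower), upper)


def soft_clamp(prev_hedge, lower, upper, sharpness=50.0):
    """Smooth approximation of hard_clamp (illustrative choice only)."""
    return (lower
            + softplus(prev_hedge - lower, sharpness)
            - softplus(prev_hedge - upper, sharpness))


def soft_clamp_grad(prev_hedge, lower, upper, sharpness=50.0):
    """Derivative of soft_clamp with respect to the previous hedge."""
    return (sigmoid(sharpness * (prev_hedge - lower))
            - sigmoid(sharpness * (prev_hedge - upper)))


def band_policy_step(prev_hedge, delta, width_lower, width_upper, clamp=hard_clamp):
    """Delta-centred band: keep the hedge unless it leaves the band."""
    return clamp(prev_hedge, delta - width_lower, delta + width_upper)
```

The hard clamp has a derivative of exactly zero with respect to the previous hedge whenever that hedge lies outside the band, and of exactly zero with respect to the bounds whenever it lies inside. The smooth version keeps these derivatives small but nonzero, and approaches the hard clamp as `sharpness` grows.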

Numerical Analysis and Empirical Results

The empirical section systematically benchmarks all methods under equivalent market scenarios and a range of transaction cost values:

Indifference Pricing: All models recover frictionless Black-Scholes prices in the $\lambda=0$ regime. As costs rise, prices increase monotonically, and a nontrivial bid-ask spread emerges as a direct consequence of the asymmetry in writer and buyer hedging costs induced by market frictions.
Band Structure: WW-NTBN produces no-transaction bands that align closely with the stochastic control benchmark for all tested frictions, owing to effective use of the structural prior. NTBN-Delta also performs well, but with larger variance and slower training. The MLP, lacking explicit structural encoding, converges slowly and yields suboptimal bands, especially under high $\lambda$.
Training Dynamics: WW-NTBN demonstrates rapid convergence and lower terminal entropic risk across all frictions. The distinction is most pronounced at low-to-moderate TCs, where prior-driven initialization yields immediate alignment with optimal band geometry.
P&L and Tail Risk: Models with more accurate band structures (WW-NTBN, NTBN-Delta) exhibit reduced trading activity and lower tail losses as measured by CVaR, which follows from avoiding unnecessary trades and managing the cost-volatility tradeoff better.
Bull Call Spread Nonlinearity: Under transaction costs, indifference pricing and hedging become subadditive. The joint hedge for a spread is strictly more efficient than hedging each leg separately, which is explained by how the no-transaction bands interact: the gamma of the combined payoff can be substantially smaller than the sum of the legs' isolated gamma exposures, allowing for wider inaction regions and sharply reduced trading frequency. Quantitative differences in prices and trade statistics support this theoretical insight.
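The quantities reported above (entropic risk, CVaR, and indifference prices) can be estimated from simulated terminal P&L samples. The sketch below uses the standard definitions: entropic risk $\rho(X) = \frac{1}{a}\log E[e^{-aX}]$ and a writer's indifference price equal to the difference in risk with and without the liability, which follows from cash invariance. The function names and the `tail_fraction` convention are assumptions, not the paper's code.

```python
import math


def entropic_risk(pnl, risk_aversion):
    """(1/a) * log(mean(exp(-a * x))), computed with a log-sum-exp shift."""
    scaled = [-risk_aversion * x for x in pnl]
    peak = max(scaled)
    mean_exp = sum(math.exp(s - peak) for s in scaled) / len(scaled)
    return (peak + math.log(mean_exp)) / risk_aversion


def cvar(pnl, tail_fraction=0.05):
    """Average loss over the worst tail_fraction of outcomes."""
    losses = sorted((-x for x in pnl), reverse=True)
    n_tail = max(1, int(round(len(losses) * tail_fraction)))
    return sum(losses[:n_tail]) / n_tail


def indifference_price(pnl_with_liability, pnl_without_liability, risk_aversion):
    """Cash amount that makes the writer indifferent to taking the liability."""
    return (entropic_risk(pnl_with_liability, risk_aversion)
            - entropic_risk(pnl_without_liability, risk_aversion))
```

Here `pnl_with_liability` would hold the terminal P&L of the trained hedging policy net of the option payoff and trading costs, and `pnl_without_liability` the P&L of the optimal policy with no option position. Subadditivity for a spread would appear as a joint price lower than the sum of the prices computed leg by leg.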

Implications and Future Research

The paper's principal implication is that neural network architectures for hedging derivatives benefit substantially from explicitly embedding optimal control structure, whether via band representations, analytic priors, or differentiable enforcement mechanisms. These hybrid architectures (most notably WW-NTBN) need fewer samples and train more reliably without giving up much expressiveness: in regimes where the log-normal assumption and the WW formula are valid, they offer major practical advantages. Conversely, in highly non-Gaussian or regime-switching settings, model-agnostic architectures (e.g., NTBN-Delta) are recommended, though at the expense of some convergence and generalization performance.

The demonstrated subadditivity in indifference pricing for portfolios under transaction costs suggests that dealers should jointly optimize (and price) composite derivative exposures, since naive leg-by-leg hedging is dominated on both cost and risk. Further, the authors indicate that these results pave the way for structured priors in reinforcement learning for control, for example using WW-initialized policies as warm starts for learning in more general market models (rough volatility, jumps) or in high-dimensional portfolios.

Conclusion

This work bridges the economic, stochastic control, and deep learning views of the frictional hedging problem, advocating for architectures that internalize analytical band structure and differentiable constraints. The technical contribution is a principled, transparent, and empirically validated architecture (WW-NTBN) that converges faster and tracks the stochastic control bands more closely than model-agnostic learning. The trade-off between structural priors and flexibility is shown to be pivotal under transaction costs, and the explicit demonstration of price nonlinearity for the bull call spread has implications for both the mathematical finance and applied ML communities.
