Bridging Structured Knowledge and Data: A Unified Framework with Finance Applications

Published 1 Apr 2026 in stat.ML, cs.AI, and cs.LG | (2604.00987v1)

Abstract: We develop Structured-Knowledge-Informed Neural Networks (SKINNs), a unified estimation framework that embeds theoretical, simulated, previously learned, or cross-domain insights as differentiable constraints within flexible neural function approximation. SKINNs jointly estimate neural network parameters and economically meaningful structural parameters in a single optimization problem, enforcing theoretical consistency not only on observed data but over a broader input domain through collocation, and therefore nesting approaches such as functional GMM, Bayesian updating, transfer learning, PINNs, and surrogate modeling. SKINNs define a class of M-estimators that are consistent and asymptotically normal with root-N convergence, sandwich covariance, and recovery of pseudo-true parameters under misspecification. We establish identification of structural parameters under joint flexibility, derive generalization and target-risk bounds under distributional shift in a convex proxy, and provide a restricted-optimal characterization of the weighting parameter that governs the bias-variance tradeoff. In an illustrative financial application to option pricing, SKINNs improve out-of-sample valuation and hedging performance, particularly at longer horizons and during high-volatility regimes, while recovering economically interpretable structural parameters with improved stability relative to conventional calibration. More broadly, SKINNs provide a general econometric framework for combining model-based reasoning with high-dimensional, data-driven estimation.


Authors (4)

Summary

The paper introduces SKINNs, a framework that jointly estimates neural and structural parameters to enforce theoretical consistency across both observed and unobserved data.
It embeds diverse forms of structured knowledge, such as Black-Scholes-Merton (BSM) and HSV models, into a composite loss function, improving out-of-sample pricing and hedging performance.
The study demonstrates significant robustness under regime shifts, outperforming standard neural networks in volatile market conditions.

Structured-Knowledge-Informed Neural Networks (SKINNs): A Unified Framework Integrating Domain Knowledge and Data-Driven Learning in Finance

Introduction and Conceptual Foundations

This paper introduces Structured-Knowledge-Informed Neural Networks (SKINNs), a general estimation framework designed to bridge the longstanding divide between principled, theory-driven modeling and flexible, data-driven approaches. SKINNs embed theoretical, simulated, or previously learned knowledge as differentiable constraints within neural network learning objectives. The central innovation is the joint estimation of both model-based latent parameters and high-capacity neural network weights, enforcing theoretical consistency not just on observed data but also, crucially, across a broader collocation domain that may include regions without direct observations.

The underlying SKINNs objective is formulated as a composite loss function, in which one component measures empirical loss (data fidelity) and the other penalizes deviations from theoretical consistency. Both the nuisance (network) parameters and the economically meaningful structural parameters are treated as learnable variables in a unified optimization problem that is amenable to stochastic first-order (gradient-based) methods.
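One way to write such a composite objective is sketched below. The notation is assumed here for illustration and is not taken from the paper: $f_\theta$ is the neural network, $g_\phi$ the structured-knowledge model with structural parameters $\phi$, $(x_i, y_i)$ are the $N$ observed data points, $\tilde{x}_j$ are $M$ collocation points that may lie outside the observed range, and $\ell$ and $d$ are generic loss and discrepancy functions. Only the weighting parameter $\lambda$ is named in the source.

$$
\min_{\theta,\,\phi}\;\; \frac{1}{N}\sum_{i=1}^{N} \ell\big(y_i,\, f_\theta(x_i)\big) \;+\; \lambda\,\frac{1}{M}\sum_{j=1}^{M} d\big(f_\theta(\tilde{x}_j),\, g_\phi(\tilde{x}_j)\big)
$$

In this form, $\lambda = 0$ reduces to an unconstrained network fit, while a large $\lambda$ pulls $f_\theta$ toward the structured model over the whole collocation set.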

Methodological Innovations

Key methodological contributions include:

Joint Parameter Estimation: Unlike PINNs and sequential transfer learning, SKINNs estimate both neural and structural parameters concurrently, enforcing co-adaptation via a composite objective. This allows the neural component to capture high-dimensional, non-linear empirical patterns, while the structured component regularizes the learning space.
Generalized Embedding of Structured Knowledge: SKINNs accommodate diverse forms of structured knowledge, including closed-form parametric models (e.g., BSM, HSV), simulation-based deep surrogates (SPSKRs), and high-dimensional non-parametric objects (NPSKRs, such as empirical risk-neutral densities or autoencoder-based representations). The only universal requirement is differentiability, so that gradients can be backpropagated through the structured component.
Statistical Guarantees: The SKINNs estimator is an M-estimator with standard consistency and asymptotic normality properties, converging at the parametric $\sqrt{N}$ rate. The framework includes rigorous identification arguments for joint estimation, and provides a semi-parametric efficiency interpretation under orthogonal moment conditions.
Generalization under Distribution Shift: Formal generalization and target-risk bounds are derived, highlighting the stabilizing effect of structured regularization not just in-sample but under domain shift. Collocation-based enforcement plays a critical role here.
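The joint-estimation and collocation ideas in the list above can be illustrated with a deliberately small sketch. It is not the authors' implementation: a three-term quadratic stands in for the neural network, a normalized BSM call (spot 1, zero rate, maturity 1) plays the structured model, the data are synthetic and noise-free, and every name (`fit`, `loss_and_grad`, `lam`, `colloc`) is hypothetical. Data are observed only for strikes in [0.9, 1.1], while consistency with the structured model is enforced on collocation strikes in [0.8, 1.2].

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bsm_call_and_vega(strike, sigma):
    # Normalized BSM call: spot = 1, zero rate, maturity = 1 (assumptions).
    d1 = (-math.log(strike) + 0.5 * sigma ** 2) / sigma
    d2 = d1 - sigma
    price = norm_cdf(d1) - strike * norm_cdf(d2)
    vega = norm_pdf(d1)  # derivative of the price with respect to sigma
    return price, vega

def features(strike):
    z = (strike - 1.0) / 0.2  # rescale strikes in [0.8, 1.2] to [-1, 1]
    return (1.0, z, z * z)

def flexible(theta, strike):
    # Stand-in for the neural network: a quadratic in the rescaled strike.
    return sum(w * f for w, f in zip(theta, features(strike)))

def loss_and_grad(theta, sigma, data, colloc, lam):
    loss, g_theta, g_sigma = 0.0, [0.0, 0.0, 0.0], 0.0
    for strike, y in data:  # data-fidelity term on observed points
        r = flexible(theta, strike) - y
        loss += r * r / len(data)
        for i, f in enumerate(features(strike)):
            g_theta[i] += 2.0 * r * f / len(data)
    for strike in colloc:  # theory-consistency term on collocation points
        price, vega = bsm_call_and_vega(strike, sigma)
        r = flexible(theta, strike) - price
        loss += lam * r * r / len(colloc)
        for i, f in enumerate(features(strike)):
            g_theta[i] += 2.0 * lam * r * f / len(colloc)
        g_sigma -= 2.0 * lam * r * vega / len(colloc)
    return loss, g_theta, g_sigma

def fit(data, colloc, lam=1.0, lr=0.2, steps=5000):
    # Plain gradient descent on the flexible weights and sigma together.
    theta, sigma = [0.0, 0.0, 0.0], 0.4
    for _ in range(steps):
        _, g_theta, g_sigma = loss_and_grad(theta, sigma, data, colloc, lam)
        theta = [w - lr * g for w, g in zip(theta, g_theta)]
        sigma = max(sigma - lr * g_sigma, 1e-3)  # keep sigma positive
    return theta, sigma

SIGMA_TRUE = 0.2  # used only to generate the synthetic data
observed_strikes = [0.90 + 0.02 * i for i in range(11)]
data = [(k, bsm_call_and_vega(k, SIGMA_TRUE)[0]) for k in observed_strikes]
colloc = [0.80 + 0.02 * i for i in range(21)]

initial_loss = loss_and_grad([0.0, 0.0, 0.0], 0.4, data, colloc, 1.0)[0]
theta_hat, sigma_hat = fit(data, colloc)
final_loss = loss_and_grad(theta_hat, sigma_hat, data, colloc, 1.0)[0]
print(f"sigma_hat={sigma_hat:.3f}, loss {initial_loss:.5f} -> {final_loss:.7f}")
```

Starting from a volatility of 0.4, the joint fit should move `sigma_hat` back toward the value used to generate the data. It need not match exactly, because the quadratic cannot reproduce the BSM curve perfectly. A real application would replace the quadratic with a neural network and the hand-written gradients with automatic differentiation.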

Empirical Application: Option Pricing

The empirical section applies SKINNs to S&P 500 index option pricing, a setting where theory-based structures (e.g., BSM, stochastic volatility models) are both highly influential and routinely misspecified relative to market data, and where pure deep learning is fragile outside the training domain.
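For readers less familiar with the BSM benchmark, the standard closed-form European call price and its Delta can be sketched as follows. The function name and the example inputs are illustrative choices, not values from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_call(spot, strike, maturity, rate, sigma):
    """Black-Scholes-Merton price and Delta of a European call (no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    price = spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)
    delta = norm_cdf(d1)  # sensitivity of the price to the spot
    return price, delta

# Illustrative inputs: at-the-money call, one year, 5% rate, 20% volatility.
price, delta = bsm_call(spot=100.0, strike=100.0, maturity=1.0, rate=0.05, sigma=0.2)
print(round(price, 4), round(delta, 4))  # approximately 10.4506 and 0.6368
```

Because both outputs are smooth functions of `sigma`, a model of this kind satisfies the differentiability requirement for use as a structured-knowledge component.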

Benchmarks and SKINNs Variants

Baselines: Classical structural models (BSM, ABSM, HSV, HSVJ), standard NNs, NN plus static constraints (e.g., shape, boundary), PINNs, and transfer learning (TLNNs) are compared.
SKINNs Variants: Multiple structured-knowledge instantiations are implemented, including parametric, semi-parametric (DSNN-HSV, DSNN-NASV), and non-parametric (MOPA, AE-based) representations.

A comprehensive S&P 500 options dataset (raw, unsmoothed, over 27 years) is used with a long forward-rolling window, enabling extensive out-of-sample evaluation across different economic regimes, including high-volatility crisis periods.
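A forward-rolling evaluation of this kind can be sketched generically as below. The window length, horizon, and step are hypothetical placeholders. The summary does not state the values used in the paper.

```python
def rolling_windows(n_periods, train_len, horizon, step=1):
    """Yield (train, test) index ranges that roll forward through time.

    Each test block starts immediately after its training block, so the
    model is always evaluated on periods it has not seen.
    """
    start = 0
    while start + train_len + horizon <= n_periods:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + horizon)
        yield train, test
        start += step

# Hypothetical example: 10 periods, 6 for training, 2 for testing, step of 2.
splits = list(rolling_windows(n_periods=10, train_len=6, horizon=2, step=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

A longer `horizon` pushes the test block further from the training data, which is the long-horizon setting that the results discussion focuses on.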

Numerical Results and Claims

Out-of-Sample Pricing and Hedging: For short-horizon prediction, NNs and SKINNs perform comparably, with SKINNs offering marginal improvements as the structured knowledge becomes more sophisticated. For long-horizon prediction and under regime shifts (e.g., high volatility), all SKINNs variants outperform NNs and boundary-constrained NNs by statistically significant and substantial margins, with RMSE reductions of 10–15% in normal conditions and improvements of up to 3–4$\times$ in high-volatility regimes relative to baseline NNs.
Consistency Across Evaluation Metrics: SKINNs' improvements extend to both pricing accuracy and Delta-hedging performance, even though hedging is not directly targeted in training, which indicates that structural information is being extracted successfully.
Statistical Significance: Diebold-Mariano and Wilcoxon signed-rank tests confirm the improvements at the 1–5% significance levels across prediction horizons.
Robustness Under Regime Shift: Regression analysis shows that the error differential between SKINNs and plain NNs loads significantly negatively on average VIX, so SKINNs' errors fall relative to those of NNs as volatility rises. SKINNs therefore deliver the greatest incremental benefit precisely when purely data-driven models are least reliable.
Figure 2: Out-of-sample option pricing performance over longer prediction horizons, where SKINN+HSV consistently outperforms all TLNN+HSV variants, particularly under distributional shift.
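The Diebold-Mariano comparison mentioned above can be sketched in its simplest form as follows. This version assumes squared-error loss and one-step-ahead forecasts, so no autocorrelation correction is applied. Longer horizons would need a HAC variance estimate. The error series are made-up numbers for illustration only, not results from the paper.

```python
import math

def diebold_mariano(errors_a, errors_b):
    """Simplified Diebold-Mariano test under squared-error loss.

    A positive statistic means model A has the larger average loss.
    Assumes serially uncorrelated loss differentials (one-step forecasts).
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Hypothetical forecast errors for a plain NN and a SKINN variant.
errors_nn = [0.9, -1.1, 1.3, -0.8, 1.2, -1.0, 1.1, -0.9]
errors_skinn = [0.5, -0.6, 0.4, -0.5, 0.7, -0.4, 0.6, -0.5]

stat, p_value = diebold_mariano(errors_nn, errors_skinn)
print(f"DM statistic = {stat:.2f}, p-value = {p_value:.4g}")
```

With so few observations the normal approximation is rough. The sketch is meant only to show how the loss differential enters the statistic.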

Analysis of Learned Structural Parameters

Interpretability and Stability: SKINNs recover economically meaningful latent parameters (such as implied volatility surfaces or risk-neutral distributions), which show time-series coherence correlated with known economic regimes. These parameters display improved numerical stability and lower variation compared to conventional calibration, especially for complex or high-dimensional models (e.g., HSVJ or empirical densities).
Predictive Economic Content: Regressions of subsequent-period market volatility (AvgVIX) on the learned structural parameters indicate significant predictive relationships, often stronger than those obtained from classically calibrated model parameters. Latent vectors learned from non-parametric representations (e.g., autoencoder, MOPA) also encode robust market regularities.
Figure 4: Learned risk-neutral variances from SKINN+MOPA exhibit strong alignment with subsequent market volatility (VIX), demonstrating the economic relevance of high-dimensional latent parameters.

Figure 6: Example SKINN+MOPA-learned risk-neutral probability density during the 2008 financial crisis, capturing economically significant distributional shifts.
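The predictive regressions described in this section, next-period volatility on a current learned parameter, can be sketched as a univariate OLS with a conventional t-statistic. The function names and both data series are hypothetical, and the paper's actual specification (controls, standard errors) is not given in this summary.

```python
import math

def ols_slope_tstat(x, y):
    """Univariate OLS of y on x: returns intercept, slope, and slope t-statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return intercept, slope, slope / math.sqrt(s2 / sxx)

def predictive_regression(param, vix):
    """Regress vix[t + 1] on param[t] by dropping the last param and first vix."""
    return ols_slope_tstat(param[:-1], vix[1:])

# Made-up series: a learned structural parameter and an average-VIX measure.
learned_param = [0.10, 0.12, 0.20, 0.15, 0.30, 0.25, 0.18]
avg_vix = [15.0, 15.2, 15.9, 20.1, 17.4, 25.2, 22.4]

intercept, slope, t_stat = predictive_regression(learned_param, avg_vix)
print(f"slope = {slope:.2f}, t-statistic = {t_stat:.1f}")
```

The one-period shift in `predictive_regression` is what makes the relationship predictive rather than contemporaneous.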

Implications and Theoretical Insights

Unified Econometric Interpretation: SKINNs nest classical approaches (GMM, Bayesian MAP, PINNs, transfer learning, and domain adaptation) as specific limiting configurations. The $\lambda$ penalty parameter is formalized as an optimal variance-minimizing GMM weight in restricted settings.
Scalability: SKINNs support latent parameter estimation in high- or infinite-dimensional settings (e.g., non-parametric densities, deep surrogate approximations) that are computationally infeasible for two-stage calibration or transfer learning approaches.
Robustness to Misspecification: Sandwich covariance estimators provide valid inference even under misspecified theory modules, and the learned parameters converge to pseudo-true values, supporting inference beyond black-box prediction.
Downstream Utility and Decision-Awareness: An extended application to asset pricing demonstrates how utility-based optimization constraints can be embedded within SKINNs’ architecture, directly informing neural model learning with downstream economic objectives (e.g., mean-variance portfolio optimization).
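For reference, the textbook M-estimator result behind the sandwich covariance mentioned above can be written as follows. The notation is assumed here ($\vartheta$ for the parameters of interest, $m$ for the per-observation loss, $Z$ for an observation), and the paper's exact regularity conditions are not reproduced:

$$
\sqrt{N}\,\big(\hat{\vartheta} - \vartheta^{*}\big) \;\xrightarrow{d}\; \mathcal{N}\big(0,\; H^{-1} S H^{-1}\big), \qquad
H = \mathbb{E}\big[\nabla^{2}_{\vartheta}\, m(Z;\vartheta^{*})\big], \quad
S = \mathbb{E}\big[\nabla_{\vartheta} m(Z;\vartheta^{*})\, \nabla_{\vartheta} m(Z;\vartheta^{*})^{\top}\big]
$$

Here $\vartheta^{*}$ is the minimizer of the population loss, which is the pseudo-true value when the theory module is misspecified. For a correctly specified negative log-likelihood, $H = S$ and the covariance reduces to $H^{-1}$. Under misspecification the two matrices differ, which is why the sandwich form is needed.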

Practical and Theoretical Implications in AI

Practically, SKINNs provide a scalable, interpretable, and robust approach for problems with both high-dimensional data and rich, though perhaps misspecified, domain knowledge. This addresses a critical AI bottleneck for scientific and economic modeling—where model transparency and out-of-distribution generalization are paramount. Theoretically, SKINNs offer a route toward statistical inference in deep learning systems, bridging classical econometrics and modern machine learning, and enabling formal hypothesis testing for structural parameters learned jointly with complex function approximators.

Outlook and Future Developments

Extensions: Automated selection or adaptation of structured knowledge components (e.g., via hierarchical models, meta-learning) could further improve robustness and interpretability.
Beyond Finance: SKINNs are directly applicable to any domain featuring complex, semi-tractable models (physics, engineering, and structural economics).
Model Criticism and Discovery: The framework inherently supports formal model criticism by comparing out-of-sample performance and parameter stability across alternative economic specifications.

Conclusion

SKINNs establish a rigorous and computationally feasible paradigm for integrating domain theory and empirical data in high-capacity neural settings. By jointly estimating both neural and economically meaningful parameters, and by providing strong inferential and out-of-sample generalization guarantees, SKINNs unify econometric and machine learning methodologies, with significant implications for the design of interpretable, robust AI systems in finance and beyond.
