Stylized Facts Alignment GAN
- The paper introduces differentiable stylized fact losses—capturing fat tails, volatility clustering, leverage effect, and coarse-to-fine volatility correlation—to enhance synthetic financial time series realism.
- It integrates these losses with a WGAN-GP backbone, resulting in generated data that mirror true market dynamics, as validated by extensive backtesting against real Shanghai Composite Index returns.
- The approach enables more reliable risk management and trading strategy evaluation by producing synthetic data that closely match statistical and functional properties of real financial markets.
The Stylized Facts Alignment GAN (SFAG) is a generative modeling framework designed to overcome critical limitations in synthetic financial time series generation. Conventional adversarial approaches, notably the standard GAN and WGAN-GP, often produce data that superficially resemble true market returns but fail under rigorous backtesting, mainly because they neglect structural characteristics such as extreme tails and asymmetric volatility. SFAG addresses these deficiencies by converting four canonical stylized facts (fat tails, volatility clustering, the leverage effect, and coarse-to-fine volatility correlation) into differentiable loss terms that are optimized jointly with a WGAN-GP adversarial loss. The resulting synthetic data mimic real-world market dynamics not only in visual diagnostics but also in trading outcomes, as demonstrated in experiments on Shanghai Composite Index returns spanning 2004–2024 (Zhang et al., 19 Jan 2026).
1. Stylized-Fact Constraints and Differentiable Formulations
SFAG enforces four primary stylized facts observed in financial returns by defining each as a structural loss function. Let real return sequences be $r = (r_1, \dots, r_T)$ and generated sequences be $\hat{r} = G_\theta(z)$, with $z \sim \mathcal{N}(0, I)$.
- Fat Tails (GPD Tail Index): Financial returns exhibit heavy tails. SFAG fits a Generalized Pareto Distribution (GPD) to threshold exceedances of the real and generated returns and penalizes the deviation between the two fitted tail indices; this is the loss $\mathcal{L}_{\text{GPD}}$.
- Volatility Clustering (ACF of Squared Returns): Persistent autocorrelation in squared returns is captured by matching the lag-$k$ autocorrelations of real and generated squared returns up to a cutoff lag $K$; this is the loss $\mathcal{L}_{\text{ACF}}$.
- Leverage Effect (Return–Volatility Asymmetry): Negative returns typically predict higher future volatility. SFAG matches the Pearson correlation between past returns and subsequent realized volatility in real and generated data; this is the loss $\mathcal{L}_{\text{Lev}}$.
- Coarse-to-Fine Volatility Correlation (CFVC): Realized volatilities measured at different time scales are systematically correlated. SFAG assembles realized volatilities over multiple window lengths and penalizes differences between the real and generated correlation matrices; this is the loss $\mathcal{L}_{\text{CFVC}}$.
These losses are all fully differentiable, enabling joint optimization via back-propagation through the generative network.
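As a rough illustration of the four statistics, the sketch below computes each one in plain Python and reports the absolute gap between a reference series and a candidate series. Several choices here are assumptions rather than details from the source: the method-of-moments GPD shape estimate, the 95% loss threshold, the 5-day volatility horizon, the window set (5, 20, 60), and all function names. Plain Python is not differentiable; in a training setup the same formulas would be written on autograd tensors.

```python
import math
import random
from statistics import fmean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gpd_tail_index(returns, q=0.95):
    """Method-of-moments GPD shape for losses above the q-quantile (assumed estimator)."""
    losses = sorted(-r for r in returns)
    u = losses[int(q * len(losses))]            # threshold
    exc = [x - u for x in losses if x > u]      # threshold exceedances
    m, v = fmean(exc), pvariance(exc)
    return 0.5 * (1.0 - m * m / v)              # from mean^2 / var = 1 - 2*xi


def acf_squared(returns, max_lag=20):
    """Autocorrelation of squared returns at lags 1..max_lag."""
    sq = [r * r for r in returns]
    mu = fmean(sq)
    dev = [s - mu for s in sq]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(len(dev) - k)) / denom
            for k in range(1, max_lag + 1)]


def leverage_corr(returns, horizon=5):
    """Correlation between r_t and realized volatility over the next `horizon` days."""
    past, future_vol = [], []
    for t in range(len(returns) - horizon):
        window = returns[t + 1:t + 1 + horizon]
        past.append(returns[t])
        future_vol.append(math.sqrt(fmean([w * w for w in window])))
    return pearson(past, future_vol)


def cfvc_matrix(returns, windows=(5, 20, 60)):
    """Correlation matrix of rolling realized volatilities at several window sizes."""
    longest = max(windows)
    series = [[math.sqrt(fmean([x * x for x in returns[t - w:t]]))
               for t in range(longest, len(returns) + 1)]
              for w in windows]
    return [[pearson(a, b) for b in series] for a in series]


def stylized_fact_gaps(real, fake):
    """Absolute gaps between the four statistics of two return series."""
    acf_r, acf_f = acf_squared(real), acf_squared(fake)
    c_r, c_f = cfvc_matrix(real), cfvc_matrix(fake)
    return {
        "gpd": abs(gpd_tail_index(real) - gpd_tail_index(fake)),
        "acf": fmean([abs(a - b) for a, b in zip(acf_r, acf_f)]),
        "lev": abs(leverage_corr(real) - leverage_corr(fake)),
        "cfvc": fmean([abs(x - y) for ra, rb in zip(c_r, c_f)
                       for x, y in zip(ra, rb)]),
    }


# Toy data only: a Gaussian series versus a crude fat-tailed mixture.
random.seed(0)
gaussian = [random.gauss(0, 0.01) for _ in range(1000)]
mixture = [random.gauss(0, 0.01) * (3 if random.random() < 0.1 else 1)
           for _ in range(1000)]
print(stylized_fact_gaps(gaussian, mixture))
```

Comparing a series with itself gives zero for every gap, which is a quick sanity check before using such statistics as evaluation metrics.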
2. Model Architecture
SFAG employs a WGAN-GP backbone (gradient-penalty weight $\lambda_{gp}$) with structurally standard generator and discriminator modules, with the generator objective augmented for stylized-fact alignment:
- Generator $G_\theta$:
  - Input: latent vector $z$ (dimension 100), sampled from $\mathcal{N}(0, I)$.
  - Output: synthetic return series $\hat{r} = G_\theta(z)$ of length $T$.
  - Structure: flexible time-series mapping (temporal CNN or transformer; 1D CNN stack in reported experiments).
- Discriminator $D_\phi$:
  - Input: return sequence of length $T$.
  - Output: scalar realness score.
  - Structure: temporal CNN mirroring the generator, or an MLP.
The distinguishing feature is that the generator’s objective is augmented with multiple differentiable stylized-fact losses; the adversarial backbone remains unchanged.
3. Joint Loss and Training Protocol
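The total loss for SFAG’s generator combines the WGAN-GP adversarial loss and a weighted sum of stylized-fact losses:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{adv}}^{G} + \lambda_1 \mathcal{L}_{\text{GPD}} + \lambda_2 \mathcal{L}_{\text{ACF}} + \lambda_3 \mathcal{L}_{\text{Lev}} + \lambda_4 \mathcal{L}_{\text{CFVC}},$$
where $\mathcal{L}_{\text{adv}}^{G} = -\mathbb{E}_{z}\left[D_\phi(G_\theta(z))\right]$ is the standard WGAN generator term and $\lambda_1, \dots, \lambda_4 \ge 0$ weight the stylized-fact losses. The weights $\lambda_i$ are ramped linearly from 0 to their full values over the initial 20% (10,000 iterations) of training to stabilize optimization.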
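The linear warm-up of the stylized-fact weights can be sketched as a small schedule function. The name `ramped_weight` and the `full_value` argument are hypothetical; the 50,000 total iterations and the 20% warm-up fraction are taken from the text, while the final weight values are not given and must be supplied by the user.

```python
def ramped_weight(iteration, full_value, total_iters=50_000, warmup_frac=0.2):
    """Linearly ramp a loss weight from 0 to full_value over the warm-up period."""
    warmup_iters = int(total_iters * warmup_frac)   # 10,000 with the defaults
    return full_value * min(1.0, iteration / warmup_iters)


# The weight is 0 at the start, half its value mid-warm-up, and constant afterwards.
for it in (0, 5_000, 10_000, 50_000):
    print(it, ramped_weight(it, full_value=1.0))
```

Starting the structural terms at zero lets the adversarial game settle before the stylized-fact gradients begin to shape the generator.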
Training pseudocode summary:

    Initialize θ, φ
    Set Adam(θ, lr=2e-4, β1=0.5, β2=0.9)
    Set Adam(φ, lr=2e-4, β1=0.5, β2=0.9)
    for iteration = 1 to 50000 do
        for t = 1 to 5 do                      # Discriminator update
            Sample real batch {r} size 24, z ∼ N(0, I)
            r̂ = Gθ(z)
            Lgp = gradient_penalty(Dφ, r, r̂)
            Ladv_D = Dφ(r̂) − Dφ(r) + λgp Lgp
            φ ← φ − Adam(∇φ Ladv_D)
        end for
        # Generator update
        Sample z, r̂ = Gθ(z)
        Compute losses Ladv_G, L_GPD, L_ACF, L_Lev, L_CFVC
        Ltotal = Ladv_G + λ1 L_GPD + λ2 L_ACF + λ3 L_Lev + λ4 L_CFVC
        θ ← θ − Adam(∇θ Ltotal)
    end for
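The pseudocode leaves the gradient penalty implicit. Assuming the usual WGAN-GP formulation (an assumption here, since the source does not spell it out), with $\tilde{r}$ a random interpolate between a real and a generated sequence, the discriminator-side terms would read:

$$
\begin{aligned}
\tilde{r} &= \epsilon\, r + (1-\epsilon)\,\hat{r}, \qquad \epsilon \sim \mathcal{U}[0,1],\\
\mathcal{L}_{gp} &= \mathbb{E}_{\tilde{r}}\Big[\big(\lVert \nabla_{\tilde{r}} D_\phi(\tilde{r}) \rVert_2 - 1\big)^2\Big],\\
\mathcal{L}_{\text{adv}}^{D} &= \mathbb{E}_{z}\big[D_\phi(G_\theta(z))\big] - \mathbb{E}_{r}\big[D_\phi(r)\big] + \lambda_{gp}\,\mathcal{L}_{gp}.
\end{aligned}
$$

The last line matches the `Ladv_D` expression in the pseudocode once batch averages replace the expectations.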
4. Experimental Setup and Evaluation
Experiments utilized daily close-price log-returns from the Shanghai Composite Index (2004–2024; approx. 5,000 data points). Key settings:
- Sequence length: $T$ days (about 10 years)
- Batch size: 24
- Latent dimension: 100
- Implementation: PyTorch, NVIDIA A100 GPU
Comparative baselines:
- Standard GAN (JS-divergence adversarial loss)
- WGAN-GP (Wasserstein adversarial loss with gradient penalty, no stylized-fact terms)
Evaluation metrics:
- Stylized-fact gaps: absolute error in tail index (GPD), ACF (lag 1–20), leverage correlation, CFVC matrix.
- Backtest: 60-day momentum strategy (long if the past 60-day return $> 0$, else short; 5 bps transaction cost), measuring annualized return, volatility, Sharpe ratio, maximum drawdown, VaR (95%), and CVaR (95%).
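A minimal sketch of such a backtest is given below. The source does not specify the conventions, so the following are assumptions: 252 trading days per year, a zero risk-free rate in the Sharpe ratio, costs charged per unit of position change, drawdown measured on compounded wealth, and historical (empirical-quantile) VaR and CVaR. The function names are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev

TRADING_DAYS = 252  # assumed annualization factor


def momentum_backtest(log_returns, lookback=60, cost_bps=5.0):
    """Daily strategy returns: long if the trailing `lookback`-day return > 0, else short."""
    cost = cost_bps / 1e4
    strat, prev_pos = [], 0
    for t in range(lookback, len(log_returns)):
        signal = sum(log_returns[t - lookback:t])   # uses data up to day t-1 only
        pos = 1 if signal > 0 else -1
        turnover = abs(pos - prev_pos)              # 1 to enter, 2 to flip
        strat.append(pos * log_returns[t] - cost * turnover)
        prev_pos = pos
    return strat


def risk_metrics(strat, alpha=0.95):
    """Annualized return, volatility, Sharpe, max drawdown, historical VaR and CVaR."""
    ann_ret = fmean(strat) * TRADING_DAYS
    ann_vol = pstdev(strat) * math.sqrt(TRADING_DAYS)
    wealth, peak, max_dd = 1.0, 1.0, 0.0
    for r in strat:
        wealth *= math.exp(r)                       # compound log-returns
        peak = max(peak, wealth)
        max_dd = max(max_dd, 1.0 - wealth / peak)
    ordered = sorted(strat)
    k = max(1, int((1 - alpha) * len(ordered)))     # size of the worst-case tail
    return {
        "ann_return": ann_ret,
        "ann_vol": ann_vol,
        "sharpe": ann_ret / ann_vol,
        "max_drawdown": max_dd,
        "var": ordered[k - 1],                      # (1 - alpha) empirical quantile
        "cvar": fmean(ordered[:k]),                 # mean of returns at or below VaR
    }


# Toy usage on simulated Gaussian log-returns (not market data).
random.seed(1)
toy = [random.gauss(0.0003, 0.01) for _ in range(1000)]
print(risk_metrics(momentum_backtest(toy)))
```

Running the same two functions on a real path and on each generated path gives directly comparable rows for a results table.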
5. Empirical Results
Stylized-Fact Alignment
SFAG demonstrates superior performance in stylized-fact preservation. Average absolute gaps (across five runs):
| Model | GPD Tail | ACF | Leverage | CFVC |
|---|---|---|---|---|
| Standard GAN | 0.2615 | 0.1431 | 32.4617 | 0.0863 |
| WGAN-GP | 0.0776 | 0.1053 | 33.7440 | 0.1021 |
| SFAG | 0.0146 | 0.0982 | 32.7516 | 0.0436 |
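SFAG reduces the GPD tail index gap by about 81% relative to WGAN-GP (0.0776 to 0.0146) and the CFVC error by about 57% (0.1021 to 0.0436). The ACF gap improves more modestly (about 7%), and the leverage gap is essentially unchanged across the three models: SFAG is about 3% below WGAN-GP but slightly above the standard GAN. The gains are therefore concentrated in tail behavior and multi-scale volatility structure.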
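The relative reductions against WGAN-GP can be rechecked directly from the table. The snippet below only restates the tabulated gaps; the dictionary names are illustrative.

```python
# Average absolute gaps copied from the table above.
wgan_gp = {"GPD Tail": 0.0776, "ACF": 0.1053, "Leverage": 33.7440, "CFVC": 0.1021}
sfag = {"GPD Tail": 0.0146, "ACF": 0.0982, "Leverage": 32.7516, "CFVC": 0.0436}

# Percentage reduction of each gap when moving from WGAN-GP to SFAG.
reductions = {k: 100.0 * (wgan_gp[k] - sfag[k]) / wgan_gp[k] for k in wgan_gp}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% reduction")
```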
Momentum Strategy Backtest
Backtest results (average across ten generated paths):
| Metric | Real Data | Standard GAN | WGAN-GP | SFAG |
|---|---|---|---|---|
| Annualized Return | 33.10 % | 2467.24 % | 2152.07 % | 27.80 % |
| Annualized Volatility | 15.20 % | 991.83 % | 995.06 % | 9.37 % |
| Sharpe Ratio | 2.18 | 2.49 | 2.16 | 2.97 |
| Maximum Drawdown | 9.50 % | 109.87 % | 148.11 % | 4.37 % |
| VaR (95%) | –1.10 % | –78.03 % | –85.79 % | –0.91 % |
| CVaR (95%) | –2.23 % | –141.79 % | –144.62 % | –0.92 % |
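Standard GAN and WGAN-GP “collapse” in the backtest, yielding annualized returns above 2,000% and volatilities near 1,000%, with drawdowns exceeding 100% and similarly implausible VaR and CVaR figures. SFAG’s synthetic data yield backtest performance (return: 27.80%, volatility: 9.37%) on the same scale as the real data (33.10% and 15.20%), with plausible risk measures, although its volatility and drawdown are lower than the real values and its Sharpe ratio (2.97 versus 2.18) is correspondingly higher.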
6. Significance and Extensions
SFAG provides evidence that embedding domain-specific stylized facts as differentiable constraints is crucial for moving from superficial realism (e.g., visual similarity) to functional usability in synthetic financial data. Its multi-constraint structure enables generated series to pass both statistical diagnostics and trading backtests, unlike the GAN baselines, which focus solely on distributional matching.
Potential extensions and research directions:
- Adaptation to multi-asset portfolios and cross-market series (foreign exchange, commodities).
- Inclusion of further stylized facts (tail asymmetry, volatility-of-volatility, regime persistence).
- Application of alignment losses within diffusion or transformer-based networks for improved long-horizon fidelity and modeling capacity.
7. Implications and Outlook
SFAG exemplifies a shift toward structure-preserving realism in financial generative modeling, which is requisite for synthetic data to have practical utility in risk management and algorithmic trading. Aligning generative objectives with market-specific constraints such as tail properties and multi-scale volatility ensures both visual and functional relevance. A plausible implication is that structure-aware objective functions might be foundational for synthetic financial modeling beyond GANs, including in stochastic diffusion, autoregressive, or transformer-based frameworks (Zhang et al., 19 Jan 2026).