Cox-MT: Unified Survival & Event Modeling
- The Cox-MT model is a family of frameworks that generalizes traditional Cox processes by integrating survival analysis, multi-task event modeling, and dependent point process inference.
- It combines deep neural Cox regression with a Mean Teacher architecture, using both supervised and consistency-based losses to robustly learn from censored and unlabeled data.
- The model extends to log-Gaussian Cox processes and credit risk default modeling, enabling scalable variational inference and closed-form computations for diverse applications.
The Cox-MT model refers to a family of advanced Cox process and proportional hazards frameworks that generalize survival analysis, multi-task event modeling, and dependent point process inference. The term "Cox-MT" appears in multiple recent works, denoting (1) deep semi-supervised Cox proportional hazards models utilizing a Mean Teacher architecture for survival prediction (Sun et al., 28 Jan 2026), (2) multi-task log-Gaussian Cox process constructions sharing latent functions and encoding inter-task correlations (Aglietti et al., 2018), and (3) generalized multivariate Cox processes enabling complex default dependence modeling in credit risk (Gueye et al., 7 Aug 2025). The following sections delineate key theoretical formulations, computational techniques, and applied results linked with the Cox-MT paradigm.
1. Mean Teacher Deep Cox Model in Survival Prediction
The Cox-MT implementation in (Sun et al., 28 Jan 2026) merges neural Cox regression with semi-supervised learning via the Mean Teacher protocol. The model comprises:
- Architecture: Two feedforward neural nets (student and teacher), parameterized respectively by θ and θ′; the teacher is updated by exponential moving average, θ′_t ← α·θ′_{t−1} + (1−α)·θ_t. Single-modal variants ingest high-dimensional tabular (gene expression) or image features (DINOv2 whole-slide image embeddings). Multi-modal variants tokenize features and use mutual cross-attention between modalities before a final MLP.
- Loss function: The total loss combines a supervised Cox partial-likelihood term (over uncensored, time-to-event samples) with a consistency regularization across censored and unlabeled samples:
- Supervised: \(\mathcal{L}_{\mathrm{sup}} = -\sum_{i:\,\delta_i=1}\big[h_\theta(x_i) - \log\sum_{j\in R(t_i)}\exp h_\theta(x_j)\big]\), where \(h_\theta\) is the student risk score, \(\delta_i\) the event indicator, and \(R(t_i)\) the risk set at time \(t_i\).
- Unlabeled/Censored Regularization: \(\mathcal{L}_{\mathrm{cons}} = \frac{1}{|U|}\sum_{i\in U}\big(h_\theta(x_i+\xi) - h_{\theta'}(x_i+\xi')\big)^2\), with \(U\) the set of censored and unlabeled samples and \(\xi, \xi'\) independent input perturbations.
- Combined: \(\mathcal{L} = \mathcal{L}_{\mathrm{sup}} + \lambda\,\mathcal{L}_{\mathrm{cons}}\)
with \(\lambda\) typically constant, optionally ramped up in early epochs.
Handling of censored and unlabeled data: Censored and fully unlabeled data influence learning via the Mean Teacher consistency term—teacher scores serve as soft targets, not discrete pseudo-labels. Data perturbations (noise, dropout, augmentations) yield robustness to input variability.
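As a minimal sketch of this recipe (not the authors' implementation), the three ingredients — a Cox partial-likelihood loss over labeled samples, an exponential-moving-average teacher update, and a consistency penalty between student and teacher risk scores — can each be written in a few lines. The function names `cox_partial_likelihood`, `ema_update`, and `consistency_loss` are illustrative:

```python
import numpy as np

def cox_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood for predicted log-risk scores.

    risk  : (n,) predicted log-risk scores h_theta(x_i)
    time  : (n,) observed times (event or censoring)
    event : (n,) 1 if the event was observed, 0 if censored
    """
    order = np.argsort(-time)                    # descending time
    risk, event = risk[order], event[order]
    log_cumsum = np.logaddexp.accumulate(risk)   # log-sum-exp over each risk set
    return -np.sum((risk - log_cumsum)[event == 1])

def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher EMA: theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def consistency_loss(student_scores, teacher_scores):
    """Mean squared disagreement between student and teacher risk scores."""
    return np.mean((student_scores - teacher_scores) ** 2)
```

In training, the student would minimize the combined supervised-plus-consistency objective on each batch, with the teacher refreshed by `ema_update` after every optimizer step.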
Empirical results: Cox-MT outperforms Cox-nnet across four TCGA cancer cohorts, with marked improvement as the number of unlabeled samples increases (BRCA c-index: 0.81→0.90, IBS: 0.087→0.061). Multi-modal Cox-MT leverages cross-attention to exceed single-modal performance.
This suggests Cox-MT's general recipe (student/teacher, partial-likelihood, soft regularization) may be applied to time-to-event modeling outside biology whenever labeled data is scarce and large auxiliary cohorts exist.
2. Multi-task Log-Gaussian Cox Process Model
The Cox-MT construction in (Aglietti et al., 2018) generalizes classical Cox process modeling to correlated multi-task event point-processes by:
- Model formulation:
\(\lambda_d(x) = \exp\big(\sum_{q=1}^{Q} w_{d,q}(x)\, f_q(x)\big)\)
where \(f_1, \dots, f_Q \sim \mathcal{GP}\) (shared latent functions), with the mixing coefficients treated as GP draws themselves: \(w_{d,q} \sim \mathcal{GP}\).
- Moment computations: First and second moments of \(\lambda_d(x)\) are derived in closed form via log-normal identities, allowing calculation of expected intensities and cross-task covariance; e.g., \(\mathbb{E}[\lambda_d(x)] = \exp\big(\mu_d(x) + \tfrac{1}{2}\sigma_d^2(x)\big)\) when \(\log \lambda_d(x) \sim \mathcal{N}(\mu_d(x), \sigma_d^2(x))\).
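A brief illustration of the log-normal identities involved — a sketch assuming the log-intensities are (jointly) Gaussian under the posterior; the function names are hypothetical, not from the paper:

```python
import numpy as np

def lgcp_intensity_moments(mu, var):
    """Moments of lambda(x) = exp(g(x)) for g ~ N(mu, var):
    E[lambda] = exp(mu + var/2), E[lambda^2] = exp(2*mu + 2*var)."""
    mean = np.exp(mu + 0.5 * var)
    second = np.exp(2.0 * mu + 2.0 * var)
    return mean, second - mean ** 2          # (mean, variance)

def cross_task_cov(mu_d, mu_e, var_d, var_e, cov_de):
    """Cov(lambda_d, lambda_e) when (g_d, g_e) are jointly Gaussian with
    cross-covariance cov_de: m_d * m_e * (exp(cov_de) - 1)."""
    m_d = np.exp(mu_d + 0.5 * var_d)
    m_e = np.exp(mu_e + 0.5 * var_e)
    return m_d * m_e * (np.exp(cov_de) - 1.0)
```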
Variational inference: Introduces inducing points for the latent functions and mixing weights; mean-field Gaussian variational posteriors are parameterized for scalable inference. The evidence lower bound (ELBO) enables gradient-based optimization of model parameters.
Computational efficiency: Inducing-point methods enable order-of-magnitude speedup over MCMC samplers for multivariate LGCPs, scaling to large numbers of tasks and events.
This suggests Cox-MT is suitable for joint modeling of spatial-temporal phenomena across related event types, with direct extension to coregionalization and Bayesian hierarchical inference.
3. Multivariate Generalized Cox Processes for Dependent Defaults
The Cox-MT framework in (Gueye et al., 7 Aug 2025) addresses dependent default timing in credit risk via a multivariate construction that encompasses both common and idiosyncratic shocks:
- Setup: Default times
\(\tau_i = \inf\{t \ge 0 : \Lambda_i(t) \ge E_i\}\)
with \(\Lambda_i\) an adapted, increasing càdlàg process (typically Lévy/compound Poisson/subordinator or shot-noise) and \(E_i \sim \mathrm{Exp}(1)\) independent of \(\Lambda_i\).
- Azéma supermartingale and compensator representation:
\(G_i(t) = \mathbb{P}(\tau_i > t \mid \mathcal{F}_t) = e^{-\Lambda_i(t)}\)
Under deterministic compensator assumptions, \(G_i(t) = e^{-\Lambda_i(t)}\) is nonrandom, yielding \(\mathbb{P}(\tau_i > t) = \mathbb{E}\big[e^{-\Lambda_i(t)}\big]\) in the general stochastic case.
- Construction of intensities: Each compensator is a sum of continuous and jump-driven parts,
\(\Lambda_i(t) = \int_0^t \lambda_i(s)\, ds + Z_i(t)\)
where the jump component \(Z_i\) can encode idiosyncratic or systemic jumps.
- Joint survival probabilities:
\(\mathbb{P}(\tau_1 > t_1, \dots, \tau_n > t_n) = \mathbb{E}\big[\exp\big(-\textstyle\sum_{i=1}^{n} \Lambda_i(t_i)\big)\big]\)
with Möbius-inversion weights derived from the underlying jump parameters.
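A hedged Monte Carlo sketch of this construction, assuming a single shared compound-Poisson factor Z(t) with exponential jumps and constant continuous rates; all names, loadings `beta`, and parameter choices are illustrative, not the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_common_shock(t, rate, jump_mean, n_paths):
    """Compound Poisson subordinator Z(t): Poisson(rate*t) many Exp(jump_mean) jumps."""
    n_jumps = rng.poisson(rate * t, size=n_paths)
    return np.array([rng.exponential(jump_mean, k).sum() for k in n_jumps])

def joint_survival_mc(t, lam, beta, rate, jump_mean, n_paths=50_000):
    """P(tau_1 > t, ..., tau_n > t) = E[exp(-sum_i Lambda_i(t))], where
    Lambda_i(t) = lam_i * t + beta_i * Z(t) shares one jump-driven factor Z.
    Conditional on Z the defaults are independent, so averaging the
    conditional survival over simulated Z paths gives the joint probability."""
    z = simulate_common_shock(t, rate, jump_mean, n_paths)
    total = np.sum(lam) * t + np.sum(beta) * z
    return np.exp(-total).mean()
```

With all loadings `beta` set to zero the estimator collapses to the independent-Cox survival \(\exp(-\sum_i \lambda_i t)\), recovering the nested submodel noted below.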
Special cases: The construction recovers independent Cox processes, common-factor models, and pure compound Poisson cases as nested submodels.
Extension: Allows superposition of continuous Cox intensities and jump-driven default processes: the survival function factorizes over continuous and jump components.
Calibration and implementation: Analytical tractability (closed-form survival probabilities, Laplace transforms) facilitates calibration to market data and efficient numeric simulation (Monte Carlo of jump times, Fourier-Laplace inversion for survival probabilities).
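For the compound-Poisson case, the Laplace transform is indeed available in closed form, which is what makes calibration cheap. The following single-name sketch (illustrative parameterization, not the paper's: one constant continuous rate `lam`, one jump factor with loading `beta` and exponential jump sizes) evaluates P(τ > t) analytically:

```python
import numpy as np

def survival_closed_form(t, lam, beta, rate, jump_mean):
    """P(tau > t) = exp(-lam*t) * E[exp(-beta * Z(t))] for a compound Poisson
    subordinator Z with intensity `rate` and Exp(jump_mean) jump sizes.
    The jump-size Laplace transform is E[exp(-beta*J)] = 1 / (1 + beta*jump_mean),
    giving E[exp(-beta*Z(t))] = exp(-rate * t * (1 - 1/(1 + beta*jump_mean)))."""
    laplace_jump = 1.0 / (1.0 + beta * jump_mean)
    return np.exp(-lam * t) * np.exp(-rate * t * (1.0 - laplace_jump))
```

A closed form like this can serve as the target in market calibration and as an exact check on Monte Carlo simulation of the same model.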
This suggests Cox-MT enables unified modeling of abrupt (jump-driven) and gradual (continuous) sources of systemic and individual default risk, bridging structural and reduced-form credit risk models.
4. Calibration, Computational Implementation, and Efficiency
Across instantiations, Cox-MT models leverage analytical closed forms and scalable variational or Monte Carlo schemes:
Calibration: Parameters (e.g., continuous rates, Lévy exponents, cross-attention fusion layers, GP kernel hyperparameters) are fitted via maximum-likelihood, moment-matching, or gradient-based optimization. Marginal survival curves can be matched precisely, and joint dependencies tuned via latent function or noise kernel selection (Gueye et al., 7 Aug 2025, Aglietti et al., 2018).
Implementation:
- Deep Cox-MT: Adam optimizer, cross-validation over learning rates, robust to input noise/dropout (Sun et al., 28 Jan 2026).
- Multi-task Cox processes: Inducing-point selection by k-means, jitter for numerical stability, batch optimization of ELBO (Aglietti et al., 2018).
- Default modeling: Simulation of jump times and sizes, fast convolution for shot-noise components, Laplace transform inversion for multi-period survival (Gueye et al., 7 Aug 2025).
- Computational scaling: Variational inference with M inducing points (M ≪ N events) drops computational complexity from \(\mathcal{O}(N^3)\) to \(\mathcal{O}(N M^2)\) per ELBO evaluation (Aglietti et al., 2018).
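The saving comes from only ever factorizing the M × M inducing-point Gram matrix rather than the full N × N one. A minimal Nyström-style illustration of that structure (a sketch, not the paper's code; `rbf` and `nystrom_approx` are hypothetical names):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def nystrom_approx(x, z, jitter=1e-8):
    """Low-rank approximation K_nn ~ K_nm K_mm^{-1} K_mn. The only matrix
    factorized is the M x M inducing-point Gram K_mm (jittered for stability),
    so the cost is O(N M^2 + M^3) rather than the O(N^3) of handling K_nn."""
    k_mm = rbf(z, z) + jitter * np.eye(len(z))
    k_nm = rbf(x, z)
    return k_nm @ np.linalg.solve(k_mm, k_nm.T)
```

When the inducing inputs coincide with the data inputs the approximation is exact (up to jitter); in practice z is chosen much smaller than x, e.g. by k-means as noted above.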
5. Applications and Empirical Results
The Cox-MT framework has demonstrated empirical strength in diverse contexts:
- Survival analysis: Cancer prognosis prediction, with semi-supervised gains (c-index improvement up to +0.09 to +0.18, IBS reductions 0.038–0.082) and superior multi-modal fusion (Sun et al., 28 Jan 2026).
- Spatial-temporal event modeling: Experiments on spatial crime datasets reveal Cox-MT achieves 10–100× speedup (and comparable or higher held-out log-likelihood) versus full-factorized LGCP or coregionalization models (Aglietti et al., 2018).
- Credit risk: Closed-form default probability and tranche price computation under the Cox-MT model with joint Lévy and shot-noise factors (Gueye et al., 7 Aug 2025).
6. Generalization and Transferability
The Cox-MT paradigm is structurally transferable:
- The Mean Teacher approach (deep Cox-MT) may be ported to any time-to-event domain, including engineering, medicine, and finance, subject to availability of large unlabeled or censored cohorts (Sun et al., 28 Jan 2026).
- Multi-task Cox process constructions extend naturally to ecological, epidemiological, and network event prediction, accommodating arbitrary inter-process dependence via GP priors (Aglietti et al., 2018).
- The generalized multivariate Cox process formalism admits unification of classical and jump-driven default/event models, and recovery of decorrelated or structured dependencies as special cases (Gueye et al., 7 Aug 2025).
A plausible implication is that Cox-MT models both unify the theoretical foundations of event-time modeling and provide a computationally tractable framework for learning in high-dimensional, correlated, and semi-supervised regimes.