
Cox-MT: Unified Survival & Event Modeling

Updated 4 February 2026
  • The Cox-MT model is a family of frameworks that generalizes traditional Cox processes by integrating survival analysis, multi-task event modeling, and dependent point process inference.
  • It combines deep neural Cox regression with a Mean Teacher architecture, using both supervised and consistency-based losses to robustly learn from censored and unlabeled data.
  • The model extends to log-Gaussian Cox processes and credit risk default modeling, enabling scalable variational inference and closed-form computations for diverse applications.

The Cox-MT model refers to a family of advanced Cox process and proportional hazards frameworks that generalize survival analysis, multi-task event modeling, and dependent point process inference. The term "Cox-MT" appears in multiple recent works, denoting (1) deep semi-supervised Cox proportional hazards models utilizing a Mean Teacher architecture for survival prediction (Sun et al., 28 Jan 2026), (2) multi-task log-Gaussian Cox process constructions sharing latent functions and encoding inter-task correlations (Aglietti et al., 2018), and (3) generalized multivariate Cox processes enabling complex default dependence modeling in credit risk (Gueye et al., 7 Aug 2025). The following sections delineate key theoretical formulations, computational techniques, and applied results linked with the Cox-MT paradigm.

1. Mean Teacher Deep Cox Model in Survival Prediction

The Cox-MT implementation in (Sun et al., 28 Jan 2026) merges neural Cox regression with semi-supervised learning via the Mean Teacher protocol. The model comprises:

  • Architecture: Two feedforward neural nets (student and teacher), parameterized respectively by θ and θ′; the teacher is updated by exponential moving average, θ′_t ← α θ′_{t−1} + (1−α) θ_t. Single-modal variants ingest high-dimensional tabular (gene expression) or image features (DINOv2 whole-slide image embeddings). Multi-modal variants tokenize features and apply mutual cross-attention between modalities before a final MLP.
  • Loss function: The total loss combines a supervised Cox partial-likelihood term (for uncensored, time-to-event samples) with a consistency regularizer over censored and unlabeled samples:
    • Supervised:

    L_{\rm Cox}(\theta) = -\frac{1}{|D_e|}\sum_{i\in D_e} \Bigl[ f_\theta(x_i+\eta_i) - \log\sum_{j\in R(t_i)} e^{f_\theta(x_j+\eta_j)} \Bigr]

    • Unlabeled/censored regularization:

    L_{\rm cons}(\theta,\theta') = \frac{1}{|D_c\cup D_u|}\sum_{i\in D_c\cup D_u} \bigl[ f_\theta(x_i+\eta_i) - f_{\theta'}(x_i+\eta'_i) \bigr]^2

    • Combined:

    L_{\rm total} = L_{\rm Cox} + \lambda(t)\,L_{\rm cons}

    with \lambda(t) typically a constant w \in [0.1, 3], optionally ramped up in early epochs.

  • Handling of censored and unlabeled data: Censored and fully unlabeled data influence learning via the Mean Teacher consistency term—teacher scores serve as soft targets, not discrete pseudo-labels. Data perturbations (noise, dropout, augmentations) yield robustness to input variability.

  • Empirical results: Cox-MT outperforms Cox-nnet across four TCGA cancer cohorts, with marked improvement as the number of unlabeled samples increases (BRCA c-index: 0.81→0.90, IBS: 0.087→0.061). Multi-modal Cox-MT leverages cross-attention to exceed single-modal performance.

This suggests Cox-MT's general recipe (student/teacher, partial-likelihood, soft regularization) may be applied to time-to-event modeling outside biology whenever labeled data is scarce and large auxiliary cohorts exist.
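The training objective above can be sketched in plain NumPy; the function names and toy quantities below are illustrative, not taken from the paper's code. The key steps are the partial likelihood over risk sets R(t_i), the student/teacher consistency penalty, and the exponential-moving-average teacher update.

```python
import numpy as np

def cox_partial_likelihood(scores, times, events):
    """Negative Cox partial log-likelihood over event (uncensored) samples.

    scores : (N,) risk scores f_theta(x_i)
    times  : (N,) event/censoring times
    events : (N,) boolean, True if the event was observed
    """
    order = np.argsort(-times)            # sort by descending time
    s, e = scores[order], events[order]
    # log-sum-exp over the risk set R(t_i) = {j : t_j >= t_i}:
    # after descending sort, the risk set is the prefix up to i
    log_risk = np.logaddexp.accumulate(s)
    return -np.mean((s - log_risk)[e])

def consistency_loss(student_scores, teacher_scores):
    """Mean-squared consistency between student and teacher predictions."""
    return np.mean((student_scores - teacher_scores) ** 2)

def ema_update(theta_teacher, theta_student, alpha=0.99):
    """Teacher EMA: theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""
    return {k: alpha * theta_teacher[k] + (1 - alpha) * theta_student[k]
            for k in theta_teacher}
```

In a full training loop, the supervised term would be evaluated on event samples, the consistency term on noise-perturbed censored/unlabeled samples, and `ema_update` applied after each optimizer step.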

2. Multi-task Log-Gaussian Cox Process Model

The Cox-MT construction in (Aglietti et al., 2018) generalizes classical Cox process modeling to correlated multi-task event point-processes by:

  • Model formulation:

f_p(x) = \sum_{q=1}^Q A_{pq}\,u_q(x), \qquad \lambda_p(x) = \exp(f_p(x))

where u_q(\cdot) \sim \mathrm{GP}(0, k_q) are shared latent functions, and the mixing coefficients A_{pq} are themselves treated as GP draws: A_q(p) \sim \mathrm{GP}(0, k_A).

  • Moment computations: First and second moments of λp(x)\lambda_p(x) are derived in closed form, allowing calculation of expected intensities and cross-task covariance:

E[\lambda_p(x)] = \exp\Bigl( \tfrac12 \sum_q k_A(p,p)\,k_q(x,x) \Bigr)

\mathrm{Cov}[\lambda_p(x), \lambda_{p'}(x')] = \exp\bigl(\tfrac12(v_{pp}+v_{p'p'}+2v_{pp'})\bigr) - \exp\bigl(\tfrac12 v_{pp}\bigr)\exp\bigl(\tfrac12 v_{p'p'}\bigr)

  • Variational inference: Introduces inducing points for uqu_q and AqA_q; mean-field Gaussian posteriors parametrized for scalable inference. The evidence lower bound (ELBO) enables gradient optimization of model parameters.

  • Computational efficiency: Inducing-point methods enable order-of-magnitude speedups over MCMC samplers for multivariate LGCPs, scaling to P, N > 50 tasks and events.

This suggests Cox-MT is suitable for joint modeling of spatial-temporal phenomena across related event types, with direct extension to coregionalization and Bayesian hierarchical inference.
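As a minimal numerical check of the moment formula, the sketch below fixes the mixing weights A_{pq} to constants (the standard linear-model-of-coregionalization special case, rather than the GP prior on A used in the full model), in which case the expected intensity reduces to the lognormal mean exp(½ Σ_q A_{pq}² k_q(x,x)). All variable names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

Q, P = 3, 2
# Fixed mixing weights A[p, q] (constants here; the full model also places
# a GP prior on them).
A = np.array([[0.5, -0.3, 0.8],
              [0.2, 0.7, -0.4]])
kq = np.array([1.0, 0.5, 2.0])   # prior variances k_q(x, x) at one point x

# Monte Carlo: sample shared latents u_q(x) ~ N(0, k_q(x, x)),
# mix into f_p(x) = sum_q A[p, q] u_q(x), exponentiate to get lambda_p(x).
n = 200_000
u = rng.standard_normal((n, Q)) * np.sqrt(kq)
lam = np.exp(u @ A.T)            # (n, P) sampled intensities

# Closed form for fixed A: E[lambda_p(x)] = exp(0.5 * sum_q A_pq^2 k_q(x, x))
closed = np.exp(0.5 * (A**2 @ kq))
print(lam.mean(axis=0), closed)  # Monte Carlo mean vs. closed form
```

The Monte Carlo estimate agrees with the closed form to within sampling error, illustrating why log-Gaussian constructions keep moments tractable.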

3. Multivariate Generalized Cox Processes for Dependent Defaults

The Cox-MT framework in (Gueye et al., 7 Aug 2025) addresses dependent default timing in credit risk via a multivariate construction that encompasses both common and idiosyncratic shocks:

  • Setup: Default times

\tau_i := \inf\{t \ge 0 : K^i_t \ge \Theta^i\}

with K^i_t an adapted, increasing càdlàg process (typically a Lévy subordinator, compound Poisson, or shot-noise process), and \Theta^i \sim \mathrm{Exp}(1) independent.

  • Azéma supermartingale and compensator representation:

Z^i_t = e^{-K^i_t} = \eta^i_t\, e^{-\Lambda^i_t}

Under deterministic compensator assumptions, \eta^i \equiv 1, yielding Z^i_t = e^{-\Lambda^i_t} with \Lambda^i_t = K^i_t.

  • Construction of intensities: Each compensator is a sum of continuous and jump-driven parts,

\Lambda^i(t) = \int_0^t \lambda^c_i(s)\,ds + \sum_j X_{i,j}(t)

where the X_{i,j} can encode idiosyncratic or systemic jumps.

  • Joint survival probabilities:

S(t_1,\dots,t_d) = \exp\Bigl\{ -\sum_{\emptyset\neq J\subseteq\{1,\dots,d\}} \gamma^J \max_{i\in J} t_i \Bigr\}

with Möbius-inversion weights \gamma^J derived from the underlying jump parameters.

  • Special cases: Construction recovers independent Cox processes, common-factors, and pure compound Poisson cases as nested submodels.

  • Extension: Allows superposition of continuous Cox intensities and jump-driven default processes: the survival function factorizes over continuous and jump components.

  • Calibration and implementation: Analytical tractability (closed-form survival probabilities, Laplace transforms) facilitates calibration to market data and efficient numeric simulation (Monte Carlo of jump times, Fourier-Laplace inversion for survival probabilities).

This suggests Cox-MT enables unified modeling of abrupt (jump-driven) and gradual (continuous) sources of systemic and individual default risk, bridging structural and reduced-form credit risk models.
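A minimal Monte Carlo sketch of the threshold construction above, assuming a drift-plus-compound-Poisson compensator with exponentially distributed jump sizes (one convenient special case; all parameter values are illustrative): default occurs before the horizon T iff K_T ≥ Θ, so the survival probability P(τ > T) = E[e^{-K_T}] should match the closed form from the Laplace exponent.

```python
import numpy as np

rng = np.random.default_rng(1)

lam_c, r, m, T = 0.05, 0.5, 0.4, 5.0   # drift, jump rate, mean jump size, horizon
n = 50_000

# Compound-Poisson-plus-drift compensator: K_T = lam_c*T + sum of N_T jumps,
# N_T ~ Poisson(r*T), jump sizes ~ Exp with mean m.
N = rng.poisson(r * T, size=n)
jumps = np.array([rng.exponential(m, size=k).sum() for k in N])
K_T = lam_c * T + jumps

# Default before T iff K_T >= Theta with Theta ~ Exp(1), so
# P(tau > T) = P(Theta > K_T) = E[exp(-K_T)].
mc_survival = np.exp(-K_T).mean()

# Closed form via the Laplace exponent: E[exp(-K_T)]
#   = exp(-lam_c*T) * exp(-r*T*(1 - E[exp(-J)])), with E[exp(-J)] = 1/(1+m).
closed = np.exp(-lam_c * T - r * T * (m / (1 + m)))
print(mc_survival, closed)  # the two estimates should agree closely
```

Correlated defaults would be obtained by sharing common jump components across the K^i, which is exactly what the Möbius-weighted joint survival function encodes.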

4. Calibration, Computational Implementation, and Efficiency

Across instantiations, Cox-MT models leverage analytical closed forms and scalable variational or Monte Carlo schemes:

  • Calibration: Parameters (e.g., continuous rates, Lévy exponents, cross-attention fusion layers, GP kernel hyperparameters) are fitted via maximum-likelihood, moment-matching, or gradient-based optimization. Marginal survival curves can be matched precisely, and joint dependencies tuned via latent function or noise kernel selection (Gueye et al., 7 Aug 2025, Aglietti et al., 2018).

  • Implementation:

    • Deep Cox-MT: Adam optimizer, cross-validation over learning rates, robust to input noise/dropout (Sun et al., 28 Jan 2026).
    • Multi-task Cox processes: Inducing-point selection by k-means, jitter for numerical stability, batch optimization of ELBO (Aglietti et al., 2018).
    • Default modeling: Simulation of jump times and sizes, fast convolution for shot-noise components, Laplace transform inversion for multi-period survival (Gueye et al., 7 Aug 2025).
  • Computational scaling: Variational inference with inducing points reduces computational complexity from O((PN)^3) to O(Q(M^3 + M'^3 + PM^2 + PM'^2)) per ELBO evaluation (Aglietti et al., 2018).
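Plugging in illustrative sizes makes the scale of the saving concrete (here M and M′ denote inducing-point counts for the latent functions u_q and mixing functions A_q; the numbers are hypothetical, not from the paper):

```python
# Naive joint-covariance cost vs. per-ELBO-evaluation cost with inducing points.
P, N = 50, 50          # tasks and events per task
Q, M, Mp = 3, 100, 20  # latent functions, inducing points for u_q and A_q

naive = (P * N) ** 3                            # O((PN)^3) factorization
vi = Q * (M**3 + Mp**3 + P * M**2 + P * Mp**2)  # inducing-point ELBO cost
print(naive, vi, naive / vi)  # speedup factor of several thousand
```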

5. Applications and Empirical Results

The Cox-MT framework has demonstrated empirical strength in diverse contexts:

  • Survival analysis: Cancer prognosis prediction, with semi-supervised gains (c-index improvements of +0.09 to +0.18, IBS reductions of 0.038–0.082) and superior multi-modal fusion (Sun et al., 28 Jan 2026).
  • Spatial-temporal event modeling: Experiments on spatial crime datasets reveal Cox-MT achieves 10–100× speedup (and comparable or higher held-out log-likelihood) versus full-factorized LGCP or coregionalization models (Aglietti et al., 2018).
  • Credit risk: Closed-form default probability and tranche price computation under the Cox-MT model with joint Lévy and shot-noise factors (Gueye et al., 7 Aug 2025).

6. Generalization and Transferability

The Cox-MT paradigm is structurally transferable:

  • The Mean Teacher approach (deep Cox-MT) may be ported to any time-to-event domain, including engineering, medicine, and finance, subject to availability of large unlabeled or censored cohorts (Sun et al., 28 Jan 2026).
  • Multi-task Cox process constructions extend naturally to ecological, epidemiological, and network event prediction, accommodating arbitrary inter-process dependence via GP priors (Aglietti et al., 2018).
  • The generalized multivariate Cox process formalism admits unification of classical and jump-driven default/event models, and recovery of decorrelated or structured dependencies as special cases (Gueye et al., 7 Aug 2025).

A plausible implication is that Cox-MT models both unify the theoretical foundations of event-time modeling and provide a computationally tractable framework for learning in high-dimensional, correlated, and semi-supervised regimes.
