
Single-Index Semiparametric Cure Models

Updated 25 January 2026
  • The model extends traditional mixture cure frameworks by integrating a single-index structure for incidence and a semiparametric transformation for latency.
  • It employs estimation techniques like profile likelihood with isotonic regression and multi-layer EM algorithms to robustly handle right and interval censoring.
  • Applications in melanoma, Alzheimer’s, and HIV studies demonstrate reduced bias, improved efficiency, and more realistic cure fraction estimates compared to standard methods.

A single-index semiparametric transformation cure model extends the conventional mixture cure framework in two ways: a single-index structure models covariate effects in the incidence component (the probability of being “uncured”), and a semiparametric transformation model describes the latency (survival among the uncured). These models relax restrictive parametric assumptions, particularly on how covariates affect the cure fraction, while remaining interpretable and applicable across censoring scenarios. The framework can incorporate monotonicity constraints on the incidence link, accommodates interval and right censoring, and extends to multivariate and competing risks contexts (Musta et al., 2022, Huang et al., 18 Jan 2026).

1. Model Formulation and Structure

Let $T$ be a (possibly infinite) event time subject to censoring, and $C$ an independent censoring variable. Observed data consist of $Y = \min(T, C)$ and $\Delta = 1\{ T \leq C \}$. A latent “cure” indicator, $B = 1\{ T < \infty \}$, identifies uncured subjects. The general mixture cure model decomposes the conditional survival function:

$$S(t \mid X, Z) = 1 - \pi(X) + \pi(X) S_u(t \mid Z)$$

where:

  • $\pi(X) = P(B = 1 \mid X)$ is the incidence (“uncure” probability), $X$ is a vector of incidence covariates,
  • $S_u(t \mid Z)$ is the proper survival function for the uncured, $Z$ is a vector of latency covariates.

Single-index structure (incidence):

$$\pi(X) = g(\gamma^T X)$$

with an unknown nondecreasing link $g: \mathbb{R} \rightarrow (0,1)$ and constraint $\| \gamma \| = 1$.
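
The role of the norm constraint can be seen in a small sketch: because $g$ is unrestricted apart from monotonicity, any rescaling of $\gamma$ could be absorbed into $g$, so only the direction of $\gamma$ is identified. The Python below is a minimal illustration, not the cited authors' code; the logistic function is only a stand-in for the unknown link, and the names `normalize_index`, `incidence`, and `logistic_link` are hypothetical.

```python
import numpy as np

def normalize_index(gamma):
    """Rescale gamma to unit Euclidean norm (the constraint ||gamma|| = 1)."""
    gamma = np.asarray(gamma, dtype=float)
    norm = np.linalg.norm(gamma)
    if norm == 0.0:
        raise ValueError("gamma must be nonzero")
    return gamma / norm

def incidence(X, gamma, link):
    """Uncure probability pi(X) = g(gamma^T X) for a user-supplied link g."""
    index = np.asarray(X, dtype=float) @ normalize_index(gamma)
    return link(index)

def logistic_link(u):
    # Assumption: stand-in for the unknown g; any nondecreasing map
    # from the real line into (0, 1) could be substituted here.
    return 1.0 / (1.0 + np.exp(-u))

# Three subjects with two incidence covariates each (made-up values).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
pi = incidence(X, gamma=[3.0, 4.0], link=logistic_link)
print(pi)
```

Passing `gamma=[3.0, 4.0]` or `gamma=[0.6, 0.8]` gives the same probabilities, which is exactly why the scale has to be fixed by convention.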

Latency (semiparametric transformation):

  • Cox proportional hazards (PH): $\lambda_u(t \mid Z) = \lambda_0(t) \exp( \beta^T Z )$ with $S_u(t \mid Z) = \exp \{ -\Lambda_0(t) e^{\beta^T Z} \}$ (Musta et al., 2022).
  • General transformation: $S_u(t \mid Z) = \exp\bigl( - G\{ \exp(\beta^T Z) \Lambda(t) \} \bigr)$, with $G(\cdot)$ a specified transformation (gamma-frailty or other links) and unknown monotone $\Lambda$ (Huang et al., 18 Jan 2026).
  • In competing risks contexts, each cause-specific cumulative incidence function is linked via $H_k( F_k(t \mid X) ) = h_k(t) + X^T\beta_k$ (Kattumannil et al., 2020).
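
To make the latency formulas concrete, the following Python sketch evaluates $S_u$ and the resulting mixture survival on a time grid. It is illustrative only: the Weibull-type cumulative baseline and all numeric values are made up, and the logarithmic family $G(x) = \log(1 + rx)/r$ is used as one example of a transformation induced by a gamma frailty with mean 1 and variance $r$; whether the cited work uses this exact parametrization is an assumption.

```python
import numpy as np

def latency_survival(t, z, beta, cum_baseline, G=lambda x: x):
    """S_u(t | z) = exp(-G(exp(beta^T z) * Lambda(t))); G(x) = x gives Cox PH."""
    risk = np.exp(np.dot(z, beta))
    return np.exp(-G(risk * cum_baseline(t)))

def mixture_survival(t, pi, z, beta, cum_baseline, G=lambda x: x):
    """S(t | x, z) = 1 - pi + pi * S_u(t | z)."""
    return 1.0 - pi + pi * latency_survival(t, z, beta, cum_baseline, G)

def weibull_cum_hazard(t):
    # Hypothetical baseline Lambda(t) = (t / 2)^1.5, chosen only for illustration.
    return (np.asarray(t, dtype=float) / 2.0) ** 1.5

def log_transform(x, r=1.0):
    # Assumed transformation family: G(x) = log(1 + r x) / r.
    return np.log1p(r * x) / r

t_grid = np.array([0.0, 1.0, 5.0, 50.0])
z, beta = np.array([1.0]), np.array([0.5])

s_ph = mixture_survival(t_grid, 0.7, z, beta, weibull_cum_hazard)
s_log = mixture_survival(t_grid, 0.7, z, beta, weibull_cum_hazard, G=log_transform)
print(s_ph)
print(s_log)
```

With an uncure probability of 0.7, both curves start at 1 and level off near the cure fraction 0.3 instead of decaying to zero, which is the defining feature of a cure model.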

2. Estimation Methodologies

Profile Likelihood and Isotonic Regression (Right-censored):

Estimation proceeds by maximizing the observed-data log-likelihood:

$$\ell_n(\gamma, \beta, \Lambda, g) = \sum_{i=1}^n \left[ \Delta_i \{ \log g(\gamma^T x_i) + \log f_u(y_i \mid z_i) \} + (1-\Delta_i) \log \{ 1 - g(\gamma^T x_i) + g(\gamma^T x_i) S_u(y_i \mid z_i) \} \right]$$

where $f_u(\cdot \mid z)$ denotes the density corresponding to $S_u(\cdot \mid z)$. The function $g$ is estimated under a monotonicity constraint using weighted isotonic regression. For fixed $(\gamma, \beta, \Lambda)$, the profile likelihood maximizer $\hat{g}$ is computed and then plugged into a maximization over $(\gamma, \beta, \Lambda)$ (Musta et al., 2022).
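
As a rough illustration of the log-likelihood above, the sketch below evaluates it for right-censored data under a Cox PH latency, where $f_u = \lambda_0 e^{\beta^T z} S_u$. It treats the link and baseline hazard as known functions, which is a simplification: in the cited approach both are estimated. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def observed_loglik(y, delta, x, z, gamma, beta, link, base_hazard, cum_base_hazard):
    """Right-censored mixture cure log-likelihood with Cox PH latency."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    pi = link(np.asarray(x, dtype=float) @ gamma)       # g(gamma^T x_i)
    risk = np.exp(np.asarray(z, dtype=float) @ beta)    # exp(beta^T z_i)
    s_u = np.exp(-cum_base_hazard(y) * risk)            # S_u(y_i | z_i)
    f_u = base_hazard(y) * risk * s_u                   # f_u(y_i | z_i)
    event_term = np.log(pi) + np.log(f_u)               # observed events
    censored_term = np.log(1.0 - pi + pi * s_u)         # cured or not yet failed
    return np.sum(delta * event_term + (1.0 - delta) * censored_term)

# Toy data: one event and one censored subject, all covariates zero.
# Assumptions: logistic link and unit exponential baseline, for illustration only.
ll = observed_loglik(
    y=[1.0, 1.0],
    delta=[1, 0],
    x=np.zeros((2, 2)),
    z=np.zeros((2, 1)),
    gamma=np.array([1.0, 0.0]),
    beta=np.array([0.3]),
    link=lambda u: 1.0 / (1.0 + np.exp(-u)),
    base_hazard=lambda t: np.ones_like(t),
    cum_base_hazard=lambda t: t,
)
print(ll)
```

In this toy case the incidence is 0.5 for both subjects, so the value equals $\log 0.5 - 1 + \log(0.5 + 0.5 e^{-1})$, which can be checked by hand against the formula.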

EM Algorithms with Data Augmentation (Interval-censored):

For interval-censored data, a four-layer EM approach is implemented:

  • Layer 1: latent indicators $B_i$,
  • Layer 2: gamma frailties $\xi_i$,
  • Layer 3: truncated Poisson variables for censoring,
  • Layer 4: decomposition for spline basis coefficients.

Kernel smoothing estimates $g(\alpha^T X_i)$, where $\alpha$ is the single-index coefficient vector (written $\gamma$ in the incidence model above); I-splines approximate $\Lambda(t)$. The E-step computes posteriors for the latent quantities; the M-step updates model parameters subject to identifiability and monotonicity constraints (Huang et al., 18 Jan 2026).
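
The first layer and the kernel step can be sketched in a few lines of Python. For interval-censored data, a subject whose event is bracketed by a finite interval is known to be uncured, while a subject last seen event-free at time $L_i$ has posterior uncure probability $\pi_i S_u(L_i) / \{1 - \pi_i + \pi_i S_u(L_i)\}$. The sketch below then smooths these posterior weights against the index with a Nadaraya-Watson estimator. The Gaussian kernel, the bandwidth constant, and the function names are assumptions made for illustration; the frailty, Poisson, and spline layers of the cited algorithm are not shown.

```python
import numpy as np

def uncure_posterior(pi, s_u_left, right_is_infinite):
    """E-step for B_i: 1 if the event was bracketed, else the posterior uncure probability."""
    pi = np.asarray(pi, dtype=float)
    s_u_left = np.asarray(s_u_left, dtype=float)
    w = pi * s_u_left / (1.0 - pi + pi * s_u_left)
    return np.where(right_is_infinite, w, 1.0)

def kernel_link(index_eval, index_obs, weights, bandwidth):
    """Nadaraya-Watson estimate of g at index_eval (Gaussian kernel, an assumption)."""
    diff = np.asarray(index_eval, dtype=float)[:, None] - np.asarray(index_obs, dtype=float)[None, :]
    k = np.exp(-0.5 * (diff / bandwidth) ** 2)
    return (k @ np.asarray(weights, dtype=float)) / k.sum(axis=1)

# Made-up current values of pi_i and S_u(L_i) for three subjects.
pi_current = np.array([0.6, 0.6, 0.6])
s_u_left = np.array([0.5, 0.5, 1.0])
right_is_infinite = np.array([True, False, True])

w = uncure_posterior(pi_current, s_u_left, right_is_infinite)

index_obs = np.array([-1.0, 0.0, 1.0])
bandwidth = len(w) ** (-1.0 / 5.0)  # n^(-1/5) scaling; the constant 1 is an assumption
g_hat = kernel_link(index_obs, index_obs, w, bandwidth)
print(w, g_hat)
```

Because the kernel estimate is a weighted average of the posterior weights, it always stays inside $[0, 1]$, but it is not forced to be monotone in the index.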

Counting Process Martingale-based Estimation (Competing Risks):

Estimating equations leverage counting processes for cumulative incidence in the presence of a cure fraction:

$$U_{\beta_k}(\beta_k, h_k) = \sum_{i=1}^n \int_0^\infty X_i \left[ dN_{ik}(u) - Y_i(u)\, d\Lambda_{\varepsilon_k}\bigl(h_k(u) + X_i^T\beta_k\bigr) \right] = 0$$

with $N_{ik}(t)$ the cause-$k$ counting process and $Y_i(t)$ the at-risk process (Kattumannil et al., 2020).

3. Asymptotic Theory

Consistency and asymptotic normality are established under regularity conditions: bounded support, identifiability, appropriate smoothness for $g$ and $\Lambda$, and empirical process entropy bounds (Musta et al., 2022, Huang et al., 18 Jan 2026). Key properties include:

  • Uniqueness and continuity of population maximizer $g_{0,\theta}$ in the monotone class.
  • Uniform convergence of estimated $g$ to $g_{0,\theta}$ in $L^2$.
  • Parameter estimators $\widehat{\gamma}$, $\widehat{\beta}$, $\widehat{\Lambda}$ are consistent and asymptotically normal ($O_p(n^{-1/2})$ rate).
  • For nonparametric $g$, convergence rates combine kernel bandwidth and sample size: $O_p((nh)^{-1/2} + h^2)$ (Huang et al., 18 Jan 2026).
  • Exact variance estimators derived for regression coefficients via empirical analogues.

4. Practical Implementation, Computation, and Extensions

The estimation algorithms are computationally scalable:

  • Each EM iteration in right-censored models involves one isotonic regression ($O(n)$), one Cox fit ($O(n \log n)$), and low-dimensional optimization for $\gamma$ (Musta et al., 2022).
  • For interval censoring, the four-layer EM method with kernel and spline steps (SMCI) converges in under a minute for moderate $n, d$ (Huang et al., 18 Jan 2026).
  • Bandwidth choice for kernel smoothing ($h \propto n^{-1/5}$) achieves robust performance.
  • Extension to alternative latency models (AFT, additive hazards, general monotone transformations) is direct: the same EM and isotonic steps apply (Musta et al., 2022).
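
The isotonic regression step in the list above is typically computed with the pool-adjacent-violators algorithm, which runs in linear time once observations are sorted by the index. The Python sketch below is a generic weighted version, not the cited authors' implementation; which responses and weights enter this step in the profile-likelihood procedure is specific to that work, and the toy values here are made up.

```python
import numpy as np

def weighted_pava(values, weights):
    """Weighted least-squares nondecreasing fit via pool-adjacent-violators.

    `values` must already be ordered by increasing index.
    """
    means, wts, sizes = [], [], []
    for v, w in zip(values, weights):
        means.append(float(v))
        wts.append(float(w))
        sizes.append(1)
        # Merge backwards while the last two blocks violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            w_tot = wts[-2] + wts[-1]
            pooled = (wts[-2] * means[-2] + wts[-1] * means[-1]) / w_tot
            size = sizes[-2] + sizes[-1]
            means[-2:], wts[-2:], sizes[-2:] = [pooled], [w_tot], [size]
    # Expand block means back to one fitted value per observation.
    return np.repeat(means, sizes)

# Toy example: responses (e.g. current uncure weights) observed at four index values.
index = np.array([0.3, -1.2, 0.8, 0.1])
response = np.array([0.7, 0.2, 0.6, 0.4])
order = np.argsort(index)
g_sorted = weighted_pava(response[order], np.ones(len(index)))
print(index[order], g_sorted)
```

Each observation is pushed once and each merge removes one block, so the loop does a linear amount of work; the sort by index is the only superlinear part.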

Competing risks:

Cure proportions are encoded in the sum of $K$ cause-specific transformation models; no separate logistic or multinomial cure submodel is needed. The overall cure fraction is derived directly from the limits of the baseline transformations at infinity (Kattumannil et al., 2020).
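
The limit argument can be written out explicitly. The derivation below is a sketch that assumes each link $H_k$ is strictly increasing and invertible and that each $h_k(t)$ converges to a finite limit $h_k(\infty)$ as $t \to \infty$; these conditions are stated here for illustration rather than taken from the cited paper.

```latex
% Invert the cause-specific transformation model H_k(F_k(t|X)) = h_k(t) + X^T beta_k:
F_k(t \mid X) = H_k^{-1}\bigl( h_k(t) + X^T \beta_k \bigr), \qquad k = 1, \dots, K.

% Let t -> infinity to obtain the probability of ever failing from cause k:
F_k(\infty \mid X) = \lim_{t \to \infty} F_k(t \mid X)
                   = H_k^{-1}\bigl( h_k(\infty) + X^T \beta_k \bigr).

% Subjects who never fail from any cause are cured:
P(\text{cured} \mid X) = 1 - \sum_{k=1}^{K} F_k(\infty \mid X)
                       = 1 - \sum_{k=1}^{K} H_k^{-1}\bigl( h_k(\infty) + X^T \beta_k \bigr).
```

A positive cure fraction therefore corresponds to the cause-specific cumulative incidences summing to less than one in the limit.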

5. Empirical Performance and Applications

Simulation studies validate the improved bias, MSE, and empirical coverage of single-index semiparametric transformation cure models compared to conventional logistic-Cox approaches:

  • Monotone single-index models outperform unconstrained or mis-specified parametric links when monotonicity holds (Musta et al., 2022).
  • SMCI methods—kernel and spline variants—exhibit superior accuracy for the incidence curve and regression parameters, particularly under non-logistic or non-monotone links (Huang et al., 18 Jan 2026).
  • In finite samples, kernel-based SMCI (SMCI-K) demonstrates the smallest ASE for $\pi(\cdot)$ and robust parameter estimation.

Real data:

  • Melanoma survival (right-censored): the monotone single-index cure model provides interpretable regression coefficients and effectively recovers cure proportions (Musta et al., 2022).
  • Alzheimer’s disease (interval-censored, ADNI): age and APOE4 genotype significantly predict uncure probability and latency among susceptibles; SMCI reveals non-monotonic age effects in subgroups (Huang et al., 18 Jan 2026).
  • HIV progression (competing risks): PH-link model yields realistic cure fraction ($\approx 16.6\%$), outperforming naive logistic models ($\approx 29.6\%$) (Kattumannil et al., 2020).

| Study | Censoring Type | Incidence Model | Latency Model | Key Finding |
|---|---|---|---|---|
| Musta & Yuen (Musta et al., 2022) | Right | Monotone single-index | Cox PH | Improved efficiency, real melanoma data |
| Huang et al. (Huang et al., 18 Jan 2026) | Interval | Single-index (kernel/spline) | Semiparametric transformation | Superior to logistic, robust in ADNI |
| Kattumannil et al. (Kattumannil et al., 2020) | Right (competing risks) | Single-index transformation | Baseline transformation, cause-specific | Direct estimation of cure in multicausal settings |

6. Extensions, Generalizations, and Limitations

Single-index semiparametric transformation cure models generalize seamlessly across censoring paradigms (right, interval), allow for flexible latency modeling via arbitrary monotone transformations, and extend to competing risks and multistate frameworks. The methodology supports nonparametric smoothing of unknown links, spline-based hazard estimation, and data augmentation for complex likelihoods.

Limitations include potential sensitivity to monotonicity assumptions in the incidence link and computational load for high-dimensional covariate spaces. Identifiability depends on regularity and separation conditions. In practice, smoothness and kernel/spline tuning must be empirically cross-validated.

A plausible implication is that these models may be preferable to purely parametric mixture cure models where covariate effects are complex or the logistic form of the incidence is doubtful, particularly in biomedical survival analysis with substantial cure fractions and complex censoring mechanisms (Musta et al., 2022, Huang et al., 18 Jan 2026, Kattumannil et al., 2020).
