Single-Index Semiparametric Cure Models

Updated 25 January 2026

The model extends traditional mixture cure frameworks by integrating a single-index structure for incidence and a semiparametric transformation for latency.
It employs estimation techniques like profile likelihood with isotonic regression and multi-layer EM algorithms to robustly handle right and interval censoring.
Applications in melanoma, Alzheimer’s, and HIV studies report lower bias, higher efficiency, and more realistic cure fraction estimates than standard methods.

A single-index semiparametric transformation cure model extends the conventional mixture cure framework in two ways: it uses a single-index structure for covariate effects in the incidence component (the probability of being “uncured”), and a semiparametric transformation model for the latency (survival among the uncured). These models relax restrictive parametric assumptions, particularly on how covariates affect the cure fraction, while remaining interpretable and applicable across censoring scenarios. The framework can impose monotonicity constraints on the link, accommodates interval and right censoring, and extends to multivariate and competing risks contexts (Musta et al., 2022, Huang et al., 18 Jan 2026).

1. Model Formulation and Structure

Let $T$ be a (possibly infinite) event time subject to censoring, and $C$ an independent censoring variable. Observed data consist of $Y = \min(T, C)$ and $\Delta = 1\{ T \leq C \}$. A latent “cure” indicator, $B = 1\{ T < \infty \}$, identifies uncured subjects. The general mixture cure model decomposes the conditional survival function as $S(t\mid X, Z) = 1 - \pi(X) + \pi(X) S_u(t \mid Z)$, where:

$\pi(X) = P(B=1 \mid X)$ is the incidence (“uncure” probability), $X$ is a vector of incidence covariates,
$S_u(t \mid Z)$ is the proper survival for uncured, $Z$ is a vector of latency covariates.

Single-index structure (incidence):

$\pi(X) = g(\gamma^T X)$

with an unknown nondecreasing link $g: \mathbb{R} \rightarrow (0,1)$ and the identifiability constraint $\| \gamma \| = 1$.

Latency (semiparametric transformation):

Cox proportional hazards (PH): $\lambda_u(t \mid Z) = \lambda_0(t) \exp( \beta^T Z )$ with $S_u(t \mid Z) = \exp \{ -\Lambda_0(t) e^{\beta^T Z} \}$ (Musta et al., 2022).
General transformation: $S_u(t\mid Z) = \exp\bigl( - G\{ \exp(\beta^T Z) \Lambda(t) \} \bigr)$, where $G$ is a transformation function (induced by a gamma frailty or another link) and $\Lambda$ is an unknown monotone function (Huang et al., 18 Jan 2026).
In competing risks contexts, each cause-specific cumulative incidence function is linked via $H_k( F_k(t|X) ) = h_k(t) + X^T\beta_k$ (Kattumannil et al., 2020).
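
To make the decomposition concrete, the following minimal Python sketch evaluates $S(t \mid X, Z)$ for a single subject. It is an illustration only: the logistic link, the Weibull-type cumulative hazard, the gamma-frailty form $G(u) = \log(1 + ru)/r$, and all function names are assumptions chosen for the example, not quantities taken from the cited papers (in the actual models $g$ and $\Lambda$ are unknown and estimated).

```python
import numpy as np

def incidence(x, gamma, link):
    """Uncure probability pi(x) = g(gamma^T x), with gamma rescaled to unit norm."""
    gamma = np.asarray(gamma, dtype=float)
    gamma = gamma / np.linalg.norm(gamma)
    return link(np.asarray(x, dtype=float) @ gamma)

def latency_survival(t, z, beta, cum_hazard, G=lambda u: u):
    """S_u(t | z) = exp(-G(exp(beta^T z) * Lambda(t))); G(u) = u gives Cox PH."""
    risk = np.exp(np.asarray(z, dtype=float) @ np.asarray(beta, dtype=float))
    return np.exp(-G(risk * cum_hazard(t)))

def population_survival(t, x, z, gamma, beta, link, cum_hazard, G=lambda u: u):
    """Mixture cure survival: 1 - pi(x) + pi(x) * S_u(t | z)."""
    pi = incidence(x, gamma, link)
    return 1.0 - pi + pi * latency_survival(t, z, beta, cum_hazard, G)

# Hypothetical ingredients, chosen only so the example runs.
logistic = lambda u: 1.0 / (1.0 + np.exp(-u))            # stand-in for the unknown g
weibull_cum_hazard = lambda t: np.asarray(t, dtype=float) ** 1.5  # stand-in for Lambda
gamma_frailty_G = lambda u, r=0.5: np.log1p(r * u) / r   # gamma frailty with variance r

x, z = np.array([0.3, -1.2]), np.array([0.5])
gamma, beta = [1.0, 1.0], [0.8]

pi_x = incidence(x, gamma, logistic)
s_start = population_survival(0.0, x, z, gamma, beta, logistic, weibull_cum_hazard)
s_late = population_survival(1e6, x, z, gamma, beta, logistic,
                             weibull_cum_hazard, gamma_frailty_G)
print(pi_x, s_start, s_late)
```

At $t = 0$ the survival equals one, and for large $t$ it levels off at the cure probability $1 - \pi(X)$ rather than at zero, which is the defining plateau of a cure model.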

2. Estimation Methodologies

Profile Likelihood and Isotonic Regression (Right-censored):

Estimation proceeds by maximizing the observed-data log-likelihood $\ell_n(\gamma, \beta, \Lambda, g) = \sum_{i=1}^n \left[ \Delta_i \{ \log g(\gamma^T x_i) + \log f_u(y_i | z_i) \} + (1-\Delta_i) \log \{ 1 - g(\gamma^T x_i) + g(\gamma^T x_i) S_u(y_i | z_i) \} \right]$, where $f_u$ is the density associated with $S_u$. The link $g$ is estimated under a monotonicity constraint using weighted isotonic regression: for fixed $(\gamma, \beta, \Lambda)$, the profile likelihood maximizer $\hat{g}$ is computed and plugged back into the likelihood, which is then maximized over $(\gamma, \beta, \Lambda)$ (Musta et al., 2022).
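
The log-likelihood above can be evaluated directly once every ingredient is fixed. The sketch below does so under simplifying assumptions that are not part of the cited method: a PH latency with a smooth, known baseline (an exponential baseline in the example) instead of a nonparametric $\Lambda$, and a user-supplied link `g`. The function name `observed_loglik` and its arguments are hypothetical.

```python
import numpy as np

def observed_loglik(y, delta, index, z, beta, g, cum_hazard, hazard):
    """Observed-data log-likelihood of a mixture cure model with PH latency.

    index      : values of gamma^T x_i (the single index)
    g          : link mapping the index to the uncure probability
    cum_hazard : baseline cumulative hazard Lambda_0(t) (assumed known here)
    hazard     : baseline hazard lambda_0(t), the derivative of cum_hazard
    """
    y = np.asarray(y, dtype=float)
    event = np.asarray(delta).astype(bool)
    pi = g(np.asarray(index, dtype=float))
    risk = np.exp(np.asarray(z, dtype=float) @ np.asarray(beta, dtype=float))
    S_u = np.exp(-cum_hazard(y) * risk)      # survival of the uncured
    f_u = hazard(y) * risk * S_u             # density of the uncured
    # Events: the subject is uncured and fails at y_i.
    ll_event = np.log(pi[event]) + np.log(f_u[event])
    # Censored: either cured, or uncured and still event-free at y_i.
    ll_cens = np.log(1.0 - pi[~event] + pi[~event] * S_u[~event])
    return ll_event.sum() + ll_cens.sum()

# Toy data with an exponential baseline (rate 0.5) and a logistic link.
y = np.array([1.0, 2.5, 4.0, 6.0])
delta = np.array([1, 0, 1, 0])
index = np.array([-0.4, 0.1, 0.9, -1.3])
z = np.array([[0.2], [-0.5], [1.0], [0.0]])
ll = observed_loglik(
    y, delta, index, z, beta=[0.3],
    g=lambda u: 1.0 / (1.0 + np.exp(-u)),
    cum_hazard=lambda t: 0.5 * t,
    hazard=lambda t: np.full_like(t, 0.5),
)
print(ll)
```

In a profile-likelihood scheme this function would be the objective of the outer maximization, with `g` replaced at each step by the isotonic estimate for the current parameter values.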

EM Algorithms with Data Augmentation (Interval-censored):

For interval-censored data, a four-layer EM approach is implemented:

Layer 1: latent $B_i$ ,
Layer 2: gamma frailty $\xi_i$ ,
Layer 3: truncated Poisson variables for censoring,
Layer 4: decomposition for spline basis coefficients.

Kernel smoothing estimates $g(\gamma^T X_i)$, and I-splines approximate $\Lambda(t)$. The E-step computes posteriors for the latent quantities; the M-step updates model parameters subject to identifiability and monotonicity constraints (Huang et al., 18 Jan 2026).
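
As a rough illustration of the first layer only, the sketch below computes the posterior uncure probability for each subject and then smooths those weights against the index with a Nadaraya-Watson estimator. It assumes the standard mixture cure E-step (a subject with a finite right endpoint is certainly uncured; otherwise the weight is $\pi S_u(L) / \{1 - \pi + \pi S_u(L)\}$) and a Gaussian kernel. The function names are hypothetical, and the remaining layers (frailty, Poisson augmentation, spline coefficients) are not shown, so this should not be read as the authors' algorithm.

```python
import numpy as np

def uncure_posterior(pi, S_u_left, right_is_inf):
    """E[B_i | data] for interval-censored observations (L_i, R_i].

    pi           : current uncure probabilities g(gamma^T x_i)
    S_u_left     : current latency survival evaluated at L_i
    right_is_inf : True where R_i is infinite (no event observed)
    """
    pi = np.asarray(pi, dtype=float)
    S = np.asarray(S_u_left, dtype=float)
    w = pi * S / (1.0 - pi + pi * S)
    # An observed finite right endpoint means the event occurs: B_i = 1.
    return np.where(np.asarray(right_is_inf, dtype=bool), w, 1.0)

def kernel_link_update(index, weights, grid, h):
    """Nadaraya-Watson estimate of g on `grid`, using a Gaussian kernel."""
    u = (np.asarray(grid, dtype=float)[:, None]
         - np.asarray(index, dtype=float)[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    return (K @ np.asarray(weights, dtype=float)) / K.sum(axis=1)

# Toy inputs for one E-step followed by a link update.
index = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
pi_current = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
S_left = np.array([0.9, 0.5, 0.8, 0.2, 0.6])
no_event = np.array([True, False, True, True, False])

w = uncure_posterior(pi_current, S_left, no_event)
h = 1.0 * len(index) ** (-1 / 5)   # bandwidth of order n^(-1/5); the constant is arbitrary
g_hat = kernel_link_update(index, w, grid=np.linspace(-1, 1.5, 6), h=h)
print(w, g_hat)
```

Because the weights lie in $[0, 1]$ and the kernel weights are nonnegative, the smoothed link stays in $[0, 1]$ without any explicit constraint.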

Counting Process Martingale-based Estimation (Competing Risks):

Estimating equations are built from counting processes for the cumulative incidence in the presence of a cure fraction: $U_{\beta_k}(\beta_k,h_k) = \sum_{i=1}^n \int_0^\infty X_i \left[dN_{ik}(u) - Y_i(u)\,d\Lambda_{\varepsilon_k}(h_k(u)+X_i^T\beta_k) \right] = 0$, where $N_{ik}(t)$ is the cause-$k$ counting process and $Y_i(t)$ the at-risk process (Kattumannil et al., 2020).

3. Asymptotic Theory

Consistency and asymptotic normality are established under regularity conditions: bounded support, identifiability, appropriate smoothness for $g$ and $\Lambda$, and empirical process entropy bounds (Musta et al., 2022, Huang et al., 18 Jan 2026). Key properties include:

Uniqueness and continuity of population maximizer $g_{0,\theta}$ in the monotone class.
Uniform convergence of estimated $g$ to $g_{0,\theta}$ in $L^2$ .
Parameter estimators $\widehat{\gamma}$, $\widehat{\beta}$, $\widehat{\Lambda}$ are consistent and asymptotically normal (at the $O_p(n^{-1/2})$ rate).
For nonparametric $g$, the convergence rate depends on both the kernel bandwidth and the sample size: $O_p((nh)^{-1/2} + h^2)$ (Huang et al., 18 Jan 2026).
Exact variance estimators for the regression coefficients are derived via empirical analogues.
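
As a worked check of the nonparametric rate, the two terms in $O_p((nh)^{-1/2} + h^2)$ can be balanced against each other. This is the usual bias-variance balancing argument for kernel smoothers, stated here as a reader's aid rather than as a result quoted from the cited papers:

```latex
\[
(nh)^{-1/2} \asymp h^{2}
\;\Longleftrightarrow\;
h^{5/2} \asymp n^{-1/2}
\;\Longleftrightarrow\;
h \asymp n^{-1/5},
\qquad\text{so that}\qquad
(nh)^{-1/2} + h^{2} \asymp n^{-2/5}.
\]
```

With this bandwidth order the link estimate converges at rate $n^{-2/5}$, slower than the $n^{-1/2}$ rate of the finite-dimensional parameters.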

4. Practical Implementation, Computation, and Extensions

The estimation algorithms are computationally scalable:

Each EM iteration in right-censored models involves one isotonic regression ($O(n)$), one Cox fit ($O(n \log n)$), and a low-dimensional optimization for $\gamma$ (Musta et al., 2022).
For interval censoring, the four-layer EM method with kernel and spline steps (SMCI) converges in under a minute for moderate $n, d$ (Huang et al., 18 Jan 2026).
A kernel bandwidth of order $h \propto n^{-1/5}$ gives robust performance.
Extension to alternative latency models (AFT, additive hazards, general monotone transformations) is direct: the same EM and isotonic steps apply (Musta et al., 2022).
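
The isotonic regression step can be illustrated with the generic pool-adjacent-violators algorithm (PAVA). The sketch below is a textbook weighted version, assuming the responses are already sorted by the index $\gamma^T x_i$; it is not the authors' code, and the name `weighted_pava` is hypothetical. After sorting, each observation is pushed once and merged at most once, which is consistent with the linear cost quoted above.

```python
import numpy as np

def weighted_pava(y, w):
    """Weighted least-squares nondecreasing fit to y (inputs ordered by the index)."""
    means, weights, sizes = [], [], []
    for yi, wi in zip(np.asarray(y, dtype=float), np.asarray(w, dtype=float)):
        # Start a new block containing the current observation.
        means.append(yi)
        weights.append(wi)
        sizes.append(1)
        # Pool adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), weights.pop(), sizes.pop()
            total = w1 + w2
            means.append((w1 * m1 + w2 * m2) / total)
            weights.append(total)
            sizes.append(s1 + s2)
    # Expand block means back to one fitted value per observation.
    return np.repeat(means, sizes)

# Example: responses could be posterior uncure weights sorted by the index.
fit = weighted_pava([0.1, 0.6, 0.4, 0.9], [1.0, 1.0, 1.0, 1.0])
print(fit)  # the middle pair is pooled to its average
```

Unequal weights shift each pooled value toward the more heavily weighted observations, which is how observation-specific precision enters the monotone fit.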

Competing risks:

Cure proportions are encoded in the sum of $K$ cause-specific transformation models; no separate logistic or multinomial cure submodel is needed. The overall cure fraction is derived directly from the limits of the baseline transformations at infinity (Kattumannil et al., 2020).
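
One way to write this out, assuming each link $H_k$ is strictly monotone and continuous and each baseline transformation has a finite limit $h_k(\infty) = \lim_{t \to \infty} h_k(t)$ (both assumptions are made here for illustration), is:

```latex
\[
F_k(t \mid X) = H_k^{-1}\bigl( h_k(t) + X^T\beta_k \bigr),
\qquad
P(\text{cured} \mid X)
= 1 - \sum_{k=1}^{K} \lim_{t \to \infty} F_k(t \mid X)
= 1 - \sum_{k=1}^{K} H_k^{-1}\bigl( h_k(\infty) + X^T\beta_k \bigr).
\]
```

A positive cure fraction then corresponds to the cause-specific cumulative incidence functions summing to less than one in the limit.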

5. Empirical Performance and Applications

Simulation studies report lower bias and MSE and better empirical coverage for single-index semiparametric transformation cure models than for conventional logistic-Cox approaches:

Monotone single-index models outperform unconstrained or mis-specified parametric links when monotonicity holds (Musta et al., 2022).
SMCI methods—kernel and spline variants—exhibit superior accuracy for the incidence curve and regression parameters, particularly under non-logistic or non-monotone links (Huang et al., 18 Jan 2026).
In finite samples, kernel-based SMCI (SMCI-K) demonstrates the smallest ASE for $\pi(\cdot)$ and robust parameter estimation.

Real data:

Melanoma survival (right-censored): the monotone single-index cure model provides interpretable regression coefficients and effectively recovers cure proportions (Musta et al., 2022).
Alzheimer’s disease (interval-censored, ADNI): age and APOE4 genotype significantly predict uncure probability and latency among susceptibles; SMCI reveals non-monotonic age effects in subgroups (Huang et al., 18 Jan 2026).
HIV progression (competing risks): the PH-link model yields a realistic cure fraction ($\approx 16.6\%$), compared with $\approx 29.6\%$ from naive logistic models (Kattumannil et al., 2020).

Study	Censoring Type	Incidence Model	Latency Model	Key Finding
Musta & Yuen (Musta et al., 2022)	Right	Monotone single-index	Cox PH	Improved efficiency, real melanoma data
Huang et al. (Huang et al., 18 Jan 2026)	Interval	Single-index (kernel/spline)	Semiparametric transformation	Superior to logistic, robust in ADNI
Kattumannil et al. (Kattumannil et al., 2020)	Right (competing risks)	Single-index transformation	Baseline transformation, cause-specific	Direct estimation of cure in multicausal settings

6. Extensions, Generalizations, and Limitations

Single-index semiparametric transformation cure models apply across censoring paradigms (right, interval), allow flexible latency modeling via monotone transformations, and extend to competing risks and multistate frameworks. The methodology supports nonparametric smoothing of unknown links, spline-based hazard estimation, and data augmentation for complex likelihoods.

Limitations include potential sensitivity to monotonicity assumptions in the incidence link and computational load for high-dimensional covariate spaces. Identifiability depends on regularity and separation conditions. In practice, smoothness and kernel/spline tuning must be empirically cross-validated.

A plausible implication is that these models are poised to supplant purely parametric mixture cure models in settings where covariate effects are complex or the assumption of logistic form is doubtful, particularly in biomedical survival analysis with substantial cure fractions and complex censoring mechanisms (Musta et al., 2022, Huang et al., 18 Jan 2026, Kattumannil et al., 2020).