
Proportional Hazards Mixture Cure Model

Updated 11 December 2025
  • The Proportional Hazards Mixture Cure Model is a semiparametric framework that distinguishes cured subjects through logistic regression and assesses event risk using a Cox model.
  • It integrates two components that separately capture covariate effects on the cure incidence and on the latency (event timing among the uncured), ensuring interpretability and accurate inference.
  • Efficient estimation via the EM algorithm and profile likelihood delivers consistent parameter estimates with well-established asymptotic properties.

A proportional hazards mixture cure model is a statistical framework for analyzing time-to-event data where a non-negligible fraction of subjects is assumed to be “cured”—that is, they will never experience the event of interest, no matter how long they are followed. This model extends traditional survival models by explicitly accounting for the cured proportion and allowing separate assessment of covariates’ effects on both the cure incidence (probability of being uncured) and the subsequent event-risk (latency) for the susceptible individuals. The standard approach specifies logistic regression for the incidence component and a Cox proportional hazards regression for the latency, integrating them into a semiparametric mixture structure. The proportional hazards mixture cure model provides a robust framework for both inference and prediction in the presence of cured subpopulations and is underpinned by a rigorously developed asymptotic theory (Mohammad et al., 2019).

1. Model Specification and Structure

Let $T_i$ denote the observed survival or censoring time for individual $i$, with censoring indicator $\delta_i = \mathbb{I}\{\text{event observed}\}$. Two sets of covariates are considered: $X_i$ for the cure incidence and $Z_i$ for the latency model. A latent indicator $U_i$ indicates whether subject $i$ is “susceptible” ($U_i = 1$) or “cured” ($U_i = 0$).

  1. Cure Incidence (mixture component):

$$\Pr(U_i = 1 \mid X_i) = p_i = \frac{\exp(X_i^\top \gamma)}{1 + \exp(X_i^\top \gamma)}$$

This logistic model yields the probability of subject $i$ being susceptible (uncured) given covariates $X_i$.

  2. Latency (proportional hazards component): Among susceptibles, the conditional hazard function is

$$h(t \mid Z_i, U_i = 1) = h_0(t) \exp(Z_i^\top \beta)$$

and the corresponding survival function is

$$S_u(t \mid Z_i) = \exp\{-H_0(t) \exp(Z_i^\top \beta)\}$$

where $H_0(t) = \int_0^t h_0(s)\,ds$ is the baseline cumulative hazard.

  3. Marginal (population) survival:

$$S(t \mid X_i, Z_i) = 1 - p_i + p_i\, S_u(t \mid Z_i)$$

This combines the cure probability and susceptible subpopulation survival in a two-component mixture (Mohammad et al., 2019).
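The three components above can be sketched numerically. The following is a minimal numpy illustration (not the authors' implementation); the function names and the exponential baseline used in the test below are hypothetical choices for exposition:

```python
import numpy as np

def incidence_prob(X, gamma):
    """Logistic incidence: Pr(U = 1 | X) = exp(X'gamma) / (1 + exp(X'gamma))."""
    return 1.0 / (1.0 + np.exp(-(X @ gamma)))

def latency_survival(t, Z, beta, H0):
    """Susceptible survival S_u(t | Z) = exp(-H0(t) * exp(Z'beta)).
    H0 is a callable returning the baseline cumulative hazard at t."""
    return np.exp(-H0(t) * np.exp(Z @ beta))

def population_survival(t, X, Z, gamma, beta, H0):
    """Two-component mixture: S(t | X, Z) = 1 - p + p * S_u(t | Z)."""
    p = incidence_prob(X, gamma)
    return 1.0 - p + p * latency_survival(t, Z, beta, H0)
```

Note that as $t \to \infty$, $S_u \to 0$ and the population survival plateaus at $1 - p_i$, the cure fraction — the defining feature of the mixture structure.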

2. Likelihood Construction and Profile Likelihood

The observed-data likelihood, treating the latent $U_i$ as missing for censored cases, is constructed as:

$$L(\gamma, \beta, h_0) = \prod_{i=1}^{n} \left[ p_i\, h_0(t_i)\, e^{Z_i^\top \beta}\, S_u(t_i \mid Z_i) \right]^{\delta_i} \left[ 1 - p_i + p_i\, S_u(t_i \mid Z_i) \right]^{1 - \delta_i}$$

with $p_i$ and $S_u(t \mid Z_i)$ as defined above. The baseline hazard $h_0$ is eliminated via profiling, using a weighted Breslow-type estimator:

$$\hat{H}_0(t) = \sum_{i:\, t_i \le t} \frac{\delta_i}{\sum_{j \in R(t_i)} w_j \exp(Z_j^\top \beta)}$$

where $R(t_i) = \{j : t_j \ge t_i\}$ is the risk set at $t_i$, and the weights

$$w_i = \delta_i + (1 - \delta_i)\, \frac{p_i\, S_u(t_i \mid Z_i)}{1 - p_i + p_i\, S_u(t_i \mid Z_i)}$$

are the expected posterior probabilities of $U_i = 1$ given the current parameters and observed data.

Plugging $\hat{H}_0$ into the likelihood yields the profile likelihood $pl_n(\gamma, \beta)$, which forms the basis for efficient estimation and inference (Mohammad et al., 2019).
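To make the weighting and profiling step concrete, here is a small numpy sketch of the posterior weights and the weighted Breslow estimator. This is an illustrative reconstruction under a no-ties assumption, not the paper's code; the function names are hypothetical:

```python
import numpy as np

def posterior_weights(delta, p, Su):
    """E-step weights: w_i = 1 for observed events; for censored subjects,
    w_i = p_i * S_u(t_i) / (1 - p_i + p_i * S_u(t_i))."""
    w_cens = p * Su / (1.0 - p + p * Su)
    return np.where(delta == 1, 1.0, w_cens)

def breslow_cumhaz(times, delta, Z, beta, w):
    """Weighted Breslow estimator of the baseline cumulative hazard,
    evaluated at each subject's observed time (assumes no tied times)."""
    order = np.argsort(times)
    d = delta[order]
    risk = (w * np.exp(Z @ beta))[order]
    # denominator: weighted risk summed over the risk set at each event time
    denom = np.cumsum(risk[::-1])[::-1]
    increments = np.where(d == 1, 1.0 / denom, 0.0)
    H = np.cumsum(increments)
    # map back to the original subject ordering
    out = np.empty_like(H)
    out[order] = H
    return out
```

With all subjects uncensored, unit weights, and $\beta = 0$, this reduces to the classical Nelson–Aalen estimator, which is a useful sanity check.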

3. Asymptotic Theory, Efficiency, and Variance Estimation

The model’s estimation theory is grounded in semiparametric M-estimation and tangent space projections. Key features:

  • The profile likelihood score for $(\gamma, \beta)$ at the profiled baseline hazard is:
    • Incidence: $S_\gamma = \sum_{i=1}^{n} (w_i - p_i)\, X_i$
    • Latency: $S_\beta = \sum_{i=1}^{n} \delta_i \left[ Z_i - \frac{\sum_{j \in R(t_i)} w_j e^{Z_j^\top \beta} Z_j}{\sum_{j \in R(t_i)} w_j e^{Z_j^\top \beta}} \right]$
    • with the weights $w_i$ entering the risk-set sums.
  • Under standard regularity conditions (bounded covariates, positivity, identifiability), $(\hat{\gamma}, \hat{\beta})$ are asymptotically normal:

$$\sqrt{n}\left( (\hat{\gamma}, \hat{\beta}) - (\gamma_0, \beta_0) \right) \xrightarrow{d} N(0, \tilde{I}^{-1})$$

where $\tilde{I}$ is the profile-efficient information matrix, computed as the variance of the score vector.

  • Mohammad et al. demonstrate that the profile-likelihood score equals the efficient score obtained via projection theory, and the efficient information is given by the sample variance of the score function (Mohammad et al., 2019).

Empirical information and standard errors can be consistently estimated by:

$$\hat{I} = \frac{1}{n} \sum_{i=1}^{n} \hat{S}_i \hat{S}_i^\top$$

where $\hat{S}_i$ denotes the individual score vector contribution of subject $i$ at the estimated parameters.
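A minimal sketch of this variance estimator, assuming the per-subject score vectors $\hat{S}_i$ have already been computed (the helper name is hypothetical):

```python
import numpy as np

def empirical_standard_errors(score_contributions):
    """Empirical information I_hat = (1/n) * sum_i S_i S_i' and the
    resulting standard errors sqrt(diag(I_hat^{-1}) / n)."""
    S = np.asarray(score_contributions)  # shape (n, d)
    n = S.shape[0]
    I_hat = S.T @ S / n
    return np.sqrt(np.diag(np.linalg.inv(I_hat)) / n)
```

For i.i.d. scores with unit variance this reproduces the familiar $1/\sqrt{n}$ rate.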

4. Estimation Algorithms and Practical Implementation

Estimation is typically performed via a combination of the EM algorithm and profile likelihood maximization:

  • E-step: Compute posterior weights $w_i$ for each subject, reflecting the probability of being uncured given the observed time/censoring and current parameters.
  • M-step: Update the incidence parameters by weighted logistic regression, and the latency (proportional hazards) parameters by a weighted partial likelihood approach.

Computation of the nonparametric baseline cumulative hazard uses a recursive Breslow estimator with the $w_i$ as weights. This approach is implemented in the `smcure` R package and provides consistent, efficient inference for both model components (Mohammad et al., 2019).
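As a sketch of the incidence M-step, a plain Newton–Raphson weighted logistic fit in which the E-step weights $w_i$ act as a fractional response. This is illustrative only, not the `smcure` source; the function name is hypothetical:

```python
import numpy as np

def weighted_logistic_mstep(X, w, gamma0, n_iter=25):
    """M-step for the incidence component: logistic regression of the
    E-step weights w on X, fitted by Newton-Raphson."""
    gamma = gamma0.copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ gamma)))
        grad = X.T @ (w - p)              # score of the weighted log-likelihood
        W = p * (1.0 - p)
        hess = (X * W[:, None]).T @ X     # observed information
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma
```

The latency M-step proceeds analogously, maximizing the partial likelihood with the $w_i$ weighting the risk sets.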

5. Simulation Studies and Data Applications

Simulations by Mohammad et al. across a range of cure-rate scenarios (11–75%) and covariate settings demonstrate:

  • Estimators for both incidence and latency coefficients exhibit negligible bias.
  • Standard errors computed analytically via the profile score closely match those obtained by nonparametric bootstrap, with consistent 95% coverage.
  • Application to ECOG E1684 melanoma data confirms that both statistical significance and coefficient magnitudes are stable across the profile likelihood and `smcure` bootstrap approaches, with interpretational clarity provided by modeling cure and latency separately (Mohammad et al., 2019).

6. Extensions, Generalizations, and Comparative Results

The proportional hazards mixture cure model is flexible and can be extended or embedded in richer frameworks.

These developments maintain the core structure: logistic (or, more generally, flexible) incidence modeling coupled with a proportional hazards latency, under a two-component mixture for population survival.

7. Theoretical and Practical Implications

Adoption of the proportional hazards mixture cure model addresses a major limitation of standard survival analyses in the presence of cure, namely, the overestimation of long-term risk when cured patients are not separated. Efficient estimation approaches, rooted in profile likelihood and projection-theoretic efficiency, provide both theoretical rigor and practical accuracy for inference about both cure incidence and latency parameters. Empirical variance estimation via profile scores offers robust alternatives to computationally intensive bootstrap procedures, facilitating large-scale applications (Mohammad et al., 2019).

In conclusion, the proportional hazards mixture cure model is a foundational semiparametric cure modeling framework, allowing distinct and interpretable modeling of both the cure process and post-cure event dynamics, with a well-developed theory of efficient estimation, variance, and practical implementation.
