
Regression Model for Censored Data

Updated 17 October 2025
  • Regression models for censored data are specialized frameworks that address partially observed responses through methods like right-censoring in survival analysis.
  • Dimension reduction via conditional independence and single-index models simplifies high-dimensional kernel smoothing, mitigating the curse of dimensionality.
  • Weighted empirical risk minimization and two-stage trimmed least squares yield asymptotically normal inference even with complex, high-dimensional covariates.

A regression model for censored data refers to any statistical or machine learning framework in which the response variable is only partially observed due to censoring, most commonly through right-censoring in survival analysis, but also through left, interval, or random censoring mechanisms. The canonical paradigm involves a multivariate covariate $X \in \mathbb{R}^d$ and a univariate response $Y$ which is observable only up to an associated censoring variable $C$, so that one actually observes $T = Y \wedge C$ and an indicator $\delta = \mathbb{1}\{Y \leq C\}$. The principal challenge is to devise estimators for regression or conditional distribution functionals that correctly account for the censoring while controlling the curse of dimensionality stemming from high-dimensional covariates.

1. Problem Setup and the Curse of Dimensionality

In the censored-data regression context, for a sample of i.i.d. triplets $(X_i, T_i, \delta_i)$, the fundamental object is the estimation of a regression or distributional functional of the unobserved $(X_i, Y_i)$. Naïve nonparametric estimators of, for instance,

$$F(x, y) = \mathbb{P}(X \leq x,\, Y \leq y)$$

nominally require kernel or histogram smoothing in $d$ dimensions, leading to sample size requirements that grow exponentially in $d$. Direct approaches are thus practically infeasible for moderate-to-large $d$, motivating dimension-reduction assumptions that exploit structure in the dependence of $Y$ (and/or $C$) on $X$.
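To see the sparsity concretely, a small simulation (illustrative, not from the source) shows how quickly a fixed-radius neighborhood empties out as $d$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Fraction of n uniform points in [0,1]^d that land within distance 0.25
# of the centre: this is the local sample a fixed-bandwidth kernel
# smoother would see, and it collapses as d grows.
local_fraction = {}
for d in (1, 2, 5, 10):
    X = rng.uniform(size=(n, d))
    dist = np.linalg.norm(X - 0.5, axis=1)
    local_fraction[d] = float(np.mean(dist <= 0.25))

print(local_fraction)  # d=1 gives roughly 0.5; d=10 is essentially 0
```

With a fixed bandwidth, the effective local sample size thus vanishes exponentially in $d$, which is exactly why the smoothing steps below are carried out on a one-dimensional index instead.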

2. Dimension Reduction via Conditional Independence and Single-Index Models

The central structural assumption (A0) posits that for some known $g: \mathbb{R}^d \to \mathbb{R}$ (often taken as $g(X) = \lambda(\theta, X)$ for a low-dimensional parameter $\theta$), the censoring variable $C$ and the response $Y$ are conditionally independent given $g(X)$. That is,

$$Y \perp C \mid g(X).$$

Typically, $g(X)$ is parameterized so that $k = \dim(\theta) \ll d$, and the estimation of conditional functionals can be performed by nonparametric smoothing on $g(X)$, thereby bypassing the high dimensionality of $X$ in the smoothing step.

In the regression problem, a further dimension reduction is introduced via a mean regression single-index model for a possibly truncated response:

$$\mathbb{E}[Y \,\mathbb{1}\{Y \leq \tau\} \mid X] = m(\beta_0^\top X), \qquad \tau < \infty,$$

where $\beta_0 \in \mathbb{R}^d$ is a finite-dimensional parameter and $m$ is an unknown smooth function. This structure means that all information about $X$ influencing the mean is projected onto the one-dimensional index $\beta_0^\top X$.
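A minimal simulation of this design (the link function, censoring distribution, and parameter values are hypothetical choices for illustration, not from the source) makes the data-generating structure explicit:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 6

# True index direction: only the first three covariates matter.
beta0 = np.array([1.0, -0.5, 0.25, 0.0, 0.0, 0.0])
beta0 /= np.linalg.norm(beta0)

X = rng.normal(size=(n, d))
index = X @ beta0                        # one-dimensional summary of X

def m(t):                                # unknown smooth link (assumed here)
    return np.sin(t) + 0.5 * t

Y = m(index) + 0.3 * rng.normal(size=n)  # latent response
C = rng.exponential(scale=3.0, size=n)   # censoring, independent of X
T = np.minimum(Y, C)                     # observed time T = Y ∧ C
delta = (Y <= C).astype(int)             # 1 = uncensored
```

Because $C$ is drawn independently of $X$ here, assumption (A0) holds trivially; the interesting regimes are those where $C$ depends on $X$ only through $g(X)$.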

3. Construction of Joint Distribution and Regression Estimators

3.1. Joint Distribution Estimation

Given the conditional independence structure via $g(X)$, the conditional distribution of the censoring variable at time $t$ is estimated via a generalized Beran estimator:

$$\hat{G}_{\theta}(t \mid z) = 1 - \prod_{T_i \leq t}\left[1 - \frac{w_i^{(\theta)}(z)}{\sum_{j=1}^n w_j^{(\theta)}(z)\, \mathbb{1}\{T_j \geq T_i\}}\right]^{1-\delta_i},$$

where $w_i^{(\theta)}(z) = K\left(\frac{\lambda(\theta, X_i) - z}{a_n}\right) \Big/ \sum_{j=1}^n K\left(\frac{\lambda(\theta, X_j) - z}{a_n}\right)$ for a univariate kernel $K$ and bandwidth $a_n \to 0$.
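A direct implementation of this product-limit formula can be sketched as follows (the Gaussian kernel is an illustrative choice; the source does not fix a kernel):

```python
import numpy as np

def beran_censoring_cdf(t, z, T, delta, index, bandwidth):
    """Generalized Beran estimate of G(t | z) = P(C <= t | g(X) = z).

    Kernel weights live on the one-dimensional index g(X), and the
    product-limit runs over *censored* observations (exponent
    1 - delta_i), i.e. Kaplan-Meier with the roles of Y and C swapped.
    """
    u = (index - z) / bandwidth
    k = np.exp(-0.5 * u**2)            # Gaussian kernel (assumption)
    w = k / k.sum()                    # Nadaraya-Watson weights

    order = np.argsort(T)
    T_s, d_s, w_s = T[order], delta[order], w[order]
    at_risk = np.cumsum(w_s[::-1])[::-1]   # sum of weights with T_j >= T_i

    surv = 1.0
    for Ti, di, wi, ri in zip(T_s, d_s, w_s, at_risk):
        if Ti > t:
            break
        if di == 0 and ri > 0:         # only censored points contribute
            surv *= 1.0 - wi / ri
    return 1.0 - surv
```

With all observations censored and no covariate effect, the estimator reduces to the ordinary empirical distribution function of the observed times, which gives a quick sanity check.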

The joint estimator of F(x,y)F(x, y) then takes the form

$$\widehat{F}_{\hat{g}}(x, y) = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i\, \mathbb{1}\{T_i \leq y,\, X_i \leq x\}}{1 - \hat{G}_{\hat{\theta}}(T_i^- \mid \hat{g}(X_i))},$$

where $\hat{\theta}$ is a root-$n$-consistent estimator of the index parameter. This construction corrects for the effect of censoring via a weighting scheme that adapts to the conditional survival of the censoring variable.
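Computationally this is an inverse-probability-of-censoring weighted (IPCW) empirical distribution; a sketch, assuming the Beran-type estimates $\hat G(T_i^- \mid \hat g(X_i))$ have already been computed into an array `G_at_T`:

```python
import numpy as np

def joint_cdf_estimate(x, y, X, T, delta, G_at_T):
    """Plug-in estimate of F(x, y) = P(X <= x, Y <= y) under censoring.

    G_at_T[i] should approximate G(T_i^- | g(X_i)), e.g. from a Beran
    estimator evaluated just below T_i (assumption). Only uncensored
    points enter, each inflated by 1 / (1 - G) to recover the mass
    lost to censoring.
    """
    inside = (delta == 1) & (T <= y) & np.all(X <= x, axis=1)
    return float(np.sum(inside / (1.0 - G_at_T)) / len(T))
```

With no censoring ($\delta_i \equiv 1$, $\hat G \equiv 0$) this collapses to the ordinary empirical joint distribution function.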

3.2. Mean Regression Single-Index Estimation

The regression parameter is estimated via a two-stage trimmed least squares approach:

  1. Initial estimator: Minimize

$$M_n(\beta, \hat{f}, \tilde{J}) = \int \left(y - \hat{f}(\beta^\top x; \beta)\right)^2 \mathbb{1}\{y \leq \tau\}\, \tilde{J}(x)\, d\widehat{F}_{\hat{g}}(x, y)$$

over $\beta$ in a compact set, with an initial trimming function $\tilde{J}$ and a nonparametric kernel estimator $\hat{f}$.

  2. Final estimator: With the preliminary estimate $\beta_n$, update the trimming region $J(x) = \mathbb{1}\{f_\beta^\tau(\beta_n^\top x) > c\}$ and minimize the same criterion over $\beta$ in shrinking neighborhoods to obtain $\hat{\beta}$. Here, $f_\beta^\tau$ is the density of $\beta^\top X$ under truncation at $Y \leq \tau$.
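The criterion driving both stages can be sketched as follows (a leave-one-out fit and a Gaussian kernel are illustrative choices, not prescribed by the source; `w` stands for the censoring-correction weights $\delta_i / (1 - \hat G_i)$, and `trim` for the trimming indicator, $\tilde J$ in stage 1 or the updated $J$ in stage 2):

```python
import numpy as np

def trimmed_criterion(beta, X, T, w, tau, h, trim):
    """M_n(beta): censoring-weighted, truncated, trimmed squared error."""
    idx = X @ beta
    total = 0.0
    for i in np.where((T <= tau) & trim)[0]:
        # leave-one-out Nadaraya-Watson fit of T on the index (sketch)
        k = np.exp(-0.5 * ((idx - idx[i]) / h) ** 2) * w * (T <= tau)
        k[i] = 0.0
        if k.sum() == 0:
            continue
        fhat = (k * T).sum() / k.sum()
        total += w[i] * (T[i] - fhat) ** 2
    return total / len(T)
```

Stage 1 minimizes this over a compact set with the crude trimming indicator; stage 2 re-minimizes over a shrinking neighborhood of the stage-1 estimate after updating `trim` from an estimated index density.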

The nonparametric regression function is estimated as

$$\hat{f}(t; \beta) = \frac{\int K\left(\frac{\beta^\top x - t}{h}\right) y\, \mathbb{1}\{y \leq \tau\}\, d\widehat{F}_{\hat{g}}(x, y)}{\int K\left(\frac{\beta^\top x - t}{h}\right) \mathbb{1}\{y \leq \tau\}\, d\widehat{F}_{\hat{g}}(x, y)},$$

where $K$ is a univariate kernel and $h$ is a bandwidth.
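In terms of the observed sample, both integrals reduce to weighted sums, so $\hat f(t;\beta)$ is cheap to evaluate; a sketch with a Gaussian kernel (an assumption, as is the name `G_at_T` for precomputed Beran-type estimates of $G(T_i^- \mid \hat g(X_i))$):

```python
import numpy as np

def f_hat(t, beta, X, T, delta, G_at_T, tau, h):
    """Kernel regression estimate f(t; beta) under the IPCW measure.

    Integrals against dF_hat become sums over uncensored points with
    weight delta_i / (1 - G_i); the 1{y <= tau} factor truncates large
    responses as in the single-index model above.
    """
    w = delta / (1.0 - G_at_T) * (T <= tau)
    k = np.exp(-0.5 * ((X @ beta - t) / h) ** 2) * w
    denom = k.sum()
    return float((k * T).sum() / denom) if denom > 0 else float("nan")
```

When the response is constant and uncensored, the ratio of weighted sums returns that constant regardless of the evaluation point, which gives a quick sanity check.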

4. Asymptotic Theory and Efficiency

Under appropriate regularity conditions (smoothness of $G_\theta$, positivity of densities, conditions on kernels and bandwidths), the estimators admit uniform consistency and i.i.d.-style asymptotic (influence-function) representations for general functionals:

$$\sup_{\phi \in \mathcal{F}} \left| \int \phi(x, y)\, d[\widehat{F}_{\hat{g}} - F](x, y) \right| \to 0 \quad \text{a.s.}$$

For the regression parameter,

$$\sqrt{n}(\hat{\beta} - \beta_0) = \Omega^{-1} \left( \frac{1}{\sqrt{n}} \sum_{i=1}^n \eta(T_i, \delta_i, X_i) \right) + o_p(1),$$

with $\Omega$ defined in terms of derivatives of the regression function, the trimming regions, and the conditional distribution of $Y$. The estimator is root-$n$ consistent and asymptotically normal:

$$\sqrt{n}(\hat{\beta} - \beta_0) \xrightarrow{d} \mathcal{N}(0, \Omega^{-1} \Sigma \Omega^{-1}),$$

with $\Sigma$ computable from the influence-function representation.
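In practice, $\Omega$ and $\Sigma$ are replaced by plug-in estimates; a standard sandwich construction (a generic recipe, not spelled out in the source) would be

$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n \hat{\eta}(T_i, \delta_i, X_i)\, \hat{\eta}(T_i, \delta_i, X_i)^\top, \qquad \widehat{\operatorname{Var}}(\hat{\beta}) \approx \frac{1}{n}\, \hat{\Omega}^{-1} \hat{\Sigma}\, \hat{\Omega}^{-1},$$

from which Wald-type confidence intervals and tests for components of $\beta_0$ follow directly from the asymptotic normality above.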

5. Methodological Innovations and Practical Implications

The methodology fundamentally leverages:

  • Dimension reduction in the censoring model: By parameterizing $C \perp Y \mid g(X)$, kernel estimation for the censoring correction operates in one dimension, reducing variance and avoiding the curse of dimensionality.
  • Single-index regression: The mean structure is reduced to $\beta^\top X$, allowing a fully nonparametric regression function $m$ of a single argument, further mitigating dimensionality issues.
  • Weighted empirical risk minimization: Both the joint distribution and regression parameter estimators use weights that correct for censoring, via conditioning on low-dimensional summaries of the covariate information.

These allow consistent, asymptotically normal inference about joint and regression functionals in high-dimensional censored data, provided the conditional independence and single-index assumptions hold.

6. Theoretical and Computational Considerations

The kernel smoothing steps require bandwidth selection, and the estimator of $\theta$ in $g(X)$ must be root-$n$ consistent. The two-stage regression procedure requires careful selection of the trimming region and control of approximation error at the boundaries of the covariate space. Martingale asymptotics and counting process theory underlie the uniform convergence and central limit behavior.

The approach is implementable with moderate computational resources when $d$ is large but $k$ is small and kernel density estimation is practical. The main computational burden is in the iterative nonparametric estimation of the conditional censoring distribution and in the optimization over the regression parameter.

7. Impact on High-Dimensional Survival and Censored Regression

This framework addresses two major limitations in previous censored regression methodology:

  • It circumvents the high-variance, low-precision regime induced by kernel smoothing in high dimensions by explicit and testable dimension reduction in both censoring and regression index structure.
  • It provides asymptotics and practical implementation steps for kernel-based censored regression that are robust to high-dimensional, potentially complex, covariate distributions and censoring that depends on observable covariates.

The approach has direct implications for large-scale biomedical survival analysis, for reliability studies with high-dimensional predictors, and for semiparametric regression settings where the classic Cox proportional hazards model's assumptions are not tenable.


Key Formula Recap:

  • Joint distribution estimator: $\widehat{F}_{\hat{g}}(x, y) = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i\, \mathbb{1}\{T_i \leq y,\, X_i \leq x\}}{1 - \hat{G}_{\hat{\theta}}(T_i^- \mid \hat{g}(X_i))}$
  • Conditional censoring cdf: $G_{\theta}(t \mid z) = \mathbb{P}(C \leq t \mid \lambda(\theta, X) = z)$
  • Nonparametric regression estimator: $\hat{f}(t; \beta) = \frac{\int K\left(\frac{\beta^\top x - t}{h}\right) y\, \mathbb{1}\{y \leq \tau\}\, d\widehat{F}_{\hat{g}}(x, y)}{\int K\left(\frac{\beta^\top x - t}{h}\right) \mathbb{1}\{y \leq \tau\}\, d\widehat{F}_{\hat{g}}(x, y)}$
  • Regression parameter estimator: $\hat{\beta} = \arg\min_{\beta \in \mathcal{B}_n} \int [y - \hat{f}(\beta^\top x; \beta)]^2\, \mathbb{1}\{y \leq \tau\}\, J(x)\, d\widehat{F}_{\hat{g}}(x, y)$
  • Asymptotic linearization: $\sqrt{n}(\hat{\beta} - \beta_0) = \Omega^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \eta(T_i, \delta_i, X_i) + o_p(1)$

This approach constitutes a significant advance in the methodology for regression with censored data under high-dimensional covariates, enabling practical and theoretically valid inference when classical nonparametric and semiparametric approaches are no longer feasible due to dimension and censoring dependencies (Lopez et al., 2011).
