
Confidence Interval Clipping via SCAD

Updated 22 February 2026
  • The confidence interval-based clipping method uses the SCAD estimator to create intervals that balance variable selection and estimation in sparse linear models.
  • It employs a thresholding mechanism that clips small signals while preserving the near-unbiased estimation of large coefficients in an orthonormal design.
  • The approach achieves oracle properties asymptotically and ensures conservative coverage, though reducing the expected interval length uniformly is inherently challenging.

A confidence interval-based clipping method refers to constructing confidence intervals (CIs) for regression coefficients where both the center and radius of the interval depend on a clipped or thresholded estimator, specifically the smoothly clipped absolute deviation (SCAD) estimator, rather than the classical least squares estimator. This approach is motivated by variable selection scenarios in sparse linear models, balancing the goals of selection, estimation, and valid post-selection inference. The method is particularly relevant for regression models with orthonormal design matrices and aims to extend the oracle and shrinkage properties of SCAD to associated inferential intervals (Farchione et al., 2012).

1. SCAD Penalty and Its Derivatives

The SCAD penalty $p_\lambda(\theta)$, proposed by Fan and Li (2001), is a nonconvex penalty used in penalized regression. It encourages sparsity while reducing the bias of large coefficient estimates. It is parameterized by a tuning parameter $\lambda > 0$ and a shape parameter $a > 2$ (typically $a = 3.7$):

$$p_\lambda(\theta) = \begin{cases} \lambda|\theta|, & |\theta| \le \lambda,\\ -\dfrac{\theta^2 - 2a\lambda|\theta| + \lambda^2}{2(a-1)}, & \lambda < |\theta| \le a\lambda,\\ \dfrac{(a+1)\lambda^2}{2}, & |\theta| > a\lambda. \end{cases}$$

Its derivative $p'_\lambda(\theta)$ is

$$p'_\lambda(\theta) = \begin{cases} \lambda\,\operatorname{sign}(\theta), & |\theta| \le \lambda,\\ \dfrac{a\lambda - |\theta|}{a-1}\,\operatorname{sign}(\theta), & \lambda < |\theta| \le a\lambda,\\ 0, & |\theta| > a\lambda. \end{cases}$$

Equivalently, for $t \ge 0$,

$$p'_\lambda(t) = \lambda\, I(t \le \lambda) + \frac{(a\lambda - t)_+}{a-1}\, I(t > \lambda),$$

with $p_\lambda(\theta) = \int_0^{|\theta|} p'_\lambda(t)\,dt$.
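The following Python sketch gives a quick numerical check of these formulas. It evaluates the penalty and its derivative, then confirms that integrating $p'_\lambda$ from 0 to $|\theta|$ reproduces $p_\lambda(\theta)$ up to quadrature error. The function names are illustrative and do not come from the source.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(theta) of Fan and Li (2001), evaluated piecewise."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                                  # linear (L1-type) part
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))   # quadratic transition
    return (a + 1) * lam * lam / 2                                      # constant: no further growth


def scad_derivative(t: float, lam: float, a: float = 3.7) -> float:
    """p'_lambda(t) for t >= 0, in the indicator form."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def integrated_derivative(theta: float, lam: float, a: float = 3.7, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of p'_lambda over [0, |theta|]."""
    upper = abs(theta)
    h = upper / steps
    return h * sum(scad_derivative((k + 0.5) * h, lam, a) for k in range(steps))


if __name__ == "__main__":
    lam, a = 1.0, 3.7
    for theta in (0.5, 2.0, 5.0):
        # The last two columns should agree up to quadrature error.
        print(theta, scad_penalty(theta, lam, a), integrated_derivative(theta, lam, a))
```

A midpoint rule is adequate here because $p'_\lambda$ is continuous and piecewise linear, so the quadrature error is negligible at this step size.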

SCAD is designed to threshold small signals exactly to zero (the "clipping" that performs variable selection). It applies no shrinkage to signals larger than $a\lambda$, because $p'_\lambda(t) = 0$ there. This combination underlies its model selection consistency and its oracle property (Farchione et al., 2012).

2. Construction of the SCAD Estimator in Orthonormal Designs

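Consider the standard Gaussian linear regression model $Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I_n)$, with orthonormal design $X^\top X = I_p$, and the problem of estimating a specific component $\theta = \beta_j$. The least-squares estimator is $\hat\theta = (X^\top Y)_j \sim N(\theta, \sigma^2)$. The unbiased estimator of $\sigma^2$ is $\hat\sigma^2 = \lVert Y - X\hat\beta \rVert^2/(n-p)$, which is independent of $\hat\theta$.

The SCAD estimator $\tilde\theta$ is the minimizer over $\theta$ of $\tfrac{1}{2}(\hat\theta - \theta)^2 + p_\lambda(\theta)$, where $\lambda > 0$ and $a > 2$ are the tuning parameters introduced in Section 1. The explicit solution is

$$\tilde\theta = \begin{cases} \operatorname{sign}(\hat\theta)\,(|\hat\theta| - \lambda)_+, & |\hat\theta| \le 2\lambda,\\ \dfrac{(a-1)\hat\theta - \operatorname{sign}(\hat\theta)\,a\lambda}{a-2}, & 2\lambda < |\hat\theta| \le a\lambda,\\ \hat\theta, & |\hat\theta| > a\lambda. \end{cases}$$

This estimator applies soft thresholding to small signals ($|\hat\theta| \le 2\lambda$) and gives near-unbiased estimation for large coefficients. For $|\hat\theta| > a\lambda$, it coincides with the least-squares estimator (Farchione et al., 2012).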
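The closed-form rule can be checked against a direct minimization of the objective $\tfrac{1}{2}(z - \theta)^2 + p_\lambda(\theta)$. This is a minimal sketch with hypothetical helper names. The brute-force grid search is used only as a check; in practice one would use the closed form.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """Closed-form minimizer of 0.5 * (z - theta)**2 + p_lambda(theta)."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    s = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:
        return s * max(az - lam, 0.0)                  # soft thresholding: small |z| set to 0
    if az <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)   # transition region
    return z                                           # |z| > a*lam: least-squares value kept


def brute_force_minimizer(z: float, lam: float, a: float = 3.7,
                          lo: float = -8.0, hi: float = 8.0, n: int = 160001) -> float:
    """Grid search over theta; used only to check the closed form."""
    step = (hi - lo) / (n - 1)
    grid = (lo + k * step for k in range(n))
    return min(grid, key=lambda th: 0.5 * (z - th) ** 2 + scad_penalty(th, lam, a))


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 5.0, -3.0):
        print(z, scad_threshold(z, 1.0), brute_force_minimizer(z, 1.0))
```

The objective is convex for $a > 2$, because the penalty's curvature $-1/(a-1)$ is smaller in magnitude than the quadratic's. As a result, the grid minimizer lands within one grid step of the closed-form value.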

3. Confidence Interval Construction and Clipping Mechanism

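The classical $1-\alpha$ confidence interval for $\theta$ is $I = [\hat\theta - t(m)\hat\sigma,\ \hat\theta + t(m)\hat\sigma]$, where $t(m)$ is the $1-\alpha/2$ quantile of the $t$ distribution with $m$ degrees of freedom and $m = n - p$.

The confidence interval centered on $\tilde\theta$ takes the form

$$J(b) = \left[\tilde\theta - \hat\sigma\, b\!\left(|\hat\theta|/\hat\sigma\right),\ \tilde\theta + \hat\sigma\, b\!\left(|\hat\theta|/\hat\sigma\right)\right],$$

where $b: [0, \infty) \to (0, \infty)$ is continuous, with $b(x) = t(m)$ for all $x \ge d$ and some specified constant $d > 0$.

Thus, the lower and upper endpoints are

$$\tilde\theta - \hat\sigma\, b\!\left(|\hat\theta|/\hat\sigma\right) \quad \text{and} \quad \tilde\theta + \hat\sigma\, b\!\left(|\hat\theta|/\hat\sigma\right).$$

When $|\hat\theta|/\hat\sigma \ge d$ and $|\hat\theta| > a\lambda$, the interval $J(b)$ reduces to the classical interval $I$. In that region the center reverts to the least-squares estimate and the half-width to $t(m)\hat\sigma$, so the half-width is clipped to its classical value at large signal-to-noise ratios (Farchione et al., 2012).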
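The sketch below builds both intervals for given $\hat\theta$ and $\hat\sigma$. The critical value $t(m)$ is passed in rather than computed; $2.228$ is the tabulated 0.975 quantile of $t$ with $m = 10$. The linear shape produced by the hypothetical `make_linear_b` (with $b(0) = 2.6$, $d = 4$) and the choice $\lambda = 1$ are for illustration only. They are not the optimized $b$ or the tuning used in the paper.

```python
from typing import Callable, Tuple

Interval = Tuple[float, float]


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD estimate for one coordinate under an orthonormal design (Fan and Li, 2001)."""
    s = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:
        return s * max(az - lam, 0.0)
    if az <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)
    return z


def classical_interval(theta_hat: float, sigma_hat: float, t_crit: float) -> Interval:
    """Standard interval [theta_hat - t(m) sigma_hat, theta_hat + t(m) sigma_hat]."""
    return (theta_hat - t_crit * sigma_hat, theta_hat + t_crit * sigma_hat)


def scad_centered_interval(theta_hat: float, sigma_hat: float,
                           b: Callable[[float], float], lam: float, a: float = 3.7) -> Interval:
    """J(b): center = SCAD estimate, half-width = sigma_hat * b(|theta_hat| / sigma_hat)."""
    center = scad_threshold(theta_hat, lam, a)
    radius = sigma_hat * b(abs(theta_hat) / sigma_hat)
    return (center - radius, center + radius)


def make_linear_b(t_crit: float, b0: float, d: float) -> Callable[[float], float]:
    """Hypothetical b: linear from b0 at x = 0 to t_crit at x = d, constant afterwards."""
    def b(x: float) -> float:
        if x >= d:
            return t_crit
        return b0 + (t_crit - b0) * x / d
    return b


if __name__ == "__main__":
    t_crit = 2.228                      # ~0.975 quantile of t with m = 10 (from tables)
    b = make_linear_b(t_crit, b0=2.6, d=4.0)
    sigma_hat, lam = 1.0, 1.0           # illustrative values
    for theta_hat in (0.5, 3.0, 10.0):
        print(theta_hat, scad_centered_interval(theta_hat, sigma_hat, b, lam),
              classical_interval(theta_hat, sigma_hat, t_crit))
```

In this example $d = 4 \ge a\lambda/\hat\sigma = 3.7$. Any $\hat\theta$ with $|\hat\theta|/\hat\sigma \ge d$ therefore yields exactly the classical interval.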

4. Finite-Sample and Asymptotic Properties

4.1 Coverage Probability

Let $\gamma = \theta/\sigma$ and $W = \hat\sigma/\sigma$, so that $mW^2 \sim \chi^2_m$ independently of $\hat\theta$. Suppose the SCAD threshold is taken proportional to the estimated standard error, $\lambda = \lambda_0\hat\sigma$ with $\lambda_0 > 0$ fixed. The SCAD rule is then scale equivariant, $\tilde\theta = \hat\sigma\,\psi(\hat\theta/\hat\sigma)$, where $\psi$ is the standardized thresholding function

$$\psi(x) = \begin{cases} \operatorname{sign}(x)\,(|x| - \lambda_0)_+, & |x| \le 2\lambda_0,\\ \dfrac{(a-1)x - \operatorname{sign}(x)\,a\lambda_0}{a-2}, & 2\lambda_0 < |x| \le a\lambda_0,\\ x, & |x| > a\lambda_0. \end{cases}$$

Write $h = \hat\theta/\sigma$ and $w = W$. The event $\theta \in J(b)$ is then $|\psi(h/w) - \gamma/w| \le b(|h|/w)$, so the coverage probability of $J(b)$ is

$$CP(\gamma) = \int_0^\infty \int_{-\infty}^{\infty} \mathbb{1}\!\left\{\left|\psi(h/w) - \gamma/w\right| \le b(|h|/w)\right\} \phi(h - \gamma)\, dh\; f_W(w)\, dw,$$

where $f_W$ is the density of $W$ and $\phi$ is the standard normal density. Key coverage properties are:

  • $CP(\gamma)$ is an even function of $\gamma$.
  • For any $b$ with $b(x) = t(m)$ for $x \ge d$, $CP(\gamma) \to 1 - \alpha$ as $|\gamma| \to \infty$.
  • Numerically, the values of $b(x)$ required for $CP(\gamma) \ge 1 - \alpha$ strictly exceed $t(m)$ for small $x$. Hence $b$ cannot be reduced to $t(m)$ in that region without violating nominal coverage (Farchione et al., 2012).
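The double integral for $CP(\gamma)$ can also be approximated by Monte Carlo instead of quadrature. The sketch below assumes the threshold is $\lambda = \lambda_0\hat\sigma$ and sets $\sigma = 1$ without loss of generality. It also uses hypothetical function names. As a sanity check, with $\lambda_0 = 0$ the center is $\hat\theta$ itself and the estimate should be close to $1 - \alpha$.

```python
import math
import random
from typing import Callable


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    s = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:
        return s * max(az - lam, 0.0)
    if az <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)
    return z


def coverage_mc(gamma: float, b: Callable[[float], float], lam0: float, m: int,
                a: float = 3.7, n_sim: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of CP(gamma) = P(theta in J(b)) with sigma = 1 (so theta = gamma).

    Assumes the SCAD threshold is lam = lam0 * sigma_hat.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        theta_hat = gamma + rng.gauss(0.0, 1.0)                     # N(theta, 1)
        sigma_hat = math.sqrt(rng.gammavariate(m / 2.0, 2.0) / m)   # m * sigma_hat^2 ~ chi^2_m
        center = scad_threshold(theta_hat, lam0 * sigma_hat, a)
        radius = sigma_hat * b(abs(theta_hat) / sigma_hat)
        if abs(center - gamma) <= radius:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    t_crit = 2.228                          # ~0.975 quantile of t with m = 10
    def constant_b(x: float) -> float:      # b(x) = t(m) everywhere
        return t_crit
    for gamma in (0.0, 1.0, 2.0, 4.0, 20.0):
        print(gamma, coverage_mc(gamma, constant_b, lam0=1.0, m=10))
```

With $\lambda_0 = 1$ and $b \equiv t(m)$, this makes it easy to see where coverage dips below $1 - \alpha$ for moderate $|\gamma|$. That region is where $b$ has to exceed $t(m)$.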

4.2 Oracle Properties and Asymptotics

Consider the usual SCAD asymptotic regime, in which the tuning parameter tends to zero but more slowly than the standard error of the least-squares estimator (Fan and Li, 2001). In this regime the SCAD estimator has the oracle property: it sets truly zero coefficients to exactly zero with probability tending to one, and its estimates of nonzero coefficients are asymptotically normal with the same variance as the least-squares estimator fitted to the true submodel. Because $J(b)$ coincides with the standard $t$ interval whenever $|\hat\theta|/\hat\sigma$ is large, its coverage and centering inherit these oracle characteristics for large $|\gamma|$.

4.3 Interval Length under Sparsity

The scaled expected length is the expected length of $J(b)$ divided by that of the classical interval $I$:

$$e(\gamma) = \frac{E\left\{\hat\sigma\, b\!\left(|\hat\theta|/\hat\sigma\right)\right\}}{t(m)\, E(\hat\sigma)},$$

which depends on $(\theta, \sigma)$ only through $\gamma$.

Desirable properties include $e(\gamma) < 1$ for $\gamma$ near 0, where sparsity suggests the parameter is likely to lie, and $e(\gamma) \to 1$ as $|\gamma| \to \infty$. However, making $e(0)$ materially smaller than 1 while maintaining $CP(\gamma) \ge 1 - \alpha$ for all $\gamma$ has been shown to be unattainable without incurring unacceptably large lengths elsewhere. This was established by Farchione and Kabaila (2008) and by general admissibility arguments (Kabaila, 2011) (Farchione et al., 2012).
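A Monte Carlo approximation of $e(\gamma)$ takes only a few lines. This is a sketch that assumes $\sigma = 1$, which loses no generality because $e$ depends only on $\gamma$; the function name is hypothetical. The numerator and denominator reuse the same draws (common random numbers), so a constant $b \equiv t(m)$ gives exactly 1.

```python
import math
import random
from typing import Callable


def scaled_expected_length_mc(gamma: float, b: Callable[[float], float], t_crit: float,
                              m: int, n_sim: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of e(gamma) = E[sigma_hat * b(|theta_hat| / sigma_hat)] / (t(m) E[sigma_hat])."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n_sim):
        theta_hat = gamma + rng.gauss(0.0, 1.0)                     # N(theta, 1), sigma = 1
        sigma_hat = math.sqrt(rng.gammavariate(m / 2.0, 2.0) / m)   # m * sigma_hat^2 ~ chi^2_m
        num += sigma_hat * b(abs(theta_hat) / sigma_hat)            # half-width of J(b)
        den += t_crit * sigma_hat                                   # half-width of I
    return num / den


if __name__ == "__main__":
    t_crit = 2.228                                  # ~0.975 quantile of t with m = 10
    def bump_b(x: float) -> float:                  # hypothetical b >= t(m), equal to t(m) for x >= 1
        return t_crit + max(0.0, 1.0 - x)
    for gamma in (0.0, 1.0, 3.0, 10.0):
        print(gamma, scaled_expected_length_mc(gamma, bump_b, t_crit, m=10))
```

Any $b \ge t(m)$ yields $e(\gamma) \ge 1$ for every $\gamma$. With $b(x) = t(m)$ beyond some $d$, $e(\gamma)$ approaches 1 as $|\gamma|$ grows.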

5. Numerical Illustration and Optimization

Empirical evaluations consider fixed SCAD tuning parameters with either a large or a small number of degrees of freedom $m$; a small $m$ makes $\hat\sigma$ highly variable. The function $b$ is represented by a natural cubic spline on $[0, d]$ with a fixed set of knots, enforcing $b(d) = t(m)$ so that $b$ is continuous. The optimization is:

  • Minimize a criterion based on the scaled expected length $e(\gamma)$ near $\gamma = 0$,
  • Subject to $b(x) = t(m)$ for $x \ge d$, and $CP(\gamma) \ge 1 - \alpha$ for all $\gamma$.
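The constrained optimization can be set up as below. This sketch makes two simplifying substitutions that are assumptions, not the paper's procedure. First, $b$ is piecewise linear between knots instead of a natural cubic spline. Second, the objective is taken to be $e(0)$. Coverage is checked by Monte Carlo on a finite grid of $\gamma \ge 0$, which suffices because $CP$ is even, using common random numbers. The threshold is assumed to be $\lambda = \lambda_0\hat\sigma$, and all names are hypothetical. A numerical optimizer would then search over `free_values` for the smallest $e(0)$ subject to `min_cp >= 1 - alpha`.

```python
import math
import random
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    s = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:
        return s * max(az - lam, 0.0)
    if az <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)
    return z


def make_b(knots: Sequence[float], free_values: Sequence[float], t_crit: float) -> Callable[[float], float]:
    """Piecewise-linear b on [0, d], d = knots[-1], with b(x) = t_crit for x >= d.

    Stand-in for a natural cubic spline; the value at d is fixed to t_crit for continuity.
    """
    if knots[0] != 0.0 or len(free_values) != len(knots) - 1:
        raise ValueError("knots must start at 0 and have one more entry than free_values")
    ys = list(free_values) + [t_crit]
    d = knots[-1]

    def b(x: float) -> float:
        if x >= d:
            return t_crit
        i = bisect_right(knots, x) - 1              # knots[i] <= x < knots[i + 1]
        w = (x - knots[i]) / (knots[i + 1] - knots[i])
        return (1.0 - w) * ys[i] + w * ys[i + 1]

    return b


def draw(m: int, n_sim: int, seed: int) -> List[Tuple[float, float]]:
    """Common random numbers: pairs (Z, W) with Z ~ N(0, 1) and m * W^2 ~ chi^2_m."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), math.sqrt(rng.gammavariate(m / 2.0, 2.0) / m))
            for _ in range(n_sim)]


def evaluate(b: Callable[[float], float], t_crit: float, lam0: float,
             draws: List[Tuple[float, float]], gammas: Sequence[float],
             a: float = 3.7) -> Tuple[float, float]:
    """Return (e(0), minimum over gammas of CP(gamma)) for a candidate b, with sigma = 1."""
    num = den = 0.0
    for z, w in draws:                              # e(0): theta = 0, so theta_hat = z
        num += w * b(abs(z) / w)
        den += t_crit * w
    min_cp = 1.0
    for g in gammas:
        hits = 0
        for z, w in draws:
            th = g + z
            center = scad_threshold(th, lam0 * w, a)
            if abs(center - g) <= w * b(abs(th) / w):
                hits += 1
        min_cp = min(min_cp, hits / len(draws))
    return num / den, min_cp


if __name__ == "__main__":
    t_crit, lam0, m = 2.228, 1.0, 10
    knots = [0.0, 1.0, 2.0, 3.0, 4.0]               # hypothetical knots, d = 4 >= a * lam0
    draws = draw(m, n_sim=5000, seed=1)
    gammas = [0.25 * k for k in range(25)]
    for free in ([t_crit] * 4, [2.6, 2.6, 2.5, 2.4]):
        e0, min_cp = evaluate(make_b(knots, free, t_crit), t_crit, lam0, draws, gammas)
        print(free, round(e0, 3), round(min_cp, 3), min_cp >= 0.95)
```

Reusing the same draws for every candidate $b$ keeps the objective and constraint smooth in `free_values`, which helps a derivative-free optimizer. A finite $\gamma$ grid only approximates the "for all $\gamma$" constraint.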

Key findings:

  • $b(x) \ge t(m)$ for every $x$, so $J(b)$ is never shorter than the standard interval, even when $|\hat\theta|/\hat\sigma$ is small.
  • Consequently $e(\gamma) \ge 1$ for all $\gamma$, often substantially so when $|\gamma|$ is small or the variance of $\hat\sigma$ is large (small $m$).
  • As $m$ increases, $t(m)$ converges to $z_{1-\alpha/2}$ from above; $e(\gamma)$ exceeds 1 for small $|\gamma|$, then returns to 1 as $|\gamma| \to \infty$.

These results indicate a strict barrier: no choice of $b$ achieves both coverage $CP(\gamma) \ge 1 - \alpha$ for all $\gamma$ and materially reduced expected length at $\gamma = 0$ (Farchione et al., 2012).

6. Comparisons and Admissibility Results

Classical intervals, by contrast, do not adapt to sparsity structure, but they avoid the inflation of interval length that $J(b)$ shows once the minimum-coverage constraint is enforced. Admissibility results (Kabaila, 2011) establish that no interval with nominal coverage can have expected length uniformly no greater than that of the usual confidence interval and strictly smaller somewhere. This highlights the trade-off intrinsic to CI construction in sparse regression (Farchione et al., 2012).

7. References and Historical Context

  • Fan, J. and Li, R. (2001), "Variable selection via nonconcave penalized likelihood and its oracle properties" (JASA 96, 1348–1360)
  • Farchione, D. and Kabaila, P. (2008), "Confidence intervals for the normal mean utilizing prior information" (Stat. Prob. Letters 78, 1094–1100)
  • Kabaila, P. (2011), "Admissibility of the usual confidence interval for the normal mean" (Stat. Prob. Letters 81, 352–359)
  • Confidence interval properties and their admissibility in the context of shrinkage, thresholding, and model selection are fundamentally shaped by the impossibility results for simultaneously achieving shorter expected length and nominal coverage (Farchione et al., 2012).

The key properties and limitations of the confidence interval-based clipping method in the SCAD framework are thus determined by fundamental statistical trade-offs. This framework provides insight for the post-selection inference literature, clarifying the link between shrinkage, coverage, and conservatism in sparse regression.
