Optimal data splitting for shrinkage mean estimation

Determine the optimal sample-splitting strategy for computing the base location estimator \widehat{\kappa} and the shrinkage mean estimator \widehat{\mu}(\widehat{\kappa}; X_{1:n}). In particular, determine what proportion of the data should be allocated to \widehat{\kappa} so that the independence required by Assumption 4 holds. Also determine whether computing both \widehat{\kappa} and \widehat{\mu}(\widehat{\kappa}; X_{1:n}) on the full sample, which violates Assumption 4, yields better performance.
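One way to make the question precise is sketched below. The split fraction \alpha \in (0, 1) and the labels "split" and "full" are hypothetical notation, not taken from the paper. For i.i.d. data, the split variant computes \widehat{\kappa} on a block disjoint from the sample passed to \widehat{\mu}, so the two are independent as Assumption 4 requires, at the cost of only n - n_1 points in the shrinkage step. The full-sample variant uses all n points in both steps and gives up that independence. The open problem asks which \alpha minimizes the estimation error and whether the full-sample variant beats every split.

```latex
n_1 = \lfloor \alpha n \rfloor, \qquad
\widehat{\mu}_{\mathrm{split}} = \widehat{\mu}\big(\widehat{\kappa}(X_{1:n_1});\, X_{(n_1+1):n}\big)
\qquad \text{versus} \qquad
\widehat{\mu}_{\mathrm{full}} = \widehat{\mu}\big(\widehat{\kappa}(X_{1:n});\, X_{1:n}\big).
```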

Background

The paper introduces a family of shrinkage-based mean estimators that improve concentration by downweighting sample points far from a base estimator \widehat{\kappa}. A key technical assumption (Assumption 4) requires \widehat{\kappa} to be independent of the sample used to compute the shrinkage estimator \widehat{\mu}(\widehat{\kappa}; X_{1:n}), which motivates data splitting in practice.
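The paper's exact family of weights is not reproduced here. As a hedged illustration only, the sketch below uses a hypothetical clipping rule of Huber type. Each deviation x_i - \widehat{\kappa} is truncated to [-tau, tau] before averaging, which is the same as giving that deviation the weight min(1, tau / |x_i - \widehat{\kappa}|). The function name `shrinkage_mean` and the radius `tau` are assumptions, not the authors' API.

```python
from collections.abc import Sequence


def shrinkage_mean(kappa: float, xs: Sequence[float], tau: float) -> float:
    """Hypothetical shrinkage mean estimator (illustration only, not the paper's family).

    Each deviation x - kappa is clipped to [-tau, tau] before averaging. This equals
    weighting the deviation by min(1, tau / |x - kappa|): points within tau of kappa
    keep full weight, and points farther away are downweighted.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if len(xs) == 0:
        raise ValueError("need at least one sample point")
    # Pull each point toward kappa until it lies within distance tau, then average.
    clipped = (kappa + max(-tau, min(tau, x - kappa)) for x in xs)
    return sum(clipped) / len(xs)
```

Under Assumption 4, `kappa` would be computed on data disjoint from `xs`; the function itself does not check this. With a very large `tau` no point is clipped and the result reduces to the plain sample mean.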

Section 6.2 discusses practical strategies for satisfying the independence assumption through sample splitting and compares them empirically with the alternative of using the full sample for both \widehat{\kappa} and \widehat{\mu}. The authors explicitly note that it is unclear how much data should be devoted to \widehat{\kappa}, and whether skipping the split altogether (thereby violating Assumption 4) could be preferable. This leaves an open methodological question with direct implications for practical performance and for whether the independence assumption is actually necessary.
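To explore the question numerically, one could compare several split fractions against the full-sample variant by Monte Carlo. The sketch below is a minimal, self-contained harness resting on stated assumptions. The sample median stands in for \widehat{\kappa}. The shrinkage step is the same hypothetical clipping rule, with the radius `tau` fixed by hand rather than by any rule from the paper. The data come from a hypothetical contaminated normal with true mean 0. All function names are illustrative.

```python
import random
import statistics
from collections.abc import Callable, Sequence


def shrinkage_mean(kappa: float, xs: Sequence[float], tau: float) -> float:
    # Hypothetical clipping estimator: pull each point to within tau of kappa, then average.
    return sum(kappa + max(-tau, min(tau, x - kappa)) for x in xs) / len(xs)


def split_estimate(xs: Sequence[float], alpha: float, tau: float) -> float:
    # Median on the first floor(alpha * n) points, shrinkage step on the remaining points.
    # Assumes xs is i.i.d. in random order; shuffle first if the data might be sorted.
    n1 = int(alpha * len(xs))
    if not 1 <= n1 < len(xs):
        raise ValueError("alpha must leave at least one point in each part")
    kappa = statistics.median(xs[:n1])
    return shrinkage_mean(kappa, xs[n1:], tau)


def full_sample_estimate(xs: Sequence[float], tau: float) -> float:
    # No split: the same data feed both the median and the shrinkage step (violates Assumption 4).
    return shrinkage_mean(statistics.median(xs), xs, tau)


def draw(rng: random.Random, n: int) -> list[float]:
    # Hypothetical heavy-tailed test distribution: N(0, 1) contaminated with 5% N(0, 10^2).
    # It is symmetric, so its true mean is 0.
    return [rng.gauss(0.0, 10.0 if rng.random() < 0.05 else 1.0) for _ in range(n)]


def empirical_mse(
    estimator: Callable[[list[float]], float], n: int, reps: int, seed: int = 0
) -> float:
    # The true mean is 0, so the squared estimate is the squared error.
    # A fixed seed gives every estimator the same datasets (common random numbers).
    rng = random.Random(seed)
    return statistics.fmean(estimator(draw(rng, n)) ** 2 for _ in range(reps))


def compare(
    alphas: Sequence[float], tau: float, n: int = 200, reps: int = 500
) -> dict[str, float]:
    # Estimated MSE for each split fraction, plus the unsplit full-sample estimator.
    results = {
        f"alpha={a}": empirical_mse(lambda xs, a=a: split_estimate(xs, a, tau), n, reps)
        for a in alphas
    }
    results["full"] = empirical_mse(lambda xs: full_sample_estimate(xs, tau), n, reps)
    return results
```

For instance, `compare([0.1, 0.25, 0.5], tau=3.0)` returns estimated mean squared errors keyed by split fraction, plus a `"full"` entry for the unsplit estimator. Because every call reuses the same seed, differences between entries reflect the estimators rather than different simulated datasets. The outcome depends on the distribution, `n`, and `tau`, so results from this toy setup say nothing definitive about the paper's estimators.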

References

Splitting the sample is a natural way to fulfill Assumption 4. Nevertheless, it is not clear how much of the available data should be dedicated to the base estimator, or even if computing the base estimator and the shrinkage estimator on the full sample (ignoring Assumption 4) is the better approach.

— Improved Concentration for Mean Estimators via Shrinkage (2512.12750 - Catão et al., 14 Dec 2025) in Section 6.2 (Splitting the sample)
