
Berry-Esseen Bounds for High-D Self-Normalized Sums

Updated 30 December 2025
  • The paper establishes explicit Berry–Esseen bounds for self-normalized sums, achieving rates as fast as O((log d)^(3/2)/√n) under finite third moment conditions.
  • The methodology employs truncation, smoothing, and Taylor expansions to manage nonlinearities introduced by data-dependent normalization in high dimensions.
  • The results clarify the trade-off between moment assumptions, sample size, and dimension growth, optimizing convergence in multivariate statistical inference.

High-dimensional self-normalized sums arise in multivariate statistical inference, especially in cases where the dimensionality of observed random vectors grows with sample size. The Berry-Esseen bound quantifies the rate of convergence in the Central Limit Theorem (CLT), measuring how closely the distribution of a properly normalized sum approximates a Gaussian law. In high dimensions, the interplay between sample size, dimension, and moment assumptions becomes critical, particularly for self-normalized statistics, where scaling by the data-dependent standard deviation introduces strong dependencies and nonlinearities. Recent work establishes explicit Berry-Esseen type bounds for these self-normalized sums and their maxima, significantly advancing the understanding of high-dimensional CLTs under relaxed moment assumptions (Das, 2020, Chang et al., 15 Jan 2025).

1. Problem Formulation and Self-normalized Sums

Given a sequence of independent, identically distributed (IID), mean-zero random vectors $X_i = (X_{i1}, \dots, X_{id})^\top \in \mathbb{R}^d$, the primary object of interest is the self-normalized sum

$$T_n = (T_{n,1},\dots,T_{n,d})^\top, \qquad T_{n,j} = \frac{\sum_{i=1}^n X_{ij}}{\sqrt{\sum_{i=1}^n X_{ij}^2}}$$

for $j = 1, \ldots, d$. For coordinatewise inference, the distribution of $\|T_n\|_\infty = \max_{1\leq j\leq d} T_{n,j}$ is studied, as well as the uniform approximation of $T_n$ over classes of hyper-rectangles in $\mathbb{R}^d$.
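Under these definitions, $T_n$ and its coordinatewise maximum are straightforward to compute from a data matrix; a minimal sketch (array shapes and names are illustrative):

```python
import numpy as np

def self_normalized_sum(X):
    """Coordinatewise self-normalized sum T_n for an (n, d) data matrix X:
    T_{n,j} = sum_i X_ij / sqrt(sum_i X_ij^2)."""
    num = X.sum(axis=0)                     # column sums: sum_i X_ij
    den = np.sqrt((X ** 2).sum(axis=0))     # data-dependent scale per coordinate
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))          # n = 500 observations, d = 50 coordinates
T = self_normalized_sum(X)
T_max = T.max()                             # ||T_n||_inf as defined above (max, not abs-max)
print(T.shape, T_max)
```

By the Cauchy–Schwarz inequality each coordinate satisfies $|T_{n,j}| \le \sqrt{n}$, which makes the self-normalized sum automatically bounded regardless of the tail behavior of the data.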

The Berry-Esseen distance for evaluating the approximation to a multivariate normal is given by

$$\Delta_n = \sup_{A\in\mathcal{A}^{\text{re}}} \left| \Pr\{T_n \in A\} - \Pr\{Z \in A\} \right|$$

where $Z \sim N(0, I_d)$ and $\mathcal{A}^{\text{re}}$ denotes the class of hyper-rectangles.
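The distance $\Delta_n$ can be probed empirically. The sketch below (an illustration, not the papers' construction) Monte-Carlo-estimates the one-sided-rectangle slice of $\Delta_n$, comparing $\Pr\{\max_j T_{n,j} \le t\}$ against the independent-Gaussian value $\Phi(t)^d$:

```python
import numpy as np
from math import erf, sqrt

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def delta_one_sided(n, d, reps=2000, seed=1):
    """Monte Carlo estimate of sup_t |P(max_j T_{n,j} <= t) - P(max_j Z_j <= t)|
    over a grid of thresholds, for IID standard-normal data with independent
    coordinates (so P(max_j Z_j <= t) = Phi(t)^d exactly)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1.0, 5.0, 61)
    maxima = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, d))
        T = X.sum(axis=0) / np.sqrt((X ** 2).sum(axis=0))
        maxima[r] = T.max()
    emp = np.array([(maxima <= t).mean() for t in grid])
    gauss = np.array([Phi(t) ** d for t in grid])
    return float(np.abs(emp - gauss).max())

err = delta_one_sided(n=200, d=20)
print(err)
```

The estimate mixes the Berry–Esseen error with Monte Carlo noise of order $1/\sqrt{\text{reps}}$, so it illustrates the order of magnitude rather than the bound itself.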

2. Explicit High-dimensional Berry-Esseen Bounds

Recent results provide explicit Berry-Esseen bounds for the approximation of the law of TnT_n (or its coordinatewise maximum) by an appropriate Gaussian distribution (Das, 2020, Chang et al., 15 Jan 2025).

(a) Berry-Esseen Bound for Hyper-rectangles

Under the assumptions that each $X_{ij}$ is mean-zero, with finite $p$-th moment for some $p \in (2, 3]$, and the sequence is IID across $i$, an explicit nonasymptotic bound on $\Delta_n$ is obtained in terms of $n$, $d$, $p$, and the marginal moments ((Das, 2020), Theorem 6). When the relevant variance and moment quantities are bounded away from $0$ and $\infty$, the bound simplifies to an explicit rate in $n$ and $d$.

For the case $p = 3$ (finite third moment), the bound becomes

$$\Delta_n = O\!\left(\frac{(\log d)^{3/2}}{\sqrt{n}}\right)$$

which matches the classical rate $O(n^{-1/2})$ of the univariate Berry-Esseen theorem, up to a logarithmic factor in $d$.
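To see how mild the logarithmic dimension factor is, one can tabulate the leading-order rate $(\log d)^{3/2}/\sqrt{n}$ (absolute constants omitted) with $d$ growing linearly in $n$:

```python
import math

def be_rate(n, d):
    """Leading-order Berry-Esseen rate (log d)^{3/2} / sqrt(n) for p = 3,
    as stated in the text; absolute constants are omitted."""
    return math.log(d) ** 1.5 / math.sqrt(n)

# Even with d growing linearly in n, the bound vanishes at nearly n^{-1/2}.
for n in (10**3, 10**4, 10**5):
    print(n, round(be_rate(n, n), 4))
```

The bound thus remains useful in regimes where $d$ is comparable to, or even polynomially larger than, $n$.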

(b) Berry-Esseen Bound for Maxima (Coordinatewise Maximum)

A complementary approach provides explicit, nonasymptotic bounds for the Kolmogorov distance between $\|T_n\|_\infty$ and the maximum of a Gaussian vector (Chang et al., 15 Jan 2025). Assuming finite third absolute moments, the result takes the form

$$\inf_{Z} \sup_{t \in \mathbb{R}} \left| \Pr\{\|T_n\|_\infty \le t\} - \Pr\{\|Z\|_\infty \le t\} \right| \le \varepsilon(n, d),$$

where the infimum is taken over all mean-zero $d$-variate Gaussian vectors $Z$ whose covariance matrices are correlation matrices, and $\varepsilon(n, d)$ denotes the explicit error term derived in the paper. The bound vanishes as $n \to \infty$ provided $d$ does not grow too rapidly with $n$; the precise growth condition is given in the paper.

A moment-matching version of the bound controls the error for Gaussian approximations with the actual covariance structure of $T_n$.

3. Moment Assumptions and Dimension Growth

The fundamental trade-off in high-dimensional CLTs with self-normalized sums is between the required finite moment order $p$, the dimension $d$, and the sample size $n$. For the error bound to vanish, the dimension may grow at most subexponentially in a power of $n$ determined by $p$.

For $p = 3$ (finite third moment), the rate $(\log d)^{3/2}/\sqrt{n}$ vanishes exactly when $\log d = o(n^{1/3})$, so the regime $d = e^{o(n^{1/3})}$ is sufficient for vanishing error in the Berry-Esseen sense for uniform approximation over rectangles.

This is in contrast to non-self-normalized sums, whose error bounds typically depend on $d$ only polylogarithmically, so uniform CLT results hold under comparable subexponential dimension growth.
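The $p = 3$ dimension condition follows directly from the stated rate:

```latex
\frac{(\log d)^{3/2}}{\sqrt{n}} \longrightarrow 0
\iff (\log d)^{3/2} = o\!\left(n^{1/2}\right)
\iff \log d = o\!\left(n^{1/3}\right)
\iff d = e^{o(n^{1/3})}.
```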

4. Core Proof Strategies

The derivation of Berry-Esseen bounds for self-normalized sums in high dimensions fundamentally departs from traditional approaches for sums of independent vectors.

Key steps include:

  • Componentwise reduction: Use independence across coordinates (or factorization over rectangles) to reduce the multivariate problem to sums of one-dimensional bounds.
  • Refined Berry–Esseen for self-normalized sums: Deploy one-dimensional results of Jing–Shao–Wang (2003), Bentkus–Götze (1996), and Shao (2005) to control the error for self-normalized quantities.
  • Truncation and smoothing: Truncate coordinates to manage heavy tails and introduce a smooth surrogate for the maximum, enabling Taylor expansion and smoothing arguments.
  • Gaussian anti-concentration: The $\log d$ factors arise from multivariate Gaussian anti-concentration and the complexity of the $\ell_\infty$-norm.
  • Balancing approximation and smoothing bias: Choose smoothing and truncation parameters to optimize the interplay between stochastic remainders and deterministic bias, establishing the explicit rates in $n$ and $d$.
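The smoothing step can be illustrated with the standard log-sum-exp surrogate for the maximum, whose approximation error is at most $(\log d)/\beta$ for smoothing parameter $\beta$ (an illustrative choice; the papers' exact constructions differ):

```python
import numpy as np

def smooth_max(x, beta):
    """Log-sum-exp surrogate for max(x): infinitely differentiable, and
    0 <= smooth_max(x, beta) - max(x) <= log(d) / beta for d coordinates.
    This smoothness is what enables the Taylor-expansion step sketched above."""
    x = np.asarray(x, dtype=float)
    m = x.max()                                        # shift for numerical stability
    return m + np.log(np.exp(beta * (x - m)).sum()) / beta

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)                          # d = 1000 coordinates
for beta in (1.0, 10.0, 100.0):
    gap = smooth_max(x, beta) - x.max()
    print(beta, gap, np.log(x.size) / beta)            # gap is within the log(d)/beta budget
```

Larger $\beta$ tightens the surrogate but inflates its derivatives, which is exactly the bias-versus-smoothness trade-off balanced in the final step above.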

A summary of the main proof ingredients and their quantitative contributions is provided in the following table:

Step Contribution to Bound Source
Truncation/linearization Controls the contribution of large values (Chang et al., 15 Jan 2025)
Smoothing/Taylor expansion Contributes the smoothing exponent in $n$ (Chang et al., 15 Jan 2025)
One-dimensional BE bound Determines the leading exponent in $n$ (Chang et al., 15 Jan 2025)
Anti-concentration Further $\log d$ growth in constants (Das, 2020)

5. Comparison with Non-self-normalized Sums and Optimality

For sums of independent vectors (without normalization), Berry-Esseen bounds of order $n^{-1/2}$ up to polylogarithmic factors in $d$ are attainable (Chernozhukov–Chetverikov–Kato, Kuchibhotla–Chakrabortty). Self-normalized statistics, however, exhibit fundamentally greater complexity: the normalization introduces strong dependence and nonlinearity, precluding direct application of previous high-dimensional CLTs (Chang et al., 15 Jan 2025).

Earlier high-dimensional Berry-Esseen rates for self-normalized sums held only under exponential-moment or independence-across-coordinates assumptions. The new results (Das, 2020, Chang et al., 15 Jan 2025) relax these requirements to polynomial moments and accommodate arbitrary covariance structures (for maxima), providing the first explicit bounds in these regimes.

The bounds are also shown to be optimal in the sense that, under finite third moments, the $n^{-1/2}$ dependence on the sample size cannot be improved in general ((Das, 2020), Proposition 4.1).

6. Refined Bounds, Applications, and Future Directions

Stronger moment assumptions (e.g., finite fourth moment) or refined Lindeberg interpolations may reduce the logarithmic exponents and improve the dependence on $n$, though at the expense of analytical and technical complexity (Chang et al., 15 Jan 2025).

The truncation-based approach for moment-matching bounds controls errors even when coordinate variances diverge, offering robustness to heavy-tailed data distributions. The coordinatewise formulation directly informs statistical inference via Student's $t$-statistic and the construction of simultaneous confidence intervals.
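As a concrete illustration of this use, the sketch below builds simultaneous confidence intervals for the coordinate means from coordinatewise $t$-type statistics, using the two-sided quantile of the maximum of $d$ independent standard Gaussians as the critical value (a sketch assuming independent coordinates; not the papers' exact procedure):

```python
import numpy as np
from statistics import NormalDist

def simultaneous_cis(X, alpha=0.05):
    """Simultaneous (1 - alpha) confidence intervals for the d coordinate means.

    Critical value z = Phi^{-1}((1 + (1 - alpha)^(1/d)) / 2) is the two-sided
    quantile of max_j |Z_j| for d independent standard Gaussians -- the object
    controlled by the max-type CLT above. Assumes independent coordinates."""
    n, d = X.shape
    mean = X.mean(axis=0)
    se = X.std(axis=0, ddof=1) / np.sqrt(n)            # per-coordinate standard error
    z = NormalDist().inv_cdf((1 + (1 - alpha) ** (1 / d)) / 2)
    return mean - z * se, mean + z * se

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 10))                     # n = 100, d = 10, true means 0
lo, hi = simultaneous_cis(X)
print(np.round(lo[:3], 3), np.round(hi[:3], 3))
```

The resulting critical value exceeds the pointwise one (about 2.8 versus 1.96 here), which is the price of simultaneous coverage over all $d$ coordinates.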

Extensions to dependent observations (e.g., mixing processes) remain an open problem.

7. Summary Table of Main Results

Reference Assumptions Bound Dimension Growth Regime
(Das, 2020) IID, finite $p$-th moment, $p \in (2,3]$ $\Delta_n = O((\log d)^{3/2}/\sqrt{n})$ for $p = 3$ $d = e^{o(n^{1/3})}$
(Chang et al., 15 Jan 2025) IID, finite third absolute moments Kolmogorov-distance bound for $\|T_n\|_\infty$ subexponential growth of $d$ in $n$

These results bridge the gap between classical Berry–Esseen theory and modern high-dimensional inference for self-normalized sums, providing explicit error rates and clarifying the interplay between moment control, dimensionality, and normalization.
