
Asymptotic Theory for Statistical Estimation

Updated 22 January 2026
  • Asymptotic theory is a framework that characterizes the limiting behavior of statistical estimators, quantifying consistency, convergence rates, and distributional forms as sample size increases.
  • It encompasses classical M-estimation, convex constrained estimation, simulation-based approaches, and nonparametric methods, offering practical strategies for robust inference.
  • Recent advances extend foundational results to models with irregular smoothness, Bayesian estimation, and high-dimensional settings, ensuring reliable performance even under complex dependencies.

Asymptotic theory for statistical estimation provides the rigorous framework for analyzing the limiting behavior of estimators as the sample size increases. It characterizes consistency, rates of convergence, limiting distributions, and efficiency, underpinning the statistical validity of inference procedures in classical, simulation-based, nonparametric, and high-dimensional models. Modern developments extend foundational results to cases with minimal smoothness, non-i.i.d. dependence, complex regularization, convex constraints, simulation-generated synthetic data, and Bayesian-type estimators, broadening the scope of asymptotic guarantees and their applications.

1. Classical Asymptotic Foundations and M-Estimation

The modern treatment of asymptotic analysis of estimators begins with general estimating-function and M-estimation theory. Consider observations $X_1, \dots, X_n$ (i.i.d. or from a stationary process) and a parametric model $\{\mathbb{P}_\theta : \theta \in \Theta\}$; the estimator $\hat\theta_n$ is the solution to the estimating equation

$$\psi_n(\hat\theta_n) = \sum_{t=1}^n \psi(X_t; \hat\theta_n) = 0.$$

Under regularity conditions involving local smoothness, uniform convergence, and identifiability, a sequence of (weakly/strongly) consistent estimators exists, with the convergence rate (typically $n^{-1/2}$) determined by the stochastic order of $\psi_n(\bar\theta)$ at the limit point $\bar\theta$. If a central limit theorem applies to $\psi_n(\bar\theta)$ and the Jacobian converges, then

$$\sqrt{n}(\hat\theta_n - \bar\theta) \xrightarrow{d} N\bigl(0,\, A(\bar\theta)^{-1} B(\bar\theta) A(\bar\theta)^{-1}\bigr),$$

with $A = \mathbb{E}_\theta[\partial_\theta \psi]$ and $B = \operatorname{Var}_\theta \psi$ (Jacod et al., 2017). This structure generalizes maximum likelihood, the generalized method of moments, and many time-series estimators.
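As a concrete check, the sandwich formula can be verified by simulation. The sketch below uses a toy exponential-rate model (an illustrative choice, not taken from the cited work): it solves the score equation for $\hat\theta$, forms the plug-in sandwich standard error $\sqrt{A^{-1}BA^{-1}/n}$, and compares it with the Monte Carlo spread of the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0          # true rate of the Exponential(theta0) toy model
n = 5000

def sandwich_se(x):
    """Plug-in sandwich standard error for the exponential-rate M-estimator."""
    theta_hat = 1.0 / x.mean()           # solves sum_t psi(x_t; theta) = 0
    psi = 1.0 / theta_hat - x            # score psi(x; theta) = 1/theta - x
    A = -1.0 / theta_hat**2              # E[d/dtheta psi] at theta_hat
    B = np.mean(psi**2)                  # variance of psi (psi has mean zero here)
    avar = B / A**2                      # A^{-1} B A^{-1} in the scalar case
    return theta_hat, np.sqrt(avar / len(x))

x = rng.exponential(scale=1 / theta0, size=n)
theta_hat, se = sandwich_se(x)

# Monte Carlo check: spread of theta_hat over independent replications
reps = [1 / rng.exponential(scale=1 / theta0, size=n).mean() for _ in range(400)]
print(theta_hat, se, np.std(reps))
```

Here $A$ and $B$ collapse to scalars; in multiparameter problems they are the Jacobian and covariance matrix of the estimating function.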

2. Convex and Constrained Estimation: Beyond Differentiability

Recent developments remove the need for differentiability or smoothness around the risk minimizer. For convex losses or convex constraints, consistency and the limiting distribution can be established through support functions and directional derivatives. If the loss is not differentiable, the first-order expansion at the population minimizer uses the support function $h_{\partial \Phi(\theta^*)}(t)$ of the subdifferential; if $0$ lies in its interior, the estimator is exactly equal to $\theta^*$ for all $n$ beyond a threshold (the "one-step regime"). In the classical $n^{-1/2}$ regime, if the risk is twice differentiable, the asymptotic distribution is

$$\sqrt{n}(\hat\theta_n - \theta^*) \rightarrow_d \mathcal{D}^+\left[\pi_T^S\right](-S^{-1} \nabla \Phi(\theta^*); Z),$$

where $T$ is the tangent cone at $\theta^*$, $S$ is the Hessian, and $Z \sim N(0, S^{-1} B S^{-1})$ (Brunel, 6 Nov 2025). This approach unifies robust estimation and regularized (penalized) M-estimators, extending to U-statistics and covering widely used estimators for location, scatter, and multivariate depth points.
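The boundary phenomenon is easy to see in the simplest constrained problem. The sketch below is a hypothetical toy example: estimating a nonnegative mean whose true value $\theta^* = 0$ sits on the boundary. The limit of $\sqrt{n}(\hat\theta_n - \theta^*)$ is then the projection $\max(Z, 0)$ of a Gaussian onto the tangent cone $[0, \infty)$, with an atom of mass $1/2$ at zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 2000, 2000

# Toy constrained M-estimation: minimize squared-error risk over theta >= 0
# when the true mean theta* = 0 lies on the constraint boundary.
root_n_dev = []
for _ in range(reps):
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    theta_hat = max(x.mean(), 0.0)        # projection onto the cone [0, inf)
    root_n_dev.append(np.sqrt(n) * theta_hat)
root_n_dev = np.asarray(root_n_dev)

# Limit law max(Z, 0), Z ~ N(0,1): point mass 1/2 at 0, half-normal otherwise
frac_at_zero = np.mean(root_n_dev == 0.0)
print(frac_at_zero)   # close to 0.5
print(root_n_dev.mean())
```

The empirical mean of the scaled deviations approaches $\mathbb{E}[\max(Z,0)] = 1/\sqrt{2\pi} \approx 0.399$, illustrating the non-Gaussian limit on the boundary.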

3. Simulation-Based and Two-Stage Estimation

Simulation-based (also called likelihood-free or indirect) estimation presents a distinct regime in which asymptotic normality and efficiency are not direct consequences of likelihood theory. The Two-Stage (TS) approach proceeds by (1) simulating synthetic datasets for sampled parameters, (2) compressing each via a feature map to low-dimensional statistics, and (3) regressing the synthetic parameters on these features to construct an estimator. If the compression function and regression satisfy certain identifiability and smoothness conditions, the TS estimator is strongly consistent and asymptotically normal:

$$\sqrt{N}(\hat\theta_{TS} - \theta_0) \xrightarrow{d} N(0, \Sigma_{TS}),$$

where $\Sigma_{TS}$ derives from the regression Jacobian and population quantile variances. In general $\Sigma_{TS} \geq I(\theta_0)^{-1}$, with equality when the compression-regression map recovers the score function, connecting TS theory to the semiparametric efficiency framework (Lakshminarayanan et al., 25 Aug 2025).

This framework provides theoretical justification for simulation-based estimators used in computationally intensive applications, showing that a carefully chosen offline mapping can deliver both computational efficiency and classical asymptotics.
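A minimal sketch of the three TS stages, under assumed toy choices (a $N(\theta, 1)$ simulator, mean/variance features, and plain least-squares regression; none of these specifics come from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, m, rng):
    """Stage 1: synthetic dataset from the toy model N(theta, 1)."""
    return rng.normal(theta, 1.0, size=m)

def features(x):
    """Stage 2: compression to low-dimensional statistics (mean, variance)."""
    return np.array([x.mean(), x.var()])

# Stages 1-2: build parameter/feature pairs offline
N, m = 4000, 200
thetas = rng.uniform(-2, 2, size=N)
F = np.stack([features(simulate(t, m, rng)) for t in thetas])

# Stage 3: regress synthetic parameters on features (linear regression sketch)
X = np.column_stack([np.ones(N), F])
beta, *_ = np.linalg.lstsq(X, thetas, rcond=None)

# Apply the fitted map to an "observed" dataset with true theta0 = 0.7
x_obs = simulate(0.7, m, rng)
theta_ts = np.array([1.0, *features(x_obs)]) @ beta
print(theta_ts)
```

In this toy model the sample mean is sufficient, so the fitted regression essentially recovers it; in realistic intractable models the choice of feature map governs the efficiency gap $\Sigma_{TS} - I(\theta_0)^{-1}$.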

4. Asymptotics for Bayesian, Indirect, and Latent-Variable Estimators

Bayesian and distribution-based estimators, including Bayes-type estimators with general loss functions, often enjoy asymptotic properties similar to those of MLEs under suitable regularity and large-deviation conditions. In particular, for block-separable Bayes-type estimators minimizing a general loss with a prior, the normalized estimator's deviation converges in distribution to a normal limit with a block-diagonal covariance dictated by the Fisher (or quasi-Fisher) information of each block:

$$\sqrt{n_k}(\hat\theta_{n,k} - \theta^*_k) \to N(0, I_k^{-1}) \quad \text{for each block } k.$$

This equivalence extends to various misspecified, irregular (e.g., ergodic diffusion), or high-frequency models, through technical machinery such as polynomial-type large deviation inequalities and the Ibragimov-Has'minskii expansion (Ogihara, 2013).

For models with latent variables, the asymptotic Kullback–Leibler risk of conditional estimation of latent states (e.g., in hierarchical or unsupervised learning) is governed, for ML and Bayesian estimators, by

$$D_{ML}(n) = \frac{1}{2n} \operatorname{Tr}\left[(I_{XY}-I_X) I_X^{-1}\right], \qquad D_{Bayes}(n) = \frac{1}{2n} \log\det\left[I_{XY} I_X^{-1}\right],$$

where $I_{XY}$ and $I_X$ are Fisher information matrices including or marginalizing the latent variables. The leading-order Bayes error is strictly smaller (Yamazaki, 2012). These results underpin model-selection criteria and active learning in models with hidden structure.
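A small numerical illustration with hypothetical Fisher information matrices confirms the ordering of the two risks, which follows from the matrix inequality $\log\det M \le \operatorname{Tr}(M - I)$:

```python
import numpy as np

# Illustrative (hypothetical) Fisher information matrices: I_XY includes the
# latent variables, I_X marginalizes them, so I_XY - I_X is positive semidefinite.
I_X  = np.array([[2.0, 0.3], [0.3, 1.5]])
I_XY = I_X + np.array([[0.8, 0.1], [0.1, 0.4]])
n = 1000

M = I_XY @ np.linalg.inv(I_X)
D_ML    = np.trace(M - np.eye(2)) / (2 * n)      # Tr[(I_XY - I_X) I_X^{-1}] / 2n
D_Bayes = np.log(np.linalg.det(M)) / (2 * n)     # log det[I_XY I_X^{-1}] / 2n
print(D_ML, D_Bayes)   # D_Bayes <= D_ML, with equality only when I_XY = I_X
```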

Approximate Bayesian Computation (ABC) estimators, widely used in intractable likelihood models, satisfy a modified normal asymptotic theory that incorporates a nontrivial, tuning-parameter-dependent bias $b(\epsilon)$. The bias decreases as the ABC tolerance parameter $\epsilon \to 0$, with mean-square error $\mathrm{MSE} \approx \|b(\epsilon)\|^2 + c/n$, providing practical balancing guidance between computational tractability and statistical accuracy (Dean et al., 2011).
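The tolerance trade-off can be seen directly in a rejection-ABC sketch (a toy Gaussian-mean model with a flat prior; the summary statistic and tuning values are illustrative choices, not from the cited paper): shrinking $\epsilon$ sharpens the approximate posterior while cutting the acceptance rate.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 50
theta0 = 1.0
x_obs = rng.normal(theta0, 1.0, size=m)
s_obs = x_obs.mean()                     # summary statistic of the observed data

def abc_posterior(eps, n_sim=20000):
    """Rejection ABC: keep prior draws whose simulated summary is eps-close."""
    theta = rng.uniform(-3, 3, size=n_sim)           # flat prior draws
    s_sim = rng.normal(theta, 1.0 / np.sqrt(m))      # sampling dist. of the mean
    keep = np.abs(s_sim - s_obs) <= eps
    return theta[keep]

results = {}
for eps in (1.0, 0.3, 0.05):
    post = abc_posterior(eps)
    results[eps] = post
    print(eps, len(post), round(post.mean(), 3), round(post.std(), 3))
```

As $\epsilon$ shrinks, the accepted sample concentrates near the exact posterior (spread about $1/\sqrt{m}$ here) at the cost of far fewer acceptances, mirroring the $\|b(\epsilon)\|^2 + c/n$ balance above.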

5. Nonparametric, High-dimensional, and Empirical Process Approaches

Asymptotic theory naturally extends to empirical process-based and nonparametric settings. In classical sample problems, the functional empirical process (fep) yields tight Gaussian approximations for sample means, variances, differences, and functions thereof, under only a finite fourth-moment condition. For any smooth functional $T(P_n)$ of the empirical distribution $P_n$,

$$T(P_n) = T(P) + n^{-1/2} \mathbb{G}_n(\dot{T}(P)) + o_{\mathbb{P}}(n^{-1/2}),$$

where $\dot{T}(P)$ is the influence function. As a result, the delta method and functional CLT give normal asymptotics even for non-Gaussian or dependent data, supporting inference for means, variances, and their ratios in finite or moderate sample sizes (Camara et al., 7 Aug 2025).
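For example, the influence-function expansion gives a plug-in standard error for the sample variance that matches its Monte Carlo spread (a toy Gamma example, relying only on the stated finite fourth moment):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
x = rng.gamma(shape=2.0, scale=1.0, size=n)    # non-Gaussian data, true var = 2

# Functional T(P) = Var_P(X); influence function (x - mu)^2 - sigma^2
mu, var = x.mean(), x.var()
infl = (x - mu) ** 2 - var
se = infl.std() / np.sqrt(n)                   # plug-in delta-method SE

# Monte Carlo check of the normal approximation for T(P_n)
mc = [rng.gamma(2.0, 1.0, size=n).var() for _ in range(500)]
print(var, se, np.std(mc))
```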

For nonparametric quadratic estimators ($U$-statistics), including estimators of functionals like $\int g^2$ or $\int b^2$, the limiting distribution can remain normal even when the degenerate (second-order) term dominates and the convergence rate is slower than $n^{-1/2}$. This is achieved through conditional CLTs under fine partitioning and moment control, justifying inference in high-dimensional or low-smoothness settings (Robins et al., 2015).
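The simplest quadratic $U$-statistic illustrates the construction: the degree-two kernel $x_i x_j$ estimates the functional $(\mathbb{E}X)^2$ without the $O(1/n)$ bias of the plug-in $\bar{x}^2$ (a toy example, not the $\int g^2$ estimators of the cited work):

```python
import numpy as np

rng = np.random.default_rng(7)

def u_stat(x):
    """Degree-2 U-statistic for the quadratic functional (E X)^2."""
    n = len(x)
    s = x.sum()
    # Average of x_i * x_j over all ordered pairs with i != j
    return (s**2 - np.sum(x**2)) / (n * (n - 1))

# Compare against the biased plug-in estimator xbar^2 over replications
n = 50
u_vals, plug_vals = [], []
for _ in range(4000):
    x = rng.normal(1.0, 2.0, size=n)      # true (E X)^2 = 1, sigma^2 = 4
    u_vals.append(u_stat(x))
    plug_vals.append(x.mean() ** 2)
print(np.mean(u_vals), np.mean(plug_vals))   # U-stat ~1.0; plug-in ~1 + 4/50
```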

Modern analysis of kernel ridge regression for linear and derivative functionals shows that the optimal smoothing parameter for inference is $\lambda \sim n^{-1}$ (pointwise/derivative) or $\lambda \sim n^{-1}\log n$ (uniform norm), not the $n^{-2m/(2m+d)}$ rate that optimizes $L_2$ error. The bias and variance scale as $|\mathrm{BIAS}| = O_P(\lambda^{\delta/2})$ and $\mathrm{VAR} = O_P(n^{-1}\lambda^{\delta-1})$, and

$$\frac{l(\hat{f} - f) - \mathrm{BIAS}}{\sqrt{\mathrm{VAR}_n}} \to_d N(0, 1),$$

where $l(\cdot)$ is a linear functional (Tuo et al., 2024).
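A closed-form KRR sketch with the undersmoothed choice $\lambda \sim n^{-1}$ follows; the Gaussian kernel, bandwidth, and test function below are illustrative assumptions rather than choices from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = np.sort(rng.uniform(0, 1, n))
f = lambda t: np.sin(2 * np.pi * t)
y = f(x) + 0.2 * rng.normal(size=n)

def krr_predict(x_train, y_train, x_new, lam, h=0.1):
    """Closed-form kernel ridge regression with a Gaussian kernel (bandwidth h)."""
    n_tr = len(x_train)
    K = np.exp(-((x_train[:, None] - x_train[None, :]) ** 2) / (2 * h**2))
    # Minimizer of (1/n) sum (y - f)^2 + lam * ||f||^2 -> (K + n*lam*I) alpha = y
    alpha = np.linalg.solve(K + n_tr * lam * np.eye(n_tr), y_train)
    k_new = np.exp(-((x_new[:, None] - x_train[None, :]) ** 2) / (2 * h**2))
    return k_new @ alpha

# Undersmoothed choice lam ~ 1/n, as suggested for pointwise inference
x_grid = np.array([0.25, 0.5, 0.75])
fit = krr_predict(x, y, x_grid, lam=1.0 / n)
print(fit, f(x_grid))
```

With $\lambda = 1/n$ the penalty term $n\lambda$ stays $O(1)$, so smoothing bias is kept below the pointwise noise level, which is the regime in which the studentized statistic above is asymptotically standard normal.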

6. Asymptotic Theory under Complex Dependence and Clustering

The extension of asymptotic theory to clustered or dependent data with unbounded or heterogeneous cluster sizes underlies the validity of robust standard errors in econometric applications. The weak law of large numbers and the central limit theorem are guaranteed under cluster-size negligibility (the maximum cluster size divided by the total sample size tends to zero), together with uniform integrability and moment conditions. For the sample mean,

$$\sqrt{n}(\bar X_n - \mu) \to_d N(0, V_n),$$

with $V_n = \sum_g \operatorname{Var}(X_g)/n$ the cluster-level variance. Cluster-robust covariance estimators remain consistent, and standard t- and Wald statistics based on these covariances remain valid even in complex sampling designs (Hansen et al., 2019).
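The sketch below builds the cluster-robust variance of a sample mean from cluster-level score sums and contrasts it with the naive i.i.d. formula, which ignores within-cluster dependence (a simulated random-effects design, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
G = 200                                   # number of clusters
sizes = rng.integers(2, 30, size=G)       # heterogeneous cluster sizes
# A cluster random effect induces within-cluster dependence
effects = rng.normal(0, 1.0, size=G)
x = np.concatenate([e + rng.normal(0, 1.0, s) for e, s in zip(effects, sizes)])
labels = np.repeat(np.arange(G), sizes)
n = len(x)

xbar = x.mean()
# Cluster-robust variance of the mean: squared sums of demeaned scores by cluster
cluster_sums = np.array([np.sum(x[labels == g] - xbar) for g in range(G)])
v_cluster = np.sum(cluster_sums**2) / n**2
v_iid = x.var() / n                       # naive i.i.d. variance, too small here
print(np.sqrt(v_cluster), np.sqrt(v_iid))
```

The cluster-robust standard error is several times the naive one in this design, which is exactly the gap that cluster-robust t- and Wald statistics correct for.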

Similarly, for network and graph data, asymptotic normality and consistency can be established for estimators involving parameters such as sparsity exponents and degree scalings under very weak assumptions (e.g., degree-distribution conditions). The effective sample size is controlled by network statistics rather than the number of nodes or edges, and Bayesian posteriors contract at rates governed by the observed graph sequence (Naulet et al., 2021).

7. Implications for Model Selection, Bayesian Learning, and Computational Schemes

Bayesian learning theory connects the asymptotic learning curve (expected generalization error) under broad regularity to algebraic-geometric invariants:

$$\mathbb{E}[B_g] = L_0 + \frac{\lambda/\beta + \nu}{n} + o(n^{-1}),$$

where $\lambda$ is the log-canonical threshold and $\nu$ the singular fluctuation. For regular models, $\lambda = d/2$, and the $1/n$ law is universal under the renormalizable condition. If renormalizability fails, non-classical rates appear (e.g., $n^{-2/3}$) (Watanabe, 2010). This insight underpins the validity of criteria such as WAIC in singular and misspecified scenarios.

For computational-statistical trade-offs in iterative methods (e.g., the JKO scheme for dynamic distributions) with parameter estimation, joint asymptotic theory shows that accumulated estimation and discretization errors can be described by deterministic PDE or stochastic PDE limits, quantifying the interplay between sample size, iteration number, and statistical error propagation (Wu et al., 11 Jan 2025).

Table: Key Concepts and Representative Results

| Regime / Estimator | Consistency | Rate / Limit Law |
| --- | --- | --- |
| Classical M-/estimating equations | Strong/weak | $\sqrt{n}$, Gaussian; $A^{-1}BA^{-1}$ |
| Convex constrained M-estimation | Strong/weak | Projection, or support function |
| Two-Stage / simulation-based | Strong | $\sqrt{N}$, Gaussian, efficiency gap |
| Nonparametric U-statistics | Weak/strong | $n^{-\alpha}$, $\alpha \le 1/2$, CLT |
| Functional empirical process | Strong | $\sqrt{n}$, Gaussian CLT/delta method |
| Clustered samples | Strong | $\sqrt{n}$, cluster-robust CLT |
| Bayesian / latent-variable | Strong | $1/n$ risk, information-theoretic |
| ABC / approximate Bayes | Consistent | $\sqrt{n}$, Gaussian, $\epsilon$-bias |
