Horvitz–Thompson Estimator Overview

Updated 22 January 2026
  • The Horvitz–Thompson estimator is a design-unbiased method that estimates population totals and means using known inclusion probabilities.
  • Extensions such as the Hajek estimator and adaptive normalization address its high variance by regulating the weights produced by highly heterogeneous inclusion probabilities.
  • Widely applied in complex survey designs, network inference, and machine learning, it remains a cornerstone for robust estimation despite its variance challenges.

The Horvitz–Thompson estimator is a foundational tool in survey sampling, importance weighting, and related fields, providing design-unbiased estimation for population totals and means under arbitrary, potentially unequal probability sampling designs. Its versatility has led to widespread application and extensive theoretical generalization, with rigorous developments addressing its bias-variance properties, extensions to complex and adaptive designs, model-assisted improvements, and modern applications in causal and network inference, machine learning, and statistical genomics.

1. Mathematical Formulation and Principle

Let $U = \{1, \dots, N\}$ be a finite population, with each unit $i$ associated with a value $y_i$. Suppose a sample $S \subset U$ is drawn such that each unit's (first-order) inclusion probability $\pi_i = P(i \in S)$ satisfies $0 < \pi_i \leq 1$ and is known for all $i$. The Horvitz–Thompson (HT) estimator of the population total $T = \sum_{i \in U} y_i$ is

$$\hat{T}_{HT} = \sum_{i \in S} \frac{y_i}{\pi_i}$$

and for the mean, $\hat{\mu}_{HT} = \frac{1}{N} \sum_{i \in S} y_i / \pi_i$ (Datta et al., 2022).

The essential attribute is design-unbiasedness: $\mathbb{E}_p[\hat{T}_{HT}] = T$, regardless of the sampling scheme, as long as the inclusion probabilities are strictly positive and correctly specified.

The variance is given by

$$\mathrm{Var}(\hat{T}_{HT}) = \sum_{i=1}^N \sum_{j=1}^N \frac{\Delta_{ij}}{\pi_i \pi_j} y_i y_j$$

where $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$ and $\pi_{ij} = P(i \in S, j \in S)$ (Datta et al., 2022).
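
To make the formulas concrete, here is a minimal NumPy sketch (illustrative, not from the cited papers) of the HT total, together with its design variance under Poisson sampling, where $\pi_{ij} = \pi_i \pi_j$ for $i \neq j$ and the double sum collapses to a single one:

```python
import numpy as np
from itertools import product

def ht_total(y, pi, in_sample):
    """Horvitz-Thompson estimator of the population total sum_i y_i,
    given values y, inclusion probabilities pi, and a boolean sample mask."""
    y, pi, in_sample = np.asarray(y), np.asarray(pi), np.asarray(in_sample)
    return float(np.sum(y[in_sample] / pi[in_sample]))

def ht_variance_poisson(y, pi):
    """Exact design variance of the HT total under Poisson sampling:
    Delta_ij = 0 for i != j, so Var = sum_i (1 - pi_i)/pi_i * y_i^2."""
    y, pi = np.asarray(y), np.asarray(pi)
    return float(np.sum((1 - pi) / pi * y**2))

# Exhaustive check of design-unbiasedness on a tiny 3-unit population:
# average T_HT over all 2^3 Poisson samples, weighted by their probabilities.
y, pi = [2.0, 5.0, 9.0], [0.2, 0.5, 0.8]
e_ht = sum(
    np.prod([q if m else 1 - q for m, q in zip(mask, pi)])
    * ht_total(y, pi, list(mask))
    for mask in product([False, True], repeat=3)
)
# e_ht equals sum(y) = 16.0, confirming E_p[T_HT] = T for this design
```

The exhaustive enumeration verifies unbiasedness exactly rather than by Monte Carlo, which is only feasible for such a small population.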

2. Extensions, Efficiency, and Alternative Normalization

HT’s design-unbiasedness often comes at the cost of high variance, particularly when the $\pi_i$ are highly heterogeneous. The estimator ignores the randomness of the realized sample size $\hat{n}$, motivating the use of alternative normalization and shrinkage strategies to control variance.

Self-Normalized (Hajek) Estimator. Normalizes by the HT-estimated sample size:

$$\hat{\mu}_H = \frac{\sum_{k=1}^n Y_k I_k / p_k}{\sum_{k=1}^n I_k / p_k}$$

This introduces correlation between numerator and denominator, often reducing variance at the cost of slight bias (Khan et al., 2021).

Adaptive Normalization. The Trotter–Tukey family of IPW estimators interpolates between pure HT ($\lambda=0$) and Hajek ($\lambda=1$), with the adaptively normalized (AN) estimator chosen to minimize asymptotic variance:

$$\hat{\mu}_{AN} = \frac{\hat{S}}{n} + \frac{\hat{T}}{\hat{\pi}} \left(1 - \frac{\hat{n}}{n}\right)$$

where

$$\hat{S} = \sum_{k=1}^n \frac{Y_k I_k}{p_k}, \qquad \hat{T} = \frac{1}{n} \sum_{k=1}^n \frac{1-p_k}{p_k} \, \frac{Y_k I_k}{p_k}, \qquad \hat{\pi} = \frac{1}{n} \sum_{k=1}^n \frac{1-p_k}{p_k} \, \frac{I_k}{p_k}$$

The AN estimator achieves asymptotic variance no larger than that of either HT or Hajek, except in degenerate cases (Khan et al., 2021). It is also optimal for the bias–variance trade-off within the class of affine-normalized IPW estimators.
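
A compact sketch of the three point estimators for the mean (plain HT, Hajek, and the AN estimator as displayed above) might look like the following; the function name and data layout are illustrative:

```python
import numpy as np

def ipw_mean_estimators(Y, I, p):
    """Return (HT, Hajek, AN) estimates of the mean from observations
    (Y_k, I_k, p_k): outcomes, inclusion indicators, and propensities.
    Assumes pi_hat > 0, i.e. some included unit has p_k < 1."""
    Y, I, p = (np.asarray(a, dtype=float) for a in (Y, I, p))
    n = len(Y)
    w = I / p                        # inverse-probability weights
    S_hat = np.sum(Y * w)            # numerator common to all three
    n_hat = np.sum(w)                # HT estimate of the sample size
    mu_ht = S_hat / n
    mu_hajek = S_hat / n_hat         # self-normalized
    # Plug-in estimates of T and pi from the display above:
    T_hat = np.mean((1 - p) / p * Y * w)
    pi_hat = np.mean((1 - p) / p * w)
    mu_an = S_hat / n + (T_hat / pi_hat) * (1 - n_hat / n)
    return mu_ht, mu_hajek, mu_an
```

When all propensities $p_k$ are equal, $\hat{T}/\hat{\pi}$ collapses to the Hajek ratio and the AN estimate coincides with Hajek exactly; adaptive normalization only departs from it when the $p_k$ vary.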

Hard-Thresholding and Variance Regularization. To further reduce variance, especially when some $\pi_i$ are extremely small, hard-thresholding replaces every $\pi_i$ below a threshold $a$ by $a$:

$$\pi_i' = \max(\pi_i, a), \qquad \hat{T}_{IHT} = \sum_{i \in S} \frac{y_i}{\pi_i'}$$

This “improved Horvitz–Thompson estimator” introduces a small bias of order $O(n^{-1})$ but yields strictly reduced MSE under mild regularity (Zong et al., 2018).
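
The thresholded estimator is a one-line modification of plain HT; a minimal sketch (illustrative naming):

```python
import numpy as np

def ht_total_thresholded(y_s, pi_s, a):
    """'Improved' HT total over the sampled values y_s: inclusion
    probabilities below the threshold a are clipped up to a, trading an
    O(1/n) bias for a reduction in variance."""
    pi_clipped = np.maximum(np.asarray(pi_s, dtype=float), a)
    return float(np.sum(np.asarray(y_s, dtype=float) / pi_clipped))
```

With $a = 0$ the clipping is inactive and the plain HT total is recovered.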

3. Advanced Designs: Two-Stage, Cross-Classified, and Network Sampling

Two-Stage Sampling

When sampling is performed hierarchically (e.g., clusters then individuals within clusters), the two-stage HT estimator is

$$\hat{Y}_{HT} = \sum_{i\in S_I} \frac{1}{\pi_{Ii}} \sum_{k\in S_i} \frac{y_{ik}}{\pi_{k|i}}$$

Under mild regularity, this estimator is consistent and asymptotically normal, with unbiased estimators available for the full variance (which decomposes into between- and within-cluster components) (Chauvet et al., 2018).
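
A minimal sketch of the two-stage computation, with an assumed data layout (one record per sampled primary unit; the field names are illustrative):

```python
import numpy as np

def two_stage_ht_total(sampled_clusters):
    """Two-stage HT total. Each element of sampled_clusters is a dict with
    'pi_I' (first-stage inclusion probability of the cluster) and arrays
    'y', 'pi_cond' (values and conditional inclusion probabilities for the
    second-stage sample drawn within that cluster)."""
    total = 0.0
    for c in sampled_clusters:
        # Inner HT sum estimates the cluster total from the within-sample...
        within = float(np.sum(np.asarray(c["y"]) / np.asarray(c["pi_cond"])))
        # ...and the outer 1/pi_I weight expands it to the population level.
        total += within / c["pi_I"]
    return total
```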

Cross-Classified Sampling

For settings where units are sampled independently along two dimensions (e.g., time × location), the cross-classified HT estimator is

$$\hat{\tau}_{HT} = \sum_{i\in S_M} \sum_{k\in S_D} \frac{Y_{ik}}{\pi_i^M \pi_k^D}$$

This estimator is unbiased, but often less efficient than two-stage sampling for the same allocated sample size; variance estimation requires careful handling of joint inclusion probabilities (Juillard et al., 2015).
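
Because the two dimensions are sampled independently, the weights factor into an outer product; a short sketch (names are illustrative):

```python
import numpy as np

def cross_classified_ht_total(Y, pi_M, pi_D):
    """Cross-classified HT total: Y is the |S_M| x |S_D| matrix of observed
    cell values; pi_M and pi_D hold the marginal inclusion probabilities of
    the sampled rows and columns, drawn independently of each other."""
    Y = np.asarray(Y, dtype=float)
    # Weight for cell (i, k) is 1 / (pi_i^M * pi_k^D), built as an outer product.
    weights = np.outer(1.0 / np.asarray(pi_M), 1.0 / np.asarray(pi_D))
    return float(np.sum(Y * weights))
```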

Network and Interference Settings

Networked data, particularly under interference (where outcomes for one unit depend on others' treatments or exposures), require extended HT forms using exposure mappings:

$$\hat{\tau} = \frac{1}{n}\sum_{i=1}^n \frac{\mathbf{1}\{Z_i=1,\, e_i \in \mathcal{R}_1\}}{\pi_i^1} Y_i - \frac{1}{n} \sum_{i=1}^n \frac{\mathbf{1}\{Z_i=0,\, e_i \in \mathcal{R}_0\}}{\pi_i^0} Y_i$$

Here, $\pi_i^1$ is the probability that unit $i$ receives treatment and falls in exposure arm $\mathcal{R}_1$, and similarly for controls (Thiyageswaran et al., 2024). The estimator remains unbiased when the exposure mapping is correctly specified, but is inadmissible in MSE among linear estimators under most random-exposure designs (Karwa et al., 2023).
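
A sketch of the exposure-mapping contrast, with hypothetical argument names (the exposure indicators and the joint probabilities $\pi_i^1$, $\pi_i^0$ are assumed to come from the known randomization design):

```python
import numpy as np

def exposure_ht_contrast(Y, Z, in_R1, in_R0, pi1, pi0):
    """HT contrast under an exposure mapping: in_R1[i] (resp. in_R0[i])
    indicates that unit i's realized exposure e_i lies in R_1 (resp. R_0);
    pi1[i] and pi0[i] are the design probabilities of the joint events
    (Z_i = 1, e_i in R_1) and (Z_i = 0, e_i in R_0)."""
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z)
    n = len(Y)
    treated = (Z == 1) & np.asarray(in_R1)
    control = (Z == 0) & np.asarray(in_R0)
    t1 = np.sum(Y[treated] / np.asarray(pi1, dtype=float)[treated]) / n
    t0 = np.sum(Y[control] / np.asarray(pi0, dtype=float)[control]) / n
    return float(t1 - t0)
```

Units whose realized exposure falls in neither arm contribute to neither sum, which is exactly why rare exposure arms inflate the variance of this contrast.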

4. Efficiency, Bayesian Remedies, and Adaptive Estimation

HT estimation is not, in general, semiparametrically efficient: it is uniformly minimum-variance among design-unbiased estimators but can be dominated in variance by biased or model-assisted estimators (Morikawa et al., 2022).

Semiparametric Efficiency. Adaptive estimators leveraging auxiliary models for the sampling weights, outcome distributions, or both, can achieve the efficiency bound representing the lowest possible asymptotic variance among regular asymptotically linear estimators. Construction typically proceeds via efficient influence function calculations and can use double machine learning to nonparametrically estimate required conditional moments (Morikawa et al., 2022).

Bayesian Smoothing and Shrinkage. In the presence of small or highly variable inclusion probabilities, classical HT estimators are susceptible to “weak paradoxes,” yielding estimates with extreme variance or even erratic behavior (e.g., Basu’s or Wasserman’s examples). Bayesian approaches, such as conjugate hierarchical models or binned shrinkage, regularize the HT estimator to achieve smaller MSE at a minor cost in bias, without requiring a hard minimum bound on $\pi_i$ (Datta et al., 2022).

5. Model-Assisted and Nonresponse-Adjusted Extensions

When auxiliary information is available, model-assisted estimators combine HT’s unbiasedness with the predictive power of a working model. The generalized regression estimator (GREG)

$$\hat{Y}_{GREG} = \sum_{i \in U} \hat{m}(x_i) + \sum_{i \in s} \frac{y_i - \hat{m}(x_i)}{\pi_i}$$

is asymptotically unbiased and attains smaller variance than pure HT if the model is approximately correct (Eustache et al., 2022, Wang et al., 2011). Calibration methods adjust the HT weights to exactly match known auxiliary totals.
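
A sketch of GREG with a simple linear working model $\hat{m}(x) = \hat{a} + \hat{b}x$ fitted by $\pi$-weighted least squares; the model choice and names are illustrative, not the estimator of any particular cited paper:

```python
import numpy as np

def greg_total(x_pop, x_s, y_s, pi_s):
    """GREG estimator of the population total: population-wide model
    predictions plus an HT-weighted correction for the sample residuals."""
    x_pop, x_s, y_s, pi_s = (np.asarray(a, dtype=float)
                             for a in (x_pop, x_s, y_s, pi_s))
    # pi-weighted least squares fit of y on (1, x), via sqrt-weight scaling.
    X = np.column_stack([np.ones_like(x_s), x_s])
    sw = np.sqrt(1.0 / pi_s)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y_s * sw, rcond=None)
    m_pop = beta[0] + beta[1] * x_pop        # m-hat over the population U
    resid = y_s - (beta[0] + beta[1] * x_s)  # residuals over the sample s
    return float(np.sum(m_pop) + np.sum(resid / pi_s))
```

When the working model fits the sample exactly, the residual term vanishes and the estimate reduces to the sum of model predictions, illustrating where the variance gain over pure HT comes from when $\hat{m}$ is approximately correct.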

For nonresponse, double expansion applies: first for the sampling design ($1/\pi_i$), then for the response probability ($1/p_i$), giving

$$\hat{Y}_{2HT} = \sum_{i \in s_r} \frac{y_i}{\pi_i p_i}$$

In practice, $p_i$ must be estimated, and model-assisted adjustments can further improve variance properties, provided the data are missing at random and the response model is well specified (Eustache et al., 2022).
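
The double-expansion weighting itself is a one-liner; a sketch with illustrative names, where in practice the `p_r` values would themselves be estimated response probabilities:

```python
import numpy as np

def double_expansion_total(y_r, pi_r, p_r):
    """Two-phase HT total over the respondent set s_r: each respondent is
    weighted by 1/(pi_i * p_i), expanding once for the sampling design and
    once for the response mechanism."""
    y_r, pi_r, p_r = (np.asarray(a, dtype=float) for a in (y_r, pi_r, p_r))
    return float(np.sum(y_r / (pi_r * p_r)))
```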

6. Practical Advancements and Applications

Horvitz–Thompson methodology has been deployed in a broad range of modern problems:

  • Prediction error estimation in complex samples: By extending Efron's optimism correction with HT weights, estimators of generalization error are unbiased for the true target population, and are algebraically equivalent to design-based AIC in canonical GLMs (Holbrook et al., 2017).
  • Rare subgroup and network interference causal inference: HT estimators remain unbiased for subgroup means and contrasts under complex two-stage and interference designs, whereas “natural” or normalized estimators become undefined or biased for rare subgroups (Gabriel, 2020).
  • Graph statistics from subgraph samples: For network summary quantities that can be expressed as sums over sampled units or edges (e.g., Dirichlet energy, homophily), the HT estimator affords unbiased and consistent estimates even when the total network is only partially observed, so long as the sampling design and edge-inclusion probabilities are known (Ajorlou et al., 18 Dec 2025).

Adaptive, model-assisted, and robust HT-like estimators have been developed and studied for settings with partially observed probabilities, finite response efforts (requiring nonparametric deconvolution for response probabilities), and unknown detection in spatial sampling, with established consistency and MSE guarantees under regularity (Greenshtein et al., 2013, Kansanen et al., 2019).

7. Limitations, Optimality, and Open Directions

While the Horvitz–Thompson estimator is indispensable for design-unbiased inference under arbitrary unequal probability sampling, its limitations are well-documented:

  • High Sampling-Weight Variance: Small or highly heterogeneous $\pi_i$ can yield estimators with extremely high or even infinite variance, motivating shrinkage or regularization (Zong et al., 2018, Khan et al., 2021).
  • Inadmissibility under Complex Dependence: In experiments with network interference, the HT estimator is inadmissible in MSE among all linear estimators except under fixed-exposure designs. Shrinkage, model-based, and design-restricted alternatives are under active study (Karwa et al., 2023).
  • Sensitivity to Design and Weight Specification: Misspecification of inclusion probabilities, response models, or exposure mapping can lead to breakdown of unbiasedness, requiring data-adaptive, model-assisted, or semiparametric approaches for robust inference (Thiyageswaran et al., 2024).

Continued research addresses semiparametric optimality with flexible models for response or inclusion, robustification to misspecification and informative sampling, theoretical guarantees in high-dimensional or network-dependent settings, and practical computation in large-scale or streaming data regimes.


References:

  • (Zong et al., 2018) Improved Horvitz-Thompson Estimator in Survey Sampling
  • (Chauvet et al., 2018) Inference for two-stage sampling designs with application to a panel for urban policy
  • (Holbrook et al., 2017) Estimating prediction error for complex samples
  • (Khan et al., 2021) Adaptive normalization for IPW estimation
  • (Thiyageswaran et al., 2024) Data-adaptive exposure thresholds for the Horvitz-Thompson estimator of the ATE in experiments with network interference
  • (Kansanen et al., 2019) Horvitz-Thompson-like estimation with distance-based detection probabilities for circular plot sampling of forests
  • (Karwa et al., 2023) On the admissibility of Horvitz-Thompson estimator for estimating causal effects under network interference
  • (Greenshtein et al., 2013) Deconvolution with application to estimation of sampling probabilities and the Horvitz-Thompson estimator
  • (Eustache et al., 2022) Model-Assisted Estimators under Nonresponse in Sample Surveys
  • (Wang et al., 2011) Nonparametric Additive Model-assisted Estimation for Survey Data
  • (Morikawa et al., 2022) Semiparametric adaptive estimation under informative sampling
  • (Datta et al., 2022) Inverse Probability Weighting: from Survey Sampling to Evidence Estimation
  • (Ajorlou et al., 18 Dec 2025) Dirichlet Meets Horvitz and Thompson: Estimating Homophily in Large Networks via Sampling
  • (Juillard et al., 2015) Estimation under cross-classified sampling with application to a childhood survey
  • (Cardot et al., 2012) Variance estimation and asymptotic confidence bands for the mean estimator of sampled functional data with high entropy unequal probability sampling designs
  • (Chauvet, 2021) A cautionary note on the Hanurav-Vijayan sampling algorithm
  • (Gabriel, 2020) A note on Horvitz-Thompson estimators for rare subgroup analysis in the presence of interference
