
Negative Sampling in Model Training

Updated 9 February 2026
  • Negative Sampling is a technique that selects a manageable subset of negatives from a vast candidate pool to efficiently train models in scenarios with scarce or weakly labeled negative data.
  • Various methods, including random, hard, GAN-based, and structure-aware approaches, refine negative selection to improve learning trajectories and embedding quality.
  • Recent advances in adaptive hardness and diversity-augmented sampling have significantly boosted performance in recommendation systems, knowledge graph embeddings, and dense retrieval tasks.

Negative sampling is a fundamental technique for scalable model training in domains where explicit negative (non-matching) data are either absent, weakly labeled, or vastly outnumbered by positives. It has been pivotal across word and sentence embedding, graph learning, knowledge graph representation, recommender systems, dense and cross-modal retrieval, neural topic modeling, and semi-supervised learning. By formulating the objective so that models discriminate between observed ("positive") pairs and algorithmically constructed "negative" samples (plausible-but-unobserved or actively adversarial examples), negative sampling provides both an efficiency boost and a mechanism for shaping the learned representation's decision boundary.

1. Formal Role and Frameworks

At the core, negative sampling selects a manageable subset of negatives from a vast or intractably large implicit negative pool. For a pairwise objective, given an anchor (e.g., query, user, entity) $x$, a known positive $x^+$, and a candidate set $\mathcal{C}$, we sample $x^-$ from a distribution $p_n(x^-)$ over $\mathcal{C}$, then minimize an objective such as:

$$\mathcal{L} = \mathbb{E}_{(x, x^+)}\,\mathbb{E}_{x^-\sim p_n}\,\ell\big(f(x), f(x^+), f(x^-)\big)$$

where $f$ is an encoder (e.g., an embedding network) and $\ell$ is a contrastive or ranking loss. The candidate pool $\mathcal{C}$ and the sampling distribution $p_n$ are both tunable, yielding a rich taxonomy of approaches (Yang et al., 2024).
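As a minimal illustration of this objective, the sketch below instantiates $\ell$ as a margin ranking (hinge) loss over dot-product scores; the embedding dimensions, margin value, and uniform draw of negatives are illustrative assumptions, not prescriptions from the cited works.

```python
import numpy as np

def margin_ranking_loss(anchor, positive, negatives, margin=1.0):
    """Pairwise ranking loss: push s(x, x+) above s(x, x-) by a margin.

    anchor, positive: (d,) embeddings; negatives: (k, d) sampled negatives.
    """
    s_pos = anchor @ positive        # score of the observed (positive) pair
    s_neg = negatives @ anchor       # scores of the k sampled negatives
    # hinge on each sampled negative, averaged over the k draws from p_n
    return np.mean(np.maximum(0.0, margin - s_pos + s_neg))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
x_pos = x + 0.1 * rng.normal(size=8)     # positive: close to the anchor
x_neg = rng.normal(size=(5, 8))          # negatives drawn from p_n (uniform here)
loss = margin_ranking_loss(x, x_pos, x_neg)
```

Swapping the hinge for a softmax over the negative scores recovers an InfoNCE-style contrastive loss under the same sampling scheme.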

A generalized negative sampling pipeline is:

for each batch:
    construct candidate pool C     # e.g., all non-interacted items
    sample negatives x^- ∼ p_n(·)
    compute contrastive/ranking loss
    update parameters
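The pseudocode above can be fleshed out as a small self-contained training loop. The matrix-factorization scorer, BPR-style sigmoid update, and uniform $p_n$ below are illustrative assumptions chosen for brevity, not a method from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, d = 4, 20, 8
U = 0.1 * rng.normal(size=(n_users, d))   # user embeddings (parameters)
V = 0.1 * rng.normal(size=(n_items, d))   # item embeddings (parameters)
interactions = [(0, 3), (1, 7), (2, 11), (3, 15)]  # observed (user, item) positives
lr, k = 0.05, 4                            # learning rate, negatives per positive

for epoch in range(10):
    for u, i_pos in interactions:          # "for each batch" (batch size 1 here)
        pool = [i for i in range(n_items) if i != i_pos]  # candidate pool C
        negs = rng.choice(pool, size=k, replace=False)    # x^- ~ p_n (uniform)
        for i_neg in negs:
            # gradient ascent on log sigmoid(s_pos - s_neg), BPR-style
            s = U[u] @ (V[i_pos] - V[i_neg])
            g = 1.0 / (1.0 + np.exp(s))    # = 1 - sigmoid(s)
            U[u]     += lr * g * (V[i_pos] - V[i_neg])
            V[i_pos] += lr * g * U[u]
            V[i_neg] -= lr * g * U[u]
```

After a few epochs, observed pairs should on average outscore random user-item pairs, which is exactly the separation the sampled objective optimizes.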

2. Historical Evolution and Method Taxonomy

The methodical development of negative sampling encompasses several evolutionary paths and families of techniques (Yang et al., 2024, Ma et al., 2024):

  • Random/static: Uniform negative sampling (RNS), popularity-based (PNS; e.g., unigram distribution in word2vec), or statically fixed heuristics.
  • Model-dependent hard negative sampling (DNS, HNS): Mining negatives most similar (under current model) to the positives, including self-adversarial and curriculum learning refinements (Lai et al., 2024, Deng et al., 11 Mar 2025, Kamigaito et al., 2022).
  • GAN-based/adversarial: Employing generative models to propose adversarial negatives; discriminator guides selection (e.g., IRGAN, KBGAN/IGAN) (Zhang et al., 2020, Zhang et al., 2018).
  • Auxiliary/structure-informed: Leverage neighborhood structure, external knowledge, or caching (e.g., SANS selects k-hop negatives; NSCaching maintains per-relation hard negative caches) (Ahrabian et al., 2020, Zhang et al., 2018).
  • In-batch/mini-batch: Using elements from the current batch as negatives, essential in large-scale contrastive settings (SimCLR, SimCSE, InfoNCE) (Yang et al., 2024).
  • Hybrid/reweighting/interpolation: Mixtures of easy/hard, cache-guided, dynamically weighted, diversity-promoting (e.g., DivNS, ANS) (Xuan et al., 20 Aug 2025, Zhao et al., 2023).
  • Theoretical optimums: Bayesian or risk-minimizing samplers estimating the true negative probability via order statistics and model-independent priors (Liu et al., 2022).
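The simplest family above, popularity-based sampling (PNS), can be sketched in a few lines; the 0.75 exponent is the smoothed-unigram value popularized by word2vec, while the toy counts are invented for illustration.

```python
import numpy as np

def popularity_sampler(counts, k, power=0.75, rng=None):
    """Popularity-based negative sampling (PNS): p_n(i) proportional to count(i)^power.

    power=0.75 is the smoothed-unigram exponent used by word2vec; power=0
    recovers uniform random negative sampling (RNS).
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(counts, dtype=float) ** power
    p /= p.sum()
    return rng.choice(len(counts), size=k, p=p)

counts = [1000, 100, 10, 1]   # heavy-tailed item frequencies (toy data)
negs = popularity_sampler(counts, k=5, rng=np.random.default_rng(0))
```

The exponent interpolates between uniform and raw-popularity sampling, damping the head of the distribution so that rare items are still drawn occasionally.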

These distinctions reflect choices in pool construction, sampling adaptivity, and exploitation of external or model-derived signals.

3. Methodological Advances: Hardness, Diversity, and Adaptivity

3.1 Hardness Calibration

A core methodological frontier is managing the hardness of negatives.

  • Fixed Hardness and Its Limits: Many existing samplers select negatives of fixed hardness (e.g., top-1 hard in DNS), leading to the false positive problem (FPP, uncorrected high-scoring true negatives) or false negative problem (FNP, inadvertent selection of positives as negatives).
  • Adaptive Hardness: Adaptive Hardness Negative Sampling (AHNS) (Lai et al., 2024) formulates essential criteria: positive-awareness (hardness depends on the current positive's score), negative correlation (easier negatives as positives become easier), and adjustability (control via hyperparameters). The sampler sets negative hardness according to a function $\beta(s^+ + \alpha)^p$ with $p < 0$, provably resolving FPP and FNP and raising NDCG lower bounds.
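A schematic reading of the AHNS criterion is sketched below: the positive score $s^+$ sets a target score for the negative via $\beta(s^+ + \alpha)^p$, and the candidate closest to that target is chosen. The hyperparameter values and the nearest-to-target selection rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def ahns_select(s_pos, neg_scores, beta=1.0, alpha=0.1, p=-1.0):
    """Adaptive-hardness selection (schematic): the target negative score is
    beta * (s_pos + alpha)**p with p < 0, so as the positive's score rises
    the target falls (negative correlation between s+ and negative hardness).
    Returns the index of the candidate negative closest to the target score.
    """
    target = beta * (s_pos + alpha) ** p
    return int(np.argmin(np.abs(neg_scores - target)))

neg_scores = np.array([0.1, 0.4, 0.8, 1.5, 3.0])
idx_low_pos  = ahns_select(s_pos=0.2, neg_scores=neg_scores)  # low s+ -> high target
idx_high_pos = ahns_select(s_pos=2.0, neg_scores=neg_scores)  # high s+ -> low target
```

Because the target adapts per positive, the sampler avoids committing to a single fixed hardness level, which is the root of the FPP/FNP trade-off described above.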

3.2 False Negatives and Bayesian Risk

Bayesian Negative Sampling (BNS) (Liu et al., 2022) is designed to reduce false negative risk by deriving class-conditional densities via order statistics, estimating a posterior for true-negativeness, and minimizing expected risk via a sampling gain function, which optimally balances informativeness and safety.
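The flavor of risk-aware selection can be conveyed with a deliberately simplified sketch: combine an informativeness term (the model score) with a posterior estimate of true-negativeness that decays for top-ranked candidates. The toy posterior below is an invented stand-in, not the order-statistics estimator of BNS.

```python
import numpy as np

def bayesian_gain_pick(scores, prior=0.95):
    """Schematic Bayesian-style selection (NOT the exact BNS estimator):
    weight each candidate by (informativeness x estimated probability of
    being a *true* negative), so likely false negatives near the top of
    the score ranking are down-weighted rather than greedily mined.
    prior: prior probability that a random unobserved item is a true negative.
    """
    ranks = scores.argsort().argsort()            # 0 = lowest score
    quantile = (ranks + 1) / len(scores)
    p_true_neg = prior * (1.0 - quantile ** 4)    # toy posterior: decays at the top
    gain = p_true_neg * scores                    # informativeness x safety
    return int(np.argmax(gain))

scores = np.array([0.1, 0.5, 0.9, 0.95])
idx = bayesian_gain_pick(scores)   # a hard-but-safe negative, not the top scorer
```

Even this crude gain function avoids the highest-scoring candidate, which is the one most likely to be an unobserved positive.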

3.3 Caching and Memory-Driven Exploration

Cache-based methods such as NSCaching (Zhang et al., 2018, Zhang et al., 2020) track a small, dynamically refreshed set of high-gradient negatives per context (e.g., per (h,r) or (r,t)). Uniform or importance-weighted sampling within the cache balances exploitation and exploration, while updating strategies regulate exploration/exploitation trade-offs.
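A minimal cache of this kind can be sketched as follows; the cache size, refresh rate, and uniform sampling within the cache are illustrative choices in the spirit of NSCaching rather than its exact update rules.

```python
import numpy as np

class NegativeCache:
    """NSCaching-style cache (schematic): keep a small set of high-scoring
    negatives per context (e.g., per (h, r) pair), lazily refresh it with
    fresh random candidates, and sample the training negative from it.
    """
    def __init__(self, pool_size, cache_size=5, rng=None):
        self.pool_size, self.cache_size = pool_size, cache_size
        self.rng = rng or np.random.default_rng()
        self.cache = {}   # context -> array of cached negative ids

    def sample(self, context, score_fn, n_fresh=10):
        cached = self.cache.get(context, np.empty(0, dtype=int))
        fresh = self.rng.integers(0, self.pool_size, size=n_fresh)
        union = np.unique(np.concatenate([cached, fresh]))
        scores = score_fn(union)
        # keep the top-scoring candidates as the new cache (exploitation) ...
        top = union[np.argsort(scores)[-self.cache_size:]]
        self.cache[context] = top
        # ... but draw the training negative uniformly from it (exploration)
        return int(self.rng.choice(top))

cache = NegativeCache(pool_size=100, rng=np.random.default_rng(0))
neg = cache.sample(("h", "r"), score_fn=lambda ids: ids.astype(float))
```

Tuning the refresh count `n_fresh` and the cache size directly trades exploration against exploitation, mirroring the updating strategies discussed above.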

3.4 Structural and Information-Theoretic Extensions

Structure-aware negative sampling (SANS) restricts candidate selection to k-hop neighborhoods in graphs or KGs, consistently creating semantically harder negatives without the overhead of full adversarial pipelines (Ahrabian et al., 2020). In dense retrieval, methods like TriSampler (Yang et al., 2024) enforce geometric proximity constraints (quasi-triangular principle) among query, positive, and negative items, leading to negatives informative for both discrimination and calibration.
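The k-hop restriction behind SANS amounts to a breadth-first expansion followed by sampling; the adjacency-dict representation and toy graph below are illustrative assumptions.

```python
import numpy as np

def k_hop_negatives(adj, anchor, positive, k, num, rng=None):
    """SANS-style structure-aware candidates (schematic): draw negatives from
    the anchor's k-hop neighborhood (excluding the anchor and the positive),
    which tend to be semantically harder than uniformly random nodes.
    adj: dict mapping node -> list of neighbor nodes.
    """
    rng = rng or np.random.default_rng()
    frontier, seen = {anchor}, {anchor}
    for _ in range(k):   # BFS out to k hops
        frontier = {v for u in frontier for v in adj.get(u, [])} - seen
        seen |= frontier
    candidates = sorted(seen - {anchor, positive})
    return rng.choice(candidates, size=min(num, len(candidates)), replace=False)

# toy graph: path 0-1-2-3 plus node 4 hanging off 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
negs = k_hop_negatives(adj, anchor=0, positive=1, k=2, num=2,
                       rng=np.random.default_rng(0))
```

With `k=2` only node 2 qualifies here; distant nodes like 4 are excluded, which is exactly how the structural prior concentrates sampling on plausible-but-unobserved neighbors.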

3.5 Diversity-Augmented and Synthetic Negatives

Standard hard-mining often leads to low diversity, biasing the model toward overfitting dense negative clusters. Diverse Negative Sampling (DivNS) (Xuan et al., 20 Aug 2025) maintains caches of informative negatives but, via a diversity-penalized determinantal point process (DPP), selects negative subsets that minimize feature redundancy and maximize exposure to the global item space. Augmented Negative Sampling (ANS) (Zhao et al., 2023) further synthesizes negatives by editing easy factors of candidate embeddings toward positives, balancing "ambiguous trap" and "information discrimination" issues.
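The hardness-versus-redundancy trade-off can be illustrated with a cheap greedy selection; this stand-in for determinantal point process sampling, including the redundancy penalty `lam` and the toy embeddings, is an invented simplification, not the DivNS algorithm.

```python
import numpy as np

def diverse_hard_negatives(cand_emb, cand_scores, m, lam=0.5):
    """DivNS-flavoured selection (schematic): greedily pick m negatives that
    trade off hardness (model score) against cosine redundancy with negatives
    already selected, a cheap stand-in for DPP-based subset sampling.
    """
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    chosen = []
    for _ in range(m):
        best, best_val = None, -np.inf
        for i in range(len(cand)):
            if i in chosen:
                continue
            red = max((cand[i] @ cand[j] for j in chosen), default=0.0)
            val = cand_scores[i] - lam * red   # hardness minus redundancy
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return chosen

emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])  # items 0 and 1 near-duplicates
scores = np.array([0.9, 0.85, 0.5])
picked = diverse_hard_negatives(emb, scores, m=2)
# pure hard mining would take the two near-duplicates; diversity prefers [0, 2]
```

Pure top-score mining would select the two near-duplicate items; penalizing redundancy swaps one of them for a distant, lower-scoring negative, widening coverage of the item space.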

4. Impact Across Research Areas

4.1 Recommendation and Collaborative Filtering

Negative sampling is indispensable in implicit feedback settings: treating all non-interacted pairs as negatives is impractical and possibly biased. Reviews (Ma et al., 2024, Yang et al., 2024) organize sampler strategies as static (uniform, popularity-based), dynamic (score-aware DNS), adversarial, re-weighted (attention or knowledge-driven), and structure- or knowledge-enhanced. Core challenges include controlling false negatives, achieving curriculum over hardness, and balancing fairness or group-specific effects (Xuan et al., 2023). Advances such as AHNS, DivNS, and BNS deliver robust gains in top-$K$ recall/NDCG (+2–8%) and sample efficiency, with tighter theoretical guarantees (Lai et al., 2024, Xuan et al., 20 Aug 2025, Liu et al., 2022).

4.2 Knowledge Graph Embedding

Knowledge graph representation requires negative triple generation, since most KGs contain only ground truth positives. Salient methods are:

  • NSCaching and its automatic/tunable variant (Zhang et al., 2018, Zhang et al., 2020), which streamline hard-negative discovery by cache, outperforming GAN-based methods in both accuracy and convergence rate.
  • SANS (Ahrabian et al., 2020) exploits local graph structure for hard negative sampling with minimal overhead, improving MRR and Hits@10 over uniform baselines.
  • Theoretical optimization: Margin and sample-count calibration is crucial for bounded scorers (e.g., TransE, RotatE), requiring $\gamma \gtrsim \log|Y|$ or $\nu \geq |Y|$ to avoid objective collapse (Kamigaito et al., 2022).
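The margin condition $\gamma \gtrsim \log|Y|$ is easy to work through numerically; taking the natural log as a rule of thumb (an assumption, since the bound is stated only up to constants) gives a feel for how the required margin grows with the entity count.

```python
import math

# Rough margin calibration for bounded scorers (e.g., TransE, RotatE):
# gamma ≳ log|Y| says the margin should grow with the log of the number of
# candidate entities |Y| to avoid objective collapse.
for n_entities in (10_000, 100_000, 1_000_000):
    gamma_min = math.log(n_entities)   # natural log, as a rule of thumb
    print(n_entities, round(gamma_min, 1))
```

Growing |Y| a hundredfold only raises the required margin by about 4.6 nats, so the calibration is mild but, per the cited analysis, not optional.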

4.3 Retrieval and Metric Learning

In dense retrieval (both mono- and cross-modal), hard, semi-hard, and quasi-triangular region negatives foster faster convergence and better generalization, with explicit constraints outperforming random or top-$K$ hard negative mining (TriSampler, (Yang et al., 2024)). In-batch negatives accelerate contrastive loss but are sensitive to batch size and false negative prevalence (Yang et al., 2024). For audio–text retrieval, semi-hard cross-modal negatives stably maximize mean average precision, while pure hardest negatives cause feature collapse (Xie et al., 2022).
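The semi-hard criterion can be sketched as an in-batch filter: accept only negatives scoring below the positive but within a margin of it. The dot-product scores, margin value, and tie-breaking by hardest-within-band are illustrative assumptions.

```python
import numpy as np

def semi_hard_negative(anchor, positive, batch_emb, margin=0.2):
    """Semi-hard in-batch mining (schematic): among batch items, accept a
    negative in the band s_pos - margin < s_neg < s_pos. The very hardest
    negatives (above s_pos) are excluded because they tend to collapse
    cross-modal features. Returns a batch index, or None if no candidate fits.
    """
    s_pos = anchor @ positive
    s = batch_emb @ anchor
    mask = (s < s_pos) & (s > s_pos - margin)
    if not mask.any():
        return None
    return int(np.flatnonzero(mask)[np.argmax(s[mask])])  # hardest semi-hard

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
batch = np.array([[0.95, 0.0],   # harder than the positive: excluded
                  [0.8, 0.2],    # semi-hard: inside the margin band
                  [0.0, 1.0]])   # too easy: below the band
idx = semi_hard_negative(anchor, positive, batch)
```

Here the near-duplicate of the anchor is rejected despite being the hardest candidate, matching the observation that pure hardest-negative mining destabilizes retrieval training.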

4.4 Graph and Hypergraph Representation

Hyperedge prediction tasks, due to the exponential number of unobserved hyperedges, particularly benefit from hard-negative synthesis in latent space (via convex combinations), which constrains negatives to challenge the classifier near the embedding boundary. Such methods yield +1–3% AUC gains and robustness to sampling hyperparameters (Deng et al., 11 Mar 2025).
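The convex-combination idea can be sketched directly in latent space: interpolate sampled negative embeddings toward randomly paired positive embeddings while keeping the negative label. The mixing coefficient `lam` and toy embeddings are illustrative assumptions.

```python
import numpy as np

def synthesize_hard_negatives(pos_emb, neg_emb, lam=0.7, rng=None):
    """Latent-space hard-negative synthesis (schematic): convex combinations
    lam * neg + (1 - lam) * pos pull negatives toward the decision boundary;
    smaller lam places them closer to the positives. Labels stay negative.
    """
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(pos_emb), size=len(neg_emb))  # random positive partner
    return lam * neg_emb + (1.0 - lam) * pos_emb[idx]

pos = np.array([[1.0, 1.0], [1.0, 0.8]])
neg = np.array([[-1.0, -1.0], [0.0, -2.0]])
hard = synthesize_hard_negatives(pos, neg, lam=0.7, rng=np.random.default_rng(0))
```

Each synthetic point sits strictly between its source negative and a positive, so the classifier is forced to carve its boundary closer to the observed hyperedges.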

4.5 Neural Topic Modeling and Semi-supervised Learning

Negative sampling in neural topic models, especially via decoder-side triplet losses, boosts topic coherence and diversity across a range of VAE-based architectures (Adhya et al., 23 Mar 2025). For semi-supervised learning, directly enforcing "not-in-class" constraints on unlabeled examples via NS³L improves error rates on image classification, sharpening decision boundaries without extra parameters (Chen et al., 2019).
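The "not-in-class" constraint admits a compact sketch: for each sampled negative class $k$, minimize $-\log(1 - p_k)$ so probability mass is pushed away from it. The function name and toy probabilities below are illustrative; this is the general negative-learning form rather than the exact NS³L training recipe.

```python
import numpy as np

def negative_class_loss(probs, neg_classes):
    """"Not-in-class" negative-learning loss (schematic): for an unlabeled
    example, sample classes believed NOT to be the true label and minimize
    -log(1 - p_k) for each, pushing probability mass away from them.
    probs: (C,) softmax output; neg_classes: indices of sampled negative classes.
    """
    p = np.clip(probs[neg_classes], 0.0, 1.0 - 1e-7)  # guard log(0)
    return float(-np.sum(np.log(1.0 - p)))

probs = np.array([0.7, 0.2, 0.05, 0.05])
loss_low  = negative_class_loss(probs, [2, 3])  # unlikely classes: small penalty
loss_high = negative_class_loss(probs, [0])     # likely class as negative: large penalty
```

The loss adds no parameters: it reuses the existing softmax head, which is why it sharpens decision boundaries essentially for free.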

5. Empirical and Theoretical Insights

Extensive experiments consistently confirm the superiority of adaptive, hard, and/or diverse negative samplers over static baselines across standard datasets and architectures:

Table: Core Families of Negative Sampling Methods

| Category | Key Mechanisms | Representative Works |
| --- | --- | --- |
| Static | Uniform / popularity | (Yang et al., 2024, Ma et al., 2024) |
| Hard | Model score, adaptive pool | (Deng et al., 11 Mar 2025, Lai et al., 2024) |
| GAN/Auxiliary | Adversarial, cache, graph | (Zhang et al., 2020, Zhang et al., 2018, Ahrabian et al., 2020) |
| Diversity/Synthesis | DPP, mixup, augmentation | (Xuan et al., 20 Aug 2025, Zhao et al., 2023) |
| In-batch | Batch-as-pool, debiasing | (Yang et al., 2024, Xie et al., 2022) |

6. Open Challenges and Future Research Directions

Key open problems and research directions include:

  • False Negative Identification: Filtering or dynamically re-labeling latent positives among "hard" negatives (Ma et al., 2024, Liu et al., 2022).
  • Curriculum Scheduling of Hardness: Designing annealing or curriculum strategies for negative sample hardness across training time (Lai et al., 2024).
  • Unified Theoretical Frameworks: Closing the gap between empirical practice and formal guarantees, especially relating to negative quantities per positive and optimal hardness calibration (Yang et al., 2024).
  • Domain-Transfer and Multimodality: Generalizing model-conditional, diversity-enhanced, and cross-modal sampling for new data settings, including LLM-based, graph, or sequence models (Ma et al., 2024).
  • Replacing or Augmenting Negative Sampling: Exploration of non-sampling paradigms (full softmax, negative-free self-supervision) and advances in efficient approximations (Yang et al., 2024).
  • Efficient and Adaptive Infrastructure: Scalable, O(1)-per-sample negative samplers (e.g., via LSH or advanced caching) for ultra-large output spaces (as in extreme classification) (Daghaghi et al., 2020).
  • Bias and Fairness: The impact of negative sampling on evaluation fairness and bias calibration, both for group-stratified datasets and more general exposure settings (Xuan et al., 2023).

7. Conclusions

Negative sampling constitutes a unifying and central paradigm for scalable, effective training across a broad range of machine learning domains. Contemporary developments emphasize adaptivity, hardness calibration, diversity, and structural context in sampling methods. Appropriately designed negative samplers critically affect optimization speed, generalization, robustness to noise and unseen data, and interpretability. The interplay between theoretical properties (e.g., minimization of expected pairwise risk, lower bounds on NDCG, statistical consistency) and practical ramifications (empirical gains on standard tasks) guarantees ongoing research and continued methodological innovation in this area (Yang et al., 2024, Ma et al., 2024).
