
VAD Annotations: Valence, Arousal, Dominance

Updated 9 February 2026
  • VAD annotations are a dimensional framework that quantifies affective meaning via continuous valence, arousal, and dominance scores.
  • They employ methods like human rating protocols, self-assessment scales, and fuzzy representations for robust emotion measurement.
  • High statistical reliability and versatile applications in text, speech, and multimodal analyses underscore the practical significance of VAD.

Valence–Arousal–Dominance (VAD) Annotations

Valence–Arousal–Dominance (VAD) annotation is a dimensional methodology for quantifying the affective meaning of words, sentences, or multimodal data in affective computing, natural language processing, and psychology. Each annotation comprises three real-valued coordinates: Valence (pleasure–displeasure), Arousal (activation–deactivation), and Dominance (control–submissiveness). VAD annotations are obtained by human ratings, proxy assessments, or functionally derived from discrete emotion labels, and provide a continuous, interpretable affective representation with robust applications in computational and cognitive sciences.

1. Theoretical Foundations and Dimensional Structure

The theoretical basis for VAD annotation was established through seminal factor analytic studies by Osgood (1957) and Russell (1980, 2003), which identified valence (V), arousal (A), and dominance (D) as orthogonal primary factors structuring affective meaning (Mohammad, 30 Mar 2025). Valence represents the pleasantness of an affective state, arousal encodes its level of activation, and dominance quantifies the degree of control or submission. These three dimensions can be measured on continuous real-valued scales with explicit definitions and, in current large-scale lexica, are typically normalized to the intervals [–1, +1] or [0, 1] for computational tractability (Mohammad, 30 Mar 2025).

The independence of V, A, and D (or equivalently, their low inter-dimension correlation) is consistently validated empirically through high split-half reliability measures and factor analyses (Mohammad, 30 Mar 2025). The VAD space has also been shown to generalize across modalities (text, speech, vision) and cultural-linguistic domains, motivating its use as a foundational annotation framework (Buechel et al., 2022, Liu et al., 15 May 2025, Jia et al., 2024).

2. Annotation Methodologies

2.1 Lexicon-Based Human Rating Protocols

The NRC VAD Lexicon v2 is the most comprehensive resource, containing human VAD ratings for over 55,000 English terms, including ∼10,000 multiword expressions (MWEs) (Mohammad, 30 Mar 2025, Mohammad, 25 Nov 2025). Annotators are recruited via crowdsourcing platforms (Amazon Mechanical Turk), restricted to native or near-native English speakers (with demographic balance), and asked to rate each term separately for V, A, and D. A 7-point Likert scale (–3 to +3) is mapped onto [–1, +1] by the linear transformation:

$$s' = \frac{s_{\rm raw} - (-3)}{6} \times 2 - 1, \qquad s_{\rm raw} \in \{-3, \ldots, +3\}$$

Quality control is maintained by embedding ∼2 % expert-validated “gold” questions, discarding all annotations from raters falling below 80 % accuracy, and aggregating the remaining ratings by arithmetic mean (Mohammad, 25 Nov 2025).
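
As a minimal sketch, the rescaling above is a one-line linear map (the function name and range check here are ours, not part of the lexicon's released tooling):

```python
def likert_to_unit_interval(s_raw: int) -> float:
    """Map a 7-point Likert rating in {-3, ..., +3} onto [-1, +1]
    via s' = (s_raw - (-3)) / 6 * 2 - 1, which simplifies to s_raw / 3."""
    if not -3 <= s_raw <= 3:
        raise ValueError("raw rating must lie in {-3, ..., +3}")
    return (s_raw - (-3)) / 6 * 2 - 1

print(likert_to_unit_interval(-3))  # -1.0
print(likert_to_unit_interval(0))   # 0.0
print(likert_to_unit_interval(3))   # 1.0
```

Per-term scores are then the arithmetic mean of these rescaled ratings over the raters who survive the gold-question filter.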

2.2 Self-Assessment Scales and Perspectives

For sentence- or document-level VAD annotation, the Self-Assessment Manikin (SAM) scale is widely used: a non-verbal pictorial instrument offering 5- or 9-point scales for each dimension, exemplified in the EmoBank corpus (10,062 sentences) (Buechel et al., 2022). EmoBank’s protocol collects annotations for both the writer’s and the reader’s affective perspectives, finding that the reader yields higher inter-annotator agreement and stronger emotional signal (Buechel et al., 2022). Each sentence is rated by multiple annotators; mean aggregation and outlier removal finalize the annotation.

2.3 Proxy, Computational, and Fuzzy Methods

Non-direct protocols circumvent rater fatigue and subjective variance. A proxy-based method has participants construct geometric animations to encode discrete labels, then rate the resulting animation along each VAD dimension. This two-step approach yields a mapping between categorical labels and continuous VAD coordinates (means and SDs) that has proven robust across large crowdsourced samples (Wrobel, 16 Nov 2025).

Fuzzy VAD annotations use interval type-2 fuzzy sets to represent each coordinate as membership functions over “low,” “medium,” and “high” fuzzy categories. Raw numerical VAD self-reports are mapped via parameterized upper and lower Gaussian membership curves, yielding 18 interval-valued features (per event) and supporting downstream probabilistic classification over the $3^3 = 27$ cuboid lattice of the VAD cube (Asif et al., 2024).
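
A sketch of the interval type-2 representation, under assumed category centers and Gaussian widths (the parameters in Asif et al., 2024 are fitted; these are placeholders): each of V, A, D gets a (lower, upper) membership pair in each of three categories, which is where the 18 features per event come from.

```python
import math

def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def it2_memberships(v, a, d, centers=(-0.6, 0.0, 0.6),
                    sigma_lower=0.25, sigma_upper=0.45):
    """Interval type-2 fuzzy features for one (V, A, D) self-report.

    For each coordinate and each category ('low', 'medium', 'high'),
    a narrow Gaussian gives the lower membership bound and a wider
    one the upper bound: 3 dims x 3 categories x 2 bounds = 18 features.
    """
    feats = []
    for x in (v, a, d):
        for mu in centers:
            feats.append((gaussian(x, mu, sigma_lower),   # lower bound
                          gaussian(x, mu, sigma_upper)))  # upper bound
    return feats  # nine (lower, upper) pairs

pairs = it2_memberships(0.7, -0.1, 0.2)
print(len(pairs))                         # 9
print(all(lo <= up for lo, up in pairs))  # True
```

Because the upper curve is strictly wider, the lower membership never exceeds the upper one, giving the interval-valued uncertainty band that type-2 sets are used for.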

3. Statistical Properties and Reliability

3.1 Split-Half Reliability and Inter-Annotator Agreement

Large-scale VAD lexica consistently achieve high split-half reliability (SHR), computed by randomly splitting annotators for each item, aggregating both halves, and averaging correlations (Spearman $\rho$, Pearson $r$) over 1,000 repetitions. For NRC VAD v2:

  • Valence: $\rho = 0.98$, $r = 0.99$
  • Arousal: $\rho = 0.97$, $r = 0.98$
  • Dominance: $\rho = 0.96$, $r = 0.96$ (Mohammad, 30 Mar 2025, Mohammad, 25 Nov 2025)
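
The SHR procedure described above can be sketched as follows (toy synthetic ratings; Pearson only, whereas the lexicon papers report Spearman as well):

```python
import random
import statistics

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def split_half_reliability(ratings_per_item, repeats=1000, seed=0):
    """Randomly split each item's raters into two halves, aggregate
    each half by its mean, correlate the two aggregate vectors across
    items, and average the correlation over many random splits."""
    rng = random.Random(seed)
    rs = []
    for _ in range(repeats):
        half_a, half_b = [], []
        for scores in ratings_per_item:
            s = scores[:]
            rng.shuffle(s)
            mid = len(s) // 2
            half_a.append(statistics.mean(s[:mid]))
            half_b.append(statistics.mean(s[mid:]))
        rs.append(pearson(half_a, half_b))
    return statistics.mean(rs)

# Toy lexicon: 21 "terms" with true scores in [-1, 1], four raters each.
truth = [t / 10 for t in range(-10, 11)]
ratings = [[v - 0.05, v, v + 0.05, v + 0.02] for v in truth]
print(split_half_reliability(ratings, repeats=50) > 0.95)  # True
```

With rater noise small relative to the spread of true scores, SHR approaches 1, which is the regime the NRC VAD figures above reflect.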

Cronbach’s $\alpha$ is not always reported, but high SHR implies $\alpha \approx 0.95$ or above. Task-specific datasets (e.g., sentence-level EmoBank) show writer-vs-reader perspective differences in reliability ($r = 0.698$–$0.738$ for valence) and mean absolute error (MAE), with the reader’s scores being more reliable on average (Buechel et al., 2022). Fuzzy VAD partitioning is validated by ablation, with type-2 fuzzy representations boosting classification accuracy over crisp partitions by roughly +1 % in absolute terms (Asif et al., 2024).

3.2 Distributional Coverage

Valence, arousal, and dominance ratings span the entire [–1, +1] range, though word distributions in lexica are typically skewed: valence toward positive (“positivity bias”), arousal near neutral, dominance mildly positive (Mohammad, 30 Mar 2025). MWEs are less compositional in arousal and dominance than single words, with the Pearson $r$ between MWE scores and the mean of their constituents’ scores:

  • Valence: $r \approx 0.78$
  • Arousal: $r \approx 0.55$
  • Dominance: $r \approx 0.50$ (Mohammad, 25 Nov 2025)

4. Computational Mapping Between VAD and Categorical Emotion

4.1 From Discrete to VAD

Discrete-to-VAD mappings can be constructed by directly assigning to each categorical label its average VAD triple, either from lexicon lookup (e.g., NRC-VAD) or by proxy-based human mapping (Wrobel, 16 Nov 2025). In multimodal emotion models, discrete categories are “embedded” in VAD space for continuous prediction or to serve as cluster centroids for K-means classification (Jia et al., 2024).

For inference:

  • Lookup: $VAD(c) = (v^c, a^c, d^c)$
  • For proxies, mean VAD coordinates per discrete label are aggregated: $\mu_{i,d} = (1/N_i) \sum_{p=1}^{N_i} r_{p,i,d}$
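
The proxy aggregation $\mu_{i,d}$ is simply a per-label mean over participants' triples; a minimal sketch with toy ratings and illustrative labels:

```python
from collections import defaultdict
from statistics import mean

def label_centroids(proxy_ratings):
    """proxy_ratings: iterable of (label, (v, a, d)) pairs, one per
    participant rating. Returns the per-label mean VAD triple,
    i.e. mu_{i,d} = (1/N_i) * sum_p r_{p,i,d}."""
    by_label = defaultdict(list)
    for label, vad in proxy_ratings:
        by_label[label].append(vad)
    return {lab: tuple(mean(t[d] for t in triples) for d in range(3))
            for lab, triples in by_label.items()}

ratings = [("joy", (0.9, 0.6, 0.5)), ("joy", (0.7, 0.4, 0.3)),
           ("fear", (-0.8, 0.7, -0.6)), ("fear", (-0.6, 0.9, -0.4))]
centroids = label_centroids(ratings)
print(centroids["joy"], centroids["fear"])
```

Per-label standard deviations, also reported in the proxy protocol, follow from the same grouping.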

4.2 From VAD to Categorical and Mixture-of-Emotions

Mapping from VAD to a probability distribution over emotion classes is achieved by modeling each class as a Gaussian in VAD space $(\mu, \sigma)$, assuming independence per dimension:

$$P(x \mid E_k) = P(V_{\text{obs}} \mid E_k)\, P(A_{\text{obs}} \mid E_k)\, P(D_{\text{obs}} \mid E_k)$$

The posterior $p_k$ (soft label) is obtained by normalizing the per-class log-likelihoods:

$$p_k = \frac{\exp(\ell_k)}{\sum_{j=1}^{K} \exp(\ell_j)}, \qquad \ell_k = \sum_{d \in \{V, A, D\}} \left[ -\tfrac{1}{2} \left( \frac{x_d - \mu_{d,k}}{\sigma_{d,k}} \right)^2 - \log\left( \sigma_{d,k} \sqrt{2\pi} \right) \right]$$

This approach enables mixture-of-emotions annotation for ambiguous or blended affective states (Neto et al., 6 Feb 2026).
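
A direct transcription of the soft-label formula (the class means and standard deviations below are made up for illustration):

```python
import math

def vad_to_soft_labels(x, classes):
    """x: observed (V, A, D) triple. classes: {name: (mu, sigma)} with
    per-dimension means and standard deviations. Computes each class's
    log-likelihood l_k under independent Gaussians, then softmax-
    normalizes into a mixture-of-emotions distribution."""
    logls = {}
    for name, (mu, sigma) in classes.items():
        logls[name] = sum(
            -0.5 * ((xd - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
            for xd, m, s in zip(x, mu, sigma))
    top = max(logls.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in logls.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

classes = {"joy":  ((0.8, 0.5, 0.4),  (0.2, 0.2, 0.2)),
           "fear": ((-0.7, 0.8, -0.5), (0.2, 0.2, 0.2))}
probs = vad_to_soft_labels((0.6, 0.4, 0.3), classes)
print(probs["joy"] > 0.99)  # True: the point sits near the "joy" mean
```

An ambiguous point between two class means would instead yield a genuinely mixed distribution, which is the blended-affect case this mapping is designed for.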

4.3 Joint or Consistency-Based Learning

Deep models use auxiliary VAD heads for regression (e.g., BERT or RoBERTa plus three linear heads), a multi-task loss (classification + VAD regression), and VAD-preserving consistency constraints to unify continuous and categorical outputs (Yang et al., 2023, Li et al., 3 Jan 2026, Park et al., 2019). Optimization objectives may include squared Earth Mover’s Distance (EMD), mean squared error (MSE), or custom regularizers penalizing violations of VAD/categorical consistency.

5. Practical Application and Integration in AI Systems

5.1 Text and Speech Analysis

VAD lexica are readily used for affective profiling of text by tokenizing, lowercasing, and matching against dictionary entries; for MWEs, the maximal-length match protocol is applied. Document- or utterance-level VAD is the mean vector across matched entries:

$$V_{\rm doc} = \frac{1}{|T|} \sum_{t \in T} v_t, \qquad A_{\rm doc} = \frac{1}{|T|} \sum_{t \in T} a_t, \qquad D_{\rm doc} = \frac{1}{|T|} \sum_{t \in T} d_t$$

Fuzzy VAD representations are integrated into deep architectures for EEG-based emotion recognition, supporting probabilistic and robust classification over large emotion sets (Asif et al., 2024).
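
A sketch of the lookup-and-average pipeline with a greedy left-to-right maximal-length match (one plausible reading of the protocol; the toy lexicon entries are illustrative, not NRC VAD values):

```python
def doc_vad(text, lexicon, max_ngram=3):
    """Lowercase and tokenize, greedily match the longest lexicon
    entry starting at each position (so MWEs beat their parts),
    and mean-pool the matched (v, a, d) triples."""
    tokens = text.lower().split()
    matched, i = [], 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            entry = " ".join(tokens[i:i + n])
            if entry in lexicon:
                matched.append(lexicon[entry])
                i += n
                break
        else:
            i += 1  # no entry of any length starts at this token
    if not matched:
        return None
    k = len(matched)
    return tuple(sum(m[d] for m in matched) / k for d in range(3))

lex = {"over the moon": (0.95, 0.6, 0.5), "moon": (0.1, 0.0, 0.0),
       "happy": (0.8, 0.4, 0.4)}
print(doc_vad("she was over the moon and happy", lex))
```

Here “over the moon” wins over its constituent “moon”, which is exactly why dedicated MWE entries matter given their low compositionality in arousal and dominance.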

5.2 Multimodal Emotion Detection

VAD annotations serve as regression targets in multimodal (audio, video, text) models. Continuous VAD predictions can be re-discretized via clustering (K-means with emotion anchor seeding), enabling the model to generate both discrete labels and open-vocabulary emotion descriptors (Jia et al., 2024).
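
Re-discretization by anchor-seeded clustering can be sketched with a tiny from-scratch K-means (real systems would use a library implementation; the anchors below are hypothetical label centroids):

```python
def kmeans_with_anchors(points, anchors, iters=10):
    """K-means over predicted VAD points, seeded with per-emotion
    anchor coordinates (e.g. label centroids in VAD space) instead
    of random initialization; returns each point's cluster index."""
    centers = [list(a) for a in anchors]

    def nearest(p):
        return min(range(len(centers)),
                   key=lambda c: sum((p[d] - centers[c][d]) ** 2
                                     for d in range(3)))

    for _ in range(iters):
        buckets = [[] for _ in centers]
        for p in points:
            buckets[nearest(p)].append(p)
        for j, b in enumerate(buckets):
            if b:  # empty clusters stay at their anchor
                centers[j] = [sum(p[d] for p in b) / len(b)
                              for d in range(3)]
    return [nearest(p) for p in points]

pts = [(0.8, 0.5, 0.4), (0.7, 0.4, 0.3), (-0.7, 0.8, -0.5), (-0.6, 0.7, -0.4)]
anchors = [(0.8, 0.5, 0.4), (-0.7, 0.8, -0.5)]
print(kmeans_with_anchors(pts, anchors))  # [0, 0, 1, 1]
```

Seeding with emotion anchors keeps each cluster interpretable as one discrete label, while the final centroids drift toward the model's actual prediction distribution.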

5.3 Speech Synthesis and Control

In controllable emotional text-to-speech systems, continuous ADV (arousal, dominance, valence) coordinates, typically derived from SAM-based annotations and quantized into bins, allow for granularity in emotional rendering and modulation of speech prosody (Liu et al., 15 May 2025).
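
Binning a continuous ADV coordinate is a simple equal-width quantization (five bins is our illustrative choice, not necessarily the granularity used in the cited TTS work):

```python
def quantize_adv(score, n_bins=5, lo=-1.0, hi=1.0):
    """Map a continuous arousal/dominance/valence score in [lo, hi]
    to a 0-indexed equal-width bin; the top edge folds into the
    last bin so that hi itself stays in range."""
    if not lo <= score <= hi:
        raise ValueError("score outside the annotation range")
    width = (hi - lo) / n_bins
    return min(int((score - lo) / width), n_bins - 1)

print([quantize_adv(v) for v in (-1.0, -0.5, 0.0, 0.5, 1.0)])  # [0, 1, 2, 3, 4]
```

The bin index can then condition the synthesis model, trading the full continuity of ADV for a controllable, discrete prosody knob.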

6. Limitations, Best Practices, and Future Directions

While VAD annotation has demonstrated cross-genre, cross-modal, and cross-cultural applicability, important limitations remain:

  • Lexica may show coverage gaps for specialized, contemporary, or non-English vocabulary, and static annotations may not reflect diachronic semantic drift (Mohammad, 30 Mar 2025, Hellrich et al., 2018).
  • Annotator demographic bias, particularly in crowdsourcing, could affect generalizability; best practice is to expand annotation populations for broader coverage (Mohammad, 30 Mar 2025, Mohammad, 25 Nov 2025).
  • Automated mappings are limited by assumptions (e.g., independence, Gaussianity in VAD space) and require empirical validation, such as Jensen–Shannon divergence against human mixtures (Neto et al., 6 Feb 2026).

Recommended practices include matching the maximal-length n-grams for MWEs, using held-out subsets for unbiased model evaluation, using type-2 fuzzy representations where subjectivity or uncertainty is high, and reporting both distributional and discrete prediction metrics (Asif et al., 2024, Neto et al., 6 Feb 2026, Mohammad, 30 Mar 2025). Extensions involve large-scale, cross-lingual annotation initiatives, human-centric mappings for new emotion categories, and deeper integration of VAD space with neural affect modeling (Wrobel, 16 Nov 2025).


