
GoEmotions Taxonomy Overview

Updated 22 January 2026
  • GoEmotions Taxonomy is a fine-grained emotion classification system with 27 emotion categories plus Neutral, derived from 58,000 Reddit comments.
  • It employs a multi-label annotation protocol where up to three emotion labels per comment capture the complex, multiplex nature of online affect.
  • PPCA and hierarchical clustering support its internal structure, and transfer experiments indicate that models trained on it generalize to other emotion benchmarks.

The GoEmotions taxonomy is a fine-grained, empirically derived categorization of 27 emotion categories plus a Neutral class, established through large-scale manual annotation of 58,000 English Reddit comments. Designed to enable nuanced emotion recognition in natural language processing, this taxonomy underpins the GoEmotions dataset, which was the largest manually annotated fine-grained emotion corpus for English at the time of its release. Each comment in the dataset is annotated with up to three emotion labels (or Neutral if no clear emotion is present), capturing the complex and often multidimensional nature of affective expression in online discourse (Demszky et al., 2020).

1. Taxonomy Structure and Category Definitions

The GoEmotions taxonomy comprises 27 named emotion categories plus Neutral, each accompanied by concise operational definitions and exemplars, to facilitate clarity and consistency in multi-label annotation. Annotators assign up to three labels per instance, reflecting the recognition that emotional content can be multiplex in authentic discourse. The categories are as follows:

Category | Definition | Example
--- | --- | ---
Admiration | Esteem or respect for someone or something | “Wow, I really admire your dedication to this project!”
Amusement | Finding something funny or entertaining | “Haha, that cat video cracked me up.”
Anger | Strong displeasure or hostility | “This policy makes me so furious!”
Annoyance | Mild irritation or bother | “Ugh, this slow loading is really annoying.”
Approval | Agreement or endorsement of something | “I totally approve of their new approach.”
Caring | Concern for another’s well-being | “I hope you’re doing okay after that news.”
Confusion | Uncertainty or lack of understanding | “I’m confused. What does this button do?”
Curiosity | Desire to learn or know more | “I wonder how they built that.”
Desire | Wanting or wishing for something | “I really want a puppy now.”
Disappointment | Sadness or displeasure at an unmet expectation | “That finale was such a let-down.”
Disapproval | Negative judgment or rejection | “I disapprove of that kind of behavior.”
Disgust | Revulsion or strong dislike | “That smell is absolutely disgusting.”
Embarrassment | Feeling awkward or self-conscious | “I can’t believe I said that. So embarrassing!”
Excitement | High-arousal positive anticipation | “I’m so excited for the concert tonight!”
Fear | Perceived threat or worry | “I’m scared of what might happen next.”
Gratitude | Thankfulness or appreciation | “Thanks so much for your help!”
Grief | Deep sorrow, especially over a loss | “I miss them so much; this grief won’t go away.”
Joy | Pleasure or great happiness | “I’m overjoyed to share the good news!”
Love | Deep affection or attachment | “I love you more every day.”
Nervousness | Anxiety or unease about an outcome | “I’m so nervous about the interview.”
Optimism | Hopeful outlook toward the future | “I’m hopeful we’ll get good results.”
Pride | Satisfaction in one’s or another’s achievement | “I’m so proud of how far you’ve come.”
Realization | Sudden understanding or insight | “Oh! Now I see what you meant.”
Relief | Alleviation of anxiety or distress | “What a relief that exam is over.”
Remorse | Deep regret or guilt for wrongdoing | “I truly regret what I said.”
Sadness | Unhappiness or sorrow | “I feel so sad today.”
Surprise | Startlement or astonishment | “Wow, I didn’t see that coming!”
Neutral | No strong emotion conveyed | “The meeting starts at 10 AM.”

This typology enables granular emotion detection, going beyond the coarser categorical schemes prevalent in prior affective computing work, such as those with only a handful of emotion categories or binary sentiment labels (Demszky et al., 2020).
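Because each comment can carry several labels, a convenient storage format is a multi-hot vector over the 28 classes, which is also the row format of the label matrix analyzed in the next section. The sketch below is illustrative only: it assumes an alphabetical category order with Neutral last, the names `EMOTIONS`, `INDEX`, and `encode_annotation` are hypothetical, and the checks simply enforce the per-annotator rules described in this overview (at most three labels, Neutral only on its own).

```python
# Hypothetical encoder for one annotator's GoEmotions-style label selection.
# Assumption: alphabetical category order, with "neutral" as the last column.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]
INDEX = {name: i for i, name in enumerate(EMOTIONS)}
MAX_LABELS = 3  # per-annotator cap described in this overview


def encode_annotation(labels):
    """Return a 28-dim multi-hot list for one annotator's selected labels."""
    chosen = set(labels)
    if not chosen:
        raise ValueError("select at least one label, or 'neutral'")
    if len(chosen) > MAX_LABELS:
        raise ValueError(f"at most {MAX_LABELS} labels may be selected")
    if "neutral" in chosen and len(chosen) > 1:
        raise ValueError("'neutral' cannot be combined with emotion labels")
    unknown = chosen - set(INDEX)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    row = [0] * len(EMOTIONS)
    for name in chosen:
        row[INDEX[name]] = 1  # mark each selected category
    return row
```

Stacking one such row per comment yields an n × 28 binary matrix; for example, `encode_annotation(["joy", "excitement"])` sets exactly two positions to 1.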

2. Latent Structure and Principal Component Analysis

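To assess the internal coherence of the annotated emotion space, Principal Preserved Component Analysis (PPCA) is applied to $n \times 28$ binary label matrices, where each row represents a comment and each column an emotion category. Unlike standard PCA, which diagonalizes the covariance $\mathrm{Cov}(X) = \frac{1}{n} X^\top X$ of a single column-centered matrix $X$, PPCA takes two matrices $X$ and $Y$ that hold independent judgments of the same $n$ comments (for example, labels from two different annotators) and solves $\frac{1}{2n}\left(X^\top Y + Y^\top X\right) v = \lambda v$. The resulting principal directions $v_1, v_2, \dots$ maximize the cross-covariance $\mathrm{Cov}(Xv, Yv)$, capturing the axes of variance that are preserved across annotators rather than present in only one set of judgments.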
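The difference between PPCA and ordinary PCA can be sketched in a few lines of pure Python. This is an illustrative implementation under stated assumptions, not the authors' code: it assumes `x` and `y` hold two independent sets of 0/1 judgments for the same comments, forms the symmetrized cross-covariance of the column-centered matrices, and extracts the leading eigenvector by power iteration. When `x` and `y` are identical it reduces to ordinary PCA on a single label matrix. All helper names are hypothetical.

```python
# Illustrative PPCA sketch (not the GoEmotions authors' code).
# x, y: n x k lists of 0/1 labels from two independent sets of judgments.


def center(m):
    """Subtract each column's mean."""
    n, k = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(k)]
    return [[row[j] - means[j] for j in range(k)] for row in m]


def cross_cov(x, y):
    """Symmetrized cross-covariance (X^T Y + Y^T X) / (2n) of centered data."""
    xc, yc = center(x), center(y)
    n, k = len(xc), len(xc[0])
    return [
        [sum(xc[i][a] * yc[i][b] + yc[i][a] * xc[i][b] for i in range(n)) / (2 * n)
         for b in range(k)]
        for a in range(k)
    ]


def top_preserved_component(x, y, iters=1000):
    """Leading eigenpair (lambda, v) of the cross-covariance via power iteration."""
    c = cross_cov(x, y)
    k = len(c)
    # The cross-covariance can have negative eigenvalues. Shifting by a
    # Gershgorin bound makes all eigenvalues non-negative, so power iteration
    # converges to the largest (most preserved) one, not the largest in magnitude.
    shift = max(sum(abs(val) for val in row) for row in c)
    v = [1.0 / (j + 1) for j in range(k)]  # arbitrary, deliberately asymmetric start
    start_norm = sum(t * t for t in v) ** 0.5
    v = [t / start_norm for t in v]
    for _ in range(iters):
        w = [sum(c[a][b] * v[b] for b in range(k)) + shift * v[a] for a in range(k)]
        norm = sum(t * t for t in w) ** 0.5
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    lam = sum(v[a] * c[a][b] * v[b] for a in range(k) for b in range(k))
    return lam, v
```

As a toy check, take `x = [[1, 1], [0, 0], [1, 0], [0, 1]]` and `y = [[1, 0], [0, 1], [1, 1], [0, 0]]`: the two judgment sets agree on the first column and contradict each other on the second. With `y = x`, both columns have variance 0.25, so ordinary PCA cannot tell them apart; the cross-covariance instead assigns the consistently labeled column 0.25 and the contradictory one -0.25, and the leading preserved component points along the first column.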

Hierarchical clustering on the first three principal components reveals emergent groupings with psychological plausibility, including:

  • “Positive–High Arousal”: {Amusement, Excitement, Joy}
  • “Positive–Low Arousal”: {Admiration, Approval, Gratitude, Pride}
  • “Negative–Angry”: {Anger, Annoyance, Disapproval, Disgust}
  • “Negative–Sad”: {Sadness, Disappointment, Grief, Remorse}
  • “Fearful”: {Fear, Nervousness}
  • “Cognitive/Uncertain”: {Confusion, Curiosity, Realization}
  • “Affectionate”: {Love, Caring}
  • “Future-oriented Positive”: {Desire, Optimism, Relief}
  • “Self-conscious”: {Embarrassment}
  • “Surprise”: {Surprise}

These families suggest that the categories are related but not redundant, and they support the notion that annotator judgments preserve salient dimensions of emotion conceptual space (Demszky et al., 2020).
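The grouping step can be illustrated with a small agglomerative (average-linkage) clustering routine. In this sketch each emotion is represented by a three-dimensional vector standing in for its loadings on the first three principal components; the coordinates in `TOY_LOADINGS` are invented for illustration and are not values from the dataset, and the choice of Euclidean distance with average linkage is likewise an assumption.

```python
import math

# Invented 3-D coordinates standing in for principal-component loadings.
TOY_LOADINGS = {
    "joy": (0.90, 0.80, 0.00),
    "amusement": (0.85, 0.75, 0.05),
    "excitement": (0.80, 0.90, 0.00),
    "anger": (-0.90, 0.70, 0.00),
    "annoyance": (-0.80, 0.60, 0.10),
    "fear": (-0.50, -0.20, 0.90),
    "nervousness": (-0.45, -0.25, 0.85),
}


def average_linkage(points, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    Euclidean distance until n_clusters remain; returns sorted name lists."""
    clusters = [[name] for name in sorted(points)]

    def linkage(c1, c2):
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


print(average_linkage(TOY_LOADINGS, 3))
# [['amusement', 'excitement', 'joy'], ['anger', 'annoyance'], ['fear', 'nervousness']]
```

Cutting the merge sequence at a different number of clusters gives coarser or finer families; the naive search above is quadratic per merge, which is fine for 28 categories.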

3. Annotation Methodology

Comments are randomly sampled from publicly available Reddit data, excluding content related to pornography and politics as well as personal data. Each comment is presented to three annotators via Google’s internal crowdsourcing interface, which displays all 27 emotion categories with definitions and examples, as well as the Neutral option.

Key procedural details:

  • Each annotator may select up to three emotion labels per comment; if none apply, “Neutral” is chosen.
  • Aggregation occurs via simple majority: a label is assigned if at least two of three annotators select it.
  • Comments with no majority label are discarded; this is rare.

This protocol enables both multi-label and Neutral labeling, supporting nuanced representation of affect in language (Demszky et al., 2020).
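The two-of-three aggregation rule can be expressed as a small function. This is a sketch of the rule as described here, not the dataset's released tooling; `aggregate` and its `min_votes` parameter are hypothetical names. Each annotator's selection is deduplicated before counting so that no annotator can vote twice for the same label.

```python
from collections import Counter


def aggregate(annotations, min_votes=2):
    """Majority-vote aggregation of per-annotator label selections.

    annotations: one iterable of labels per annotator (three per comment here).
    Returns the sorted labels chosen by at least `min_votes` annotators, or
    None when no label reaches the threshold, i.e. the comment is discarded.
    """
    votes = Counter(label for selection in annotations for label in set(selection))
    kept = sorted(label for label, count in votes.items() if count >= min_votes)
    return kept or None


comments = {
    "c1": [{"joy", "excitement"}, {"joy"}, {"neutral"}],
    "c2": [{"anger"}, {"annoyance"}, {"neutral"}],
}
labels = {cid: aggregate(a) for cid, a in comments.items()}
dataset = {cid: lab for cid, lab in labels.items() if lab is not None}
print(dataset)  # {'c1': ['joy']}
```

Here `c2` is dropped because its three annotators each chose a different label.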

4. Inter-Annotator Agreement Metrics

Consistency of annotation is quantified using both pairwise Cohen’s κ and Krippendorff’s α, two standard agreement measures for nominal labels. The average pairwise Cohen’s κ across all annotator pairs is approximately 0.45, and Krippendorff’s α is approximately 0.30. These values indicate only moderate agreement, lower than is typical for single-label tasks, but they are in line with the high label cardinality and subjective nature of multi-way emotion annotation (Demszky et al., 2020).

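  • Cohen’s κ: $\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed proportion of agreement and $p_e$ the agreement expected by chance
  • Krippendorff’s α: $\alpha = 1 - \frac{D_o}{D_e}$, where $D_o$ is the observed disagreement and $D_e$ the disagreement expected by chance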

The inherent subjectivity in emotion perception and the multi-label protocol explain the moderate agreement.
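Both agreement measures can be computed directly from annotator labels. The sketch below is a minimal pure-Python version for nominal labels, assuming each category is scored as a binary present/absent judgment; how the averages reported above were pooled across categories and annotator pairs is not specified here, so the function names and data layout are illustrative assumptions. Krippendorff’s α is computed with the standard coincidence-matrix form, which also tolerates items rated by different numbers of annotators.

```python
from collections import Counter
from itertools import permutations


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' nominal labels on the same items."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list of values per item, one value per annotator who rated it.
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for values in units:
        m = len(values)
        if m < 2:
            continue  # items with a single rating are not pairable
        for c, k in permutations(values, 2):  # ordered pairs of distinct raters
            o[(c, k)] += 1 / (m - 1)
    n_c = Counter()
    for (c, _), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    d_o = sum(w for (c, k), w in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1 - d_o / d_e  # undefined if every rating is the same value


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

In the κ example the annotators agree on three of four items ($p_o = 0.75$) while chance agreement is $p_e = 0.5$, giving κ = 0.5.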

5. Label Distribution and Class Imbalance

Label prevalence is highly imbalanced, which has implications for how downstream models learn. The most frequent labels are Neutral (28.3%), Annoyance (13.1%), Joy (8.9%), and Anger (8.5%). The rarest are Grief and Remorse (0.8% each), the only categories at or below 1%, followed by Embarrassment and Realization (1.1% each) and Disappointment (1.8%). Because a comment can carry several labels, the percentages in the table sum to more than 100%.

Category | Count | Percent of comments
--- | --- | ---
Neutral | 16,400 | 28.3%
Annoyance | 7,590 | 13.1%
Joy | 5,170 | 8.9%
Anger | 4,920 | 8.5%
Love | 4,650 | 8.0%
Caring | 3,820 | 6.6%
Admiration | 3,640 | 6.3%
Optimism | 3,620 | 6.3%
Approval | 3,520 | 6.1%
Sadness | 3,460 | 6.0%
Amusement | 3,330 | 5.8%
Excitement | 2,930 | 5.1%
Gratitude | 2,840 | 4.9%
Pride | 2,610 | 4.5%
Disapproval | 2,210 | 3.8%
Desire | 2,170 | 3.7%
Surprise | 1,780 | 3.1%
Confusion | 1,740 | 3.0%
Fear | 1,590 | 2.8%
Disgust | 1,490 | 2.6%
Relief | 1,180 | 2.0%
Curiosity | 1,150 | 2.0%
Nervousness | 1,140 | 2.0%
Disappointment | 1,030 | 1.8%
Realization | 640 | 1.1%
Embarrassment | 620 | 1.1%
Grief | 460 | 0.8%
Remorse | 430 | 0.8%

High label imbalance signals the necessity for careful treatment of low-frequency classes, such as via reweighting or augmentation, in supervised modeling applications (Demszky et al., 2020).
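One common reweighting scheme for multi-label training scales the positive term of each label’s binary cross-entropy by that label’s ratio of negative to positive examples, the role played by the `pos_weight` argument of PyTorch’s `BCEWithLogitsLoss`. The sketch below applies that heuristic to a few counts from the table, assuming roughly 58,000 comments in total; it is one option among several and not necessarily what the GoEmotions authors used. The helper names are hypothetical.

```python
import math

N_COMMENTS = 58_000  # approximate corpus size stated in this overview
COUNTS = {  # a subset of the label counts in the table above
    "neutral": 16_400,
    "annoyance": 7_590,
    "joy": 5_170,
    "grief": 460,
    "remorse": 430,
}


def pos_weights(counts, n):
    """Negative-to-positive ratio per label; rare labels get large weights."""
    return {label: (n - c) / c for label, c in counts.items()}


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce(logits, targets, weights):
    """Mean multi-label BCE over labels, with positive terms up-weighted.

    Uses -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z).
    """
    total = 0.0
    for label, z in logits.items():
        y = targets.get(label, 0)
        total += weights[label] * y * softplus(-z) + (1 - y) * softplus(z)
    return total / len(logits)


w = pos_weights(COUNTS, N_COMMENTS)
print(round(w["neutral"], 2), round(w["grief"], 2))  # 2.54 125.09
```

With these weights a missed Grief example costs roughly fifty times more than a missed Neutral one; very large ratios can destabilize training, so in practice they are often clipped or smoothed.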

6. Transferability and Cross-Corpus Evaluation

Cross-domain validity is assessed by fine-tuning a BERT-base classifier on GoEmotions and applying it, without additional tuning, to multiple established emotion benchmarks, including SemEval-2018 Task 1 (“Affect in Tweets”), the Emotion Stimulus dataset (Ghazi et al.), and the EmotionLines dialogue corpus (EmotionX).

  • On SemEval-2018 Task 1, the GoEmotions-trained model yields an average F1 ≃ 0.68, exceeding a BERT baseline trained only on SemEval data (~0.63).
  • On the Emotion Stimulus dataset, zero-shot F1 ≃ 0.52 compared to ~0.45 for an off-the-shelf BERT model.
  • On EmotionLines, zero-shot classification accuracy improves by 3–5 points.

These results suggest that models trained on the GoEmotions taxonomy learn emotion representations that generalize across domains and taxonomic granularities, supporting the resource’s utility for broad emotion understanding research (Demszky et al., 2020).
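Applying a 28-class model to a benchmark with a different label set requires some way to line up the two taxonomies. The sketch below shows one simple option: collapsing predicted fine-grained labels onto a coarser target set through a many-to-one mapping, then scoring with macro-averaged F1. The mapping `FINE_TO_COARSE` is a hypothetical example, not the mapping used in the experiments above, and unmapped labels such as Caring are simply dropped.

```python
FINE_TO_COARSE = {  # hypothetical mapping for illustration only
    "anger": "anger", "annoyance": "anger", "disapproval": "anger",
    "fear": "fear", "nervousness": "fear",
    "joy": "joy", "amusement": "joy", "excitement": "joy",
    "sadness": "sadness", "grief": "sadness",
    "disappointment": "sadness", "remorse": "sadness",
    "surprise": "surprise", "realization": "surprise",
}


def project(fine_labels):
    """Map fine-grained labels to coarse ones; unmapped labels are dropped."""
    return {FINE_TO_COARSE[lab] for lab in fine_labels if lab in FINE_TO_COARSE}


def macro_f1(gold, pred, classes):
    """Unweighted mean of per-class F1 over multi-label gold/pred sets.

    A class with no gold or predicted instances scores 0.0 by convention.
    """
    scores = []
    for c in classes:
        tp = sum(c in g and c in p for g, p in zip(gold, pred))
        fp = sum(c not in g and c in p for g, p in zip(gold, pred))
        fn = sum(c in g and c not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = [{"joy"}, {"anger"}, {"fear"}]
pred = [project(p) for p in ({"amusement"}, {"annoyance", "caring"}, {"surprise"})]
print(macro_f1(gold, pred, ["anger", "fear", "joy", "surprise"]))  # 0.5
```

A many-to-one projection like this only works from finer to coarser label sets; the reverse direction would require retraining or a learned mapping.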

7. Implications and Application Considerations

The GoEmotions taxonomy provides a foundation for nuanced emotion classification research, supporting multi-label, fine-grained affective modeling. The empirical clustering supports the validity of the category structure, while the annotation agreement metrics underscore the inherent difficulty of subjective, multi-way emotion labeling. The data imbalance calls for specialized modeling techniques for underrepresented categories. Finally, the transfer learning results indicate that the taxonomy is applicable beyond its source domain, facilitating the development and benchmarking of generalizable affective computing systems.

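References

Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., & Ravi, S. (2020). GoEmotions: A Dataset of Fine-Grained Emotions. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).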
