GoEmotions Taxonomy Overview
- The GoEmotions taxonomy is a fine-grained emotion classification system with 27 emotion categories plus Neutral, derived from 58,000 Reddit comments.
- It employs a multi-label annotation protocol where up to three emotion labels per comment capture the complex, multiplex nature of online affect.
- Advanced analysis using PPCA and hierarchical clustering validates its structure and demonstrates robust cross-domain applicability in emotion recognition.
The GoEmotions taxonomy is a fine-grained, empirically derived categorization of 27 emotion categories plus a Neutral class, established through large-scale manual annotation of 58,000 English Reddit comments. Designed to enable nuanced emotion recognition in natural language processing, this taxonomy underpins the GoEmotions dataset, which is the largest manually annotated corpus of its kind for English. Each comment in the dataset is annotated with up to three emotion labels (or Neutral if no clear emotion is present), capturing the complex and often multidimensional nature of affective expression in online discourse (Demszky et al., 2020).
1. Taxonomy Structure and Category Definitions
The GoEmotions taxonomy comprises 27 named emotion categories plus Neutral, each accompanied by concise operational definitions and exemplars, to facilitate clarity and consistency in multi-label annotation. Annotators assign up to three labels per instance, reflecting the recognition that emotional content can be multiplex in authentic discourse. The categories are as follows:
| Category | Definition | Example |
|---|---|---|
| Admiration | Esteem or respect for someone or something | “Wow, I really admire your dedication to this project!” |
| Amusement | Finding something funny or entertaining | “Haha, that cat video cracked me up.” |
| Anger | Strong displeasure or hostility | “This policy makes me so furious!” |
| Annoyance | Mild irritation or bother | “Ugh, this slow loading is really annoying.” |
| Approval | Agreement or endorsement of something | “I totally approve of their new approach.” |
| Caring | Concern for another’s well-being | “I hope you’re doing okay after that news.” |
| Confusion | Uncertainty or lack of understanding | “I’m confused—what does this button do?” |
| Curiosity | Desire to learn or know more | “I wonder how they built that.” |
| Desire | Wanting or wishing for something | “I really want a puppy now.” |
| Disappointment | Sadness or displeasure at an unmet expectation | “That finale was such a let-down.” |
| Disapproval | Negative judgment or rejection | “I disapprove of that kind of behavior.” |
| Disgust | Revulsion or strong disliking | “That smell is absolutely disgusting.” |
| Embarrassment | Feeling awkward or self-conscious | “I can’t believe I said that—so embarrassing!” |
| Excitement | High arousal positive anticipation | “I’m so excited for the concert tonight!” |
| Fear | Perceived threat or worry | “I’m scared of what might happen next.” |
| Gratitude | Thankfulness or appreciation | “Thanks so much for your help!” |
| Grief | Deep sorrow, especially at loss | “I miss them so much; this grief won’t go away.” |
| Joy | Pleasure or great happiness | “I’m overjoyed to share the good news!” |
| Love | Deep affection or attachment | “I love you more every day.” |
| Nervousness | Anxiety or unease about an outcome | “I’m so nervous about the interview.” |
| Optimism | Hopeful outlook toward the future | “I’m hopeful we’ll get good results.” |
| Pride | Satisfaction in one’s or another’s achievement | “I’m so proud of how far you’ve come.” |
| Realization | Sudden understanding or insight | “Oh! Now I see what you meant.” |
| Relief | Alleviation of anxiety or distress | “What a relief that exam is over.” |
| Remorse | Deep regret or guilt for wrongdoing | “I truly regret what I said.” |
| Sadness | Unhappiness or sorrow | “I feel so sad today.” |
| Surprise | Startlement or astonishment | “Wow, I didn’t see that coming!” |
| Neutral | No strong emotion conveyed | “The meeting starts at 10 AM.” |
This typology enables granular emotion detection, advancing beyond the coarse-grained or binary label schemes prevalent in prior affective computing work (Demszky et al., 2020).
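The multi-label scheme described above can be represented as a multi-hot vector over the 28 classes. The following is a minimal sketch, not the dataset's official tooling: the lowercase label strings, the `encode` helper, and the rule that Neutral cannot be combined with an emotion label are assumptions made for illustration.

```python
# Label inventory copied from the category table: 27 emotions plus Neutral.
# The lowercase spelling is an illustrative convention, not a dataset guarantee.
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}
MAX_LABELS = 3  # cap on labels per comment, as described in the text


def encode(labels):
    """Return a 28-dimensional 0/1 vector for one comment's label set."""
    unique = set(labels)
    unknown = unique - LABEL_INDEX.keys()
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not unique:
        raise ValueError("at least one label (or 'neutral') is required")
    if len(unique) > MAX_LABELS:
        raise ValueError(f"at most {MAX_LABELS} labels allowed")
    # Assumption: Neutral means "no emotion applies", so it stands alone.
    if "neutral" in unique and len(unique) > 1:
        raise ValueError("'neutral' cannot be combined with an emotion label")
    vector = [0] * len(LABELS)
    for name in unique:
        vector[LABEL_INDEX[name]] = 1
    return vector
```

Stacking these vectors row by row gives a binary label matrix with one row per comment and one column per category.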
2. Latent Structure and Principal Component Analysis
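To assess the internal coherence of the annotated emotion space, Principal Preserved Component Analysis (PPCA) is applied to the binary label matrix $X \in \{0,1\}^{N \times K}$, where each of the $N$ rows represents a comment and each of the $K$ columns an emotion category. The sample covariance is computed as $\hat{\Sigma} = \frac{1}{N-1} X_c^{\top} X_c$, where $X_c$ is $X$ with its column means subtracted; solving the eigenproblem $\hat{\Sigma} v = \lambda v$ yields principal directions $v$ capturing axes of maximal preserved variance.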
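The covariance and eigenvector step can be illustrated on a toy label matrix. The sketch below uses ordinary PCA with power iteration in pure Python; it is a simplified stand-in, not the PPCA procedure itself, and it omits the clustering step. The function names, the fixed start vector, the `N - 1` denominator, and the three-column toy data (joy, excitement, anger) are assumptions for illustration.

```python
import math

# Toy binary label matrix; columns are (joy, excitement, anger).
TOY_LABELS = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]


def covariance(rows):
    """Sample covariance (denominator N - 1) between the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


def top_component(cov, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(cov)
    # Fixed, asymmetric start vector keeps the sketch deterministic.
    v = [float(j + 1) for j in range(k)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("zero covariance: no principal direction")
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(
        v[a] * cov[a][b] * v[b] for a in range(k) for b in range(k)
    )
    return eigenvalue, v
```

On the toy matrix, `top_component(covariance(TOY_LABELS))` returns a direction in which joy and excitement load with the same sign and anger with the opposite sign, which is the kind of valence contrast the groupings in this section describe. The overall sign of an eigenvector is arbitrary.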
Hierarchical clustering on the first three principal components reveals emergent groupings with psychological plausibility, including:
- “Positive–High Arousal”: {Amusement, Excitement, Joy}
- “Positive–Low Arousal”: {Admiration, Approval, Gratitude, Pride}
- “Negative–Angry”: {Anger, Annoyance, Disapproval, Disgust}
- “Negative–Sad”: {Sadness, Disappointment, Grief, Remorse}
- “Fearful”: {Fear, Nervousness}
- “Cognitive/Uncertain”: {Confusion, Curiosity, Realization}
- “Affectionate”: {Love, Caring}
- “Future-oriented Positive”: {Desire, Optimism, Relief}
- “Self-conscious”: {Embarrassment}
- “Surprise”: {Surprise}
These families validate the structured heterogeneity of the taxonomy and support the notion that annotator judgments preserve salient axes in emotion conceptual space (Demszky et al., 2020).
3. Annotation Methodology
Annotation involves randomly sampling publicly available Reddit comments, excluding topics related to pornography, politics, and personal data. Each comment is presented to three annotators via Google’s internal crowdsourcing interface, which displays all 27 emotion categories with definitions and examples, as well as the Neutral option.
Key procedural details:
- Each annotator may select up to three emotion labels per comment; if none apply, “Neutral” is chosen.
- Aggregation occurs via simple majority: a label is assigned if at least two of three annotators select it.
- Comments for which no label reaches a majority are discarded; this is rare.
This protocol enables both multi-label and Neutral labeling, supporting nuanced representation of affect in language (Demszky et al., 2020).
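The majority rule above can be sketched in a few lines. This is an illustrative reading of the procedure, assuming each annotator's choices arrive as a list of label strings; the `aggregate` function and its `min_votes` parameter are hypothetical names, not part of any released tooling.

```python
from collections import Counter


def aggregate(annotations, min_votes=2):
    """Keep labels chosen by at least `min_votes` annotators.

    `annotations` holds one list of labels per annotator. Returns the
    sorted surviving labels, or None when no label reaches the threshold
    (the comment would then be discarded).
    """
    # set() ensures an annotator counts at most once per label.
    votes = Counter(
        label for labels in annotations for label in set(labels)
    )
    kept = sorted(label for label, n in votes.items() if n >= min_votes)
    return kept or None
```

For example, `aggregate([["joy", "love"], ["joy"], ["neutral"]])` keeps only `joy`, while three annotators who each choose a different label yield `None`.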
4. Inter-Annotator Agreement Metrics
Consistency of annotation is quantified using both pairwise Cohen’s κ and Krippendorff’s α, metrics commonly deployed for multicategorical nominal data. The average pairwise Cohen’s κ across all annotator pairs is approximately 0.45. Krippendorff’s α is approximately 0.30. While these values are moderate relative to single-label tasks, they are typical for the high label cardinality and subjective nature inherent in multi-way emotion annotation (Demszky et al., 2020).
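- Cohen’s κ: $\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance
- Krippendorff’s α: $\alpha = 1 - \frac{D_o}{D_e}$, where $D_o$ is the observed disagreement and $D_e$ the disagreement expected by chance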
The inherent subjectivity in emotion perception and the multi-label protocol explain the moderate agreement.
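As a worked illustration of Cohen’s κ, the sketch below computes it for two annotators over the same items. One plausible way to adapt it to the multi-label setting is to apply it per emotion to binary "selected / not selected" decisions; that adaptation, the function name, and the handling of the degenerate case are assumptions, not the procedure reported for the dataset.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of nominal labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    categories = set(a) | set(b)
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_e == 1.0:
        # Kappa is undefined here (both annotators used one identical
        # category throughout); returning 1.0 is a convention chosen
        # for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

With `a = [1, 1, 0, 0]` and `b = [1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, so κ = 0.5.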
5. Label Distribution and Class Imbalance
Label prevalence exhibits substantial imbalance, which has implications for downstream model learning dynamics. The most frequent labels are Neutral (28.3%), Annoyance (13.1%), Joy (8.9%), and Anger (8.5%). Rare categories (below 2%) include Disappointment, Realization, Embarrassment, Grief, and Remorse.
| Category | Count | Percent |
|---|---|---|
| Neutral | 16,400 | 28.3% |
| Annoyance | 7,590 | 13.1% |
| Joy | 5,170 | 8.9% |
| Anger | 4,920 | 8.5% |
| Love | 4,650 | 8.0% |
| Caring | 3,820 | 6.6% |
| Admiration | 3,640 | 6.3% |
| Optimism | 3,620 | 6.3% |
| Approval | 3,520 | 6.1% |
| Sadness | 3,460 | 6.0% |
| Amusement | 3,330 | 5.8% |
| Excitement | 2,930 | 5.1% |
| Gratitude | 2,840 | 4.9% |
| Pride | 2,610 | 4.5% |
| Disapproval | 2,210 | 3.8% |
| Desire | 2,170 | 3.7% |
| Surprise | 1,780 | 3.1% |
| Confusion | 1,740 | 3.0% |
| Fear | 1,590 | 2.8% |
| Disgust | 1,490 | 2.6% |
| Relief | 1,180 | 2.0% |
| Curiosity | 1,150 | 2.0% |
| Nervousness | 1,140 | 2.0% |
| Disappointment | 1,030 | 1.8% |
| Realization | 640 | 1.1% |
| Embarrassment | 620 | 1.1% |
| Grief | 460 | 0.8% |
| Remorse | 430 | 0.8% |
High label imbalance signals the necessity for careful treatment of low-frequency classes, such as via reweighting or augmentation, in supervised modeling applications (Demszky et al., 2020).
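One common reweighting heuristic for multi-label training is a per-label positive weight equal to the ratio of negative to positive examples, which scales up the loss contribution of rare labels. The sketch below applies it to a few counts from the table, taking 58,000 comments as the total; the choice of heuristic and the function name are assumptions, since the source does not prescribe a specific scheme.

```python
N_COMMENTS = 58_000  # corpus size stated in the text

# A few counts copied from the label distribution table.
COUNTS = {
    "neutral": 16_400,
    "annoyance": 7_590,
    "embarrassment": 620,
    "grief": 460,
}


def positive_weights(counts, n_comments):
    """Per-label weight = (number of negatives) / (number of positives)."""
    weights = {}
    for label, count in counts.items():
        if not 0 < count < n_comments:
            raise ValueError(f"count for {label!r} must be in (0, n_comments)")
        weights[label] = (n_comments - count) / count
    return weights
```

Under this scheme a rare label such as Grief receives a weight roughly fifty times that of Neutral, so in practice the weights are often clipped or smoothed to keep training stable.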
6. Transferability and Cross-Corpus Evaluation
Cross-domain validity is assessed by fine-tuning a BERT-base classifier on GoEmotions and applying it, without additional tuning, to multiple established emotion benchmarks, including SemEval-2018 Task 1 (“Affect in Tweets”), the Emotion Stimulus dataset (Ghazi et al.), and the EmotionLines dialogue corpus (EmotionX).
- On SemEval-2018 Task 1, the GoEmotions-trained model yields an average F1 ≃ 0.68, exceeding a BERT baseline trained only on SemEval data (~0.63).
- On the Emotion Stimulus dataset, zero-shot F1 ≃ 0.52 compared to ~0.45 for an off-the-shelf BERT model.
- On EmotionLines, zero-shot classification accuracy improves by 3–5 points.
These results demonstrate that models trained on the GoEmotions taxonomy learn robust emotion representations that generalize across domains and taxonomic granularities, substantiating the resource’s utility in broad emotion understanding research (Demszky et al., 2020).
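The text reports an "average F1" without specifying the averaging. Assuming a macro average over labels, which is a common choice for imbalanced multi-label data, the metric can be sketched as follows; the function name and the set-based input format are illustrative assumptions.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 over multi-label predictions.

    `gold` and `pred` are equal-length sequences of label sets, one set
    per example. A label with no gold and no predicted instances scores
    0.0 here, which is one of several possible conventions.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every label contributes equally, a model that ignores rare categories is penalized more heavily than it would be under a micro average.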
7. Implications and Application Considerations
The GoEmotions taxonomy provides a foundation for nuanced emotion classification research, supporting multi-label, fine-grained affective modeling. The empirical clustering supports the validity of the category structure, while the moderate annotation agreement underscores the inherent difficulty of subjective, multi-way emotion labeling. The label imbalance calls for specialized modeling techniques for underrepresented categories. Transfer learning results indicate the taxonomy’s applicability beyond its source domain, facilitating the development and benchmarking of generalizable affective computing systems.