GoEmotions Taxonomy Overview

Updated 22 January 2026

GoEmotions Taxonomy is a fine-grained emotion classification system with 27 emotion categories plus Neutral, derived from 58,000 Reddit comments.
It employs a multi-label annotation protocol where up to three emotion labels per comment capture the complex, multiplex nature of online affect.
Analysis using PPCA and hierarchical clustering supports its structure, and transfer experiments indicate cross-domain applicability in emotion recognition.

The GoEmotions taxonomy is a fine-grained, empirically derived categorization of 27 emotion categories plus a Neutral class, established through large-scale manual annotation of 58,000 English Reddit comments. Designed to enable nuanced emotion recognition in natural language processing, this taxonomy underpins the GoEmotions dataset, which is the largest manually annotated corpus of its kind for English. Each comment in the dataset is annotated with up to three emotion labels (or Neutral if no clear emotion is present), capturing the complex and often multidimensional nature of affective expression in online discourse (Demszky et al., 2020).

1. Taxonomy Structure and Category Definitions

The GoEmotions taxonomy comprises 27 named emotion categories plus Neutral, each accompanied by concise operational definitions and exemplars, to facilitate clarity and consistency in multi-label annotation. Annotators assign up to three labels per instance, reflecting the recognition that emotional content can be multiplex in authentic discourse. The categories are as follows:

Category	Definition	Example
Admiration	Esteem or respect for someone or something	“Wow, I really admire your dedication to this project!”
Amusement	Finding something funny or entertaining	“Haha, that cat video cracked me up.”
Anger	Strong displeasure or hostility	“This policy makes me so furious!”
Annoyance	Mild irritation or bother	“Ugh, this slow loading is really annoying.”
Approval	Agreement or endorsement of something	“I totally approve of their new approach.”
Caring	Concern for another’s well-being	“I hope you’re doing okay after that news.”
Confusion	Uncertainty or lack of understanding	“I’m confused—what does this button do?”
Curiosity	Desire to learn or know more	“I wonder how they built that.”
Desire	Wanting or wishing for something	“I really want a puppy now.”
Disappointment	Sadness or displeasure at an unmet expectation	“That finale was such a let-down.”
Disapproval	Negative judgment or rejection	“I disapprove of that kind of behavior.”
Disgust	Revulsion or strong disliking	“That smell is absolutely disgusting.”
Embarrassment	Feeling awkward or self-conscious	“I can’t believe I said that—so embarrassing!”
Excitement	High arousal positive anticipation	“I’m so excited for the concert tonight!”
Fear	Perceived threat or worry	“I’m scared of what might happen next.”
Gratitude	Thankfulness or appreciation	“Thanks so much for your help!”
Grief	Deep sorrow, especially at loss	“I miss them so much; this grief won’t go away.”
Joy	Pleasure or great happiness	“I’m overjoyed to share the good news!”
Love	Deep affection or attachment	“I love you more every day.”
Nervousness	Anxiety or unease about an outcome	“I’m so nervous about the interview.”
Optimism	Hopeful outlook toward the future	“I’m hopeful we’ll get good results.”
Pride	Satisfaction in one’s or another’s achievement	“I’m so proud of how far you’ve come.”
Realization	Sudden understanding or insight	“Oh! Now I see what you meant.”
Relief	Alleviation of anxiety or distress	“What a relief that exam is over.”
Remorse	Deep regret or guilt for wrongdoing	“I truly regret what I said.”
Sadness	Unhappiness or sorrow	“I feel so sad today.”
Surprise	Startlement or astonishment	“Wow, I didn’t see that coming!”
Neutral	No strong emotion conveyed	“The meeting starts at 10 AM.”

This typology enables granular emotion detection, advancing beyond the coarse-grained or binary label schemes prevalent in prior affective computing work (Demszky et al., 2020).
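The label set above can be represented as a 28-dimensional multi-hot vector per comment. The following is a minimal sketch, not the dataset's official loader: the names `LABELS`, `INDEX`, and `encode` are illustrative, and treating Neutral as exclusive of the emotion labels is an assumption based on the annotation instructions described in this article.

```python
# The 27 emotion categories from the table above, followed by Neutral.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise",
]
LABELS = EMOTIONS + ["neutral"]
INDEX = {name: i for i, name in enumerate(LABELS)}


def encode(labels):
    """Return a 28-dimensional multi-hot vector for one comment's labels."""
    names = {name.lower() for name in labels}
    unknown = names - set(INDEX)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not 1 <= len(names) <= 3:
        raise ValueError("expected between one and three labels")
    # Assumption: Neutral is only used when no emotion label applies.
    if "neutral" in names and len(names) > 1:
        raise ValueError("Neutral cannot be combined with emotion labels")
    vector = [0] * len(LABELS)
    for name in names:
        vector[INDEX[name]] = 1
    return vector


vector = encode(["Joy", "Gratitude"])
```

Here `vector` has ones at the Joy and Gratitude positions and zeros elsewhere, which is the row format assumed for the label matrix in the next section.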

2. Latent Structure and Principal Component Analysis

To assess the internal coherence of the annotated emotion space, Principal Preserved Component Analysis (PPCA) is applied to the $n \times 28$ binary label matrix $X$, where each row represents a comment and each column an emotion category. With the columns of $X$ mean-centered, the sample covariance is computed as $\mathrm{Cov}(X) = (1/n) X^\top X$; solving $\mathrm{Cov}(X)\, v = \lambda v$ yields principal directions $v_1, v_2, \dots$ that capture the axes of maximal preserved variance.
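The eigenproblem described above can be illustrated on a toy matrix. This is only a sketch of the covariance eigen-decomposition as stated in this section, not the authors' PPCA pipeline: the 6 × 3 label matrix is made up, the columns are mean-centered before the covariance is formed, and pure-Python power iteration stands in for a full eigensolver. The function names are illustrative.

```python
import math


def covariance(rows):
    """Covariance of mean-centered columns, normalised by n."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=200):
    """Approximate the leading eigenvalue and unit eigenvector by power iteration."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical toy labels; columns are (joy, excitement, sadness).
X = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]
eigenvalue, direction = top_eigenpair(covariance(X))
```

On this toy matrix the leading direction loads Joy and Excitement with one sign and Sadness with the opposite sign, which is the kind of axis the clustering step below groups on.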

Hierarchical clustering on the first three principal components reveals emergent groupings with psychological plausibility, including:

“Positive–High Arousal”: {Amusement, Excitement, Joy}
“Positive–Low Arousal”: {Admiration, Approval, Gratitude, Pride}
“Negative–Angry”: {Anger, Annoyance, Disapproval, Disgust}
“Negative–Sad”: {Sadness, Disappointment, Grief, Remorse}
“Fearful”: {Fear, Nervousness}
“Cognitive/Uncertain”: {Confusion, Curiosity, Realization}
“Affectionate”: {Love, Caring}
“Future-oriented Positive”: {Desire, Optimism, Relief}
“Self-conscious”: {Embarrassment}
“Surprise”: {Surprise}

These families validate the structured heterogeneity of the taxonomy and support the notion that annotator judgments preserve salient axes in emotion conceptual space (Demszky et al., 2020).
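The families listed above can also serve as a coarse label space. A minimal sketch of that mapping follows; the family keys, the `coarsen` helper, and the choice to pass Neutral through as its own group are illustrative assumptions rather than part of the published taxonomy.

```python
# Families as listed in this section; keys are illustrative identifiers.
FAMILIES = {
    "positive_high_arousal": {"amusement", "excitement", "joy"},
    "positive_low_arousal": {"admiration", "approval", "gratitude", "pride"},
    "negative_angry": {"anger", "annoyance", "disapproval", "disgust"},
    "negative_sad": {"sadness", "disappointment", "grief", "remorse"},
    "fearful": {"fear", "nervousness"},
    "cognitive_uncertain": {"confusion", "curiosity", "realization"},
    "affectionate": {"love", "caring"},
    "future_oriented_positive": {"desire", "optimism", "relief"},
    "self_conscious": {"embarrassment"},
    "surprise": {"surprise"},
}

# Invert the mapping: fine-grained label -> family.
LABEL_TO_FAMILY = {
    label: family for family, members in FAMILIES.items() for label in members
}


def coarsen(labels):
    """Map fine-grained labels to the sorted list of families they fall in."""
    families = set()
    for label in labels:
        name = label.lower()
        # Assumption: Neutral belongs to no family and is kept as-is.
        families.add("neutral" if name == "neutral" else LABEL_TO_FAMILY[name])
    return sorted(families)


coarse = coarsen(["Joy", "Amusement", "Fear"])
```

Because the ten families are disjoint and together cover all 27 emotion categories, the inverted dictionary has exactly one entry per emotion.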

3. Annotation Methodology

Annotation involves randomly sampling publicly available Reddit comments, excluding topics related to pornography, politics, and personal data. Each comment is presented to three annotators via Google’s internal crowdsourcing interface, which displays all 27 emotion categories with definitions and examples, as well as the Neutral option.

Key procedural details:

Each annotator may select up to three emotion labels per comment; if none apply, “Neutral” is chosen.
Aggregation occurs via simple majority: a label is assigned if at least two of three annotators select it.
Comments with no majority label are discarded; this is rare.

This protocol enables both multi-label and Neutral labeling, supporting nuanced representation of affect in language (Demszky et al., 2020).
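The majority-vote aggregation described above can be sketched as follows. The function name `aggregate` and the `min_votes` parameter are illustrative, and returning `None` for a discarded comment is an assumption about how one might signal that case.

```python
from collections import Counter


def aggregate(annotations, min_votes=2):
    """Return labels chosen by at least `min_votes` annotators.

    `annotations` holds one list of labels per annotator. Returns None
    when no label reaches the threshold, meaning the comment is discarded.
    """
    # set() ensures an annotator counts at most once per label.
    votes = Counter(label for labels in annotations for label in set(labels))
    agreed = sorted(label for label, count in votes.items() if count >= min_votes)
    return agreed or None


kept = aggregate([["joy", "love"], ["joy"], ["neutral"]])
discarded = aggregate([["joy"], ["anger"], ["neutral"]])
```

In the first call only Joy is selected by two annotators, so it is the single aggregated label; in the second call no label has a majority and the comment would be dropped.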

4. Inter-Annotator Agreement Metrics

Consistency of annotation is quantified using both pairwise Cohen’s κ and Krippendorff’s α, metrics commonly deployed for multicategorical nominal data. The average pairwise Cohen’s κ across all annotator pairs is approximately 0.45. Krippendorff’s α is approximately 0.30. While these values are moderate relative to single-label tasks, they are typical for the high label cardinality and subjective nature inherent in multi-way emotion annotation (Demszky et al., 2020).

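Cohen’s κ: $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement rate and $p_e$ the agreement expected by chance
Krippendorff’s α: $\alpha = 1 - D_o/D_e$, where $D_o$ is the observed disagreement and $D_e$ the disagreement expected by chance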

The inherent subjectivity in emotion perception and the multi-label protocol explain the moderate agreement.
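As a worked illustration of the κ formula, the sketch below computes Cohen’s κ for two annotators' decisions on a single label (1 = selected, 0 = not selected). The ratings are made up, and returning 1.0 when chance agreement is already perfect is a convention chosen here, not something the source specifies.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of nominal ratings."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items rated identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal category rates.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # assumed convention: both raters used one identical category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-comment decisions for one emotion label.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(rater_a, rater_b)
```

For these ratings $p_o = 0.8$ and $p_e = 0.4 \cdot 0.4 + 0.6 \cdot 0.6 = 0.52$, so $\kappa = 0.28/0.48 \approx 0.58$.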

5. Label Distribution and Class Imbalance

Label prevalence exhibits substantial imbalance, which has implications for downstream model learning dynamics. The most frequent labels are Neutral (28.3%), Annoyance (13.1%), Joy (8.9%), and Anger (8.5%). Rare categories (under 2% each) include Disappointment, Realization, Embarrassment, Grief, and Remorse.

Category	Count	Percent
Neutral	16,400	28.3%
Annoyance	7,590	13.1%
Joy	5,170	8.9%
Anger	4,920	8.5%
Love	4,650	8.0%
Caring	3,820	6.6%
Admiration	3,640	6.3%
Optimism	3,620	6.3%
Approval	3,520	6.1%
Sadness	3,460	6.0%
Amusement	3,330	5.8%
Excitement	2,930	5.1%
Gratitude	2,840	4.9%
Pride	2,610	4.5%
Disapproval	2,210	3.8%
Desire	2,170	3.7%
Surprise	1,780	3.1%
Confusion	1,740	3.0%
Fear	1,590	2.8%
Disgust	1,490	2.6%
Relief	1,180	2.0%
Curiosity	1,150	2.0%
Nervousness	1,140	2.0%
Disappointment	1,030	1.8%
Realization	640	1.1%
Embarrassment	620	1.1%
Grief	460	0.8%
Remorse	430	0.8%

High label imbalance signals the necessity for careful treatment of low-frequency classes, such as via reweighting or augmentation, in supervised modeling applications (Demszky et al., 2020).
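One common reweighting heuristic, offered here as an assumption rather than anything the source prescribes, is to weight the positive class of each label by its ratio of negative to positive examples, as is often done with per-label binary cross-entropy losses. The sketch below applies it to the counts in the table above, taking the 58,000 comments as the total; the names `COUNTS` and `positive_weights` are illustrative.

```python
TOTAL_COMMENTS = 58_000  # corpus size stated in this article

# Label counts copied from the table above.
COUNTS = {
    "neutral": 16400, "annoyance": 7590, "joy": 5170, "anger": 4920,
    "love": 4650, "caring": 3820, "admiration": 3640, "optimism": 3620,
    "approval": 3520, "sadness": 3460, "amusement": 3330, "excitement": 2930,
    "gratitude": 2840, "pride": 2610, "disapproval": 2210, "desire": 2170,
    "surprise": 1780, "confusion": 1740, "fear": 1590, "disgust": 1490,
    "relief": 1180, "curiosity": 1150, "nervousness": 1140,
    "disappointment": 1030, "realization": 640, "embarrassment": 620,
    "grief": 460, "remorse": 430,
}


def positive_weights(counts, total):
    """Per-label weight = (number of negatives) / (number of positives)."""
    return {label: (total - count) / count for label, count in counts.items()}


weights = positive_weights(COUNTS, TOTAL_COMMENTS)
rarest_first = sorted(weights, key=weights.get, reverse=True)
```

With these counts the rarest labels (Remorse and Grief) receive weights above 100, while Neutral receives a weight of roughly 2.5, so in practice such weights are often clipped or smoothed.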

6. Transferability and Cross-Corpus Evaluation

Cross-domain validity is assessed by fine-tuning a BERT-base classifier on GoEmotions and applying it, without additional tuning, to multiple established emotion benchmarks, including SemEval-2018 Task 1 (“Affect in Tweets”), the Emotion Stimulus dataset (Ghazi et al.), and the EmotionLines dialogue corpus (EmotionX).

On SemEval-2018 Task 1, the GoEmotions-trained model yields an average F1 ≃ 0.68, exceeding a BERT baseline trained only on SemEval data (~0.63).
On the Emotion Stimulus dataset, zero-shot F1 ≃ 0.52 compared to ~0.45 for an off-the-shelf BERT model.
On EmotionLines, zero-shot classification accuracy improves by 3–5 points.

These results demonstrate that models trained on the GoEmotions taxonomy learn robust emotion representations that generalize across domains and taxonomic granularities, substantiating the resource’s utility in broad emotion understanding research (Demszky et al., 2020).
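The averaged F1 scores quoted above are per-label scores combined across labels. As a hedged illustration of one such average (macro F1 over multi-hot predictions), the sketch below uses made-up predictions; whether the cited figures are macro- or micro-averaged is not stated here, and scoring a label with no true or predicted positives as 0 is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over multi-hot label rows."""
    num_labels = len(y_true[0])
    scores = []
    for j in range(num_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # Assumption: a label never present and never predicted scores 0.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / num_labels


# Hypothetical gold labels and predictions for four comments, two labels.
gold = [[1, 0], [1, 1], [0, 1], [0, 0]]
pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
score = macro_f1(gold, pred)
```

Here the first label has one true positive, one false positive, and one false negative (F1 = 0.5), the second label is predicted perfectly (F1 = 1.0), and the macro average is 0.75.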

7. Implications and Application Considerations

The GoEmotions taxonomy provides a foundation for nuanced emotion classification research, supporting multi-label, fine-grained affective modeling. The empirical clustering supports the validity of the category structure, while the moderate annotation agreement underscores the inherent difficulty of subjective, multi-way emotion labeling. The class imbalance calls for specialized modeling techniques for underrepresented categories. Transfer learning results indicate the taxonomy’s applicability beyond its source domain, facilitating development and benchmarking of generalizable affective computing systems.

References

Demszky et al. (2020). GoEmotions: A Dataset of Fine-Grained Emotions.
