Partition Cardinality Matrix in Emotion Analysis
- The Partition Cardinality Matrix is a representation that captures the distribution and overlap of labels in multi-label datasets, as illustrated by the GoEmotions taxonomy.
- It is derived from a binary label matrix and uses covariance analysis with principal component methods to uncover coherent affective clusters.
- The structure aids in addressing label imbalance and guides model adaptations, quality control practices, and cross-domain transfer for improved emotion detection.
The GoEmotions taxonomy defines a fine-grained scheme for categorizing expressed emotion in English-language online text, developed alongside the dataset of the same name. The taxonomy encompasses 27 distinct emotion categories plus a Neutral label and is grounded in a large-scale manual annotation of 58,000 Reddit comments. Each utterance may be labeled with up to three emotion categories (or Neutral, if no emotional content is present), which allows multiple co-occurring emotional states to be represented. The taxonomy and its underlying dataset serve as an empirical foundation for emotion analysis in NLP and have transferred well to benchmarks beyond their source corpus (Demszky et al., 2020).
1. Taxonomy Definition and Scope
The GoEmotions taxonomy comprises 27 named emotion categories plus Neutral, each defined by concise descriptions and supported with illustrative examples. Annotators assign up to three categories per text segment or assign Neutral in the absence of emotional content. The full inventory is as follows:
| Category | Description |
|---|---|
| Admiration | Esteem or respect |
| Amusement | Finding something funny or entertaining |
| Anger | Strong displeasure or hostility |
| Annoyance | Mild irritation or bother |
| Approval | Agreement or endorsement |
| Caring | Concern for another’s well-being |
| Confusion | Uncertainty or lack of understanding |
| Curiosity | Desire to learn or know more |
| Desire | Wanting or wishing for something |
| Disappointment | Sadness/displeasure at an unmet expectation |
| Disapproval | Negative judgment or rejection |
| Disgust | Revulsion or strong disliking |
| Embarrassment | Feeling awkward or self-conscious |
| Excitement | High arousal positive anticipation |
| Fear | Perceived threat or worry |
| Gratitude | Thankfulness or appreciation |
| Grief | Deep sorrow, especially at loss |
| Joy | Pleasure or great happiness |
| Love | Deep affection or attachment |
| Nervousness | Anxiety or unease about an outcome |
| Optimism | Hopeful outlook toward the future |
| Pride | Satisfaction in achievement |
| Realization | Sudden understanding or insight |
| Relief | Alleviation of anxiety or distress |
| Remorse | Deep regret or guilt for wrongdoing |
| Sadness | Unhappiness or sorrow |
| Surprise | Startlement or astonishment |
| Neutral | No strong emotion conveyed |
This scope enables annotation of both basic and complex affective states observed in user-generated online discourse. Definitions for each category and usage instructions were provided to annotators to minimize ambiguity (Demszky et al., 2020).
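The labeling scheme above maps naturally onto a 28-dimensional binary vector per annotation. The following is a minimal sketch of that encoding, not the authors' code: the helper name `encode` is hypothetical, the label order simply follows the table, and treating Neutral as exclusive of the emotion labels is an assumption read from the "or Neutral" wording.

```python
# Label inventory in the order of the table above (27 emotions plus Neutral).
LABELS = [
    "Admiration", "Amusement", "Anger", "Annoyance", "Approval", "Caring",
    "Confusion", "Curiosity", "Desire", "Disappointment", "Disapproval",
    "Disgust", "Embarrassment", "Excitement", "Fear", "Gratitude", "Grief",
    "Joy", "Love", "Nervousness", "Optimism", "Pride", "Realization",
    "Relief", "Remorse", "Sadness", "Surprise", "Neutral",
]
INDEX = {name: i for i, name in enumerate(LABELS)}
MAX_EMOTIONS = 3  # per-annotation cap described in the text


def encode(selected):
    """Return a 0/1 vector of length 28 for one annotation (hypothetical helper)."""
    chosen = set(selected)
    unknown = chosen.difference(INDEX)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not chosen:
        raise ValueError("an annotation needs at least one label")
    # Assumption: Neutral is used alone, never together with an emotion label.
    if "Neutral" in chosen and len(chosen) > 1:
        raise ValueError("Neutral cannot be combined with emotion labels")
    if len(chosen) > MAX_EMOTIONS:
        raise ValueError(f"at most {MAX_EMOTIONS} emotion labels are allowed")
    vector = [0] * len(LABELS)
    for name in chosen:
        vector[INDEX[name]] = 1
    return vector


example = encode(["Joy", "Love"])  # two co-occurring emotions in one vector
```

Stacking one such vector per comment gives the binary label matrix used in the structural analysis.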
2. Empirical Structure and Category Groupings
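To examine and validate the latent structure of the proposed taxonomy, the authors applied Principal Preserved Component Analysis (PPCA) to the co-labeling covariance matrix derived from the binary label matrix $Y \in \{0,1\}^{N \times 28}$ (for $N$ comments and 28 labels). The covariance is computed as

$$C = \frac{1}{N}\,(Y - \bar{Y})^{\top}(Y - \bar{Y}),$$

where every row of $\bar{Y}$ holds the column means of $Y$, and principal directions are found by solving

$$C\,v_k = \lambda_k\,v_k.$$
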
Hierarchical clustering on the first three principal components revealed coherent clusters corresponding to broad affective families, such as:
- Positive–High Arousal: {Amusement, Excitement, Joy}
- Positive–Low Arousal: {Admiration, Approval, Gratitude, Pride}
- Negative–Angry: {Anger, Annoyance, Disapproval, Disgust}
- Negative–Sad: {Sadness, Disappointment, Grief, Remorse}
- Fearful: {Fear, Nervousness}
- Cognitive/Uncertain: {Confusion, Curiosity, Realization}
- Affectionate: {Love, Caring}
- Future-oriented Positive: {Desire, Optimism, Relief}
- Self-conscious: {Embarrassment}
- Surprise: {Surprise}
These observed relationships support the internal consistency of the taxonomy and establish an emotion space compatible with hierarchical or multi-label modeling approaches.
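To make the covariance step concrete, the sketch below computes a label covariance matrix from a tiny binary label matrix and approximates its leading principal direction. It is an illustration under stated assumptions rather than the authors' procedure: it uses a plain eigen-decomposition of the covariance (not necessarily the full PPCA method), power iteration is swapped in to keep the example dependency-free, the $1/N$ normalization is assumed, and the six-row toy matrix over three labels is invented.

```python
def covariance(Y):
    """Label-by-label covariance of a binary matrix Y (rows are comments)."""
    n, k = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n for j in range(k)]
    C = [[0.0] * k for _ in range(k)]
    for row in Y:
        centered = [row[j] - means[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                C[a][b] += centered[a] * centered[b] / n  # assumed 1/N scaling
    return C


def leading_direction(C, iters=500):
    """Approximate the top eigenpair of a symmetric PSD matrix by power iteration."""
    k = len(C)
    v = [1.0 + 0.1 * i for i in range(k)]  # arbitrary, non-degenerate start
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue estimate.
    lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(k)) for a in range(k))
    return lam, v


# Invented toy data; columns are Joy, Amusement, Anger.
Y_toy = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]
C_toy = covariance(Y_toy)
lam_toy, v_toy = leading_direction(C_toy)
```

In this toy case the Joy and Amusement entries of `v_toy` share a sign and the Anger entry has the opposite sign, which is the kind of separation the clustering step builds on.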
3. Annotation Protocols and Quality Control
Text samples were randomly selected from public Reddit comments, excluding content associated with pornography, politics, or personally identifiable information. Annotation was conducted using Google’s internal crowdsourcing interface, which displayed all 27 emotion categories with their definitions and example sentences, as well as the Neutral label.
Each comment received three independent annotations. Annotators could select up to three emotion categories, or Neutral if no emotion matched. Majority vote aggregation assigned a label to a comment if selected by at least two out of three annotators. Comments with no majority label—an infrequent occurrence—were excluded from the dataset (Demszky et al., 2020).
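The aggregation rule described above can be sketched in a few lines. This is an illustrative reading of the two-of-three rule, not the authors' implementation; the function name `aggregate` and the example annotations are hypothetical.

```python
from collections import Counter


def aggregate(annotations, min_votes=2):
    """Keep labels chosen by at least `min_votes` annotators.

    `annotations` holds one collection of labels per annotator.
    Returns a sorted list of kept labels, or None when no label
    reaches the threshold (such comments are dropped).
    """
    # set(ann) ensures an annotator counts at most once per label.
    votes = Counter(label for ann in annotations for label in set(ann))
    kept = sorted(label for label, n in votes.items() if n >= min_votes)
    return kept or None


agreed = aggregate([{"Joy", "Love"}, {"Joy"}, {"Amusement"}])
no_majority = aggregate([{"Joy"}, {"Anger"}, {"Neutral"}])
```

Here `agreed` keeps only Joy, and `no_majority` is `None`, marking a comment that would be excluded.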
4. Agreement Metrics and Subjectivity
Label consistency was quantified using two canonical measures for multi-rater, nominal classification:
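- Pairwise Cohen’s $\kappa$, averaged over annotator pairs.
- Krippendorff’s $\alpha$.
The equations are:

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ is observed agreement and $p_e$ is chance agreement, and

$$\alpha = 1 - \frac{D_o}{D_e},$$

where $D_o$ is observed disagreement and $D_e$ is expected disagreement.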
These moderate values are consistent with those reported for other tasks involving many categories and subjective affective judgments. The observed agreement reflects both the complexity of emotion perception in language and the multi-label protocol.
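As a worked illustration of Cohen's kappa, the sketch below computes it for two raters judging whether a single label applies to each of ten comments. The ratings are invented, and reducing the multi-label task to one present/absent decision is a simplifying assumption; it does not reproduce the reported agreement figures.

```python
def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters over the same items (nominal categories)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(r1)
    categories = set(r1) | set(r2)
    # Observed agreement: share of items on which the raters match.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of the raters' marginal rates, per category.
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented ratings: 1 means the label was selected, 0 means it was not.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(rater_a, rater_b)
```

The raters match on 8 of 10 items (observed agreement 0.80), chance agreement is 0.4 × 0.4 + 0.6 × 0.6 = 0.52, and kappa is therefore 0.28 / 0.48 ≈ 0.58.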
5. Label Frequency, Imbalance, and Modeling Implications
The final annotated collection of 58,000 comments exhibits substantial label imbalance, detailed below:
| Category | Count | Percent |
|---|---|---|
| Neutral | 16,400 | 28.3% |
| Admiration | 3,640 | 6.3% |
| Amusement | 3,330 | 5.8% |
| Anger | 4,920 | 8.5% |
| Annoyance | 7,590 | 13.1% |
| Approval | 3,520 | 6.1% |
| Caring | 3,820 | 6.6% |
| Confusion | 1,740 | 3.0% |
| Curiosity | 1,150 | 2.0% |
| Desire | 2,170 | 3.7% |
| Disappointment | 1,030 | 1.8% |
| Disapproval | 2,210 | 3.8% |
| Disgust | 1,490 | 2.6% |
| Embarrassment | 620 | 1.1% |
| Excitement | 2,930 | 5.1% |
| Fear | 1,590 | 2.8% |
| Gratitude | 2,840 | 4.9% |
| Grief | 460 | 0.8% |
| Joy | 5,170 | 8.9% |
| Love | 4,650 | 8.0% |
| Nervousness | 1,140 | 2.0% |
| Optimism | 3,620 | 6.3% |
| Pride | 2,610 | 4.5% |
| Realization | 640 | 1.1% |
| Relief | 1,180 | 2.0% |
| Remorse | 430 | 0.8% |
| Sadness | 3,460 | 6.0% |
| Surprise | 1,780 | 3.1% |
High-frequency categories include Neutral, Annoyance, Joy, and Anger. Mid-frequency categories encompass Admiration, Caring, Approval, and Optimism. Rare categories (below 2% in the table above) are Grief, Remorse, Embarrassment, Realization, and Disappointment, with Grief and Remorse each under 1%. This distribution suggests that models trained on GoEmotions must address class imbalance, especially for low-resource labels, for example via class reweighting, data augmentation, or other tailored approaches.
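One simple form of class reweighting derives a per-label positive weight from the counts in the table. The sketch below uses the negative-to-positive ratio, a common heuristic for multi-label binary losses; the source only names reweighting in general, so the specific formula and the helper name `positive_weights` are assumptions. The counts are copied from the table for a subset of labels.

```python
TOTAL_COMMENTS = 58_000

# Counts copied from the frequency table (subset shown for brevity).
COUNTS = {
    "Neutral": 16_400,
    "Annoyance": 7_590,
    "Joy": 5_170,
    "Embarrassment": 620,
    "Grief": 460,
    "Remorse": 430,
}


def positive_weights(counts, total):
    """Weight each label by (negatives / positives); rarer labels weigh more."""
    return {label: (total - n) / n for label, n in counts.items()}


weights = positive_weights(COUNTS, TOTAL_COMMENTS)
rarest = max(weights, key=weights.get)
```

With these counts, Neutral receives a weight of about 2.5 while Grief and Remorse exceed 120. In practice such raw ratios are often clipped or smoothed so that the rarest labels do not dominate training.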
6. Evaluation via Transfer Learning and Cross-Domain Validity
To assess the generalization potential of the taxonomy, a BERT-base classifier fine-tuned on GoEmotions was evaluated—without further adaptation—on multiple standard emotion analysis benchmarks, including SemEval-2018 Task 1: Affect in Tweets, the Emotion Stimulus dataset, and EmotionLines (EmotionX).
Results demonstrate that GoEmotions-trained models provide useful representations that transfer favorably across both coarser taxonomies and out-of-domain tasks. Notably:
- On SemEval-2018 Task 1 (four “basic” emotions): GoEmotions-trained model average F1 ≈ 0.68, outperforming a BERT baseline trained only on SemEval (~0.63).
- On the Emotion Stimulus dataset: zero-shot F1 ≈ 0.52, versus ~0.45 for off-the-shelf BERT.
- On EmotionLines: zero-shot accuracy increased by 3–5 points.
These outcomes indicate that the GoEmotions taxonomy is not only descriptively fine-grained but also functionally robust for emotion analysis tasks beyond its initial corpus (Demszky et al., 2020).
7. Significance and Prospective Applications
The GoEmotions taxonomy establishes a rigorous, fine-grained foundation for categorical emotion annotation, enabling more nuanced modeling of affect in textual data. It is particularly suitable for applications requiring multidimensional emotion detection, such as empathetic dialog systems, affective content moderation, and detailed social media analysis. The taxonomy’s success in cross-benchmark transfer also positions it as a resource for universal affective representation learning. A plausible implication is that continued research on handling annotation subjectivity and rare label modeling in such taxonomies may drive further advances in this domain.