- The paper demonstrates a novel method using BERTopic and SC-WEAT to quantitatively assess gender bias in over 500K song lyrics.
- It finds that words related to intelligence and strength are predominantly associated with male-coded terms, while weakness and appearance align with female stereotypes.
- Analysis across genres reveals significant variations, particularly explicit misogyny in modern rap contrasted with mixed trends in rock and R&B.
Computational Analysis of Gender Bias in Song Lyrics: Insights from Topic Modeling and Word Embedding Association Tests
This paper systematically investigates gender bias in a large-scale corpus of English song lyrics by coupling unsupervised topic modeling with automated bias quantification techniques. Using a combined dataset of over half a million lyrics spanning multiple genres and decades, the authors leverage BERTopic to extract thematic structure, and the Single Category Word Embedding Association Test (SC-WEAT) to map the association between words related to gender attributes and gendered stereotypes. The multi-perspective approach illuminates not only aggregate trends in gender bias across the corpus, but also genre- and topic-specific manifestations with quantitative rigor.
Dataset Construction and Scope
The corpus combines metadata from the WASABI Song Corpus and English lyric content from Genius, resulting in 537,553 songs predominantly categorized into pop, rap, rock, country, and R&B. The temporal range covers the 1950s to 2022, enabling diachronic analysis. Stratified sampling is used for downstream topic modeling to control for genre prevalence, mitigating the risk of over-representing dominant genres such as pop.
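As a rough illustration of the stratified sampling step, the sketch below draws an equally capped random sample from each genre so that a dominant genre cannot swamp the topic model. The record layout, the per-genre cap, and the helper name `stratified_sample` are assumptions made for this example; the summary does not state the sample sizes the authors used.

```python
import random
from collections import defaultdict


def stratified_sample(songs, per_genre, seed=0):
    """Draw up to `per_genre` songs from each genre (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_genre = defaultdict(list)
    for song in songs:
        by_genre[song["genre"]].append(song)

    sample = []
    for genre in sorted(by_genre):
        pool = by_genre[genre]
        # Small genres contribute everything they have.
        sample.extend(rng.sample(pool, min(per_genre, len(pool))))
    return sample


# Toy corpus in which pop dominates (invented data, not from the paper).
toy_corpus = (
    [{"id": i, "genre": "pop"} for i in range(80)]
    + [{"id": 100 + i, "genre": "rap"} for i in range(15)]
    + [{"id": 200 + i, "genre": "country"} for i in range(5)]
)
balanced = stratified_sample(toy_corpus, per_genre=10)
```

With this cap, pop and rap each contribute 10 songs and country contributes all 5, so pop falls from 80% of the toy corpus to 40% of the sample.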
Topic Discovery via BERTopic
The authors employ BERTopic, which integrates transformer-based embeddings (all-MiniLM-L6-v2) and class-based TF-IDF (c-TF-IDF) to derive interpretable topic clusters. The methodology includes:
- Embedding all samples and reducing their dimensionality for scalability.
- Clustering with a density-based method, which assigns each lyric to one of 541 emergent topics (1.5% of lyrics are left as outliers).
- Attributing topics to lyrics so that theme dynamics can be tracked across genres and decades.
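To make the c-TF-IDF step more concrete, here is a minimal pure-Python sketch of the weighting as it is commonly described for BERTopic: each topic is treated as one concatenated document, and a term's normalized frequency within the topic is multiplied by log(1 + A / f_t), where A is the average token count per topic and f_t is the term's frequency across all topics. The function names and the toy tokens are assumptions for illustration, and this is not the library's implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Score terms per topic with a simplified class-based TF-IDF.

    docs_by_topic maps a topic id to a list of tokenized documents.
    """
    # Treat every topic as a single concatenated "class document".
    counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    total = Counter()
    for c in counts.values():
        total.update(c)
    avg_tokens = sum(total.values()) / len(counts)  # A in the formula

    scores = {}
    for topic, c in counts.items():
        class_len = sum(c.values())
        scores[topic] = {
            term: (n / class_len) * math.log(1 + avg_tokens / total[term])
            for term, n in c.items()
        }
    return scores


def top_terms(term_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(term_scores, key=lambda t: (-term_scores[t], t))[:k]


# Invented toy topics; "love" occurs in both, so it is down-weighted.
toy_topics = {
    0: [["love", "heart", "love"], ["heart", "tears"]],
    1: [["money", "cash", "money"], ["cash", "love"]],
}
toy_scores = c_tf_idf(toy_topics)
```

Terms that are frequent inside one topic but rare elsewhere rise to the top, which is the kind of ranking from which topic labels built from top terms can be derived.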
Key findings from the topic modeling include:
- Pop demonstrates thematic diversity, while rap is highly concentrated: one topic ("nigga_niggas_bitch") encompasses 37.88% of rap songs.
- Rap topics that emerge from the 1990s onward, marked by a misogynistic and profane lexicon, contrast with the romantic or sentimental topics that were prominent in earlier decades.
- This thematic shift is supported by instance-level and aggregate c-TF-IDF analyses, and it is consistent with external content studies on sexualization and profanity in music lyrics.
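Prevalence figures such as the share of rap songs that fall into a single topic can be derived from per-song topic assignments with simple counting. The sketch below assumes each song is reduced to a (genre, topic) pair; the helper name and the placeholder topic labels are hypothetical, and the toy numbers are not the paper's.

```python
from collections import Counter, defaultdict


def topic_share_by_genre(assignments):
    """Return {genre: {topic: percent of that genre's songs}}."""
    counts = defaultdict(Counter)
    for genre, topic in assignments:
        counts[genre][topic] += 1

    shares = {}
    for genre, topic_counts in counts.items():
        n_songs = sum(topic_counts.values())
        shares[genre] = {
            topic: 100.0 * k / n_songs for topic, k in topic_counts.items()
        }
    return shares


# Invented assignments: rap is concentrated, pop is spread out.
toy_assignments = (
    [("rap", "topic_a")] * 3
    + [("rap", "topic_b")]
    + [("pop", "topic_b"), ("pop", "topic_c"), ("pop", "topic_d"), ("pop", "topic_e")]
)
toy_shares = topic_share_by_genre(toy_assignments)
```

Replacing the genre with a (genre, decade) key would give the same breakdown over time under the same assumptions.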
Quantitative Gender Bias Measurement with SC-WEAT
To quantify gender bias in lyrics, the authors train Word2Vec embeddings per genre and per topic and apply the SC-WEAT to them, measuring the association between six semantically motivated target sets (e.g., Intelligence, Strength, Weakness, Appearance) and two gendered attribute sets (Male, Female). Static embeddings are chosen over contemporary contextual models because they are well suited to capturing corpus-level, context-invariant biases.
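The SC-WEAT effect size for a single word is usually defined as the difference between its mean cosine similarity to the two attribute sets, divided by the standard deviation of its similarities to all attribute words. The sketch below implements that definition with toy two-dimensional vectors. The use of the sample standard deviation, the averaging of word-level scores over a target set, and all names and vectors are assumptions for illustration, not details confirmed by the paper.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sc_weat_effect_size(word_vec, attrs_a, attrs_b):
    """Positive values mean the word sits closer to attrs_a than attrs_b."""
    sims_a = [cosine(word_vec, a) for a in attrs_a]
    sims_b = [cosine(word_vec, b) for b in attrs_b]
    # Sample standard deviation over all attribute similarities (assumption).
    return (mean(sims_a) - mean(sims_b)) / stdev(sims_a + sims_b)


def target_set_effect(target_vecs, attrs_a, attrs_b):
    """Average word-level effect sizes over a target set (assumption)."""
    return mean(sc_weat_effect_size(w, attrs_a, attrs_b) for w in target_vecs)


# Invented 2-D "embeddings": axis 0 leans male, axis 1 leans female.
male_attrs = [[1.0, 0.0], [0.9, 0.1]]
female_attrs = [[0.0, 1.0], [0.1, 0.9]]
male_leaning_word = [0.8, 0.2]
female_leaning_word = [0.2, 0.8]

es_male = sc_weat_effect_size(male_leaning_word, male_attrs, female_attrs)
es_female = sc_weat_effect_size(female_leaning_word, male_attrs, female_attrs)
```

With the male attribute set passed first, a positive effect size indicates a male association and a negative one a female association, which matches the sign convention used in the findings that follow.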
Key computational findings:
- Intelligence and Strength words are consistently more aligned with male attribute vectors, especially in rap and country, as indicated by positive SC-WEAT effect sizes.
- Weakness and Appearance words demonstrate significant female bias across four of five genres, further reinforcing traditional gender role stereotypes.
- Genre- and topic-specific analyses reveal patterns that the aggregate results hide: while Appearance words are female-biased overall, some topics in R&B exhibit a relative male bias in this set, and Intelligence words in certain rock topics are female-biased, contrary to the aggregate trend.
- Topic-specific SC-WEAT results reveal that even the same semantic domain ("tears_heart_wish") can display different gender associations depending on genre, highlighting the interaction between thematic content and genre convention.
Implications and Limitations
The paper presents quantitative evidence of entrenched gender biases and evolving thematic content in English-language music lyrics. Profanity and explicit misogyny are particularly salient in modern rap, while traditional stereotypes tying strength and intelligence to men and weakness and appearance to women persist across genres. These findings corroborate and extend prior manual and computational studies.
On a methodological front, the direct coupling of interpretable topic projections with fine-grained bias assessment provides a reproducible template for similar corpus-level bias studies. However, several limitations warrant scrutiny:
- The study relies on a binary gender schema, thus omitting non-binary identities and spectrum effects.
- Static word embeddings do not account for polysemy or context-specific valence, potentially flattening nuanced usage.
- Assignment of a single topic per lyric by BERTopic may neglect the common occurrence of mixed topicalities in music.
- The corpus covers only English-language lyrics, which limits generalizability.
Directions for Future Research
Substantial opportunities exist for methodological elaboration and expansive sociotechnical investigation:
- Multilingual extension to compare bias propagation cross-culturally and internationally.
- Use of contextualized embedding models to disambiguate word sense and further refine association tests.
- Modeling gender as a non-binary construct to capture spectrum and intersectional bias.
- Joint topic modeling allowing for multi-topic attribution per song, thus reflecting hybrid thematics.
- Extending the approach to audio features and music videos for multimodal bias studies.
Conclusion
The integration of topic modeling and embedding-association testing demonstrates that gender bias in song lyrics is both persistent and strongly modulated by genre and thematic context. Numerical results such as SC-WEAT effect sizes and topic prevalence proportions support the main conclusions, especially the association of intelligence and strength with male-coded terms and of appearance and weakness with female-coded terms. The topic- and genre-conditional variations further underline the value of high-resolution, computationally driven analysis for understanding how cultural bias is transmitted through music. The framework is useful for cultural analytics, digital humanities, and the critical examination of media-generated stereotypes, and it can inform future interventions in content moderation, cultural policy, and AI fairness in creative domains.