Multilabel Movie Genre Classification

Updated 14 January 2026
  • Multilabel movie genre classification is a task that assigns one or more genres to films using multimodal data such as text, visuals, audio, and metadata.
  • Advanced techniques like transformer models, deep neural networks, and multimodal fusion strategies achieve high performance, with metrics reaching up to 90% Macro-F1 on benchmark datasets.
  • Applications span recommendation engines, digital archives, and content-based retrieval, while challenges include addressing class imbalance and incomplete metadata.

Multilabel movie genre classification refers to the problem of automatically assigning one or more genres to a movie instance, recognizing that most films exhibit multiple overlapping genre characteristics. This task is a principal challenge at the intersection of information retrieval, machine learning, and multimedia analysis, underpinning recommender systems, archival organization, and audience expectation modeling. In contrast to single-label genre prediction, multilabel classification allows each movie to be tagged with any subset of a predefined genre set, reflecting the complex and combinatorial nature of contemporary film categorization.

1. Problem Formulation and Datasets

Formally, multilabel genre classification seeks a function $f : X \rightarrow 2^{\mathcal{G}}$, where $X$ is the feature space (metadata, video, audio, text, or multimodal representations), and $\mathcal{G} = \{g_1, \ldots, g_K\}$ is the set of $K$ possible genres. Each movie instance is annotated with a binary vector $y \in \{0,1\}^K$. Primary benchmark datasets include:

Label cardinality (average number of labels per movie) varies, typically between 1.8 and 3.7, with significant long-tailed imbalance.
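As a minimal sketch of the formulation above, the following Python snippet encodes genre sets as binary vectors $y \in \{0,1\}^K$ and computes label cardinality. The five-genre vocabulary and the three toy movies are invented for illustration and do not come from any of the benchmark datasets.

```python
from typing import Dict, List, Sequence

# Hypothetical genre vocabulary G = {g_1, ..., g_K}; real datasets define their own.
GENRES: List[str] = ["Action", "Comedy", "Drama", "Horror", "Romance"]


def encode_genres(labels: Sequence[str], vocabulary: Sequence[str] = GENRES) -> List[int]:
    """Map a collection of genre names to a binary vector y in {0,1}^K."""
    index: Dict[str, int] = {genre: i for i, genre in enumerate(vocabulary)}
    y = [0] * len(vocabulary)
    for label in labels:
        y[index[label]] = 1  # raises KeyError for genres outside the vocabulary
    return y


def label_cardinality(Y: Sequence[Sequence[int]]) -> float:
    """Average number of active labels per instance."""
    return sum(sum(row) for row in Y) / len(Y)


# Toy annotations, invented for illustration only.
movies = [["Action", "Comedy"], ["Drama"], ["Drama", "Romance", "Comedy"]]
Y = [encode_genres(m) for m in movies]
cardinality = label_cardinality(Y)  # (2 + 1 + 3) / 3 = 2.0
```

A cardinality of 2.0 on this toy set sits inside the 1.8 to 3.7 range quoted above, although the long-tailed imbalance of real datasets is not reproduced here.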

2. Feature Modalities and Representation

Genres can be inferred from a broad range of modalities, either unimodal or in multimodal fusion:

Some systems employ explicit feature engineering (e.g., handcrafted LBP descriptors on posters and spectrograms (Mangolin et al., 2020)) while others focus exclusively on end-to-end deep learning approaches.

3. Model Architectures and Fusion Strategies

A wide array of classifier architectures have been evaluated for multilabel movie genre classification, including:

Example: The "Movie-CLIP" model fuses sparse shot-sampled CLIP-visual features, PANNs-audio, and keyword-filtered CLIP-textual features via a learnable scalar-weighted sum, achieving a macro-mAP of 65.4% on MovieNet (Zhang et al., 2022).
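The idea of a scalar-weighted sum over modality features can be sketched in a few lines of Python. This is not the Movie-CLIP implementation: the function names are hypothetical, the softmax normalisation of the weights is an assumption of this sketch, and in a real model the raw weights would be trainable parameters updated by gradient descent rather than fixed numbers.

```python
import math
from typing import List, Sequence


def softmax(weights: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings: Sequence[Sequence[float]], raw_weights: Sequence[float]) -> List[float]:
    """Scalar-weighted sum of same-dimension modality embeddings.

    One scalar per modality; softmax normalisation is an assumption of this
    sketch, not something stated in the source.
    """
    if len(embeddings) != len(raw_weights):
        raise ValueError("need exactly one weight per modality")
    alphas = softmax(raw_weights)
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for alpha, emb in zip(alphas, embeddings):
        if len(emb) != dim:
            raise ValueError("all modality embeddings must share one dimension")
        for i, value in enumerate(emb):
            fused[i] += alpha * value
    return fused


# Toy 2-dimensional "visual", "audio", and "text" features (invented values).
visual, audio, text = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
fused = fuse([visual, audio, text], raw_weights=[0.0, 0.0, 0.0])
# Equal raw weights give each modality a share of 1/3.
```

Because only one scalar per modality is involved, this style of fusion adds almost no parameters, at the cost of requiring all modality features to be projected to a common dimension beforehand.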

4. Loss Functions, Inference, and Thresholding

The standard multi-label setting employs the following candidate losses and inference policies:

Systems supporting probabilistic outputs enable retrieval and fine-grained semantic similarity, as in "Genre Spectrum" (Agrawal et al., 2023) and NT-Xent fine-tuned clustering (Fish et al., 2020).
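The source does not enumerate the specific losses, so the sketch below assumes the most common baseline: independent sigmoid outputs trained with binary cross-entropy, followed by per-genre thresholding at inference. All function names and the threshold values are illustrative assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over the K genre outputs of one movie."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total / len(logits)


def predict(logits: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Assign genre k whenever its probability meets that genre's threshold."""
    return [1 if sigmoid(z) >= th else 0 for z, th in zip(logits, thresholds)]


logits = [2.0, -1.0, 0.0]                        # toy scores for three genres
uniform = predict(logits, [0.5, 0.5, 0.5])       # one global threshold
per_genre = predict(logits, [0.5, 0.2, 0.6])     # hypothetical tuned thresholds
```

Lowering the threshold for a rare genre (0.2 for the second output here) trades precision for recall on that genre, which is one simple way to respond to the label imbalance noted earlier.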

5. Empirical Evaluation and Results

Evaluation is conducted using a range of multilabel metrics:

| Paper/Method | Macro-F1 / mAP | Micro-F1 / mAP | Hamming Loss | Best Modality Fusion | Key Dataset |
| Genre Spectrum (Agrawal et al., 2023) | ≈0.90 / — | 0.78 / — | — | BERT/GPT-4 multi-label MLPs | IMDb, Rotten Tomatoes, Gracenote |
| IDKG (Li et al., 2023) | 0.832 | 0.849 | — | KG + poster + plot fusion, contrastive | MM-IMDb, MM-IMDb 2.0 |
| Movie-CLIP (Zhang et al., 2022) | 65.4% mAP | 75.2% mAP | — | Visual+audio+ASR-CLIP fusion | MovieNet |
| MMX-Trailer-20 (Fish et al., 2020) | — / 0.597 | — / 0.583 | — | Collab. gating over expert nets | MMX-Trailer-20 |
| Poster (ERDT) (Nareti et al., 2023) | 56.4% F1 | — | 0.1655 | ResDenseTransf. ensemble | IMDb Posters |
| Poster (MCAM+SMSAM) (Nareti et al., 2024) | 68.2% F1 | — | — | CLIP bi-modal, cross-attn | IMDb Posters |
| Trailers12k (Montalvo-Lezama et al., 2022) | 0.756 μAP (75.6%) | — | — | Swin-3D Transformer | Trailers12k |
| Multi-modal (late fusion) (Mangolin et al., 2020) | F1=0.628 | — | — | LSTM on synopsis + CNN on video | TMDb / OpenSubtitles / Posters |

Macro-F1 and mean average precision (mAP) remain standard, but many works also report per-genre F1, balanced accuracy, hit ratio, and Jaccard index. The highest reported macro-F1/mAP values on large, modern, multimodal datasets fall roughly between 0.83 and 0.90 (Agrawal et al., 2023; Li et al., 2023). Because these figures come from different datasets and label sets, they are not directly comparable across rows. Ensemble fusion and KG-guided contrastive learning yield the largest improvements, especially for long-tail genres.
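To make the difference between the headline metrics concrete, the following sketch computes macro-F1, micro-F1, and Hamming loss from binary label matrices in plain Python. The toy labels are invented, and scoring a genre with no true and no predicted positives as F1 = 0 is a convention assumed here; individual papers may handle that case differently.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts; returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_label_counts(Y_true: Matrix, Y_pred: Matrix) -> List[List[int]]:
    """Return [tp, fp, fn] for each of the K labels."""
    counts = [[0, 0, 0] for _ in range(len(Y_true[0]))]
    for y_true, y_pred in zip(Y_true, Y_pred):
        for k, (t, p) in enumerate(zip(y_true, y_pred)):
            if t and p:
                counts[k][0] += 1
            elif p:
                counts[k][1] += 1
            elif t:
                counts[k][2] += 1
    return counts


def macro_f1(Y_true: Matrix, Y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1, so rare genres count as much as common ones."""
    counts = per_label_counts(Y_true, Y_pred)
    return sum(f1(*c) for c in counts) / len(counts)


def micro_f1(Y_true: Matrix, Y_pred: Matrix) -> float:
    """F1 over confusion counts pooled across all labels."""
    counts = per_label_counts(Y_true, Y_pred)
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1(tp, fp, fn)


def hamming_loss(Y_true: Matrix, Y_pred: Matrix) -> float:
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / (len(Y_true) * len(Y_true[0]))


# Toy data: the third genre is rare and is missed entirely by the predictions.
Y_true = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

On this toy data macro-F1 is about 0.56 while micro-F1 is 0.75: completely missing the rare third genre pulls the macro average down much more than the pooled micro score, which is why macro-averaged metrics are the more informative choice under long-tailed label distributions.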

6. Domains of Application, Extensions, and Limitations

Applications:

  • Automated genre annotation for digital archives, recommendation engines, and streaming platforms (Agrawal et al., 2023).
  • Content-based retrieval and clustering using genre spectrum or NT-Xent-like embedding spaces (Fish et al., 2020).
  • Fine-grained similarity search (e.g., "nearby" movies in multilabel semantic space).
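As a rough illustration of similarity search in a multilabel semantic space, the sketch below ranks catalog entries by cosine similarity between genre-probability vectors. The titles, vectors, and function names are hypothetical, and cosine similarity is an assumed choice of distance rather than the measure used by any cited system.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(query: Sequence[float], catalog: Dict[str, Sequence[float]], top_k: int = 2) -> List[str]:
    """Return the top_k catalog titles most similar to the query vector."""
    ranked = sorted(catalog.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [title for title, _ in ranked[:top_k]]


# Hypothetical per-genre probabilities over (Action, Comedy, Drama).
catalog = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.1, 0.9, 0.2],
    "Movie C": [0.8, 0.3, 0.1],
}
neighbours = nearest([1.0, 0.2, 0.0], catalog)
```

For the action-heavy query above, "Movie A" and "Movie C" rank ahead of the comedy-leaning "Movie B".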

Extensions:

  • Inclusion of micro-genres using LLM-generated labels (Agrawal et al., 2023).
  • Knowledge graph integration for stronger metadata reasoning (Li et al., 2023).
  • Fine-grained semantic clustering to dissociate subtle style/tone blends within/between coarse-class genres (Fish et al., 2020).
  • Multimodal label augmentation through cross-modal co-occurrence inference (Nareti et al., 2023).

Limitations:

  • Performance degrades on low-signal classes and with imbalanced label distributions (Nareti et al., 2023, Nareti et al., 2024).
  • Models relying solely on visual, audio, or shallow text representations underperform those aggregating deep contextual embeddings or multi-source fusion (Hoang, 2018, Mangolin et al., 2020, Agrawal et al., 2023).
  • Absence of joint modeling for hierarchical or ontology-aware genre structures (Li et al., 2023).
  • Incomplete metadata (missing cast/crew nodes or poor textual descriptions) remains problematic.

7. Open Challenges and Future Directions

Current research trajectories include:

A plausible implication is that continued advances in LLMs, self-supervised cross-modal learning, and adaptive thresholding/fusion strategies will further elevate multilabel movie genre classification performance and expand its applicability across diverse content ecosystems.
