
NusaX-Senti: Multilingual Sentiment Analysis

Updated 20 January 2026
  • NusaX-Senti is a multilingual benchmark offering 1,000 carefully annotated sentiment sentences per language across 12 languages, covering ten underrepresented Indonesian local languages plus standard Indonesian and English.
  • It employs a rigorous annotation pipeline with manual translations, quality control, and balanced class distributions to maintain high data integrity.
  • The benchmark supports advanced NLP methods such as zero-shot cross-lingual adaptation, sparse fine-tuning, and circuit-based transfer learning, driving robust performance improvements.

NusaX-Senti is a multilingual parallel sentiment classification benchmark constructed for evaluating and advancing NLP technologies on Indonesia’s underrepresented local languages. It provides high-quality, parallel, sentence-level sentiment labels across twelve languages including ten Indonesian local languages, standard Indonesian, and English. NusaX-Senti has catalyzed research in zero-shot cross-lingual adaptation, compositional sparse fine-tuning, circuit-based transfer learning, and robust resource construction methods.

1. Dataset Composition and Annotation Pipeline

NusaX-Senti comprises 1,000 sentences per language, stratified by sentiment class and split into 500 train, 100 development, and 400 test examples for each of twelve languages: Acehnese (ace), Balinese (ban), Banjarese (bjn), Buginese (bug), Madurese (mad), Minangkabau (min), Javanese (jav), Ngaju (nij), Sundanese (sun), Toba Batak (bbc), Indonesian (ind), and English (eng) (Winata et al., 2022).
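The composition above can be captured in a small sketch (language codes and split sizes are taken from the text; this is not an official loader):

```python
# NusaX-Senti composition as described above: 12 languages, 1,000 parallel
# sentences each, split 500/100/400. Codes and counts follow the text.
LANGUAGES = ["ace", "ban", "bjn", "bug", "mad", "min",
             "jav", "nij", "sun", "bbc", "ind", "eng"]
SPLITS = {"train": 500, "dev": 100, "test": 400}

def total_sentences():
    per_language = sum(SPLITS.values())   # 1,000 per language
    return len(LANGUAGES) * per_language  # 12,000 across the benchmark
```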

All sentiment examples are annotated with three-way polarity ("positive," "negative," "neutral")—a distribution precisely documented per split. Source sentences derive from the SmSA dataset (≈11k multi-domain Indonesian user comments), then filtered to remove abusive content and sampled to preserve class balance. Translation into local languages and English is performed by bilingual annotators following guidelines that preserve sentiment polarity, named entities, full content, and informal register, normalized as required. Annotator recruitment entails mutual intelligibility screening and language proficiency testing. Quality control includes reviewer swaps and correction tracking; 5% of examples are intentionally perturbed to ensure attentiveness. Types of QC edits, notably in Balinese, Sundanese, and Javanese, are quantified, with word edits and typo corrections most prevalent.
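The 5% attentiveness check could be injected along these lines (a schematic sketch; the function name is hypothetical and the actual perturbation procedure is left abstract):

```python
import random

def flag_attention_checks(examples, rate=0.05, seed=0):
    """Mark ~5% of items for intentional perturbation, mirroring the QC
    step described above; returns (example, is_check) pairs."""
    rng = random.Random(seed)
    n = max(1, round(rate * len(examples)))
    flagged = set(rng.sample(range(len(examples)), n))
    return [(ex, i in flagged) for i, ex in enumerate(examples)]
```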

The dataset structure is rigorously curated to eliminate cross-split contamination, notably excluding any overlap between SmSA and NusaX-Senti test sets (Simon et al., 21 May 2025). Inter-annotator agreement is formalized via Cohen's κ, though no specific coefficient is reported.

2. Parallelism, Lexicon Construction, and Downstream Benchmarking

NusaX-Senti is fully parallel: every language’s 1,000 sentences are direct human translations of the same Indonesian sources, eschewing automatic MT or synthetic back-translations (Winata et al., 2022). This design enables benchmarking both sentiment analysis and multilingual machine translation across all 12² language pairs.

Sentiment lexicons are created by translating a 400-word Indonesian sentiment seed list into all target languages, yielding 800–1,600 bilingual pairs per language, with further augmentation via PanLex. Example entries such as “baik” (good) → “bagus” (Minangkabau) and “good” (English) illustrate the lexicon’s utility for dictionary-substitution baselines and data augmentation.
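A dictionary-substitution baseline of the kind this lexicon supports can be sketched as word-by-word replacement with fallback (the single Indonesian → Minangkabau entry is the example from the text; everything else is illustrative):

```python
# Dictionary-substitution sketch: translate token-by-token with a bilingual
# sentiment lexicon, falling back to the source token when no entry exists.
LEXICON_ID_MIN = {"baik": "bagus"}  # Indonesian -> Minangkabau, from the text

def substitute(sentence, lexicon):
    return " ".join(lexicon.get(tok, tok) for tok in sentence.lower().split())
```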

Multi-task benchmarking includes monolingual, multilingual (pooled-data), and LOLO (Leave-One-Language-Out) setups, evaluated with standard precision, recall, and F1. Baseline results indicate strong performance for classical classifiers (SVM, logistic regression), but the best cross-lingual generalization is achieved by large multilingual transformer models such as XLM-R (large), which offer 1–2 percentage-point improvements in mean macro-F1 over monolingual baselines.
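Macro-F1, the headline metric here, averages per-class F1 over the three polarities; a minimal pure-Python sketch:

```python
def macro_f1(gold, pred, labels=("positive", "negative", "neutral")):
    """Macro-averaged F1 over the three sentiment classes."""
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(labels)
```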

3. Methodologies for Cross-Lingual Transfer

Recent advances apply NusaX-Senti as a benchmark for assessing methods in data-efficient adaptation and robust zero-shot transfer. Notable approaches include:

A. DeFT-X: Denoised Sparse Fine-Tuning

DeFT-X is a composable sparse fine-tuning framework in which language-specific (φ_L) and task-specific (φ_T) sparse fine-tunings (“SFTs”) are learned, denoised, magnitude-pruned, and additively composed with the base model θ^(0) (XLM-R). For each fine-tuning objective, the weight difference is first denoised via singular value decomposition (SVD):

  • Compute δ = θ^(1) − θ^(0);
  • Apply SVD per weight submatrix W of δ: W = UΣV^⊤;
  • Truncate to rank r, giving the low-rank part W_k (capturing ≈90% of the spectral energy);
  • Isolate the residual R = W − W_k and keep only the top 5% of its entries by magnitude, yielding a binary mask m;
  • Form the denoised matrix W̃ = W_k + (m ⊙ R);
  • Aggregate the denoised difference δ̃, prune it to k trainable parameters via a mask μ, and fine-tune only those entries.

At inference, combine θ_TL = θ^(0) + φ_L + φ_T and stack the classifier head from sentiment adaptation. This enables effective transfer to target NusaX languages with only source-language labeled data and a few MB of unlabeled target corpora (Simon et al., 21 May 2025).
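The denoising and composition steps above can be sketched with NumPy (rank and sparsity hyperparameters follow the text; the real method's per-submatrix bookkeeping is collapsed into a single matrix for illustration):

```python
import numpy as np

def denoise(delta, rank=100, keep=0.05):
    """SVD-denoise one weight-difference submatrix: low-rank reconstruction
    plus the top `keep` fraction of residual entries by magnitude."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    W_k = (U[:, :rank] * S[:rank]) @ Vt[:rank]   # rank-r part
    R = delta - W_k                               # residual
    thresh = np.quantile(np.abs(R), 1 - keep)     # magnitude cutoff
    m = np.abs(R) >= thresh                       # binary mask m
    return W_k + m * R                            # denoised update

def compose(theta0, phi_L, phi_T):
    """Additive composition at inference: theta_TL = theta0 + phi_L + phi_T."""
    return theta0 + phi_L + phi_T
```

A rank-1 difference should survive denoising essentially unchanged, which makes for a quick sanity check of the reconstruction.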

B. CT-SFT: Circuit-Targeted Supervised Fine-Tuning

CT-SFT (Circuit-Targeted Supervised Fine-Tuning) targets adaptation via task-relevant transformer circuits. After competence tuning on source data, the method discovers attention heads most relevant to sentiment via label-balanced mean baseline subtraction and task-directional relevance scoring:

  • Identify heads via a directional relevance score R(s, t), propagating signal and baseline streams;
  • Select circuits as the top-K heads at increasing depth (e.g., 6/12/18 heads, comprising 0.23%–0.66% of parameters);
  • Perform mechanism-guided fine-tuning on target language data with head-level gradient masking;
  • Update only selected heads plus LayerNorm parameters, all other weights frozen.
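The selection and gradient-masking steps above can be sketched as follows (relevance scores are assumed given; the gradient layout is schematic, with one row per attention head):

```python
import numpy as np

def select_heads(relevance, k):
    """Top-K heads by directional relevance score."""
    return sorted(np.argsort(relevance)[-k:].tolist())

def mask_head_gradients(grad, selected):
    """Zero gradients for all heads except the circuit-selected ones.
    `grad` has shape (num_heads, head_params); only selected rows update."""
    masked = np.zeros_like(grad)
    masked[selected] = grad[selected]
    return masked
```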

This approach leverages the hypothesis that crucial sentiment-coding mechanisms are transferable, and restricts adaptation to maintain proxy-language competence and minimize catastrophic forgetting (Nur'aini et al., 13 Jan 2026).

4. Empirical Results and Comparative Performance

NusaX-Senti enables rigorous evaluation across transfer protocols. Table: sentiment micro-F1 on NusaX-Senti with XLM-R (base) for five languages (Simon et al., 21 May 2025):

Method            mad   bjn   ban   ace   min   Avg
MAD-X             68.5  77.6  78.0  74.9  79.9  75.8
LT-SFT            79.0  82.7  80.4  75.7  83.0  80.2
DeFT-X (r = 100)  79.8  83.8  81.4  76.8  85.1  81.4

DeFT-X yields gains over LT-SFT in all languages tested (mean +1.2 points micro-F1), with overlap analysis indicating reduced destructive interference between language- and task-specific vectors (parameter overlap ≲20% vs. ≳30% for LT-SFT). Ablation studies confirm that denoising (low-rank SVD plus magnitude-pruned residual) is critical; omitting higher-order components or pruning degrades performance by up to 2.2 F1 points. Sparse fine-tuning is fundamental: eliminating it drops average F1 by 10.6 points.

CT-SFT likewise demonstrates improvements. On NusaX-Senti’s small training pools (e.g., 100 samples per target language), CT-SFT outperforms full-model fine-tuning by 0.05–0.16 absolute accuracy while updating <1% of parameters. Crucially, it preserves source-language competence: where continued full fine-tuning reduces Indonesian accuracy to 0.34–0.45, CT-SFT maintains it around 0.76, suggesting that surgical updates suppress catastrophic forgetting (Nur'aini et al., 13 Jan 2026).

5. Resource Construction Challenges and Insights

NusaX-Senti’s creation revealed multiplex operational and linguistic challenges (Winata et al., 2022):

  • Annotator recruitment: Standard crowdsourcing platforms do not accommodate these languages; direct screening of native or mutually intelligible speakers is essential.
  • Data quality: Informal registers, typographic variability, and code-switching inhibit automated filtering and necessitate bespoke QC pipelines.
  • Cross-lingual transfer: Malayic languages related to Indonesian (e.g., Minangkabau, Banjarese) exhibit positive transfer, whereas more divergent languages (e.g., Buginese, Toba Batak) yield lower zero-shot F1, indicating phylogenetic distance as a transfer determinant.
  • Multilingual modeling: Pooled models (multilingual, LOLO) surpass monolingual baselines even with minimal (≈500) training examples per language.

A plausible implication is that culturally anchored parallel corpora and targeted lexicons can rapidly bootstrap modeling capacity for severely low-resource languages if properly aligned and curated.

6. Practical Guidelines and Future Directions

NusaX-Senti provides empirically grounded motifs for transfer learning in extreme low-resource regimes:

  • Construct parallel culturally-aligned datasets anchored on a pivot language (e.g., Indonesian) when possible.
  • Augment resources via manual lexicon translation and external bilingual databases.
  • For cross-lingual transfer—adopt methods that sparsify adaptation (DeFT-X, CT-SFT) and denoise updates to minimize interference; denoising via SVD preserves generalization, while circuit selection in transformers localizes adaptation to task-relevant heads (Simon et al., 21 May 2025, Nur'aini et al., 13 Jan 2026).
  • Assemble several MB of unlabeled target data for MLM-driven adaptation, and prune fine-tuning masks to match adapter-comparable parameter budgets (≈3–5%).
  • Exploit related-language parallels in extending NLP technology for other under-resourced linguistic domains.
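The adapter-comparable budget from the guidelines above can be enforced with simple magnitude pruning (a sketch; the 3–5% range is the figure quoted in the text):

```python
import numpy as np

def budget_mask(delta, budget=0.04):
    """Binary mask keeping only the largest-magnitude `budget` fraction of
    update entries (here ~4%, within the 3-5% range quoted above)."""
    flat = np.abs(delta).ravel()
    k = max(1, int(budget * flat.size))
    thresh = np.partition(flat, -k)[-k]
    return np.abs(delta) >= thresh
```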

As NusaX-Senti demonstrates, well-constructed parallel sentiment resources facilitate both methodological innovation and a more comprehensive understanding of transfer bottlenecks, data quality trade-offs, and model adaptation strategies in low-resource NLP. Future research may integrate synthetic data, advanced circuit diagnostics, and broader phylogenetic sampling to further improve robustness and generalization.
