Multi-party Code-switching Dialogues
- Multi-party code-switching dialogues are interactive exchanges in which speakers alternate between languages at the sentence or intra-sentential level, reflecting complex sociolinguistic dynamics.
- Key datasets such as MaSaC, CodeSwitch-Reddit, and PingPong offer diverse modalities, detailed annotations, and metrics like CMI and SPF to benchmark dialogue systems.
- Advanced computational models employing personality-aware fusion (PA₃) improve response generation but highlight challenges in handling non-linear, code-mixed, and multi-threaded inputs.
Multi-party code-switching dialogues are interactive exchanges in which multiple participants alternate between two or more languages within their conversational turns, often at the level of sentence, clause, or even intra-sentential phrase. Such phenomena are widespread among multilingual communities, both in spoken and written modalities, and present distinctive challenges for computational approaches to dialogue modeling, natural language understanding, and generation. The complexity of these dialogues arises not only from linguistic mixing but also from diverse social, pragmatic, and structural factors, including speaker roles, thread structure, long-range references, and the integration of individual traits such as personality.
1. Data Resources and Corpus Construction
The curation of authentic multi-party code-switching dialogue datasets is foundational for empirical analysis and model development. Three datasets constitute the leading resources for this task: MaSaC (Kumar et al., 2024), CodeSwitch-Reddit (Rabinovich et al., 2019), and PingPong (Farhansyah et al., 2026).
- MaSaC (Hindi–English): Constructed from transcripts of the Indian sitcom “Sarabhai v/s Sarabhai,” MaSaC contains 8,607 multi-party dialogues spanning 11,440 utterances, with an average of 3.6 speakers per dialogue and rich code-mixing. Speaker identities are preserved, and turns average ∼10 tokens (maximum 218). Speaker roles are annotated, and expert-derived Big-Five personality labels are available for evaluation.
- CodeSwitch-Reddit (Written Multilingual Forums): This dataset aggregates 135,313 posts from country-specific subreddits, filtered for code-switching between English and one of five languages (Tagalog, Greek, Romanian, Indonesian, Russian) using polyglot language identification and strict authenticity criteria. The dataset tracks author ID, subreddit, timestamps, and parent links, enabling the reconstruction of multi-party, threaded interactions.
- PingPong (Multi-party, Multi-threaded Dialogues): PingPong offers 500 human-authored text-chat dialogues across five language combinations—two bilingual and three trilingual—each containing 2 to 4 speakers. Collected via Discord, the conversations allow for multi-threaded reply structures (replying to earlier turns; cross-references), and participant selection prioritizes authentic code-switching proficiency.
These resources vary in modality (spoken/written), annotation granularity (from utterance level to post level), and supported analytical depth. PingPong explicitly quantifies multi-threaded, multi-party structure (average turns per dialogue: ∼60–98; average tokens: ∼649–1,120), employs the Code-Mixing Index (CMI) and Switch-Point Fraction (SPF) to measure mixing intensity, and preserves explicit reply links for studying non-linear turn-taking; a sketch of both metrics follows the table below.
| Dataset | Modality | Languages | Dialogues/Posts | Speaker Tracking | Thread Structure |
|---|---|---|---|---|---|
| MaSaC | Spoken (sitcom transcripts) | Hindi–English | 8,607 | Yes | Linear turns |
| CodeSwitch-Reddit | Written | English + 1 of 5 L2s | 135,313 | Yes (author ID) | Parent links |
| PingPong | Written | 2 bilingual, 3 trilingual combinations | 500 | Yes | Multi-threaded |
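To make the mixing metrics concrete, the following is a minimal sketch of utterance-level CMI (in the common Gambäck–Das formulation) and Switch-Point Fraction, assuming per-token language tags are available; the tag scheme and the example utterance are illustrative, not drawn from any of the datasets.

```python
from collections import Counter

def cmi(tags):
    """Code-Mixing Index (Gambäck & Das formulation) for one utterance.

    `tags` is a per-token language-tag list, e.g. ["hi", "en", "other"].
    Tokens tagged "other" (named entities, punctuation, universal terms)
    are treated as language-independent.
    """
    n = len(tags)
    lang_counts = Counter(t for t in tags if t != "other")
    u = n - sum(lang_counts.values())          # language-independent tokens
    if n == u or not lang_counts:              # fully language-neutral utterance
        return 0.0
    return 100.0 * (1.0 - max(lang_counts.values()) / (n - u))

def spf(tags):
    """Switch-Point Fraction: share of adjacent token pairs whose
    language tag changes, skipping language-independent tokens."""
    langs = [t for t in tags if t != "other"]
    if len(langs) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(langs, langs[1:]))
    return switches / (len(langs) - 1)

# Illustrative Hinglish utterance: "yaar this movie was ekdum awesome"
tags = ["hi", "en", "en", "en", "hi", "en"]
print(f"CMI = {cmi(tags):.1f}, SPF = {spf(tags):.2f}")
```

On the example tags, the dominant language covers four of six tokens, giving CMI ≈ 33.3 and SPF = 0.6.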
2. Annotation Schemes and Definition of Multi-party Code-switching
Precise annotation is pivotal for both empirical analysis and system development. MaSaC annotates speakers and Big-Five personality traits; dialogue segments are defined as ordered sequences of speaker–utterance pairs, $D = \langle (s_1, u_1), \ldots, (s_n, u_n) \rangle$, with subsequent response generation conditioned on both context and personality. CodeSwitch-Reddit identifies code-switched posts using per-post language identification, filters out quoted replies and named entities to ensure authentic code-mixing, and preserves parent–child relationships for thread reconstruction.
PingPong defines dialogues as lists of conversational turns among multiple speakers, permitting both linear and threaded reply structures. Turn-level metadata (speaker, reply-to index, timestamp) are maintained, supporting structural analyses such as speaker dominance, reply distance, and aggregation of mixing metrics (illustrated in the sketch below).
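These structural measures can be computed directly from the turn-level metadata. The following is a minimal sketch; the `Turn` record, its field names, and the specific dominance definition (ratio of the most to the least active speaker's turn count) are illustrative assumptions rather than PingPong's exact formulas.

```python
from dataclasses import dataclass
from collections import Counter
from typing import Optional

@dataclass
class Turn:
    idx: int                  # position in the dialogue
    speaker: str
    reply_to: Optional[int]   # index of the turn replied to (None = linear)

def speaker_dominance(turns):
    """Ratio of the most to the least active speaker's turn count
    (one plausible reading of the reported dominance ratio)."""
    counts = Counter(t.speaker for t in turns)
    return max(counts.values()) / min(counts.values())

def mean_reply_distance(turns):
    """Average number of turns between a reply and its target;
    threaded replies to older turns push this above 1."""
    dists = [t.idx - t.reply_to for t in turns if t.reply_to is not None]
    return sum(dists) / len(dists) if dists else 0.0

turns = [
    Turn(0, "A", None), Turn(1, "B", 0), Turn(2, "C", 0),  # B and C fork threads
    Turn(3, "A", 1), Turn(4, "B", 2), Turn(5, "A", 4),
]
print(speaker_dominance(turns), mean_reply_distance(turns))  # 3.0 1.6
```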
No dataset currently provides token-level, in-turn language boundary annotation for multi-party code-switching, though CodeSwitch-Reddit’s raw data supports future extension to pointwise CS tagging and full-thread reconstructions.
3. Computational Modeling: Response Generation and Fusion Mechanisms
The modeling of multi-party code-switching dialogues demands systems able to handle complex context, multiple speakers, and linguistic mixing. A detailed architecture for personality-aware code-mixed response generation appears in (Kumar et al., 2024):
- Personality Induction: Speaker traits are induced in an unsupervised fashion by training a transformer encoder (RoBERTa-base) to predict Big-Five trait scores from contextual utterance embeddings (see the encoder sketch after this list).
- Fusion (PA₃ Mechanism): The PA₃ module injects personality into the encoder via a two-step attention mechanism:
- Context-aware key/value fusion, where keys and values are gated by both context and personality representations.
- Axial attention, performing self-attention along individual tensor axes to tightly couple personality and dialogue context.
- Encoder–Decoder Architecture: PA₃ is integrated into standard transformers (BART, T5), modifying self-attention at each encoder layer while the decoder attends to the fused encoder representation (a fusion sketch follows the objective below).
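A minimal sketch of the trait-induction component, assuming a linear head over the RoBERTa `<s>` pooled state and sigmoid-normalized per-trait scores; the head design, pooling choice, and example utterance are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class TraitInducer(nn.Module):
    """RoBERTa-base encoder with a five-way head for Big-Five trait scores."""
    def __init__(self):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.head = nn.Linear(self.encoder.config.hidden_size, 5)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]      # <s> token as utterance embedding
        return torch.sigmoid(self.head(cls))  # per-trait scores in [0, 1]

tok = RobertaTokenizer.from_pretrained("roberta-base")
batch = tok(["yaar, main toh bilkul thak gaya after this meeting"],
            return_tensors="pt", padding=True)
scores = TraitInducer()(batch["input_ids"], batch["attention_mask"])  # [1, 5]
```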
The objective combines a cross-entropy loss on personality classification with the maximum-likelihood generation loss over response tokens:

$$\mathcal{L} = \mathcal{L}_{\text{pers}} + \mathcal{L}_{\text{gen}}, \qquad \mathcal{L}_{\text{gen}} = -\sum_{t} \log P\left(y_t \mid y_{<t}, \tilde{H}\right),$$

where $\tilde{H}$ denotes the fused encoder representation.
Training employs a shared vocabulary (∼18,000 terms) with byte-pair encoding to accommodate code-mixed tokens.
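The following PyTorch sketch illustrates the two-step fusion idea under stated assumptions: a sigmoid gate mixes personality into the keys/values, and attention along a stacked pair axis couples personality with each context position before sequence-axis attention. Module names, shapes, and the gating form are illustrative, not the published PA₃ equations.

```python
import torch
import torch.nn as nn

class GatedKV(nn.Module):
    """Step 1 (sketch): gate each key/value state by a mix of the dialogue
    context and the speaker's personality embedding."""
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)

    def forward(self, kv, pers):               # kv: [B, T, d], pers: [B, d]
        p = pers.unsqueeze(1).expand_as(kv)    # broadcast personality over turns
        g = torch.sigmoid(self.gate(torch.cat([kv, p], dim=-1)))
        return g * kv + (1 - g) * p            # personality-infused keys/values

class AxialCouple(nn.Module):
    """Step 2 (sketch): stack context and personality along a new axis and
    self-attend along that axis, then along the sequence axis."""
    def __init__(self, d, heads=8):
        super().__init__()
        self.attn_pair = nn.MultiheadAttention(d, heads, batch_first=True)
        self.attn_seq = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, h, pers):                # h: [B, T, d], pers: [B, d]
        B, T, d = h.shape
        p = pers.unsqueeze(1).expand(B, T, d)
        x = torch.stack([h, p], dim=2)         # [B, T, 2, d]
        x = x.reshape(B * T, 2, d)             # attend along the 2-way pair axis
        x, _ = self.attn_pair(x, x, x)
        x = x.reshape(B, T, 2, d).mean(dim=2)  # collapse the fused pair
        x, _ = self.attn_seq(x, x, x)          # attend along the sequence axis
        return x                               # fused representation for the decoder

B, T, d = 2, 16, 512
h, pers = torch.randn(B, T, d), torch.randn(B, d)
fused = AxialCouple(d)(GatedKV(d)(h, pers), pers)  # [B, T, d]
```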
4. Quantitative and Qualitative Evaluations
Multi-party code-switching dialogue systems are evaluated using established metrics (ROUGE, BLEU, BERTScore) and human assessment.
- Personality-aware Response Generation (Kumar et al., 2024): Incorporating PA₃ yields measurable gains with a BART backbone: +1.73 ROUGE-1, +1.11 ROUGE-2, +1.26 BLEU-1, and +2.81 BERTScore. Ablations show that the full two-step fusion (context gating plus axial attention) achieves the largest improvements.
- Human Evaluation: Annotators rate fluency, coherence, relevance, and personality alignment, with PA₃-infused models scoring 3.1–3.2 (out of 5), outperforming non-personality models (∼2.0).
- PingPong Baselines (Farhansyah et al., 2026): Current LLMs (Qwen2.5, Gemma, Sahabat-AI-Gemma) perform suboptimally, especially on non-English and trilingual dialogues: zero-shot QA accuracy is below 30% on low-resource pairs, summarization ROUGE-L drops as low as 0.058 on AR–DZ–FR, and topic-classification accuracy falls to ∼30% on complex dialogues.
This suggests that multi-party code-switching conversational modeling remains a markedly challenging open problem, with current NLP systems exhibiting limitations in coherence tracking, long-range reference resolution, and code-mixed input understanding.
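To ground the automatic metrics above, here is a minimal scoring sketch using common open-source implementations (rouge-score, sacrebleu, bert-score); the example strings and the choice of a multilingual BERTScore encoder are assumptions for illustrating code-mixed evaluation, not the papers' exact configurations.

```python
# pip install rouge-score sacrebleu bert-score
from rouge_score import rouge_scorer
import sacrebleu
from bert_score import score as bert_score

refs = ["haan yaar, the plan sounds perfect to me"]  # gold responses
hyps = ["haan, the plan is perfect yaar"]            # model responses

# ROUGE-1/2 F1 (word overlap; stemming disabled for mixed-script text)
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=False)
rouge = scorer.score(refs[0], hyps[0])
print({k: round(v.fmeasure, 3) for k, v in rouge.items()})

# Sentence-level BLEU
print(round(sacrebleu.sentence_bleu(hyps[0], [refs[0]]).score, 2))

# BERTScore with a multilingual encoder, a reasonable choice for
# code-mixed responses (the model choice here is an assumption)
P, R, F1 = bert_score(hyps, refs, model_type="bert-base-multilingual-cased")
print(round(F1.mean().item(), 3))
```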
5. Structural and Sociolinguistic Dynamics
PingPong and CodeSwitch-Reddit document specific properties of multi-party code-switching dialogue structure:
- Structural Complexity (Farhansyah et al., 2026): Human-authored code-switching dialogues exhibit greater variance in utterance length (e.g., 66.54 vs. 35.14 tokens for two-speaker dialogues), higher speaker-dominance ratios (3.256 in four-speaker groups), and non-linear reply patterns (average reply distance ∼2.7–4.1).
- Content and Style (Rabinovich et al., 2019): Topic models reveal that CS posts typically focus on personal, relational, and emotional themes, while monolingual posts skew toward public/political topics.
- Formality: CS posts on Reddit score significantly higher on informality than monolingual posts (mean 0.160 vs. 0.153).
- Speaker Proficiency: Frequent code-switchers use slightly less sophisticated vocabulary, but their grammatical structures (sentence length, parse depth) are more complex; both differences are statistically significant.
A plausible implication is that linguistic agility in code-switching may relate to a speaker’s ability to manage multi-dimensional, context-sensitive dialogue acts.
6. Methodological Challenges and Future Directions
Research in multi-party code-switching dialogues contends with several methodological obstacles:
- Annotation Gaps: Absence of token-level language boundary tagging across multi-party threads constrains fine-grained modeling and empirical analysis.
- Model Limitations: Current LLMs are typically optimized for linear, monolingual dialogues; multi-threaded, code-mixed inputs provoke failures in coherence, sequence modeling, and entity tracking (Farhansyah et al., 2026).
- Personality and User Profiling: Kumar et al. (2024) show that incorporating unsupervised personality induction yields substantial generative improvements; extensions to richer user profiles (e.g., age, locale, continuous trait embeddings) are recommended.
- Corpus Expansion: Scaling to more language pairs and real-world genres (chat logs, social media) necessitates large code-mixed corpora, robust BPE re-tokenization (a tokenizer-training sketch follows below), and inclusive demographic metadata.
Best practices highlight joint training for code-switching and personality modeling, context-aware attention fusion, and leveraging thread/tree metadata for multi-party structure reconstruction.
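As a concrete instance of the BPE re-tokenization step, here is a minimal sketch using the Hugging Face `tokenizers` library; the corpus file name is a placeholder, and the 18,000-entry vocabulary mirrors the shared vocabulary size cited above.

```python
# pip install tokenizers
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Train a byte-pair-encoding vocabulary directly on code-mixed text so that
# frequent Hindi and English subwords share a single merge table.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=18_000,  # matches the shared-vocabulary size cited above
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
)
tokenizer.train(files=["code_mixed_dialogues.txt"], trainer=trainer)
print(tokenizer.encode("yaar this plan ekdum sahi hai").tokens)
```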
7. Applications and Practical Implications
The development and analysis of multi-party code-switching dialogue datasets catalyze several applications:
- Language Identification: Token-level and utterance-level CS detection algorithms can be benchmarked and improved using these datasets (Rabinovich et al., 2019); a minimal detection sketch follows this list.
- Dialogue Management: The preservation of parent–child post links in CodeSwitch-Reddit and explicit threading in PingPong allows for development of dialogue managers capable of modeling turn order, structure, and code-switching patterns.
- Response Generation: Personality-aware and context-sensitive code-mixed dialogue agents can be constructed for realistic domains, leveraging PA₃ fusion and unsupervised trait induction (Kumar et al., 2024).
- Benchmarking NLP Systems: The persistent performance gap on code-mixed tasks (QA, summarization, topic classification) in PingPong provides an empirical yardstick for future NLP models targeting multilingual, multi-party settings.
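A minimal post-level detection sketch: it substitutes the lightweight langid.py classifier for the polyglot pipeline used by Rabinovich et al. (2019), and the sentence split, language restriction, and example post are illustrative assumptions.

```python
# pip install langid
import langid

# Restrict the classifier to the language pair of interest (here English and
# Indonesian, one of the CodeSwitch-Reddit pairs); differing per-sentence
# labels give a coarse, post-level code-switching signal.
langid.set_languages(["en", "id"])

post = ("Gue baru pindah ke Jakarta bulan lalu. "
        "Honestly, the traffic is worse than I expected.")

labels = {langid.classify(s)[0] for s in post.split(". ") if s}
print("code-switched post" if len(labels) > 1 else "monolingual post")
```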
Further, these resources enable sociolinguistic study of multilingual discourse, including accommodation, priming, and style shift across successively mixed-language utterances in complex group interaction. Expansion and refinement of multi-party, code-mixed resources are likely to spur advances in robust, fair, and adaptive dialogue systems for truly global communication contexts.