CaSiNo Dataset Overview
- CaSiNo is a large-scale dataset comprising 1030 negotiation dialogues with rich annotations capturing multiple persuasion strategies.
- The dataset uses controlled experimental protocols with detailed pre- and post-survey metadata and systematic, turn-based resource negotiation.
- It supports multi-task neural modeling for utterance-level strategy recognition, advancing research in negotiation language and dialogue systems.
The CaSiNo dataset is a large-scale corpus of English-language negotiation dialogues designed to support the study and development of automatic negotiation systems. Collected with controlled experimental methodology, CaSiNo provides annotated, multi-turn dyadic interactions in a closed-domain setting where human participants negotiate the distribution of limited camping resources. The dataset is accompanied by detailed metadata, rich annotation of persuasion strategies, and a modeling benchmark focused on utterance-level strategy recognition, enabling precise research in negotiation language modeling and dialogue systems (Chawla et al., 2021).
1. Corpus Design and Structure
CaSiNo comprises 1030 negotiation dialogues involving 846 unique U.S.-based participants. Recruitment was conducted via Amazon Mechanical Turk (MTurk), restricting eligibility to workers with at least 500 prior approved Human Intelligence Tasks and ≥95% approval rate. Within each dialogue, two participants, situated as "campsite neighbors," alternate turns to negotiate the division of three resource types—Food, Water, and Firewood—each comprising three packages. Each negotiator is randomly assigned a preference ordering (High/Medium/Low) over the three resources, and the point values per unit are: High = 5, Medium = 4, Low = 3 (maximum score per negotiator: 36). Dialogues average 11.6 utterances (22 tokens/utterance), ensuring sufficiently rich interaction for linguistic and strategic analysis.
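The scoring rule above can be made concrete with a small sketch (a hypothetical `score` helper, not part of the released dataset tooling):

```python
# Hypothetical helper illustrating the CaSiNo scoring rule: a negotiator's
# score is the sum over resources of packages obtained times the per-unit
# value implied by their assigned preference level (High=5, Medium=4, Low=3).
POINTS = {"High": 5, "Medium": 4, "Low": 3}

def score(priorities, allocation):
    """priorities: resource -> "High"/"Medium"/"Low"; allocation: resource -> packages won (0-3)."""
    return sum(POINTS[priorities[r]] * allocation[r] for r in priorities)

prefs = {"Food": "High", "Water": "Medium", "Firewood": "Low"}
# Winning every package of every resource yields the 36-point maximum:
assert score(prefs, {"Food": 3, "Water": 3, "Firewood": 3}) == 36
```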
The closed-domain scenario aligns negotiation language with a realistic task while maintaining tractable, symmetric, and reproducible evaluation based on realized points. This design diverges from abstract negotiation settings (e.g., "books/balls/hats") by fostering more linguistically complex and contextually motivated dialogue.
2. Data Collection and Experimental Protocol
The data collection protocol features several stages to ensure contextualization and controlled elicitation:
- Pre-survey: Participants provided demographics (age, gender, education, ethnicity), TIPI-measured Big Five personality, and Social Value Orientation.
- Negotiation Training: All participants completed a 5-minute tutorial emphasizing negotiation best practices—stating initial demands, disclosing preferences, emoting appropriately, and providing personal justifications.
- Preparation: Each participant reviewed their assigned resource priorities and wrote free-form justifications (e.g., activity-specific needs).
- Dialogue Phase: Conversations were conducted via ParlAI-based chat with enforced turn-taking (minimum 10 utterances per dialogue) and optional emoticons (happy, sad, angry, surprise).
- Agreement and Post-Survey: Negotiators submitted a proposed division, which the partner could accept or reject; either party could also walk away without a deal. Post-interaction, participants rated their satisfaction with the outcome and their likeness of the opponent (5-point Likert scales).
- Incentive Structure: Each participant received a $2 base payment plus a bonus of 8.33 cents per negotiated point. If negotiation failed, both parties received five points (equivalent to one package of their high-priority resource).
The dataset encompasses all dialogue logs, system and user metadata, pre/post-surveys, negotiation outcomes, and real-valued payoffs.
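The incentive arithmetic above can be sketched as follows (a hypothetical `payout` helper; the $2 base, 8.33-cents-per-point rate, and 5-point failure payoff come from the protocol described above):

```python
# Hypothetical payout helper for the stated incentive structure: $2 base
# plus 8.33 cents per point, with a flat 5 points when negotiation fails.
def payout(points, walked_away=False):
    pts = 5 if walked_away else points
    return round(2.00 + 0.0833 * pts, 2)

payout(36)  # payout at the 36-point maximum
```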
3. Persuasion Strategy Annotation Scheme
Utterance-level persuasion strategies are annotated with a multi-label schema grounded in the negotiation literature. CaSiNo-Ann, the annotated subset (396 dialogues, 4615 utterances), applies nine strategy labels plus a Non-strategic category:
- Prosocial—Generic: Small-Talk, Empathy, Coordination.
- Prosocial—Preference: No-Need, Elicit-Pref.
- Proself—Generic: Undervalue-Partner (UV-Part), Vouch-Fairness.
- Proself—Preference: Self-Need, Other-Need.
Each utterance is labeled by three expert annotators, with Krippendorff's α reported for inter-annotator agreement per strategy (e.g., Small-Talk: 0.81, Self-Need: 0.75, Other-Need: 0.89). Label frequency indicates high prevalence for Small-Talk (22.8%) and Self-Need (20.9%), and lower rates for others such as UV-Part (2.8%) and No-Need (4.2%).
| Label | Count | α |
|---|---|---|
| Small-Talk | 1054 | 0.81 |
| Self-Need | 964 | 0.75 |
| Coordination | 579 | 0.42 |
| Empathy | 254 | 0.42 |
| No-Need | 196 | 0.77 |
| Elicit-Pref | 377 | 0.77 |
| UV-Part | 131 | 0.72 |
| Vouch-Fairness | 439 | 0.62 |
| Other-Need | 409 | 0.89 |
| Non-strategic | 1455 | — |
Counts and α from CaSiNo-Ann; multi-label per utterance.
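The prevalence percentages quoted earlier follow directly from the table counts over the 4615 annotated utterances; a quick cross-check:

```python
# Cross-check of the prevalence figures: label frequency is the label count
# divided by the 4615 annotated utterances in CaSiNo-Ann.
counts = {"Small-Talk": 1054, "Self-Need": 964, "No-Need": 196, "UV-Part": 131}
TOTAL_UTTERANCES = 4615

freq = {label: round(100 * n / TOTAL_UTTERANCES, 1) for label, n in counts.items()}
# freq["Small-Talk"] -> 22.8 and freq["UV-Part"] -> 2.8, matching the text
```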
4. Dataset Format, Splits, and Example Interactions
Distributed as per-dialogue JSON files, CaSiNo encodes the negotiation transcripts (including speaker, text, timestamp, and optional emoticon), pre-survey and post-survey responses, resource priorities and justifications, and negotiation outcome details (agreement, satisfaction, opponent-likeness, final points). While there is no fixed train/dev/test split on the full corpus, CaSiNo-Ann employs 5-fold cross-validation with a 5% held-out validation set per fold for model selection.
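A sketch of what a per-dialogue record might look like (the key names here are illustrative assumptions; the released files may use different field names):

```python
import json

# Illustrative per-dialogue record mirroring the fields described above;
# key names are assumptions, not the dataset's actual schema.
record = {
    "chat_logs": [
        {"id": "mturk_agent_1", "text": "How are you today?", "emoticon": "happy"},
        {"id": "mturk_agent_2", "text": "Doing well, thanks!", "emoticon": None},
    ],
    "participant_info": {
        "mturk_agent_1": {"priorities": {"High": "Food", "Medium": "Water", "Low": "Firewood"}},
    },
    "annotations": [["Small-Talk"], ["Small-Talk"]],
    "outcome": {"agreement": True, "points": {"mturk_agent_1": 20, "mturk_agent_2": 16}},
}

# Round-trip through JSON and pull out the utterance texts:
dialogue = json.loads(json.dumps(record))
turns = [turn["text"] for turn in dialogue["chat_logs"]]
```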
Exemplary annotated dialogues illustrate diverse strategy deployment. For instance, utterances like “How are you today? Did you have any preferences on the supplies we will be trading?” are tagged with Small-Talk, Coordination, and Elicit-Pref. In contrast, proself tactics such as “Do you really need that much firewood?” are annotated as UV-Part and Coordination.
5. Statistical and Correlational Analyses
Quantitative analysis reveals several statistically significant relationships (the exact correlation coefficients are reported in Chawla et al., 2021):
- Individual points correlate positively with self-reported satisfaction and with opponent-likeness.
- Satisfaction and opponent-likeness are themselves highly correlated.
- Integrative potential (misalignment between the two negotiators' priority orders) is positively associated with the pair's aggregate negotiated points.
- Strategy usage correlates with negotiation outcomes: for example, Small-Talk is positively associated with opponent-likeness and satisfaction, while Vouch-Fairness correlates negatively with all three primary outcome measures.
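These relationships are correlational; as an illustration of the analysis style, a minimal Pearson-r computation on invented toy data (the actual CaSiNo coefficients are reported in the paper):

```python
import math

# Toy illustration of the correlational analyses above: Pearson correlation
# between per-negotiator points and a 5-point Likert outcome. The data here
# are invented, not drawn from CaSiNo.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

points = [18, 21, 24, 30, 36]
satisfaction = [2, 3, 3, 4, 5]
# pearson_r(points, satisfaction) is strongly positive on this toy sample
```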
6. Strategy Recognition and Modeling Benchmark
A multi-task neural framework serves as the benchmark for automatic strategy recognition. The input for each utterance (the current turn plus up to three previous turns) is encoded with a shared BERT-base encoder; each strategy task receives its own self-attention head, with additional turn-index embeddings to distinguish positions in the context window. The training loss is multi-label binary cross-entropy, summed over utterances and strategy tasks. Modeling experiments additionally incorporate in-domain pre-training (IDPT, a masked-language-modeling objective over CaSiNo dialogues) and oversampling of infrequent labels.
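The training objective can be sketched in a few lines. This is a minimal illustration, assuming per-task sigmoid scores; the shared BERT encoder and per-task attention heads are abstracted away, with `logits` standing in for the scores they would produce:

```python
import math

# Minimal sketch of the multi-task objective: multi-label binary
# cross-entropy summed over utterances and strategy tasks.
def multitask_bce(logits, labels):
    """logits, labels: per-utterance lists with one entry per strategy task;
    labels are 0/1 multi-label indicators, logits are raw per-task scores."""
    eps = 1e-12  # numerical stability for log()
    total = 0.0
    for utt_logits, utt_labels in zip(logits, labels):
        for z, y in zip(utt_logits, utt_labels):
            p = 1.0 / (1.0 + math.exp(-z))  # per-task sigmoid
            total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total

# Two utterances, two strategy tasks (e.g. Small-Talk, Self-Need):
logits = [[2.0, -2.0], [0.0, 3.0]]
labels = [[1, 0], [0, 1]]
```

In practice one would use a fused sigmoid-plus-BCE loss from a deep-learning library for numerical stability, but the summation structure is the same.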
Strategy prediction results (mean across 5 folds) are as follows:
| Model | Avg F1 | Joint Accuracy |
|---|---|---|
| Majority | 0.0 | 39.6% |
| LR-BoW | 43.4 | 52.4% |
| BERT-FT | 58.5 | 64.0% |
| Multi-task + IDPT + OS | 68.3 | 70.2% |
Multi-task parameter sharing and in-domain pre-training significantly improve F1 on rare/imbalanced strategies (e.g., No-Need from 16.4→46.2, UV-Part from 20.4→47.3) without loss on frequent strategies (Self-Need: 72.3→75.2, Elicit-Pref: 80.5→81.8).
7. Research Impact and Usage
CaSiNo enables granular empirical analysis of negotiation language, strategic interaction, and automatic recognition of persuasion tactics within a controlled but semantically rich environment. Its design supports experimental rigor and reproducibility for negotiation dialogue modeling, making it a critical resource for both dialogue system evaluation and social-cognitive studies of negotiation (Chawla et al., 2021).
A plausible implication is that CaSiNo’s annotated negotiations provide a naturalistic yet reproducible substrate for research into automatic strategy adaptation, emotion modeling, and human-machine negotiation, underlining its value across the computational linguistics, machine learning, and human-agent interaction communities.