Multi-Turn Conversational Scams
- Multi-turn conversational scams are adversarial interactions using layered dialogue and psychological manipulation to extract sensitive data or trigger unauthorized actions.
- They employ escalation phases, persona role-play, and channel migration to bypass single-turn safety measures, necessitating adaptive LLM-based detection.
- Recent research utilizes simulated attacker-victim dialogues, dynamic risk scoring, and federated learning to enhance multi-turn scam detection and mitigation.
Multi-turn conversational scams are adversarial interactions—often orchestrated by human or automated agents—employing a sequence of strategically crafted dialogue turns to elicit sensitive information, money, or unauthorized actions from a target. These scams exploit temporally extended social dynamics, context layering, and psychological manipulation to evade detection by both humans and automated safety systems. In recent years, the advent of LLMs and generative agents has dramatically increased both the sophistication of scam methodologies and the challenges for defenders. Multi-turn scams exhibit escalation phases, persona role-play, channel migration, and context-aware adaptation that single-turn detection mechanisms consistently fail to capture.
1. Structural Taxonomy and Behavioral Patterns
Modern research distinguishes multi-turn scams according to interaction depth, attack surface, and adversarial goal. Scam classes include short-interaction urgent scams (e.g., phishing via pop-ups), medium-horizon technical-support or account recovery fraud, and long-horizon trust-building operations such as pig-butchering, which may last days to weeks and incorporate complex workflows: cross-platform contact-migration, multimedia verification, staged payments, and psychological grooming (Spokoyny et al., 27 Oct 2025).
Yuan et al. provide a high-resolution taxonomy using simulated LLM-to-LLM red teaming, identifying ten recurrent attacker strategy “families”: authority pressure, urgency creation, threat of loss, information harvesting, channel shift, credential engineering, reciprocity, sunk-cost exploitation, rapport building, and payment engineering. Empirical dialogue trajectory analysis reveals stable escalation envelopes—invariant across language and model architecture—whereby initial rapport segues through urgency and isolation toward high-risk transactional asks (Yuan et al., 6 Jan 2026).
Defender tactics have been grouped into analogous “defense families”: authority verification, deliberate delay, de-escalation, data minimization, channel control, credential skepticism, reciprocity resistance, exit readiness, emotional boundary enforcement, and payment friction. Automated and human-enabled defensive agents leverage these mechanisms to disrupt social engineering progression at multiple points in the adversarial envelope.
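As an illustration, the attacker strategy families and defense families above can be paired in a simple lookup that maps detected strategies to candidate countermeasures. The one-to-one alignment shown here is a simplifying assumption for demonstration, not the exact correspondence established in the cited taxonomy:

```python
# Hypothetical pairing of the ten attacker strategy families (Yuan et al.)
# with analogous defense families; the one-to-one mapping is an illustrative
# assumption, not the paper's empirically derived alignment.
STRATEGY_TO_DEFENSE = {
    "authority_pressure":     "authority_verification",
    "urgency_creation":       "deliberate_delay",
    "threat_of_loss":         "de-escalation",
    "information_harvesting": "data_minimization",
    "channel_shift":          "channel_control",
    "credential_engineering": "credential_skepticism",
    "reciprocity":            "reciprocity_resistance",
    "sunk_cost_exploitation": "exit_readiness",
    "rapport_building":       "emotional_boundary_enforcement",
    "payment_engineering":    "payment_friction",
}

def recommend_defenses(detected_strategies):
    """Map detected attacker strategies to candidate defensive responses."""
    return [STRATEGY_TO_DEFENSE[s] for s in detected_strategies
            if s in STRATEGY_TO_DEFENSE]
```

A defensive agent observing a channel-shift attempt followed by urgency pressure would thus surface channel control and deliberate delay as its next interventions.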
2. Psychological, Linguistic, and Social Engineering Techniques
Core to the efficacy of multi-turn scams are psychological manipulation frameworks, notably the Foot-in-the-Door (FITD) principle, incremental trust scaffolding, urgency, authority signaling, and emotional rapport. Yuan et al. (Yuan et al., 6 Jan 2026) and others demonstrate that these tactics are not only prevalent in organic attacks but are reproducibly invoked by LLM-powered scam simulators and attackers. Automated pipelines now encode FITD into multi-turn template generation—where a series of benign queries culminates in a policy-violating final request—yielding attack success rates (ASRs) for vulnerable LLMs in excess of 60% under multi-turn conditions (Kumarappan et al., 24 Nov 2025).
Linguistically, escalation is manifest in: progressive specificity, shifts from generic information-seeking to credential harvesting, pretextual authority role-claims, and migration from platform-contained chat to out-of-band channels. Channel shift (e.g., moving users from official support to encrypted messengers) and payment engineering (deploying social or technical jargon to legitimize fraudulent invoices) are canonical markers.
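The canonical markers above lend themselves to simple lexical tagging as a first-pass signal. The following sketch uses hand-written regular expressions whose patterns are illustrative assumptions; production detectors rely on learned models rather than keyword lists:

```python
import re

# Toy lexical heuristics for canonical escalation markers (channel shift,
# urgency, authority role-claims, payment engineering). Patterns are
# illustrative assumptions, not a validated lexicon.
ESCALATION_MARKERS = {
    "channel_shift": re.compile(
        r"\b(whatsapp|telegram|signal|text me|personal number)\b", re.I),
    "urgency": re.compile(
        r"\b(immediately|right now|urgent|within \d+ (minutes|hours))\b", re.I),
    "authority": re.compile(
        r"\b(official|support team|compliance|verification department)\b", re.I),
    "payment": re.compile(
        r"\b(gift card|wire transfer|invoice|processing fee|crypto)\b", re.I),
}

def tag_turn(text):
    """Return the set of escalation marker categories present in one turn."""
    return {name for name, pat in ESCALATION_MARKERS.items() if pat.search(text)}
```

Tagging each turn in sequence exposes the escalation envelope: a dialogue whose marker sets progress from empty, to urgency, to channel shift plus payment is following the canonical trajectory.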
3. Simulation and Detection Methodologies
Contemporary research has unified multi-turn scam simulation, detection, and defensive response under the agentic LLM paradigm.
- Simulation: Controlled frameworks instantiate attacker (ScamBot) and victim (VictimBot) agents as LLM instances with explicit persona prompts. Multi-lingual and multi-scenario evaluations (e.g., 18,648 simulated dialogues across 8 models) permit systematic measurement of attack success and defensive robustness (Yuan et al., 6 Jan 2026). Bespoke systems such as SE-VSim (Kumarage et al., 18 Mar 2025), Bot Wars (Basta et al., 10 Mar 2025), and Chatterbox (Spokoyny et al., 27 Oct 2025) automate long-horizon engagement, leveraging chain-of-thought reasoning, personality-driven prompt stratification, and structured context tracking.
- Detection: Modern LLM-based detectors ingest the full multi-turn conversational transcript and leverage self-attention, history-aggregating encoders, and, in recent advances, multi-channel risk scoring (dynamic context embedding, intent drift, escalation pattern matching) to flag high-risk exchanges (Kulkarni et al., 18 Mar 2025, Shen et al., 2024). CASE (Jaipuria et al., 27 Aug 2025) operationalizes a two-stage pipeline integrating real-time guided interviews with schema extraction for manual and automated enforcement. Real-world deployments report substantial gains: CASE demonstrated a 21% uplift in scam enforcement volumes on GPay India through integration of structured conversational intelligence.
- Evaluation Metrics: Attack Success Rate (ASR), F1-score, precision/recall, and human-interpretable outcome labeling are now standard. Taxonomy-driven annotation (BERTopic clustering, expert coding) enables fine-grained attribution of strategy prevalence and defense efficacy (Yuan et al., 6 Jan 2026). Robustness benchmarking of LLMs under single-turn and multi-turn conditions exposes dramatic gaps: e.g., GPT-4o Mini experiences up to 32 percentage point ASR inflation in multistep FITD, while Google Gemini 2.5 Flash is nearly immune (Kumarappan et al., 24 Nov 2025).
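The standard metrics listed above reduce to straightforward computations over per-dialogue outcome labels. A minimal sketch, assuming binary labels (1 = successful attack or scam, 0 = otherwise):

```python
def attack_success_rate(outcomes):
    """ASR: fraction of dialogues labeled as a successful attack (1/0 labels)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def precision_recall_f1(y_true, y_pred):
    """Binary detection metrics over per-dialogue scam labels (1 = scam)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Reported "ASR inflation" under multi-turn conditions is then simply the difference between the ASR measured on multi-turn dialogues and the ASR measured on single-turn prompts for the same attack goals.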
4. LLM Vulnerabilities and Failure Modes
LLMs, even those subjected to extensive reinforcement learning from human feedback (RLHF) and safety alignment, remain susceptible to incremental contextual exploitation. A dominant class of attacks decomposes prohibited queries into sub-questions that are individually innocuous but, in composition, accumulate to a disallowed outcome in the final conversational turn (Zhou et al., 2024, Nihal et al., 9 Oct 2025).
Identified structural weaknesses:
- Pattern specificity: Robustness to one conversational pattern (e.g., hypothetical discussion) does not generalize to others (e.g., personal experience or educational pretext) (Nihal et al., 9 Oct 2025).
- Failure of single-turn guardrails: Models aligned via per-turn refusals fail to generalize across dialogue-level intent accumulation (Zhou et al., 2024).
- Role instability and guardrail misalignment: Automated red teaming indicates high error rates—premature refusals or role confusion—especially in multilingual deployments due to safety alignment asymmetries (Yuan et al., 6 Jan 2026).
- Adaptation to evolving tactics: Both genuine and simulated attackers display rapid adaptation, e.g., shifting from authority to rapport-building mid-dialogue, or leveraging multimedia/formatting obfuscation (Yao et al., 24 Dec 2025).
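The failure of per-turn guardrails against intent accumulation can be made concrete with a toy comparison: a per-turn threshold never fires on individually innocuous turns, while a dialogue-level accumulator does. All thresholds and the decay factor below are illustrative assumptions, not values from the cited work:

```python
def cumulative_risk_flag(turn_scores, per_turn_threshold=0.8,
                         cumulative_threshold=1.5, decay=0.9):
    """Flag a dialogue when decayed cumulative risk crosses a threshold,
    even when no single turn does. Thresholds/decay are illustrative."""
    running = 0.0
    for score in turn_scores:
        if score >= per_turn_threshold:       # single-turn guardrail
            return True
        running = running * decay + score     # dialogue-level accumulation
        if running >= cumulative_threshold:   # intent-accumulation guardrail
            return True
    return False
```

A sequence of moderate scores such as 0.4, 0.5, 0.6, 0.6 never exceeds the per-turn threshold, yet the accumulated risk crosses the dialogue-level threshold, mirroring how decomposed sub-questions compose into a disallowed outcome.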
5. Defense Architectures and Mitigation Techniques
Defensive frameworks have evolved from static moderation to dynamic, context-aware supervision:
- Temporal Context Awareness (TCA): Implements continuous semantic drift tracking, cross-turn intention consistency checks, and sliding-window risk scoring. Demonstrated 92% detection accuracy in simulated adversarial dialogues, outperforming single-turn and static-drift thresholds by 8–24 percentage points (Kulkarni et al., 18 Mar 2025).
- Pattern-Aware Filtering: Pattern Enhanced Chain of Attack (PE-CoA) identifies five conversational archetypes, revealing fine-grained model vulnerability profiles and motivating "pattern-aware" defense layers (Nihal et al., 9 Oct 2025).
- Persona-based and delegate architectures: Systems such as SE-OmniGuard (Kumarage et al., 18 Mar 2025) utilize personality-aware risk models and agentic microservices for event-driven feature extraction and multilevel aggregation.
- FedAvg and Differentially Private Learning: AI-in-the-Loop (Hossain et al., 4 Sep 2025) demonstrates that federated training with differential privacy maintains high engagement and effective detection (PII leakage ≤ 0.0085) while avoiding raw data centralization.
- Multimodal and channel coverage: OCR on embedded images, multi-platform connector architectures, and crossmodal fusion are essential for detection completeness (Yao et al., 24 Dec 2025, Acharya et al., 2024).
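The sliding-window semantic drift tracking used by TCA-style defenses can be sketched with cosine similarity between each new turn embedding and the mean of its preceding window. In practice the embeddings come from a sentence encoder; here they are plain vectors, and the mechanism shown is a minimal sketch of the sliding-window idea rather than the TCA implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def window_drift(turn_embeddings, window=3):
    """Semantic drift per step: 1 - cosine similarity between each new turn
    and the mean of the preceding window of turn embeddings."""
    drifts = []
    for i in range(1, len(turn_embeddings)):
        ctx = turn_embeddings[max(0, i - window):i]
        mean = [sum(col) / len(ctx) for col in zip(*ctx)]
        drifts.append(1.0 - cosine(turn_embeddings[i], mean))
    return drifts
```

A drift spike signals a topic pivot, e.g., from rapport-building small talk to a payment request, which a downstream risk scorer can weight alongside the escalation markers discussed earlier.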
Proposed guidelines for robust deployment include adversarial multi-turn benchmark fine-tuning, modular safety alignment decoupled from conversational context, escalation and pattern detector modules, runtime cumulative risk scoring, and cross-lingual dynamic testing.
6. Practical Applications and Empirical Findings
Scalable multi-turn engagement systems now power both research and real-world enforcement. CASE, deployed on GPay India, leverages Gemini LLM-based interviewers and extractors, achieving high topic adherence (99.9%), scam MO elicitation rates (75.3%), and robust safety compliance (99.9%). Chatterbox and ScamChatBot have executed thousands of decoy conversations, mapping scammer payment profiles and workflow milestones at scale (Spokoyny et al., 27 Oct 2025, Acharya et al., 2024). AI-in-the-Loop and Bot Wars have confirmed that dynamic agentic frameworks can both bait and stall adversaries, optimally balancing engagement, safety, and privacy through thresholded, harm-aware utility functions and guard model selection (Basta et al., 10 Mar 2025, Hossain et al., 4 Sep 2025). Dramatic context-induced vulnerability gaps have been documented: multi-turn context can increase attack success rates by over 30 percentage points in unsafe LLMs, exposing misalignment between intended and emergent conversational safety behaviors (Kumarappan et al., 24 Nov 2025).
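The thresholded, harm-aware utility functions that balance engagement against safety in these decoy systems can be sketched as a scalar trade-off with a hard veto. The weights and cap below are hypothetical parameters for illustration, not values from the cited systems:

```python
def engagement_utility(expected_intel, harm_risk,
                       alpha=1.0, beta=2.0, harm_cap=0.4):
    """Harm-aware utility for whether a decoy agent keeps engaging:
    reward expected intelligence, penalize harm risk, and hard-stop
    above a cap. alpha/beta/harm_cap are illustrative assumptions."""
    if harm_risk > harm_cap:
        return float("-inf")  # guard model vetoes further engagement
    return alpha * expected_intel - beta * harm_risk
```

The agent continues the conversation only while the utility stays positive, so a rising harm estimate first discounts and then terminates engagement regardless of how much scammer intelligence remains to be collected.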
7. Limitations, Open Challenges, and Future Directions
Despite measurable progress, critical limitations persist:
- Guardrail collapse and linguistic artifacts: Over-triggering of safety mechanisms (especially in Chinese), template artifacts, and coherence erosion disrupt both defensive fidelity and research evaluation (Yuan et al., 6 Jan 2026).
- Persona drift and prompt leakage: Maintaining stable simulation of victim and attacker roles across extended, cross-platform, and multimodal sessions remains challenging; model drift can yield spurious or invalid outputs (Spokoyny et al., 27 Oct 2025).
- Training and data scarcity: Current simulation and detection models depend on synthetic datasets, which suffer from domain drift and coverage gaps; broader real-world data collection and continual fine-tuning are flagged as future needs (Tan et al., 2024).
- Real-time and privacy-preserving constraints: Deployment in latency-sensitive, privacy-regulated, multilingual, and adversarially adaptive environments requires federated, privacy-aware, and robustly scalable architectures (Hossain et al., 4 Sep 2025).
- Benchmark development: Research consensus calls for multi-turn, cross-lingual adversarial benchmarks with persona and channel diversity, semantic context tracking, and escalation pattern cataloging as key ingredients for the next generation of defense models (Nihal et al., 9 Oct 2025, Yuan et al., 6 Jan 2026).
The rapidly evolving landscape of multi-turn conversational scams mandates layered, dynamically adaptive, and contextually aware defense frameworks, integrating advances in LLM adversarial simulation, psychological modeling, privacy engineering, and robust federated learning. These approaches will be critical to securing AI-empowered communication platforms against longitudinal social engineering and deception.