Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content

Published 20 Mar 2025 in cs.CL | (2503.16031v3)

Abstract: In the evolving landscape of online discourse, misinformation increasingly adopts humorous tones to evade detection and gain traction. This work introduces Deceptive Humor as a novel research direction, emphasizing how false narratives, when coated in humor, can become more difficult to detect and more likely to spread. To support research in this space, we present the Deceptive Humor Dataset (DHD) a collection of humor-infused comments derived from fabricated claims using the ChatGPT-4o model. Each entry is labeled with a Satire Level (from 1 for subtle satire to 3 for overt satire) and categorized into five humor types: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans English, Telugu, Hindi, Kannada, Tamil, and their code-mixed forms, making it a valuable resource for multilingual analysis. DHD offers a structured foundation for understanding how humor can serve as a vehicle for the propagation of misinformation, subtly enhancing its reach and impact. Strong baselines are established to encourage further research and model development in this emerging area.