- The paper introduces a novel tool that classifies texts into four fine-grained categories, capturing degrees of LLM intervention.
- The methodology fine-tunes state-of-the-art models like DeBERTa on a diverse, multi-domain dataset to achieve high accuracy.
- The enhanced detection system aids in maintaining content authenticity in education, research, and forensic applications.
Fine-Grained Machine-Generated Text Detection with LLM-DetectAIve
The rapid proliferation of advanced LLMs such as GPT-4, Claude-3.5, Gemini-1.5, and Llama-70b has led to an increase in machine-generated texts (MGTs) across various domains. This surge presents significant challenges, particularly in differentiating between human-authored and machine-generated texts and in maintaining the integrity of textual content. The paper "LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection" introduces a novel system, LLM-DetectAIve, designed to address these challenges through fine-grained text classification, thereby offering new insights into the degrees of LLM intervention during text creation.
Methodology
LLM-DetectAIve classifies texts into four distinct categories:
- Human-Written (HW): Text created solely by a human without any AI assistance.
- Machine-Generated (MG): Text entirely produced by a machine without human intervention.
- Machine-Written Machine-Humanized (MW-MH): Text initially generated by a machine and then subtly modified to appear more human-like.
- Human-Written Machine-Polished (HW-MP): Text written by a human and subsequently refined or polished by a machine.
The inclusion of two categories beyond conventional binary detection provides a more nuanced picture of the extent of LLM involvement in text production.
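Operationally, the four labels amount to a standard four-way classification head. A minimal sketch of decoding a detector's output logits into these labels (the logit values and function names here are illustrative, not taken from the paper):

```python
import math

# The four fine-grained labels used by LLM-DetectAIve.
LABELS = [
    "human-written",                       # HW
    "machine-generated",                   # MG
    "machine-written, machine-humanized",  # MW-MH
    "human-written, machine-polished",     # HW-MP
]

def softmax(logits):
    """Convert raw classifier logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Map a 4-way logit vector to (predicted label, confidence)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]

# Illustrative logits for one input text.
label, conf = decode([0.2, 2.9, 0.4, 1.1])
```

In practice the logits would come from a fine-tuned encoder's classification head; the decoding step is the same regardless of which backbone produced them.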
Dataset and Experimental Setup
A comprehensive dataset was curated and extended to cover the four classification labels across diverse domains, including arXiv, Wikihow, Wikipedia, Reddit, student essays (OUTFOX), and peer reviews (PeerRead). The dataset comprised 79,220 human-written texts and 103,075 machine-generated texts, further expanded with outputs from additional LLMs under varied prompts to populate the two mixed classes (MW-MH and HW-MP).
Fine-tuning was conducted on state-of-the-art encoder models, including RoBERTa, DeBERTa, and DistilBERT, using this multi-way labeled dataset. Domain-specific detectors were evaluated first and achieved high accuracy within their own domains; to improve cross-domain generalization, a single universal detector was then trained across all domains, achieving strong performance.
Results
Experiments showed that DeBERTa consistently outperformed RoBERTa across all metrics on the universal dataset, reaching an accuracy of 95.71%. Domain adversarial neural networks (DANNs) were then employed to improve cross-domain performance, raising overall accuracy to 96.06% with RoBERTa.
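The core mechanism behind a DANN is the gradient reversal layer: it passes features through unchanged on the forward pass, but negates (and scales) the gradients flowing back from a domain classifier, pushing the encoder toward domain-invariant features. A minimal pure-Python sketch of that mechanism (real implementations register this as a custom autograd function inside the training framework; the names and the scaling factor `lam` here are illustrative):

```python
def grl_forward(features):
    """Forward pass: the gradient reversal layer is the identity."""
    return features

def grl_backward(upstream_grads, lam=1.0):
    """Backward pass: negate and scale by lam the gradients coming
    from the domain classifier before they reach the feature encoder,
    so the encoder learns features that confuse the domain classifier."""
    return [-lam * g for g in upstream_grads]
```

During training, the task-classifier gradients update the encoder normally, while the domain-classifier gradients arrive through `grl_backward` with their sign flipped, which is what encourages domain-invariant representations.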
The system was compared against existing MGT detection tools like GPTZero, ZeroGPT, and Sapling AI. LLM-DetectAIve outperformed these systems, achieving 97.50% accuracy, highlighting its robustness and reliability in fine-grained detection.
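The accuracy figures quoted throughout are plain multi-class accuracy: the fraction of texts whose predicted label matches the gold label. A small self-contained helper (the labels below are hypothetical examples, not the paper's data):

```python
def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted label matches the gold label."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical gold labels and detector predictions over six texts.
gold = ["HW", "MG", "MW-MH", "HW-MP", "MG", "HW"]
pred = ["HW", "MG", "MW-MH", "MG",    "MG", "HW"]
acc = accuracy(gold, pred)  # 5 of 6 correct
```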
Implications and Future Work
The introduction of LLM-DetectAIve has substantial implications for maintaining the integrity and authenticity of textual content in various fields such as education and academic research. By providing fine-grained detection, it enables a more accurate assessment of text origin, which is crucial for ensuring fair evaluations in educational settings and aiding in authorship detection in forensic investigations.
Future work may refine the DANN setup for even more robust detection. Applying DANNs across text generators rather than across domains could further improve generalization to unseen generators. Another planned extension is a fifth category, machine-written then human-edited, to cover more complex real-world scenarios.
Conclusion
LLM-DetectAIve presents a significant step forward in the domain of machine-generated text detection by offering a fine-grained classification system. This tool is particularly valuable for applications requiring high integrity and authenticity of textual content, promising to support fair academic evaluations and bolster trust in textual data across multiple domains. Future developments are anticipated to further refine and expand the capabilities of this innovative detection system.