- The paper introduces a novel tool that classifies texts into four fine-grained categories, capturing degrees of LLM intervention.
- The methodology fine-tunes state-of-the-art models like DeBERTa on a diverse, multi-domain dataset to achieve high accuracy.
- The enhanced detection system aids in maintaining content authenticity in education, research, and forensic applications.
Fine-Grained Machine-Generated Text Detection with LLM-DetectAIve
The rapid proliferation of advanced LLMs such as GPT-4, Claude-3.5, Gemini-1.5, and Llama-70b has led to an increase in machine-generated texts (MGTs) across various domains. This surge presents significant challenges, particularly in differentiating between human-authored and machine-generated texts and in maintaining the integrity of textual content. The paper "LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection" introduces a novel system, LLM-DetectAIve, designed to address these challenges through fine-grained text classification, thereby offering new insights into the degrees of LLM intervention during text creation.
Methodology
LLM-DetectAIve classifies texts into four distinct categories:
- Human-Written (HW): Text created solely by a human without any AI assistance.
- Machine-Generated (MG): Text entirely produced by a machine without human intervention.
- Machine-Written Machine-Humanized (MW-MH): Text initially generated by a machine and then subtly modified to appear more human-like.
- Human-Written Machine-Polished (HW-MP): Text written by a human and subsequently refined or polished by a machine.
The inclusion of two categories beyond conventional binary detection provides a more nuanced picture of the extent of LLM involvement in text production.
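Operationally, the four labels amount to a standard four-way classification head. A minimal sketch of decoding a detector's output logits into these labels (the logit values and function names here are illustrative, not taken from the paper):

```python
import math

# The four fine-grained labels used by LLM-DetectAIve.
LABELS = [
    "human-written",                       # HW
    "machine-generated",                   # MG
    "machine-written, machine-humanized",  # MW-MH
    "human-written, machine-polished",     # HW-MP
]

def softmax(logits):
    """Convert raw classifier logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Map a 4-way logit vector to (predicted label, confidence)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]

# Illustrative logits for one input text.
label, conf = decode([0.2, 2.9, 0.4, 1.1])
```

In practice the logits would come from a fine-tuned encoder's classification head; the decoding step is the same regardless of which backbone produced them.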
Dataset and Experimental Setup
A comprehensive dataset was curated and extended to cover the four classification labels across diverse domains, including arXiv, Wikihow, Wikipedia, Reddit, student essays (OUTFOX), and peer reviews (PeerRead). The dataset comprised 79,220 human-written texts and 103,075 machine-generated texts, further expanded with outputs from additional LLMs under varied prompts to populate the two mixed classes (MW-MH and HW-MP).
Fine-tuning was conducted on state-of-the-art encoder models, including RoBERTa, DeBERTa, and DistilBERT, using this multi-way labeled dataset. Domain-specific detectors were evaluated first and achieved high accuracy within their own domains; to improve cross-domain generalization, a single universal detector was then trained across all domains, achieving strong performance.
Results
Experiments showed that DeBERTa consistently outperformed RoBERTa across all metrics on the universal dataset, reaching an accuracy of 95.71%. Domain adversarial neural networks (DANNs) were then employed to improve cross-domain performance, raising overall accuracy to 96.06% with RoBERTa.
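The core mechanism behind a DANN is the gradient reversal layer: it passes features through unchanged on the forward pass, but negates (and scales) the gradients flowing back from a domain classifier, pushing the encoder toward domain-invariant features. A minimal pure-Python sketch of that mechanism (real implementations register this as a custom autograd function inside the training framework; the names and the scaling factor `lam` here are illustrative):

```python
def grl_forward(features):
    """Forward pass: the gradient reversal layer is the identity."""
    return features

def grl_backward(upstream_grads, lam=1.0):
    """Backward pass: negate and scale by lam the gradients coming
    from the domain classifier before they reach the feature encoder,
    so the encoder learns features that confuse the domain classifier."""
    return [-lam * g for g in upstream_grads]
```

During training, the task-classifier gradients update the encoder normally, while the domain-classifier gradients arrive through `grl_backward` with their sign flipped, which is what encourages domain-invariant representations.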
The system was compared against existing MGT detection tools like GPTZero, ZeroGPT, and Sapling AI. LLM-DetectAIve outperformed these systems, achieving 97.50% accuracy, highlighting its robustness and reliability in fine-grained detection.
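The accuracy figures quoted throughout are plain multi-class accuracy: the fraction of texts whose predicted label matches the gold label. A small self-contained helper (the labels below are hypothetical examples, not the paper's data):

```python
def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted label matches the gold label."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical gold labels and detector predictions over six texts.
gold = ["HW", "MG", "MW-MH", "HW-MP", "MG", "HW"]
pred = ["HW", "MG", "MW-MH", "MG",    "MG", "HW"]
acc = accuracy(gold, pred)  # 5 of 6 correct
```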
Implications and Future Work
The introduction of LLM-DetectAIve has substantial implications for maintaining the integrity and authenticity of textual content in various fields such as education and academic research. By providing fine-grained detection, it enables a more accurate assessment of text origin, which is crucial for ensuring fair evaluations in educational settings and aiding in authorship detection in forensic investigations.
Future work may refine the DANN setup for even more robust detection. Applying DANNs across text generators rather than across domains could further improve generalization to unseen generators. Another planned extension is a fifth category, machine-written then human-edited, to cover more complex real-world scenarios.
Conclusion
LLM-DetectAIve presents a significant step forward in the domain of machine-generated text detection by offering a fine-grained classification system. This tool is particularly valuable for applications requiring high integrity and authenticity of textual content, promising to support fair academic evaluations and bolster trust in textual data across multiple domains. Future developments are anticipated to further refine and expand the capabilities of this innovative detection system.