
Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text

Published 23 Apr 2025 in cs.CL and cs.AI | (2504.16913v1)

Abstract: In recent years, the detection of AI-generated text has become a critical area of research due to concerns about academic integrity, misinformation, and ethical AI deployment. This paper presents COT Fine-tuned, a novel framework for detecting AI-generated text and identifying the specific LLM responsible for generating the text. We propose a dual-task approach, where Task A involves classifying text as AI-generated or human-written, and Task B identifies the specific LLM behind the text. The key innovation of our method lies in the use of Chain-of-Thought reasoning, which enables the model to generate explanations for its predictions, enhancing transparency and interpretability. Our experiments demonstrate that COT Fine-tuned achieves high accuracy in both tasks, with strong performance in LLM identification and human-AI classification. We also show that the CoT reasoning process contributes significantly to the model's effectiveness and interpretability.

Summary

AI-Generated Text Detection: Tracing Thought

The paper "OSINT at CT2 - AI-Generated Text Detection: Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text" by Shifali Agrahari and Sanasam Ranbir Singh addresses a pertinent issue in the field of natural language processing: the detection and classification of AI-generated text. This research is essential due to growing concerns regarding misinformation, academic integrity, and ethical AI deployment. The study presents a novel framework, COT_Finetuned, which offers a dual-task approach for not only determining if a text is AI-generated but also identifying the specific language model responsible for its creation.

Methodology

The authors propose a dual-task approach comprising two distinct yet interconnected tasks. Task A involves binary classification to discern whether a given text document is AI-generated or human-written. Task B extends this to identifying which specific LLM has generated the text, focusing on models like GPT-4, DeBERTa, and others. A key innovation of this framework is the implementation of Chain-of-Thought (CoT) reasoning, which enhances the transparency and interpretability of the model's predictions by generating explanations alongside them.
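The paper does not publish its prediction interface, but the dual-task setup described above can be sketched as a single prediction carrying a Task A label, a Task B label, and a CoT explanation. The field names, the heuristic stand-in for the fine-tuned model, and the marker phrase below are all illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DualTaskPrediction:
    is_ai_generated: bool      # Task A: AI-generated vs. human-written
    source_llm: Optional[str]  # Task B: which LLM (None for human text)
    explanation: str           # CoT rationale accompanying the prediction

def predict(text: str) -> DualTaskPrediction:
    # Trivial heuristic standing in for the fine-tuned model, used only to
    # illustrate the two-task output contract, not the paper's classifier.
    ai_like = "as an ai language model" in text.lower()
    return DualTaskPrediction(
        is_ai_generated=ai_like,
        source_llm="GPT-4" if ai_like else None,
        explanation=("Contains a stock LLM disclaimer phrase." if ai_like
                     else "No AI-typical stylistic markers found."),
    )
```

In this shape, Task B's output is conditioned on Task A: a text classified as human-written has no source LLM, mirroring the interconnection between the two tasks.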

The technical methodology involves fine-tuning a model using a dataset labeled for both tasks, incorporating a combined loss function to optimize classification accuracy and interpretability. The inclusion of CoT reasoning allows for a structured explanation of stylistic choices and decision-making processes unique to different LLMs.
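The exact form of the combined loss is not given in this summary. A common choice for such dual-task fine-tuning, assumed here purely for illustration, is a weighted sum of per-task cross-entropy losses, L = L_A + λ·L_B:

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def combined_loss(probs_a, target_a, probs_b, target_b, lam=1.0):
    """Assumed joint objective L = L_A + lam * L_B, where L_A scores the
    human-vs-AI head (Task A) and L_B the LLM-identification head (Task B).
    The weighting lam trades off the two tasks during fine-tuning."""
    return cross_entropy(probs_a, target_a) + lam * cross_entropy(probs_b, target_b)
```

For example, with Task A probabilities [0.2, 0.8] (true class 1) and Task B probabilities [0.1, 0.7, 0.2] (true class 1), the joint loss is -log(0.8) - λ·log(0.7); confident, correct predictions on both heads drive the loss toward zero.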

Results and Analysis

The paper reports a significant improvement in classification accuracy attributable to the CoT reasoning process. COT_Finetuned performs well on both tasks, outperforming traditional methods both at identifying AI-generated text and at determining its source LLM. The reported metrics indicate robust performance, underscoring the method's applicability in real-world settings.

Implications and Future Directions

This research has notable implications for content moderation and academic integrity, offering a system that can reliably differentiate between human and machine-generated content. The transparency afforded by CoT reasoning is significant for ethical AI deployment, providing insights into the algorithmic decision-making that can be crucial for users and policymakers alike.

The paper opens several avenues for future research. The distinction of specific stylistic patterns among various LLMs suggests potential advancements in the personalization and customization of AI models for diverse applications. Additionally, the approach encourages further exploration into CoT reasoning, which could enrich other domains of AI explainability.

In conclusion, this study contributes to the field by enhancing our understanding of, and methodology for, detecting AI-generated text. Its dual-task framework not only provides clarity on the authorship of AI content but also supports the responsible deployment of AI technologies across domains, aligning with ethical standards and strengthening content authenticity.
