Active Prompting with Chain-of-Thought for Large Language Models

Published 23 Feb 2023 in cs.CL | (2302.12246v5)

Abstract: The increasing scale of LLMs brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries. By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty so as to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.

Summary

The paper introduces Active-Prompt, a novel method that integrates uncertainty-based active learning with human-designed chain-of-thought reasoning to optimize task-specific prompts for large language models.
It uses uncertainty metrics such as disagreement and entropy to select the most uncertain queries for human annotation, reducing reliance on fixed human exemplars.
Evaluations on eight reasoning tasks, including arithmetic and commonsense challenges, demonstrate significant improvements over traditional chain-of-thought methods.

Active Prompting with Chain-of-Thought for LLMs

LLMs have shown strong performance on a range of complex reasoning tasks, including arithmetic, commonsense, and symbolic reasoning. A crucial factor in harnessing these capabilities is the design of task-specific prompts that guide the model toward accurate outputs. This paper introduces "Active-Prompt," a method that adapts LLMs to different reasoning tasks by choosing which task-specific questions to annotate with human-designed chain-of-thought (CoT) reasoning, so that those annotated questions can serve as few-shot exemplars.
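As a rough illustration of example-based CoT prompting, the sketch below places annotated exemplars (question, human-written rationale, answer) ahead of a new question. The `Exemplar` class, the `build_cot_prompt` function, and the "Q:/A: ... The answer is ..." layout are hypothetical choices made for illustration, not the paper's exact prompt format.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """Hypothetical container for one human-annotated CoT example."""
    question: str
    rationale: str  # human-written chain-of-thought
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], query: str) -> str:
    """Concatenate annotated exemplars in front of the test question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The model is expected to continue generating after the final "A:".
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


demo = [
    Exemplar(
        question="Tom has 3 apples and buys 2 more. How many apples does he have?",
        rationale="Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5.",
        answer="5",
    )
]
print(build_cot_prompt(demo, "A box holds 4 pens. How many pens are in 3 boxes?"))
```

Swapping in a different set of annotated exemplars changes the prompt without touching the query, which is the lever Active-Prompt adjusts per task.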

The standard approach to chain-of-thought prompting uses a fixed set of human-annotated exemplars. Because these exemplars are curated once and reused without regard to task-specific nuances, they are not necessarily the most effective examples for a given task. To address this limitation, the authors propose Active-Prompt, which selects the most informative questions to annotate from a pool of task-specific queries, so that the exemplars used in the prompt are tailored to the task.
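The selection step can be sketched as ranking a pool of candidate questions by an uncertainty score and keeping the top n for human annotation. This is a minimal sketch assuming uncertainty is estimated from k answers sampled per question; `select_for_annotation`, `fake_sampler`, the canned answers, and the defaults k=5 and n=2 are all hypothetical, and a real setup would replace the fake sampler with repeated LLM calls using CoT prompts.

```python
from typing import Callable, Sequence


def select_for_annotation(
    pool: Sequence[str],
    sample_answers: Callable[[str, int], list[str]],
    uncertainty: Callable[[list[str]], float],
    k: int = 5,
    n: int = 2,
) -> list[str]:
    """Return the n questions in `pool` whose sampled answers are most uncertain."""
    # Score every question from k sampled answers (hypothetical sampler).
    scored = [(uncertainty(sample_answers(q, k)), i, q) for i, q in enumerate(pool)]
    # Highest uncertainty first; ties fall back to original pool order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [q for _, _, q in scored[:n]]


# Canned answers standing in for k = 5 CoT samples from an LLM.
FAKE_SAMPLES = {
    "q1": ["5", "5", "5", "5", "5"],
    "q2": ["3", "7", "3", "9", "4"],
    "q3": ["2", "2", "8", "2", "2"],
}


def fake_sampler(question: str, k: int) -> list[str]:
    return FAKE_SAMPLES[question][:k]


def disagreement(answers: list[str]) -> float:
    # Fraction of distinct answers among the samples.
    return len(set(answers)) / len(answers)


print(select_for_annotation(["q1", "q2", "q3"], fake_sampler, disagreement))
# prints ['q2', 'q3']
```

With these canned answers, q2 (four distinct answers out of five) and q3 (two distinct answers) are chosen, while q1, on which all samples agree, is skipped and left unannotated.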

The core of Active-Prompt is the mechanism for choosing which questions to annotate, which borrows ideas from uncertainty-based active learning. The method introduces several metrics for estimating how uncertain the LLM's predictions on a question are: disagreement, entropy, variance, and self-confidence. The most uncertain questions are then passed to human annotators, who write chain-of-thought rationales for them, and these annotated examples serve as task-specific exemplars in the prompt at inference time.
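The sketch below shows plausible formulations of three of these metrics, each computed from k answers sampled for a single question: disagreement as the fraction of distinct answers, entropy of the empirical answer distribution, and sample variance for numeric answers. These definitions are assumptions made for illustration, not a reproduction of the paper's code. Self-confidence is omitted because it requires prompting the model to rate its own answer.

```python
import math
from collections import Counter
from statistics import variance as sample_variance


def disagreement(answers: list[str]) -> float:
    """Fraction of distinct answers among the k samples (assumed formulation)."""
    return len(set(answers)) / len(answers)


def entropy(answers: list[str]) -> float:
    """Shannon entropy (natural log) of the empirical answer distribution."""
    k = len(answers)
    return -sum((c / k) * math.log(c / k) for c in Counter(answers).values())


def answer_variance(answers: list[float]) -> float:
    """Sample variance of numeric answers; needs at least two samples."""
    return sample_variance(answers)


a = ["A", "A", "B", "B"]  # answers split evenly
b = ["A", "A", "A", "B"]  # answers mostly agree
print(disagreement(a), disagreement(b))              # 0.5 0.5
print(round(entropy(a), 3), round(entropy(b), 3))    # 0.693 0.562
print(answer_variance([10.0, 12.0, 14.0]))           # 4.0
```

The two answer sets tie on disagreement but not on entropy, which is one reason different metrics can rank the same pool of questions differently.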

Experimental evaluations on eight complex reasoning tasks demonstrated the efficacy of Active-Prompt. The method achieved state-of-the-art results, significantly outperforming existing approaches, including standard chain-of-thought prompting and self-consistency. The evaluation covered a range of reasoning challenges, with particular emphasis on arithmetic and commonsense reasoning.

For instance, on arithmetic reasoning benchmarks such as GSM8K and AQuA, Active-Prompt improved results by selecting the questions on which the model was most uncertain for annotation. In commonsense tasks, where answers are more variable and ambiguous, entropy-based selection proved advantageous.

The implications of this research are multifaceted:

Practically, Active-Prompt reduces dependence on generic hand-curated exemplars by using a structured strategy to identify and annotate only the most informative questions, which keeps human annotation effort focused and small.
Theoretically, the study deepens the understanding of how active learning principles can be combined with in-context learning in large models, pointing toward prompting methods that adapt their exemplars to the task and to the capabilities of the model.

Future research could further explore the interplay between uncertainty and diversity in exemplar selection, and could investigate whether the selection step itself can be learned, for example with trained selector models or meta-learning frameworks, to extend LLM efficiency and performance to new domains.

In conclusion, Active-Prompt is a significant step toward improving LLM reasoning through better exemplar selection, and its techniques could be useful in the continued development of adaptive natural language understanding systems.