Answering Ambiguous Questions via Iterative Prompting

Published 8 Jul 2023 in cs.CL | (2307.03897v1)

Abstract: In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist. To provide feasible answers to an ambiguous question, one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity. An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers. In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions. Specifically, we integrate an answering model with a prompting model in an iterative manner. The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. Empirical studies on two commonly-used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and having a lower inference latency than competing approaches. Additionally, AmbigPrompt also performs well in low-resource settings. The code is available at: https://github.com/sunnweiwei/AmbigPrompt.

Summary

The paper introduces AmbigPrompt, an iterative prompting framework for generating multiple diverse and accurate answers to ambiguous questions in open-domain question answering.
It uses a retrospective prompting mechanism that conditions each new answer on the answers generated so far, aiming to improve relevance while keeping computational cost low.
Experiments show that AmbigPrompt achieves state-of-the-art or competitive results on AmbigQA with far fewer parameters and lower latency than competing approaches, and that it also performs well in low-resource settings.

Iterative Prompting for Ambiguous Multi-Answer Question Answering

Motivation and Problem Formulation

The paper addresses open-domain question answering (QA) in which the question is ambiguous and therefore admits multiple plausible answers, a common situation because real user questions are often underspecified. Existing approaches have complementary weaknesses: generating all answers in a single pass struggles to balance relevance and diversity, while gathering candidate answers from individual passages and aggregating them is computationally costly and tends to neglect dependencies among answers.

Ambiguous QA is formally cast as finding a set of multiple plausible answers $\mathcal{A}$ for a question $q$ given a large corpus $\Omega$. Passage retrieval yields evidence $\mathcal{C}$ from which $\mathcal{A}$ is inferred; a good answer set needs both precision (relevance) and recall (diversity).
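To make the precision/recall framing concrete, here is a minimal Python sketch of set-level precision, recall, and F1 between predicted and gold answers. It assumes exact match after SQuAD-style normalization and a single string per gold answer; the official AmbigQA F1 also handles answer aliases, so treat this as an illustration rather than the benchmark's evaluation script. The names `normalize` and `answer_set_prf` are hypothetical.

```python
# Simplified answer-set precision / recall / F1 for ambiguous QA.
# Assumption: exact match after light normalization; NOT the official
# AmbigQA metric, which additionally accounts for answer aliases.
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def answer_set_prf(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over normalized answer sets."""
    pred = {normalize(a) for a in predicted}
    ref = {normalize(a) for a in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    hits = len(pred & ref)
    precision = hits / len(pred)  # relevance: share of predictions that are valid
    recall = hits / len(ref)      # diversity: share of valid answers that were found
    if hits == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, predicting `["The Titanic", "Avatar"]` against gold `["titanic", "avatar", "Aliens"]` gives precision 1.0 and recall 2/3: every prediction is relevant, but one valid answer is missed.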

Figure 1: An illustration of an open-domain question, its supporting Wikipedia passages, and the range of valid answers.

Methodology: AmbigPrompt Architecture

The main contribution is AmbigPrompt, an iterative prompting framework comprising an encoder-decoder answering model (FiD architecture) and a prompting model that shares its parameters. Rather than generating all answers at once, AmbigPrompt alternates between generating prompts conditioned on the answers produced so far and composing a new answer, growing the answer set one answer at a time.

Figure 2: AmbigPrompt's workflow interleaves prompt generation based on prior answers and answer generation, appending each new answer to the output set.

This iterative loop is implemented via a retrospective prompting mechanism, where the prompting model generates continuous prompts $\mathbf{E}$ by cross-attending to prior answers and the context. The FiD encoder then prepends $\mathbf{E}$ to its attention layers, with the decoder producing subsequent answers conditioned on both context and introspective prompts.
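The toy sketch below illustrates the general idea of answer-conditioned continuous prompts: a few prompt query vectors cross-attend over encoded states of the previous answers and the context, and the resulting vectors $\mathbf{E}$ are prepended to the sequence an attention layer operates on. It is a simplification under stated assumptions (single head, no learned projections, plain Python lists, hypothetical names `build_prompts` and `prepend_prompts`), not the paper's implementation.

```python
# Toy sketch of answer-conditioned continuous prompts (not the paper's code).
# Assumptions (hypothetical): learned prompt queries are given, previous
# answers and context are already encoded as vectors, and attention is
# single-head scaled dot-product without projection matrices.
import math

Vector = list[float]


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: list[Vector], memory: list[Vector]) -> list[Vector]:
    """Each query returns an attention-weighted average of the memory vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * mi for qi, mi in zip(q, m)) / math.sqrt(d) for m in memory]
        weights = softmax(scores)
        out.append([sum(w * m[j] for w, m in zip(weights, memory)) for j in range(d)])
    return out


def build_prompts(prompt_queries: list[Vector], answer_states: list[Vector],
                  context_states: list[Vector]) -> list[Vector]:
    # E: one vector per prompt query, summarizing prior answers and context.
    # For the first answer, answer_states is simply empty.
    return cross_attend(prompt_queries, answer_states + context_states)


def prepend_prompts(prompts: list[Vector], layer_states: list[Vector]) -> list[Vector]:
    # Prefix-style conditioning: prompt vectors are placed in front of the
    # sequence that an encoder attention layer attends over.
    return prompts + layer_states
```

In the real model the queries, keys, and values would pass through learned projections in every layer; the sketch keeps only the data flow from previous answers to $\mathbf{E}$ to the attention input.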

Figure 3: Retrospective prompting mechanism details, showing the cross-attention construction of prompting vectors $\mathbf{E}$ with answer context.

Iteration stops once the answering model produces the "End of Iteration" token, so the model itself decides how many answers to return; together with conditioning on earlier answers, this avoids generating repeated answers.
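The control flow of the iterative loop, including the stopping rule, can be sketched as follows. `answer_step` is a hypothetical callable standing in for one round of prompt construction plus answer decoding, `<eoi>` is a placeholder spelling for the "End of Iteration" token, and the `max_iters` cap is an added safeguard not described in the source.

```python
# Sketch of the iterative answer loop (hypothetical interface, not the
# released AmbigPrompt code).
from typing import Callable, Sequence

EOI = "<eoi>"  # placeholder spelling for the "End of Iteration" token


def iterative_answers(
    question: str,
    passages: Sequence[str],
    answer_step: Callable[[str, Sequence[str], list[str]], str],
    max_iters: int = 10,
) -> list[str]:
    """Call answer_step repeatedly, feeding back all answers so far."""
    answers: list[str] = []
    for _ in range(max_iters):  # safety cap in case EOI is never produced
        new = answer_step(question, passages, answers)
        if new == EOI:          # the model signals that the answer set is complete
            break
        answers.append(new)
    return answers


def toy_step(question: str, passages: Sequence[str], previous: list[str]) -> str:
    # Hypothetical stand-in for the model: emit the next unseen candidate,
    # then the termination token once every candidate has been produced.
    candidates = ["1997", "2001"]
    remaining = [c for c in candidates if c not in previous]
    return remaining[0] if remaining else EOI


print(iterative_answers("When was X released?", [], toy_step))  # ['1997', '2001']
```

Because each call receives the full list of previous answers, the step function can condition on what has already been said, which is the property the retrospective prompts are meant to provide.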

Optimization and Task-Adaptive Pretraining

AmbigPrompt's parameters are optimized in two stages:

Task-adaptive post-pretraining: Multi-answer QA data is synthesized from single-answer datasets by generating pseudo-answers with an auxiliary reader. The model is trained to predict answers conditioned on varying sets of prior answers, making it robust to answer dependencies and ordering.
Prompt-based fine-tuning: On annotated multi-answer datasets, answers are shuffled, and the model is explicitly trained to output the termination token to stop iteration.
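A minimal sketch of how training pairs for the two stages might be constructed, assuming each example maps a question and a list of previous answers to the next target answer. The helper names, the `<eoi>` token spelling, and the uniform sampling of the prior-answer subset are assumptions for illustration; the paper's exact data recipe may differ.

```python
# Sketch of (previous answers -> next target) training pairs.
# Hypothetical helpers; the paper's exact recipe may differ.
import random

EOI = "<eoi>"  # placeholder spelling for the "End of Iteration" token


def finetune_examples(question: str, gold_answers: list[str],
                      rng: random.Random) -> list[dict]:
    """Shuffle gold answers, emit one example per step, then a final EOI step."""
    order = list(gold_answers)
    rng.shuffle(order)  # answer order carries no meaning, so randomize it
    examples = []
    for i in range(len(order) + 1):
        target = order[i] if i < len(order) else EOI
        examples.append({"question": question, "previous": order[:i], "target": target})
    return examples


def pretrain_example(question: str, pseudo_answers: list[str],
                     rng: random.Random) -> dict:
    """Condition on a random subset of pseudo-answers; predict a held-out one.

    Assumes pseudo_answers is non-empty and contains no duplicates.
    """
    k = rng.randrange(len(pseudo_answers))  # 0 .. n-1 previous answers
    previous = rng.sample(pseudo_answers, k)
    remaining = [a for a in pseudo_answers if a not in previous]
    return {"question": question, "previous": previous, "target": rng.choice(remaining)}
```

Shuffling at fine-tuning time and sampling variable-sized prior sets at post-pretraining time both expose the model to many different orderings, which matches the stated goal of robustness to answer ordering.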

This two-stage procedure is intended to help the model generalize to multi-answer settings, including low-resource scenarios.

Experimental Evaluation and Results

AmbigPrompt is evaluated on the AmbigQA and WebQSP benchmarks. On AmbigQA, it achieves an F1 of 48.7 on the full set and 38.8 on the multi-answer subset, outperforming comparable baselines such as FiD and Refuel in both accuracy and efficiency. AmbigPrompt uses only 220M parameters, far fewer than high-capacity models (e.g., RECTIFY at 6B), yet matches or surpasses them while showing substantially lower latency and memory footprint.

Figure 4: (a) Latency (log scale) vs. F1 on AmbigQA, demonstrating AmbigPrompt's superior performance-resource profile. (b) Dataset size vs. F1, showing robust performance in low-resource settings.

AmbigPrompt also holds up when trained on limited data, retaining strong performance relative to baselines as the training set shrinks. Ablation studies highlight the importance of task-adaptive pretraining, answer-conditional prompting, and interleaving cross-attention; removing any one of these components degrades performance substantially.

Figure 5: Comparative analysis of F1, Precision, Recall, and average answer count for AmbigPrompt versus FiD variants.

AmbigPrompt manages the relevance-diversity trade-off more effectively than FiD-multi, achieving higher precision without sacrificing recall and producing a moderate number of plausible answers per question rather than over- or under-generating.

Practical Implications and Theoretical Insights

AmbigPrompt demonstrates that iterative prompting with lightweight models enables efficient and accurate multi-answer QA without resorting to resource-intensive architectures. Because answer generation is conditioned on introspective prompts built from earlier answers, the architecture models dependencies among answers directly, encouraging both diversity and relevance.

The approach could potentially be extended to low-resource languages or domains where multi-answer annotation is sparse, since its prompting mechanism elicits knowledge from pre-trained models. Integrating chain-of-thought prompting or scaling to larger LLMs may further improve performance, particularly in reasoning-intensive or complex QA settings.

Conclusion

AmbigPrompt introduces a prompt-guided, iterative answer generation framework that addresses ambiguous open-domain questions by leveraging answer-conditioned continuous prompts and task-adaptive pretraining. The design matches or outperforms both low- and high-capacity baselines with markedly reduced computational requirements. The iterative prompting paradigm offers a scalable, resource-efficient direction for future multi-answer QA systems and may inform broader prompting-based strategies for complex generative tasks.
