DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers

Published 28 May 2025 in cs.IR | (2505.22584v1)

Abstract: Rerankers play a critical role in multimodal Retrieval-Augmented Generation (RAG) by refining ranking of an initial set of retrieved documents. Rerankers are typically trained using hard negative mining, whose goal is to select pages for each query which rank high, but are actually irrelevant. However, this selection process is typically passive and restricted to what the retriever can find in the available corpus, leading to several inherent limitations. These include: limited diversity, negative examples which are often not hard enough, low controllability, and frequent false negatives which harm training. Our paper proposes an alternative approach: Single-Page Hard Negative Query Generation, which goes the other way around. Instead of retrieving negative pages per query, we generate hard negative queries per page. Using an automated LLM-VLM pipeline, and given a page and its positive query, we create hard negatives by rephrasing the query to be as similar as possible in form and context, yet not answerable from the page. This paradigm enables fine-grained control over the generated queries, resulting in diverse, hard, and targeted negatives. It also supports efficient false negative verification. Our experiments show that rerankers trained with data generated using our approach outperform existing models and significantly improve retrieval performance.

Abstract PDF Upgrade to Chat

Authors (6)

Summary

The paper presents a novel single-page hard negative query generation approach that overcomes limitations of traditional document-level mining.
It leverages a dual-stage pipeline using Pixtral-12B and Qwen models to generate and verify both positive and challenging unanswerable queries.
The method demonstrates significant precision improvements in benchmarks like ViDoReV2 and Real-MM-RAG, with tailored applications in finance.

The paper "DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers" presents a novel approach to improving the efficacy of rerankers within the domain of Multi-Modal Retrieval-Augmented Generation (RAG). The focus is on a method for generating hard negative examples that refines the reranking process and enhances the retrieval quality of multimodal documents.

Introduction

The challenge in Retrieval-Augmented Generation frameworks lies in augmenting text with supplementary visual or structural elements while maintaining precision. First-stage retrieval models typically rely on embedding similarities that may falter in visually complex arenas. To address this, the study introduces a reranking strategy designed to optimize results beyond this initial retrieval.

Traditional methods for hard negative mining involve retrieving negative pages from the corpus, which tends to be passive, constrained by corpus content, and resource-heavy. This research reverses that process by initiating Single-Page Hard Negative Query Generation. Instead of focusing on negative pages, it produces queries that remain unanswerable based on current documents yet closely mimic positive iterations, thus broadening the difficulty and variety of training data.

Figure 1: Proposed Single-Page Hard Negative Query Generation Approach.

Methodology

Query Generation Pipeline

Positive Query Generation: Using Pixtral-12B VLM, the pipeline first generates candidate queries that may logically be derived from document content.
Verification: Employing Qwen2.5-VL-7B-Instruct VLM, each query's potential to be directly answered by the document is tested, ensuring robustness through dual prompt strategies.
Figure 2: Query Verification Prompt.
Hard Negative Query Generation: The approach uses Qwen2.5-7B-Instruct LLM to derive negative queries that maintain structural resemblance to positives but comprise unanswerable elements—culminating in a robust, controllable data set for reranking improvements.
Figure 3: Negative Generation Prompt.

Figure 4: Examples of Our Generated Negative Queries.

Finance-Specific Adaptation

Leveraging data variability inherent in financial documents, a finance-tailored dataset was constructed. This set employs fine-grained negative query generation to tackle prevalent challenges, such as distinguishing temporal identifiers or financial metrics.

Figure 5: Finance Negative Generation Prompt.

Results and Evaluation

Datasets and Benchmarks

DocReRank's effectiveness was validated across benchmarks like ViDoReV2 and Real-MM-RAG, emphasizing its retrieval and precision capabilities. The datasets harnessed include Col-HNDoc for document mining and novel datasets like Col-HNQue for procedural negatives.

Model Training

A multimodal reranker based on Qwen2-VL-2B-Instruct, DocReRank capitalizes on LoRA to maintain relevance throughout document evaluation and query matching. Performance enhancements were pronounced when DocReRank data was introduced, eclipsing models reliant solely on document-level negatives.

The reranker demonstrates superior learning, employing tripartite (query, image, label) training parameters, bolstered by efficient data division into negatives and rephrasings.

Conclusion

The Single-Page Hard Negative Query Generation method outlined provides a substantial leap forward for refining RAG frameworks. In doing so, it unearths new paradigms in retrieval precision, affording significant cross-domain adaptability. Future developments are projected to extend this query generation model, cultivating more finely-tuned and application-specific datasets for enhanced balanced output. The rigorous generation and validation processes prescribed could serve as a benchmark for future endeavors into document-centric retrieval frameworks.

In addressing the inherent inflexibility and computational intensiveness of traditional document-level mining, the methodology demonstrated promises discernible in its measured improvements over extant models. This positions it as a vital tool for contexts necessitating finely nuanced data discrimination, notably in domains rich with intricate detail, such as financial documentation.

Markdown Report Issue