Investigating BERT's Capabilities in Ad Hoc Document Retrieval
The paper "Simple Applications of BERT for Ad Hoc Document Retrieval" by Yang et al. presents a pragmatic approach to leveraging BERT, a pretrained language model, for ad hoc document retrieval. The work is motivated by BERT's success across diverse NLP tasks, including question answering (QA), and addresses a central obstacle that retrieval poses for the model: documents typically exceed the input length BERT can process. The authors therefore adopt an inference strategy that scores each sentence individually and aggregates those sentence scores into a document-level ranking.
Methodology Overview
The overarching strategy applies BERT at the sentence level, a choice necessitated by the model's limited input length. Each sentence in a candidate document is scored independently against the query, and the resulting sentence scores are then aggregated, together with the document's original retrieval score, into a final document score. Because existing newswire collections lack sentence-level relevance judgments, the authors sidestep in-domain fine-tuning and instead fine-tune BERT on out-of-domain data with short-text relevance labels, drawn from QA and microblog collections. By pairing an established exact-match ranker with BERT's semantic matching, this blended strategy aims to enhance retrieval effectiveness.
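The aggregation step can be illustrated with a short sketch: the document's first-stage retrieval score (e.g., from BM25) is linearly interpolated with a weighted sum of the top-n BERT sentence scores. The interpolation weight `a` and the per-rank weights below are illustrative placeholders, not the paper's tuned settings, and the sentence scores here are assumed to have already been produced by a fine-tuned BERT model.

```python
# Sketch of document scoring via sentence-score aggregation:
# interpolate the first-stage retrieval score with the top-n
# BERT sentence scores. All weights here are hypothetical.

def aggregate_score(doc_score, sentence_scores, a=0.5, weights=(1.0, 0.5, 0.25)):
    """Combine a document-level score with the top-n sentence scores.

    doc_score       -- score from the first-stage ranker (e.g., BM25)
    sentence_scores -- BERT relevance scores, one per document sentence
    a               -- interpolation weight between document and sentence evidence
    weights         -- per-rank weights applied to the top-n sentences
    """
    # Keep only the n highest-scoring sentences, in descending order.
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    sentence_evidence = sum(w * s for w, s in zip(weights, top))
    return a * doc_score + (1 - a) * sentence_evidence

# Example: a document with a BM25 score of 12.3 and four sentence scores.
final = aggregate_score(12.3, [0.9, 0.2, 0.7, 0.1])
print(final)  # 0.5 * 12.3 + 0.5 * (0.9 + 0.5*0.7 + 0.25*0.2) = 6.8
```

Ranking the candidate list by this combined score lets strong local (sentence-level) evidence of relevance boost a document even when its overall exact-match score is modest.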
Experimental Evaluation
The approach is evaluated on the TREC Microblog Tracks (2011-2014) and the TREC 2004 Robust Track. On the microblog collections, it improves average precision (AP) and precision at rank 30 (P30) over previously reported neural models and comparable baselines, achieving the highest AP among neural approaches. On the Robust Track, a newswire collection, an analogous improvement is observed, demonstrating that BERT adapts to document retrieval over longer texts.
Implications and Future Work
The method shows that sentence-level inference combined with simple score aggregation yields high precision in document retrieval. Notably, it demonstrates that BERT can be adapted from question answering to document relevance without sentence-level relevance annotations, suggesting a pathway for further research into distant supervision and finer-grained relevance judgments.
Moreover, the study's finding that microblog-derived fine-tuning data is more effective than QA data invites further exploration of how training data and domain alignment affect neural document retrieval. Future work might expand the range of relevance tasks, capture full-document context, or refine the aggregation step to further improve retrieval performance.
In conclusion, Yang et al.'s exploration of BERT for document retrieval marks a meaningful step in bridging pretrained language models with traditional IR, offering insight into how robust NLP models can be adapted for use within retrieval systems.