SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

Published 29 Aug 2024 in cs.CL | (2408.16444v1)

Abstract: Document summarization is a task to shorten texts into concise and informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.



Summary

The paper introduces SurveySum, a dataset specifically designed to summarize multiple scientific articles into cohesive survey sections.
It details two summarization pipelines using monoT5-3B and gpt-4-0125-preview, emphasizing the role of retrieval quality in summary accuracy.
Evaluation with References F1, G-Eval, and Check-Eval shows that pipelines using gpt-4-0125-preview produce better summaries than those using gpt-3.5-turbo-0125.


The paper "SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section" by Fernandes et al. introduces the SurveySum dataset, built for summarizing multiple scientific articles into coherent sections of a survey. It addresses a gap in domain-specific summarization tools.

Introduction and Problem Statement

Document summarization aims to distill long texts into concise, informative summaries. The task matters especially for scientific literature, where the volume of publications makes it hard to keep up without accessible syntheses. Traditional summarization methods are either extractive or abstractive, each with distinct methodologies and challenges.

The extension to Multi-Document Summarization (MDS) brings additional complexity, requiring the amalgamation of information from varied sources while maintaining coherence and eliminating redundancy. Existing datasets like Multi-News and Multi-XScience adopt this approach in non-scientific and scientific contexts, respectively. However, the authors identify a significant gap in datasets aimed at generating cohesive sections of scientific surveys, which are integral for researchers to capture state-of-the-art developments comprehensively.

Contributions

The authors address this gap through three primary contributions:

SurveySum Dataset: This dataset is constructed by extracting sections from comprehensive surveys in artificial intelligence, natural language processing, and machine learning. These sections, along with the cited scientific articles, form the basis of the dataset, explicitly designed for the MDS task.
Summarization Pipelines: Two specific pipelines are proposed for summarizing scientific articles into survey sections. These pipelines involve stages of document retrieval, chunking of text, and final summary generation using LLMs.
Evaluation Framework: The proposed pipelines are evaluated with multiple metrics, giving a comparative analysis of their performance.
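
The three pipeline stages listed above (retrieval, chunking, and generation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the word-window chunker, and the lexical-overlap scorer are all assumptions standing in for the paper's neural rankers, and the final LLM call is left out so the sketch stays self-contained.

```python
from typing import List


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping word windows (hypothetical chunker)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def overlap_score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query terms found in the chunk.

    A stand-in for a learned ranker; it is NOT what the paper uses.
    """
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & chunk_terms) / len(query_terms)


def top_k_chunks(query: str, chunks: List[str], k: int) -> List[str]:
    """Keep the k chunks that score highest against the query."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:k]


def build_prompt(section_title: str, chunks: List[str]) -> str:
    """Assemble the text that would be sent to the generation model."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        f"Write the survey section titled '{section_title}' "
        f"using only the excerpts below.\n{numbered}"
    )
```

The value of `k` here corresponds to the "number of chunks" varied across the pipeline configurations.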

Methodology

SurveySum is built in three steps: selecting comprehensive surveys that meet predefined criteria, parsing these surveys to extract sections and their corresponding citations, and retrieving the full texts of the cited articles. The resulting dataset pairs each survey section with the source articles it cites and spans a range of topics.
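
One way to picture a dataset entry produced by this process is as a survey section paired with the full texts of the articles it cites. The sketch below is an assumption about the shape of such a record: the class and field names are hypothetical and do not reflect the released dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CitedArticle:
    article_id: str  # hypothetical identifier for a cited paper
    title: str
    full_text: str


@dataclass
class SurveySection:
    survey_title: str
    section_title: str
    section_text: str  # the human-written section, used as the target summary
    cited_articles: List[CitedArticle] = field(default_factory=list)


def build_section(
    survey_title: str,
    section_title: str,
    section_text: str,
    citation_ids: List[str],
    article_store: Dict[str, CitedArticle],
) -> SurveySection:
    """Attach the full text of every citation that could be retrieved.

    Citations missing from article_store are skipped, mirroring the fact
    that not every cited paper's full text can be obtained.
    """
    articles = [article_store[c] for c in citation_ids if c in article_store]
    return SurveySection(survey_title, section_title, section_text, articles)
```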

Pipelines

Pipeline 1 employs a monoT5-3B model for retrieving text chunks and uses the gpt-3.5-turbo-0125 model to generate the final summaries. Three configurations were evaluated:

Pipeline 1.1: Summarization using 5 chunks.
Pipeline 1.2: Summarization using 10 chunks.
Pipeline 1.3: Utilizing articles retrieved from the Semantic Scholar API.

Pipeline 2 reranks text chunks using the SPECTER2 embeddings model and gpt-4-0125-preview. Six configurations were evaluated:

Pipeline 2.1: Summarization using 1 chunk.
Pipeline 2.2: Summarization using 5 chunks.
Pipeline 2.3: Summarization using 10 chunks.
Pipeline 2.4: Utilizing gpt-4-0125-preview for reranking.
Pipeline 2.5: Utilizing gpt-4-0125-preview with 5 chunks.
Pipeline 2.6: Utilizing gpt-4-0125-preview with 10 chunks.
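
Embedding-based chunk ranking of the kind used in Pipeline 2 can be illustrated with cosine similarity over vectors. This is a sketch under stated assumptions: the summary does not say which similarity measure the authors use, the vectors below are placeholders rather than SPECTER2 outputs, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_embedding(
    query_vec: Sequence[float],
    chunk_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]
```

In a real setup the query vector would come from embedding the section title or topic, and the chunk vectors from embedding each text chunk with the same model.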

Evaluation and Results

The evaluation metrics employed include the References F1 Score, G-Eval, and Check-Eval. The results indicate a correlation between the quality of retrieval and the effectiveness of summarization. Notably, the pipeline configurations using articles from SurveySum outperformed those relying on Semantic Scholar retrieval in both G-Eval and Check-Eval scores. Moreover, setups utilizing the gpt-4-0125-preview model consistently yielded superior results compared to those using gpt-3.5-turbo-0125.
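
The summary does not define References F1, so the sketch below assumes it is a set-based F1 between the references cited in a generated section and those cited in the original section. The function name and the choice to score empty inputs as 0.0 are assumptions made for illustration.

```python
from typing import Iterable


def references_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between the set of cited reference ids in the generated section
    and the set cited in the ground-truth section (assumed definition)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0  # also covers the case where either set is empty
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, a generated section citing {a, b, c} against a ground truth citing {b, c, d, e} has precision 2/3 and recall 1/2, which gives an F1 of 4/7.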

Implications and Future Work

The introduction of SurveySum and the proposed summarization pipelines provide a robust foundation for advancing MDS in the domain of scientific literature. The findings suggest that high-quality retrieval stages are crucial for generating coherent and accurate summaries. The differential performance of various LLMs underscores the importance of model selection in enhancing summarization quality.

Future research could explore the integration of more sophisticated retrieval mechanisms and the application of these pipelines in other scientific domains. Additionally, improving the granularity and interpretability of evaluation metrics would further augment the benchmarking of summarization models.

In summary, this paper offers a significant contribution to document summarization, particularly in the scientific domain, by addressing the unique challenges of summarizing multiple articles into coherent survey sections. The proposed methodologies and the SurveySum dataset lay the groundwork for future advancements in MDS, with practical implications for efficiently navigating and synthesizing the ever-expanding body of scientific literature.
