
BookSum: A Collection of Datasets for Long-form Narrative Summarization

Published 18 May 2021 in cs.CL (arXiv:2105.08209v2)

Abstract: The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human written summaries on three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. The domain and structure of our dataset poses a unique set of challenges for summarization systems, which include: processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.

Citations (128)

Summary

  • The paper presents BookSum, a novel dataset that addresses the limitations of existing short-form summarization datasets by providing multi-granular annotations for lengthy literary texts.
  • The methodology employs a hierarchical design with paragraph, chapter, and book-level summaries to rigorously test and improve current NLP models.
  • Experiments using models like BART and PEGASUS highlight challenges in abstraction quality and long-range dependency understanding.

An Academic Overview of "BookSum: A Collection of Datasets for Long-form Narrative Summarization"

The paper "BookSum: A Collection of Datasets for Long-form Narrative Summarization" introduces a collection of datasets aimed at advancing narrative summarization of long-form documents. This work addresses notable limitations in current summarization datasets, which predominantly feature short-form texts such as news articles and often exhibit layout biases that simplify the summarization task.

Content and Contribution

The BookSum dataset encompasses narrative text from the literature domain, including novels, plays, and stories, accompanied by highly abstractive, human-written summaries. The dataset is structured into three levels of granularity—paragraph, chapter, and book-level—to present increasing levels of complexity for summarization systems. This hierarchical design offers a distinct challenge, as it requires models to process documents that range from several hundred words up to hundreds of pages.

Challenges Addressed

BookSum's construction addresses several domain-specific challenges for summarization models:

  • Processing Lengthy Documents: Existing neural models often struggle with the extreme length of literary works, and BookSum supports the development and evaluation of systems that can handle such inputs efficiently.
  • Understanding Causal and Temporal Dependencies: Literary narratives often demand comprehension of complex, long-range dependencies, an aspect that this dataset seeks to encapsulate and challenge directly.
  • Discourse Structure and Narrative Flow: Capturing the richness of storytelling, including subplots and narrative shifts, requires sophisticated document understanding and summarization strategies.
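
A common workaround for the first challenge, fixed-context models facing book-length inputs, is to split the source into overlapping word windows before summarizing each piece. The sketch below is illustrative only: the `window` and `stride` values are assumptions for the example, not parameters taken from the paper.

```python
def chunk_document(text: str, window: int = 512, stride: int = 384) -> list[str]:
    """Split a long document into overlapping word windows.

    Each chunk holds up to `window` words; consecutive chunks overlap by
    `window - stride` words so context carries across chunk boundaries.
    The specific sizes here are illustrative, not the paper's settings.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break
    return chunks
```

Overlapping windows trade some redundant computation for continuity: sentences near a chunk boundary still appear with surrounding context in the next chunk.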

Methodology

To facilitate research, the authors implemented a comprehensive data preparation pipeline. Source texts were drawn from public-domain books in the Project Gutenberg repository, while summaries were aggregated from online educational resources. Compiling the dataset involved meticulous cleaning, splitting, and alignment steps to ensure high-quality, coherent pairings of source texts and summaries.
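
The alignment step can be pictured with a deliberately simplified sketch: map each summary paragraph to the source paragraph sharing the most unigrams. This is an assumption for illustration, the paper's actual alignment procedure is more involved, but it conveys the basic idea of pairing abstractive summaries with their source spans.

```python
def unigram_overlap(a: str, b: str) -> float:
    """Fraction of `a`'s unique lowercase words that also appear in `b`."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta) if ta else 0.0

def align_summaries(summary_paras: list[str], source_paras: list[str]) -> list[int]:
    """Greedily map each summary paragraph to the source paragraph with the
    highest unigram overlap; returns one source index per summary paragraph.
    A toy stand-in for the paper's alignment, not its actual algorithm."""
    return [
        max(range(len(source_paras)),
            key=lambda i: unigram_overlap(s, source_paras[i]))
        for s in summary_paras
    ]
```

A greedy word-overlap matcher like this breaks down for highly abstractive summaries, which is precisely why careful alignment matters for a dataset of this kind.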

Experimental Framework

The authors benchmarked a variety of current summarization models, both extractive and abstractive, to establish performance baselines on the BookSum dataset. Methods such as BART, PEGASUS, and transformer-based encoders were evaluated using metrics like ROUGE, BERTScore, and SummaQA. However, the abstractiveness and length of the target summaries expose the limits of existing evaluation metrics, pointing to the need for improved or new strategies that can better assess abstractive summarization quality in long-form texts.
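
To make the metric limitation concrete, here is a minimal ROUGE-1 F1 computation. It mirrors the idea behind ROUGE (unigram overlap between candidate and reference) but is a simplified sketch, not the official implementation: no stemming, stopword handling, or multi-reference support.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Counts overlapping word occurrences (clipped per-word via Counter
    intersection), so repeated words are credited at most as often as
    they appear in the reference.
    """
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

The weakness the paper highlights is visible even here: a faithful but heavily paraphrased summary shares few unigrams with its reference and scores poorly, despite being a good abstractive summary.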

Implications and Future Directions

BookSum promises to propel advancements in the summarization field, motivating innovations in both model architecture and evaluation methodology. Introducing challenges drawn from long-form literature marks a significant step toward more robust NLP systems capable of engaging with complex, extended textual information. Future developments may include designing memory-efficient models and exploring hierarchical processing techniques that align with how humans summarize expansive narratives.

Ultimately, BookSum provides an invaluable resource for researchers seeking to push the boundaries of what current summarization technology can achieve, moving toward comprehensive document understanding in varied narrative-rich contexts.
