Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution

Published 21 May 2024 in cs.CL and cs.AI | (2405.13095v1)

Abstract: Automatically generating a presentation from the text of a long document is a challenging and useful problem. In contrast to a flat summary, a presentation needs to have a better and non-linear narrative, i.e., the content of a slide can come from different and non-contiguous parts of the given document. However, it is difficult to incorporate such non-linear mapping of content to slides and ensure that the content is faithful to the document. LLMs are prone to hallucination and their performance degrades with the length of the input document. Towards this, we propose a novel graph based solution where we learn a graph from the input document and use a combination of graph neural network and LLM to generate a presentation with attribution of content for each slide. We conduct thorough experiments to show the merit of our approach compared to directly using LLMs for this task.

Summary

The paper introduces GDP, a novel method integrating Graph Neural Networks and Large Language Models to convert documents into coherent, non-linear presentations.
It leverages graph construction, GNN-based paragraph embeddings, and spectral clustering to effectively cluster content before generating slides with iterative LLM prompts.
Evaluation on the SciDuet dataset shows improved narrative structure and content fidelity over baseline approaches, validating the method's practical impact.

Document-to-Presentation Transformation with GNN and LLM

Introduction

The transformation of long documents into presentations poses significant challenges due to the need for a non-linear narrative structure that effectively captures the document's essence. The paper "Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution" introduces a novel methodology that leverages Graph Neural Networks (GNN) and LLMs to address these challenges. The aim is to generate presentations that attribute content accurately and maintain coherence despite the non-linear mapping from the source document.

Methodology Overview

The proposed method, GDP (Graph-based Document to Presentation), constructs a graph from the input document where nodes represent paragraphs. GNNs are employed to learn latent semantic relationships between these paragraphs, allowing for effective clustering into coherent slide content. The approach addresses the inherent limitations of LLMs, such as hallucinations and the challenges posed by long input contexts.

Figure 1: A presentation (right) with non-linear narrative and attribution to the source document (left).

Graph Construction and Neural Network Integration

Graph Construction: Each paragraph in the document is represented as a node. Edges between nodes are formed based on semantic similarity, quantified using a fine-tuned RoBERTa-based classifier. The threshold for edge creation is determined experimentally to balance graph sparsity and connectivity.
Graph Neural Network (GNN) Training: A two-layer Graph Convolutional Network (GCN) processes this graph structure to embed paragraphs into a semantically rich vector space. The unsupervised training objective is to minimize a binary cross-entropy loss over the graph's edges, promoting similar embeddings for semantically linked paragraphs.
Clustering via Spectral Clustering: The node embeddings obtained from the GNN are clustered into groups representing slides. Spectral clustering is utilized owing to its ability to handle complex, non-convex data distributions, which are anticipated in the latent paragraph representations.
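
The graph construction and GCN steps in the list above can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the similarity matrix stands in for the scores of the RoBERTa-based classifier, the threshold value is arbitrary, the weights are random and untrained, and the inner-product edge decoder used in the loss is an assumption, since the summary only states that binary cross-entropy is computed over the graph's edges.

```python
import numpy as np

def build_adjacency(similarity, threshold):
    """Connect paragraphs i and j when their similarity exceeds the threshold."""
    adj = (similarity > threshold).astype(float)
    adj = np.maximum(adj, adj.T)   # keep the graph undirected
    np.fill_diagonal(adj, 0.0)     # self-loops are added during normalization
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 commonly used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, features, w1, w2):
    """Two-layer GCN forward pass: A_norm @ ReLU(A_norm @ X @ W1) @ W2."""
    hidden = np.maximum(a_norm @ features @ w1, 0.0)
    return a_norm @ hidden @ w2

def edge_bce_loss(embeddings, adj):
    """BCE between sigmoid(z_i . z_j) and the edge indicator (assumed decoder)."""
    probs = 1.0 / (1.0 + np.exp(-(embeddings @ embeddings.T)))
    eps = 1e-9
    loss = -(adj * np.log(probs + eps) + (1 - adj) * np.log(1 - probs + eps))
    off_diagonal = ~np.eye(adj.shape[0], dtype=bool)  # ignore self-pairs
    return float(loss[off_diagonal].mean())

# Hypothetical pairwise similarity scores for four paragraphs.
similarity = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))          # placeholder paragraph features
w1 = rng.normal(scale=0.1, size=(8, 4))     # untrained layer-1 weights
w2 = rng.normal(scale=0.1, size=(4, 2))     # untrained layer-2 weights

adjacency = build_adjacency(similarity, threshold=0.5)
a_norm = normalize_adjacency(adjacency)
embeddings = gcn_forward(a_norm, features, w1, w2)
loss = edge_bce_loss(embeddings, adjacency)
print(adjacency)
print(embeddings.shape, loss)
```

In a real training loop the weights would be updated by gradient descent on this loss, which typically means using an autodiff framework in place of plain NumPy.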
Figure 2: Architectural Diagram.
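
The clustering step can be illustrated with a small from-scratch spectral clustering sketch. The RBF affinity, the sigma value, the deterministic farthest-point k-means initialization, and the toy embeddings are all assumptions chosen for illustration; the summary does not say how the affinity matrix is built, and a library implementation would normally be used in practice.

```python
import numpy as np

def rbf_affinity(embeddings, sigma=0.5):
    """Pairwise Gaussian affinity between node embeddings (assumed kernel)."""
    sq_dists = np.sum((embeddings[:, None] - embeddings[None]) ** 2, axis=2)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def spectral_embedding(affinity, k):
    """Rows of the k eigenvectors with smallest eigenvalues of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    _, eigvecs = np.linalg.eigh(laplacian)      # eigenvalues come back ascending
    rows = eigvecs[:, :k]
    norms = np.maximum(np.linalg.norm(rows, axis=1, keepdims=True), 1e-12)
    return rows / norms

def kmeans(points, k, iterations=20):
    """Plain k-means with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(distances, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cluster_paragraphs(embeddings, k):
    """Assign each paragraph embedding to one of k slide clusters."""
    return kmeans(spectral_embedding(rbf_affinity(embeddings), k), k)

# Toy 2-D embeddings: paragraphs 0-2 and 3-5 form two groups.
points = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.1, -0.1],
    [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1],
])
labels = cluster_paragraphs(points, k=2)
print(labels)
```

Each resulting cluster id corresponds to one slide, and the paragraph indices in a cluster give that slide's attribution.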

Slide Generation with LLM

Once the clusters are established, an LLM (GPT-3.5) turns each cluster into a slide. The model is prompted iteratively: each prompt contains the text of one cluster's paragraphs together with summaries of the preceding slides, which keeps the narrative coherent from slide to slide. This step pairs the content grouping learned by the GNN with the natural language generation strengths of the LLM.
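
A rough sketch of this iterative prompting loop is shown below. The prompt wording, the `generate_slides` function, and the `stub_llm` callable are hypothetical; a real system would replace the stub with a call to an actual LLM client, and the paper's exact prompts are not given in this summary.

```python
from typing import Callable, Dict, List, Tuple

def generate_slides(
    clusters: List[List[Tuple[int, str]]],
    llm: Callable[[str], str],
) -> List[Dict]:
    """Generate one slide per cluster, feeding earlier slide summaries forward."""
    slides = []
    summaries: List[str] = []
    for number, cluster in enumerate(clusters, start=1):
        previous = "\n".join(f"- {s}" for s in summaries) or "(none yet)"
        source_text = "\n\n".join(text for _, text in cluster)
        prompt = (
            f"Summaries of previous slides:\n{previous}\n\n"
            f"Source paragraphs for slide {number}:\n{source_text}\n\n"
            "Write a slide title and bullet points using only the source paragraphs."
        )
        content = llm(prompt)
        # A short summary of this slide becomes context for the next prompt.
        summaries.append(llm(f"Summarize this slide in one sentence:\n{content}"))
        slides.append({
            "slide": number,
            "content": content,
            "source_ids": [pid for pid, _ in cluster],  # attribution to paragraphs
        })
    return slides

# Stand-in for a real model call: records prompts and returns a placeholder.
calls: List[str] = []

def stub_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"output-{len(calls)}"

clusters = [
    [(0, "Paragraph about motivation."), (7, "Paragraph about related limits.")],
    [(3, "Paragraph about the method.")],
]
slides = generate_slides(clusters, stub_llm)
for slide in slides:
    print(slide["slide"], slide["source_ids"], slide["content"])
```

Keeping the paragraph ids alongside each slide is what makes slide-level attribution possible without asking the LLM to cite anything itself.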

Experimental Setup and Evaluation

The proposed method is evaluated on the SciDuet dataset, comprising academic documents and their corresponding presentations. The authors compare GDP against baseline methods, including direct LLM applications:

Baseline Comparisons: Standard GPT-based approaches, such as GPT-Flat and GPT-COT, perform poorly in terms of content fidelity and narrative flow primarily due to the linear nature of their context processing.
Performance Metrics: The evaluation utilizes ROUGE scores for lexical matching, Coverage metrics for content completeness, Perplexity for fluency, and a custom non-linearity metric to assess narrative structure.
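
Two of the metrics named above have well-known textbook forms, sketched here for orientation. This simplified ROUGE-1 uses whitespace tokens with no stemming, so its scores will differ from standard ROUGE toolkits, and the perplexity function assumes per-token natural-log probabilities are already available from some language model. The Coverage and non-linearity metrics are specific to the paper and are not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List

def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

def perplexity(token_log_probs: List[float]) -> float:
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

scores = rouge1(
    "the graph clusters paragraphs",
    "the graph groups paragraphs into slides",
)
ppl = perplexity([math.log(0.5)] * 4)
print(scores, ppl)
```

Lower perplexity indicates more fluent text under the scoring model, while higher ROUGE indicates more lexical overlap with the reference slides.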

Non-Linearity and Content Attribution

GDP generates presentations whose narratives are non-linear in the way human-made presentations are, with slide content drawn from non-contiguous parts of the document. On the paper's non-linearity metric, human-created presentations score approximately 38.6% and GDP scores 24.9%, so GDP departs from the document's linear order but remains below the human level.
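
This summary does not give the formula for the non-linearity metric. To make the idea concrete, the sketch below uses one plausible proxy, which is an assumption and not the paper's definition: the percentage of slides whose attributed paragraphs do not form a contiguous run in the source document.

```python
from typing import List

def is_contiguous(paragraph_ids: List[int]) -> bool:
    """True when the ids form an unbroken run such as 4, 5, 6."""
    unique = sorted(set(paragraph_ids))
    return unique[-1] - unique[0] + 1 == len(unique)

def non_linearity_proxy(slide_sources: List[List[int]]) -> float:
    """Percentage of slides built from non-contiguous paragraphs (assumed proxy)."""
    non_contiguous = sum(
        1 for ids in slide_sources if ids and not is_contiguous(ids)
    )
    return 100.0 * non_contiguous / len(slide_sources)

# Hypothetical attribution: slide -> source paragraph indices.
slide_sources = [[0, 1], [2, 5], [3, 4], [6, 9, 7]]
score = non_linearity_proxy(slide_sources)
print(score)
```

Under this proxy a purely linear deck, where every slide summarizes one consecutive block of paragraphs, scores 0.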

Figure 3: Qualitative example to compare the slides generated by a baseline GPT-Flat and our approach GDP from the input document.

Implications and Future Work

This research presents a well-rounded approach that melds GNN's structure learning with LLM's language generation capabilities, effectively addressing both narrative coherence and attribution accuracy. Future directions could explore incorporating multimodal inputs, adapting the framework for diverse document types, and enhancing template selection to enrich slide presentation aesthetics.

Conclusion

The GDP approach marks significant progress in document-to-presentation transformations, effectively managing non-linear narratives and maintaining content fidelity. By leveraging the advanced capabilities of GNNs and LLMs, the methodology overcomes traditional summarization limitations, offering a sophisticated tool for automating presentation generation.

Practical Applications

Immediate Applications

Below are actionable use cases that can be deployed now, leveraging the paper’s GDP pipeline (graph learning + LLM) and its core features: non-linear narrative construction, slide-level attribution to source paragraphs, improved coverage/fluency, and domain-agnostic applicability to long, text-heavy documents.

Industry

Enterprise document-to-deck generation with traceability
- Sectors: software, consulting, legal, finance, energy
- Workflow: Ingest long reports/PRDs/RFPs/whitepapers → extract paragraphs → build document graph → cluster → generate slides with titles and attributed bullet points → export PPTX/Google Slides.
- Tools/products: “Presenter with Attribution” plugin for PowerPoint/Google Slides; GDP-as-a-Service API; SharePoint/Confluence integration.
- Assumptions/dependencies: High-quality PDF/text extraction; reliable paragraph segmentation; user-provided slide count K; LLM access; organizational content privacy controls.
Sales enablement and proposal acceleration
- Sectors: B2B SaaS, consulting, manufacturing
- Workflow: Convert proposals and case studies into client-ready decks with slide-level links to source paragraphs for quick edits and compliance.
- Tools/products: CRM/CPQ plugin; proposal-to-deck generator.
- Assumptions/dependencies: Accurate mapping of proposal sections; consistent document formatting.
Marketing content repurposing
- Sectors: marketing, media
- Workflow: Turn long-form blogs/whitepapers into campaign decks with non-linear narrative tailored to audience; include attributed bullets to speed approvals.
- Tools/products: CMS plugin (e.g., WordPress/HubSpot) exporting decks; brand checklist integration.
- Assumptions/dependencies: Brand style constraints; reviewer approval workflows.
Knowledge management and auditing
- Sectors: regulated industries (finance, healthcare, aerospace)
- Workflow: Generate training decks from policies/SOPs with slide-to-paragraph attribution to support audits and reduce hallucination risk.
- Tools/products: Compliance auditor dashboard linking slides to source paragraphs.
- Assumptions/dependencies: Document versioning; access controls; policy repositories.
Analyst report summarization to investor decks
- Sectors: finance
- Workflow: Non-linear narrative captures cross-section insights (market overview → risks → valuations), with paragraph-level citations.
- Tools/products: Deck generator integrated with financial research platforms.
- Assumptions/dependencies: Correct financial terminology; domain-specific LLM prompts.
Technical support and field manuals
- Sectors: manufacturing, energy, telecom
- Workflow: Convert long technical manuals into stepwise training decks with attributed instructions and safety notes.
- Tools/products: LMS integration; offline deck export for field use.
- Assumptions/dependencies: Consistent manual structures; text-heavy documents.

Academia and Education

Research paper to talk slides
- Sectors: academia, edtech
- Workflow: Ingest papers → generate non-linear decks that mirror human presentation narrative; attribution aids last-minute edits and fact-checking.
- Tools/products: Conference prep assistant; integration with arXiv/institutional repositories.
- Assumptions/dependencies: PDF quality; slide count K set by presenter.
Course material and lecture prep
- Sectors: education
- Workflow: Convert chapters/long readings into lecture decks, preserving narrative (problem → methods → results) with slide-level citations.
- Tools/products: LMS plugin (Moodle/Canvas); instructor dashboard.
- Assumptions/dependencies: Text-centric readings; instructor curation.

Policy and Government

Legislative briefings and stakeholder decks
- Sectors: public policy, government
- Workflow: Turn long bills/reports into briefings with traceable bullets back to statutory text; supports transparency and reduces misinterpretation.
- Tools/products: Briefing generator for committees; public portal with clickable citations.
- Assumptions/dependencies: Document standardization; privacy and FOIA considerations.

Healthcare

Clinical guideline and policy brief slides
- Sectors: healthcare administration
- Workflow: Convert guidelines/policies into training decks; attribution supports compliance and reduces risk of clinical misstatements.
- Tools/products: Hospital policy-to-training deck tool.
- Assumptions/dependencies: Text-only focus (no images/diagrams yet); domain prompting.

Daily Life

Book/article-to-presentation generator for study groups
- Sectors: consumer productivity
- Workflow: Create study decks from long readings; non-linear narrative supports thematic discussion; attributed slides aid citation.
- Tools/products: Browser/Notion plugin with deck export.
- Assumptions/dependencies: Clean text extraction; user-defined slide count.

Long-Term Applications

These depend on further research, scaling, or development—especially multimodal handling, template/style intelligence, and automation of agenda/slide count.

Multimodal and Template Intelligence

Multimodal document-to-deck (images/tables/diagrams)
- Sectors: healthcare, engineering, research, finance
- Innovation: Integrate VLMs (e.g., CLIP/LLaVA) for figure/table extraction and slide construction with captions and chart reproductions.
- Tools/products: “Multimodal Presenter” with visual attribution.
- Dependencies: Robust OCR/table/figure parsing; domain-specific visual understanding; privacy/compliance for images.
Style and template recommendation
- Sectors: marketing, corporate communications
- Innovation: Automatic selection of slide layouts, colors, and themes matched to content intent and audience persona.
- Tools/products: Theme selector; brand compliance checker.
- Dependencies: Brand guidelines; intent detection; user preference models.

Advanced Narrative and Personalization

Audience-aware narrative shaping
- Sectors: education, enterprise training, sales
- Innovation: Personalize non-linear narratives for different roles (exec vs. technical) using graph-level re-weighting and constrained generation.
- Tools/products: Narrative designer; persona switcher.
- Dependencies: Role metadata; evaluation of comprehension outcomes.
Automated agenda and slide-count inference
- Sectors: all
- Innovation: Predict K and agenda topics from document graph (avoiding manual K input), with constraints on time and audience.
- Tools/products: Time-bounded deck planner.
- Dependencies: Reliable topic segmentation; pacing models.
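
One common starting point for inferring the slide count K from the document graph is the eigengap heuristic from spectral clustering. The sketch below is a swapped-in standard technique, not something the paper proposes, and the block-structured affinity matrix is synthetic; the function name `estimate_slide_count` is hypothetical.

```python
import numpy as np

def estimate_slide_count(affinity: np.ndarray, max_k: int) -> int:
    """Pick K at the largest gap among the smallest normalized-Laplacian eigenvalues."""
    n = affinity.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(n) - affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(laplacian)[: min(max_k + 1, n)]  # ascending
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1

# Synthetic affinity: three groups of three paragraphs, weak links between groups.
blocks = np.kron(np.eye(3), np.ones((3, 3)))
affinity = 0.99 * blocks + 0.01
k = estimate_slide_count(affinity, max_k=6)
print(k)
```

Real documents rarely have such clean block structure, so a time or audience constraint would likely still be needed to bound the result.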

Reliability, Governance, and Collaboration

End-to-end fact-checking and hallucination detection
- Sectors: regulated industries, public sector
- Innovation: Use attribution + retrieval to auto-flag bullets not supported by source paragraphs; integrate trust scores.
- Tools/products: Fact-checker panel; compliance audit trails.
- Dependencies: High-precision citation alignment; organizational policies.
Collaborative editing on the document graph
- Sectors: enterprise productivity
- Innovation: Editable paragraph-slide graph; users can move nodes, see downstream effects, and re-generate affected slides.
- Tools/products: Graph workspace; versioning with change impact preview.
- Dependencies: Real-time graph ops; UI scalability.
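
The fact-checking idea above, flagging bullets that their attributed paragraphs do not support, can be prototyped with a simple lexical check. Token overlap is a crude stand-in chosen here for illustration; the function names and the 0.6 threshold are hypothetical, and a production system would more likely use retrieval or an entailment model.

```python
import re
from typing import List

def tokenize(text: str) -> set:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support_score(bullet: str, paragraphs: List[str]) -> float:
    """Best fraction of bullet tokens found in any single source paragraph."""
    bullet_tokens = tokenize(bullet)
    if not bullet_tokens:
        return 0.0
    return max(
        (len(bullet_tokens & tokenize(p)) / len(bullet_tokens) for p in paragraphs),
        default=0.0,
    )

def flag_unsupported(bullets: List[str], paragraphs: List[str],
                     threshold: float = 0.6) -> List[str]:
    """Return the bullets whose support score falls below the threshold."""
    return [b for b in bullets if support_score(b, paragraphs) < threshold]

sources = ["A two-layer GCN embeds paragraphs into vectors."]
bullets = ["GCN embeds paragraphs", "Accuracy improved by 40 percent"]
flagged = flag_unsupported(bullets, sources)
print(flagged)
```

Because each slide already carries its source paragraph ids, the check only has to compare a bullet against a handful of paragraphs instead of the whole document.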

Cross-Document and Knowledge Integration

Cross-document synthesis decks
- Sectors: consulting, research, policy
- Innovation: Build a unified graph across multiple documents; cluster nodes into cross-source slides with source-level attribution for synthesis.
- Tools/products: Multi-source synthesizer; cross-repo search integration.
- Dependencies: Document normalization; deduplication; citation management.
Knowledge base integration for ongoing updates
- Sectors: enterprise KM
- Innovation: Reusable graph representations of documents enabling auto-updated decks when source docs change.
- Tools/products: “Live Decks” connected to repositories.
- Dependencies: Change detection; incremental clustering; governance.
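
The change-detection dependency for "Live Decks" could start from paragraph fingerprints, as in the sketch below. It assumes paragraphs keep their positions between document versions, which is a simplification; the function names and the slide-to-paragraph mapping are hypothetical.

```python
import hashlib
from typing import Dict, List, Set

def fingerprint(paragraph: str) -> str:
    """Hash of the paragraph with whitespace normalized."""
    normalized = " ".join(paragraph.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def changed_paragraphs(old: List[str], new: List[str]) -> Set[int]:
    """Indices whose content differs, including added or removed positions."""
    changed = set()
    for i in range(max(len(old), len(new))):
        before = fingerprint(old[i]) if i < len(old) else None
        after = fingerprint(new[i]) if i < len(new) else None
        if before != after:
            changed.add(i)
    return changed

def stale_slides(old: List[str], new: List[str],
                 slide_sources: Dict[int, List[int]]) -> List[int]:
    """Slides that cite at least one changed paragraph and need regeneration."""
    changed = changed_paragraphs(old, new)
    return sorted(s for s, ids in slide_sources.items() if changed & set(ids))

old_doc = ["Intro text.", "Method text.", "Results text."]
new_doc = ["Intro text.", "Method   text.", "Results text, revised."]
slide_sources = {1: [0, 2], 2: [1]}
stale = stale_slides(old_doc, new_doc, slide_sources)
print(stale)
```

Only the stale slides would then be regenerated, which keeps LLM calls proportional to the size of the edit.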

Evaluation and Metrics Transfer

Narrative non-linearity and coverage as standard quality metrics
- Sectors: edtech, NLG tooling
- Innovation: Adopt the paper’s non-linearity and coverage metrics to evaluate other generative summarizers and slide generators.
- Tools/products: NLG evaluator SDK; analytics dashboards.
- Dependencies: Agreement studies with human raters; task-specific calibrations.

Multilingual and Accessibility

Multilingual doc-to-deck with source-language attribution
- Sectors: global enterprises, NGOs
- Innovation: Generate decks in different languages while preserving citations to original paragraphs; support parallel corpora.
- Tools/products: Localization-aware presenter.
- Dependencies: Robust multilingual LLMs; translation quality control.
Accessibility-first decks
- Sectors: public sector, education
- Innovation: Auto-generate alt-text and screen-reader friendly structure based on the graph and attribution.
- Tools/products: Accessibility checker and generator.
- Dependencies: Standards compliance (e.g., WCAG); multimodal support.

Real-Time and Streaming

Real-time meeting notes to attributed slides
- Sectors: enterprise productivity
- Innovation: Stream transcriptions → segment into “paragraphs” → build evolving graph → generate ongoing slides for live briefings.
- Tools/products: Meeting assistant presenter.
- Dependencies: High-quality ASR; latency constraints; dynamic graph updates.

Sector-Specific Extensions

Clinical trial/protocol decks with risk and rationale tracing
- Sectors: healthcare, pharma
- Innovation: Non-linear narrative linking rationale, methods, endpoints, and risks with precise citations.
- Tools/products: Protocol-to-deck generator.
- Dependencies: Domain ontologies; strict compliance workflows.
Regulatory impact summaries
- Sectors: finance, energy, public policy
- Innovation: Auto-extract implications across sections and present decision-ready slides; maintain traceability to specific clauses.
- Tools/products: Regulatory navigator.
- Dependencies: Legal text parsing; expert validation cycles.

Notes on feasibility across applications:

The current pipeline is text-only; multimodal adoption is a key dependency for technical and healthcare-heavy domains.
Slide count K must be provided at inference; automated K/agenda inference requires additional modeling.
Attribution depends on accurate paragraph segmentation and graph thresholding; noisy PDFs/OCR can reduce precision.
Privacy, governance, and LLM access costs must be addressed for enterprise deployment.
Domain adaptation (prompts/classifiers) may be required for specialized jargon (finance, legal, clinical).
