
Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution

Published 21 May 2024 in cs.CL and cs.AI | (2405.13095v1)

Abstract: Automatically generating a presentation from the text of a long document is a challenging and useful problem. In contrast to a flat summary, a presentation needs to have a better and non-linear narrative, i.e., the content of a slide can come from different and non-contiguous parts of the given document. However, it is difficult to incorporate such non-linear mapping of content to slides and ensure that the content is faithful to the document. LLMs are prone to hallucination and their performance degrades with the length of the input document. Towards this, we propose a novel graph based solution where we learn a graph from the input document and use a combination of graph neural network and LLM to generate a presentation with attribution of content for each slide. We conduct thorough experiments to show the merit of our approach compared to directly using LLMs for this task.

Summary

  • The paper introduces GDP, a novel method integrating Graph Neural Networks and Large Language Models to convert documents into coherent, non-linear presentations.
  • It leverages graph construction, GNN-based paragraph embeddings, and spectral clustering to effectively cluster content before generating slides with iterative LLM prompts.
  • Evaluation on the SciDuet dataset shows improved narrative structure and content fidelity over baseline approaches, validating the method's practical impact.

Document-to-Presentation Transformation with GNN and LLM

Introduction

Transforming a long document into a presentation is challenging because a good presentation follows a non-linear narrative: a single slide may draw on several non-contiguous parts of the document. The paper "Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution" introduces a method that combines Graph Neural Networks (GNNs) and Large Language Models (LLMs) to address this. The aim is to generate presentations that attribute content accurately and remain coherent despite the non-linear mapping from the source document.

Methodology Overview

The proposed method, GDP (Graph-based Document to Presentation), constructs a graph from the input document in which nodes represent paragraphs. A GNN learns latent semantic relationships between these paragraphs, so that they can be clustered into coherent slide content. The approach targets two known limitations of LLMs: hallucination and degraded performance on long input contexts.

Figure 1: A presentation (right) with non-linear narrative and attribution to the source document (left).

Graph Construction and Neural Network Integration

  1. Graph Construction: Each paragraph in the document is represented as a node. Edges between nodes are formed based on semantic similarity, quantified using a fine-tuned RoBERTa-based classifier. The threshold for edge creation is determined experimentally to balance graph sparsity and connectivity.
  2. Graph Neural Network (GNN) Training: A two-layer Graph Convolutional Network (GCN) processes this graph structure to embed paragraphs into a semantically rich vector space. The unsupervised training objective is to minimize a binary cross-entropy loss over the graph's edges, promoting similar embeddings for semantically linked paragraphs.
  3. Clustering via Spectral Clustering: The node embeddings obtained from the GNN are clustered into groups, each representing a slide. Spectral clustering is used because it can handle the complex, non-convex distributions expected in the latent paragraph representations.

    Figure 2: Architectural Diagram.
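
The first two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity scores stand in for the output of the RoBERTa-based classifier, the weight matrices `w1` and `w2` are fixed rather than trained, and all function names are hypothetical. It shows thresholded edge creation, the standard GCN propagation rule with self-loops, and a binary cross-entropy loss over node pairs. The resulting embeddings would then be passed to a spectral clustering routine with the desired slide count K.

```python
import math

def build_adjacency(scores, threshold):
    """Connect paragraphs i and j when their similarity score is at least the threshold."""
    n = len(scores)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if scores[i][j] >= threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def normalize_with_self_loops(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the propagation matrix used by GCNs."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_forward(a_norm, features, w1, w2):
    """Two-layer GCN forward pass: Z = A_norm * ReLU(A_norm * X * W1) * W2."""
    hidden = matmul(a_norm, matmul(features, w1))
    hidden = [[max(0.0, v) for v in row] for row in hidden]
    return matmul(a_norm, matmul(hidden, w2))

def edge_bce_loss(embeddings, adj):
    """Mean binary cross-entropy between sigmoid(z_i . z_j) and the edge indicator."""
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            logit = sum(x * y for x, y in zip(embeddings[i], embeddings[j]))
            prob = 1.0 / (1.0 + math.exp(-logit))
            prob = min(max(prob, 1e-9), 1.0 - 1e-9)  # avoid log(0)
            target = adj[i][j]
            total -= target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob)
            pairs += 1
    return total / pairs

# Toy example: four paragraphs forming two related pairs (values are made up).
scores = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

adj = build_adjacency(scores, threshold=0.5)
z = gcn_forward(normalize_with_self_loops(adj), features, identity, identity)
loss = edge_bce_loss(z, adj)
```

In a real pipeline the weights would be updated by gradient descent on this loss, which pulls linked paragraphs together in the embedding space before clustering.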

Slide Generation with LLM

Once clusters are established, an LLM (GPT-3.5) turns each cluster into a slide. The model is prompted iteratively: each prompt contains the text of one cluster's paragraphs together with summaries of the preceding slides, which helps maintain narrative coherence. This step pairs the content structure learned by the GNN with the natural language generation strengths of the LLM.
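
The iterative prompting loop might look roughly like the sketch below. The prompt wording, the function name `generate_slides`, and the `llm` callable are all assumptions made for illustration; the paper's actual prompts are not reproduced here. The `llm` argument is any function that maps a prompt string to a completion string, so a real API client or a stub can be plugged in.

```python
from typing import Callable, Dict, List

def generate_slides(
    paragraphs: List[str],
    clusters: List[List[int]],
    llm: Callable[[str], str],
) -> List[Dict]:
    """Generate one slide per cluster, feeding summaries of earlier slides as context.

    `clusters` holds paragraph indices, which are kept on each slide as attribution.
    """
    slides: List[Dict] = []
    summaries: List[str] = []
    for number, indices in enumerate(clusters, start=1):
        context = "\n".join(
            f"Slide {i}: {text}" for i, text in enumerate(summaries, start=1)
        )
        source_text = "\n".join(paragraphs[i] for i in indices)
        prompt = (
            "Summaries of previous slides:\n" + (context or "(none)") + "\n\n"
            "Source paragraphs for the next slide:\n" + source_text + "\n\n"
            "Write a slide title and bullet points using only the source paragraphs."
        )
        content = llm(prompt)
        # A second call condenses the new slide so later prompts stay short.
        summaries.append(llm("Summarize this slide in one sentence:\n" + content))
        slides.append({"slide": number, "content": content, "sources": list(indices)})
    return slides

def echo_llm(prompt: str) -> str:
    """Stand-in for a real model: returns the last line of the prompt."""
    return "STUB: " + prompt.splitlines()[-1]

deck = generate_slides(
    ["Intro text.", "Method text.", "More intro text."],
    [[0, 2], [1]],
    echo_llm,
)
```

Because each slide sees only its own cluster plus short summaries, the prompt length stays bounded regardless of document length, and the stored `sources` indices give slide-level attribution for free.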

Experimental Setup and Evaluation

The proposed method is evaluated on the SciDuet dataset, comprising academic documents and their corresponding presentations. The authors compare GDP against baseline methods, including direct LLM applications:

  • Baseline Comparisons: Standard GPT-based approaches, such as GPT-Flat and GPT-COT, perform poorly in terms of content fidelity and narrative flow primarily due to the linear nature of their context processing.
  • Performance Metrics: The evaluation utilizes ROUGE scores for lexical matching, Coverage metrics for content completeness, Perplexity for fluency, and a custom non-linearity metric to assess narrative structure.
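
As a concrete illustration of the lexical-matching metric, the following is a simplified ROUGE-1 sketch. It assumes lowercase whitespace tokenization with no stemming or stop-word handling, so its scores will differ from those of standard ROUGE toolkits; the function name `rouge1` is hypothetical.

```python
from collections import Counter
from typing import Dict

def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall, and F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("the cat lay on the mat", "the cat sat on the mat")
```

Here five of the six unigrams match in each direction, so precision, recall, and F1 all equal 5/6.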

Non-Linearity and Content Attribution

GDP generates presentations whose narratives move toward the non-linear structure of human-made presentations. On the non-linearity metric, human-created presentations score approximately 38.6% and GDP achieves 24.9%: GDP departs from a strictly linear ordering of the source document, although it remains less non-linear than human authors.

Figure 3: Qualitative example to compare the slides generated by a baseline GPT-Flat and our approach GDP from the input document.
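
This summary does not state how the non-linearity metric is computed. Purely as an illustration, and not as the paper's definition, one plausible reading is the percentage of slides whose attributed paragraphs do not form a single contiguous run in the source document. The function names below are hypothetical.

```python
from typing import List

def is_contiguous(indices: List[int]) -> bool:
    """True when the paragraph indices form one unbroken run (e.g. 3, 4, 5)."""
    ordered = sorted(set(indices))
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))

def non_linearity_percent(slide_sources: List[List[int]]) -> float:
    """Share of slides (in percent) that draw on non-contiguous paragraphs.

    ASSUMPTION: this definition is illustrative only, not taken from the paper.
    """
    if not slide_sources:
        return 0.0
    non_linear = sum(1 for sources in slide_sources if not is_contiguous(sources))
    return 100.0 * non_linear / len(slide_sources)

# Slides 2 and 4 skip over paragraphs, so two of four slides are non-linear.
example = non_linearity_percent([[0, 1], [2, 5], [3], [4, 6, 7]])
```

Under this assumed definition the example deck scores 50.0, since two of its four slides combine paragraphs from separated parts of the document.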

Implications and Future Work

This research combines GNN-based structure learning with LLM-based language generation to address both narrative coherence and attribution accuracy. Future directions include incorporating multimodal inputs, adapting the framework to other document types, and adding template selection to improve slide aesthetics.

Conclusion

GDP advances document-to-presentation transformation by handling non-linear narratives while maintaining content fidelity. By combining GNNs and LLMs, the method addresses limitations of flat summarization and of prompting an LLM directly with the full document.

Practical Applications

Immediate Applications

Below are actionable use cases that can be deployed now, leveraging the paper’s GDP pipeline (graph learning + LLM) and its core features: non-linear narrative construction, slide-level attribution to source paragraphs, improved coverage/fluency, and domain-agnostic applicability to long, text-heavy documents.

Industry

  • Enterprise document-to-deck generation with traceability
    • Sectors: software, consulting, legal, finance, energy
    • Workflow: Ingest long reports/PRDs/RFPs/whitepapers → extract paragraphs → build document graph → cluster → generate slides with titles and attributed bullet points → export PPTX/Google Slides.
    • Tools/products: “Presenter with Attribution” plugin for PowerPoint/Google Slides; GDP-as-a-Service API; SharePoint/Confluence integration.
    • Assumptions/dependencies: High-quality PDF/text extraction; reliable paragraph segmentation; user-provided slide count K; LLM access; organizational content privacy controls.
  • Sales enablement and proposal acceleration
    • Sectors: B2B SaaS, consulting, manufacturing
    • Workflow: Convert proposals and case studies into client-ready decks with slide-level links to source paragraphs for quick edits and compliance.
    • Tools/products: CRM/CPQ plugin; proposal-to-deck generator.
    • Assumptions/dependencies: Accurate mapping of proposal sections; consistent document formatting.
  • Marketing content repurposing
    • Sectors: marketing, media
    • Workflow: Turn long-form blogs/whitepapers into campaign decks with non-linear narrative tailored to audience; include attributed bullets to speed approvals.
    • Tools/products: CMS plugin (e.g., WordPress/HubSpot) exporting decks; brand checklist integration.
    • Assumptions/dependencies: Brand style constraints; reviewer approval workflows.
  • Knowledge management and auditing
    • Sectors: regulated industries (finance, healthcare, aerospace)
    • Workflow: Generate training decks from policies/SOPs with slide-to-paragraph attribution to support audits and reduce hallucination risk.
    • Tools/products: Compliance auditor dashboard linking slides to source paragraphs.
    • Assumptions/dependencies: Document versioning; access controls; policy repositories.
  • Analyst report summarization to investor decks
    • Sectors: finance
    • Workflow: Non-linear narrative captures cross-section insights (market overview → risks → valuations), with paragraph-level citations.
    • Tools/products: Deck generator integrated with financial research platforms.
    • Assumptions/dependencies: Correct financial terminology; domain-specific LLM prompts.
  • Technical support and field manuals
    • Sectors: manufacturing, energy, telecom
    • Workflow: Convert long technical manuals into stepwise training decks with attributed instructions and safety notes.
    • Tools/products: LMS integration; offline deck export for field use.
    • Assumptions/dependencies: Consistent manual structures; text-heavy documents.

Academia and Education

  • Research paper to talk slides
    • Sectors: academia, edtech
    • Workflow: Ingest papers → generate non-linear decks that mirror human presentation narrative; attribution aids last-minute edits and fact-checking.
    • Tools/products: Conference prep assistant, integrated with arXiv and institutional repositories.
    • Assumptions/dependencies: PDF quality; slide count K set by presenter.
  • Course material and lecture prep
    • Sectors: education
    • Workflow: Convert chapters/long readings into lecture decks, preserving narrative (problem → methods → results) with slide-level citations.
    • Tools/products: LMS plugin (Moodle/Canvas); instructor dashboard.
    • Assumptions/dependencies: Text-centric readings; instructor curation.

Policy and Government

  • Legislative briefings and stakeholder decks
    • Sectors: public policy, government
    • Workflow: Turn long bills/reports into briefings with traceable bullets back to statutory text; supports transparency and reduces misinterpretation.
    • Tools/products: Briefing generator for committees; public portal with clickable citations.
    • Assumptions/dependencies: Document standardization; privacy and FOIA considerations.

Healthcare

  • Clinical guideline and policy brief slides
    • Sectors: healthcare administration
    • Workflow: Convert guidelines/policies into training decks; attribution supports compliance and reduces risk of clinical misstatements.
    • Tools/products: Hospital policy-to-training deck tool.
    • Assumptions/dependencies: Text-only focus (no images/diagrams yet); domain prompting.

Daily Life

  • Book/article-to-presentation generator for study groups
    • Sectors: consumer productivity
    • Workflow: Create study decks from long readings; non-linear narrative supports thematic discussion; attributed slides aid citation.
    • Tools/products: Browser/Notion plugin with deck export.
    • Assumptions/dependencies: Clean text extraction; user-defined slide count.

Long-Term Applications

These depend on further research, scaling, or development—especially multimodal handling, template/style intelligence, and automation of agenda/slide count.

Multimodal and Template Intelligence

  • Multimodal document-to-deck (images/tables/diagrams)
    • Sectors: healthcare, engineering, research, finance
    • Innovation: Integrate VLMs (e.g., CLIP/LLaVA) for figure/table extraction and slide construction with captions and chart reproductions.
    • Tools/products: “Multimodal Presenter” with visual attribution.
    • Dependencies: Robust OCR/table/figure parsing; domain-specific visual understanding; privacy/compliance for images.
  • Style and template recommendation
    • Sectors: marketing, corporate communications
    • Innovation: Automatic selection of slide layouts, colors, and themes matched to content intent and audience persona.
    • Tools/products: Theme selector; brand compliance checker.
    • Dependencies: Brand guidelines; intent detection; user preference models.

Advanced Narrative and Personalization

  • Audience-aware narrative shaping
    • Sectors: education, enterprise training, sales
    • Innovation: Personalize non-linear narratives for different roles (exec vs. technical) using graph-level re-weighting and constrained generation.
    • Tools/products: Narrative designer; persona switcher.
    • Dependencies: Role metadata; evaluation of comprehension outcomes.
  • Automated agenda and slide-count inference
    • Sectors: all
    • Innovation: Predict K and agenda topics from document graph (avoiding manual K input), with constraints on time and audience.
    • Tools/products: Time-bounded deck planner.
    • Dependencies: Reliable topic segmentation; pacing models.

Reliability, Governance, and Collaboration

  • End-to-end fact-checking and hallucination detection
    • Sectors: regulated industries, public sector
    • Innovation: Use attribution + retrieval to auto-flag bullets not supported by source paragraphs; integrate trust scores.
    • Tools/products: Fact-checker panel; compliance audit trails.
    • Dependencies: High-precision citation alignment; organizational policies.
  • Collaborative editing on the document graph
    • Sectors: enterprise productivity
    • Innovation: Editable paragraph-slide graph; users can move nodes, see downstream effects, and re-generate affected slides.
    • Tools/products: Graph workspace; versioning with change impact preview.
    • Dependencies: Real-time graph ops; UI scalability.

Cross-Document and Knowledge Integration

  • Cross-document synthesis decks
    • Sectors: consulting, research, policy
    • Innovation: Build a unified graph across multiple documents; cluster nodes into cross-source slides with source-level attribution for synthesis.
    • Tools/products: Multi-source synthesizer; cross-repo search integration.
    • Dependencies: Document normalization; deduplication; citation management.
  • Knowledge base integration for ongoing updates
    • Sectors: enterprise KM
    • Innovation: Reusable graph representations of documents enabling auto-updated decks when source docs change.
    • Tools/products: “Live Decks” connected to repositories.
    • Dependencies: Change detection; incremental clustering; governance.

Evaluation and Metrics Transfer

  • Narrative non-linearity and coverage as standard quality metrics
    • Sectors: edtech, NLG tooling
    • Innovation: Adopt the paper’s non-linearity and coverage metrics to evaluate other generative summarizers and slide generators.
    • Tools/products: NLG evaluator SDK; analytics dashboards.
    • Dependencies: Agreement studies with human raters; task-specific calibrations.

Multilingual and Accessibility

  • Multilingual doc-to-deck with source-language attribution
    • Sectors: global enterprises, NGOs
    • Innovation: Generate decks in different languages while preserving citations to original paragraphs; support parallel corpora.
    • Tools/products: Localization-aware presenter.
    • Dependencies: Robust multilingual LLMs; translation quality control.
  • Accessibility-first decks
    • Sectors: public sector, education
    • Innovation: Auto-generate alt-text and screen-reader friendly structure based on the graph and attribution.
    • Tools/products: Accessibility checker and generator.
    • Dependencies: Standards compliance (e.g., WCAG); multimodal support.

Real-Time and Streaming

  • Real-time meeting notes to attributed slides
    • Sectors: enterprise productivity
    • Innovation: Stream transcriptions → segment into “paragraphs” → build evolving graph → generate ongoing slides for live briefings.
    • Tools/products: Meeting assistant presenter.
    • Dependencies: High-quality ASR; latency constraints; dynamic graph updates.

Sector-Specific Extensions

  • Clinical trial/protocol decks with risk and rationale tracing
    • Sectors: healthcare, pharma
    • Innovation: Non-linear narrative linking rationale, methods, endpoints, and risks with precise citations.
    • Tools/products: Protocol-to-deck generator.
    • Dependencies: Domain ontologies; strict compliance workflows.
  • Regulatory impact summaries
    • Sectors: finance, energy, public policy
    • Innovation: Auto-extract implications across sections and present decision-ready slides; maintain traceability to specific clauses.
    • Tools/products: Regulatory navigator.
    • Dependencies: Legal text parsing; expert validation cycles.

Notes on feasibility across applications:

  • The current pipeline is text-only; multimodal adoption is a key dependency for technical and healthcare-heavy domains.
  • Slide count K must be provided at inference; automated K/agenda inference requires additional modeling.
  • Attribution depends on accurate paragraph segmentation and graph thresholding; noisy PDFs/OCR can reduce precision.
  • Privacy, governance, and LLM access costs must be addressed for enterprise deployment.
  • Domain adaptation (prompts/classifiers) may be required for specialized jargon (finance, legal, clinical).
