DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Published 26 Feb 2026 in cs.AI | (2602.22839v1)

Abstract: Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic framework that adapts to diverse user intents, enables effective feedback-driven refinement, and generalizes beyond a scripted pipeline. Specifically, DeepPresenter autonomously plans, renders, and revises intermediate slide artifacts to support long-horizon refinement with environmental observations. Furthermore, rather than relying on self-reflection over internal signals (e.g., reasoning traces), our environment-grounded reflection conditions the generation process on perceptual artifact states (e.g., rendered slides), enabling the system to identify and correct presentation-specific issues during execution. Results on the evaluation set covering diverse presentation-generation scenarios show that DeepPresenter achieves state-of-the-art performance, and the fine-tuned 9B model remains highly competitive at substantially lower cost. Our project is available at: https://github.com/icip-cas/PPTAgent

Abstract PDF Upgrade to Chat

Summary

The paper introduces an agentic dual-agent framework (Researcher and Presenter) that autonomously curates content and designs adaptive presentations.
It employs environment-grounded reflection by inspecting perceptual artifacts to iteratively correct visual and textual defects.
Experimental results validate superior content quality, visual diversity, and cost-effective performance compared to baseline systems.

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Overview

The paper "DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation" (2602.22839) introduces a novel agentic framework—DeepPresenter—for automated presentation generation. Unlike existing systems founded on rigid templates and introspective self-reflection, DeepPresenter employs a dual-agent architecture (Researcher and Presenter) with environment-grounded reflective capabilities. This approach enables autonomous content research, adaptive visual design, and iterative artifact refinement based on rendered environmental observations.

Figure 1: DeepPresenter’s dual-agent system collaboratively generates manuscripts and slides, leveraging perceptual artifacts for environment-grounded reflection.

Framework Architecture and Methodology

Task Formulation and Dual-Agent Collaboration

Presentation generation is modeled as a sequential multi-step agentic task operating within an interactive environment. DeepPresenter decomposes execution into Researcher and Presenter phases:

Researcher Agent: Conducts adaptive information retrieval and synthesis, generating a structured manuscript aligned with user intents. This agent invokes retrieval, analysis, and file manipulation tools, enabling intent-specific research depth responsive to task complexity.
Presenter Agent: Transforms the manuscript into visually coherent HTML/CSS slides in a content-driven manner rather than template-based generation. Presenter develops a global design plan with topic-aligned themes and generates each slide as a fixed-layout artifact.

This decomposition ensures both deep content curation and topic-aware visual realization.

Environment-Grounded Reflection

DeepPresenter does not rely on introspective reasoning traces or intermediate code-level self-reflection. Instead, it grounds revision in perceptual artifact states—specifically, rendered manuscripts and slides exposed via the inspect tool:

Figure 2: Contrasting self-reflection with environment-grounded reflection; DeepPresenter leverages environmental observations for artifact inspection and targeted correction.

Agents inspect rendered artifacts, identify post-render defects (e.g., low contrast, layout overflow, broken images), and revise intermediate representations via the think tool. This observe–reflect–revise loop tightly aligns artifact inspection with end-user perception, effectively mitigating defects that introspection cannot capture.

Data Synthesis and Extrinsic Verification

To enable robust fine-tuning of DeepPresenter-9B, a high-quality ultra-present dataset of agentic trajectories is synthesized using a three-stage pipeline:

Figure 3: Data synthesis pipeline integrating query construction, extrinsic verification via critic reasoning traces, and multi-step trajectory filtering for quality assurance.

Extrinsic verification leverages independent critics to review artifact states and prescribe defect-triggered corrective actions. This strategy decouples verification from the originating agent trajectory, mitigating self-verification bias and driving higher-quality reflective behaviors during agent rollouts.

Experimental Evaluation

Quantitative Results

DeepPresenter is evaluated on 128 diverse presentation-generation tasks with metrics for constraint satisfaction, content quality, visual style, and diversity. The framework consistently achieves state-of-the-art results across commercial and academic baselines:

Gemini-3-Pro backbone: Avg. score 4.44, surpassing Gamma (4.36) and all open-source systems.
Visual diversity: DeepPresenter achieves a Vendi Score of 0.79, substantially exceeding template-based baselines (≤0.35).
DeepPresenter-9B: Compact model achieves 4.19, outperforming all open-source systems, closely matching GPT-5 (4.22) at reduced cost.

Ablation studies validate the critical contributions of environment-grounded reflection, dual-agent decomposition, and rigorous trajectory filtering:

Removing environment-grounded reflection degrades performance (Gemini-3-Pro: 4.32 vs. 4.44; DeepPresenter-9B: 3.82 vs. 4.19).
Exclusion of dual-agent design yields substantial drops (Gemini-3-Pro: 4.04; DeepPresenter-9B: 3.23).

Extrinsic Verification Analysis

Extrinsic verification significantly strengthens defect identification and revision signals, particularly for slide-level failures:

Figure 4: Defect distribution comparison—extrinsic verification leads to broader and deeper detection of manuscript and slide errors.

Trajectory synthesis without extrinsic verification exhibits self-justification bias, with agents rationalizing defects and missing substantial post-render issues.

Failure and Efficiency Studies

Trajectory failure analysis indicates quality errors dominate long-horizon generation (43.0%), followed by environment/interruption failures (32.3%). Constraint and consistency violations are less prevalent.

Figure 5: Failure distribution in agentic rollouts—quality and environment errors are primary concerns.

Efficiency evaluations show DeepPresenter-9B advances Pareto-optimal trade-offs, delivering high quality at comparable or reduced inference cost.

Figure 6: Cost-performance trade-off; DeepPresenter-9B outperforms prior open-source frontiers at similar price points.

Qualitative Assessment

Case studies highlight DeepPresenter’s ability to generate visually thematic, asset-rich slides, while baselines produce text-heavy, template-constrained outputs with frequent misalignment of visual and narrative elements.

Figure 7: Qualitative comparison—DeepPresenter produces topic-resonant, visually coherent presentations absent in baseline methods.

Tool Usage and Domain Analysis

Agent tool invocation patterns further validate role specialization. Researcher predominantly utilizes retrieval tools in persona-driven scenarios requiring knowledge search; Presenter focuses on file and reasoning tools for artifact manipulation and reflection.

Figure 8: Tool usage patterns—specialized agent roles adapt tool selection based on task domain characteristics.

Domain breakdowns demonstrate DeepPresenter’s strong performance across instructional, academic, and persona-driven presentations, with domain-specific trade-offs between constraint compliance and visual style.

Implications and Future Directions

The environment-grounded reflection, dual-agent specialization, and extrinsic verification mechanisms established in DeepPresenter represent a significant architectural departure from template-based, introspective systems. Practically, this enables highly adaptive, perceptually aligned artifact generation suitable for real-world educational, business, and research applications. Theoretically, it augments agentic LLM frameworks with embodied environmental feedback loops beyond internal token-based reasoning.

Future work may address inference-time integration of extrinsic verification to further mitigate self-verification bias, explore greater robustness in multi-step tool-using rollouts, and generalize environment-grounded reflection for other multimodal artifact generation domains.

Conclusion

DeepPresenter establishes an agentic paradigm for presentation generation, coupling autonomous research and design specialization with post-render environment-grounded reflection. Strong empirical results corroborate the efficacy of grounding artifact revision in perceptual environmental states and decoupling verification signals via extrinsic critics. The compact DeepPresenter-9B highlights the viability of distilling competitive agentic behaviors at reduced cost through high-quality trajectory synthesis. This framework paves the way for more adaptive, perceptually aligned, and efficient multimodal artifact generation in future AI systems.

Markdown Report Issue