
PaperBanana: Automated Scientific Illustrations

Updated 2 February 2026
  • The paper introduces an agentic framework for automated academic illustrations by integrating retrieval, content planning, style guidance, and iterative refinement.
  • It outperforms baseline models on faithfulness, conciseness, readability, and aesthetics.
  • The study also extends the approach to statistical plot generation and discusses related innovations in energy devices and geometric analysis.

PaperBanana refers to several distinct but technically significant constructs spanning automated scientific illustration (PaperBanana framework), sustainable flexible electronics (banana-derived supercapacitors), and advanced geometric or analytic structures (banana integrals, banana-shaped actuators). While the term may arise independently in each context, the following exposition prioritizes the most prominent recent usage as an agentic illustration-generation framework for AI scientists, followed by an overview of related concepts in scientific energy storage, soft material mechanics, and mathematical physics.

1. Automated Academic Illustration: The PaperBanana Framework

The PaperBanana framework is an agentic, reference-driven system for automated generation of publication-ready academic illustrations, introduced to address the bottleneck of manual figure drafting in AI-augmented research environments (Zhu et al., 30 Jan 2026). The architecture decomposes the illustration pipeline into specialized autonomous agents orchestrated in sequence, leveraging state-of-the-art vision-language models (VLMs) and diffusion-based image generators.

System Architecture and Key Agents:

  • Retriever Agent performs generative retrieval of the top-$N$ reference triplets $(S_i, C_i, I_i)$ from a reference repository, where $S_i$ is a methodology segment, $C_i$ a caption, and $I_i$ the figure, conditioned on the proposed context $S$ and intent $C$.
  • Planner (Content Planner) uses a VLM (e.g., Gemini-3-Pro) to transform $(S, C, \mathcal{E})$, where $\mathcal{E} = \{(S_i, C_i, I_i)\}_{i=1}^N$ is the retrieved example set, into a structured content plan $P$, an explicit specification of diagram entities and relations:

$$P = \mathrm{VLM}_{\mathrm{plan}}\bigl(S,\,C,\,\{(S_i, C_i, I_i)\}_{i=1}^N\bigr)$$

  • Stylist (Style Planner) creates an “Aesthetic Guideline” $\mathcal{G}$ by summarizing style features across the reference set, then refines $P$ into a fully specified, style-conformant plan $P^*$:

$$P^* = \mathrm{VLM}_{\mathrm{style}}(P,\,\mathcal{G})$$

  • Visualizer (Image Renderer) maps the textual plan $P_t$ at each iterative step $t$ into a raster diagram $I_t$ using fine-tuned scientific figure diffusion models (Nano-Banana-Pro) or generalist generators (GPT-Image-1.5).
  • Critic (Self-Critic Agent) performs multimodal analysis of $I_t$ against $(S, C)$, issuing a suggested revision $P_{t+1}$ for further refinement.

Iterative Refinement Loop: for $t = 0, \ldots, T-1$,

$$I_t = \mathrm{ImageGen}(P_t), \qquad P_{t+1} = \mathrm{VLM}_{\mathrm{critic}}(I_t, S, C, P_t),$$

with final output $I_T$. This loop operationalizes model-based self-critique, functionally analogous to a discrete “gradient” update in description space:

$$P_{t+1} = P_t + \alpha\,\nabla_{P}\,\mathcal{C}\bigl(\mathrm{ImageGen}(P_t);\,S,\,C\bigr),$$

where $\mathcal{C}$ scores faithfulness, conciseness, and aesthetics.
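The render-then-critique loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `refine`, `toy_render`, and `toy_critique` are hypothetical names, the renderer and critic are trivial stand-ins for the image generator and the VLM critic, and rendering the last plan once more after the loop is an assumption made so that a final image exists.

```python
from typing import Callable, List, Tuple

# Hypothetical aliases: a plan is free text, an image is an opaque string here.
Plan = str
Image = str


def refine(
    initial_plan: Plan,
    source_context: str,
    caption: str,
    render: Callable[[Plan], Image],
    critique: Callable[[Image, str, str, Plan], Plan],
    num_rounds: int,
) -> Tuple[Image, List[Plan]]:
    """Run num_rounds rounds of I_t = render(P_t), P_{t+1} = critique(I_t, S, C, P_t)."""
    plans = [initial_plan]
    plan = initial_plan
    for _ in range(num_rounds):
        image = render(plan)
        plan = critique(image, source_context, caption, plan)
        plans.append(plan)
    # Assumption: the final output is a render of the last revised plan.
    return render(plan), plans


# Toy stand-ins, for illustration only.
def toy_render(plan: Plan) -> Image:
    return f"<image of: {plan}>"


def toy_critique(image: Image, s: str, c: str, plan: Plan) -> Plan:
    # Pretend the critic requests exactly one fix per round.
    return plan + " +fix"


final_image, history = refine(
    "draft", "method text", "caption", toy_render, toy_critique, num_rounds=3
)
print(final_image)
print(len(history))
```

Passing the renderer and critic as callables keeps the loop independent of any particular model API, so real model calls could be substituted without changing the control flow.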

2. Content and Style Planning Mechanisms

The content planner’s output $P$ encodes diagram semantics as a set of nodes (modules, data artifacts) and directed edges (flow relations), e.g. $(n_j, \mathrm{shape}_j, \mathrm{label}_j)$ and $(e_{jk}, \mathrm{arrow\text{-}style}_{jk})$. This specification supports unambiguous mapping to schematic illustration backends. The style planner constructs the aesthetic guideline $\mathcal{G}$ by aggregating statistics on color palettes, shape motifs (e.g., rounded versus sharp corners), line conventions (solid, dashed, orthogonal, or curved trajectories), and typographic rules (mathematics in serif, labels in sans-serif). The plan $P^*$ is then enriched with explicit HEX codes, font metrics, and parameterized visual instructions enforced through prompt engineering.
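One way to picture such a plan is as a small typed graph. The sketch below is illustrative only: the class names, field names, guideline keys, and the `apply_style` helper are assumptions, not the schema used by PaperBanana, whose plans are described as textual.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    name: str   # n_j
    shape: str  # shape_j
    label: str  # label_j


@dataclass(frozen=True)
class Edge:
    source: str       # j
    target: str       # k
    arrow_style: str  # arrow-style_jk


@dataclass
class ContentPlan:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def dangling_edges(self) -> List[Edge]:
        """Return edges that reference a node missing from the plan."""
        known = {n.name for n in self.nodes}
        return [e for e in self.edges if e.source not in known or e.target not in known]


def apply_style(plan: ContentPlan, guideline: Dict[str, str]) -> List[str]:
    """Render the plan as instruction lines enriched with style values."""
    fill = guideline.get("fill", "#FFFFFF")
    font = guideline.get("label_font", "sans-serif")
    lines = [f"{n.shape} '{n.label}' fill={fill} font={font}" for n in plan.nodes]
    lines += [f"{e.source} -> {e.target} [{e.arrow_style}]" for e in plan.edges]
    return lines


plan = ContentPlan(
    nodes=[Node("enc", "rounded-box", "Encoder"), Node("dec", "rounded-box", "Decoder")],
    edges=[Edge("enc", "dec", "solid"), Edge("dec", "loss", "dashed")],
)
styled = apply_style(plan, {"fill": "#E8F0FE", "label_font": "sans-serif"})
print("\n".join(styled))
print(plan.dangling_edges())
```

The `dangling_edges` check illustrates why an explicit node and edge specification is useful: structural errors such as an arrow to an undefined module can be caught before any image is rendered.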

3. Evaluation Using PaperBananaBench

Benchmarking is accomplished via PaperBananaBench, containing 292 meticulously curated NeurIPS 2025 methodology-diagram cases and 292 held-out reference examples for agentic retrieval. Domain categories include Agent Reasoning, Vision Perception, Generative Learning, and Science Applications. Evaluation employs a VLM-based Judge (Gemini-3-Pro) that compares generated diagrams $I$ to human references $I^{\mathrm{ref}}$ across:

  • Faithfulness: semantic preservation relative to $S, C$
  • Conciseness: lack of superfluous or redundant elements
  • Readability: clarity and legibility of diagram components
  • Aesthetics: adherence to prevailing visual norms

The scoring is categorical: 100 (“Model win”), 50 (“Tie”), 0 (“Human win”). Aggregation prioritizes faithfulness and readability.
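As a rough illustration of the categorical scheme, the sketch below averages per-case verdicts within each dimension. Averaging is an assumption here, and the rule that combines dimensions into an overall score is not reproduced because the text does not specify it; the verdict labels and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Categorical scores as described: model win, tie, human win.
SCORE = {"model": 100.0, "tie": 50.0, "human": 0.0}


def dimension_scores(verdicts: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average the categorical score per dimension (averaging is an assumption)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for dimension, winner in verdicts:
        totals[dimension] += SCORE[winner]
        counts[dimension] += 1
    return {d: totals[d] / counts[d] for d in totals}


# Made-up verdicts for illustration; these are not benchmark data.
verdicts = [
    ("faithfulness", "model"),
    ("faithfulness", "human"),
    ("faithfulness", "tie"),
    ("faithfulness", "tie"),
    ("readability", "model"),
    ("readability", "tie"),
]
scores = dimension_scores(verdicts)
print(scores)  # {'faithfulness': 50.0, 'readability': 75.0}
```

Under this reading, a dimension score of 50 means the model is on par with the human reference, and values above 50 mean the judge preferred the model more often than the human.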

| Method | Faithfulness | Conciseness | Readability | Aesthetics | Overall |
|---|---|---|---|---|---|
| Nano-Banana-Pro | 43.0 | 43.5 | 38.5 | 65.5 | 43.2 |
| PaperBanana (ours) | 45.8 | 80.7 | 51.4 | 72.1 | 60.2 |

Ablation results indicate that each agent matters: removing the Retriever (-16 points overall), the Stylist (-17.5% conciseness), or the Critic (-15.8% faithfulness) substantially degrades performance.

4. Extension to Statistical Plot Generation

PaperBanana generalizes to the automatic production of statistical plots by extending the Visualizer agent with a code generator, which translates plan $P_t$ into executable scripts (e.g., Python/Matplotlib). The Critic then analyzes generated plots in conjunction with raw data, updating $P_{t+1}$ to correct mis-specifications. On the ChartMimic direct mimic suite (240 cases, 7 plot types), PaperBanana surpasses a Gemini-3-Pro code-generation baseline by +1.4% (faithfulness), +5.0% (conciseness), +3.1% (readability), and +4.0% (aesthetics), while matching human plot faithfulness and marginally exceeding human performance in other dimensions.
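The plan-to-script step might look like the following sketch. It uses a fixed template where the framework described here relies on a VLM code generator, so it only illustrates the idea of emitting an executable Matplotlib script from a structured plan. The plan keys and the function name `plan_to_script` are hypothetical; the example data are the Overall scores from the table in Section 3.

```python
from typing import Dict, List


def plan_to_script(plan: Dict[str, object]) -> str:
    """Emit a Matplotlib script for a simple plot plan (hypothetical schema)."""
    kind = plan["kind"]
    if kind not in ("bar", "plot"):
        raise ValueError(f"unsupported plot kind: {kind}")
    lines: List[str] = [
        "import matplotlib",
        "matplotlib.use('Agg')  # headless backend, no display needed",
        "import matplotlib.pyplot as plt",
        f"x = {plan['x']!r}",
        f"y = {plan['y']!r}",
        "fig, ax = plt.subplots()",
        f"ax.{kind}(x, y)",
        f"ax.set_xlabel({plan['xlabel']!r})",
        f"ax.set_ylabel({plan['ylabel']!r})",
        f"fig.savefig({plan['output']!r})",
    ]
    return "\n".join(lines) + "\n"


plan = {
    "kind": "bar",
    "x": ["Nano-Banana-Pro", "PaperBanana"],
    "y": [43.2, 60.2],
    "xlabel": "Method",
    "ylabel": "Overall score",
    "output": "overall.png",
}
script = plan_to_script(plan)
print(script)
```

The generated string can be written to a file and run in an environment where Matplotlib is installed. Keeping the raw data inside the script means a critic can compare the rendered plot against the exact values that produced it.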

5. Underlying Models, Optimization Objectives, and Systemic Significance

The VLM backbone (Gemini-3-Pro) is central for all high-level planning and critique subroutines. Image generators are diffusion models trained on scientific illustration corpora; Nano-Banana-Pro, in particular, offers high diagram structural fidelity. Training objectives for the generators employ a mean squared diffusion denoising loss:

$$\mathcal{L}_{\mathrm{gen}} = \mathbb{E}_{x_0,\,\epsilon\sim\mathcal{N}(0,I),\,t} \left\|\epsilon - \epsilon_\theta(x_t,t)\right\|^2,$$

with $x_t = \sqrt{\alpha_t}\,x_0 + \sqrt{1-\alpha_t}\,\epsilon$.
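To make the objective concrete, the sketch below computes a single-sample estimate of the denoising loss in pure Python. It is a toy under stated assumptions: the three-step `alphas` schedule, the vector `x0`, and the two noise predictors are made up for illustration, and no claim is made about how the actual generators were trained.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def add_noise(x0: Sequence[float], eps: Sequence[float], alpha_t: float) -> Vector:
    """Forward noising: x_t = sqrt(alpha_t) * x0 + sqrt(1 - alpha_t) * eps."""
    a, b = math.sqrt(alpha_t), math.sqrt(1.0 - alpha_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def denoising_loss(
    x0: Sequence[float],
    alphas: Sequence[float],
    predict_noise: Callable[[Vector, int], Vector],
    rng: random.Random,
) -> float:
    """One Monte Carlo sample of ||eps - eps_theta(x_t, t)||^2."""
    t = rng.randrange(len(alphas))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, eps, alphas[t])
    eps_hat = predict_noise(x_t, t)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat))


alphas = [0.9, 0.5, 0.1]  # hypothetical noise schedule
x0 = [1.0, -2.0, 0.5]     # hypothetical clean sample


def oracle(x_t: Vector, t: int) -> Vector:
    # Cheats by knowing x0, so it recovers the injected noise (illustration only).
    a, b = math.sqrt(alphas[t]), math.sqrt(1.0 - alphas[t])
    return [(xt - a * x) / b for xt, x in zip(x_t, x0)]


def zero_predictor(x_t: Vector, t: int) -> Vector:
    return [0.0] * len(x_t)


oracle_loss = denoising_loss(x0, alphas, oracle, random.Random(0))
zero_loss = denoising_loss(x0, alphas, zero_predictor, random.Random(0))
print(oracle_loss, zero_loss)
```

A predictor that recovers the injected noise drives the loss to zero up to floating-point error, while one that always predicts zero pays the squared norm of the noise; training pushes $\epsilon_\theta$ toward the former.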

The framework is deployed without additional fine-tuning; all style adaptation is handled by in-context prompting.

6. Related Concepts in Energy Storage, Soft Matter, and Mathematical Physics

a. Supercapacitor from Banana-Peel Biomass (“PaperBanana” Device, Editor’s term)

Activated carbon derived from banana peel using KOH activation ($62\,\mathrm{m}^2/\mathrm{g}$ BET area) is employed in a flexible, interdigitated supercapacitor fabricated on PET via screen-printed Ag electrodes and drop-cast PVA/H$_3$PO$_4$ gel electrolyte (Singh et al., 2019). The device delivers areal capacitances up to $33.18\,\mathrm{mF/cm}^2$, energy density $5.87\,\mu\mathrm{Wh/cm}^2$, and $\sim 90\%$ retention under 5000 cycles and mechanical bending, with scalability enabled by low-cost, waste-derived carbon and roll-to-roll compatible screen printing.

b. Isometric Deformations in Soft Matter: Banana-Shaped Seedpod

The “folded Goursat” family analytically characterizes isometric deformations with folds in thin shells, inspired by banana-shaped seedpods (Couturier, 2016). These geometric constructs allow for controllable actuation (closing or opening) determined by fold placement, with optimization favoring elongated morphologies for ease of opening and minimized mechanical cost.

c. Banana Integrals in Mathematical Physics

Multi-loop “banana” Feynman integrals admit descriptions in terms of periods of K3 surfaces, with modular/automorphic properties determined by mass configuration. Maximal cuts of three-loop banana integrals lead to explicit orthogonal modular forms, Hilbert/Siegel/Hermitian modular forms, and factorized elliptic expressions, structured by the transcendental lattice and associated monodromy groups (Duhr, 21 Feb 2025).

7. Concluding Synthesis and Implications

The principal PaperBanana framework combines retrieval-augmented VLMs, content and style planning, automated aesthetic induction, iterative self-critique, and a dedicated benchmark into an agentic pipeline for generating and refining publication-grade scientific illustrations. Its extension to statistical plotting supports its generality. Related developments in energy storage and geometry, where “banana” refers to engineered morphologies as well as to a class of Feynman integrals, show the breadth of technical interpretations of the term: agentic automation in research workflows, scalable waste-derived flexible devices, and analytic structures in mathematical physics (Zhu et al., 30 Jan 2026, Singh et al., 2019, Couturier, 2016, Duhr, 21 Feb 2025).
