Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks

Published 3 Apr 2026 in cs.CV | (2604.03061v1)

Abstract: Recent advances in generative AI raise the question of whether general-purpose image editing models can serve as unified solutions for image restoration. In this work, we conduct a systematic evaluation of Nano Banana 2 for image restoration across diverse scenes and degradation types. Our results show that prompt design plays a critical role, where concise prompts with explicit fidelity constraints achieve the best trade-off between reconstruction accuracy and perceptual quality. Compared with state-of-the-art restoration models, Nano Banana 2 achieves superior performance in full-reference metrics while remaining competitive in perceptual quality, which is further supported by user studies. We also observe strong generalization in challenging scenarios, such as small faces, dense crowds, and severe degradations. However, the model remains sensitive to prompt formulation and may require iterative refinement for optimal results. Overall, our findings suggest that general-purpose generative models hold strong potential as unified image restoration solvers, while highlighting the importance of controllability and robustness. All test results are available on https://github.com/yxyuanxiao/NanoBanana2TestOnIR.


Authors (3)

Summary

The paper reports that Nano Banana 2 is competitive with state-of-the-art IR models: it leads on full-reference metrics and remains competitive in perceptual quality, a result supported by user studies.
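The evaluation underscores the critical role of prompt design: explicit fidelity constraints in the prompt reduce semantic drift and improve restoration accuracy, and the abstract reports that concise prompts with such constraints give the best trade-off between reconstruction accuracy and perceptual quality.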
The study identifies limitations, including prompt sensitivity, stochastic output variations, and over-generation artifacts, which guide future improvements in unified image restoration.

Evaluation of Nano Banana 2 as a Unified Image Restoration Model

Introduction

The study "Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks" (2604.03061) systematically examines the capability of Nano Banana 2, a general-purpose, generative text-guided image editing model, for classical and challenging image restoration (IR) scenarios. The research addresses a core question in low-level vision: can high-capacity, instruction-driven generative models subsume the role of specialist IR networks traditionally designed for specific degradations (denoising, deblurring, super-resolution, artifact removal, etc.)?

The authors provide a rigorous empirical analysis focusing on prompt engineering, fidelity versus perceptual trade-offs, robustness across degradation types and scenes, and benchmarking against state-of-the-art (SOTA) IR methods. The study demonstrates that prompt design critically governs restoration behavior, highlights strong quantitative and qualitative results, explores generalization and failure modes, and discusses implications for the evolution of unified vision models.

Prompts, Fidelity, and Restoration Behavior

Prompt formulation was shown to be instrumental in guiding Nano Banana 2's restoration output. The study designs 12 carefully crafted prompts varying in length (short/long) and in explicitness regarding fidelity constraints.
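
The two design axes described above (prompt length and an explicit fidelity clause) can be sketched as a small prompt grid. This is a minimal illustration, not the study's prompt set: the wording of every string below is hypothetical, and the sketch enumerates only the 2 x 2 combinations of the two axes, whereas the study uses 12 prompts whose exact composition is not given here.

```python
from itertools import product

# Hypothetical wording: illustrative only, NOT the prompts used in the study.
TASK_TEXT = {
    "short": "Restore this image.",
    "long": (
        "Restore this degraded image: remove noise, blur, and compression "
        "artifacts, and recover sharp, natural-looking detail."
    ),
}

# Hypothetical fidelity clause appended when a fidelity constraint is wanted.
FIDELITY_CLAUSE = (
    "Keep all content, layout, colors, and identities unchanged; "
    "do not add or remove objects."
)


def build_prompt(length: str, with_fidelity: bool) -> str:
    """Compose one prompt from a length variant and an optional fidelity clause."""
    base = TASK_TEXT[length]
    return f"{base} {FIDELITY_CLAUSE}" if with_fidelity else base


def prompt_grid() -> dict[tuple[str, bool], str]:
    """Enumerate every (length, fidelity) combination of the two axes."""
    return {
        (length, fid): build_prompt(length, fid)
        for length, fid in product(TASK_TEXT, (False, True))
    }
```

Keeping the fidelity clause as a separate string makes it easy to compare outputs for the same task text with and without the constraint, which is the comparison the section goes on to discuss.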

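Longer prompts are reported to improve both full-reference metrics (PSNR, SSIM, LPIPS) and no-reference perceptual metrics (MUSIQ, MANIQA, CLIP-IQA), particularly for complex tasks such as text and surveillance restoration. The abstract nonetheless identifies concise prompts with explicit fidelity constraints as giving the best overall trade-off between reconstruction accuracy and perceptual quality.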
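
For readers unfamiliar with the full-reference metrics named above, PSNR is the simplest to state: it is 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the restored image and the ground-truth reference. The sketch below is a generic pure-Python version of that standard definition, assuming grayscale images stored as nested lists with values in [0, 255]; it is not the study's evaluation code, and the function names are chosen here for illustration.

```python
import math


def mse(reference, restored):
    """Mean squared error between two equally sized images (nested lists)."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)
```

Because PSNR only measures pixel-wise error against a reference, it rewards faithful reconstruction and says nothing about how natural the result looks, which is why it is reported alongside no-reference perceptual metrics.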

Figure 1: Long prompts, by supplying more contextual and task cues, yield more precise and detail-consistent image restorations, especially in scenarios with high informational ambiguity.

Fidelity constraints embedded in prompts suppress semantic drift and hallucination, encouraging reconstructions that preserve structure and meaning. The effect is quantified as a reduction in severe semantic deviations, from an average of 2 infidelity cases per prompt without fidelity constraints to 0.5 with them.

Figure 2: Prompts with explicit fidelity constraints mitigate semantic artifacts, producing results with greater structural and semantic accuracy.

Nevertheless, prompt engineering does not fully resolve model infidelity, as cases of semantic deviation can persist even with strong fidelity cues.

Figure 3: Despite explicit fidelity constraints in prompts, Nano Banana 2 can still hallucinate or alter input content, indicating intrinsic generative model limitations.

Furthermore, the model exhibits a distinct perception–distortion trade-off. Prompts that prioritize fidelity achieve superior PSNR/SSIM, while purely perceptual prompts maximize no-reference metrics but risk unsupported, over-enhanced details and semantic shifts.
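
One way to reason about this trade-off when comparing prompts is to keep only those that are not beaten on both axes at once, that is, the Pareto front over a fidelity score and a perceptual score. The sketch below assumes each prompt has already been scored with one higher-is-better number per axis (for example PSNR and MUSIQ); the prompt names and numbers in the usage note are made up for illustration and are not results from the study.

```python
def pareto_front(scores):
    """Return the names of candidates not dominated on (fidelity, perceptual).

    `scores` maps a name to a (fidelity, perceptual) pair, higher is better on
    both axes. A candidate is dominated if another one is at least as good on
    both axes and strictly better on at least one.
    """
    front = []
    for name, (fid, per) in scores.items():
        dominated = any(
            (f2 >= fid and p2 >= per) and (f2 > fid or p2 > per)
            for other, (f2, p2) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

With toy scores such as `{"fidelity_prompt": (30.0, 60.0), "perceptual_prompt": (26.0, 72.0), "weak_prompt": (25.0, 58.0)}`, the first two prompts remain on the front and the third is dropped, because the fidelity-oriented prompt beats it on both axes.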

Stability, Over-Generation, and Output Consistency

The stability analysis shows that Nano Banana 2 is generally consistent across repeated runs for a given prompt and input, but on complex, ill-posed cases its outputs can vary substantially from run to run.

Figure 4: In inherently ambiguous or severely degraded scenarios, repeated runs of Nano Banana 2 with identical inputs can produce visible color shifts, scale changes, and structural instabilities.
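
Run-to-run variation of this kind can be summarized with a simple number. The sketch below is an assumed, generic measure rather than the study's protocol: it takes several outputs generated from the same input and prompt, computes the standard deviation of each pixel across runs, and averages those values. Images are assumed to be same-sized grayscale nested lists, so outputs that differ in scale would need to be aligned or resized first.

```python
import statistics


def run_to_run_spread(runs):
    """Mean per-pixel population standard deviation across repeated outputs.

    `runs` is a list of images (nested lists of grayscale values) generated
    from the same input and prompt. 0.0 means every run is identical.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure variation")
    flat = [[p for row in img for p in row] for img in runs]
    if len({len(f) for f in flat}) != 1 or not flat[0]:
        raise ValueError("all runs must be non-empty and the same size")
    # zip(*flat) groups the values that each pixel takes across the runs.
    per_pixel = [statistics.pstdev(values) for values in zip(*flat)]
    return statistics.fmean(per_pixel)
```

A single scalar hides where the variation occurs, so in practice the per-pixel values could also be kept as a map to see whether instability concentrates in faces, text, or other ambiguous regions.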

Additionally, over-generation is frequently encountered: the model amplifies textures, creates unrealistic fine detail, or introduces visually plausible yet unsupported structures, especially when strong fidelity guidance is absent.

Figure 5: Over-generation artifacts include exaggerated textural content and implausible fine structures, limiting reliability in restoration tasks requiring strict content preservation.
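
When a ground-truth reference is available, amplified texture can be flagged with a rough high-frequency check. The heuristic below is an assumption introduced here for illustration, not a measure used in the study: it compares the mean squared Laplacian (a simple proxy for fine-detail energy) of the restored image with that of the reference, so a ratio well above 1 suggests more fine detail than the reference contains.

```python
def laplacian_energy(img):
    """Mean squared 4-neighbour Laplacian over interior pixels of a grayscale image."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            total += lap * lap
    return total / ((h - 2) * (w - 2))


def detail_ratio(reference, restored, eps=1e-12):
    """Ratio of high-frequency energy in the restored image to the reference."""
    return laplacian_energy(restored) / (laplacian_energy(reference) + eps)
```

The ratio cannot tell invented detail from correctly recovered detail, so it is best treated as a screening signal that prompts a visual check rather than as evidence of hallucination by itself.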

Benchmarking and Quantitative Performance

Comparative experiments cover diverse scenes (e.g., small faces, dense crowds, hands/feet, text) and multiple degradations (motion blur, old films, surveillance noise). Nano Banana 2 consistently achieves superior or highly competitive results, excelling in particular on full-reference metrics (SSIM, LPIPS), where it matches or surpasses SOTA IR models such as HYPIR, TSD-SR, PiSA-SR, and DiffBIR.

Figure 6: Across highly challenging degradation and scene types (motion blur, old film, surveillance, small faces, hands, text), Nano Banana 2 delivers clearer, structurally coherent outputs and maintains semantically consistent content.

The user study substantiates these findings: human raters consistently prefer Nano Banana 2's outputs for perceptual quality, awarding it the highest mean score and tightest distribution across examined systems.

Figure 7: User study results reveal Nano Banana 2's restoration outputs are consistently rated as most perceptually convincing and preferred over competing models.
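
The two quantities cited for the user study, the mean score and the tightness of the score distribution, can be computed as in the sketch below. It assumes each system has a flat list of numeric ratings from human raters; the rating scale, the system names, and the data layout are assumptions made here, since the study's protocol is not detailed in this summary.

```python
import statistics


def summarize_ratings(ratings):
    """Return {system: (mean, sample standard deviation)} for per-system scores."""
    return {
        system: (statistics.fmean(scores), statistics.stdev(scores))
        for system, scores in ratings.items()
    }


def best_and_tightest(summary):
    """Name the system with the highest mean and the one with the smallest spread."""
    best = max(summary, key=lambda s: summary[s][0])
    tightest = min(summary, key=lambda s: summary[s][1])
    return best, tightest
```

Reporting the spread alongside the mean matters because a system can earn a high average while still dividing raters; a high mean with a small standard deviation indicates a preference that is both strong and consistent.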

Limitations, Robustness, and Theoretical Implications

Despite strong generalization and competitive quantitative gains, the research identifies several limitations:

Prompt Sensitivity: Restoration quality is non-trivially dependent on prompt choice and often requires iterative human intervention and engineering to avoid artifacts or hallucination.
Stochastic Variability: In complex cases, repeated generations yield non-deterministic outputs, creating practical challenges for applications where reproducibility is required.
Over-Generation and Infidelity: The model may invent plausible but fictitious structures or fail to accurately reconstruct lost semantics when input is highly ambiguous or information-poor.

These failure modes underscore critical challenges in controlling deep generative systems for restoration. Theoretical implications include a need for more robust prompt-grounded conditioning, hybridization with traditional constraint mechanisms, and deeper understanding of perception–fidelity trade-offs as models become more unified and generalist. Additionally, human-centric evaluation, rather than sole reliance on automated metrics, is highlighted as essential for future model assessment.

Future Directions in Unified AI Vision

Nano Banana 2's performance marks a significant step toward the unification of low-level vision tasks under broad, instruction-driven generative frameworks. Future advances are anticipated in several directions:

Enhanced controllability and fidelity via cross-modal, constraint-enforced guidance, possibly fusing model-based and generative paradigms.
Robust prompt engineering with automated, adaptive prompt optimization for downstream IR applications.
Improved output determinism through structured stochasticity control and better uncertainty quantification.
Methodological advances in perceptual/semantic evaluation to align model outputs with human judgment in practical deployment scenarios.

Conclusion

This study provides an in-depth, quantitative and qualitative assessment of Nano Banana 2 in image restoration, demonstrating that general-purpose, high-capacity generative models can act as credible, unified restoration engines for a diverse spectrum of tasks. Restoration outcomes hinge critically on prompt engineering and fidelity guidance; pure generative paradigms bring both potential and new risks, particularly with over-generation and semantic drift. Addressing limitations in controllability and stability will be essential for widespread, reliable adoption of such unified models in real-world vision systems.
