ToonCrafter: Generative Cartoon Interpolation

Published 28 May 2024 in cs.CV | (2405.17933v1)

Abstract: We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, which implicitly assume linear motion and the absence of complicated phenomena like dis-occlusion, often struggle with the exaggerated non-linear and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter effectively addresses the challenges faced when applying live-action video motion priors to generative cartoon interpolation. First, we design a toon rectification learning strategy that seamlessly adapts live-action video priors to the cartoon domain, resolving the domain gap and content leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for lost details due to the highly compressed latent prior spaces, ensuring the preservation of fine details in interpolation results. Finally, we design a flexible sketch encoder that empowers users with interactive control over the interpolation results. Experimental results demonstrate that our proposed method not only produces visually convincing and more natural dynamics, but also effectively handles dis-occlusion. The comparative evaluation demonstrates the notable superiority of our approach over existing competitors.

Summary

The paper presents ToonCrafter, a novel framework that leverages live-action motion priors to overcome challenges in cartoon video interpolation.
It employs innovative techniques like Toon Rectification Learning and a Dual-Reference-Based 3D Decoder to ensure detail preservation and temporal consistency.
A sketch encoder provides interactive control over the interpolation results, and the method is evaluated with quantitative metrics and a user study.

The paper introduces ToonCrafter, a novel generative framework aimed at enhancing cartoon video interpolation. This methodology is a departure from traditional correspondence-based techniques that grapple with the intrinsic complexities of cartoon animations, such as exaggerated non-linear motions and prevalent occlusion phenomena.
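
To make the linear-motion limitation concrete, the following minimal sketch compares a linearly interpolated midpoint with the true midpoint of an accelerating motion. The quadratic easing curve, the function names, and the 100-pixel travel distance are illustrative assumptions, not values or code from the paper.

```python
def ease_in(t: float) -> float:
    """Hypothetical cartoon-style easing: slow start, fast finish."""
    return t * t


def true_position(start: float, end: float, t: float) -> float:
    """Position of an object that follows the (assumed) easing curve."""
    return start + (end - start) * ease_in(t)


def linear_position(start: float, end: float, t: float) -> float:
    """Position predicted under the linear-motion assumption."""
    return start + (end - start) * t


start_x, end_x = 0.0, 100.0
t_mid = 0.5

true_mid = true_position(start_x, end_x, t_mid)      # 25.0
linear_mid = linear_position(start_x, end_x, t_mid)  # 50.0
error = abs(linear_mid - true_mid)                   # 25.0 pixels off
```

Under these assumptions the linear estimate misplaces the object by a quarter of its total travel at the middle frame, which is the kind of error a correspondence-based method cannot recover from the two endpoint frames alone.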

Methodology Overview

ToonCrafter addresses limitations inherent to standard cartoon interpolation methods by adapting motion priors derived from live-action videos within a generative framework. The core strategy involves several key innovations:

Toon Rectification Learning Strategy: This element of ToonCrafter enables the adaptation of motion priors from live-action videos for application in the cartoon domain by mitigating domain gaps and content leakage. Notably, this method fine-tunes specific components (image-context projector and spatial layers) without altering the temporal layers, thereby preserving real-world motion priors while adapting appearance distributions.
Dual-Reference-Based 3D Decoder: This component injects and propagates detail information from the initial and final frames across generated frames to counter the loss of detail and quality degradation typical of highly compressed latent spaces. This is achieved using a hybrid-attention-residual-learning mechanism and pseudo-3D convolutions.
Sketch Encoder for Interactive Control: This allows users to input sketch guidance to interactively control the interpolation results, providing flexibility in handling temporally sparse or dense motion structures.
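
The selective fine-tuning described under the toon rectification strategy can be sketched as a filter over parameter names. This is only an illustration of the idea: the prefixes and parameter names below are hypothetical and do not come from the ToonCrafter code base.

```python
# Hypothetical name prefixes; the real model's module names may differ.
TRAINABLE_PREFIXES = ("image_proj.", "unet.spatial.")
FROZEN_PREFIXES = ("unet.temporal.",)


def select_trainable(param_names):
    """Return the names to fine-tune: image-context projector and spatial layers.

    Temporal layers are skipped so the live-action motion prior stays intact;
    anything that matches neither list (e.g. the autoencoder) is left frozen too.
    """
    selected = []
    for name in param_names:
        if name.startswith(FROZEN_PREFIXES):
            continue
        if name.startswith(TRAINABLE_PREFIXES):
            selected.append(name)
    return selected


param_names = [
    "image_proj.linear.weight",
    "unet.spatial.block0.conv.weight",
    "unet.temporal.block0.attn.weight",
    "vae.decoder.conv_out.weight",
]
trainable = select_trainable(param_names)
# trainable == ["image_proj.linear.weight", "unet.spatial.block0.conv.weight"]
```

In a PyTorch training script the same selection would typically be applied by setting `requires_grad = False` on every parameter outside the selected set before building the optimizer.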

Experimental Evaluation

Empirical evaluations compare ToonCrafter with conventional approaches both quantitatively and qualitatively, and report it ahead on most measures. The evaluation uses a curated collection of high-quality cartoon videos.

Quantitative Metrics

Fréchet Video Distance (FVD) and Kernel Video Distance (KVD): ToonCrafter achieves an FVD of 43.92 and a KVD of 1.52 (lower is better for both), indicating improved temporal motion dynamics and spatial coherence.
LPIPS: On this full-reference perceptual similarity metric (lower is better), ToonCrafter's score of 0.1733 does not beat every traditional method, but this is offset by its stronger results on the distribution-level metrics FVD and KVD.
CLIP image/text metrics: With scores of 0.9221 ($\text{CLIP}_\text{img}$) and 0.3129 ($\text{CLIP}_\text{txt}$), ToonCrafter demonstrates improved semantic alignment.
Cumulative Probability of Blur Detection (CPBD): ToonCrafter achieves a score of 0.6723 (higher is sharper), indicating high output sharpness.
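
FVD is a Fréchet distance between Gaussian fits to features of real and generated videos, where the features come from a pretrained video network. The sketch below shows that distance under a simplifying assumption of diagonal covariances, in which case it reduces to a per-dimension sum. The toy two-dimensional feature vectors are made up for illustration; a real FVD computation uses full covariance matrices over high-dimensional network features.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diag(real_feats, gen_feats):
    """Frechet distance between per-dimension Gaussian fits.

    Assumes diagonal covariances, so the trace term of the general formula
    collapses to a squared difference of standard deviations per dimension.
    """
    dims = len(real_feats[0])
    total = 0.0
    for d in range(dims):
        real_d = [f[d] for f in real_feats]
        gen_d = [f[d] for f in gen_feats]
        mu_r, mu_g = fmean(real_d), fmean(gen_d)
        sd_r = math.sqrt(pvariance(real_d))
        sd_g = math.sqrt(pvariance(gen_d))
        total += (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2
    return total


# Toy features: the generated set is shifted by 1.0 in the first dimension only.
real_feats = [[0.0, 1.0], [2.0, 3.0]]
gen_feats = [[1.0, 1.0], [3.0, 3.0]]
score = frechet_distance_diag(real_feats, gen_feats)  # 1.0
```

Identical feature sets give a distance of zero, which is why lower FVD values indicate generated videos that are statistically closer to the real ones.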

Qualitative Comparisons

Comparative visual assessments reveal that ToonCrafter excels in generating intermediate frames with realistic and contextually consistent animations, outperforming traditional methods that often produce distorted or implausible results. The effectiveness is evidenced even in cases involving large non-linear motions and dis-occlusions.

User Study

A user study with 24 participants found that ToonCrafter was preferred over the compared methods for motion quality, temporal coherence, and frame fidelity.

Ablation Studies

A series of ablation studies underscores the efficacy of the proposed strategies:

Toon Rectification Learning: Freezing temporal layers while fine-tuning image-context projectors and spatial layers preserves motion priors and enhances adaptation to the cartoon domain.
Dual-Reference-Based 3D Decoder: Incorporating both hybrid-attention-residual mechanisms and pseudo-3D convolutions is critical for maintaining detail integrity and temporal coherence.
Sketch-Based Guidance: Encoding each sketch frame independently lets users supply guidance for only some frames without compromising coherence.
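
The pseudo-3D convolutions mentioned for the decoder are commonly built by factorizing a 3D convolution into a per-frame 2D spatial pass followed by a per-pixel 1D temporal pass. The pure-Python sketch below illustrates that common factorization on a tiny video stored as nested lists. It is an assumption about the general technique, not the paper's decoder, and the kernels are chosen only to make the result easy to check.

```python
def conv2d_same(frame, kernel):
    """Zero-padded 2D correlation of one H x W frame with a k x k kernel."""
    h, w, r = len(frame), len(frame[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += frame[yy][xx] * kernel[dy + r][dx + r]
            out[y][x] = acc
    return out


def conv1d_time_same(video, kernel):
    """Zero-padded 1D correlation along the time axis, applied per pixel."""
    t_len, h, w = len(video), len(video[0]), len(video[0][0])
    r = len(kernel) // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(t_len)]
    for t in range(t_len):
        for dt in range(-r, r + 1):
            tt = t + dt
            if 0 <= tt < t_len:
                weight = kernel[dt + r]
                for y in range(h):
                    for x in range(w):
                        out[t][y][x] += weight * video[tt][y][x]
    return out


def pseudo3d_conv(video, spatial_kernel, temporal_kernel):
    """Spatial 2D pass on every frame, then a temporal 1D pass across frames."""
    spatial = [conv2d_same(frame, spatial_kernel) for frame in video]
    return conv1d_time_same(spatial, temporal_kernel)


# Three 2x2 frames: the middle frame is empty, the endpoints carry content.
video = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
identity_2d = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
neighbors_1d = [0.5, 0.0, 0.5]  # average of previous and next frame

result = pseudo3d_conv(video, identity_2d, neighbors_1d)
# result[1] == [[2.0, 2.0], [2.0, 2.0]]
```

With these toy kernels the temporal pass fills the empty middle frame with the average of its two neighbors, a simple picture of how a temporal operator can carry information from the endpoint frames into the frames between them.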

Practical and Theoretical Implications

ToonCrafter’s generative framework improves the quality of cartoon interpolation and could make animation production less labor-intensive. From a theoretical standpoint, it shows that live-action motion priors can be adapted to another domain within a generative model while preserving detail.

Future Developments

Potential future advancements include:

Integration of More Sophisticated Priors: Incorporating a wider variety of motion priors across different animation styles and genres.
Enhanced User Control Mechanisms: Developing more intuitive and robust interactive tools for animators.
Scaling to Ultra-High Definition Content: Adapting the framework for 4K or higher resolutions to meet modern content production standards.

Conclusion

ToonCrafter represents a significant stride in the field of cartoon animation, offering a robust and flexible solution for generating high-quality interpolations while addressing existing limitations. The framework’s adaptability and user-centric design promise substantial contributions to both animation production and the broader research community within visual media and computer vision.