COCO-Inpaint: A Benchmark for Image Inpainting Detection and Manipulation Localization

Published 25 Apr 2025 in cs.CV and cs.AI | (2504.18361v1)

Abstract: Recent advances in image manipulation have achieved unprecedented progress in generating photorealistic content, but have also removed barriers to arbitrary manipulation and editing, raising concerns about multimedia authenticity and cybersecurity. However, existing Image Manipulation Detection and Localization (IMDL) methodologies predominantly focus on splicing or copy-move forgeries, and dedicated benchmarks for inpainting-based manipulations are lacking. To bridge this gap, we present COCO-Inpaint, a comprehensive benchmark specifically designed for inpainting detection, with three key contributions: 1) high-quality inpainting samples generated by six state-of-the-art inpainting models, 2) diverse generation scenarios enabled by four mask generation strategies with optional text guidance, and 3) large-scale coverage with 258,266 inpainted images with rich semantic diversity. Our benchmark is constructed to emphasize intrinsic inconsistencies between inpainted and authentic regions, rather than superficial semantic artifacts such as object shapes. We establish a rigorous evaluation protocol using three standard metrics to assess existing IMDL approaches. The dataset will be made publicly available to facilitate future research in this area.


Authors (8)

Summary


The paper introduces COCO-Inpaint, a benchmark designed specifically for detecting and localizing image manipulations produced by inpainting. Image manipulation detection and localization (IMDL) research has mostly focused on splicing and copy-move forgeries, and COCO-Inpaint fills the resulting gap for inpainting-based manipulations. The dataset is constructed to emphasize intrinsic inconsistencies between inpainted and authentic regions rather than superficial semantic artifacts such as object shapes.

Contributions

COCO-Inpaint makes three main contributions to image forensics:

Inpainting Samples by State-of-the-Art Models: The dataset includes high-quality samples generated by six leading inpainting models, namely SD1.5-Inpainting, SDXL-Inpainting, SD3.5, Flux.1-Fill-dev, BrushNet, and PowerPaint. This diversity is crucial for evaluating how well detection methods generalize across different generators.
Diverse Generation Scenarios: The benchmark covers varied manipulation contexts through four mask generation strategies (segmentation-based, bounding box, random polygon, and random box) with optional text guidance. Together these yield 258,266 inpainted images with rich semantic variety.
Rigorous Evaluation Protocol: The paper defines an evaluation protocol that uses three standard metrics to assess existing IMDL approaches, giving the field a common basis for comparing detection and localization methods.
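The four mask strategies listed above can be illustrated with a short pure-Python sketch. This summary does not describe how the paper actually generates its masks, so everything here is an assumption: object annotations are taken to be COCO-style (a flat polygon `[x1, y1, x2, y2, ...]` and a box `[x, y, width, height]`), masks are H x W grids of 0/1 where 1 marks the region to inpaint, and all function names and parameters (`min_frac`, `max_frac`, `n_vertices`) are hypothetical. Text guidance is not modelled.

```python
import math
import random

Mask = list[list[int]]  # H x W grid; 1 = pixel to inpaint, 0 = keep


def _point_in_polygon(px: float, py: float, pts: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray to the right."""
    inside = False
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_mask(pts: list[tuple[float, float]], h: int, w: int) -> Mask:
    """Rasterize a polygon by testing every pixel centre."""
    return [[int(_point_in_polygon(x + 0.5, y + 0.5, pts)) for x in range(w)]
            for y in range(h)]


def segmentation_mask(coco_poly: list[float], h: int, w: int) -> Mask:
    """Strategy 1 (hypothetical): the object's outline as a flat COCO polygon."""
    pts = list(zip(coco_poly[0::2], coco_poly[1::2]))
    return polygon_mask(pts, h, w)


def bbox_mask(bbox: list[float], h: int, w: int) -> Mask:
    """Strategy 2 (hypothetical): fill a COCO-style box [x, y, width, height]."""
    x0, y0, bw, bh = bbox
    return [[int(x0 <= x + 0.5 < x0 + bw and y0 <= y + 0.5 < y0 + bh)
             for x in range(w)] for y in range(h)]


def random_box_mask(h: int, w: int, rng: random.Random,
                    min_frac: float = 0.1, max_frac: float = 0.5) -> Mask:
    """Strategy 3 (hypothetical): an axis-aligned box placed independently of any object."""
    bw = max(1, int(w * rng.uniform(min_frac, max_frac)))
    bh = max(1, int(h * rng.uniform(min_frac, max_frac)))
    x0 = rng.randint(0, w - bw)
    y0 = rng.randint(0, h - bh)
    return bbox_mask([x0, y0, bw, bh], h, w)


def random_polygon_mask(h: int, w: int, rng: random.Random,
                        n_vertices: int = 6) -> Mask:
    """Strategy 4 (hypothetical): a random polygon, star-shaped around its centre.

    Jittered, increasing angles keep every angular gap below pi (for
    n_vertices >= 4), so the polygon's edges never cross each other.
    """
    if n_vertices < 4:
        raise ValueError("n_vertices must be at least 4")
    cx, cy = rng.uniform(0.3 * w, 0.7 * w), rng.uniform(0.3 * h, 0.7 * h)
    max_r = 0.3 * min(h, w)
    pts = []
    for k in range(n_vertices):
        angle = (k + rng.uniform(0.0, 0.8)) * 2 * math.pi / n_vertices
        r = rng.uniform(0.3 * max_r, max_r)
        pts.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return polygon_mask(pts, h, w)


if __name__ == "__main__":
    rng = random.Random(0)
    H, W = 32, 48
    masks = {
        "segmentation": segmentation_mask([8, 4, 30, 6, 20, 26], H, W),
        "bbox": bbox_mask([8, 4, 22, 22], H, W),
        "random_box": random_box_mask(H, W, rng),
        "random_polygon": random_polygon_mask(H, W, rng),
    }
    for name, m in masks.items():
        print(f"{name}: {sum(map(sum, m))} masked pixels")
```

Segmentation masks follow the object's outline, so an inpainted region can betray itself through its shape alone. Box and random masks decouple the edited region from the object's silhouette, which is plausibly one reason to mix strategies when the goal is to emphasize intrinsic inconsistencies rather than object shapes.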

Numerical and Bold Claims

The benchmark contains 258,266 manipulated images and 117,266 authentic images, which the paper presents as a substantial increase in scale over existing datasets. The paper argues that training on COCO-Inpaint significantly improves detection sensitivity and segmentation accuracy. Its experiments further indicate that vision transformer-based models consistently outperform CNN-based models across the experimental conditions tested.
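The summary does not name the three metrics in the evaluation protocol. IMDL work commonly reports image-level detection scores (accuracy, sensitivity, AUC) and pixel-level localization scores (F1, IoU), so the sketch below computes those as an assumption; it is not the paper's evaluation code. Masks are 0/1 grids where 1 marks manipulated pixels, image labels use 1 for manipulated and 0 for authentic, and all function names are hypothetical.

```python
Grid = list[list[int]]  # 0/1 mask; 1 = manipulated pixel


def _confusion(pred: Grid, gt: Grid) -> tuple[int, int, int]:
    """Count true positives, false positives and false negatives per pixel."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def pixel_f1(pred: Grid, gt: Grid) -> float:
    tp, fp, fn = _confusion(pred, gt)
    denom = 2 * tp + fp + fn
    # Both masks empty (e.g. an authentic image predicted as clean): conventions
    # differ; here it counts as a perfect score.
    return 1.0 if denom == 0 else 2 * tp / denom


def pixel_iou(pred: Grid, gt: Grid) -> float:
    tp, fp, fn = _confusion(pred, gt)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


def binarize(prob_map: list[list[float]], threshold: float = 0.5) -> Grid:
    """Turn a per-pixel manipulation probability map into a 0/1 mask."""
    return [[int(v >= threshold) for v in row] for row in prob_map]


def image_accuracy_and_sensitivity(scores: list[float], labels: list[int],
                                   threshold: float = 0.5) -> tuple[float, float]:
    """Image-level accuracy and sensitivity (recall on manipulated images)."""
    preds = [int(s >= threshold) for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    positives = sum(labels)
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    sensitivity = tp / positives if positives else float("nan")
    return accuracy, sensitivity


def image_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(manipulated score > authentic score), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one manipulated and one authentic image")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    gt = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    pred = binarize([[0.1, 0.9, 0.4], [0.2, 0.8, 0.7], [0.6, 0.3, 0.0]])
    print(pixel_f1(pred, gt))   # 0.75 (tp=3, fp=1, fn=1)
    print(pixel_iou(pred, gt))  # 0.6
```

Pixel F1 and IoU depend on the threshold passed to `binarize`. Some protocols fix it (often at 0.5), while others report the best threshold per image, which yields higher numbers; stating the choice is what makes scores comparable across papers.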

Implications and Future Directions

COCO-Inpaint advances image manipulation detection and also has broader implications for cybersecurity and multimedia authenticity. As inpainting models continue to evolve, benchmarks like COCO-Inpaint will be important for checking that detection methods remain robust. The dataset gives IMDL research a basis for developing methods that generalize across different manipulation techniques and styles.

Looking ahead, the growing sophistication and realism of inpainting models call for further research on cross-model and cross-mask generalization, that is, detectors that still work on generators and mask types not seen during training. IMDL will also need to keep pace with rapid advances in AI-driven image generation. Future benchmarks could broaden the scope to video manipulations or explore real-time detection.
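Cross-model and cross-mask generalization are usually measured by holding out one generator (or one mask type) at test time and training on the rest. The following is a minimal sketch of such a leave-one-out split, not the paper's protocol; the `Sample` record and its fields are hypothetical and do not reflect COCO-Inpaint's actual file layout, while the generator and mask names are taken from the lists earlier in this summary.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str   # hypothetical identifier
    generator: str  # inpainting model, or "authentic" for real images
    mask_type: str  # mask strategy, or "none" for real images


def leave_one_out(samples: list[Sample],
                  key: str) -> Iterator[tuple[str, list[Sample], list[Sample]]]:
    """Yield (held_out, train, test) splits over manipulated samples.

    `key` is "generator" for cross-model or "mask_type" for cross-mask
    evaluation. Authentic images have neither attribute, so they are excluded
    here and would be split separately (for example by image id).
    """
    fakes = [s for s in samples if s.generator != "authentic"]
    for held_out in sorted({getattr(s, key) for s in fakes}):
        train = [s for s in fakes if getattr(s, key) != held_out]
        test = [s for s in fakes if getattr(s, key) == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    generators = ["BrushNet", "PowerPaint", "SDXL-Inpainting"]
    mask_types = ["bbox", "random_box"]
    data = [Sample(f"{g}/{m}", g, m) for g in generators for m in mask_types]
    data.append(Sample("real/0", "authentic", "none"))
    for held_out, train, test in leave_one_out(data, "generator"):
        print(f"hold out {held_out}: {len(train)} train, {len(test)} test")
```

A real protocol would also need to keep inpainted variants of the same COCO source image from landing on both sides of a split; otherwise a detector could score well by recognizing image content rather than generator traces. The `Sample` record here has no source-image field, so that check is left out.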

In summary, COCO-Inpaint provides a comprehensive foundation for research in image manipulation detection: a platform for evaluating current methods and a starting point for detectors that can handle next-generation image editing techniques. The dataset is to be released publicly, which should support ongoing research and collaboration in multimedia forensics.
