Diffusion-Based Ear Inpainting

Updated 3 February 2026

Diffusion-based ear inpainting is a technique that leverages PDEs, probabilistic generative modeling, and deep neural networks to reconstruct missing ear regions with anatomical precision.
It employs methods like directional diffusion, regularized diffusion-shock, and score-based DDPMs to achieve high structural fidelity as measured by metrics such as PSNR and SSIM.
Practical implementations integrate geometric priors, precise masking, and dynamic sampling strategies to ensure seamless restoration and improved biometric recognition outcomes.

Diffusion-based ear inpainting encompasses a diverse set of algorithms leveraging partial differential equations, probabilistic generative modeling, and deep learning for reconstructing missing or occluded regions in ear images. This area is characterized by the tailored integration of geometric priors and masked conditioning within diffusion processes to restore anatomical plausibility and fine details critical for biometric recognition and visual fidelity.

1. Mathematical Foundations of Diffusion-Based Inpainting

Diffusion-based inpainting approaches describe the evolution of image intensities in missing regions via diffusion processes, often formalized as discrete or continuous-time iterations that transport information from known to unknown pixels under spatial smoothness and structure-preservation constraints.

Discrete Diffusion: Directional kernels steer the convolution-based updates according to estimated local edge orientation, enabling propagation of geometric features into the inpainting domain. For a patch-based scheme, the primary update is

$I^{(t+1)}(x,y)=I^{(t)}(x,y)+\lambda\sum_{(p,q)\in\{-1,0,1\}^2} K_\theta(p,q) \left[I^{(t)}(x+p, y+q) - I^{(t)}(x,y)\right],$

with fixed-point constraint for known pixels (Deriu et al., 2015).

Continuous PDEs: Regularized diffusion-shock models integrate homogeneous Laplacian diffusion with coherence-enhancing shock filtering, described by

$\partial_t u = g(|\nabla u_\nu|^2) \Delta u - (1-g(|\nabla u_\nu|^2)) S_\varepsilon(\partial_{\mathbf{w}\mathbf{w}} u_\sigma) |\nabla u|$

for spatial location $\mathbf{x}$ , where $g$ enforces edge-aware weighting and the shock term sharpens structural flow (Schaefer et al., 2023).

Generative Diffusion Models: Denoising Diffusion Probabilistic Models (DDPMs) operate in high-dimensional latent or image space, iteratively reversing a prescribed noising schedule. For inpainting, the reverse process is conditioned by replacing masked regions via sample injection or mask-aware neural modules:

$p_\theta(x_{t-1}|x_t, M) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t,t,M), \beta_t I)$

with $\mu_\theta$ parameterized by a U-Net architecture integrating the mask as an explicit input (Arun et al., 27 Jan 2026).

2. Principal Methodologies for Ear Inpainting

The field comprises several core algorithmic strategies:

Directional Diffusion: Enhances edge propagation by estimating patchwise orientation (using finite-difference or gradient-averaging heuristics), constructing $3\times3$ directional kernels adapted by local edge direction, and applying sequential per-patch updates. Small patch sizes and refined orientation estimation are critical for anatomical features (e.g., cartilage ridges) in ears (Deriu et al., 2015).
Regularised Diffusion-Shock Inpainting (RDS): Alternates between isotropic diffusion (for smooth fills) and shock filtering aligned with local texture directionality, guided by the dominant eigenvector of the structure tensor. Characteristic parameters include spatial scales $(\sigma, \nu, \rho)$ , switch coefficients $(\lambda, \varepsilon)$ , and explicit time-step selection. Cartilage preservation is governed by careful tuning of shock regularization and local adaptation of the diffusion/shock weighting (Schaefer et al., 2023).
Sub-Riemannian Hypoelliptic Diffusion (AHE): Lifts image data to a three-dimensional $(x, y, \theta)$ representation, allowing anisotropic evolution along dominant orientation fields, efficiently discretized by semi-discrete orientation sampling and Crank-Nicolson integration. Advanced averaging steps across boundaries ensure data-fidelity and smooth transition between known and inpainted pixels, with robustness even at high (≥80%) corruption rates (Boscain et al., 2015).
Score-Based and DDPM Inpainting: Applies a learned denoising network (score function) within at each diffusion step, with special mechanisms for mask-handling: (1) iterative mask-resample in RePaint, (2) drift and noise realignment in RePaint $^+$ for unbiased recovery and geometric convergence, and (3) end-to-end U-Net architectures conditioned on both mask and current noisy estimate (Rout et al., 2023, Arun et al., 27 Jan 2026).
Seed Harmonization and Training-Free Plug-ins: IS-Diff introduces harmonization of initial diffusion noise for masked regions by sampling from a Gaussian mixture model fit to the unmasked image parts, followed by dynamic selective refinement checks for semantic compatibility at an intermediate timestep. HarmonPaint modifies U-Net attention maps to enforce mask separation structurally and transfers feature statistics or style vectors from unmasked to masked regions for stylistic coherence, all in a training-free manner (Lyu et al., 15 Sep 2025, Li et al., 22 Jul 2025).

3. Integration of Geometric and Anatomical Priors

Ear inpainting demands high geometric fidelity, with robust propagation of essential structures such as the helix, antihelix, concha, and lobule. Adaptation to the ear domain involves:

Precise Mask Construction: Ear segmentation or manual annotation is used to define the region for restoration. For accessories (earrings, earphones), mask generation leverages object detectors (YOLOv10), box proposal networks (Grounding DINO), and precise boundary extraction (SAM 2), optionally followed by morphological operations to refine the mask (Arun et al., 27 Jan 2026).
Patch Size and Tensor Scales: Small patch sizes (e.g., $n=8$ --$12$ for directional diffusion) and spatial scales ( $\sigma$ , $\nu$ ) below the feature width are critical to preserve thin cartilage details and curvilinear lobule boundaries (Deriu et al., 2015, Schaefer et al., 2023).
Orientation and Structure Alignment: Adapted orientation estimation (Sobel gradients, structure tensors, or tangent estimates at mask boundaries) ensures anisotropic updating operates along relevant anatomical contours, minimizing geometric artifacts (Deriu et al., 2015, Schaefer et al., 2023, Boscain et al., 2015).
Boundary Blending: Post-inpainting, edge blending (e.g., light Gaussian blur of mask boundary) is often applied to eliminate visible seams between restored and known pixels, particularly when directly compositing noisy and denoised reconstructions (Arun et al., 27 Jan 2026).

4. Quantitative and Qualitative Evaluation

Assessment of ear inpainting methods is performed using both signal-based and geometry-sensitive criteria:

Pixel-Level Metrics: Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Structural Similarity Index (SSIM) over the inpainted and entire image region (Schaefer et al., 2023, Deriu et al., 2015, Boscain et al., 2015).
Landmark and Structure-Based Measures: Landmark detector-based RMS error (for anatomical points), contour overlap (Dice coefficient), or average edge magnitude difference at mask boundaries (Lyu et al., 15 Sep 2025).
Biometric Performance Impact: Recognition system AUC measured for transformer-based backbones before and after inpainting, with observed gains up to +5.7 percentage points (pp) on heavily occluded datasets (Arun et al., 27 Jan 2026).
Perceptual and Human-Study Judgments: LPIPS, FID, CLIP Score, and practitioner-based blind tests comparing realism and identity preservation (Li et al., 22 Jul 2025, Lyu et al., 15 Sep 2025, Arun et al., 27 Jan 2026).

For example, regularised diffusion-shock inpainting yields PSNR in the range $28$–$32$ dB and SSIM $0.88$–$0.94$ for ear phantoms with large missing lobes (Schaefer et al., 2023), while directional diffusion achieves MSE $0.00055\pm0.00051$ with $n=16$ patches (Deriu et al., 2015).

5. Comparative Algorithmic Performance and Convergence

Empirical comparisons highlight trade-offs in structural preservation, computational complexity, and convergence guarantees:

Method	Pixel Fidelity (e.g. MSE/SSIM/PSNR)	Geometric/Style Coherence	Typical Use Case
Directional diffusion	Highest among classical diffusion	Strong edge propagation	Textured/curvilinear fill (Deriu et al., 2015)
Regularised diffusion-shock	Comparable or better than prior PDEs	Razor-sharp edge conservation	Anatomy with strong ridges (Schaefer et al., 2023)
AHE (hypoelliptic)	Robust under extreme corruption	Folds and contours preserved	≥80% missing, highly structured (Boscain et al., 2015)
RePaint $^+$	Linear (geometric) convergence	Provably unbiased sample recovery	Training-free, general masks (Rout et al., 2023)
IS-Diff	Harmonized, unbiased inpainting	High semantic/style consistency	Plug-in for existing DDIM (Lyu et al., 15 Sep 2025)
HarmonPaint	Enhanced structure and style harmony	Style transfer from context	Training-free, plug-and-play (Li et al., 22 Jul 2025)
Masked DDPM (biometric)	∼+2–6 pp gain in AUC (coarse patch)	Anatomical landmark fidelity	Ear recognition preprocessing (Arun et al., 27 Jan 2026)

Key theoretical results establish new guarantees for inpainting: RePaint $^+$ achieves linear convergence in the error norm due to exact realignment of drift and dispersion at each mask replacement step, overcoming the bias inherent in naïve mask-resample strategies (Rout et al., 2023).

6. Practical Implementation and Ear-Specific Adaptations

Successful application to ear inpainting requires task-aware pipeline design:

Hyperparameter Selection: Calibration of noise schedules, patch scales, shock regularization, and guidance scaling (e.g., $\tau$ , $\lambda$ in HarmonPaint) affects both structural and stylistic integration (Li et al., 22 Jul 2025).
Integration with Vision Pipelines: Diffusion-based models operate as preprocessing modules for recognition systems, with explicit mask-channel concatenation and mask-adaptive network modules for robustness to a diversity of occlusions, including challenging accessories (Arun et al., 27 Jan 2026).
Dynamic Sampling and Refinement: Mechanisms such as Dynamic Selective Refinement in IS-Diff permit automatic detection and rectification of semantically unaligned inpaintings partway through the reverse diffusion chain, increasing both visual plausibility and statistical harmonization with the unmasked context (Lyu et al., 15 Sep 2025).

Recommended evaluation includes side-by-side visualizations, edge continuity inspection, and landmark consistency metrics; user studies may supplement quantitative evidence when ground-truth is unavailable, as performed in several recent works.

7. Trends and Open Challenges

Recent advances in plug-and-play, training-free diffusion inpainting have significantly improved versatility and transferability across application domains without extensive retraining (Li et al., 22 Jul 2025, Lyu et al., 15 Sep 2025). The integration of masking and attention mechanisms, dynamic refinement of sampling trajectories, and the preservation of domain-specific geometric priors are central to the state of the art.

Despite quantitative and qualitative improvements, potential challenges remain in pathological occlusions covering the majority of the ear, severe background clutter, and cross-domain generalization. For extreme inpainting scenarios, leveraging explicit symmetry (e.g., between bilateral ears) or anatomical databases is a plausible direction.

This field continues to evolve towards a synergy between mathematically rigorous diffusion models, geometric PDE methods, and large-scale masked conditional generative frameworks, enabling practical and robust restoration for critical biometric and forensic applications.