Joint Bilateral Filtering Strategy

Updated 10 January 2026
  • Joint bilateral filtering strategy is an edge-aware technique that uses a separate guidance image to drive denoising and upsampling while preserving key boundaries.
  • It combines classical spatial and range kernels with modern deep neural architectures and reinforcement learning to adaptively optimize filtering parameters.
  • Applications include low-dose CT denoising, multimodal upsampling, and edge-preserving detail enhancement, achieving high SSIM and competitive PSNR metrics.

A joint bilateral filtering strategy is a family of edge-aware filtering techniques in which a target signal is denoised or upsampled using a spatial kernel and a range kernel computed not solely from the target image itself, but by leveraging a separate guidance image. This guidance image, which may derive from the same modality or a different one, provides structure priors that allow the filter to selectively suppress noise or enhance features while explicitly preserving salient boundaries and textures. Recent developments combine these classical concepts with deep neural architectures—either by using deep networks to estimate guidance images, by embedding joint bilateral filters as trainable layers, or by adaptively optimizing kernel parameters via auxiliary networks or reinforcement learning.

1. Mathematical Foundations of Joint Bilateral Filtering

The canonical joint bilateral filter (JBF) computes a denoised output $I_f$ at spatial position $x$ as a normalized weighted sum over a local neighborhood $N(x)$:

$$I_f(x) = \frac{\sum_{o \in N(x)} I_n(o)\, G_{\sigma_s}(x - o)\, G_{\sigma_i}\big(I_g(x) - I_g(o)\big)}{\sum_{o \in N(x)} G_{\sigma_s}(x - o)\, G_{\sigma_i}\big(I_g(x) - I_g(o)\big)}$$

where $I_n$ is the noisy input, $I_g$ is the guidance image, $G_{\sigma_s}$ is a spatial (usually Gaussian) kernel of standard deviation $\sigma_s$, and $G_{\sigma_i}$ is the range (intensity) kernel of width $\sigma_i$ (also written $\sigma_r$). Weighted averaging in the spatial domain enforces locality, while the range kernel restricts the influence of pixels whose guidance values differ sharply, ensuring that edges in $I_g$ (typically, but not necessarily, less noisy) are preserved in $I_f$ (Patwari et al., 2020, Patwari et al., 2020, Wagner et al., 2022, Li et al., 2017, Gupta et al., 2018).
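The weighted sum above can be sketched directly in NumPy. This is a minimal, unvectorized reference implementation; the function and parameter names are illustrative, not taken from the cited papers:

```python
import numpy as np

def joint_bilateral_filter(I_n, I_g, sigma_s=2.0, sigma_i=0.1, radius=3):
    """Denoise I_n using a spatial kernel G_sigma_s and a range kernel
    G_sigma_i evaluated on the guidance image I_g."""
    H, W = I_n.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_s**2))  # G_sigma_s(x - o)
    I_n_p = np.pad(I_n, radius, mode="reflect")
    I_g_p = np.pad(I_g, radius, mode="reflect")
    out = np.empty_like(I_n, dtype=float)
    for y in range(H):
        for x in range(W):
            n_patch = I_n_p[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            g_patch = I_g_p[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # G_sigma_i(I_g(x) - I_g(o)): guidance differences gate the averaging
            range_w = np.exp(-(I_g[y, x] - g_patch)**2 / (2.0 * sigma_i**2))
            w = spatial * range_w
            out[y, x] = np.sum(w * n_patch) / np.sum(w)  # normalized weighted sum
    return out
```

Because the weights are non-negative and normalized, each output pixel is a convex combination of input values in its neighborhood; the guidance image only decides which neighbors contribute.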

Trainable variants parametrize $G_s$ and $G_r$ (e.g., as shallow CNNs or as learned Gaussian widths) and, by back-propagating through the filter weights, permit direct learning of spatial/range selectivity from data (Patwari et al., 2020, Wagner et al., 2022). For 3D volumetric data, the bandwidths $\sigma_x$, $\sigma_y$, $\sigma_z$, and $\sigma_r$ may be optimized independently.
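A minimal sketch of learned kernel widths, in 1D for brevity; finite-difference gradients with backtracking stand in here for the analytic gradients used in practice, and all names are illustrative:

```python
import numpy as np

def jbf_1d(x, g, sigma_s, sigma_r, radius=4):
    """1D joint bilateral filter: smooth x, gated by guidance g."""
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
        idx = np.arange(lo, hi)
        w = (np.exp(-(idx - i) ** 2 / (2 * sigma_s ** 2))
             * np.exp(-(g[i] - g[idx]) ** 2 / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * x[idx]) / np.sum(w)
    return out

def fit_widths(x, g, target, steps=60, lr=25.0, eps=1e-3):
    """Tune (sigma_s, sigma_r) against a reference target by finite-difference
    gradient descent with backtracking -- a toy stand-in for the analytic
    gradients and Adam updates described above."""
    theta = np.array([1.0, 1.0])  # initial (sigma_s, sigma_r)
    loss = lambda t: np.mean((jbf_1d(x, g, t[0], t[1]) - target) ** 2)
    for _ in range(steps):
        grad = np.array([
            (loss(theta + [eps, 0.0]) - loss(theta - [eps, 0.0])) / (2 * eps),
            (loss(theta + [0.0, eps]) - loss(theta - [0.0, eps])) / (2 * eps),
        ])
        cand = np.maximum(theta - lr * grad, 0.05)  # keep widths positive
        if loss(cand) < loss(theta):
            theta = cand
        else:
            lr *= 0.5  # backtrack when the step overshoots
    return theta
```

In a full pipeline the same idea extends to independent $\sigma_x$, $\sigma_y$, $\sigma_z$, $\sigma_r$ on 3D volumes; only the parameter vector grows.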

2. Guidance Image Estimation

Filtering quality depends critically on the guidance image $I_g$. In low-dose CT applications, $I_g$ is estimated with a deep residual CNN. A typical architecture consists of stacked 3D convolutional layers (without padding) followed by 2D deconvolutions and skip connections, trained patch-wise to minimize the mean squared error to a full-dose (noise-free) target:

$$\min_{F} \| I_{nf} - F(I_n) \|_2^2$$

where $F$ denotes the network; the guidance image is then $I_g = F(I_n)$ (Patwari et al., 2020, Patwari et al., 2020, Wagner et al., 2022). This denoiser is typically pre-trained and then frozen when the joint bilateral filter is applied. Guidance images for multimodal or cross-domain tasks may be estimated from other sensory inputs (e.g., RGB for depth upsampling), allowing the JBF to transfer structural information across modalities (Li et al., 2017, Gupta et al., 2018).
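As a toy analogue of this guidance-estimation step, a single learned 3×3 convolution can stand in for the deep residual CNN $F$: fit it by minimizing $\|I_{nf} - F(I_n)\|_2^2$ on a clean/noisy pair, then use its prediction as $I_g$. All names below are illustrative:

```python
import numpy as np

def conv2d(img, k):
    """Correlate img with a (2r+1)x(2r+1) kernel k, reflect-padded."""
    r = k.shape[0] // 2
    p = np.pad(img, r, mode="reflect")
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * p[i:i + H, j:j + W]
    return out

def fit_guidance_model(I_n, I_nf, steps=300, lr=0.1):
    """Fit a single 3x3 kernel F by gradient descent on ||I_nf - F(I_n)||^2,
    a linear stand-in for the residual-CNN guidance estimator."""
    H, W = I_n.shape
    k = np.zeros((3, 3))
    k[1, 1] = 1.0                                 # start from the identity kernel
    p = np.pad(I_n, 1, mode="reflect")
    for _ in range(steps):
        resid = conv2d(I_n, k) - I_nf             # F(I_n) - I_nf
        grad = np.empty_like(k)
        for i in range(3):                        # dL/dk[i,j] of the MSE loss
            for j in range(3):
                grad[i, j] = 2.0 * np.mean(resid * p[i:i + H, j:j + W])
        k -= lr * grad
    return k
```

In the real pipeline, $F$ is a pre-trained, frozen deep network; its output $I_g$ then drives the joint bilateral filter's range kernel.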

3. Trainable Joint Bilateral Filtering Pipelines

Three major neural architectures for JBF have been advanced:

  • Explicit Filter Parameterization: Each JBF block contains shallow CNN subnetworks predicting the spatial and range kernel weights from local patches. These subnetworks have minimal parameters (e.g., two 3D-conv + ReLU layers for each kernel; 112 parameters per block), enabling interpretability (Patwari et al., 2020).
  • Hybrid Denoiser-Filter Cascades: A pre-trained DNN (e.g., RED-CNN, QAE) predicts the guidance image GG. Several JBF layers follow, with only the JBF kernel widths being trainable (typically 12 parameters for three JBF layers operating on 3D CT). The network constrains denoising at test time, acting as a regularizer and providing safety bounds when the guidance is untrustworthy (Wagner et al., 2022).
  • CNN Generalizations of JBF: End-to-end architectures process both target and guidance images with parallel CNN branches, fusing features in a third pathway and predicting either residuals or corrections to the target signal. Such designs support cross-modal generalization and offer learnable structure-transfer gating (Li et al., 2017).

Ablation studies confirm that learning spatial and range kernels—versus using fixed Gaussians—improves structure preservation and quantitative measures (PSNR, SSIM). Iterating multiple JBF blocks typically enhances denoising further (Patwari et al., 2020).
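The effect of iterating JBF blocks can be illustrated on a 1D signal. This is a minimal sketch; the block count and kernel widths are illustrative:

```python
import numpy as np

def jbf_1d(x, g, sigma_s=2.0, sigma_r=0.1, radius=4):
    """One joint bilateral filter 'block' on a 1D signal x with guidance g."""
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
        idx = np.arange(lo, hi)
        w = (np.exp(-(idx - i) ** 2 / (2 * sigma_s ** 2))
             * np.exp(-(g[i] - g[idx]) ** 2 / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * x[idx]) / np.sum(w)
    return out

def cascade(x, g, blocks=3, **kw):
    """Feed each block's output into the next, with the guidance held fixed."""
    for _ in range(blocks):
        x = jbf_1d(x, g, **kw)
    return x
```

Because the guidance is fixed, cascading applies the same edge-respecting averaging repeatedly: noise in homogeneous regions keeps shrinking while the guidance edge continues to block cross-boundary mixing.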

4. Adaptive Parameter Optimization and Interpretability

Classic bilateral filters use heuristic or globally fixed $\sigma_s$ and $\sigma_i$. Modern JBF pipelines often employ pixel-adaptive optimization:

  • Reinforcement Learning (RL): The parameters $\sigma_s$ and $\sigma_i$ are locally updated by an RL agent operating on image patches. The agent selects which parameter to modify and by how much (discrete steps: $\pm 50\%$, $\pm 10\%$, or $0$). A reward network predicts a scalar image-quality metric; updates that increase quality, as judged by the reward function, are encouraged. Joint Bellman updates are computed via deep Q-learning, with intertwined heads for parameter selection and adjustment (Patwari et al., 2020). This approach limits black-box effects and increases accountability, as the only degrees of freedom affecting the output are explicitly geometric and statistical.
  • Trainable Gaussian Widths: As in (Wagner et al., 2022), only a small set of kernel-width parameters is trained for each JBF block. Gradients can be derived analytically, and the parameters are updated with optimizers such as Adam, decoupling guidance-network training from filter adaptation.

Adaptivity at the pixel or patch level allows the filter to aggressively smooth in homogeneous regions while strictly preserving edges, as governed by the local structure in the guidance image.
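The discrete adjustment scheme can be sketched as a greedy loop over (parameter, action) pairs. Here a toy quality function stands in for the learned reward network, no Q-learning is performed, and all names and the optimum location are illustrative:

```python
import numpy as np

def quality(sigma_s, sigma_i):
    """Toy stand-in for the reward network: quality peaks at
    sigma_s = 4.0, sigma_i = 0.1 (an arbitrary illustrative optimum)."""
    return -np.log(sigma_s / 4.0) ** 2 - np.log(sigma_i / 0.1) ** 2

ACTIONS = [0.5, 0.9, 1.0, 1.1, 1.5]   # -50%, -10%, no change, +10%, +50%

def tune(sigma_s=1.0, sigma_i=1.0, iters=50):
    """Greedily pick which parameter to scale and by how much,
    accepting a move only when the (surrogate) quality improves."""
    for _ in range(iters):
        best, best_q = None, quality(sigma_s, sigma_i)
        for a in ACTIONS:
            if quality(sigma_s * a, sigma_i) > best_q:
                best, best_q = ("s", a), quality(sigma_s * a, sigma_i)
            if quality(sigma_s, sigma_i * a) > best_q:
                best, best_q = ("i", a), quality(sigma_s, sigma_i * a)
        if best is None:
            break                      # no action improves quality -> stop
        if best[0] == "s":
            sigma_s *= best[1]
        else:
            sigma_i *= best[1]
    return sigma_s, sigma_i
```

The coarse ±50% steps cover distance quickly, while the ±10% steps settle the parameters near the quality optimum; the real agent makes this choice per patch via Q-values rather than greedily.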

5. Applications and Quantitative Evaluation

Joint bilateral filtering is widely employed in:

  • Low-dose CT Denoising: State-of-the-art pipelines (JBFnet, RL-tuned JBF, cascaded JBFs) outperform pure deep networks such as GANs, CPCE3D, Deep GFnet, and others, with higher SSIM (up to 0.9825) and competitive PSNR. JBF's small set of interpretable parameters also improves clinical accountability (Patwari et al., 2020, Patwari et al., 2020, Wagner et al., 2022).
  • Edge-preserving Detail Enhancement: By coupling to a guidance volume, JBF preserves anatomical or structural details even under significant noise, showing robustness to guidance errors by constraining worst-case influence through the range kernel.
  • Multimodal Upsampling and Cross-modal Denoising: In classical and learned forms, JBF is essential for depth/RGB fusion, flash/no-flash denoising, and saliency/color upsampling, producing state-of-the-art RMSE and F-measure scores with real-time performance (Li et al., 2017).
  • Phase-Preserved Coefficient Denoising: For image denoising in the Curvelet domain, JBF is used to recover magnitudes of coefficients below a hard threshold, while preserving phase to maintain edge and location information, outperforming classical methods and matching BM3D at high noise levels (Gupta et al., 2018).

Results on AAPM test sets demonstrate that JBFnet achieves a mean SSIM of $0.9825 \pm 0.0025$ (full system), and a learned (fixed) JBF attains the highest PSNR (46.79 dB) among peer methods (Patwari et al., 2020). Hybrid pipelines yield significant RMSE reductions (up to 82%) when combining DNN guidance with JBF post-processing in challenging, out-of-distribution scenarios (Wagner et al., 2022).
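For reference, the PSNR figures quoted above follow the standard definition (a minimal helper; the data range is assumed normalized, which is an assumption, not something the cited papers specify here):

```python
import numpy as np

def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    Assumes reference and test share the given data range; undefined
    (division by zero) for identical images."""
    mse = np.mean((np.asarray(reference) - np.asarray(test)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```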

6. Theoretical Guarantees, Limitations, and Extensions

JBF’s range kernel ensures the filter does not oversmooth across, or hallucinate, edges present in the guidance image, and stricter guarantees are imposed when the guidance network is frozen or trusted only to supply structure. Even catastrophic errors in network output can only influence the filtered signal within the envelope permitted by the Gaussian range kernel (Wagner et al., 2022). Only two (or a small handful in multi-axis cases) interpretable parameters mediate all smoothing and edge preservation.
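This worst-case bound is easy to verify empirically: because the filter weights are non-negative and normalized, each output sample is a convex combination of input samples in its window, so even a catastrophically corrupted guidance cannot push the output outside the local input range. An illustrative 1D sketch:

```python
import numpy as np

def jbf_1d(x, g, sigma_s=2.0, sigma_r=0.2, radius=4):
    """1D joint bilateral filter; each output is a convex combination
    of x-values in the window, regardless of what the guidance g says."""
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
        idx = np.arange(lo, hi)
        w = (np.exp(-(idx - i) ** 2 / (2 * sigma_s ** 2))
             * np.exp(-(g[i] - g[idx]) ** 2 / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * x[idx]) / np.sum(w)
    return out
```

Injecting an absurd guidance value merely drives the corresponding range weights toward zero; the self-weight stays positive, so the output remains bounded by the neighborhood of the input signal.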

Limitations include the potential to over-smooth fine details that are absent from, or inconsistent with, the guidance image; moreover, the theoretical understanding of learned guidance and selective structure transfer remains incomplete. Performance in highly textured regions unseen during training may lag, motivating research into attention mechanisms and more sophisticated structure-aware regularizers (Li et al., 2017). In Curvelet-based domains, JBF is integrated with standard bilateral and guided image filters for full recovery, highlighting applicability across spatial and transform domains (Gupta et al., 2018).
