- The paper introduces a weakly-supervised CNN approach that leverages anatomical labels for robust multimodal image registration.
- The study applies a multiscale Dice loss within a memory-efficient architecture to overcome class imbalance and enhance landmark alignment.
- Cross-validation on 76 patients (108 image pairs) yielded a median TRE of 3.6 mm and a median DSC of 0.87, a significant improvement over classical intensity-based methods.
Weakly-Supervised Convolutional Neural Networks for Multimodal Image Registration
The paper "Weakly-Supervised Convolutional Neural Networks for Multimodal Image Registration" addresses a central challenge in multimodal medical image registration: aligning images from different modalities, such as T2-weighted magnetic resonance imaging (MRI) and 3D transrectal ultrasound (TRUS) in prostate cancer patients. Traditional approaches rely heavily on voxel-level spatial correspondence, which is difficult to establish because reliable ground-truth correspondences rarely exist across modalities. This study proposes a solution that trains convolutional neural networks (CNNs) within a weakly-supervised learning framework.
Methodology Overview
The core contribution of this work is a weakly-supervised learning strategy that uses anatomical labels in place of voxel-level matching. These labels, which can mark organs, boundaries, and other salient landmarks, provide higher-level correspondence signals during training. The CNN is trained to predict dense displacement fields (DDFs) that align the labelled anatomical structures across image pairs, steering training toward anatomically consistent transformations rather than correlations between individual voxel intensities.
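The label-driven idea above can be illustrated with a minimal NumPy sketch (this is not the paper's implementation: real DDFs are real-valued per-voxel fields applied with trilinear resampling, whereas here a single integer shift stands in for the warp, and `soft_dice`, `warp_label` are hypothetical helper names):

```python
import numpy as np

def soft_dice(a, b, eps=1e-6):
    """Soft Dice overlap between two (possibly smoothed) label maps."""
    inter = np.sum(a * b)
    return (2.0 * inter + eps) / (np.sum(a) + np.sum(b) + eps)

def warp_label(label, shift):
    """Toy stand-in for warping a label map with a predicted DDF.

    For illustration only: a single integer (dz, dy, dx) shift is applied
    to the whole volume instead of a dense, real-valued displacement field.
    """
    dz, dy, dx = shift
    return np.roll(label, shift=(dz, dy, dx), axis=(0, 1, 2))

# Toy pair: a cube label in the "moving" image, displaced in the "fixed" image.
moving = np.zeros((16, 16, 16))
moving[4:10, 4:10, 4:10] = 1.0
fixed = np.roll(moving, shift=(2, 0, 0), axis=(0, 1, 2))

# A perfect (toy) predicted displacement recovers full label overlap,
# so the label-driven training loss (1 - Dice) vanishes.
warped = warp_label(moving, (2, 0, 0))
loss = 1.0 - soft_dice(warped, fixed)
print(round(soft_dice(warped, fixed), 3))  # → 1.0
```

The key point is that the loss is computed only on label maps, never on raw intensities, which is what makes the supervision "weak": labels are required for training pairs but not at inference.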
Network Architecture
The framework pairs a memory-efficient network architecture with a multiscale Dice loss as a key component. This loss measures label overlap at several spatial scales, which helps counter the class imbalance between small labelled structures and the surrounding background. The research also investigates several variants of the architecture and loss, including a multiscale cross-entropy loss, pre-filtered label maps, and predictions at different resolutions, ultimately settling on a design that balances computational efficiency with accuracy.
Experimental Results
The study reports comprehensive cross-validation on a dataset of 76 patients comprising 108 multimodal image pairs. The CNN approach achieved a median target registration error (TRE) of 3.6 mm and a median Dice similarity coefficient (DSC) of 0.87 on the prostate gland. These results mark a clear improvement over classical pairwise intensity-based methods, which struggle with the complex, modality-dependent relationship between voxel intensities, spatial and temporal variability, and the tight computational budgets of interventional settings.
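The two reported metrics have standard definitions, sketched below with NumPy (these are the generic definitions, not the paper's evaluation code; `spacing_mm` assumes landmark coordinates are given in voxel units with isotropic spacing):

```python
import numpy as np

def dsc(seg_a, seg_b):
    """Dice similarity coefficient between two binary segmentations."""
    a, b = seg_a.astype(bool), seg_b.astype(bool)
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def tre(points_warped, points_target, spacing_mm=1.0):
    """Target registration errors: Euclidean distances (in mm) between
    corresponding landmarks after registration."""
    diff = (np.asarray(points_warped) - np.asarray(points_target)) * spacing_mm
    return np.linalg.norm(diff, axis=1)

# Toy check: two landmark pairs off by 3 mm and 4 mm along one axis each.
warped = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
target = [[13.0, 10.0, 10.0], [20.0, 24.0, 20.0]]
print(np.median(tre(warped, target)))  # → 3.5
```

Reporting the median of both metrics, as the paper does, makes the summary robust to a few poorly registered outlier cases.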
Implications and Future Work
The results of this study hold significant implications for clinical practice. The proposed method offers a fast, non-iterative registration solution that requires no anatomical labels and no initialization at inference time. This represents a step forward for image-guided interventions, potentially transforming procedures where multimodal image integration is crucial.
The paper suggests several avenues for future research. There is potential to generalize this framework to other clinical applications with similar imaging constraints. Further refinement of the training process could enhance the model's robustness, with future work potentially exploring advanced regularization techniques and greater exploitation of 3D spatial information.
In conclusion, the work makes a valuable contribution to the field of medical image registration by presenting a feasible and efficient solution to the inherent challenges faced in multimodal imaging applications. The versatility and adaptability of the proposed framework could inspire further developments in automated image registration methodologies across diverse medical imaging settings.