PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference
Abstract: Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, it is difficult for a model to segment the bolus region accurately because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network architecture for VFSS image analysis that combines two novel techniques: the preprocessing ensemble network (PEN) and the cascaded inference network (CIN). PEN enhances the sharpness and contrast of the VFSS image by combining multiple preprocessing algorithms in a learnable way. CIN reduces ambiguity in bolus segmentation by using context from other regions through cascaded inference. Moreover, CIN prevents undesirable side effects from unreliably segmented regions by referring to the context in an asymmetric way. In experiments, PECI-Net outperformed four recently developed baseline models, exceeding TernausNet, the best of the baselines, by 4.54% and the widely used UNet by 10.83%. The ablation studies confirm that both PEN and CIN improve bolus segmentation performance.
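The abstract does not say how PEN combines its preprocessing algorithms, so the following is only a minimal sketch of one way a "learnable combination" could look: a softmax-weighted blend of several preprocessed copies of the same image. The three operations, the function names, and the blending rule are illustrative assumptions and are not taken from the paper. Pixel values are assumed to lie in [0, 1].

```python
import math

# Hypothetical preprocessing operations (assumptions, not the paper's set).
def identity(img):
    """Return an unmodified copy of the image."""
    return [row[:] for row in img]

def contrast_stretch(img):
    """Linearly rescale intensities so they span [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) / (hi - lo) for p in row] for row in img]

def gamma_correct(img, gamma=0.5):
    """Brighten dark regions with a power-law curve (gamma < 1)."""
    return [[p ** gamma for p in row] for row in img]

def softmax(logits):
    """Turn unconstrained logits into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def blend(img, logits, ops=(identity, contrast_stretch, gamma_correct)):
    """Blend the outputs of all ops pixel by pixel.

    `logits` plays the role of the learnable parameters: in a real model
    they would be updated by gradient descent together with the
    segmentation network, which this sketch does not do.
    """
    weights = softmax(logits)
    outputs = [op(img) for op in ops]
    height, width = len(img), len(img[0])
    return [
        [sum(w * out[r][c] for w, out in zip(weights, outputs))
         for c in range(width)]
        for r in range(height)
    ]

image = [[0.2, 0.4], [0.6, 0.8]]
blended = blend(image, logits=[0.0, 0.0, 0.0])  # equal weights for all ops
```

With all logits equal, every operation contributes one third of each output pixel. Raising one logit shifts the blend toward that operation, which is the sense in which the combination is learnable here.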