Understanding differences in applying DETR to natural and medical images

Published 27 May 2024 in cs.CV | (2405.17677v2)

Abstract: Transformer-based detectors have been successful on computer vision tasks with natural images. These models, exemplified by Deformable DETR, are optimized through complex engineering strategies tailored to the typical characteristics of natural scenes. Medical imaging data, however, presents distinct challenges: extremely large image sizes, fewer and smaller regions of interest, and object classes that can be distinguished only by subtle differences. This study evaluates whether these transformer design choices carry over to a screening mammography dataset that exhibits these characteristics. Our analysis reveals that common design choices from the natural image domain, such as complex encoder architectures, multi-scale feature fusion, query initialization, and iterative bounding box refinement, do not improve and sometimes even impair object detection performance in medical imaging. In contrast, simpler and shallower architectures often achieve equal or superior results. This finding suggests that adapting transformer models to medical imaging data requires reevaluating standard practices, potentially leading to more efficient and specialized frameworks for medical diagnosis.
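
For readers unfamiliar with iterative bounding box refinement, one of the design choices named in the abstract, the following is a minimal sketch of the idea as it is commonly described for Deformable DETR: each decoder layer predicts an offset that is added to the previous layer's box in logit (inverse-sigmoid) space, and the result is mapped back to normalized coordinates. This is an illustration only, not the authors' implementation; the function names, the `(cx, cy, w, h)` box layout, and the example offsets are assumptions made for this sketch.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Map a normalized coordinate in (0, 1) to logit space, clamped for stability."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(z):
    """Map a logit back to a normalized coordinate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def refine_box(reference_box, predicted_offsets):
    """One refinement step: add the offsets in logit space, then squash to (0, 1)."""
    return [
        sigmoid(inverse_sigmoid(r) + d)
        for r, d in zip(reference_box, predicted_offsets)
    ]


def iterative_refinement(initial_box, offsets_per_layer):
    """Apply one refinement step per decoder layer; return the box after each layer."""
    boxes = []
    box = initial_box
    for offsets in offsets_per_layer:
        box = refine_box(box, offsets)
        boxes.append(box)
    return boxes


# Hypothetical example: a normalized (cx, cy, w, h) box refined by two layers.
initial_box = [0.5, 0.5, 0.2, 0.2]
offsets_per_layer = [
    [0.1, -0.1, 0.0, 0.0],  # made-up offsets standing in for layer 1 predictions
    [0.05, 0.0, -0.2, -0.2],  # made-up offsets standing in for layer 2 predictions
]
refined_boxes = iterative_refinement(initial_box, offsets_per_layer)
```

Each layer only corrects the previous estimate, so a detector without this component would instead predict every box from scratch at each layer. The abstract reports that this refinement did not help on the mammography data studied.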
