
LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving

Published 13 Mar 2024 in cs.CV, cs.AI, cs.LG, and cs.RO | arXiv:2403.08215v2

Abstract: Despite the impressive performance achieved by data-fusion networks with duplex encoders for visual semantic segmentation, they become ineffective when spatial geometric data are not available. Implicitly infusing the spatial geometric prior knowledge acquired by a data-fusion teacher network into a single-modal student network is a practical, albeit less explored research avenue. This article delves into this topic and resorts to knowledge distillation approaches to address this problem. We introduce the Learning to Infuse "X" (LIX) framework, with novel contributions in both logit distillation and feature distillation aspects. We present a mathematical proof that underscores the limitation of using a single, fixed weight in decoupled knowledge distillation and introduce a logit-wise dynamic weight controller as a solution to this issue. Furthermore, we develop an adaptively-recalibrated feature distillation algorithm, including two novel techniques: feature recalibration via kernel regression and in-depth feature consistency quantification via centered kernel alignment. Extensive experiments conducted with intermediate-fusion and late-fusion networks across various public datasets provide both quantitative and qualitative evaluations, demonstrating the superior performance of our LIX framework when compared to other state-of-the-art approaches.
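The abstract names centered kernel alignment (CKA) as the measure used to quantify feature consistency between the teacher and student networks. As background, the linear variant of CKA can be sketched as below; this is a minimal illustration of the general CKA similarity measure, not the paper's actual distillation loss, and all variable names and shapes are assumptions for the example.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two feature matrices.

    x, y: (n_samples, n_features) activations from two networks;
    their feature dimensions may differ. Returns a similarity in [0, 1].
    """
    # Center each feature dimension.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(y.T @ x, ord="fro") ** 2
    den = np.linalg.norm(x.T @ x, ord="fro") * np.linalg.norm(y.T @ y, ord="fro")
    return num / den

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 32))   # stand-in for flattened feature maps
print(linear_cka(feats, feats))         # identical features -> 1.0
```

A useful property for distillation is that CKA is invariant to orthogonal transformations and isotropic scaling of the features, so it compares representations rather than raw activation values; `linear_cka(feats, feats @ Q)` stays at 1.0 for any orthogonal matrix `Q`.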
