Dense Depth Distillation with Out-of-Distribution Simulated Images
Abstract: We study data-free knowledge distillation (KD) for monocular depth estimation (MDE), which aims to learn a lightweight model for real-world depth perception by compressing a trained teacher model without access to training data from the target domain. Owing to the essential differences between image classification and dense regression, previous data-free KD methods are not applicable to MDE. To strengthen its applicability in real-world tasks, we propose to perform KD with out-of-distribution simulated images. The major challenges to be resolved are i) the lack of prior information about the scene configurations of real-world training data and ii) the domain shift between simulated and real-world images. To cope with these difficulties, we propose a framework tailored for depth distillation. The framework generates new training samples that cover a multitude of possible object arrangements in the target domain, and uses a transformation network to efficiently adapt them to the feature statistics preserved in the teacher model. Through extensive experiments on various depth estimation models and two different datasets, we show that our method outperforms the baseline KD by a large margin and even achieves slightly better performance with as few as 1/6 of the training images, demonstrating clear superiority.
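One way to adapt transformed images to "the feature statistics preserved in the teacher model" is to penalize the discrepancy between the batch statistics of the teacher's intermediate activations and the running statistics stored in its BatchNorm layers, as popularized by DeepInversion-style data-free KD. The sketch below is an assumption about how such an alignment term could look in PyTorch, not the paper's exact objective; the function name `bn_alignment_loss` and the toy teacher are illustrative.

```python
import torch
import torch.nn as nn

def bn_alignment_loss(teacher: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: sum, over all BatchNorm2d layers of a frozen
    teacher, of the distance between the batch statistics of the current
    (transformed) images and the running statistics stored during the
    teacher's original training."""
    losses = []
    hooks = []

    def make_hook(bn: nn.BatchNorm2d):
        def hook(module, inputs, output):
            x = inputs[0]
            # Per-channel statistics of the incoming feature map.
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            losses.append(
                torch.norm(mean - module.running_mean)
                + torch.norm(var - module.running_var)
            )
        return bn.register_forward_hook(hook)

    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(make_hook(m))

    teacher.eval()  # freeze running stats; gradients still flow to `images`
    teacher(images)
    for h in hooks:
        h.remove()
    return torch.stack(losses).sum()
```

In a full pipeline this term would be minimized with respect to the transformation network's parameters (the teacher stays fixed), alongside the distillation loss on the predicted depth maps.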