LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition

Published 3 Mar 2024 in cs.CV, eess.IV, and eess.SP | arXiv:2403.01412v1

Abstract: Bandwidth constraints during signal acquisition frequently impede real-time detection applications. Hyperspectral data is a notable example: its sheer volume makes real-time hyperspectral detection difficult. To tackle this hurdle, we introduce a novel approach that applies pre-acquisition modulation to reduce the acquisition volume. The modulation process is governed by a deep learning model that exploits prior information. Central to our approach is LUM-ViT, a Vision Transformer variant that incorporates a learnable under-sampling mask tailored for pre-acquisition modulation. To further suit optical computation, we propose a kernel-level weight binarization technique and a three-stage fine-tuning strategy. Our evaluations show that, sampling a mere 10% of the original image pixels, LUM-ViT keeps the accuracy loss within 1.8% on the ImageNet classification task. The method sustains near-original accuracy when deployed on real-world optical hardware, demonstrating its practicality. Code will be available at https://github.com/MaxLLF/LUM-ViT.
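The core idea — a binary under-sampling mask applied before acquisition so that only a fraction of measurements is ever captured — can be illustrated with a minimal sketch. The helper below is hypothetical (it is not the authors' implementation): it binarizes a score array by keeping the top 10% of entries, mimicking the masking step; in LUM-ViT the scores would be learned end-to-end, typically via a straight-through or Gumbel-softmax-style relaxation so the binarization remains trainable.

```python
import random

def undersampling_mask(scores, keep_ratio=0.1):
    """Hypothetical sketch: turn real-valued scores into a 0/1 mask that
    keeps the top `keep_ratio` fraction of measurements. In LUM-ViT the
    scores are learned parameters, not random values."""
    k = max(1, round(keep_ratio * len(scores)))
    thresh = sorted(scores, reverse=True)[k - 1]   # k-th largest score
    return [1.0 if s >= thresh else 0.0 for s in scores]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(196)]  # stand-in for learned logits
mask = undersampling_mask(scores, keep_ratio=0.1)

signal = [random.gauss(0, 1) for _ in range(196)]  # stand-in for the raw signal
modulated = [m * x for m, x in zip(mask, signal)]  # pre-acquisition modulation

print(sum(mask))  # keeps roughly 10% of the 196 measurements
```

In the actual system, the masked-out measurements are never acquired at all (the modulation is performed optically, e.g. on a DMD), which is what yields the bandwidth savings rather than mere post-hoc compression.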


Authors (3)
