LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
Abstract: Bandwidth constraints during signal acquisition frequently impede real-time detection applications. Hyperspectral data is a notable example: its vast volume compromises real-time hyperspectral detection. To tackle this hurdle, we introduce a novel approach that leverages pre-acquisition modulation to reduce the acquisition volume. This modulation process is governed by a deep learning model that exploits prior information. Central to our approach is LUM-ViT, a Vision Transformer variant. Uniquely, LUM-ViT incorporates a learnable under-sampling mask tailored for pre-acquisition modulation. To further optimize for optical calculations, we propose a kernel-level weight binarization technique and a three-stage fine-tuning strategy. Our evaluations reveal that, by sampling a mere 10% of the original image pixels, LUM-ViT limits the accuracy loss to within 1.8% on the ImageNet classification task. The method sustains near-original accuracy when implemented on real-world optical hardware, demonstrating its practicality. Code will be available at https://github.com/MaxLLF/LUM-ViT.
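The core idea of a learnable under-sampling mask can be illustrated with a minimal sketch: a vector of learnable per-patch scores is binarized in the forward pass so that only a fixed fraction (here 10%) of patch measurements is acquired, while gradients flow back to the scores through a straight-through estimator. This is an assumption-laden illustration, not the paper's implementation: the function names (`topk_binary_mask`, `ste_backward`) are hypothetical, and a hard top-k with a straight-through gradient is used here as a stand-in for whatever relaxation (e.g. Gumbel-softmax) LUM-ViT actually trains with.

```python
import numpy as np

def topk_binary_mask(scores, keep_ratio=0.10):
    """Forward pass: keep the top `keep_ratio` fraction of patches.

    `scores` are learnable logits, one per patch; the hard 0/1 mask
    decides which patch measurements the optical front end acquires.
    """
    k = max(1, int(round(keep_ratio * scores.size)))
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0  # top-k scores -> 1, rest -> 0
    return mask

def ste_backward(grad_mask):
    """Straight-through estimator: pass the gradient w.r.t. the hard
    mask to the scores unchanged, treating the threshold as identity."""
    return grad_mask

rng = np.random.default_rng(0)
scores = rng.normal(size=196)           # e.g. a 14x14 grid of ViT patches
mask = topk_binary_mask(scores, 0.10)
print(int(mask.sum()))                  # -> 20 (10% of 196 patches kept)
```

In training, the mask would multiply the patch measurements before the ViT encoder, so the scores learn which spatial locations carry the most task-relevant information under the bandwidth budget.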