Hybrid Vision Transformer and Quantum Convolutional Neural Network for Image Classification

Published 14 Oct 2025 in quant-ph | (2510.12291v1)

Abstract: Quantum machine learning (QML) holds promise for computational advantage, yet progress on real-world tasks is hindered by classical preprocessing and noisy devices. We introduce ViT-QCNN-FT, a hybrid framework that integrates a fine-tuned Vision Transformer with a quantum convolutional neural network (QCNN) to compress high-dimensional images into features suited for noisy intermediate-scale quantum (NISQ) devices. By systematically probing entanglement, we show that ansatzes with uniformly distributed entanglement entropy consistently deliver superior non-local feature fusion and state-of-the-art accuracy (99.77% on CIFAR-10). Surprisingly, quantum noise emerges as a double-edged factor: in some cases, it enhances accuracy (+2.71% under amplitude damping). Strikingly, substituting the QCNN with classical counterparts of equal parameter count leads to a dramatic 29.36% drop in accuracy, providing unambiguous evidence of quantum advantage. Our study establishes a principled pathway for co-designing classical and quantum architectures, pointing toward practical QML capable of tackling complex, high-dimensional learning tasks.
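
To make the phrase "uniformly distributed entanglement entropy" concrete, the sketch below computes the von Neumann entropy of each qubit's reduced state for a pure multi-qubit state, using only the Python standard library. The abstract does not say which entropy measure or bipartition the authors use, so the per-qubit measure, the function name `single_qubit_entropies`, and the bit-ordering convention are assumptions made for illustration, not the paper's method.

```python
import math


def single_qubit_entropies(state, n_qubits):
    """Per-qubit von Neumann entropy (in bits) of a pure n-qubit state.

    `state` is a sequence of 2**n_qubits amplitudes; qubit 0 is taken to be
    the most significant bit of the basis index (an assumed convention).
    """
    if len(state) != 2 ** n_qubits:
        raise ValueError("state length must equal 2**n_qubits")
    norm = sum(abs(a) ** 2 for a in state)
    if norm == 0:
        raise ValueError("state must be non-zero")

    entropies = []
    for k in range(n_qubits):
        mask = 1 << (n_qubits - 1 - k)
        rho00 = rho11 = 0.0
        rho01 = 0j
        # Build the 2x2 reduced density matrix of qubit k by tracing out the rest.
        for i, amp in enumerate(state):
            if i & mask:
                rho11 += abs(amp) ** 2
            else:
                rho00 += abs(amp) ** 2
                rho01 += amp * complex(state[i | mask]).conjugate()
        rho00, rho11, rho01 = rho00 / norm, rho11 / norm, rho01 / norm

        # A 2x2 density matrix has trace 1, so its eigenvalues are
        # (1 +/- sqrt(1 - 4 det)) / 2.
        det = rho00 * rho11 - abs(rho01) ** 2
        disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
        eigenvalues = ((1.0 + disc) / 2.0, (1.0 - disc) / 2.0)
        entropies.append(sum(-p * math.log2(p) for p in eigenvalues if p > 1e-12))
    return entropies


s = 1 / math.sqrt(2)
ghz = [s, 0, 0, 0, 0, 0, 0, s]        # (|000> + |111>) / sqrt(2)
bell_plus_zero = [s, 0, 0, 0, 0, 0, s, 0]  # Bell pair on qubits 0,1; qubit 2 in |0>

uniform_profile = single_qubit_entropies(ghz, 3)
uneven_profile = single_qubit_entropies(bell_plus_zero, 3)
```

In this toy comparison the GHZ state gives every qubit the same entropy, while the second state leaves one qubit unentangled. A profile of this kind is one plausible way to compare how evenly different ansatzes spread entanglement, under the assumptions stated above.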