Glove-Based Gesture Control Systems
- Glove-based gesture control uses sensorized gloves equipped with flex, IMU, and capacitive sensors to capture hand kinematics for intuitive real-time interaction.
- It employs advanced signal processing, calibration, and machine learning algorithms, achieving recognition accuracies up to 99% and low-latency performance.
- Applications include robotics, VR, rehabilitation, and surgical interfaces, enabling both discrete command mappings and continuous kinematic control.
Glove-based gesture control refers to the use of sensorized gloves to extract hand and finger kinematics as input for real-time gesture recognition, enabling intuitive and fine-grained human–machine interaction. These systems have been applied across domains including robotic teleoperation, virtual reality (VR), rehabilitation, UAV/UGV control, and medical/surgical interfaces. Modern glove-based systems integrate multimodal sensing—commonly resistive flex sensors, IMUs, capacitive sensors, and tactile matrices—into an instrumented fabric, coupled to embedded machine-learning pipelines that achieve low-latency, robust recognition and direct mapping to discrete or continuous control actions.
1. Sensor Technologies and Glove Architectures
Recent reviews describe the proliferation of custom and commercial data gloves since the 1980s, classifying sensor types and placement strategies (Belcamino et al., 2024). Predominant sensor modalities are:
- Resistive bend ("flex") sensors: Resistance tracks local joint angles, providing up to 120–180° dynamic range. Typical usage places one per DoF at the DIP/PIP/MCP finger joints, thumb CMC/MCP, and sometimes the palm (Belcamino et al., 2024, Fedoseev et al., 2021).
- Inertial measurement units (IMUs): 3-axis accelerometers and gyroscopes, with optional magnetometers, enable direct tracking of hand orientation via sensor fusion. Placement may be per finger, palm, or both (Králik et al., 2021, Habel et al., 22 Jan 2026, Kvasić et al., 10 Jun 2025, Islam et al., 17 Jan 2026).
- Capacitive strain sensors: Thin-film or liquid-metal channels register strain and grip through capacitance changes that scale with applied strain, enabling multi-point measurement of joint angles and inter-finger spacing (Dong et al., 8 Apr 2025, Bello et al., 2023).
- Pressure/tactile sensors (FSRs): Used primarily for contact detection or pinch force estimation.
- Contact-closure circuits: For binary event detection in pinch-based minimal gloves (e.g., "mudra" devices) (Freire et al., 2019).
Architecturally, gloves are designed with trade-offs between coverage, weight, ergonomics, and DoF. Full kinematic gloves (e.g., 22-sensor CyberGlove II) offer comprehensive joint tracking (Neto et al., 2013), while minimalist systems target specific gesture vocabularies to minimize encumbrance (Freire et al., 2019).
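For the minimal contact-closure gloves mentioned above, the core firmware task is debouncing the binary pinch channel. A small sketch of that step follows; the 3-sample stability window is an illustrative assumption, not taken from any cited glove design:

```python
# Hypothetical sketch: debouncing a contact-closure pinch channel.
# The output only changes state after `stable_count` consecutive
# identical raw samples, suppressing contact bounce.

def debounce(samples, stable_count=3):
    """Return the debounced binary event stream for raw 0/1 samples."""
    out = []
    state = 0       # debounced output state
    last = None     # last raw sample seen
    run = 0         # length of the current run of identical samples
    for s in samples:
        if s == last:
            run += 1
        else:
            run = 1
            last = s
        if run >= stable_count and s != state:
            state = s
        out.append(state)
    return out
```

A spurious single-sample flicker (e.g., `0,1,0`) leaves the output unchanged, while a sustained closure propagates after three samples.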
2. Signal Processing, Calibration, and Feature Extraction
Raw sensor data require calibration and preprocessing prior to feature extraction:
- Flex sensor calibration uses a linear mapping between measured resistance and joint angle, anchored by per-user minima/maxima (Fedoseev et al., 2021, Neto et al., 2013, Belcamino et al., 2024).
- IMU orientation estimation employs sensor fusion algorithms: complementary filters, Madgwick/RUKF, or quaternion integration (Králik et al., 2021, Habel et al., 22 Jan 2026, Islam et al., 17 Jan 2026). Bias removal and adaptive scaling are standard for drift compensation.
- Capacitive and stretch sensors are normalized by in-situ open/closed calibration, mapping capacitance response to [0,1] range per user (Dong et al., 8 Apr 2025, Kvasić et al., 10 Jun 2025).
- Feature vectors are constructed from per-time-step joint angles, finger spacings, hand orientation (quaternions/Euler), or contact events (Belcamino et al., 2024). Some pipelines utilize sliding windows and stack delayed samples to capture short-term dynamics (Neto et al., 2013, Dong et al., 8 Apr 2025).
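The calibration and windowing steps above can be sketched as follows; the angle range and window depth are illustrative assumptions:

```python
import numpy as np

# Sketch of per-user flex calibration and delayed-sample stacking.
# Raw readings are linearly mapped to joint angles using per-user
# min/max anchors, then consecutive frames are stacked into one
# feature vector to capture short-term dynamics.

def calibrate_flex(raw, r_min, r_max, angle_range=90.0):
    """Map raw flex readings to joint angles in [0, angle_range] degrees."""
    norm = (np.asarray(raw, float) - r_min) / (r_max - r_min)
    return np.clip(norm, 0.0, 1.0) * angle_range

def stack_window(frames, depth=3):
    """Stack `depth` consecutive frames into one feature vector each."""
    frames = np.asarray(frames, float)
    return np.stack([np.concatenate(frames[i - depth + 1 : i + 1])
                     for i in range(depth - 1, len(frames))])
```

With `depth=3`, a stream of 4 two-sensor frames yields 2 feature vectors of length 6.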
Table 1 summarizes common processing steps by glove type:
| Sensor Modality | Calibration Pipeline | Feature Set |
|---|---|---|
| Resistive flex | Per-user min/max, linear mapping | Joint angles (all joints) |
| IMU | Gyro bias removal, sensor fusion | Hand orientation (quaternion/Euler) |
| Capacitive/stretch | Open/closed auto-calibration | Normalized strain, inter-finger distances |
| Contact closure | Signal debounce (optional) | Binary touch events |
Recognition quality and robustness are contingent on regular calibration and low-noise preprocessing (Dong et al., 8 Apr 2025, Fedoseev et al., 2021).
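A minimal single-axis complementary filter illustrates the IMU fusion step from Section 2; the gain and signal names are assumptions, and real pipelines (Madgwick, RUKF) fuse full 3-D orientation:

```python
# Complementary-filter sketch for one tilt axis: blend the integrated
# gyro rate (smooth, but drifts) with the accelerometer tilt estimate
# (noisy, but drift-free).

def complementary_filter(gyro_rates, accel_angles, dt=0.01, alpha=0.98):
    """Fuse gyro rates (deg/s) and accelerometer angles (deg) into
    a drift-compensated tilt estimate per sample."""
    angle = accel_angles[0]  # initialize from the drift-free source
    out = []
    for w, a in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + w * dt) + (1.0 - alpha) * a
        out.append(angle)
    return out
```

With a stationary hand (zero gyro rate, constant accelerometer angle), the estimate stays pinned to the accelerometer value instead of drifting.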
3. Gesture Recognition Algorithms and Embedded Inference
Glove-based gesture recognition utilizes varied algorithms based on application constraints, gesture set cardinality, and computational resources:
- Threshold-based mapping: Suitable for sparse command sets, e.g., IMU-tilt-based wheelchair control that compares angular rates (deg/s) against fixed thresholds to select direction (Islam et al., 17 Jan 2026).
- Feedforward neural networks: Two-stage ANNs (44-44-1 and 44-44-G architectures, with G the number of gesture classes) for real-time, continuous static hand posture segmentation and classification, achieving 98–99% recognition rates (Neto et al., 2013).
- Convolutional and Transformer models: Used in recognition of long temporal sequences and fine-grained finger dynamics, e.g., 1D-CNN+MLP pipeline for 99.1% accuracy on 30-class soft-glove gestures (Dong et al., 8 Apr 2025); Transformer encoder for 99.9% accuracy on multi-finger IMU sets (Králik et al., 2021).
- Hidden Markov Models (HMM), SVMs: Employed for robust temporal event segmentation, especially under noise or variable sequence lengths (Kvasić et al., 10 Jun 2025).
- Hierarchical/Multi-modal NNs: Staged pipelines (inertial, then capacitive) reduce power and error on microcontroller-class MCUs in real-time edge applications (Bello et al., 2023).
- Attention-based models: Outperform RNN/LSTM in teleoperation latency compensation via accurate multi-step lookahead in intent prediction tasks (Ahmed et al., 2021).
Performance metrics reported include recognition accuracy (typically 85–99%), F1-score, system latency (Dong et al., 8 Apr 2025, Bello et al., 2023), and confusion matrices for gesture discrimination.
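The threshold-based mapping above can be sketched in a few lines. The command names, axis convention, and 30 deg/s threshold are illustrative assumptions rather than values from the cited wheelchair system:

```python
# Sketch of threshold-based discrete mapping for IMU-tilt control.
# Angular rates about two axes are compared against a fixed threshold;
# the dominant axis selects the direction command.

def rate_to_command(wx, wy, thresh=30.0):
    """Map angular rates (deg/s) about two hand axes to a command."""
    if abs(wx) < thresh and abs(wy) < thresh:
        return "stop"                      # below threshold: no motion
    if abs(wx) >= abs(wy):
        return "forward" if wx > 0 else "backward"
    return "right" if wy > 0 else "left"
```

This style of mapping needs no training data, which is why it suits sparse, safety-critical command sets.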
4. Control Mapping and Teleoperation Applications
Gesture-to-control mappings fall along a spectrum:
- Discrete command mapping: Each recognized hand posture or gesture is mapped to a specific system or robot command, e.g., fist → grasp, thumb-up → ascend (Fedoseev et al., 2021, Habel et al., 22 Jan 2026).
- Continuous kinematic mapping: Hand and finger joint outputs are retargeted to the kinematic chains of robots, exoskeletons, or VR manipulators, with pose and orientation mapped via scaling and rotation matrices (Borgioli et al., 2024).
- Assistive and safety-critical control: Underwater diver–AUV acoustic command transmission with 85% on-glove and 80% end-to-end accuracy; wheelchair navigation with 95.5% success; drone control via attitude mapping and step commands (Habel et al., 22 Jan 2026, Bello et al., 2023).
- Surgical and medical robotics: Glove-driven da Vinci Research Kit interfaces using a multi-DoF flex and IMU glove, achieving millimeter-scale positional RMSE and 0.02 rad orientational RMSE with sub-250 ms latency (Borgioli et al., 2024).
Tables mapping gesture IDs to specific robotic or system actions are standard (Neto et al., 2013, Fedoseev et al., 2021, Kvasić et al., 10 Jun 2025).
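The continuous mapping above amounts to a similarity transform, p_robot = s · R · p_hand + t. A minimal sketch, with illustrative scale, rotation, and offset values:

```python
import numpy as np

# Continuous kinematic retargeting sketch: a hand-frame position is
# scaled, rotated, and offset into the robot workspace frame.
# p_robot = scale * R @ p_hand + t

def retarget(p_hand, scale=0.5, R=np.eye(3), t=np.zeros(3)):
    """Map a hand-frame position into the robot workspace frame."""
    return scale * (R @ np.asarray(p_hand, float)) + t
```

Scaling down hand motion (scale < 1) trades workspace coverage for precision, a common choice in surgical teleoperation.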
5. System Integration, Feedback, and Evaluation
High-performance glove-based systems integrate hardware, embedded real-time inference, communication, and user feedback subsystems:
- On-glove feedback: Vibration motors and LEDs confirm recognition or warn of system state (e.g., speed limit exceeded in UAV piloting (Habel et al., 22 Jan 2026), command confirmation (Kvasić et al., 10 Jun 2025, Borgioli et al., 2024)).
- Communication protocols: Systems employ Wi-Fi, BLE, acoustic, or wired (USB/UART) connections, with data rates determined by channel count and sampling frequency (e.g., 16 flex channels sampled at 100 Hz with a 12-bit ADC produce 16 × 100 × 12 ≈ 19.2 kb/s (Belcamino et al., 2024)).
- Edge-inference: TinyML frameworks (e.g., TFLite-Micro) permit onboard CNN sequence classification at 1.15 W/2 MB flash (Bello et al., 2023).
- User calibration and robustness: Per-user auto-calibration, modular sensor configuration, and parameter normalization enhance across-user functionality (Dong et al., 8 Apr 2025, Kvasić et al., 10 Jun 2025).
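The link-budget reasoning above is simple arithmetic (channels × sample rate × bits per sample); a one-line helper makes the sizing explicit:

```python
# Raw payload rate for a glove's wireless link, before protocol
# overhead: channels * sample_rate_hz * bits_per_sample, in kb/s.

def raw_rate_kbps(channels, sample_rate_hz, bits):
    """Return the raw sensor payload rate in kilobits per second."""
    return channels * sample_rate_hz * bits / 1000.0
```

For 16 flex channels at 100 Hz with a 12-bit ADC this gives 19.2 kb/s, comfortably within BLE throughput.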
Empirical evaluations report mean task completion times, error metrics, and user feedback; for example, 98.5% classifier accuracy and sub-0.25 s system latency in surgical robotics (Borgioli et al., 2024), or 99%+ gesture recognition at 15 ms latency for industrial robot teleoperation (Neto et al., 2013).
6. Design Guidelines, Limitations, and Future Directions
Best practices and open challenges in glove-based gesture control systems have been synthesized in recent reviews (Belcamino et al., 2024):
- Sensor/placement trade-offs: High DoF tracking vs. weight, stiffness, and wiring complexity; three IMUs per glove suffice for near-maximal accuracy in finger-differentiated gestures (Králik et al., 2021).
- Drift and error management: Periodic rest poses, soft/hard-iron calibration, functional model-based alignment; exoskeletons proposed for deformation compensation (Belcamino et al., 2024).
- Latency minimization: Pipelined inference, staged classifiers, and state-machine driven control loops ensure response rates compatible with real-time teleoperation (Dong et al., 8 Apr 2025, Bello et al., 2023).
- Scalability and modularity: Plug-and-play sensor modules, flexible buses, and open digital interfaces designed for rapid prototyping and integration into external HCI, VR, or ROS frameworks (Dong et al., 8 Apr 2025).
- Standardization and reproducibility: Lack of standard kinematic models and calibration protocols hampers comparability and reusability; systematic reviews advocate for open-source hardware/software initiatives (Belcamino et al., 2024, Freire et al., 2019).
- Emerging directions: Miniaturized IMUs for per-joint tracking, high-resolution capacitive "skins," energy-efficient deep learning, self-calibrating systems, and haptic feedback actuation remain active fronts. Integration with exoskeletons for force feedback, subject-independent models, and advanced teleoperation intents (multi-step intent forecasting, dynamic stiffness adaptation) has also been identified as promising (Ahmed et al., 2021, Dong et al., 8 Apr 2025).
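The staged-classifier idea for latency and power minimization can be sketched as a cheap gate that decides whether to invoke the expensive model at all. The energy threshold and classifier stub below are assumptions:

```python
# Staged-inference sketch: a cheap motion-energy gate on the inertial
# stream skips the expensive classifier for idle windows, saving
# power and latency on microcontroller-class hardware.

def staged_classify(window, gate_thresh, full_model):
    """Run `full_model` only when the window's mean squared amplitude
    exceeds the gate threshold; otherwise report 'idle' immediately."""
    energy = sum(x * x for x in window) / len(window)
    if energy < gate_thresh:
        return "idle"
    return full_model(window)
```

In a two-stage glove pipeline the gate runs every window, while the full network runs only on the small fraction of windows that contain motion.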
7. Comparative Analysis and Applications
Glove-based gesture control excels over vision-based approaches in scenarios requiring occlusion robustness, environmental tolerance (e.g., underwater, low-light), or privacy constraints (Kvasić et al., 10 Jun 2025, Bello et al., 2023). In underwater diver–robot interaction, glove-based acoustic gesture transmission (85% on-glove / 80% end-to-end success at 0.33 s latency) is contrasted with vision-based methods (90–95% accuracy, but failure in poor visibility) (Kvasić et al., 10 Jun 2025). Simplified contact-based designs (OMG-VR) are sufficient for domain-specific interactions (e.g., molecular VR assembly), outperforming more expensive and complex full-pose gloves given well-bounded gesture sets (Freire et al., 2019).
Major domains of application include:
- Industrial and service robotics: Precision telemanipulation, continuous and discrete control (Neto et al., 2013, Ahmed et al., 2021).
- VR/AR and rehabilitation: Real-time hand pose capture, avatar animation, and medical hand function assessment (Dong et al., 8 Apr 2025, Freire et al., 2019).
- Assistive technology: Smart wheelchairs, prosthetic limb control, and UAV piloting (Islam et al., 17 Jan 2026, Habel et al., 22 Jan 2026, Bello et al., 2023).
- Human–robot collaboration and HMI: Rich, context-adaptive interfaces where bidirectional user-feedback and intent prediction are required (Borgioli et al., 2024, Ahmed et al., 2021).
Limitations remain in large-scale subject adaptation, continuous gesture transitions (dynamic gestures), drift under extended use, and the encumbrance or coverage of sensorized gloves. Research continues to address the trade-off between sensing richness, computational efficiency, robustness, and user comfort (Belcamino et al., 2024, Dong et al., 8 Apr 2025, Králik et al., 2021, Kvasić et al., 10 Jun 2025).