RingGesture: A Ring-Based Mid-Air Gesture Typing System Powered by a Deep-Learning Word Prediction Framework
Abstract: Text entry is a critical capability for any modern computing experience, and lightweight augmented reality (AR) glasses are no exception. Because lightweight AR glasses are designed for all-day wearability, they cannot accommodate the multiple cameras needed for hand tracking over an extensive field of view. This constraint underscores the need for an additional input device. We propose a system to address this gap: RingGesture, a ring-based mid-air gesture typing technique that uses electrodes to mark the start and end of gesture trajectories and inertial measurement unit (IMU) sensors for hand tracking. This method offers an intuitive experience similar to the raycast-based mid-air gesture typing found in VR headsets, allowing for a seamless translation of hand movements into cursor navigation. To enhance both accuracy and input speed, we propose a novel deep-learning word prediction framework, Score Fusion, comprising three key components: a) a word-gesture decoding model, b) a spatial spelling correction model, and c) a lightweight contextual LLM. The framework fuses the scores from the three models to predict the most likely words with higher precision. We conduct comparative and longitudinal studies that demonstrate two key findings. First, RingGesture is effective overall, achieving an average text entry speed of 27.3 words per minute (WPM) and a peak performance of 47.9 WPM. Second, the Score Fusion framework outperforms a conventional word prediction framework, Naive Correction, offering a 28.2% improvement in uncorrected Character Error Rate and leading to a 55.2% improvement in text entry speed for RingGesture. Additionally, RingGesture received a System Usability Scale (SUS) score of 83, signifying excellent usability.
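To make the idea of fusing scores from three models concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the fusion rule, so the weighted sum of log-probabilities used here is an assumption, as are the function name `fuse_scores`, the `weights` and `floor` parameters, and the toy probabilities in the usage example.

```python
import math


def fuse_scores(decoder, speller, context, weights=(1.0, 1.0, 1.0), floor=1e-6):
    """Rank candidate words by a weighted sum of per-model log-probabilities.

    decoder, speller, context: dicts mapping word -> probability, standing in
    for the word-gesture decoder, the spatial spelling corrector, and the
    contextual language model. The log-linear rule is an assumed fusion
    scheme, chosen only for illustration.
    """
    w_d, w_s, w_c = weights
    candidates = set(decoder) | set(speller) | set(context)
    fused = {}
    for word in candidates:
        # A word missing from one model gets a small floor probability
        # instead of zero, so the logarithm stays defined.
        fused[word] = (
            w_d * math.log(decoder.get(word, floor))
            + w_s * math.log(speller.get(word, floor))
            + w_c * math.log(context.get(word, floor))
        )
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for one gesture; the values are made up.
    decoder = {"hello": 0.5, "jello": 0.4, "hells": 0.1}
    speller = {"hello": 0.6, "jello": 0.3, "hells": 0.1}
    context = {"hello": 0.7, "jello": 0.05, "hells": 0.25}
    print(fuse_scores(decoder, speller, context))
```

In this toy case the decoder alone barely separates "hello" from "jello", while the contextual scores push "jello" well below "hello" in the fused ranking. That is the kind of disambiguation a fused score is meant to provide.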