
RingGesture: A Ring-Based Mid-Air Gesture Typing System Powered by a Deep-Learning Word Prediction Framework

Published 8 Oct 2024 in cs.CV, cs.AI, and cs.CL | (2410.18100v1)

Abstract: Text entry is a critical capability for any modern computing experience, and lightweight augmented reality (AR) glasses are no exception. Designed for all-day wearability, lightweight AR glasses cannot accommodate the multiple cameras with extensive fields of view required for hand tracking. This constraint underscores the need for an additional input device. We propose a system to address this gap: RingGesture, a ring-based mid-air gesture typing technique that uses electrodes to mark the start and end of gesture trajectories and inertial measurement unit (IMU) sensors for hand tracking. The method offers an intuitive experience similar to the raycast-based mid-air gesture typing found in VR headsets, seamlessly translating hand movements into cursor navigation. To enhance both accuracy and input speed, we propose a novel deep-learning word prediction framework, Score Fusion, comprising three key components: a) a word-gesture decoding model, b) a spatial spelling correction model, and c) a lightweight contextual language model. Rather than relying on any single model, the framework fuses the scores from all three to predict the most likely words with higher precision. We conducted comparative and longitudinal studies that demonstrate two key findings: first, the overall effectiveness of RingGesture, which achieves an average text entry speed of 27.3 words per minute (WPM) and a peak performance of 47.9 WPM; second, the superior performance of the Score Fusion framework, which offers a 28.2% improvement in uncorrected Character Error Rate over a conventional word prediction framework, Naive Correction, translating to a 55.2% improvement in text entry speed for RingGesture. Additionally, RingGesture received a System Usability Score of 83, signifying excellent usability.


Summary

  • The paper introduces RingGesture, a novel system for mid-air gesture typing in AR using a ring device and a deep-learning word prediction framework.
  • RingGesture utilizes a Score Fusion framework integrating word-gesture decoding, spatial spelling correction, and contextual language models to enhance accuracy and speed.
  • Evaluations show RingGesture achieves an average of 27.3 WPM (peak 47.9 WPM), a 28.2% improvement in uncorrected Character Error Rate over a conventional word prediction framework, and a System Usability Score of 83.

Overview of RingGesture: A Ring-Based Mid-Air Gesture Typing System Powered by a Deep Learning Word Prediction Framework

The paper introduces RingGesture, a novel system designed for text input via a ring-based device intended for augmented reality (AR) glasses. This system addresses text entry challenges imposed by AR's constraints, particularly when lightweight glasses cannot incorporate multiple cameras for hand tracking. As a solution, RingGesture uses an innovative combination of gesture detection and inertial measurement to capture hand movements and translate them into text entries. The system features a pinch gesture to initiate and conclude hand movements, mimicking the interaction style of virtual reality raycast-based gesture typing systems.
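The pinch-delimited interaction described above can be sketched as a simple segmentation loop: electrode contact marks when a gesture begins and ends, and the IMU-driven cursor samples in between form one word-gesture trajectory. The function and signal names below are illustrative assumptions, not the paper's implementation:

```python
def segment_gestures(pinch_signal, imu_samples):
    """Slice the IMU cursor stream into pinch-delimited gesture trajectories.

    pinch_signal: list of booleans, True while the fingers are pinched
                  (electrode contact detected).
    imu_samples:  list of cursor positions, same length as pinch_signal.
    Returns a list of trajectories, one per pinch-delimited gesture.
    """
    gestures, current = [], None
    for pinched, sample in zip(pinch_signal, imu_samples):
        if pinched:
            if current is None:       # pinch onset: start a new gesture
                current = []
            current.append(sample)
        elif current is not None:     # pinch release: close the gesture
            gestures.append(current)
            current = None
    if current is not None:           # stream ended mid-pinch
        gestures.append(current)
    return gestures
```

Each returned trajectory would then be handed to the word prediction framework for decoding.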

Key Components and Methodology

RingGesture operates on a deep-learning framework termed Score Fusion, which enhances accuracy and speed in text entry. This framework integrates three core components: a word-gesture decoding model, a spatial spelling correction model, and a contextual LLM. This integration works by fusing individual model scores to predict the most probable word or phrase being typed.

  1. Word-Gesture Decoding Model: Interprets the user's mid-air gesture paths and generates candidate words, using a deep-learning model trained on gesture-typing data.
  2. Spatial Spelling Correction Model: Adjusts predictions based on the proximity of gesture paths to target keys on the virtual keyboard, correcting errors induced by hand-movement noise and vibration.
  3. Contextual LLM: Uses the context of preceding words and phrases to favor coherent sentence continuations, relying on a lightweight model that permits real-time processing.
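The fusion step itself can be sketched as a weighted sum of per-candidate log-probabilities from the three models. The candidate scores and weights below are illustrative toy values; the paper does not publish its fusion weights here:

```python
import math

def fuse_scores(candidates, weights=(1.0, 1.0, 1.0)):
    """Rank candidate words by a weighted sum of log-probabilities.

    candidates: {word: (gesture_logp, spelling_logp, lm_logp)}
    weights:    per-model fusion weights (illustrative defaults).
    Returns the candidate words, best first.
    """
    w_g, w_s, w_l = weights
    fused = {
        word: w_g * g + w_s * s + w_l * l
        for word, (g, s, l) in candidates.items()
    }
    return sorted(fused, key=fused.get, reverse=True)

# Toy candidate set for a noisy gesture near "hello":
candidates = {
    "hello": (math.log(0.6), math.log(0.5), math.log(0.4)),
    "hells": (math.log(0.3), math.log(0.1), math.log(0.01)),
    "jello": (math.log(0.1), math.log(0.4), math.log(0.09)),
}
ranked = fuse_scores(candidates)  # "hello" ranks first despite no single
                                  # model being decisive on its own
```

The point of the design is that a word the gesture decoder slightly misranks can still win when the spelling and language models agree on it.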

Performance and Usability

Empirical evaluations show that integrating these models significantly boosts text input speed and accuracy. RingGesture achieves an average of 27.3 words per minute (WPM) with a peak of 47.9 WPM, competitive with established mobile text entry methods. RingGesture also received a System Usability Score of 83, indicating excellent usability.

The framework's efficacy is highlighted by a 28.2% improvement in uncorrected Character Error Rate (CER) over the conventional Naive Correction word prediction framework. This, in turn, yields a 55.2% improvement in RingGesture's text entry speed, indicating the potential for enhanced productivity within AR contexts.
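For readers unfamiliar with these metrics, the standard text-entry definitions behind such numbers can be sketched as follows; the exact formulas the authors used are not reproduced in this summary, so treat this as the conventional computation rather than the paper's:

```python
def wpm(transcribed, seconds):
    """Words per minute, with one 'word' defined as 5 characters
    (the common MacKenzie convention for text-entry studies)."""
    return ((len(transcribed) - 1) / 5) / (seconds / 60)

def cer(transcribed, target):
    """Character error rate: Levenshtein edit distance divided by
    the length of the target phrase."""
    m, n = len(transcribed), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if transcribed[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / n

speed = wpm("the quick brown fox", 10)       # 19 chars in 10 s -> 21.6 WPM
error = cer("helo world", "hello world")     # 1 edit over 11 chars
```

An "uncorrected" CER, as reported in the paper, is computed on the final transcribed text, so it reflects errors the user did not fix.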

Implications and Future Work

The implications of this research are substantial, not only within the AR domain but potentially extending to other applications requiring novel input modalities. RingGesture affirms the viability of ring-based text entry for devices that cannot physically integrate more complex input mechanisms. On a theoretical level, this work contributes insights into machine learning's role in translating human gestures into computational input.

Future research directions could explore the refinement of tactile feedback to further reduce input errors and the integration of advanced machine learning models for even more efficient context understanding. Additionally, expanding evaluations in terms of user-specific adaptations could further tailor the system's usability across different demographics and use cases.

In conclusion, RingGesture stands as a promising advancement in AR input methods. By leveraging deep learning for enhanced text prediction and user interaction, it sets a precedent for practical, efficient text entry in augmented settings, paving the way for more integrated and ubiquitous technology solutions.
