Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring
Abstract: Voice dictation is increasingly used for text entry, especially in mobile scenarios. However, the speech-based experience is disrupted when users must return to a screen and keyboard to review and edit the text. While existing dictation systems focus on improving transcription and error correction, little is known about how to support speech input for the entire text creation process, including composition, reviewing, and editing. We conducted an experiment in which ten pairs of participants took on the roles of authors and typists to work on a text authoring task. By analysing the natural language patterns of both authors and typists, we identified new challenges and opportunities for the design of future dictation interfaces, including the ambiguity of human dictation, the differences between audio-only dictation and dictation with a screen, and the various forms of passive and active assistance that future systems could provide.