Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring
Abstract: Recent chart-authoring systems, such as Amazon Q in QuickSight and Copilot for Power BI, show a growing focus on supporting natural language input for creating charts that convey meaningful insights from data. Current chart-authoring systems tend to add voice input by relying on speech-to-text transcription, processing spoken and typed input identically. However, cross-modality comparisons in other interaction domains suggest that spoken and typed interactions can differ notably in structure, reflecting variations in user expectations shaped by interface affordances. In this work, we therefore compare spoken and typed instructions for chart creation. Our findings suggest that while both text and voice instructions cover chart elements and element organization, voice instructions exhibit a wider variety of command formats, element characteristics, and complex linguistic features. Based on these findings, we developed guidelines for designing voice-first chart-authoring systems, along with additional features that existing text-based systems can incorporate to support the speech modality.