
In-IDE Human-AI Experience in the Era of Large Language Models: A Literature Review

Published 19 Jan 2024 in cs.SE and cs.HC (arXiv:2401.10739v2)

Abstract: Integrated Development Environments (IDEs) have become central to modern software development, especially with the integration of AI to enhance programming efficiency and decision-making. The study of in-IDE Human-AI Experience is critical in understanding how these AI tools are transforming the software development process, impacting programmer productivity, and influencing code quality. We conducted a literature review to study the current state of in-IDE Human-AI Experience research, bridging a gap in understanding the nuanced interactions between programmers and AI assistants within IDEs. By analyzing 36 selected papers, our study illustrates three primary research branches: Design, Impact, and Quality of Interaction. The trends, challenges, and opportunities identified in this paper emphasize the evolving landscape of software development and inform future directions for research and development in this dynamic field. Specifically, we invite the community to investigate three aspects of these interactions: designing task-specific user interfaces, building trust, and improving readability.


Summary

  • The paper presents a systematic literature review of 36 studies from 2020 to 2024, identifying three research branches of in-IDE Human-AI interaction: the design of AI-enabled tools, their impact on programmers' workflow, and the quality of AI assistants.
  • It outlines design principles for AI-enabled tools, emphasizing user control, adaptability, and clarity in code suggestions.
  • It reveals the impact of AI assistance on productivity and code security, while addressing challenges related to model errors and trust.

In-IDE Human-AI Experience in the Era of LLMs: A Literature Review

This paper presents a literature review of Human-AI eXperience (HAX) within Integrated Development Environments (IDEs) in light of recent advancements in LLMs. The review analyzes 36 papers published between 2020 and 2024, identifying three primary research areas: Design, Impact, and Quality of Interaction. The study synthesizes current research trends, challenges, and opportunities, offering insights for future research and development in this evolving field.

Methodological Approach to Literature Review

The study employed a systematic literature review methodology, selecting relevant papers from ACM Digital Library, DBLP, IEEE Digital Library, and ArXiv. A targeted search string was used to identify papers focusing on the intersection of IDEs and AI assistance. Inclusion and exclusion criteria were applied to ensure the relevance and recency of the selected studies, resulting in a final set of 36 papers. The extracted information included publication year, authorship, study goals, research questions, methodology, and key findings.

Key Research Areas in In-IDE HAX

Design of AI-Enabled Tools

This research area focuses on user-interface design considerations when integrating AI technologies into programming environments. The review identifies design principles for AI assistance, emphasizing clear communication, user control, adaptability, and user-friendly interaction:

  • Generative AI tools should communicate their probabilistic nature, facilitate user annotation, accommodate imperfection through feedback, and implement user-driven controls.
  • Code assistants should act as adaptable ghostwriters, offering context control, balancing politeness and promotion, integrating search and documentation, and incorporating means of verification.
  • Autocompletion features should provide glanceable suggestions, juxtaposition for clarity, simplicity through familiarity, sufficient visibility for validation, and snoozability to prevent interruptions.

Several papers explore the potential of AI as a pair programmer, highlighting both its acceptance and the challenges of creating user-friendly interfaces for novice programmers.
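Two of the autocompletion principles above, glanceable timing and snoozability, can be sketched as a small gate that an editor plugin might consult before rendering an inline suggestion. The class name, the debounce interval, and the snooze duration are illustrative assumptions, not values taken from the reviewed papers.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SuggestionGate:
    """Decides whether an inline AI suggestion should be shown right now.

    Illustrates two design principles from the review: debouncing (wait
    for a typing pause so suggestions stay glanceable rather than
    flickering under the cursor) and snoozability (let the user mute
    suggestions for a while). Thresholds are illustrative defaults.
    """
    debounce_s: float = 0.75                 # required typing pause
    snoozed_until: float = 0.0               # monotonic time until muted
    last_keystroke: float = field(default_factory=time.monotonic)

    def on_keystroke(self) -> None:
        """Called by the editor on every keystroke."""
        self.last_keystroke = time.monotonic()

    def snooze(self, minutes: float = 15.0) -> None:
        """User-driven control: mute suggestions for a while."""
        self.snoozed_until = time.monotonic() + minutes * 60

    def should_show(self) -> bool:
        now = time.monotonic()
        if now < self.snoozed_until:         # user asked for quiet
            return False
        return now - self.last_keystroke >= self.debounce_s
```

An editor integration would call `on_keystroke()` from its change handler and check `should_show()` before requesting or rendering a completion, so the model is also queried less often while the user is actively typing.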

Impact of HAX on Programmers' Workflow

This area investigates how AI assistance reshapes the programming workflow, focusing on usability issues, effects on productivity, and user trust. The research indicates that in-IDE Human-AI Interaction significantly alters the traditional programming workflow, introducing dedicated time for interacting with AI and processing its outputs. While AI tools can increase productivity, they may also lead to trade-offs in code quality, as developers sometimes struggle to align AI-generated outputs with their requirements and expectations. The context in which AI tools are used, the quality of suggestions, and compatibility issues play crucial roles in shaping the overall effectiveness and user perception. Studies on novices suggest that AI tools can positively influence programming education but require attention to challenges such as over-reliance.

Quality of AI Assistants

This research area examines the performance of AI assistants, focusing on correctness, understandability, security, and the ability to solve algorithmic problems. The review reveals that the effectiveness of an AI assistant depends not only on its user interface but also on the quality of the model's output. Studies show that while AI assistants can provide relevant solutions and suggestions, they might also be erroneous and require user correction. Regarding code comprehensibility and complexity, AI assistants generally produce understandable code that may be less complex than human-written code. However, security assessments reveal potential vulnerabilities, highlighting the importance of fine-tuning foundational models to enhance overall interaction quality.
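As a toy illustration of surface-level readability assessment, the sketch below extracts a few simple metrics from a code snippet. Published readability models, such as the comprehensive model the review draws on, combine many structural and textual features with learned weights; the feature set and function name here are illustrative only.

```python
def readability_features(code: str) -> dict:
    """Extract a few toy surface-level readability features.

    Short lines and a healthy comment ratio are among the textual
    signals that learned readability models weigh; this sketch only
    computes raw features and makes no claim about their weights.
    """
    lines = [ln for ln in code.splitlines() if ln.strip()]
    if not lines:
        return {"avg_line_len": 0.0, "max_line_len": 0, "comment_ratio": 0.0}
    comment_lines = sum(1 for ln in lines if ln.lstrip().startswith("#"))
    return {
        "avg_line_len": sum(len(ln) for ln in lines) / len(lines),
        "max_line_len": max(len(ln) for ln in lines),
        "comment_ratio": comment_lines / len(lines),
    }
```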

Future Research Directions

The authors suggest focusing on three aspects of in-IDE HAX: task-specific user interfaces, trust, and readability. They argue that chat-based interaction may not always be the most effective approach and that different tasks may call for alternative interaction methods. They also suggest enhancing model reactivity by transforming predictable actions into automatic suggestions. Addressing developers' attitudes toward AI may further shape how they interact with the technology. Highlighting the tokens that most affect the output, approximating the model's uncertainty, and providing clear, transparent context could foster trust between the user and the AI. In terms of code quality, the authors propose readability as a promising target for aligning code models.
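The idea of approximating model uncertainty to inform the interface can be illustrated with a small sketch. Assuming the completion backend exposes per-token log-probabilities, as many LLM APIs do, a geometric-mean confidence score can decide whether a suggestion deserves a visual uncertainty cue. The function names and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean probability of a generated sequence.

    Averages per-token log-probabilities and exponentiates. Values near
    1.0 mean the model was confident at every step; low values flag
    suggestions that might warrant an uncertainty cue in the editor.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_highlight_uncertain(token_logprobs: list[float],
                               threshold: float = 0.5) -> bool:
    """True if the suggestion should carry a visual uncertainty marker.

    The threshold is an illustrative choice, not one from the review.
    """
    return sequence_confidence(token_logprobs) < threshold
```

A richer treatment might surface per-token scores directly, dimming or underlining only the low-confidence spans so the user's attention goes to the tokens most likely to need correction.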

Threats to Validity

The authors acknowledge potential threats to the validity of their findings, including sampling bias, temporal bias, source reliability, and interpretation bias. They address these threats by providing a detailed search protocol, acknowledging the limitations of including non-peer-reviewed papers, and emphasizing transparency in the analysis process.

Conclusion

The literature review provides a comprehensive overview of in-IDE Human-AI Experience, highlighting key research areas, design principles, and future research directions. The study emphasizes the need for task-specific user interfaces, building trust in AI assistants, and improving code readability to enhance the overall developer experience. The identified research areas, curated dataset, and proposed directions contribute to the collective understanding of the evolving dynamics between humans and AI within IDEs.
