A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration

Published 30 Mar 2024 in cs.HC | (2404.00405v1)

Abstract: With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases that happened (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction.

Abstract PDF HTML Upgrade to Chat

References (85)

Citations (13)

View on Semantic Scholar

Summary

The paper establishes a taxonomy outlining four interaction phases and four principal modes that structure human-LLM interactions.
It employs a systematic literature review from top HCI venues to develop distinct modes including standard prompting, UI augmentation, context-based interaction, and agent facilitator.
The taxonomy informs HCI design and highlights research gaps, promoting iterative innovation in next-generation LLM applications.

Taxonomizing Human-LLM Interaction: Four Modes and Interactional Phases

Introduction

The paper "A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration" (2404.00405) addresses the pressing need for systematic frameworks that can analyze and advance the diverse ways humans interact with LLMs. The proliferation of LLM-powered applications has led to a variety of interaction paradigms, but the literature has lacked both a clear segmentation of interaction phases and a comprehensive taxonomy of interaction modes. This work systematically reviews post-2021 HCI research to derive a four-phase interactional process and a structured four-mode taxonomy, unifying fragmented perspectives under a multidimensional analytic lens.

Methodology

A systematic literature review was conducted across the flagship HCI publication venues (CHI, CSCW, UIST, IUI) from 2021 onwards, capturing both LLM-integrated systems and enabling interaction techniques. The review incorporated a two-stage filtering: manual keyword search and expert screening by multiple authors. The final codex consisted of 73 high-relevance papers, each annotated according to "5W1H"-inspired meta-data (Who, What, When, How), which grounded the iterative construction and refinement of the taxonomy. The process emphasized both distinct phases of interaction and the class of affordances each interaction mode unlocks.

Four Phases of Human-LLM Interaction

The authors formalize the interaction with LLMs as a temporal flow of four key phases:

Planning: Pre-interaction design including articulation of task objectives, decomposition into subproblems, and explicit prompt engineering.
Facilitating: Real-time engagement with the LLM, including iterative prompt refinement, giving and receiving suggestions, and result selection.
Iterating: Systematic adjustment of established prompts/interactions, focusing on quality improvement or error correction without continued turn-taking.
Testing: Exploration and empirical evaluation of prompt and system variants, enabling robustness analyses and ablation studies.

This phase delineation enables concrete mapping between user objectives, LLM affordances, and design implications for interactive systems.

The Taxonomy: Four Principal Modes

Mode 1: Standard Prompting

This foundational mode encompasses both single-turn and multi-turn textual prompting, which is dominant in chat interfaces for ChatGPT, Claude, Gemini, and Llama 2. Two submodes are identified: simple conversational prompting and conversational prompting with explicit reasoning (e.g., chain-of-thought, step decomposition). While these enable casual querying and basic complex-task decomposition, empirical results highlight severe limitations in supporting iterative refinement and context management, leading to suboptimal outcomes for ill-defined or creative problems.

Figure 1: Taxonomy schematics illustrating the core interaction modes and their dependencies on user/LLM roles across the flow phases.

Mode 2: User Interface (UI) Augmentation

UI-augmented prompting structures the interaction via custom controls or visual affordances:

Structured Input UIs (e.g., PromptMaker) scaffold prompt creation, supporting consistency and reducing the cognitive burden of prompt engineering.
Output Variation UIs allow specification of result formats and facilitate multi-faceted result inspection (e.g., GenLine, GenForm).
Iterative UIs introduce features for debugging, relabeling, or retrying (e.g., BotDesigner, Promptify), explicitly supporting the iterative phase delineated above.
Testing UIs provide for empirical comparison and rapid prototyping (e.g., VISAR, Kim et al.'s framework).
UI for Reasoning enables direct user manipulation of logical decomposition via visual programming (e.g., PromptChainer, ChainForge), fusing human-in-the-loop reasoning and tool transparency.

Collectively, UI modes overcome many of the information density and procedural control limitations of direct textual prompting.

Figure 2: Visual depiction of UI-augmented interaction modes, indicating affordances for structured input, output control, and iterative reasoning.

Mode 3: Context-Based Interaction

This mode foregrounds the augmentation of LLMs with context alignment, either through:

Explicit Context: The context is given directly via codebooks, role assignment, or command rules (e.g., AutoSurveyGPT, Xiao et al.'s deductive coding system).
Implicit Context: The LLM infers intent or dimensional constraints through example-based few-shot prompting, role-play, or analysis of discourse cues (e.g., role as an expert or scenario-based priming).

This contextualization is key to alignment with user priorities in tasks characterized by ambiguous objectives or shifting criteria.

Figure 3: Mode architecture for context-based interaction, delineating explicit rule/bias configuration and implicit inference mechanisms.

Mode 4: Agent Facilitator

Beyond dyadic human-LLM interaction, this mode explores LLMs as mediators or facilitators in multi-agent/team settings:

Team Process Facilitator: LLMs streamline communication, consensus building, and meeting coordination in teams (e.g., using clarifying agents for multilingual groups).
Capability-Aware Task Delegator: LLMs support task assignment and resource allocation within teams, leveraging recognition of member expertise and planning requirements (e.g., RetroLens, domain delegation frameworks).

Such agentic roles move LLMs from tool to organizational partner, prompting new questions about coordination, explainability, and control.

Figure 4: Overview of the Agent Facilitator mode, showing LLMs mediating team processes and performing capability-based task assignment.

Implications for HCI and AI

The multidimensional taxonomy delivers two key analytic values: (1) it provides designers with a framework for exhaustive design space analysis, ensuring all phases and potential stakeholder roles are considered, and (2) it reveals opportunity for compositional innovation by hybridizing or sequencing interaction modes (e.g., role-play + UI iteration for sensemaking tasks).

Practically, the taxonomy informs the design of next-generation LLM-powered systems, driving the integration of reflective, iterative, and empirically-guided interfaces into domains far beyond writing and coding, including image/video generation and analytic pipelines. Theoretically, it exposes research gaps around poorly supported phases (notably iteration and testing in standard prompting) and misaligned affordances in certain hybrid applications.

The taxonomy’s flexibility is crucial: as HCI and LLM research trajectories diversify, especially toward multi-modal input/output and embodied/agentic deployments, the proposed classification provides a scalable foundation for iterative extension.

Limitations and Future Directions

The current taxonomy is bounded by manually curated HCI research from select venues and a focus on natural language and prompt-centered workflows. As LLMs are deployed as action agents in robotics, IoT, and dynamic real-world settings, new modes may emerge centered on perceptual grounding, physical affordance manipulation, and long-horizon planning. Further expansions should target state-of-the-art venues across NLP (ACL, EMNLP, NAACL) and other fields intersecting with human-AI interaction.

The granularity of overlapping categories also warrants further empirical validation. Many systems blend multiple modes, and how best to model their influence and interaction effects remains an open research agenda.

Conclusion

This paper establishes an analytically robust taxonomy for human-LLM interaction, distinguishing both temporal process phases and four principal interaction modes (Standard Prompting, UI, Context-based, Agent Facilitator). The taxonomy advances the theoretical discourse in HCI and AI by formalizing the design space of human-LLM systems and operationalizing best practices for system development and evaluation. As LLMs continue their rapid technical progression, the taxonomy will serve as a foundation for the principled evolution of interactive AI systems and for systematic exploration of human-centered augmentation strategies in ever-broader domains.