The Collaboration Gap in Human-AI Work

Published 20 Apr 2026 in cs.HC, cs.AI, cs.IR, and cs.LG | (2604.18096v1)

Abstract: LLMs are increasingly presented as collaborators in programming, design, writing, and analysis. Yet the practical experience of working with them often falls short of this promise. In many settings, users must diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This poster introduces a conceptual framework for understanding why such collaboration remains fragile. Drawing on a constructivist grounded theory analysis of 16 interviews with designers, developers, and applied AI practitioners working on LLM-enabled systems, and informed by literature on human-AI collaboration, we argue that stable collaboration depends not only on model capability but on the interaction's grounding conditions. We distinguish three recurrent structures of human-AI work: one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. We propose that collaboration breaks down when the appearance of partnership outpaces the grounding capacity of the interaction and contribute a framework for discussing grounding, repair, and interaction structure in LLM-enabled work.


Authors (4)

Summary

The paper identifies key determinants for effective human-AI collaboration by examining grounding capacity and repair burden.
The paper applies constructivist grounded theory to practitioner interviews, revealing one-shot, weak, and grounded interaction structures.
The paper proposes actionable design mechanisms like scoping, signalling, and repair to improve evaluation and system reliability in human-AI work.

Conceptualizing the Collaboration Gap in Human-AI Interaction

Introduction

This paper, "The Collaboration Gap in Human-AI Work" (2604.18096), investigates the fragility of human-LLM collaboration by dissecting the structural prerequisites for stable, productive interaction. Through a constructivist grounded-theory analysis of practitioner interviews, the authors propose a framework centering on grounding capacity and repair burden, thereby shifting the focus from mere model capability to interaction design.

Theoretical Context and Motivation

The persistent failures in human-AI interaction, despite advances in LLM performance, are attributed not solely to model limitations but to the inadequacy of mechanisms supporting robust collaboration. The paper bridges prior research on common ground in CSCW and HCI (e.g., Clark and Brennan, 1991; Amershi et al., 2019) with emerging empirical findings on LLMs in professional environments. It identifies the core determinants of effective collaboration as mutual understanding (grounding) and the distribution of effort necessary to detect and repair misalignments. This perspective contrasts with approaches that treat human-AI co-working as an extension of single-process usability or transparency evaluations.

Empirical Framework

Drawing on interviews with 16 practitioners engaged with LLM-enabled systems across diverse workflows, the paper defines three recurrent structures of interaction:

One-shot Assistance: Minimal grounding, single-turn exchanges, and predominantly post hoc human repair. Appropriate for limited, low-stakes tasks (e.g., summarization), but non-scalable for substantive collaboration.
Weak Collaboration: Iterative interaction, but with the majority of repair burden on the human. This regime typifies misalignment: the interaction superficially resembles partnership, but grounding is unstable and the human is forced to reconstruct task context and correct outputs.
Grounded Collaboration: High grounding capacity, mutual clarification, context tracking, and shared repair. Explicit mechanisms support surfacing ambiguities, constraining task scope, and negotiating repair, which makes joint problem solving more resilient.

The analysis underscores that interaction design, not just model sophistication, modulates the transition from brittle, user-led correction to stable, co-adaptive collaboration.
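The three structures above can be read as a rough decision rule. The following is a minimal sketch, not the authors' method: the function name `classify`, its parameters, and the 0.5 majority threshold are all assumptions introduced here for illustration.

```python
from enum import Enum


class Structure(Enum):
    ONE_SHOT = "one-shot assistance"
    WEAK = "weak collaboration"
    GROUNDED = "grounded collaboration"


def classify(turns: int,
             human_repair_share: float,
             has_grounding_mechanisms: bool,
             majority: float = 0.5) -> Structure:
    """Map coarse interaction features to one of the three structures.

    All inputs are hypothetical proxies:
    - turns: number of exchanges in the interaction
    - human_repair_share: fraction of repair work done by the human (0..1)
    - has_grounding_mechanisms: whether clarification, context tracking,
      or similar mechanisms are present
    - majority: assumed cut-off above which repair counts as human-led
    """
    if turns <= 1:
        # Single-turn exchange: any repair happens post hoc, by the human.
        return Structure.ONE_SHOT
    if has_grounding_mechanisms and human_repair_share <= majority:
        # Iterative, with grounding support and repair not dominated by the human.
        return Structure.GROUNDED
    # Iterative, but the human carries most of the repair burden.
    return Structure.WEAK


print(classify(turns=6, human_repair_share=0.9, has_grounding_mechanisms=False).value)
```

The point of the sketch is that multi-turn interaction alone never yields "grounded collaboration"; grounding mechanisms and the distribution of repair decide the outcome.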

Mechanisms for Enhancing Collaboration

The practitioner data highlights three primary mechanisms for increasing grounding capacity and redistributing repair responsibilities:

Scoping: Constraining task boundaries, fixing formats, and modularizing workflows reduce ambiguity and the need for extensive shared context.
Signalling: Eliciting evidence of LLM understanding (for example, system restatement of assumptions, explicit indication of task state, or uncertainty quantification) makes mutual understanding more inspectable and actionable.
Repair: Embedding explicit pathways for rollback, clarification, and contestation transforms repair from ad hoc to systematic, thereby lowering cognitive and managerial overhead on the human collaborator.

These practices collectively operationalize the theoretical requirements for stable interaction, making them actionable targets for designers and evaluators.
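One way to picture scoping, signalling, and repair is as explicit data structures in an LLM-enabled tool. The sketch below is illustrative only and rests on assumptions: the class names `TaskScope`, `Signal`, and `Session`, and all of their fields, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskScope:
    """Scoping: constrain the task boundary and fix the output format."""
    goal: str
    output_format: str
    out_of_scope: List[str] = field(default_factory=list)


@dataclass
class Signal:
    """Signalling: evidence of what the system believes it is doing."""
    restated_assumptions: List[str]
    task_state: str
    open_questions: List[str] = field(default_factory=list)

    def needs_clarification(self) -> bool:
        # Unresolved questions should be surfaced before work continues.
        return bool(self.open_questions)


@dataclass
class Session:
    """Repair: keep checkpoints so a bad turn can be rolled back."""
    scope: TaskScope
    checkpoints: List[str] = field(default_factory=list)

    def commit(self, draft: str) -> None:
        self.checkpoints.append(draft)

    def rollback(self) -> Optional[str]:
        """Discard the latest draft and return the previous one, if any."""
        if self.checkpoints:
            self.checkpoints.pop()
        return self.checkpoints[-1] if self.checkpoints else None


session = Session(TaskScope(goal="summarize report", output_format="bullet list"))
session.commit("draft v1")
session.commit("draft v2")
print(session.rollback())  # prints "draft v1"
```

Making the scope, the system's stated assumptions, and the rollback path first-class objects is what turns repair from an ad hoc user activity into a supported step of the workflow.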

Implications for HCI and CSCW Research

The framework has significant implications for both empirical evaluation and practical design of human-AI systems:

Collaboration Should Be Distinguished from Interactivity: Multi-turn dialogue or iteration with an LLM does not imply true collaboration. Grounded collaboration must be measured in terms of shared context and the balance of repair effort.
Collaboration Gaps Are Structural, Not Just Technical: Failures commonly arise from insufficient support for grounding and repair, not necessarily from limitations in model accuracy or generativity.
Repair Burden as an Evaluation Metric: The locus of repair work (human, system, or shared) serves as a practical diagnostic for the robustness of the collaborative interaction.

These insights support the argument that workflow and socio-technical infrastructure are as critical as algorithmic progress for realizing reliable human-AI teams.
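As a rough illustration of repair burden as a diagnostic, the sketch below computes who performed the repairs in a logged interaction. It is an assumption-laden example: the event labels "human" and "system", the function names, and the 0.7 dominance threshold are hypothetical and are not defined in the paper.

```python
from collections import Counter
from typing import Dict, Iterable

ACTORS = ("human", "system")


def repair_burden(events: Iterable[str]) -> Dict[str, float]:
    """Return each actor's share of repair events (0.0 for both if none)."""
    counts = Counter(events)
    unknown = set(counts) - set(ACTORS)
    if unknown:
        raise ValueError(f"unknown actors: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {actor: 0.0 for actor in ACTORS}
    return {actor: counts[actor] / total for actor in ACTORS}


def repair_locus(shares: Dict[str, float], dominance: float = 0.7) -> str:
    """Label the locus of repair; `dominance` is an assumed cut-off."""
    if shares["human"] == 0.0 and shares["system"] == 0.0:
        return "none"
    if shares["human"] >= dominance:
        return "human"
    if shares["system"] >= dominance:
        return "system"
    return "shared"


# Each entry records who carried out one repair in a logged session.
log = ["human", "human", "human", "system"]
shares = repair_burden(log)
print(shares)                # {'human': 0.75, 'system': 0.25}
print(repair_locus(shares))  # human
```

Counting events is a deliberately crude proxy. A fuller measure would presumably also weigh the effort each repair required.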

Future Directions

This conceptual reframing motivates new avenues for both research and engineering:

Interaction Protocols: Development of adaptive protocols wherein grounding status is negotiated, not assumed.
System Instrumentation: Instrumenting LLMs to signal contextual uncertainty and knowledge gaps and to invite user-driven repair, pushing toward more transparent and balanced interaction.
Collaborative Benchmarks: Design of evaluation suites (e.g., Poelitz et al., 24 Feb 2026) that explicitly assess not only task success but also the quality and distribution of repair and grounding.

Progress in these directions could catalyze a transition from fragile, tool-centric use of LLMs to more stable forms of machine-assisted teamwork.

Conclusion

The paper provides a theoretically grounded, empirically informed framework for diagnosing and addressing the instability of human-AI collaboration. By emphasizing the centrality of common ground and the balance of repair burden, it identifies crucial levers for transforming how LLMs are embedded in workflows. This work reframes the design and evaluation of collaborative AI as fundamentally socio-technical: achieving reliability depends not just on what models can do in isolation, but on how grounding and repair are operationalized throughout the interaction structure.
