Interaction, Process, Infrastructure: A Unified Architecture for Human-Agent Collaboration

Published 13 Jun 2025 in cs.HC and cs.AI | (2506.11718v1)

Abstract: As AI tools proliferate across domains, from chatbots and copilots to emerging agents, they increasingly support professional knowledge work. Yet despite their growing capabilities, these systems remain fragmented: they assist with isolated tasks but lack the architectural scaffolding for sustained, adaptive collaboration. We propose a layered framework for human-agent systems that integrates three interdependent dimensions: interaction, process, and infrastructure. Crucially, our architecture elevates process to a primary focus by making it explicit, inspectable, and adaptable, enabling humans and agents to align with evolving goals and coordinate over time. This model clarifies limitations of current tools, unifies emerging system design approaches, and reveals new opportunities for researchers and AI system builders. By grounding intelligent behavior in structured collaboration, we reimagine human-agent collaboration not as task-specific augmentation, but as a form of coherent and aligned system for real-world work.


Summary

The paper introduces a modular three-layer framework that decouples interaction, process, and infrastructure for cohesive human-agent collaboration.
It details a persistent process layer that structures workflows via dynamic modules, supporting adaptable and inspectable task execution.
The architecture overcomes siloed AI paradigms by enabling reflective, adaptive, and transparent human-agent interactions across diverse applications.

A Unified Layered Architecture for Human-Agent Collaboration

Introduction and Motivation

The paper "Interaction, Process, Infrastructure: A Unified Architecture for Human-Agent Collaboration" (2506.11718) addresses two shortcomings of current AI systems: fragmentation and the absence of sustained, adaptive collaboration. While chatbots, copilots, and agents have each become more capable, they remain siloed, forcing users to bridge gaps between tools manually, restate intent, and track dependencies. The authors argue that the missing element is not model capability but a shared, inspectable, and adaptable process representation that aligns human and agent activities over time. They propose a three-layer architecture (Interaction, Process, and Infrastructure) that decouples and modularizes collaboration, enabling extensible, transparent, and reflective human-agent systems.

Analysis of Existing Paradigms

The paper systematically categorizes current AI systems into three paradigms:

Chatbots (Conversation-First): Optimized for expressive, low-barrier engagement but limited by stateless, linear interactions. They lack persistent task structure and force users to track progress manually.
Copilots (Application-First): Embedded in software environments, providing context-sensitive micro-support. However, they are constrained by the host application's scope and lack awareness of broader workflows or evolving goals.
Agents (Task-First): Capable of multi-step planning and execution, maintaining internal representations of goals and workflows. Yet they often operate autonomously, with limited support for user inspection, revision, or midstream intervention.

Each paradigm encodes an implicit, hardcoded model of collaboration, resulting in fractured workflows and limited user control. The authors highlight that these limitations stem from architectural assumptions rather than model capabilities.

The Three-Layer Framework

The proposed architecture consists of three modular yet interdependent layers:

Process Layer: The Collaborative Core

The Process Layer is the central innovation, providing a persistent, structured representation of goals, workflows, reasoning paths, and progress. It comprises five dynamic modules:

Problem Space: Captures evolving task scope and objectives.
Workflow: Encodes coordination logic, sequencing, delegation, and contingencies.
Operations: Concrete task steps executed via functions, models, tools, or humans.
Environment: Shared workspace for drafts, intermediate results, and decision traces.
Reflection: Meta-level assessment for outcome evaluation and process revision.

Structural Adaptation is a key property, allowing both humans and agents to reshape not only actions but the architecture of collaboration itself. This enables dynamic role allocation, process granularity adjustment, and surfacing decision points for human input.
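
The five modules and the idea of structural adaptation can be made concrete with a small data structure. The sketch below is illustrative only and is not taken from the paper: the class names (`ProcessState`, `Operation`), the methods (`add_operation`, `reassign`, `split`), and the roadmap example are all assumptions chosen to show how a process could be stored explicitly and reshaped by either party.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

Actor = Literal["human", "agent"]


@dataclass
class Operation:
    """One concrete task step (hypothetical shape, not from the paper)."""
    name: str
    actor: Actor
    done: bool = False


@dataclass
class ProcessState:
    """Assumed container with one field per Process Layer module."""
    problem_space: dict[str, str] = field(default_factory=dict)   # scope and objectives
    workflow: list[str] = field(default_factory=list)             # ordered operation names
    operations: dict[str, Operation] = field(default_factory=dict)
    environment: dict[str, str] = field(default_factory=dict)     # drafts, intermediate results
    reflection: list[str] = field(default_factory=list)           # notes on process revisions

    def add_operation(self, op: Operation, after: Optional[str] = None) -> None:
        """Insert a step at the end, or directly after a named step."""
        if after is None:
            index = len(self.workflow)
        else:
            index = self.workflow.index(after) + 1
        self.operations[op.name] = op
        self.workflow.insert(index, op.name)

    def reassign(self, name: str, actor: Actor, reason: str) -> None:
        """Dynamic role allocation: hand a step to the other party."""
        previous = self.operations[name].actor
        self.operations[name].actor = actor
        self.reflection.append(f"{name}: {previous} -> {actor} ({reason})")

    def split(self, name: str, parts: list[str]) -> None:
        """Granularity adjustment: replace one step with finer-grained steps."""
        original = self.operations.pop(name)
        index = self.workflow.index(name)
        self.workflow[index:index + 1] = parts
        for part in parts:
            self.operations[part] = Operation(part, original.actor)
        self.reflection.append(f"{name} split into {', '.join(parts)}")


# Hypothetical usage: the goal and step names are made up for illustration.
state = ProcessState(problem_space={"goal": "decide roadmap priorities"})
state.add_operation(Operation("gather_feedback", "agent"))
state.add_operation(Operation("draft_recommendation", "agent"))
state.reassign("draft_recommendation", "human", "needs strategic judgment")
state.split("gather_feedback", ["collect_tickets", "cluster_themes"])
print(state.workflow)
print(state.reflection)
```

Because every structural change is also recorded in `reflection`, the history of how the process was reshaped stays inspectable, which is the property the paper emphasizes.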

Figure 2: The three-layer framework for human-agent collaboration, illustrating the modular yet interdependent architecture enabling adaptive and reflective collaboration.

Interaction Layer: The Surface of Shared Understanding

The Interaction Layer serves as the interface for expressing intent, steering direction, providing constraints, and refining outcomes. It supports multimodal inputs and mixed-initiative control, projecting process state into interpretable forms (e.g., chat threads, workflow diagrams, kanban boards). Decoupling process logic from interface modality enables flexible, role-sensitive, and stage-aware coordination. The layer is designed to support dynamic representation selection, cross-level navigation, and attention orchestration, addressing cognitive overhead and representational consistency.
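
One way to read "projecting process state into interpretable forms" is that each interface is a pure function over the same underlying state. The following minimal sketch assumes a simplified `Step` record and two hypothetical projections, `as_kanban` and `as_chat_summary`; none of these names come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Assumed minimal view of a process step."""
    name: str
    actor: str   # "human" or "agent"
    status: str  # "todo", "doing", or "done"


def as_kanban(steps: list[Step]) -> dict[str, list[str]]:
    """Project steps into board columns keyed by status."""
    board: dict[str, list[str]] = {"todo": [], "doing": [], "done": []}
    for step in steps:
        board[step.status].append(step.name)
    return board


def as_chat_summary(steps: list[Step]) -> str:
    """Project the same steps into a one-line conversational update."""
    done = sum(1 for s in steps if s.status == "done")
    waiting = [s.name for s in steps if s.actor == "human" and s.status != "done"]
    line = f"{done} of {len(steps)} steps are done."
    if waiting:
        line += " Waiting on you: " + ", ".join(waiting) + "."
    return line


# Hypothetical process state shared by both views.
steps = [
    Step("collect_tickets", "agent", "done"),
    Step("cluster_themes", "agent", "doing"),
    Step("review_draft", "human", "todo"),
]
board = as_kanban(steps)
summary = as_chat_summary(steps)
print(board)
print(summary)
```

Since neither projection mutates `steps`, a new modality can be added without touching process logic, which is the decoupling the layer relies on.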

Infrastructure Layer: Orchestration, Execution, and Memory

The Infrastructure Layer provides the computational substrate, encompassing models, tools, agents, memory systems, and coordination mechanisms. It is organized into:

Personalization: User-specific context, memory, and continuity.
Foundation: Generative and computational engines (LLMs, APIs, search, simulation).
Coordination: Protocols and frameworks for agent-agent and agent-tool messaging, supporting distributed collaboration and process coherence.

The infrastructure is modular and extensible, supporting plug-and-play integration and propagation of improvements across workflows.
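
Plug-and-play integration can be sketched as a registry that workflows address by capability name rather than by concrete implementation. The `CapabilityRegistry` class, the `run_workflow` helper, and the toy providers below are assumptions made for illustration; the paper does not prescribe this mechanism.

```python
from typing import Callable

Provider = Callable[[str], str]


class CapabilityRegistry:
    """Hypothetical lookup table from capability name to implementation."""

    def __init__(self) -> None:
        self._providers: dict[str, Provider] = {}

    def register(self, capability: str, provider: Provider) -> None:
        # Registering under an existing name replaces the old provider.
        self._providers[capability] = provider

    def run(self, capability: str, payload: str) -> str:
        if capability not in self._providers:
            raise KeyError(f"no provider registered for {capability!r}")
        return self._providers[capability](payload)


def run_workflow(registry: CapabilityRegistry, capabilities: list[str], payload: str) -> str:
    """Pipe the payload through each named capability in order."""
    for capability in capabilities:
        payload = registry.run(capability, payload)
    return payload


registry = CapabilityRegistry()
registry.register("summarize", lambda text: text.split(".")[0])
registry.register("shout", str.upper)

workflow = ["summarize", "shout"]
before = run_workflow(registry, workflow, "Churn rose. Details follow.")

# Swap in a different summarizer; the workflow definition is unchanged.
registry.register("summarize", lambda text: text.split(".")[0].strip() + " (1 sentence)")
after = run_workflow(registry, workflow, "Churn rose. Details follow.")

print(before)  # CHURN ROSE
print(after)   # CHURN ROSE (1 SENTENCE)
```

Every workflow that names `"summarize"` picks up the replacement on its next run, which is one simple reading of improvements propagating across workflows.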

Use Case: Decision Intelligence System

The paper presents a detailed scenario involving Mei, a product strategist, collaborating with an intelligent system built on the proposed framework. Mei's high-level intent triggers the construction of a structured problem space, with friction threads representing emerging inconsistencies. The system retrieves relevant artifacts, instantiates analyses, and enables Mei to merge, relabel, and restructure threads, shifting problem framing and workflow logic. Reflection checkpoints and filtering constraints support decision readiness, while the system generates traceable decision packets for leadership review. The process is abstracted into a reusable workflow pattern, demonstrating persistent, inspectable, and evolving collaboration.
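
The merge and relabel operations in this scenario can be sketched as edits to a problem space that keeps a trace of every change. The terms "friction thread" and "decision packet" come from the paper, but the `FrictionThread` and `ProblemSpace` classes, their fields, and the sample labels and evidence identifiers below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FrictionThread:
    """Assumed shape: a labelled inconsistency with supporting evidence."""
    label: str
    evidence: list[str] = field(default_factory=list)


class ProblemSpace:
    """Hypothetical problem space that records how its framing changes."""

    def __init__(self) -> None:
        self.threads: dict[str, FrictionThread] = {}
        self.trace: list[str] = []

    def add(self, thread: FrictionThread) -> None:
        self.threads[thread.label] = thread

    def relabel(self, old: str, new: str) -> None:
        thread = self.threads.pop(old)
        thread.label = new
        self.threads[new] = thread
        self.trace.append(f"relabel: {old} -> {new}")

    def merge(self, labels: list[str], new_label: str) -> None:
        """Combine threads, keeping each piece of evidence once, in order."""
        combined: list[str] = []
        for label in labels:
            combined.extend(self.threads.pop(label).evidence)
        unique = list(dict.fromkeys(combined))
        self.threads[new_label] = FrictionThread(new_label, unique)
        self.trace.append(f"merge: {' + '.join(labels)} -> {new_label}")

    def decision_packet(self) -> dict:
        """Bundle the current framing with the trace that produced it."""
        return {
            "threads": {t.label: list(t.evidence) for t in self.threads.values()},
            "trace": list(self.trace),
        }


# Made-up threads and evidence identifiers, for illustration only.
space = ProblemSpace()
space.add(FrictionThread("pricing confusion", ["ticket-1", "survey-a"]))
space.add(FrictionThread("onboarding drop-off", ["funnel-report", "ticket-1"]))
space.add(FrictionThread("slow exports", ["ticket-2"]))
space.merge(["pricing confusion", "onboarding drop-off"], "unclear value in first week")
space.relabel("slow exports", "export latency")
packet = space.decision_packet()
print(packet)
```

Keeping the trace alongside the threads is what would make such a packet traceable: a reviewer can see not only the final framing but also which edits led to it.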

Research Challenges and Opportunities

The authors identify several open research directions:

Unified, Adaptive Architecture: The layered model enables structural coherence, component reuse, and epistemic generalization across domains. Explicit linkage between layers supports reflective adaptation and architectural self-improvement.
Process-Aligned Interfaces: Interfaces must act as representational projections of the process layer, supporting dynamic representation selection, cross-level navigation, and attention orchestration.
Role, Initiative, and Uncertainty Management: Adaptive delegation and explicit role representations are required for fluid control distribution, collaborative recovery, and trust maintenance.
Longitudinal Co-Evolution: Process-aligned systems enable persistent, evolving collaboration patterns, shared memory, and meta-cognition, reframing agents as reflective partners.
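
As a rough illustration of the adaptive delegation direction, control could be routed per step by a small explicit policy. Everything in this sketch is an assumption rather than the authors' proposal: the `StepContext` fields, the self-reported confidence score, the 0.7 default threshold, and the three mode names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepContext:
    """Hypothetical inputs to a delegation decision."""
    name: str
    agent_confidence: float  # assumed self-reported score in [0, 1]
    high_stakes: bool        # assumed flag set by the human or the workflow


def delegate(step: StepContext, threshold: float = 0.7) -> str:
    """Return who holds the initiative for this step."""
    if step.high_stakes:
        return "human_decides"
    if step.agent_confidence < threshold:
        return "agent_proposes_human_confirms"
    return "agent_executes"


# Made-up steps showing each branch of the policy.
modes = {
    step.name: delegate(step)
    for step in [
        StepContext("tag_tickets", 0.93, False),
        StepContext("estimate_revenue_impact", 0.41, False),
        StepContext("approve_roadmap", 0.88, True),
    ]
}
print(modes)
```

Making the rule explicit, rather than burying it in agent behavior, means the human can inspect and adjust it, for example by raising the threshold after a failed step.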

Implications and Future Directions

The proposed architecture bears on both practical system design and the theoretical understanding of human-agent collaboration. By elevating process to a first-class construct, the framework enables transparent, extensible, and adaptive systems that support open-ended, evolving work. It provides a foundation for composable, process-aware tools that can generalize across domains and support longitudinal co-evolution. Future developments may include standardized process modeling languages, meta-reasoning protocols, and evaluation methodologies for collaborative trajectories.

Conclusion

The paper advocates a shift from task augmentation to process-first system design, presenting a layered architecture that unifies interaction, process, and infrastructure. This approach enables intelligent systems that are coordinated, reflective, and capable of genuine collaboration. By embedding shared logic at the core, the framework opens new avenues for adaptive, extensible, and longitudinal human-agent systems, addressing the limitations of current paradigms and laying the groundwork for future research in collaborative intelligence.
