Multi-Agent Critique Systems

Updated 8 February 2026

Multi-Agent Critique is a paradigm that employs multiple cooperative or adversarial agents, such as LLMs and RL policies, to evaluate and refine candidate solutions.
It integrates explicit critique functions and structured communication protocols to systematically detect logical errors and guide decision-making.
Empirical studies show enhanced performance, increased diversity in reasoning, and robust error correction across complex tasks.

Multi-Agent Critique is a structured paradigm in which multiple cooperative or adversarial agents, potentially based on LLMs or reinforcement learning (RL) policies, engage in the evaluation, verification, and iterative refinement of candidate solutions to complex tasks. The critique mechanism operates either as an explicit agent role within modular agent architectures or as an emergent collective process within debate protocols. It underpins robustness, diversity, and rigor in automated reasoning and decision-making systems by systematically surfacing logical errors, incorrect outcomes, or suboptimal behaviors before final commitment.

1. Formal Definitions and Core Mechanisms

In Multi-Agent LLM (MA-LLM) contexts, multi-agent critique refers to protocols where agents $A_1, \ldots, A_n$ engage in dialogue $D$ on a given prompt $\tau$, producing and mutually evaluating outputs $x \in X$ until a solution $x^*$ is selected, as codified in (Tillmann, 29 May 2025). Critique functions $C_i: X \to \mathbb{R}$ may represent explicit scoring (confidence, log-probability), qualitative feedback, or factuality/diversity measures.
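The selection of $x^*$ from scored candidates can be illustrated with a minimal sketch. It assumes that each critique function returns a single real number and that scores are combined by a plain mean; the source does not prescribe this aggregation, and the names `select_solution`, `brevity`, and `mentions_answer` are hypothetical.

```python
from typing import Callable, Sequence

# A critique function maps a candidate output to a real-valued score (C_i: X -> R).
Critique = Callable[[str], float]


def select_solution(candidates: Sequence[str], critiques: Sequence[Critique]) -> str:
    """Return the candidate with the highest mean critique score (assumed rule)."""
    if not candidates or not critiques:
        raise ValueError("need at least one candidate and one critique function")

    def mean_score(x: str) -> float:
        return sum(c(x) for c in critiques) / len(critiques)

    # max() keeps the first candidate on ties, so the result is deterministic.
    return max(candidates, key=mean_score)


# Toy critique functions, purely illustrative stand-ins for model-based scorers.
def brevity(x: str) -> float:
    return -float(len(x))


def mentions_answer(x: str) -> float:
    return 10.0 if "42" in x else 0.0


best = select_solution(["the answer is 42", "unsure", "42"], [brevity, mentions_answer])
```

In a real system the toy scorers would be replaced by confidence, log-probability, or factuality measures of the kind listed above.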

In modular multi-agent RL, critique is operationalized through dedicated modules or agents. For example, the PokéAI system features a Critique Agent as a stateless verifier, ingesting (a) the original subgoal, (b) the post-execution game state, and (c) a summary from the Execution agent. It deterministically checks for goal satisfaction (e.g., $\texttt{game\_state.player\_coords} = \texttt{target}$) and emits a binary success/failure signal, along with natural language explanations in failure cases (Liu et al., 30 Jun 2025).

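A sketch of such a critique routine:

def critique(task_id, task_desc, game_state, exec_summary):
    # parse_goal_from, verify, and explain_failure are helpers assumed to be defined elsewhere.
    # exec_summary is accepted as an input but is not used by this minimal check.
    goal = parse_goal_from(task_desc)
    if verify(game_state, goal):
        return {"task_id": task_id, "status": "SUCCESS"}
    # On failure, attach a natural-language explanation of what went wrong.
    reason = explain_failure(game_state, goal)
    return {"task_id": task_id, "status": "FAILURE", "reason": reason}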

In RL settings, the critic (centralized or decentralized) provides gradient estimation for actor updates, policy evaluation, and credit assignment. Critique is thus mathematically formalized as the computation of value functions $Q(\cdot)$ or evaluation baselines under various critic architectures (Lowe et al., 2017, Iqbal et al., 2018, Liu et al., 2019, Lyu et al., 2024).
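For concreteness, the role of a centralized critic in the actor update is commonly written as the following policy gradient, in the style of Lowe et al., 2017. The notation ($\theta_i$ for agent $i$'s policy parameters, $o_i$ for its local observation, $\mathbf{x}$ for the joint information available to the critic) is assumed here rather than taken from the text above.

```latex
\nabla_{\theta_i} J(\theta_i)
  = \mathbb{E}\!\left[
      \nabla_{\theta_i} \log \pi_i(a_i \mid o_i)\,
      Q_i^{\pi}(\mathbf{x}, a_1, \ldots, a_N)
    \right]
```

The actor $\pi_i$ conditions only on $o_i$, while the critic $Q_i^{\pi}$ sees the actions of all $N$ agents, which is what allows it to assign credit across the team.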

2. Architectures, Agent Roles, and Communication Patterns

MA-LLMs and Critique Agents

MA-LLM systems utilize profile-driven or persona-specialized agents, ranging from naive proposers and red-team critics to judges or fact-verifiers (Tillmann, 29 May 2025, Jang et al., 2 Feb 2026, Lan et al., 2024, Srinivas et al., 2024). Communication follows fully connected or hierarchical graphs $G = (V,E)$, with roles and communication protocols engineered to expose errors, challenge consensus, and aggregate judgments (Tillmann, 29 May 2025, Wang et al., 27 May 2025).

In complex pipeline frameworks such as PatExpert, critique is instantiated as a two-headed judge: "Gold-LLM-as-a-Judge" (factual correctness, relevance, completeness) and "Reward-LLM-as-a-Judge" (coherence, clarity, helpfulness) scoring candidate outputs. These scores are aggregated, thresholded, and accompanied by explainability-focused feedback, closing a correction loop before the next iteration (Srinivas et al., 2024).
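The aggregate-then-threshold pattern described above can be sketched as follows. This is not PatExpert's implementation: the equal weighting, the 0.7 threshold, the [0, 1] score range, and the names `Verdict` and `judge` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float
    accepted: bool
    feedback: str


def judge(gold_scores: dict, reward_scores: dict,
          gold_weight: float = 0.5, threshold: float = 0.7) -> Verdict:
    """Combine two judges' per-criterion scores (assumed in [0, 1]) and threshold them."""
    gold = sum(gold_scores.values()) / len(gold_scores)
    reward = sum(reward_scores.values()) / len(reward_scores)
    combined = gold_weight * gold + (1 - gold_weight) * reward
    accepted = combined >= threshold
    feedback = ""
    if not accepted:
        # Point the next iteration at the weakest criterion across both judges.
        all_scores = {**gold_scores, **reward_scores}
        weakest = min(all_scores, key=all_scores.get)
        feedback = f"Revise for {weakest} (score {all_scores[weakest]:.2f})."
    return Verdict(combined, accepted, feedback)


verdict = judge(
    {"factual_correctness": 0.9, "relevance": 0.8, "completeness": 0.3},
    {"coherence": 0.8, "clarity": 0.7, "helpfulness": 0.6},
)
```

Here the candidate falls below the threshold and the feedback names `completeness`, which is the signal a correction loop would pass back to the generator.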

RL Critic Architectures

Centralized Training, Decentralized Execution (CTDE): Critics receive joint state, actions, or histories at train time but per-agent policies use local observables at execution (Lowe et al., 2017, Hernandez-Leal et al., 2018, Lyu et al., 2024).
Attention-Based Critics: Multi-Actor-Attention-Critic (MAAC) uses attention-based pooling over other agents' states and actions in each critic to focus dynamically on the most relevant information (Iqbal et al., 2018).
Permutation-Invariant Critics: Graph-convolutional architectures ensure output invariance under agent shuffling, critical for scaling to large homogeneous teams (Liu et al., 2019).
Double Critics: Twin-critic (MATD3) methods take the minimum of two critic estimates to correct for overestimation bias in joint Q-learning (Ackermann et al., 2019).
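Two of the ideas in this list are small enough to sketch directly. The first function computes a temporal-difference target from the minimum of two critic estimates. The second is a deliberately simplified permutation-invariant critic that uses mean pooling followed by a linear map; this stands in for the graph-convolutional architecture of Liu et al., 2019 and is not a reproduction of it. All function and parameter names are hypothetical.

```python
def twin_critic_target(reward, gamma, q1_next, q2_next, done=False):
    """TD target that bootstraps from the smaller of two critic estimates."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def mean_pooled_critic(agent_features, weights):
    """Toy permutation-invariant critic: average per-agent features, then apply a linear map."""
    n = len(agent_features)
    pooled = [sum(f[k] for f in agent_features) / n for k in range(len(weights))]
    return sum(w * p for w, p in zip(weights, pooled))


target = twin_critic_target(reward=1.0, gamma=0.9, q1_next=5.0, q2_next=3.0)

features = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
weights = [0.5, -1.0]
value = mean_pooled_critic(features, weights)
value_shuffled = mean_pooled_critic(list(reversed(features)), weights)
```

Because pooling averages over agents before any agent-specific computation, reordering the agents leaves the critic's output unchanged, which is the property that matters for large homogeneous teams.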

Communication takes the form of well-defined message-passing protocols (e.g., JSON–RPC, peer-to-peer vector embeddings, graph-broadcast), often governed by retry logic, aggregation, or triggering based on disagreement or failure (Liu et al., 30 Jun 2025, Srinivas et al., 2024, Li et al., 9 Jan 2026).
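A retry loop driven by critique verdicts might look like the sketch below. The message fields (`attempt`, `status`, `reason`), the retry budget, and the toy `execute` and `critique` callables are assumptions for illustration; none of the cited systems is claimed to use this exact schema.

```python
import json


def run_with_critique(task, execute, critique, max_retries=3):
    """Call execute(task, feedback) until critique reports SUCCESS or retries run out."""
    feedback = None
    log = []
    for attempt in range(1, max_retries + 1):
        result = execute(task, feedback)
        verdict = critique(task, result)
        # Serialize each verdict as a JSON message, as it might be passed between agents.
        log.append(json.dumps({"attempt": attempt, **verdict}, sort_keys=True))
        if verdict["status"] == "SUCCESS":
            return result, log
        feedback = verdict.get("reason")
    return None, log


def execute(task, feedback):
    # Toy executor: the first guess is 0; afterwards it follows the critic's hint.
    return 0 if feedback is None else int(feedback.split()[-1])


def critique(task, result):
    if result == task["target"]:
        return {"status": "SUCCESS"}
    return {"status": "FAILURE", "reason": f"expected {task['target']}"}


result, log = run_with_critique({"target": 3}, execute, critique)
```

The failure reason from one round becomes the input to the next attempt, so the critic's explanation does real work rather than only gating acceptance.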

3. Decision Protocols, Consensus, and Scoring

Decision logic in multi-agent critique depends on (a) aggregation of individual critique results and (b) resolution of disagreement:

Scoring Functions: Agents emit real-valued scores $s_i(x)$ or binary success/failure judgments, typically via log-probabilities, embedding similarities, or explicit criteria matching (e.g., subgoal completion) (Tillmann, 29 May 2025, Jang et al., 2 Feb 2026).
Vote Aggregation: Majority vote, Borda count, weighted voting (weights calibrated to agent reliability), or judge-based final arbitration (Tillmann, 29 May 2025, 2505.22960).
Iterative Refinement: Critique loops continue until consensus, convergence, or resource limits (context/tokens/rounds) are exhausted (Lan et al., 2024, Srinivas et al., 2024, Li et al., 9 Jan 2026). The use of explicit triggers (e.g., disagreement threshold) may activate external verification tools for deadlock resolution (Li et al., 9 Jan 2026).
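The aggregation rules named above can be sketched in a few lines each. The data layouts (a list of votes, a list of best-first rankings, per-agent weight dictionaries) and the reliability weights in the example are assumptions, not values from the cited work.

```python
from collections import Counter, defaultdict


def majority_vote(votes):
    """votes: one chosen candidate per agent. Ties go to the candidate seen first."""
    return Counter(votes).most_common(1)[0][0]


def borda_count(rankings):
    """rankings: one best-first list per agent; position p of k earns k - 1 - p points."""
    scores = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        for pos, cand in enumerate(ranking):
            scores[cand] += k - 1 - pos
    return max(scores, key=scores.get)


def weighted_vote(votes, weights):
    """votes: {agent: candidate}; weights: {agent: reliability}, defaulting to 1.0."""
    totals = defaultdict(float)
    for agent, cand in votes.items():
        totals[cand] += weights.get(agent, 1.0)
    return max(totals, key=totals.get)


by_majority = majority_vote(["A", "B", "B"])
by_borda = borda_count([["A", "B", "C"], ["B", "C", "A"], ["B", "A", "C"]])
by_weight = weighted_vote(
    {"a1": "A", "a2": "B", "a3": "B"},
    {"a1": 0.9, "a2": 0.3, "a3": 0.4},  # hypothetical reliability weights
)
```

In the last call the same three votes that give "B" under majority voting give "A" once weights are applied, because the single dissenting agent is assumed to be far more reliable than the other two.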

In reinforcement learning, critics provide value baselines for policy gradient estimation and credit assignment, with advanced architectures leveraging attention, recurrence, or shared-parameter models for efficiency and stability (Iqbal et al., 2018, Liu et al., 2019, Lowe et al., 2017).
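As a minimal single-agent illustration of a value baseline, the sketch below computes Monte Carlo returns and subtracts a critic's value estimates to obtain advantages. The estimator choice and the function names are assumptions; the cited multi-agent methods use more elaborate critics.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma):
    """Advantage A_t = G_t - V(s_t), with the critic's V(s_t) used as the baseline."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]


adv = advantages(rewards=[1.0, 0.0, 2.0], values=[1.0, 1.0, 1.0], gamma=0.5)
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is the sense in which the critic supports policy gradient estimation.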

4. Empirical Effectiveness, Pathologies, and Specializations

Measured Benefits

MA-LLMs: Typical gains over single-agent baselines are +3–8% for 3–5 agents and 2–3 rounds, with larger benefits in tasks requiring synthesis of diverse reasoning or robust error correction (Tillmann, 29 May 2025).
Specialized Critique Agents: Hybrid judge/reward models improve precision, recall, and explainability of outputs in workflow settings (e.g., patent analysis, complex clinical QA) (Srinivas et al., 2024, Wang et al., 27 May 2025).
RL Critics: Attention-based and permutation-invariant critics show 15–50% improvements in team reward, stability, test-time scalability, and faster convergence, notably under non-stationarity and partial observability (Iqbal et al., 2018, Liu et al., 2019).
Heterogeneous Agents: Individualized critique (e.g., “scientific DNA” in INDIBATOR) sharply increases both outcome quality and diversity in scientific discovery tasks (Jang et al., 2 Feb 2026).
Dissent Injection: Structured critique agents such as Catfish Agent demonstrably reduce error-prone unanimity and increase critical engagement in LLM teams (e.g., –47% silent agreement rate in medical reasoning) (Wang et al., 27 May 2025).

Pathologies and Mitigations

Premature Consensus/Silent Agreement: Absence of dissent mechanisms can lead to groupthink and higher error rates; explicit intervention agents are required for robust critical analysis (Wang et al., 27 May 2025).
Problem Drift: In multi-round protocols, continued debate may increase reasoning error or converge to non-optimal outcomes (problem drift), especially beyond optimal agent/round counts (Tillmann, 29 May 2025, 2505.22960).
Computational Cost: Token and context usage grow $O(n \cdot m \cdot L)$; optimizations include summarization, sparse activation, or subnet pruning (Tillmann, 29 May 2025).
Emergent Bias: Debate dynamics can amplify or suppress bias, sometimes unpredictably; system-level fairness audits are required (Madigan et al., 18 Dec 2025).
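The computational cost item above can be made concrete with a back-of-the-envelope estimate. The source does not define the symbols in $O(n \cdot m \cdot L)$, so the reading used here (n agents, m rounds, L tokens per message) is an assumption, as is the function name.

```python
def debate_token_estimate(n_agents, n_rounds, tokens_per_message):
    """Rough token count if every agent emits one message of fixed length per round."""
    return n_agents * n_rounds * tokens_per_message


small = debate_token_estimate(n_agents=3, n_rounds=2, tokens_per_message=500)
large = debate_token_estimate(n_agents=5, n_rounds=3, tokens_per_message=500)
```

Under these assumptions, moving from 3 agents and 2 rounds to 5 agents and 3 rounds multiplies the token budget by 2.5, which is why summarization and sparse activation are attractive.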

5. Theoretical Foundations and Limitations

RL Critique Component Analysis

Benefits: Centralized critics reduce variance, accelerate learning, and improve coordination in multi-agent RL (Lowe et al., 2017, Iqbal et al., 2018).
Limitations: Under partial observability, state-based (rather than history-based) critics can introduce bias and excess variance; recurrent or hybrid history/message-based critics may recover performance at higher computational cost (Lyu et al., 2024).
Symmetry: Permutation-invariant critics are essential for homogeneous-agent scalability and unbiased policy gradients in symmetric tasks (Liu et al., 2019).

Structural Properties

Stateless vs. Stateful Critique: Minimalist agents (e.g., PokéAI Critique) may be stateless, delivering limited expressiveness but high transparency. Richer, model-based critics support stepwise logical critique and context-dependent scoring (Liu et al., 30 Jun 2025, Li et al., 9 Jan 2026).
Profile Granularity: Coarse persona labeling (e.g., "critic"/"reviewer") is inferior to fine-grained, data-derived profile conditioning, which drives individual diversity and domain-aligned critique capacity (Jang et al., 2 Feb 2026).
Emergence Theory: Simple aggregation may fail to capture higher-order synergy or bias; only by explicitly measuring global-vs-local performance (e.g., system synergy, bias amplification) can the emergent collective phenomena be assessed (Madigan et al., 18 Dec 2025, Kostka et al., 29 Jul 2025).

6. Research Directions, Challenges, and Open Problems

Integrated Training: There is a recognized need for end-to-end training of interaction protocols and agent models, not just hard-coded or prompt-based role assignment (Tillmann, 29 May 2025).
Debiasing and Failure Mode Analysis: Systematic studies of under-explored failure modes (drift, bias, adversarial knowledge flooding) and adoption of interaction-aware mitigation protocols are critical (Tillmann, 29 May 2025, Madigan et al., 18 Dec 2025).
Scalability: $O(N^2)$ cross-critiques and context bloat place limits on large-scale deployments unless mitigated by learned topologies, summarization, or ring architectures (Jang et al., 2 Feb 2026, Li et al., 9 Jan 2026).
Socio-Cognitive Extensions: The integration of theory of mind, adaptive critique, and explicit belief modeling may unlock robust collective intelligence and cognitive synergy, but orchestration and coverage heuristics remain an open frontier (Kostka et al., 29 Jul 2025).
Terminology and Benchmarking: Careful distinction between true MAS architectures and ad hoc “multi-agent LLM” orchestrations is essential; the alignment of benchmarks and reporting with foundational MAS metrics (e.g., autonomy, emergence, synergy) is necessary for scientific rigor (Malfa et al., 27 May 2025).
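The scalability point above can be illustrated by counting critique messages per round under two topologies. The sketch assumes that in the all-pairs case every agent critiques every other agent once, and that in a ring each agent critiques only its successor; both function names are hypothetical.

```python
def all_pairs_critiques(n_agents):
    """Directed critiques when every agent reviews every other agent: N * (N - 1)."""
    return n_agents * (n_agents - 1)


def ring_critiques(n_agents):
    """Directed critiques when each agent reviews only the next agent in a ring."""
    return n_agents if n_agents > 1 else 0


counts = {n: (all_pairs_critiques(n), ring_critiques(n)) for n in (3, 10, 50)}
```

The all-pairs count grows quadratically while the ring count grows linearly, at the price of each output being reviewed by a single peer per round.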

7. Application Case Studies

System/Paper	Critique Mechanism	Domain
PokéAI (Liu et al., 30 Jun 2025)	Stateless verifier, task success/failure	Game playing (Pokémon Red)
PatExpert (Srinivas et al., 2024)	Parallel Judge-LMs, iterative scoring	Patent analysis
INDIBATOR (Jang et al., 2 Feb 2026)	Profile-grounded, diversity/factuality	Molecular discovery
Catfish Agent (Wang et al., 27 May 2025)	Role-based dissent, consensus-breaking	Medical QA/VQA
DynaDebate (Li et al., 9 Jan 2026)	Path generation, process-centric critique	Math/Reasoning

Each demonstrates distinct approaches—ranging from single-module, binary verification to multidisciplinary critical evaluation combined with domain-adaptive process critique—highlighting the breadth of design and tuning axes available within the multi-agent critique landscape.

Multi-Agent Critique constitutes a rich and evolving field, intersecting classical MAS theory, modern reinforcement learning, and the frontier of LLM-based reasoning. By structuring agent interactions, scoring, and explanation mechanisms around critique-driven protocols, practitioners and researchers can harness and coordinate diverse expertise, drive systematic error correction, and achieve robust, interpretable AI outcomes across domains. Continued advances in architectural design, theoretical analysis, and empirical evaluation will further clarify best practices and limitations in deploying critique-driven intelligence at scale.