Tool-Based Disagreement Resolution
- Tool-based disagreement resolution is a systematic approach that employs formal, algorithmic, and computational tools to detect and resolve conflicting claims in multi-agent environments.
- It leverages abstract argumentation frameworks, logic programming, and tool recruitment strategies to process complex disputes in domains like peer review, data curation, and software engineering.
- The approach enhances transparency and scalability by delivering reproducible resolution protocols, measurable performance metrics, and structured reasoning.
Tool-based disagreement resolution refers to the systematic use of formal, algorithmic, and computational tools to identify, analyze, and resolve conflicting claims, arguments, or actions among agents (human or artificial) in complex decision-making, review, or collaborative environments. This paradigm leverages argumentation theory, logic-based frameworks, optimization techniques, and specialist utilities to facilitate transparent, scalable, and principled dispute resolution across domains such as peer review, AI reasoning, multi-agent systems, collaborative data curation, and software engineering.
1. Foundations: Argumentation Frameworks and Formal Semantics
Many tool-based disagreement resolution approaches are grounded in abstract argumentation frameworks (AAF), as formalized by Dung. An AAF is a directed graph F = (A, R), where A is a set of abstract arguments and R ⊆ A × A is the attack relation—encoding which arguments "attack" others. Extensional semantics, such as conflict-free, admissible, preferred, stable, complete, and grounded sets, define operational criteria for selecting which arguments should ultimately be accepted or rejected. In specialized domains like peer review, the framework is strictly acyclic and multipartite, allowing for a unique linear-time complete extension (Baimuratov et al., 18 Jul 2025).
Semantics and their properties:
| Semantics | Definition (Sketch) | Typical Use |
|---|---|---|
| Conflict-free | No pairwise attacks in the set | Well-formedness |
| Admissible | Defends against all attackers | Plausible support |
| Preferred | Maximal admissible set | Choice under debate |
| Stable | Attacks all outside arguments | Strong defense |
| Complete | Contains all admissible arguments it defends | Semantic closure |
| Grounded | Minimal complete extension | Uncontroversial |
In computational settings, these semantics are calculated via logic programming or description logic reasoners, often yielding explainable and reproducible resolutions.
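As a minimal illustration of these semantics, the grounded extension can be computed as the least fixpoint of the characteristic function: starting from the empty set, repeatedly add every argument defended by the current set. The sketch below uses an invented four-argument framework (a root claim, two objections, and one defense); it is not taken from any of the cited systems.

```python
# Sketch: grounded extension of an abstract argumentation framework (AAF)
# as the least fixpoint of the characteristic function.
# The example framework below is hypothetical.

def grounded_extension(arguments, attacks):
    """attacks: set of (attacker, target) pairs."""
    attackers_of = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    def defended(candidate, current):
        # Acceptable w.r.t. `current`: every attacker of `candidate`
        # is itself attacked by some member of `current`.
        return all(
            any((d, b) in attacks for d in current)
            for b in attackers_of[candidate]
        )

    extension = set()
    while True:
        new = {a for a in arguments if defended(a, extension)}
        if new == extension:
            return extension
        extension = new

args = {"r", "o1", "o2", "d1"}   # root claim, two objections, one defense
atk = {("o1", "r"), ("o2", "r"), ("d1", "o1"), ("d1", "o2")}
print(sorted(grounded_extension(args, atk)))  # → ['d1', 'r']
```

Here the unattacked defense d1 is accepted first; since d1 defeats both objections, the root claim r is defended and joins the extension on the next iteration.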
2. Multi-Agent and Tool-Driven Dynamics
Recent advances explicitly leverage disagreement among agents to recruit specialist tools, as exemplified by DART in multimodal AI reasoning. When agents disagree—either on direct answers or intermediate reasoning chains—a recruitment agent identifies the relevant points of conflict and activates a tool suite (e.g., visual grounding, OCR, attribute detection). Execution of these tools generates "expert statements" that are rigorously compared to each agent's output using tool-aligned agreement scores. Agents then update their outputs, and an aggregator selects the final answer, yielding accuracy gains and richer inter-agent discussion compared to debate-only or single-agent baselines (Sivakumaran et al., 8 Dec 2025).
Workflow overview:
- Agents propose answers and reasoning.
- Recruitment agent detects disagreement and selects relevant tools.
- Tools produce expert outputs.
- Agents rerun, conditioned on expert results and agreement scores.
- Aggregator finalizes output.
Disagreement is detected both at the output level and at the reasoning level, and targeted tool-calling is guided by conflict typology and tool capability. Tool-based strategies are particularly effective in high-stakes, ambiguity-prone, or technical tasks where no single agent holds complete expertise.
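The aggregation step can be sketched as follows: each agent's answer is scored against the recruited tool's "expert statement," and the aggregator picks the answer with the highest total agreement score. This is an illustrative approximation, not DART's actual metric; the token-overlap score and the OCR example are invented stand-ins.

```python
# Hypothetical sketch of tool-aligned aggregation. The scoring function
# (token overlap) and the example data are illustrative, not DART's.

def agreement_score(answer: str, expert: str) -> float:
    """Jaccard token overlap as a proxy for agreement with the expert statement."""
    a, e = set(answer.lower().split()), set(expert.lower().split())
    return len(a & e) / max(len(a | e), 1)

def aggregate(agent_answers: list[str], expert_statement: str) -> str:
    # Sum scores per distinct answer, then pick the best-supported one.
    scores: dict[str, float] = {}
    for ans in agent_answers:
        scores[ans] = scores.get(ans, 0.0) + agreement_score(ans, expert_statement)
    return max(scores, key=scores.get)

expert = "the sign reads exit only"               # e.g., from an OCR tool
answers = ["exit only", "no entry", "exit only sign"]
print(aggregate(answers, expert))                 # → exit only sign
```

In a full pipeline this selection would run after agents have revised their answers conditioned on the expert statement, closing the loop described above.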
3. Domain-Specific Applications
Peer Review
When peer-review dispute resolution is formalized as an AAF, every objection and response is modeled as an attack relation bearing on the root claim ("the manuscript deserves acceptance"). Edges run only from later to earlier rounds; acyclicity ensures a unique evaluation (Baimuratov et al., 18 Jul 2025). Each argument (objection or reply) is formally annotated and compiled via OWL DL, with reasoning performed by standard DL engines, which output which arguments survive and whether the root is admitted. This approach guarantees transparency, neutrality (by eschewing data-driven bias), and efficient scaling.
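In an acyclic attack graph, the unique complete extension coincides with the grounded one and can be computed in linear time by memoized recursion: an argument is accepted iff none of its attackers is accepted. A minimal sketch with an invented dispute (this is not the cited OWL DL pipeline, only the underlying evaluation rule):

```python
# Sketch: linear-time acceptance in an acyclic attack graph, as in the
# peer-review setting. Each argument is visited once thanks to memoization.
# The example dispute is hypothetical.

from functools import lru_cache

def accepted_arguments(arguments, attacks):
    attackers_of = {a: [x for (x, y) in attacks if y == a] for a in arguments}

    @lru_cache(maxsize=None)
    def accepted(arg):
        # Accepted iff no attacker is itself accepted (well-defined: acyclic).
        return not any(accepted(b) for b in attackers_of[arg])

    return {a for a in arguments if accepted(a)}

# Root claim attacked by two objections; one objection is rebutted.
args = {"accept", "obj_method", "obj_novelty", "rebuttal"}
atk = [("obj_method", "accept"), ("obj_novelty", "accept"),
       ("rebuttal", "obj_method")]
print(sorted(accepted_arguments(args, atk)))  # → ['obj_novelty', 'rebuttal']
```

In this toy dispute the unanswered novelty objection survives, so the root claim "accept" is rejected; rebutting both objections would flip the outcome.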
Data Curation
Conflicting data cleaning actions in collaborative curation are encoded as arguments in an AF; pairwise attacks capture direct edit conflicts. The AF is compiled into a logic program and solved under well-founded (grounded) or stable semantics, classifying actions into accepted, defeated, or ambiguous. Stable models enumerate resolution possibilities, exposing ambiguity for user intervention. Integration with tools like OpenRefine yields reproducible, transparent conflict reconciliation (Xia et al., 2024).
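To make the stable-semantics classification concrete, the brute-force sketch below enumerates stable extensions of a tiny AF of conflicting curation actions: a set is stable if it is conflict-free and attacks every argument outside it. Production systems compile the AF into a logic program for an ASP solver; this exhaustive version, with an invented two-edit conflict, only illustrates the semantics.

```python
# Brute-force stable-extension enumeration (illustrative only; real systems
# use ASP solvers). A set is stable iff it is conflict-free and attacks
# every argument outside it.

from itertools import combinations

def stable_extensions(arguments, attacks):
    result = []
    args = sorted(arguments)
    for r in range(len(args) + 1):
        for subset in combinations(args, r):
            s = set(subset)
            conflict_free = not any((a, b) in attacks for a in s for b in s)
            attacks_rest = all(
                any((a, o) in attacks for a in s)
                for o in arguments - s
            )
            if conflict_free and attacks_rest:
                result.append(s)
    return result

# Two mutually conflicting edits on the same cell → two stable models,
# exposing the ambiguity for user intervention.
acts = {"set_lowercase", "set_titlecase"}
conflicts = {("set_lowercase", "set_titlecase"),
             ("set_titlecase", "set_lowercase")}
print(stable_extensions(acts, conflicts))  # → [{'set_lowercase'}, {'set_titlecase'}]
```

The two stable models correspond exactly to the "ambiguous" classification described above: neither action is defeated outright, so the choice is surfaced to the curator.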
Software Engineering
Tools such as BuCoR resolve build conflicts by combining a catalog of rule-based program transformation patterns with example-based generalization over prior edits. Conflict types are detected through entity-level diffs and matched to specific resolution rules; when these are insufficient, the tool mines, generalizes, and reapplies exemplar repairs from peer branches (Towqir et al., 25 Jul 2025). This hybrid architecture balances generality and project-specific adaptation, consistently covering a sizable fraction of real conflicts.
Multi-objective Optimization for Specification Conflicts
Goal conflicts in formal requirements (e.g., LTL specifications) are addressed by generating candidate resolutions via syntactic modifications and searching for Pareto-optimal sets that maximize satisfiability, boundary-condition resolution, and (syntactic/semantic) similarity to the original specification. Multi-objective search algorithms (NSGA-III, WBGA, AMOSA) drive the search process (Carvalho et al., 2023). Only candidates passing all consistency checks are proposed to engineers.
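The final selection step of such a search can be sketched as a Pareto-front filter over candidate repairs scored on the three objectives (satisfiability, boundary-condition resolution, similarity), all to be maximized. The repair names and scores below are invented for illustration; the cited work uses full multi-objective search (NSGA-III, WBGA, AMOSA) rather than this post-hoc filter.

```python
# Sketch: keep only non-dominated candidate repairs. A candidate dominates
# another if it is at least as good on all objectives and strictly better
# on at least one. Objective values are illustrative.

def pareto_front(candidates):
    """candidates: dict name -> tuple of objective scores (higher is better)."""
    def dominates(u, v):
        return (all(a >= b for a, b in zip(u, v))
                and any(a > b for a, b in zip(u, v)))
    return {
        name for name, score in candidates.items()
        if not any(dominates(other, score)
                   for o_name, other in candidates.items() if o_name != name)
    }

repairs = {
    "weaken_goal": (1.0, 0.6, 0.9),   # (satisfiable, BCs resolved, similarity)
    "add_guard":   (1.0, 0.9, 0.7),
    "drop_goal":   (1.0, 0.5, 0.3),   # dominated by weaken_goal
}
print(sorted(pareto_front(repairs)))  # → ['add_guard', 'weaken_goal']
```

Note the trade-off the front preserves: "add_guard" resolves more boundary conditions while "weaken_goal" stays closer to the original specification, so both are proposed to the engineer.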
4. Algorithmic Strategies and Resolution Protocols
A common protocol in tool-based resolution involves:
- Assessment: Quantify both evaluation divergence (e.g., KL divergence) and representational misalignment (e.g., RSA alignment).
- Diagnosis: Case-based strategy selection—data-sharing suffices under low misalignment, high divergence; conceptual negotiation is needed for high misalignment; hybrid treatment for both.
- Intervention: Selection and deployment of the most informative data, tool, or transformation, guided by active learning or domain-specific mapping.
- Iteration: Re-assess, re-align, and repeat until both divergence and misalignment are minimized (Oktar et al., 2023).
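The assessment and diagnosis steps above can be sketched as follows, assuming each agent exposes an evaluation distribution over options and a representation vector. KL divergence measures evaluative divergence; cosine similarity stands in for a representational-alignment probe (the cited work uses RSA). The thresholds are arbitrary placeholders.

```python
# Sketch of assessment + case-based diagnosis. Thresholds and the cosine
# stand-in for RSA alignment are illustrative assumptions.

import math

def kl_divergence(p, q):
    """KL(p || q) over discrete distributions (q assumed nonzero where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def diagnose(p, q, rep_a, rep_b, div_thresh=0.1, align_thresh=0.9):
    divergence = kl_divergence(p, q)
    alignment = cosine(rep_a, rep_b)
    if divergence <= div_thresh:
        return "aligned"
    if alignment >= align_thresh:
        return "share data"          # same concepts, different evidence
    return "negotiate concepts"      # the representations themselves differ

# High divergence but well-aligned representations → data-sharing suffices.
print(diagnose([0.9, 0.1], [0.2, 0.8],
               [1.0, 0.0, 0.2], [0.9, 0.1, 0.3]))  # → share data
```

The intervention and iteration steps would then select the most informative data or tool for the diagnosed case and loop until both quantities fall below their thresholds.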
In argumentation-based mediation (e.g., BDI agents), additional layers involve resource-sharing, domain knowledge exchange, and negotiation under explicit bridge-rules, with mediation cycles proceeding until an acceptable, reasoned solution is constructed or deadlock is detected (Trescak et al., 2014).
5. Tool Architecture, Implementation, and Performance
Most systems proceed via the following engineering pipeline:
- Annotation/Extraction: Structural or argumentation-aware annotation of inputs (manual, mining-assisted, or agent-generated).
- Formal Modeling: Compilation to AFs, logic programs, or ontologies.
- Solving: Use of DL reasoners, ASP solvers, or custom fixpoint algorithms to find extensions or solution sets.
- Interpretation and User Interaction: Surfacing uncontroversial resolutions for automatic application, exposing ambiguous cases (undecided extensions) for explicit user choice, and providing game-theoretic or dialectical provenance.
- Evaluation: Metrics include computational throughput, conflict coverage, accuracy against ground-truth outcomes, and human interpretability.
Prototypes commonly achieve linear-time resolution for well-structured (acyclic) AFs, milliseconds-level throughput for reasoner calls, and over 50% exact agreement with human or developer gold standards—even in open-ended peer review or code merge disputes (Baimuratov et al., 18 Jul 2025, Towqir et al., 25 Jul 2025).
6. Bias, Transparency, and Scalability
The logic- and rule-based nature of these systems confers key properties:
- Bias Resistance: Symbolic reasoning avoids amplifying the historical biases that can plague data-driven systems.
- Transparency: Explicit argumentation graphs, decision provenance (e.g., defeat trees), and clear mapping from attacks/defenses to decisions enable post-hoc auditing.
- Throughput and Automation: Parallelizable pipelines (annotation→ontology→reasoning) scale to thousands of cases daily, with future work emphasizing more sophisticated argument mining and tool selection strategies.
In collaborative and AI settings, embedding continuous "disagreement monitors" and dynamic re-alignment modules further ensure system robustness and maintainability (Oktar et al., 2023).
7. Limitations and Directions for Future Work
Current limitations include dependency on structured annotation or accurate argument extraction, potential prompt sensitivity in LLM-driven systems, the need for high-quality tool pools, and incomplete domain transferability. Addressing these demands research on:
- End-to-end learning for tool-selection and agent recruitment (Sivakumaran et al., 8 Dec 2025)
- Integration of human-in-the-loop guidance for candidate resolution vetting (Carvalho et al., 2023)
- Adoption of domain- and representation-agnostic alignment probes (Oktar et al., 2023)
- Systematic provenance tracking and interpretability enhancement in data-driven workflows (Xia et al., 2024)
Emerging trends emphasize reinforcement learning for minimal tool-calling policies, dynamic multi-agent role assignment, and seamless integration with existing collaborative platforms.
References:
- (Baimuratov et al., 18 Jul 2025) Dispute Resolution in Peer Review with Abstract Argumentation and OWL DL
- (Sivakumaran et al., 8 Dec 2025) DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning
- (Oktar et al., 2023) Dimensions of Disagreement: Unpacking Divergence and Misalignment in Cognitive Science and Artificial Intelligence
- (Trescak et al., 2014) Dispute Resolution Using Argumentation-Based Mediation
- (Towqir et al., 25 Jul 2025) Resolving Build Conflicts via Example-Based and Rule-Based Program Transformations
- (Xia et al., 2024) Reconciling Conflicting Data Curation Actions: Transparency Through Argumentation
- (Carvalho et al., 2023) ACoRe: Automated Goal-Conflict Resolution