
Aegean-Serve: Consensus-Aware Multi-Agent LLM Engine

Updated 30 December 2025
  • Aegean-Serve is a consensus-aware serving engine that formalizes distributed multi-agent LLM reasoning with guarantees of correctness, liveness, and token efficiency.
  • It leverages incremental quorum detection, stability conditioning, and early termination to reduce latency by up to 20× while maintaining near-baseline accuracy.
  • The system integrates a consensus coordinator, agreement monitor, and asynchronous agent execution to optimize performance for complex tasks like mathematical reasoning.

Aegean-Serve is a consensus-aware serving engine for multi-agent LLM reasoning systems that integrates the Aegean consensus protocol. It guarantees correctness and liveness during distributed, stochastic agentic AI execution through incremental quorum detection, early termination, and stability conditioning. Aegean-Serve formalizes high-quality, low-latency multi-agent inference, addressing the inefficiency of classical barrier- or heuristic-based orchestration and establishing a provably correct, token-efficient solution applicable to mathematical reasoning and related domains (Ruan et al., 23 Dec 2025).

1. Formal Model of Multi-Agent Refinement

Aegean-Serve is grounded in a formal model for collective agent refinement, adapting classical distributed consensus to the context of stochastic reasoning agents. The system considers $N$ agents, each maintaining an internal context $c_i$ and providing a reasoning function

$$R_i: (c_i, \text{InputString}) \to (c'_i, s),$$

where $s$ is a candidate solution with a corresponding reasoning trace. Distributed execution advances in terms $t \in \mathbb{N}$, each with rounds $r = 0, 1, 2, \ldots$ of refinement. In round $r$ of term $t$, the leader maintains a refinement set $\mathcal{R}^t_r \subset S$, with one candidate per agent from the preceding quorum.

A Quality Oracle $Q: (\text{Task}, \text{Solution}) \to \mathbb{R}$, inaccessible to the agents but used for theoretical guarantees, evaluates solution quality.

Correctness is specified as:

  • Refinement Validity (Safety 1): Any output $s^*$ must be at least as good as the majority-optimal individual solution; that is,

$$Q(\text{Task}, s^*) \geq \min_{M: |M| = \lceil N/2 \rceil} Q(\text{Task}, s^M)$$

where $s^M = \arg\max_{s \in \{s^0_i : i \in M\}} Q(\text{Task}, s)$.

  • Refinement Monotonicity (Safety 2): If two outputs $s$ (round $r$) and $s'$ (round $r' > r$) are produced (real-time order), then

$$Q(\text{Task}, s') \geq Q(\text{Task}, s).$$

  • Refinement Termination (Liveness): Under partial synchrony and up to $\lfloor (N-1)/2 \rfloor$ agent failures, the system eventually produces some output.

The reasoning refinement assumption posits that agent refinement never degrades solution quality:

$$\forall s \in S,\ Q(\text{Task}, R_i(S)) \geq Q(\text{Task}, s).$$

Empirical evidence from multi-agent LLM ensembles shows substantial post-exchange improvement for weaker models without degrading the performance of stronger models.
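As an illustration, the two safety conditions can be phrased as executable checks against the quality oracle. This is a minimal sketch: the oracle interface, function names, and solution representation are hypothetical, not the paper's API.

```python
import math
from itertools import combinations

def refinement_validity(quality, task, s_star, initial_solutions):
    """Safety 1: the committed output s* must score at least as well as the
    majority-optimal individual solution (checked here against a hypothetical
    oracle `quality`, which real agents cannot access)."""
    n = len(initial_solutions)
    m = math.ceil(n / 2)
    # For each majority-sized subset M, take its best initial solution s^M;
    # require Q(Task, s*) >= min over all such subsets of Q(Task, s^M).
    bound = min(
        max(quality(task, s) for s in subset)
        for subset in combinations(initial_solutions, m)
    )
    return quality(task, s_star) >= bound

def refinement_monotonicity(quality, task, outputs):
    """Safety 2: outputs committed in later rounds never score lower."""
    scores = [quality(task, s) for s in outputs]
    return all(a <= b for a, b in zip(scores, scores[1:]))
```

Under the reasoning refinement assumption, any solution produced by refining a quorum set automatically satisfies both predicates.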

2. The Aegean Consensus Protocol

The Aegean protocol is leader-based and round-driven, supporting early termination via incremental quorum detection and stability testing. Its operation proceeds as follows:

  1. Leader election (term $t$).
  2. Initial round ($r=0$): Leader broadcasts the task. Each agent proposes a solution; the leader gathers a quorum of $\alpha = \lceil (N+1)/2 \rceil$ distinct proposals to form $\mathcal{R}^t_0$.
  3. Refinement rounds ($r \geq 1$):
    • Leader broadcasts $\mathcal{R}^t_{r-1}$.
    • Each agent refines to produce $s^r_i \leftarrow R_i(\mathcal{R}^t_{r-1})$.
    • The leader collects a quorum of $\alpha$ refinements, forming $\mathcal{R}^t_r$, and submits $(\mathcal{R}^t_r, r)$ to the decision engine.

A stability horizon $\beta$ (typically $\beta=2$) is enforced: a candidate $s$ must achieve quorum support $f_r(s) \geq \alpha$ for $\beta$ consecutive rounds to be eligible for commitment (early-stop).

This structure filters transient, stochastic majority oscillations intrinsic to LLM pipelines. The protocol provably maintains the formal correctness guarantees outlined in section 1.
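The quorum-and-stability rule above can be sketched as a small per-round decision routine. This is a minimal sketch assuming batches of normalized candidate answers; the class and method names are illustrative, not the paper's interfaces.

```python
from collections import Counter

class AgreementMonitor:
    """Tracks per-round support for normalized candidate answers and signals
    early termination once one candidate holds quorum support for `beta`
    consecutive rounds (hypothetical sketch of the stability horizon)."""

    def __init__(self, n_agents, beta=2):
        self.alpha = (n_agents + 2) // 2   # quorum: ceil((N+1)/2)
        self.beta = beta
        self.stable_rounds = {}            # candidate -> consecutive quorum rounds

    def on_round(self, candidates):
        """`candidates`: normalized answers collected this round.
        Returns the committed answer, or None if no early stop yet."""
        support = Counter(candidates)
        # A candidate that loses quorum support has its stability streak reset,
        # filtering transient majority oscillations.
        for cand in list(self.stable_rounds):
            if support[cand] < self.alpha:
                self.stable_rounds.pop(cand)
        for cand, f in support.items():
            if f >= self.alpha:
                self.stable_rounds[cand] = self.stable_rounds.get(cand, 0) + 1
                if self.stable_rounds[cand] >= self.beta:
                    return cand            # quorum held for beta rounds: commit
        return None
```

With $N=5$ and $\beta=2$, a candidate backed by three of five agents in two consecutive rounds commits; a majority that flips between rounds does not.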

3. System Architecture and Implementation

Aegean-Serve operationalizes the protocol within a high-throughput serving engine, composed of:

  • Consensus Coordinator: Manages protocol state (term, round, refinement sets, stability counters), leader election, message broadcast, and calls the decision engine.
  • Agreement Monitor: Receives agent completions (OnComplete), normalizes and aggregates candidate answers, tracks support per round, and triggers early-stop conditions.
  • Agent Execution Engine: Schedules inference tasks through dispatch/handle/callback mechanisms, enables cancellation for early abort, manages collective agent state.

Concurrency is event-driven: the coordinator launches $N$ asynchronous agent inferences and, as each completes, the result is immediately checked for agreement, circumventing full round barriers. This design allows early cancellation of in-flight computations once consensus is reached, and uses heartbeats and timeouts to mitigate stragglers and unresponsive agents.

Safety and liveness are enforced in the runtime via strict application of quorum criteria ($\alpha$) and the stability horizon ($\beta$), underpinned by the monotonic refinement assumption.
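The barrier-free collection path can be sketched with standard asyncio primitives. This is a minimal sketch under assumed interfaces; the real engine's dispatch/handle/callback mechanisms and heartbeat handling are not shown.

```python
import asyncio
from collections import Counter

async def gather_quorum(agent_calls, alpha, timeout=30.0):
    """Launch all agent inferences, check support as each completes, and
    cancel in-flight work as soon as any candidate reaches quorum `alpha`.
    `agent_calls` are coroutines yielding normalized answers (illustrative)."""
    tasks = [asyncio.ensure_future(c) for c in agent_calls]
    support, winner = Counter(), None
    try:
        for fut in asyncio.as_completed(tasks, timeout=timeout):
            try:
                answer = await fut
            except Exception:
                continue              # fail-stop agent: skip it, keep collecting
            support[answer] += 1
            if support[answer] >= alpha:
                winner = answer       # quorum reached; don't wait for stragglers
                break
    finally:
        for t in tasks:
            t.cancel()                # abort remaining in-flight inferences
        await asyncio.gather(*tasks, return_exceptions=True)
    return winner, support
```

Because the loop consumes completions in finishing order, a slow agent on the losing side of the vote never delays commitment, which is the source of the tail-latency reductions reported below.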

4. Empirical Evaluation and Quantitative Outcomes

Aegean-Serve was benchmarked against a Multi-Agent-Base (MA-Base) baseline (barrier synchronization, fixed rounds, majority-vote commit) and against the best and worst single models across several mathematical reasoning tasks (GSM8K, MMLU, AIME, IMO). Trials encompassed local agents (8×H100 GPUs) and API-driven agents (GPT-5-mini, Gemini-2.5-flash, Claude-4.5-Haiku).

Latency Reduction:

Aegean-Serve reduced average latency by factors of 1.2–20× and P99 tail latency by up to 11×. For example, on GSM8K (API) mean latency decreased from 49.2 s (MA-Base) to 8.0 s (Aegean), a 6.2× speedup.

Accuracy:

Aegean-Serve maintained accuracy within 2.5 percentage points of the fully-barriered baseline across all benchmarks:

Benchmark   MA-Base   Aegean   BestSingle
GSM8K       98%       97%      93%
MMLU        96%       95%      70%
AIME        60%       60%      46.7%
IMO         49.5%     47%      18%

Token Efficiency:

Early termination led to substantial savings in output tokens—up to 4.4× reduction on GSM8K (5.7K→1.3K), with moderate gains on more challenging tasks (e.g., IMO: 73.8K→64.8K).
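The quoted reduction factors follow directly from the raw token counts; as a quick arithmetic check using only the numbers reported above:

```python
# Output-token counts reported above (MA-Base -> Aegean-Serve), in thousands.
token_counts = {"GSM8K": (5.7, 1.3), "IMO": (73.8, 64.8)}
for bench, (base, aegean) in token_counts.items():
    print(f"{bench}: {base / aegean:.1f}x fewer output tokens")
# GSM8K: 4.4x fewer output tokens
# IMO: 1.1x fewer output tokens
```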

Ablations:

  • Varying $N$ (homogeneous Qwen3-8B agents) from 3 to 7 yielded sublinear increases in time but stable or improving accuracy.
  • Lowering the stability horizon to $\beta=1$ reduced latency but significantly harmed accuracy (e.g., 85% on GSM8K); $\beta=2$ provided a near-optimal tradeoff.
  • Quorum thresholds below majority (e.g., $\alpha=1$) destroyed correctness; thresholds above majority increased latency without additional benefit.

5. Theoretical and Practical Significance

Aegean-Serve establishes the first consensus protocol and serving engine for systems of stochastic, distributed reasoning agents with guarantees analogous to those of classical distributed systems. It eschews ad-hoc heuristics, eliminating fixed loop limits and barrier synchronization in favor of formal early termination. By decoupling progress from the slowest agents, it fundamentally reduces latency and resource consumption for both on-premises GPU fleets and remote API aggregation.

The protocol is mathematically grounded, yielding provable validity, monotonicity, and termination under the fail-stop, partial synchrony model, and is empirically demonstrated to retain state-of-the-art accuracy.

A plausible implication is that Aegean-Serve can serve as a blueprint for future multi-agent orchestration platforms engaging highly variable, high-compute agentic workloads. The enforcement of consensus guarantees in a stochastic environment may inspire analogous approaches in adjacent domains such as decentralized scientific collaboration, robust ensemble forecasting, or federated AI deployment.

6. Limitations and Future Prospects

While Aegean-Serve minimizes straggler-induced delays and premature commitment, its effectiveness is contingent on the correctness of the reasoning refinement assumption and appropriate choices of quorum size ($\alpha$) and stability horizon ($\beta$). The protocol does not address fully Byzantine agents or adversarial consensus. Further, early termination is tuned primarily for competitive refinement: not all multi-agent tasks may derive comparable benefit, especially where agent diversity is low or stochasticity is minimal.

Future research may explore extensions to other stochastic agent settings, adaptive quoruming strategies, and integration with continuous or federated learning environments. Additional potential lies in applying the Aegean-Serve paradigm to large-scale scientific and industrial process control, where provably safe and liveness-assured collective reasoning is critical.


For a detailed synthesis, refer to "Reaching Agreement Among Reasoning LLM Agents" (Ruan et al., 23 Dec 2025).
