Aegean-Serve: Consensus-Aware Multi-Agent LLM Engine
- Aegean-Serve is a consensus-aware serving engine that formalizes distributed multi-agent LLM reasoning with guarantees of correctness and liveness, and with high token efficiency.
- It leverages incremental quorum detection, stability conditioning, and early termination to reduce latency by up to 20× while maintaining near-baseline accuracy.
- The system integrates a consensus coordinator, agreement monitor, and asynchronous agent execution to optimize performance for complex tasks like mathematical reasoning.
Aegean-Serve is a consensus-aware serving engine for multi-agent LLM reasoning systems that integrates the Aegean consensus protocol. It guarantees correctness and liveness during distributed, stochastic agentic AI execution through incremental quorum detection, early termination, and stability conditioning. Aegean-Serve formalizes high-quality, low-latency multi-agent inference, addressing the inefficiency of classical barrier- or heuristic-based orchestration and establishing a provably correct, token-efficient solution applicable to mathematical reasoning and related domains (Ruan et al., 23 Dec 2025).
1. Formal Model of Multi-Agent Refinement
Aegean-Serve is grounded in a formal model for collective agent refinement, adapting classical distributed consensus to the context of stochastic, reasoning agents. The system considers $n$ agents, each agent $i$ maintaining an internal context $c_i$ and providing a reasoning function
$$f_i : (s, c_i) \mapsto s',$$
where $s'$ is a candidate solution with a corresponding reasoning trace. Distributed execution advances in terms $t$, each with rounds $r = 0, 1, 2, \ldots$ of refinement. In round $r$ of term $t$, the leader maintains a refinement set $R^t_r$, with one candidate per agent from the preceding quorum.
A Quality Oracle $Q$, inaccessible to the agents but used for theoretical guarantees, evaluates solution quality.
Correctness is specified as:
- Refinement Validity (Safety 1): Any committed output $s_{\mathrm{out}}$ must be at least as good as the majority-optimal individual solution; that is, $Q(s_{\mathrm{out}}) \ge Q(s^{\star})$, where $s^{\star} = \arg\max_{s \in R^t_0} Q(s)$ is the best proposal in the initial quorum set.
- Refinement Monotonicity (Safety 2): If two outputs $s_1$ (round $r_1$) and $s_2$ (round $r_2 > r_1$) are produced in real-time order, then $Q(s_2) \ge Q(s_1)$.
- Refinement Termination (Liveness): Under partial synchrony and up to $f < n/2$ agent failures, the system eventually produces some output.
The reasoning refinement assumption posits that agent refinement never degrades solution quality: $Q(f_i(s, c_i)) \ge Q(s)$ for every agent $i$ and candidate $s$.
Empirical evidence from multi-agent LLM ensembles supports this assumption: weaker models improve substantially after exchanging solutions, while stronger models do not degrade.
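The safety conditions above can be sketched in Python. This is an illustrative sketch only: `quality` and `refine` are hypothetical toy stand-ins for the inaccessible oracle $Q$ and the agent reasoning functions $f_i$, which the paper does not implement this way.

```python
def quality(solution: str) -> float:
    """Quality oracle Q: inaccessible to agents, used only in the analysis.
    Toy proxy (solution length) purely for demonstration."""
    return float(len(solution))

def refine(solution: str, context: str) -> str:
    """Agent reasoning function f_i: maps (candidate, context) -> candidate.
    Toy refinement that, per the refinement assumption, never degrades quality."""
    return solution + "!"

def check_validity(output: str, majority_proposals: list[str]) -> bool:
    # Safety 1: output at least as good as the majority-optimal proposal.
    return quality(output) >= max(quality(s) for s in majority_proposals)

def check_monotonicity(outputs_in_order: list[str]) -> bool:
    # Safety 2: quality never decreases across successively committed outputs.
    qs = [quality(s) for s in outputs_in_order]
    return all(a <= b for a, b in zip(qs, qs[1:]))
```

In the real system these checks hold by construction of the protocol; the oracle is never queried at runtime.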
2. The Aegean Consensus Protocol
The Aegean protocol is leader-based and round-driven, supporting early termination via incremental quorum detection and stability testing. Its operation proceeds as follows:
- Leader election (term $t$).
- Initial round ($r = 0$): Leader broadcasts the task. Each agent proposes a solution; the leader gathers a quorum of $q = \lfloor n/2 \rfloor + 1$ distinct proposals to form $R^t_0$.
- Refinement rounds ($r \ge 1$):
  - Leader broadcasts $R^t_{r-1}$.
  - Each agent $i$ refines a candidate to produce $s^r_i$.
  - The leader collects a quorum of refinements, forming $R^t_r$, and submits to the decision engine.
A stability horizon $h$ (typically $h = 2$) is enforced: a candidate must achieve quorum support for $h$ consecutive rounds to be eligible for commitment (early-stop).
This structure filters transient, stochastic majority oscillations intrinsic to LLM pipelines. The protocol provably maintains the formal correctness guarantees outlined in section 1.
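The round structure above can be sketched as a simplified, single-term leader loop. This is a minimal sketch, not the paper's implementation: `propose` and `refine` are hypothetical callbacks standing in for agent inference, and the `max_rounds` fallback is added here only to keep the toy loop bounded.

```python
from collections import Counter

def aegean_round_loop(propose, refine, n, h=2, max_rounds=16):
    """Sketch of the Aegean leader loop (single term, synchronous toy version).

    propose(i) -> initial candidate from agent i.
    refine(i, R) -> agent i's refinement given the previous refinement set R.
    Commits once one candidate holds quorum support for h consecutive rounds.
    """
    quorum = n // 2 + 1                       # majority quorum q
    R = [propose(i) for i in range(n)]        # round 0: initial proposals
    leading, streak = None, 0
    for _ in range(max_rounds):
        top, support = Counter(R).most_common(1)[0]
        if support >= quorum:
            streak = streak + 1 if top == leading else 1
            leading = top
            if streak >= h:                   # stability horizon reached
                return leading                # early-terminate and commit
        else:
            leading, streak = None, 0         # transient majority filtered out
        R = [refine(i, R) for i in range(n)]  # next refinement round
    return leading                            # fallback, for the sketch only
```

The `streak` counter is what filters the transient majority oscillations mentioned above: a candidate that wins one round but loses the next never reaches the horizon.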
3. System Architecture and Implementation
Aegean-Serve operationalizes the protocol within a high-throughput serving engine, composed of:
- Consensus Coordinator: Manages protocol state (term, round, refinement sets, stability counters), leader election, message broadcast, and calls the decision engine.
- Agreement Monitor: Receives agent completions (OnComplete), normalizes and aggregates candidate answers, tracks support per round, and triggers early-stop conditions.
- Agent Execution Engine: Schedules inference tasks through dispatch/handle/callback mechanisms, enables cancellation for early abort, manages collective agent state.
Concurrency is event-driven: the coordinator launches asynchronous agent inferences and, as each completes, the result is immediately checked for agreement, circumventing full round barriers. This design allows early cancellation of in-flight computations once consensus is reached, and uses heartbeats and timeouts to mitigate stragglers and unresponsive agents.
Safety and liveness are enforced in the runtime via strict application of the quorum criterion ($q = \lfloor n/2 \rfloor + 1$) and the stability horizon ($h$ consecutive supporting rounds), underpinned by the monotonic refinement assumption.
4. Empirical Evaluation and Quantitative Outcomes
Aegean-Serve was benchmarked against a Multi-Agent-Base (MA-Base) baseline (barrier synchronization, fixed rounds, majority-vote commit) and best/worst single models across several mathematical reasoning tasks (GSM8K, MMLU, AIME, IMO). Trials encompassed local (8×H100 GPU) and API-driven agents (GPT-5-mini, Gemini-2.5-flash, Claude-4.5-Haiku).
Latency Reduction:
Aegean-Serve reduced average latency by factors of 1.2–20× and P99 tail latency up to 11×. For example, on GSM8K (API) mean latency decreased from 49.2 s (MA-Base) to 8.0 s (Aegean, 6.2× speedup).
Accuracy:
Aegean-Serve maintained accuracy within 2.5 percentage points of the fully-barriered baseline across all benchmarks:
| Benchmark | MA-Base | Aegean | BestSingle |
|---|---|---|---|
| GSM8K | 98% | 97% | 93% |
| MMLU | 96% | 95% | 70% |
| AIME | 60% | 60% | 46.7% |
| IMO | 49.5% | 47% | 18% |
Token Efficiency:
Early termination led to substantial savings in output tokens—up to 4.4× reduction on GSM8K (5.7K→1.3K), with moderate gains on more challenging tasks (e.g., IMO: 73.8K→64.8K).
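As a quick consistency check, the quoted speedup and token-savings factors follow directly from the reported GSM8K figures:

```python
# Sanity-check the ratios quoted above (GSM8K, API setting).
mabase_latency_s, aegean_latency_s = 49.2, 8.0   # mean latency, seconds
mabase_tokens_k, aegean_tokens_k = 5.7, 1.3      # output tokens, thousands

speedup = mabase_latency_s / aegean_latency_s    # ~6.2x, as reported
token_savings = mabase_tokens_k / aegean_tokens_k  # ~4.4x, as reported
```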
Ablations:
- Varying the number of agents $n$ (homogeneous Qwen3-8B agents) from 3 to 7 yielded sublinear increases in time but stable or improving accuracy.
- Lowering the stability horizon to $h = 1$ reduced latency but significantly harmed accuracy (e.g., 85% on GSM8K); $h = 2$ provided a near-optimal tradeoff.
- Quorum thresholds below a strict majority destroyed correctness; thresholds above a majority increased latency without additional benefit.
5. Theoretical and Practical Significance
Aegean-Serve establishes the first consensus protocol and serving engine for systems of stochastic, distributed reasoning agents with guarantees analogous to classical distributed systems. It eschews ad-hoc heuristics, eliminating fixed loop limits and barrier synchronization in favor of formal early termination. By decoupling progress from the slowest agents, it fundamentally reduces latency and resource consumption for both on-premises GPU fleets and remote API aggregation.
The protocol is mathematically grounded, yielding provable validity, monotonicity, and termination under the fail-stop, partial synchrony model, and is empirically demonstrated to retain state-of-the-art accuracy.
A plausible implication is that Aegean-Serve can serve as a blueprint for future multi-agent orchestration platforms engaging highly variable, high-compute agentic workloads. The enforcement of consensus guarantees in a stochastic environment may inspire analogous approaches in adjacent domains such as decentralized scientific collaboration, robust ensemble forecasting, or federated AI deployment.
6. Limitations and Future Prospects
While Aegean-Serve minimizes straggler-induced delays and premature commitment, its effectiveness is contingent on the correctness of the reasoning refinement assumption and on appropriate choices of quorum size ($q$) and stability horizon ($h$). The protocol does not address fully Byzantine agents or adversarial consensus. Further, early termination is tuned primarily for competitive refinement: not all multi-agent tasks may derive comparable benefit, especially where agent diversity is low or stochasticity is minimal.
Future research may explore extensions to other stochastic agent settings, adaptive quoruming strategies, and integration with continuous or federated learning environments. Additional potential lies in applying the Aegean-Serve paradigm to large-scale scientific and industrial process control, where provably safe and liveness-assured collective reasoning is critical.
For a detailed synthesis, refer to "Reaching Agreement Among Reasoning LLM Agents" (Ruan et al., 23 Dec 2025).