Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks

Published 3 Jun 2022 in cs.DC, cs.DB, and cs.NI | (2206.03285v11)

Abstract: This paper presents a high-performance consensus protocol, Nezha, which can be deployed by cloud tenants without any support from their cloud provider. Nezha bridges the gap between protocols such as Multi-Paxos and Raft, which can be readily deployed and protocols such as NOPaxos and Speculative Paxos, that provide better performance, but require access to technologies such as programmable switches and in-network prioritization, which cloud tenants do not have. Nezha uses a new multicast primitive called deadline-ordered multicast (DOM). DOM uses high-accuracy software clock synchronization to synchronize sender and receiver clocks. Senders tag messages with deadlines in synchronized time; receivers process messages in deadline order, on or after their deadline. We compare Nezha with Multi-Paxos, Fast Paxos, Raft, a NOPaxos version we optimized for the cloud, and 2 recent protocols, Domino and TOQ-EPaxos, that use synchronized clocks. In throughput, Nezha outperforms all baselines by a median of 5.4x (range: 1.9-20.9x). In latency, Nezha outperforms five baselines by a median of 2.3x (range: 1.3-4.0x), with one exception: it sacrifices 33% latency compared with our optimized NOPaxos in one test. We also prototype two applications, a key-value store and a fair-access stock exchange, on top of Nezha to show that Nezha only modestly reduces their performance relative to an unreplicated system. Nezha is available at https://github.com/Steamgjk/Nezha.

Summary

The paper introduces the Nezha protocol, which uses deadline-ordered multicast with synchronized clocks and achieves a median throughput improvement of 5.4x over its baselines.
It incorporates speculative execution and stateless proxies to reduce client latency and distribute processing load.
Empirical evaluations show Nezha outperforming baselines such as Multi-Paxos, Raft, and NOPaxos in throughput, and most of them in latency, making high-performance consensus practical for cloud tenants.

The paper introduces Nezha, a consensus protocol designed to be both high-performance and deployable by cloud tenants without support from their cloud provider. Nezha bridges the gap between readily deployable protocols such as Multi-Paxos and Raft and higher-performance protocols such as NOPaxos and Speculative Paxos, which depend on network features (programmable switches, in-network prioritization) that tenants do not have. Central to Nezha is a new multicast primitive, deadline-ordered multicast (DOM), which uses high-accuracy software clock synchronization to coordinate message delivery among replicas.

Core Components and Methodology

Nezha targets a persistent tension in consensus protocols: achieving high throughput and low latency within the constraints of public cloud infrastructure, where tenants typically lack advanced network features such as programmable switches. It addresses this with three techniques:

Deadline-Ordered Multicast (DOM): Senders tag each message with a deadline expressed in synchronized time, and receivers process messages in deadline order, on or after their deadlines. This promotes a consistent message order across replicas using only software clock synchronization, with no reliance on provider-controlled network features.
Speculative Execution: Nezha decouples request execution from commitment, so the leader can execute requests speculatively and reduce client latency. When speculation does not hold, the request only needs a minor rerun.
Stateless Proxies: Proxies take over much of the consensus work, such as multicasting, clock synchronization, and quorum checks, which offloads the clients. This supports scalability without giving up the performance benefits and makes Nezha easier to adopt across cloud services.
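
The DOM behavior described above (deadline tagging at the sender, deadline-ordered release at the receiver) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Message`, `tag`, `DomReceiver`, and `latency_bound` are hypothetical, timestamps are passed in explicitly rather than read from a synchronized clock, ties are broken by sender id as an assumption, and late messages are simply rejected here because this summary does not cover how Nezha handles them.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class Message:
    # Ordering uses (deadline, sender_id); the payload is excluded.
    deadline: float
    sender_id: int
    payload: str = field(compare=False)


def tag(payload: str, sender_id: int, send_time: float, latency_bound: float) -> Message:
    """Sender side: deadline = send time + an assumed latency bound."""
    return Message(send_time + latency_bound, sender_id, payload)


class DomReceiver:
    """Receiver side: buffer messages, release them in deadline order."""

    def __init__(self) -> None:
        self._pending: List[Message] = []  # min-heap keyed by (deadline, sender_id)
        self._last_released: Tuple[float, int] = (float("-inf"), -1)

    def receive(self, msg: Message) -> bool:
        """Buffer msg; return False if it can no longer be ordered correctly."""
        if (msg.deadline, msg.sender_id) <= self._last_released:
            return False  # assumption: late messages are rejected in this sketch
        heapq.heappush(self._pending, msg)
        return True

    def release(self, now: float) -> List[Message]:
        """Release every buffered message whose deadline has passed, in order."""
        released = []
        while self._pending and self._pending[0].deadline <= now:
            msg = heapq.heappop(self._pending)
            self._last_released = (msg.deadline, msg.sender_id)
            released.append(msg)
        return released
```

Two receivers that see the same messages in different arrival orders release them in the same order, because release order depends only on the tagged deadlines and not on arrival time. A larger `latency_bound` makes late arrivals less likely at the cost of holding each message longer before release.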

Empirical Evaluation

Nezha was evaluated against six baselines: Multi-Paxos, Fast Paxos, Raft, a version of NOPaxos optimized for the cloud, and two recent protocols that use synchronized clocks, Domino and TOQ-EPaxos. Across both closed-loop and open-loop workloads, Nezha outperforms all baselines in throughput by a median of 5.4x (range: 1.9-20.9x). In latency, it outperforms five baselines by a median of 2.3x (range: 1.3-4.0x). The one exception is the optimized NOPaxos, against which Nezha sacrifices 33% latency in one test.

Implications and Future Work

Nezha uses clock synchronization as a performance enhancement, independent of its correctness guarantees, which points to a direction for future protocol designs. Its deployability suggests a shift toward tenant-controlled consensus, enabling wider deployment of fault-tolerant systems without significant cloud provider involvement.

Possible future work includes integrating Nezha into existing systems such as Kubernetes for end-to-end performance evaluation, and exploring the use of DOM for decentralized resource management in distributed systems. Refining Nezha for practical deployment across a wider range of environments and workloads also remains open.

In conclusion, Nezha offers cloud tenants a practical consensus protocol that combines deployability with high performance, making it a notable contribution to distributed systems and cloud computing research.