Fault Tolerance for Stream Processing Engines

Published 3 May 2016 in cs.DC | (1605.00928v3)

Abstract: Distributed Stream Processing Engines (DSPEs) target applications related to continuous computation, online machine learning and real-time query processing. DSPEs operate on high volumes of data by applying lightweight operations to real-time, continuous streams. Such systems require clusters of hundreds of machines for their deployment. Streaming applications come with various requirements: low latency, high throughput, scalability and high availability. In this survey, we study the fault tolerance problem for DSPEs. We discuss the fault tolerance techniques used in modern stream processing engines, namely Storm, S4, Samza, SparkStreaming and MillWheel. Further, we give insight into fault tolerance approaches, which we categorize as active replication, passive replication and upstream backup. Finally, we discuss the implications of these fault tolerance techniques for different streaming application requirements.
