Papers
Topics
Authors
Recent
Search
2000 character limit reached

Evaluating Accumulo Performance for a Scalable Cyber Data Processing Pipeline

Published 21 Jul 2014 in cs.DB | (1407.5661v1)

Abstract: Streaming, big data applications face challenges in creating scalable data flow pipelines, in which multiple data streams must be collected, stored, queried, and analyzed. These data sources are characterized by their volume (in terms of dataset size), velocity (in terms of data rates), and variety (in terms of fields and types). For many applications, distributed NoSQL databases are effective alternatives to traditional relational database management systems. This paper considers a cyber situational awareness system that uses the Apache Accumulo database to provide scalable data warehousing, real-time data ingest, and responsive querying for human users and analytic algorithms. We evaluate Accumulo's ingestion scalability as a function of number of client processes and servers. We also describe a flexible data model with effective techniques for query planning and query batching to deliver responsive results. Query performance is evaluated in terms of latency of the client receiving initial result sets. Accumulo performance is measured on a database of up to 8 nodes using real cyber data.

Citations (3)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.