Papers
Topics
Authors
Recent
Search
2000 character limit reached

INGESTBASE: A Declarative Data Ingestion System

Published 21 Jan 2017 in cs.DB | (1701.06093v1)

Abstract: Big data applications have fast arriving data that must be quickly ingested. At the same time, they have specific needs to preprocess and transform the data before it could be put to use. The current practice is to do these preparatory transformations once the data is already ingested, however, this is expensive to run and cumbersome to manage. As a result, there is a need to push data preprocessing down to the ingestion itself. In this paper, we present a declarative data ingestion system, called INGESTBASE, to allow application developers to plan and specify their data ingestion logic in a more systematic manner. We introduce the notion of ingestions plans, analogous to query plans, and present a declarative ingestion language to help developers easily build sophisticated ingestion plans. INGESTBASE provides an extensible ingestion optimizer to rewrite and optimize ingestion plans by applying rules such as operator reordering and pipelining. Finally, the INGESTBASE runtime engine runs the optimized ingestion plan in a distributed and fault-tolerant manner. Later, at query processing time, INGESTBASE supports ingestion-aware data access and interfaces with upstream query processors, such as Hadoop MapReduce and Spark, to post- process the ingested data. We demonstrate through a number of experiments that INGESTBASE: (i) is flexible enough to express a variety of ingestion techniques, (ii) incurs a low ingestion overhead, (iii) provides efficient access to the ingested data, and (iv) has much better performance, up to 6 times, than preparing data as an afterthought, via a query processor.

Citations (2)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.