Papers
Topics
Authors
Recent
Search
2000 character limit reached

Improve3C: Data Cleaning on Consistency and Completeness with Currency

Published 31 Jul 2018 in cs.DB | (1808.00024v1)

Abstract: Data quality plays a key role in big data management today. With the explosive growth of data from a variety of sources, the quality of data is faced with multiple problems. Motivated by this, we study the multiple data quality improvement on completeness, consistency and currency in this paper. For the proposed problem, we introduce a 4-step framework, named Improve3C, for detection and quality improvement on incomplete and inconsistent data without timestamps. We compute and achieve a relative currency order among records derived from given currency constraints, according to which inconsistent and incomplete data can be repaired effectively considering the temporal impact. For both effectiveness and efficiency consideration, we carry out inconsistent repair ahead of incomplete repair. Currency-related consistency distance is defined to measure the similarity between dirty records and clean ones more accurately. In addition, currency orders are treated as an important feature in the training process of incompleteness repair. The solution algorithms are introduced in detail with examples. A thorough experiment on one real-life data and a synthetic one verifies that the proposed method can improve the performance of dirty data cleaning with multiple quality problems which are hard to be cleaned by the existing approaches effectively.

Citations (2)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.