Papers
Topics
Authors
Recent
Search
2000 character limit reached

Topology-Aware Subset Repair via Entropy-Guided Density and Graph Decomposition

Published 27 Jan 2026 in cs.DB | (2601.19671v1)

Abstract: Subset repair is an important data cleaning technique that enforces integrity constraints by deleting a minimal number of conflicting tuples, yet multiple minimal repairs often exist. Density-based methods address this ambiguity by favoring repairs that preserve dense, high-quality data regions; however, their effectiveness is limited by density bias from dirty clusters, high computational cost, and uniform attribute weighting. We propose a topology-aware approximate subset repair framework based on a joint density-conflict penalty model. The framework integrates three key components. First, a two-layer conflict detection strategy combines attribute inverted indexes with CFD rule grouping to efficiently identify violations. Second, we introduce EntroCFDensity, a density metric that incorporates information entropy and CFD weights to dynamically adjust attribute importance and reduce homogeneity bias. Third, a conflict degree measure is defined to complement local density, enabling a topology-adaptive penalty mechanism with dynamic weight allocation guided by the coefficient of variation. The conflict graph is further decomposed into independent subgraphs, transforming global repair into tractable local subproblems. Based on this framework, we develop two algorithms: PPIS, a scalable heuristic, and MICO, a mixed-integer programming method with theoretical guarantees. Experimental results show that our approach improves repair accuracy and robustness while effectively preserving high-quality data.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.