Papers
Topics
Authors
Recent
Search
2000 character limit reached

TreeCat: Standalone Catalog Engine for Large Data Systems

Published 4 Mar 2025 in cs.DB | (2503.02956v1)

Abstract: With ever increasing volume and heterogeneity of data, advent of new specialized compute engines, and demand for complex use cases, large scale data systems require a performant catalog system that can satisfy diverse needs. We argue that existing solutions, including recent lakehouse storage formats, have fundamental limitations and that there is a strong motivation for a specialized database engine, dedicated to serve as the catalog. We present the design and implementation of TreeCat, a database engine that features a hierarchical data model with a path-based query language, a storage format optimized for efficient range queries and versioning, and a correlated scan operation that enables fast query execution. A key performance challenge is supporting concurrent read and write operations from many different clients while providing strict consistency guarantees. To this end, we present a novel MVOCC (multi-versioned optimistic concurrency control) protocol that guarantees serializable isolation. We conduct a comprehensive experimental evaluation comparing our concurrency control scheme with prior techniques, and evaluating our overall system against Hive Metastore, Delta Lake, and Iceberg.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.