Papers
Topics
Authors
Recent
Search
2000 character limit reached

OpenForge: Probabilistic Metadata Integration

Published 13 Dec 2024 in cs.DB | (2412.09788v1)

Abstract: Modern data stores increasingly rely on metadata for enabling diverse activities such as data cataloging and search. However, metadata curation remains a labor-intensive task, and the broader challenge of metadata maintenance -- ensuring its consistency, usefulness, and freshness -- has been largely overlooked. In this work, we tackle the problem of resolving relationships among metadata concepts from disparate sources. These relationships are critical for creating clean, consistent, and up-to-date metadata repositories, and a central challenge for metadata integration. We propose OpenForge, a two-stage prior-posterior framework for metadata integration. In the first stage, OpenForge exploits multiple methods including fine-tuned LLMs to obtain prior beliefs about concept relationships. In the second stage, OpenForge refines these predictions by leveraging Markov Random Field, a probabilistic graphical model. We formalize metadata integration as an optimization problem, where the objective is to identify the relationship assignments that maximize the joint probability of assignments. The MRF formulation allows OpenForge to capture prior beliefs while encoding critical relationship properties, such as transitivity, in probabilistic inference. Experiments on real-world datasets demonstrate the effectiveness and efficiency of OpenForge. On a use case of matching two metadata vocabularies, OpenForge outperforms GPT-4, the second-best method, by 25 F1-score points.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.