Papers
Topics
Authors
Recent
Search
2000 character limit reached

Semantic Labeling Using a Deep Contextualized Language Model

Published 30 Oct 2020 in cs.LG, cs.DB, and cs.IR | (2010.16037v1)

Abstract: Generating schema labels automatically for column values of data tables has many data science applications such as schema matching, and data discovery and linking. For example, automatically extracted tables with missing headers can be filled by the predicted schema labels which significantly minimizes human effort. Furthermore, the predicted labels can reduce the impact of inconsistent names across multiple data tables. Understanding the connection between column values and contextual information is an important yet neglected aspect as previously proposed methods treat each column independently. In this paper, we propose a context-aware semantic labeling method using both the column values and context. Our new method is based on a new setting for semantic labeling, where we sequentially predict labels for an input table with missing headers. We incorporate both the values and context of each data column using the pre-trained contextualized LLM, BERT, that has achieved significant improvements in multiple natural language processing tasks. To our knowledge, we are the first to successfully apply BERT to solve the semantic labeling task. We evaluate our approach using two real-world datasets from different domains, and we demonstrate substantial improvements in terms of evaluation metrics over state-of-the-art feature-based methods.

Citations (14)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.