Semantic Data Understanding with Character Level Learning

Abstract

Databases are growing in size and complexity. With the emergence of data lakes, databases have become open, fast-evolving, and highly heterogeneous. Understanding the complex relationships among different entity types in such scenarios is both challenging and necessary for data scientists. We propose an approach that uses a convolutional neural network to learn, at the character level, the patterns associated with each entity type in the database. We demonstrate that the learned character-level patterns capture sufficient semantic information for many useful applications, including data lake schema exploration and interactive data cleaning.
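
To make the idea of character-level patterns concrete, the following is a minimal sketch, not the paper's model. It one-hot encodes a string over an assumed alphabet, slides a single convolution filter along the characters, and max-pools the responses. The alphabet, the fixed length `MAX_LEN`, and the helper names are all hypothetical. The filter is set by hand to respond to "digit, dash, digit"; a trained network would learn its filter weights from data.

```python
import string

# Hypothetical alphabet and input length; the paper's actual choices are not stated here.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
MAX_LEN = 16


def one_hot_encode(value, max_len=MAX_LEN):
    """Return a max_len x len(ALPHABET) matrix of 0.0/1.0 values.

    Unknown characters and padding positions become all-zero rows.
    """
    rows = []
    for ch in value.lower()[:max_len]:
        row = [0.0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0.0] * len(ALPHABET))
    return rows


def conv1d_max_pool(matrix, kernel):
    """Slide one kernel along the character axis, apply ReLU, then global max pooling."""
    width = len(kernel)
    best = 0.0  # starting at 0.0 makes max() act as ReLU followed by max pooling
    for start in range(len(matrix) - width + 1):
        score = sum(
            matrix[start + k][j] * kernel[k][j]
            for k in range(width)
            for j in range(len(ALPHABET))
        )
        best = max(best, score)
    return best


def digit_dash_digit_kernel():
    """Hand-built width-3 filter (illustration only): digit, '-', digit."""
    kernel = [[0.0] * len(ALPHABET) for _ in range(3)]
    for d in string.digits:
        kernel[0][CHAR_TO_INDEX[d]] = 1.0
        kernel[2][CHAR_TO_INDEX[d]] = 1.0
    kernel[1][CHAR_TO_INDEX["-"]] = 1.0
    return kernel


kernel = digit_dash_digit_kernel()
phone_score = conv1d_max_pool(one_hot_encode("555-1234"), kernel)
name_score = conv1d_max_pool(one_hot_encode("alice"), kernel)
print(phone_score, name_score)
```

The phone-like value activates the filter strongly while the name does not. A real model would stack many learned filters and feed the pooled features into a classifier over entity types.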

Publication
International Conference on Information Reuse and Integration for Data Science
Michael Mior
Assistant Professor

My research focuses on data integration and understanding for non-relational data.