Enhancing JSON Schema Discovery by Uncovering Hidden Data

Abstract

Schema discovery is finding the structure of data. It helps users understand the meaning of data and write queries to manipulate it. This is typically easy for relational databases, but complex for non-relational (NoSQL) databases with JavaScript Object Notation (JSON) documents. JSON is a representation of documents that contain objects stored in the form of nested key-value pairs. For relational databases, the schema is predefined because the data they contain is structured, but for NoSQL databases, data is usually unstructured or semi-structured. In a collection of JSON documents, the structure of one document can be completely different from another. Several algorithms were developed to discover schemas from JSON documents, but they provide the physical structure and semantic information that is insufficient for data understanding and analysis. In this paper, we enumerate the major techniques used to extract a schema from JSON documents and present the next challenge: uncovering hidden data disguised as metadata. This challenge needs to be addressed within the field of JSON schema discovery to enhance the quality of the discovered schemas.

Publication
VLDB PhD Workshop
Justin Namba
Justin Namba
PhD Student, NRT Trainee