Semantic JSON Generation

Abstract

Semantic types describe entity types and the data those types hold. Detecting semantic types has been a challenge in recent years: most machine learning models fail to detect them accurately when applied to dirty data. These models have generally been trained on relational databases, and how models trained on JSON datasets would perform is still unknown. I introduce a method for creating JSON data files that can be used to train semantic type detection models. Using the Sherlock dataset, I created JSON data files based on the relationships found among its semantic types, which I determined using the DBpedia ontology. I identified several types of relationships between the semantic types and used them to generate semantic JSON data files. However, the final JSON data files contained anomalies for some semantic types. To evaluate the results, I traced these anomalies from the Sherlock dataset back to its source dataset and found that the source dataset was already corrupted when the Sherlock dataset was created.
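
To illustrate how relationships between semantic types can shape a JSON document, here is a minimal Python sketch, assuming the relationships are available as a parent-to-child mapping and that each leaf type has a pool of sample values (standing in for dataset columns). The `RELATIONS` and `VALUES` tables and the `build` and `generate_records` functions are hypothetical; they are not the DBpedia-derived relationships or the generation code used in this work.

```python
import json
import random

# Hypothetical parent -> child relationships between semantic types.
# Illustrative only; NOT the relationships derived from DBpedia in this work.
RELATIONS = {
    "person": ["name", "birthDate", "birthPlace"],
    "birthPlace": ["city", "country"],
}

# Hypothetical sample values per leaf semantic type, standing in for
# the columns of a Sherlock-style dataset.
VALUES = {
    "name": ["Ada Lovelace", "Alan Turing"],
    "birthDate": ["1815-12-10", "1912-06-23"],
    "city": ["London"],
    "country": ["United Kingdom"],
}


def build(semantic_type, rng):
    """Recursively build a JSON value for one semantic type.

    Types that have related child types become nested objects;
    leaf types get a value sampled from VALUES. Assumes RELATIONS
    is acyclic, otherwise the recursion would not terminate.
    """
    children = RELATIONS.get(semantic_type)
    if children:
        return {child: build(child, rng) for child in children}
    return rng.choice(VALUES[semantic_type])


def generate_records(root, n, seed=0):
    """Generate n JSON-serializable records rooted at `root`."""
    rng = random.Random(seed)  # seeded for reproducible output
    return [{root: build(root, rng)} for _ in range(n)]


if __name__ == "__main__":
    print(json.dumps(generate_records("person", 2), indent=2))
```

In this sketch a related type such as `birthPlace` becomes a nested object instead of a flat column, which is the kind of structure a relational table does not capture directly.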

Chirag Goel
MSc Capstone Student
Software Engineer at Meta