Semantic JSON Generation

Abstract

What started as a contribution towards enhancing the performance of the query optimizer in Apache Calcite, an actively growing open-source framework for building and managing databases, transitioned into the optimization of SQL queries in general. Building an intricate analytic system for any emerging real-world big data application requires complex queries. These applications demand very high performance and practical functionality in order to provide a high-level analysis of the system. Such complex queries contain many reusable conditional sub-expressions. The idea of reusability extends to big data systems in which querying happens in batches and the data involved is measured in terabytes and grows continuously. These systems are expected to deliver high performance by processing queries and returning results quickly. The main idea behind optimizing these queries is to examine how these conditional sub-expressions can be identified and exploited, resulting in efficient big data systems built on reuse. In this paper, reusable conditional sub-expressions are realized by building a directed acyclic graph for each sub-expression of a query and linking these graphs accordingly. An optimizer that takes multiple SQL statements as input provides an efficient way to enhance the performance of the system into which the SQL queries are plugged.
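
The sharing idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not use the Apache Calcite API. It assumes, purely for illustration, that each query's condition has already been parsed into nested tuples of the form (operator, operand, ...); the names `collect` and `shared_conditions` are hypothetical. Structurally equal sub-expressions map to the same node, so the nodes reached from more than one query are the candidates for reuse.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed encoding (not from the paper): a leaf is a string (column or
# literal); an inner node is a tuple (operator, operand, operand, ...).
Expr = Union[str, Tuple]


def collect(expr: Expr, owners: Dict[Expr, Set[int]], query_id: int) -> None:
    """Record that `expr` and all of its sub-expressions occur in `query_id`.

    Tuples compare and hash by structure, so equal sub-expressions coming
    from different queries land on the same dictionary key, i.e. the same
    DAG node.
    """
    owners.setdefault(expr, set()).add(query_id)
    if isinstance(expr, tuple):
        for operand in expr[1:]:  # expr[0] is the operator
            collect(operand, owners, query_id)


def shared_conditions(predicates: List[Expr]) -> Dict[Expr, Set[int]]:
    """Return the conditional sub-expressions used by more than one query."""
    owners: Dict[Expr, Set[int]] = {}
    for query_id, predicate in enumerate(predicates):
        collect(predicate, owners, query_id)
    # Keep only inner nodes (actual conditions), not bare columns or literals.
    return {
        expr: queries
        for expr, queries in owners.items()
        if isinstance(expr, tuple) and len(queries) > 1
    }


# Hypothetical WHERE clauses of two queries in the same batch.
q0 = ("AND", (">", "age", "30"), ("=", "country", "'DE'"))
q1 = ("OR", (">", "age", "30"), ("<", "salary", "1000"))

for expr, queries in shared_conditions([q0, q1]).items():
    print(expr, "is shared by queries", sorted(queries))
```

In this sketch the condition on `age` appears in both queries, so an optimizer working on the whole batch could evaluate it once and reuse the result.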

Shashank Prabhakar
MSc Capstone Student
Software Engineer at Microsoft