Data migration for column family database evolution

Abstract

Context: Database evolution involves several processes: evolving the schema, adapting the application to the new schema, and migrating data to the new or modified schema structures. Data migration is particularly crucial in databases where data repetition is common, such as NoSQL column-family DBMSs. In these systems, data integrity cannot be enforced by the database and must instead be maintained by the application. Data repetition and the lack of database-side integrity enforcement also affect database evolution, because any change to the schema requires data migrations to maintain data integrity.

Objectives: Ensure data integrity in NoSQL column-family DBMSs during database evolution by providing specific instructions for executing the necessary data migrations.

Methods: We propose MoDEvo, a model-driven engineering approach that provides a data migration model to ensure data integrity during database evolution in column-family DBMSs. This model is then transformed into an executable script that implements the migration procedures.

Results: We evaluate MoDEvo by executing data migrations in case studies drawn from open-source projects whose schemas evolved. The evaluation uses Apache Cassandra, the most popular column-family DBMS. It verifies that the scripts generated from the data migration model effectively maintain data integrity within the database.

Conclusion: MoDEvo aids database evolution in column-family DBMSs by avoiding the creation of inconsistencies, and it can also detect impossible migrations, thereby preventing errors. There is still room for improvement, such as extending support to databases of other paradigms where data repetition is common and addressing the evolution of client applications alongside schema evolution.
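
The abstract does not show MoDEvo's migration model or its generated scripts, so the following is only an illustrative sketch of the underlying problem, not the authors' method. It assumes two hypothetical denormalized tables that repeat the same book data (modeled as in-memory lists of dicts rather than real Cassandra tables), adds a column to one of them, fills it from the table that already stores the value, and reports the migration as impossible when no table can supply the data. All table, column, and function names are invented for illustration.

```python
class ImpossibleMigration(Exception):
    """Raised when no existing table holds the data a new column needs."""


def migrate_new_column(tables, target, column, key):
    """Fill `column` in `target` by copying it from another table that stores it.

    `tables` maps a (hypothetical) table name to a list of row dicts.
    Returns the name of the table used as the data source.
    """
    # Look for a different table in which every row carries both the key
    # and the column being added to the target.
    source = next(
        (name for name, rows in tables.items()
         if name != target and rows
         and all(column in row and key in row for row in rows)),
        None,
    )
    if source is None:
        raise ImpossibleMigration(
            f"no table can supply '{column}' for '{target}'")

    lookup = {row[key]: row[column] for row in tables[source]}
    for row in tables[target]:
        if row[key] not in lookup:
            raise ImpossibleMigration(
                f"'{source}' has no row with {key}={row[key]!r}")
        row[column] = lookup[row[key]]
    return source


def is_consistent(tables, column, key):
    """Return True if all tables storing `column` agree on its value per key."""
    seen = {}
    for rows in tables.values():
        for row in rows:
            if column in row and seen.setdefault(row[key], row[column]) != row[column]:
                return False
    return True


# Two tables repeating the same books; only one stores the publisher so far.
tables = {
    "books_by_id": [
        {"book_id": 1, "title": "A", "publisher": "P1"},
        {"book_id": 2, "title": "B", "publisher": "P2"},
    ],
    "books_by_author": [
        {"author": "X", "book_id": 1, "title": "A"},
        {"author": "Y", "book_id": 2, "title": "B"},
    ],
}
migrate_new_column(tables, "books_by_author", "publisher", "book_id")
```

In a real column-family DBMS the same steps would be reads from the source table followed by writes to the target table, issued by the application or by a migration script, since the database itself does not keep the repeated values in sync.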

Publication
Information and Software Technology
Michael Mior
Assistant Professor

My research focuses on data integration and understanding for non-relational data.