This Is How A Scalable Data Architecture Succeeds With CQRS

If several applications share a database that cannot meet their individual requirements, the Command Query Responsibility Segregation (CQRS) architecture pattern can help. It separates write access from read access and serves each from its own database.

When applications interact with a database, they mostly use the standard CRUD operations (Create, Read, Update, Delete). It is relatively common for write operations to have different requirements (e.g. scaling, schema) than read operations. As a result, applications often end up sharing a database that cannot meet their specific needs. For example, an e-commerce application often uses a relational database to store orders. This allows complex queries to be executed across multiple tables when reading. On the other hand, relational databases are, due to their architecture, typically harder to scale for write-heavy workloads.

The Command Query Responsibility Segregation (CQRS) architecture pattern provides a remedy for such problems. It separates write access from read access and serves each from its own database.

Architecture Pattern: Command Query Responsibility Segregation (CQRS)

The CQRS architecture pattern is based on treating operations that change the data state (create, update, delete) separately from read operations. Separating the two access patterns allows the individual requirements for scaling, latency, or schema to be met with the database best suited to each. To keep both data stores in the same state, changes must be replicated from the write database to the read database in a background process that is invisible to the application.

The need to replicate data between the write and read databases, and the delay this introduces, make CQRS best suited to applications that can tolerate relaxed read-after-write consistency (eventual consistency): for a short time after a write, a read may still return the old state.
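The separation and its consistency trade-off can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the class `ProductService`, its method names, and the use of plain dictionaries as stand-ins for the two databases are all assumptions made for this example.

```python
from collections import deque


class ProductService:
    """Toy CQRS setup: commands hit the write store, queries hit the read store."""

    def __init__(self):
        self.write_store = {}   # stands in for the write database
        self.read_store = {}    # stands in for the read database
        self.pending = deque()  # changes that have not been replicated yet

    # Command side: changes state in the write store only.
    def upsert_product(self, product_id, name):
        self.write_store[product_id] = {"id": product_id, "name": name}
        self.pending.append(("upsert", product_id))

    def delete_product(self, product_id):
        self.write_store.pop(product_id, None)
        self.pending.append(("delete", product_id))

    # Query side: never touches the write store.
    def get_product(self, product_id):
        return self.read_store.get(product_id)

    # Replication step; in a real system this runs in the background.
    def replicate(self):
        while self.pending:
            op, product_id = self.pending.popleft()
            if op == "upsert" and product_id in self.write_store:
                self.read_store[product_id] = dict(self.write_store[product_id])
            else:
                self.read_store.pop(product_id, None)


service = ProductService()
service.upsert_product(1, "Green scarf")
print(service.get_product(1))  # None: the read store has not caught up yet
service.replicate()
print(service.get_product(1))  # {'id': 1, 'name': 'Green scarf'}
```

The first read returns nothing because replication has not run yet, which is exactly the window of eventual consistency an application using CQRS has to tolerate.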

The separation of write and read access to two different databases can be easily combined with other architectural patterns like microservices or event sourcing. When using microservices, complex processes are divided into independent services that communicate via interfaces. When using CQRS, a separate microservice can be created for write and read access, but this is not necessary since the separation can also occur in the application code.

But how does CQRS integrate with event sourcing? With event sourcing, data changes are not written directly to the database but are appended as events to an event log (such as Apache Kafka). The event log replaces the write database, and event handlers replicate the data state into the read database. The event log contains both the most current state and the complete history of a record. This means that not only the latest state can be replicated into a read database, but also the state at any earlier point in time.
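The idea of rebuilding a read model from the log, including the state at an earlier point in time, can be illustrated with a small sketch. It uses a plain Python list in place of a real event log such as Kafka; the `Event` class, the event kinds, and the `project` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # "created", "price_changed" or "deleted"
    product_id: int
    data: dict


def project(events, upto=None):
    """Fold events into a read model; `upto` limits replay to the first N events."""
    state = {}
    for event in events[:upto]:
        if event.kind == "created":
            state[event.product_id] = dict(event.data)
        elif event.kind == "price_changed":
            state[event.product_id]["price"] = event.data["price"]
        elif event.kind == "deleted":
            state.pop(event.product_id, None)
    return state


# The append-only log is the source of truth; read models are derived from it.
log = [
    Event("created", 1, {"name": "Green scarf", "price": 20}),
    Event("price_changed", 1, {"price": 15}),
    Event("deleted", 1, {}),
]

print(project(log, upto=2))  # state after the price change
print(project(log))          # latest state: the product has been deleted
```

Replaying only a prefix of the log yields the state as it was at that point, which is what makes historical read models possible without storing snapshots in the write path.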

Product Database Replication From MySQL To Elasticsearch Using Python

Elasticsearch is a distributed search and analytics engine for many kinds of data, including text, geospatial, structured, and unstructured data. Elasticsearch's distributed architecture allows for high scalability and performance with large amounts of data. Compared to Elasticsearch, SQL databases such as MySQL are less well optimised for use cases such as full-text search. This is because Elasticsearch indexes text data differently than a SQL database.

A product database can serve as a practical example. An online shop stores its product catalogue in a MySQL database. This works well for inserting, updating, and deleting products. Now the company has decided to introduce a full-text search for products. The MySQL database reaches its limits here, since customers do not search for a specific product ID but for keywords such as "green scarf for women", which the indexes of the SQL database do not cover. In such a case, the full-text search queries can be routed to Elasticsearch using a CQRS architecture. The requests for changing and adding products remain on the MySQL database.
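To make the difference in indexing concrete, the following sketch builds a toy inverted index, the general data structure that search engines use for keyword lookups. It is a deliberately simplified illustration and not how Elasticsearch is implemented internally (there is no stemming, scoring, or distribution here); the function names and the sample products are assumptions for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(products):
    """Map each token to the set of product IDs whose name contains it."""
    index = defaultdict(set)
    for product_id, name in products.items():
        for token in tokenize(name):
            index[token].add(product_id)
    return index


def search(index, query):
    """Return the IDs of products that contain every token of the query."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


products = {
    1: "Green scarf for women",
    2: "Red scarf for men",
    3: "Green hat for women",
}
index = build_index(products)

print(search(index, "green scarf"))  # [1]
print(search(index, "women"))        # [1, 3]
```

Because the lookup starts from the keyword rather than from the row, a query such as "green scarf" does not need to scan every product name.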

Tools in different programming languages are available to keep the two data stores in sync. The following example uses python-mysql-replication.

To use this library, the binary log must be activated in the configuration file of the MySQL server, with row-based logging enabled (binlog_format = ROW) and a server ID set. This log contains events describing database changes.
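A minimal sketch of such a replication process is shown below. It assumes the python-mysql-replication package (imported as `pymysqlreplication`) and a recent `elasticsearch` Python client that accepts the `document=` argument. The table name `products`, the primary key column `id`, the server ID `100`, and the helper `row_to_action` are assumptions for this example, not part of either library.

```python
def row_to_action(kind, row, index="products", id_field="id"):
    """Translate one binlog row change into a simple action dictionary.

    `kind` is "insert", "update" or "delete". Update rows carry
    "before_values"/"after_values"; insert and delete rows carry "values".
    """
    doc = row["after_values"] if kind == "update" else row["values"]
    if kind == "delete":
        return {"op": "delete", "index": index, "id": doc[id_field]}
    return {"op": "index", "index": index, "id": doc[id_field], "document": doc}


def replicate(mysql_settings, es_url):
    """Stream row changes from the MySQL binary log into Elasticsearch."""
    # Imported here so that row_to_action works without the libraries installed.
    from elasticsearch import Elasticsearch
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
        DeleteRowsEvent,
        UpdateRowsEvent,
        WriteRowsEvent,
    )

    kinds = {
        WriteRowsEvent: "insert",
        UpdateRowsEvent: "update",
        DeleteRowsEvent: "delete",
    }
    es = Elasticsearch(es_url)
    stream = BinLogStreamReader(
        connection_settings=mysql_settings,
        server_id=100,               # assumed value; must be unique among replicas
        only_events=list(kinds),
        only_tables=["products"],    # assumed table name
        blocking=True,               # keep waiting for new events
        resume_stream=True,          # start from the current log position
    )
    try:
        for event in stream:
            for row in event.rows:
                action = row_to_action(kinds[type(event)], row)
                if action["op"] == "delete":
                    es.delete(index=action["index"], id=action["id"])
                else:
                    es.index(
                        index=action["index"],
                        id=action["id"],
                        document=action["document"],
                    )
    finally:
        stream.close()
```

It could be started with something like `replicate({"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"}, "http://localhost:9200")`, where the connection values are placeholders. Column values such as decimals or dates may need to be converted before Elasticsearch can serialise them, and a production setup would also persist the log position and handle errors and retries.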

Implementation On AWS

There are different ways to replicate datasets. A decisive factor here is which database engines are used for the source and target systems. Many databases support some form of change data capture (CDC), where changes to data are captured in near real time and then pushed to the target system. With a MySQL database as the source system, the binary log can be used to capture changes and stream them to a target system.

Stream Changes With The AWS Database Migration Service (DMS)

With AWS DMS, data can be transferred between homogeneous database platforms, such as MySQL to MySQL, and between heterogeneous database platforms, such as MySQL to Amazon Elasticsearch Service (since renamed Amazon OpenSearch Service). With this service, both an initial migration and continuous data replication can be carried out using CDC. When using DMS for CQRS purposes, the source and target are usually different database platforms. It may therefore be necessary to convert the schema. AWS provides the AWS Schema Conversion Tool (SCT) for this.
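As a rough sketch, a DMS task that performs an initial load followed by continuous replication could be described as follows. The helper `build_task_params`, the task naming scheme, and the rule name are assumptions for this example; the parameter names follow the `create_replication_task` call of the boto3 DMS client, and the endpoint and instance ARNs must come from resources that already exist in the account.

```python
import json


def build_task_params(source_arn, target_arn, instance_arn, schema, table):
    """Assemble keyword arguments for boto3's DMS create_replication_task call."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-products",  # assumed rule name
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": f"{schema}-{table}-to-search",
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",         # initial load, then ongoing CDC
        "TableMappings": json.dumps(table_mappings),  # DMS expects a JSON string
    }
```

With valid ARNs, the result could be passed on as `boto3.client("dms").create_replication_task(**params)`. The selection rule restricts replication to a single table, which keeps the read store limited to the data the search feature actually needs.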

Materialised Views With AWS Glue Elastic Views

With this function of AWS Glue, it is possible to create so-called materialised views without writing any application code, since the view is defined using SQL commands. Since AWS Glue Elastic Views, in contrast to DMS, is designed as a "serverless" service, there are no individual servers to manage when using it. This also means that capacity is scaled automatically depending on the data throughput. At the time of writing, Amazon DynamoDB, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service are supported, and support for Amazon RDS and Amazon Aurora has been announced.

Conclusion

Command Query Responsibility Segregation (CQRS) is an architectural pattern for treating write and read operations separately from each other so that the appropriate data store can be used for the respective access pattern. This allows the performance of applications to be optimised, but additional costs can also arise from the necessary replication and from operating two data stores. CQRS can be implemented both in on-premises data centres and in the cloud. When CQRS is operated in a cloud environment, managed database systems such as Amazon RDS can be used, which reduces the management effort for databases. In addition, AWS provides two managed services for replication, AWS DMS and AWS Glue Elastic Views.
