Different methods to tackle Change Data Capture (CDC) & Data Replication

Disclaimer: All information I write is my own opinion and is not reflective of AWS or Amazon.

Why

Data replication is the process of making multiple copies of your data and storing them at multiple locations to improve the data's accessibility across the network.

Situations in which you may want to replicate your data include (but are not limited to):

  1. Data flowing from your application to your database needs to be replicated to a data lake or data warehouse
  2. You’d like to replicate data from a database to a secondary database location for Disaster Recovery (DR) processes or to perform Data Analytics

How

Common database technologies today either have built-in capabilities, or use third-party tools to accomplish data replication. Oracle, Microsoft SQL Server, MySQL and PostgreSQL all support replication natively, but replicating between different engines, or into a data lake or data warehouse, usually needs additional tooling.

I hope the following gives you some ideas on how you can architect methods to replicate data, both before and after the data hits your data storage solution.

1. Replicate the data before the data storage layer

The first method is potentially the easiest - and that is to replicate the data BEFORE the data storage layer.

Pros:

  • By replicating and handling the data before it reaches the data storage medium, you can fan out the data without having to manage or change the configuration of the database.

Cons:

  • You must ensure that the application or process handling this data is strongly consistent, because you are now introducing a potential point of failure.
  • You are responsible for writing the code that delivers the replicated data to each destination.
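
To make the fan-out idea concrete, here is a minimal sketch in Python. The `FanOutWriter` class and the two list-backed sinks are hypothetical stand-ins (in practice the sinks would be your database client and your data lake or stream producer). It also illustrates the first con: a failing sink has to be tracked and retried, otherwise the copies drift apart.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
Sink = Callable[[Record], None]


@dataclass
class FanOutWriter:
    """Delivers each record to every registered sink (hypothetical helper)."""
    sinks: Dict[str, Sink]
    # Records that a sink rejected, kept so they can be retried later.
    failed: List[tuple] = field(default_factory=list)

    def write(self, record: Record) -> bool:
        """Send the record to all sinks; return True only if every sink accepted it."""
        ok = True
        for name, sink in self.sinks.items():
            try:
                sink(record)
            except Exception as exc:  # one failing sink must not block the others
                self.failed.append((name, record, repr(exc)))
                ok = False
        return ok


# Assumption: plain lists stand in for a database and a data lake.
database_rows: List[Record] = []
data_lake_objects: List[Record] = []

writer = FanOutWriter(sinks={
    "database": database_rows.append,
    "data_lake": data_lake_objects.append,
})
writer.write({"order_id": 1, "status": "created"})
```

Anything that lands in `writer.failed` still needs a retry or reconciliation step, which is exactly the extra code you take on with this method.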

Data Replication - Before the storage layer

2. Utilise Read Replica(s)

You may be in a situation where you’d rather not handle the data replication yourself, and prefer to rely on read replicas to perform it. The benefit of this approach is that, by creating a read replica, you can offload read-heavy workloads to the additional copy of your data, reducing the load on your primary instance.

Pros:

  • Database managed feature, so no need to write application logic
  • Can take advantage of write or read replicas based on workflow requirements
  • Read Replicas can be promoted to a master node in the event of a failure

Cons:

  • Increased cost due to secondary instance
  • Homogeneous replication only: to go from MySQL to Postgres, for example, you’ll have to introduce another method or service.
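
On the application side, taking advantage of a read replica usually means sending reads and writes to different endpoints. The sketch below is one simple way to do that; the `ReplicaRouter` class and the endpoint names are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools
from typing import Iterator, List


class ReplicaRouter:
    """Routes writes to the primary and spreads reads across read replicas."""

    # Simplification: a statement counts as a read if it starts with one of these.
    READ_PREFIXES = ("select", "show")

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._cycle: Iterator[str] = self._new_cycle()

    def _new_cycle(self) -> Iterator[str]:
        # Round-robin over the replicas; empty iterator when there are none.
        return itertools.cycle(self.replicas) if self.replicas else iter(())

    def endpoint_for(self, sql: str) -> str:
        """Return the endpoint a statement should be sent to."""
        is_read = sql.lstrip().lower().startswith(self.READ_PREFIXES)
        if is_read and self.replicas:
            return next(self._cycle)
        return self.primary

    def promote(self, replica: str) -> None:
        """Treat a replica as the new primary, e.g. after a failover."""
        self.replicas.remove(replica)
        self.primary = replica
        self._cycle = self._new_cycle()


# Assumption: placeholder endpoint names, not real hosts.
router = ReplicaRouter("primary.db.internal", ["replica-1.db.internal", "replica-2.db.internal"])
```

Keep in mind that replicas are typically updated asynchronously, so a read routed to a replica can briefly return stale data.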

Data Replication - Read Replica

3. Aurora Database Activity Streams

If you are using Amazon Aurora, you can utilise the Database Activity Streams feature to provide near real-time streams of activities within the relational database.

Database Activity Streams are supported for both MySQL and PostgreSQL.
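
As a rough sketch of how you might switch the feature on from Python: the function below wraps the `start_activity_stream` call of the boto3 RDS client. The wrapper name, the cluster ARN and the KMS key are placeholders of my own, and I pass the client in as a parameter so the snippet does not need AWS credentials just to be read or tested.

```python
from typing import Any, Dict


def start_cluster_activity_stream(rds_client: Any, cluster_arn: str, kms_key_id: str) -> Dict[str, Any]:
    """Start an asynchronous activity stream on an Aurora cluster.

    `rds_client` is expected to behave like boto3.client("rds").
    """
    response = rds_client.start_activity_stream(
        ResourceArn=cluster_arn,   # ARN of the Aurora DB cluster
        Mode="async",              # "async" favours database performance, "sync" favours stream accuracy
        KmsKeyId=kms_key_id,       # activity records are encrypted with this KMS key
        ApplyImmediately=True,
    )
    # The response names the Kinesis data stream that carries the activity records.
    return {"stream": response["KinesisStreamName"], "status": response["Status"]}
```

With real credentials you would call it as `start_cluster_activity_stream(boto3.client("rds"), cluster_arn, kms_key_id)` and then read the returned Kinesis stream with a consumer of your choice; the records need to be decrypted with the same KMS key before use.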

4. AWS Database Migration Service (DMS)

In some instances, you may need to perform heterogeneous data replication between different database engines, such as MySQL to Postgres. Building and coding your own translation between database engines can prove frustrating, so you could instead use a service such as AWS Database Migration Service (DMS).

You could also use DMS as part of your data lake workflow by connecting DMS to your database, and then using AWS Glue to format the data. Ideally you want to always keep a copy of the original data before formatting, as part of your data retention and data lineage strategy.

This blog highlights this strategy.
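
Whichever target you pick, a DMS replication task needs table mappings that tell it which tables to replicate. Below is a small sketch that builds the selection rules as JSON; the helper name, the schema and the table names are made up for illustration.

```python
import json
from typing import List


def build_table_mappings(schema: str, tables: List[str]) -> str:
    """Return DMS table-mapping JSON that includes only the given tables."""
    rules = []
    for index, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(index),            # must be unique within the mapping
            "rule-name": f"include-{table}",  # must be unique within the mapping
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules}, indent=2)


# Assumption: hypothetical schema and table names.
mappings = build_table_mappings("shop", ["orders", "customers"])
```

The resulting string is what you would supply as `TableMappings` when creating the task, for example through the boto3 DMS client's `create_replication_task` with `MigrationType="full-load-and-cdc"` if you want an initial copy followed by ongoing changes.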

Data Replication - DMS

5. Third Party Tools & Binlog Replication

You may find yourself in a situation where none of the above is viable, and you prefer an open source method of data replication. You can look at utilising Debezium or look for other Change Data Capture (CDC) tools.
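
To give a feel for what consuming CDC output looks like, here is a minimal sketch that applies Debezium-style change events to an in-memory copy of a table. It relies on the `op`, `before` and `after` fields of the Debezium event envelope; the function name, the `id` key column and the sample events are my own assumptions, and a real consumer would read the events from Kafka rather than from a list.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change_event(replica: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica table."""
    # With schemas enabled the envelope sits under "payload"; otherwise it is the event itself.
    envelope = event.get("payload", event)
    op = envelope["op"]
    before: Optional[Row] = envelope.get("before")
    after: Optional[Row] = envelope.get("after")

    if op in ("c", "r", "u"):   # create, snapshot read, update: upsert the new row state
        replica[after[key]] = after
    elif op == "d":             # delete: drop the row identified by its old state
        replica.pop(before[key], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


# Assumption: hand-written sample events for a hypothetical orders table.
replica_table: Dict[Any, Row] = {}
events = [
    {"payload": {"op": "c", "before": None, "after": {"id": 1, "status": "created"}}},
    {"payload": {"op": "u", "before": {"id": 1, "status": "created"}, "after": {"id": 1, "status": "paid"}}},
    {"payload": {"op": "c", "before": None, "after": {"id": 2, "status": "created"}}},
    {"payload": {"op": "d", "before": {"id": 2, "status": "created"}, "after": None}},
]
for event in events:
    apply_change_event(replica_table, event)
```

Because every event carries the full new row state, replaying the same event twice leaves the replica unchanged, which is handy when the delivery guarantee is at-least-once.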
