Table of Contents
Data replication is a process of making multiple copies of data (replicating) and storing them at different locations (machines) to improve the availability, reliability, and fault tolerance of a software system. The multi-leader replication strategy is one of the popular algorithms for replicating changes between machines (a.k.a nodes).
What is Multi-Leader Replication Strategy?
There are multiple ways to replicate data between machines. The most common approach is to use the single-leader setup which promotes a single machine (which is often called a leader) to serve write requests. The leader node would propagate the updates to other machines (a.k.a replica or followers). Here is the illustration of the single-leader replication setup:
The natural extension of the single-leader formation is to increase the number of nodes handling write requests for various benefits (e.g., write performance, network partition). We call this the multi-leader replication. In this setup, multiple machines are promoted to serve write requests and send updates to other machines asynchronously. The setup would look something like this:
When to Use Multi-Leader Replication Strategy?
You might be wondering, “why the single-leader setup is the most popular replication algorithm?” It’s because of its simplicity. It rarely makes sense to use the multi-leader setup due to its complexity, which will be discussed in the next section in detail. However, there are several scenarios where the benefits outweigh the complexity:
- Multi-datacenter operation
- Offline-mode Support
- Collaborative Editing
Imagine a system that handles requests coming from multiple regions (e.g., North America, Asia) to support international users. You would need machines in multiple regions to provide the best latency across the globe. With the single-leader setup, you would have to redirect all the write requests to the leader machine, which could be located across the globe:
With the multi-leader setup, you can have one leader in the US and another leader located in Asia to handle local writes without needing cross-datacenter redirects:
Furthermore, since the cross-datacenter requests are expensive, we can have a synchronizer that syncs data across data centers periodically (e.g., every hour), instead of sending updates for every write requests:
Aside from the performance improvement, the multi-leader setup would allow tolerating datacenter outages or network failures. However, the benefits come with a big downside of allowing the same data concurrently being modified in different datacenters. If strong consistency is an important requirement for your system, the multi-leader setup might not be the best setup for replication.
Another use case is to support offline mode. The offline mode is prevalent nowadays in a lot of services (e.g., Google Maps, Evernote). It allows users to continue using services without a network connection.
To make this happen, your local machine/device would be acting as another write replica and become responsible for handling write requests and syncing the data to other machines when the network connection comes back online. Such setup could look like this:
When you are using a collaborative editing tool, you would have noticed that your edits are instantly applied. This is possible because once again, your machine is acting as a write replica to handle edit requests and send updates asynchronously to other machines.
You might have seen an error message like “Something went wrong, please refresh.” Such error can happen when multiple users attempt to make updates to the same data and the system cannot automatically resolve the write conflicts.
What’s the Complexity in Using Multi-Leader Setup?
We briefly mentioned above that the multi-leader setup can be unfavorable due to its complexity. Let’s look at what causes the complexity in this setup. We will look at the two main complexities in using the multi-leader algorithm:
- Handling write-conflicts
- Inconsistent read
Handling Write Conflicts
What is a conflict? Imagine a situation where two users are trying to change the title of a document simultaneously and the edit requests are handled by two different leader nodes. How do you know in what order to apply the edit request or how to merge two requests?
In principle, we can make the conflict detection synchronous by sending the updates to replicas synchronously. Then, we decide to either wait or abort the concurrent write requests. However, it would vanish all the benefits of the multi-leader setup.
There are multiple approaches to handling write conflicts. For instance, you can configure to avoid write conflicts by making sure the same data is handled by the same leader node. Or you can decide to implement some algorithm to automatically resolve conflicts in a consistent manner (e.g., using a unique ID composed of timestamp and randomly generated number), which would potentially incur data loss.
To make things a bit more complicated, imagine if the replication updates are lost (e.g., due to network issues). Or the replication updates arrive in other nodes with a significant delay or out-of order.
For now, we will leave it here by just understanding the complication involved in handling write conflicts and sometimes even detecting the occurrences of concurrent writes. We will use another blog to dive deeper into this concept.
Another problem with the multi-leader setup is handling inconsistent read. The replication updates are done asynchronously so we can potentially end up with a situation where two nodes have different views of the data. If a user somehow ends up sending read requests to both machines, they might end up with reading inconsistent data:
The inconsistent read problem also exists in the single-leader setup because read requests can be handled by any replicas (even if a replica is not fully synced with the leader node). However, since the multi-leader setup allows more nodes to handle write requests, it requires more synchronization between nodes and potential cases of inconsistency between machines.
With that, I hope you have a better understanding of the multi-leader replication strategy. Even with the advantages it provides (e.g., performance, network partition tolerance), it comes with additional complexity and inconsistent read problems. Also, it might be much harder to operate or maintain due to the complexity.
Like with any other system design concept, you can only make the optimal decision by thoroughly assessing the requirements, understanding the pros and cons of different approaches, and making the right trade-offs.
Here are some list of topics you might be interested to read further to better understand about replication:
- How to handle different node outages?
- When to use single-leader replication strategy?
- When to use leaderless replication strategy?
- What are some problems with replication lags?
- How to detect and handle write conflicts in distributed system?