What is Two Phase Commit in Distributed Transaction?
Table of Contents
- Introduction
- What is Two Phase Commit (2PC)?
- When to Use Two Phase Commit?
- Caveat in Using Two Phase Commit
- How to Implement Two Phase Commit?
- Other Similar Algorithms
- Conclusion
Introduction
Consensus is one of the most important and fundamental problems in distributed computing. The goal of consensus algorithms is pretty simple: get multiple machines (a.k.a nodes) to agree on something. In reality, this is quite hard to achieve due to various failure scenarios (e.g., network partition).
What is Two Phase Commit (2PC)?
Two-phase commit (a.k.a 2PC) is an algorithm for achieving atomic transaction commit across multiple nodes. By definition of atomic transaction, all nodes participating in 2PC must either successfully commit or abort together.
One node would be acting as a coordinator (a.k.a transaction manager) to initiate 2PC. There are two phases to this algorithm:
- prepare phase: asking other nodes whether they can commit the proposed transaction.
- commit phase: commanding other nodes to either commit or abort the proposed transaction.
Prepare Phase: 1. prepare -> 2. acquire lock -> 3. ack (yes/no)
Commit Phase: 1. commit -> 2. commit & release lock -> 3. ack (ok)
What does Saying “Yes” Mean?
During the prepare phase, once a node accepts the proposed transaction, it must commit the transaction if the coordinator sends the commit request, regardless of any failure scenarios. In order to respect this promise, the participant node must write transaction data in disk before sending the commit promise to the coordinator. So when it recovers from failure, it knows which transaction to commit. The coordinator will retry forever to broadcast to either abort/commit.
When to Use Two Phase Commit?
You can use 2PC algorithm to achieve consensus on all nodes (i.e. making all nodes agree on the outcome of a transaction) in a distributed system. One really interesting use case is to implement exactly-once message processing using 2PC. Please read this article: “An Overview of End-to-End Exactly-Once Processing in Apache Flink” if you are interested why two phase commit algorithm must be used to achieve the exactly once semantic.
Another use case is to perform an atomic transaction commit across multiple nodes. This is often needed when your system has a hard requirement to have a consistent view of data all time.
Caveat in Using Two Phase Commit
2PC is often called a “blocking” atomic commit protocol (or “anti-availability” protocol) because all nodes/members must be up for it to work. Especially, if a coordinator dies, all nodes would have to wait to hear the final decision.
Coordinator Failure
After receiving prepare responses from all nodes, a coordinator must writes the final decision (either to commit or abort) in disk before sending the final decision to other nodes. So in case the coordinator dies in the middle of the protocol, it knows how to recover. If the coordinator crashes before making the final decision for the commit phase, it can simply send the abort request for to all nodes.
Moreover, until the coordinator recovers, all nodes that promised to commit the transaction would have to prohibit read/write operation on the data affected by the transaction. Time-out would not help because participant nodes cannot simply abort the transaction after certain period because it promised to follow the final decision from the coordinator. Therefore, if the coordinator does not recover successfully, a person might have to manually intervene to make decision on the ongoing transaction on each participant nodes.
Google Spanner, for example, mitigate this drawback by having each member be a Paxos group, thus ensuring each 2PC “member” highly available even if some of its Paxos participants are down. Data is divided into groups that form the basic unit of placement and replication. Refer to this paper for more information.
How to Implement Two Phase Commit?
The simplest way to implement 2PC is using Zookeeper, which is an open-source and a high-performance coordination service for distributed applications. The “Two-phased Commit” section in the article describes how to implement 2PC in detail.
In real-life, you won’t have to implement 2PC from scratch unless you are part of an infrastructure team building out something new. Even then, it’s highly likely that you will leverage open source like Zookeeper to make things easier. Remember, “do not reinvent the wheel.”
Other Similar Algorithms
There are other similar algorithms to achieve similar things with different requirements.
Paxos
Paxos is an algorithm to reach consensus on something across multiple nodes. The main difference with 2PC is that it does not require agreement from all nodes. It requires agreement from the majority of the nodes. By relaxing the consensus requirement from all nodes to majority of nodes, it improves the availability and performance aspect of the distributed transaction.
Unless atomic commit is strictly required (e.g., in exactly-once message processing), Paxos algorithm is preferable due to its availability and performance benefits. Refer to this video to learn more about Paxos algorithm.
Two Phase Locking (2PL)
2PL is not similar to 2PC. You can think of them as different thing. A lot of people get confused the two because of their similar name. 2PL is a locking algorithm to achieve serializability (i.e. making sure the same data is not concurrently modified at the same time) in a single machine. Whereas, 2PC is an algorithm to achieve atomic commit in a distributed computing. 2PC actually uses 2PL to make sure the same data is not concurrently modified by other process while it is in a distributed transaction.
Conclusion
2PC is a powerful algorithm to achieve consensus on all nodes in a distributed system. However, it comes with certain disadvantages (e.g., performance, availability). Like any other system design concept, it’s crucial to understand one’s requirement and limitation to make an optimal choice for your design.
I hope this article helps you to understand pros and cons of 2PC algorithm and to build capability to throughly assess trade-off between different algorithms for your future system designs. Please comment below if you have any further questions or concerns.