Get latest articles directly in your inbox
CAP Theorem is one of the most common and important topic for system design interviews. In the world of distributed system it is important to understand the concept of CAP theorem that enables us to choose better design based on the requirements. In this blog we will deep dive into the details of CAP theorem and clear out the misconceptions around it.
Preparing for Interviews? I highly recommend Mastering System Design course on Educative.
Understanding the CAP characteristics
Before learning about what CAP theorem states, we need to understand the three characteristics - Consistency, Availability and Partition Tolerance.
Consistency essentially means that a read request by any client should receive the same data at same point of time irrespective of which node the call is made to in the distributed system. So every read receives the most recent write or an error (if occured). Consistency in CAP actually means linearizability and is not related to C in ACID property.
Imagine there are two nodes N1 and N2 in a system. At
t = 0 we have
a = 1. Not at
t = 1 node N1 recieves a write request with
a = 1. Then, at
t = 2 we receive 2 read requests - one at N1 and other at N2. If the system is consistent it should return
a = 1 for both read request on N1 and N2.
If a system wants to be consistent then it has to instantly forward or replicate data to all the other nodes in the distributed system whenever data is written to one node.
This characteristic is fulfilled if a system can serve every request it receives with a non-error response. Since we are always returning a result, we are fine even if the data is not the most recent write as of now but will be in future requests. A very simple example is that a person should be able to search for a product on Amazon even if the some of the nodes for this service are down. So every non-failing node in a system should return a valid response.
A system is “Partition tolerant” if it continues to operate despite an arbitrary number of messages being dropped (due to network failure) or delayed by the network between nodes. This system is allowed to lose messages sent from one node to another if the communication goes down/delayed due to any network issue. But, we know that in a distributed world, network failures are always possible and can happen at any point in time. It is one of the fallacy of distributed computing is that networks are reliable.
What is CAP Theorem?
CAP theorem states that it is impossible for a distributed data system to simultaneously provide more than two out of the following three guarantees.
- Partition tolerance
It is also known as Brewer’s theorem after computer scientist Professor Eric Brewer who first advanced this in 2000.
As we learned in the previous section that no distributed system is safe from network failures, thus partition tolerance is always present. Based on CAP theorem, we are left with choosing either consistency or availability. Let’s learn about both options and when to choose one over other.
Consistency over Availability
In this case the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning. An example would be a financial system where you need data to be consistent for all request. Imagine getting a higher price in a stock trade while the actual price has reduced. Obviously this is not acceptable and thus consistency over availability is preferred in such systems.
Availability over Consistency
Here the system will always process the query and try to return the most recent available version of the information irrespective of whether it can guarantee latest data or not. Example for this would apply for sites like amazon, flipkart, etc where it is okay to show delayed or limited response to customer search queries rather than showing nothing at all.
Misconceptions around CAP theorem
Now that you are aware about the CAP theorem and this might help you in your system design interview. But, CAP uses very narrow definitions and is too simplistic. It specifies a specific model of the distributed system where only network failures are considered. I would highly recommend you to read this brilliant post by Martin Kleppmann to understand the misconceptions around CAP theorem. I wrote this post so that you get idea about CAP theorem and then go and read about the misconceptions from Martin’s article.
Books for mastering system design
- Designing Data-Intensive Applications
- System Design Interview – An Insider’s Guide
- Clean Architecture: A Craftsman’s Guide to Software Structure and Design
Liked the article? Consider supporting me ☕️
I hope you learned something new. Feel free to suggest improvements ✔️
I share regular updates and resources on Twitter. Let’s connect!
Keep exploring 🔎 Keep learning 🚀
Liked the content? Do support :)