This page may be out of date. Submit any pending changes before refreshing this page.
Hide this message.
dowhilezero

CAP Theorem

While studying distributed systems one is bound to come across CAP theorem. CAP theorem helps in making some high level architectural decisions about the system, these decisions greatly depends on the application that the system will host.

CAP (Consistency, Availability, Partition Tolerance) theorem states that we can choose only two of these in a distributed system. Let's see why.

Picture two nodes connected to each other; data is replicated between them i.e. both of them have the exact same data set. Now let's see what C, A and P means in this setup.

N1-------N2

Consistency: If I change the data set in N1, I'll have to replicate this change in N2 so my data looks the same if I see it from either N1 or N2.

Availability: As long as both N1 and N2 are up and running I should be able to query/update data from any of them.

Partition Tolerance: If the link between N1 and N2 fails, I should still be able to query/update my data set.

The best way to understand CAP theorem is to see what happens during a network partition.

N1----/----N2

Clearly N1 cannot communicate with N2 anymore. Let's assume we have some means to discover partition. Now, if someone talks to N1 and changes the data set, N1 does not have any means to propagate this change to N2.

If availability is what we desire, according to the definition, we'll have to allow changes in both N1 and N2. This causes a "split brain problem" where some updates are in N1, others in N2 making the entire system as a whole, inconsistent.

If we choose consistency we'll have to block all changes on N1 and N2, which by definition makes the system unavailable.

We can be a little smarter and direct all communication to one node, either N1 or N2. This will reduce availability because we have only one node in the system which not all clients can reach. Essentially, we assume that the other node is unavailable.

Once the link is restored, we can propagate all the changes made in N1 while the link was down (we can keep a log of all changes once we discover a partition) to N2 so both have the same data set.

Many NoSQL databases choose availability, they have an "eventual consistency" model which has more to do with asynchronous replication than partitioning. They replicate data asynchronously in the background and allow updates on any node. There are chances of someone reading stale data from a node because the latest data hasn't reached there yet. Replication in the background means a node can return to the client after updating its local data set i.e. without waiting for other nodes to make changes. This greatly increases availability.

So we can either choose CP or AP. At this point we should note that we cannot choose CA, it does not make any sense. If partition occurs we cannot be both consistent and available. So the original definition, that we can choose any two is a little flawed.

It is also worthwhile to note that the choice is not a "switch" but more of a sliding scale. Between C and P we can choose the level of C and A which best fits our application.

For example, we can be read available on any node but not update available, or be available on only one node and apply some post-partition recovery process (like we chose to do). The entire system need not go down when we hit a partition to satisfy consistency. Likewise, we can choose eventual consistency if our application is okay with using stale data to improve availability.

Eric Brewer has written a brilliant article about this recently. It has quite some insights, especially about how the choice between C and A is a sliding scale.