Unfortunately, this guy doesn't understand the CAP theorem at all.
Once you are distributed, P is not optional. Rather, when a network failure occurs, either consistency or availability is what suffers. A system cannot be both CA and distributed.
So, for example, Elasticsearch is not a 'CA' solution despite the diagram in the article; it is actually closer to CP (although in practice it is subtler even than that: it is not perfectly consistent, and configuration options allow some trade-off between availability and consistency when communication errors occur).
I've commented elsewhere in this thread with a list of better articles and papers. If you've only got time to read one, I'd probably read http://research.microsoft.com/apps/pubs/default.aspx?id=1926.... In fact, if you do nothing else, just look at figure 2 from that paper.
A system can maintain C and A despite a partition as long as no requests need to cross the partition boundary. In practice this isn't something you can count on, and CA systems tend to have absolutely horrible failure modes when reality intrudes, but - unless we redefine "partition" to mean a partition that actually gets in your way - they're not strictly impossible.
If you've got no coordination then distributed systems are much easier. That's a good design goal, for sure, and distributed database designers spend a lot of time thinking about where they actually need coordination. On the other hand, a lot of very interesting applications - like high-availability ACID - require coordination.
Something always bugs me about CAP theorem explanations, and I'm genuinely confused by all this: usually I read (or take away from those articles) that a system is partition tolerant when "the system stays available and consistent while one server is down (i.e., one server is out of reach)".
But what I think is actually true is this: partition tolerant means that nodes still respond to requests while some other nodes are out of reach (a node being one or more instances of the DBMS in a cluster).
Am I right?
So, if that is true, an RDBMS with master<=>master replication between two sites is available and partition tolerant, albeit inconsistent (replication is not possible during the partition).
In the case of a master<=>master replication as a load-balancing/failover solution, it's about availability and consistency, but we should not even call that "distributed".
Geography, and whether it's a "failover solution", doesn't change the fact that the system is distributed: anything with more than one node is fundamentally distributed, and I'd argue that even two processes communicating on one machine can be considered a distributed system as well.
The system can choose which kind of predictable it wants to be: it either sacrifices consistency in favor of availability (accepting reads and writes without knowing whether the reads are fresh or the writes linearizable) or availability in favor of consistency (rejecting reads and writes that can't be confirmed).
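As a toy sketch of that choice (everything here is made up for illustration, not any real database's API): during a partition, a node can either keep answering from possibly-stale local state, or refuse anything it can't confirm with a majority.

```python
# Toy model of the AP-vs-CP choice during a partition. A node in "AP" mode
# keeps serving from its local copy; a node in "CP" mode refuses requests
# it cannot confirm with a majority of peers.

class Node:
    def __init__(self, n_total, mode):
        self.mode = mode                 # "AP" or "CP"
        self.majority = n_total // 2 + 1
        self.reachable = n_total         # peers it can talk to, incl. itself
        self.local = {"k": "v1"}         # possibly-stale local copy

    def read(self, key):
        if self.mode == "CP" and self.reachable < self.majority:
            raise RuntimeError("unavailable: cannot confirm freshness")
        return self.local.get(key)       # AP: answer anyway, maybe stale

node = Node(n_total=3, mode="AP")
node.reachable = 1                       # partition: cut off from both peers
print(node.read("k"))                    # AP keeps serving: prints v1 (maybe stale)

node.mode = "CP"
try:
    node.read("k")
except RuntimeError as e:
    print(e)                             # CP rejects the read instead
```

Either way the behavior is predictable; what differs is which guarantee the client loses while the partition lasts.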
In the case of "master<=>master replication" as it's configured in most relational databases, I think the system tries to be AP: barring special options like two-phase commit, the replicas generally lag each other, so conflicting writes and stale reads are possible (as anyone who's tried to reconcile a MySQL split-brain situation knows, this can be a tremendous pain).
Also, he seems to think that C vs. A is decided once, when one chooses a data store, when in fact the choice can be made at the granularity of individual queries. For example, Cassandra lets each read or write be acknowledged by one node, a majority of nodes (quorum), or all nodes.
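To make the per-query idea concrete (this is a toy model, not the real Cassandra driver API), here's a miniature replicated store where each operation picks its own level. Replica choices are deliberately worst-case: writes land on the first `level` replicas and reads contact the last `level`, so you can see exactly when staleness is possible.

```python
# Toy sketch of per-query consistency levels over N = 3 replicas.
# A write applies to `level` replicas; a read contacts `level` replicas
# and returns the freshest value it sees.

N = 3
ONE, QUORUM, ALL = 1, 2, 3          # replicas involved per operation

replicas = [{} for _ in range(N)]   # each replica maps key -> (version, value)
clock = 0

def write(key, value, level):
    """Apply the write to `level` replicas (worst case: the first ones)."""
    global clock
    clock += 1
    for r in replicas[:level]:
        r[key] = (clock, value)

def read(key, level):
    """Read from `level` replicas (worst case: the last ones), take freshest."""
    answers = [r.get(key, (0, None)) for r in replicas[-level:]]
    return max(answers)[1]

write("user", "alice", ONE)
print(read("user", ONE))     # stale: the two ops touched disjoint replicas -> None
write("user", "bob", QUORUM)
print(read("user", QUORUM))  # QUORUM + QUORUM always overlap -> bob
```

The same client can mix levels query by query: cheap ONE reads where staleness is tolerable, QUORUM where it isn't.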
There's some debate on the Elasticsearch forums around CA vs. CP, with the overall consensus in favor of CA. Elasticsearch doesn't perform well across wide-area networks, so in practice most deployments put all nodes in a single datacenter - hence "CA".
Thanks for your feedback though, I have some posts queued up on more of the subtleties in the model, this was just meant to be an introductory post.
Elasticsearch actually sacrifices both consistency and availability during partitions, even inside a single datacenter (trust me, partitions happen in production).
If you configure ES to lean toward consistency (minimum_master_nodes), it prevents writes during a partition, but there is at least one "split" partition scenario in which even minimum_master_nodes doesn't prevent inconsistent writes. If you configure ES to prioritize availability during a partition, it isn't consistent. Remember, ES doesn't claim to be consistent and doesn't use any sort of consensus algorithm at all.
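For context, this is roughly what that setting looks like in `elasticsearch.yml` on Zen-discovery-era versions of ES (the value shown assumes a cluster with three master-eligible nodes):

```yaml
# elasticsearch.yml (pre-7.x Zen discovery)
# Require a majority of master-eligible nodes before electing a master,
# so a minority partition refuses writes instead of forming a second cluster.
# For 3 master-eligible nodes: floor(3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

Note this only raises the bar for master election; as the comment above says, it does not turn ES into a consensus-backed CP system.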
I feel like my original comment was a bit harsh so thanks for replying.
Re: the debate about CA vs CP, I've seen this too, but I think it reflects confusion rather than genuine options. The 'P' of CAP takes effect whenever you have two or more processes which clients can communicate with. So even if you are running within the same datacenter when using ES, provided clients can communicate with more than one ES node, CAP still applies.
For those who want to read about real tests of how various common software fares under network partitions, here is some amazing work done by aphyr and his infamous Jepsen tests:
In many talks and presentations, I have heard that only eventual consistency is possible in a distributed system: since A and P are already needed, you can only compromise on C.
So here's my question: how do people configure production systems to get reasonable (or even 100%) consistency? In other words, reads must take the latest write into account: no stale reads.
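Not an answer for any specific product, but the usual recipe is overlapping quorums: pick read and write quorum sizes with R + W > N, so every read quorum must share at least one replica with every write quorum, and a quorum read therefore always sees the latest quorum write. A quick brute-force check of that claim:

```python
# Why R + W > N rules out stale reads: with N replicas, any write quorum of
# size W and any read quorum of size R must intersect when R + W > N.
from itertools import combinations

N, W, R = 5, 3, 3  # e.g. a 5-replica system with majority quorums
replicas = range(N)

overlaps = all(
    set(wq) & set(rq)                     # non-empty intersection?
    for wq in combinations(replicas, W)   # every possible write quorum
    for rq in combinations(replicas, R)   # against every possible read quorum
)
print(overlaps)  # True: every read quorum intersects every write quorum
```

With R + W <= N (say W = 2, R = 2 over 5 replicas) disjoint quorums exist, and that gap is exactly where stale reads come from.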
"Consistency around CAP is similar to what you find in a typical ACID model - except that, now, we're in a distributed model."
I thought that consistency in ACID meant that data was always consistent with the rules of the database, whereas in CAP it means that the same data held in different locations is the same? Is that not right?
From Eric Brewer himself: "The relationship between CAP and ACID is more complex and often misunderstood, in part because the C and A in ACID represent different concepts than the same letters in CAP..."
No, the data in different locations does not need to be the same (think: quorum operations).
Consistent means the system is consistent from the client's perspective, within the rules of the consistency model. If you have sequential consistency and quorum operations, your replicas need not hold the same data, but the view your clients get is always consistent.