CAP Theorem Explained: Trade-offs with Real-World Examples
Every system design interview that touches distributed databases eventually arrives at the same question: "What trade-offs does the CAP theorem force you to make?" Most candidates recite the definition — consistency, availability, partition tolerance, pick two — and stop there. That answer is incomplete, and interviewers know it.
The CAP theorem is not a menu where you circle two items. It is a constraint that shapes how every distributed database behaves during network failures. Understanding it properly means understanding why your database does what it does when things go wrong, not just when everything is healthy.
Let's break it down from first principles.
What the CAP Theorem Actually States
In 2000, Eric Brewer conjectured — and in 2002 Seth Gilbert and Nancy Lynch formally proved — that a distributed data store can provide at most two of the following three guarantees simultaneously:
- Consistency (C) — Every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A) — Every request receives a non-error response, without guaranteeing it contains the most recent write.
- Partition Tolerance (P) — The system continues to operate despite arbitrary message loss or failure of part of the network.
The critical insight most people miss: partition tolerance is not optional in a distributed system. Networks fail. Switches die. Data center links drop. If your system runs on more than one node, partitions will happen. The real choice is between consistency and availability when a partition occurs.
Why You Cannot Have All Three
Consider a simple distributed system with two nodes, Node A and Node B, that replicate data between each other. A client writes a value to Node A.
Now a network partition occurs — Node A and Node B cannot communicate.
A client sends a read request to Node B. The system has exactly two options:
- Return the old (stale) data — the system remains available but sacrifices consistency
- Return an error or wait — the system remains consistent but sacrifices availability
There is no third option. The new data is on Node A. Node B cannot reach Node A. Node B must either serve stale data or refuse to serve.
This is not a theoretical edge case. This is Tuesday afternoon in any distributed system.
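The forced choice can be sketched in a few lines of code. This is a toy model, not any real database's API: a replica cut off from its peer either serves its possibly stale local copy or refuses the request.

```javascript
// Toy replica model: during a partition it must pick one of two behaviors.
// All names here are illustrative, not a real database API.
function makeReplica(localValue, partitioned, mode) {
  return {
    read() {
      if (!partitioned) return { ok: true, value: localValue };
      if (mode === "AP") {
        // Availability: answer with whatever we have, even if stale.
        return { ok: true, value: localValue, stale: true };
      }
      // Consistency: refuse rather than risk serving stale data.
      return { ok: false, error: "unavailable during partition" };
    },
  };
}

const apNode = makeReplica(3, true, "AP");
const cpNode = makeReplica(3, true, "CP");
console.log(apNode.read()); // { ok: true, value: 3, stale: true }
console.log(cpNode.read()); // { ok: false, error: 'unavailable during partition' }
```

Every distributed database is, at bottom, a more sophisticated version of this `mode` flag.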
Consistency: What It Really Means
CAP consistency is specifically linearizability — the strongest consistency model. It means that once a write completes, all subsequent reads from any node must return that value.
This is different from the "C" in ACID (database transactions). ACID consistency means a transaction takes the database from one valid state to another. CAP consistency is about read-write ordering across distributed nodes.
```
// Linearizable (CAP-consistent):
Client 1: write(x, 5) -> OK at time T1
Client 2: read(x)     -> returns 5 (at any time after T1)

// NOT linearizable:
Client 1: write(x, 5) -> OK at time T1
Client 2: read(x)     -> returns 3 (stale value, after T1)
```
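To make the trace concrete, here is a toy stale-read detector. It assumes a single key and non-overlapping operations (real linearizability checkers must handle concurrent operations, which is much harder); all names are illustrative.

```javascript
// Single-key stale-read detector: flags any read that returns something
// other than the value of the latest write completed before it.
// Simplification: operations are assumed sequential (non-overlapping).
function findStaleReads(history) {
  let latest;
  const violations = [];
  for (const op of [...history].sort((a, b) => a.time - b.time)) {
    if (op.kind === "write") latest = op.value;
    else if (latest !== undefined && op.value !== latest) violations.push(op);
  }
  return violations;
}

const ok = findStaleReads([
  { kind: "write", value: 5, time: 1 },
  { kind: "read", value: 5, time: 2 },
]);
const bad = findStaleReads([
  { kind: "write", value: 5, time: 1 },
  { kind: "read", value: 3, time: 2 }, // stale: returns 3 after the write of 5
]);
```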
Levels of Consistency in Practice
Real systems offer a spectrum of consistency levels:
| Level | Description | Example |
|---|---|---|
| Strong / Linearizable | Reads always return latest write | Single-leader with sync replication |
| Sequential | Operations appear in some total order | ZooKeeper |
| Causal | Causally related ops appear in order | MongoDB sessions |
| Eventual | Replicas converge given enough time | Cassandra, DynamoDB (default) |
| Read-your-writes | You see your own writes | Session-sticky reads |
Most distributed databases operate somewhere between strong and eventual consistency, and many let you tune this per query.
Availability: What It Really Means
CAP availability means every request to a non-failed node must result in a response. Not a fast response — any response. A system that returns errors during a partition is not CAP-available, even if it responds quickly.
This is a very strong definition. In practice, most systems accept degraded availability — slower responses, partial functionality — rather than complete availability or complete unavailability.
CP Systems: Consistency Over Availability
CP systems choose to remain consistent during a partition, which means they may become unavailable (reject requests or time out) when nodes cannot communicate.
HBase
HBase uses a single master architecture for write coordination. During a region server failure or partition, the affected regions become unavailable until failover completes. Every read returns consistent data, but there is a window of unavailability.
```
// HBase behavior during partition:
Client -> RegionServer (partitioned from Master)
       -> Request fails or times out
       -> Master detects failure, reassigns region
       -> New RegionServer comes online
       -> Requests succeed with consistent data
```
When to choose HBase: Financial ledgers, inventory systems, any use case where serving stale data is worse than being temporarily unavailable.
MongoDB (with majority write concern)
MongoDB with w: majority and readConcern: majority behaves as a CP system. Writes must be acknowledged by a majority of replica set members. During a partition, the minority side cannot elect a primary and becomes read-only (or unavailable for writes).
```javascript
// MongoDB CP configuration
const result = await collection.insertOne(
  { accountId: "A123", balance: 5000 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);

// Read with majority concern
const doc = await collection.findOne(
  { accountId: "A123" },
  { readConcern: { level: "majority" } }
);
```
Redis Cluster
Redis Cluster leans CP during partitions: a master that is partitioned away from the majority stops accepting writes after a timeout, so writes to the minority side are rejected. The WAIT command adds synchronous replication acknowledgment on top of this, though Redis documents it as best-effort rather than a strict consistency guarantee.
Other CP Systems
- ZooKeeper — Uses ZAB consensus protocol. Unavailable if quorum is lost.
- etcd — Uses Raft consensus. Requires majority for reads and writes.
- Google Spanner — Uses TrueTime and Paxos. Consistent but may increase latency during partitions.
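What these three share is the majority-quorum rule: a side of a partition can make progress only while it can reach a strict majority of the cluster. The arithmetic is simple:

```javascript
// Majority-quorum check used by consensus systems (ZAB, Raft, Paxos):
// a partition side with fewer than floor(n/2) + 1 reachable members
// cannot commit writes and becomes unavailable.
function hasQuorum(clusterSize, reachable) {
  return reachable >= Math.floor(clusterSize / 2) + 1;
}

// 5-node cluster split 3 / 2: only the 3-node side stays writable.
console.log(hasQuorum(5, 3)); // true
console.log(hasQuorum(5, 2)); // false
```

This is also why these clusters are deployed with odd node counts: a 4-node cluster tolerates the same single failure as a 3-node cluster but costs one more machine.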
AP Systems: Availability Over Consistency
AP systems choose to remain available during a partition, which means different nodes may return different (potentially stale) data.
Cassandra
Cassandra is the canonical AP system. It uses a peer-to-peer architecture with tunable consistency. At its default consistency level (ONE), any single replica can serve reads and accept writes, even during a partition.
```
// Cassandra with consistency level ONE (AP behavior):
Client -> Any replica node -> Immediate response
          (may be stale during partition)

// After partition heals:
Anti-entropy repair -> All replicas converge
Read repair         -> Stale replicas updated on read
```
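Read repair itself is a small idea: query several replicas, keep the answer with the newest timestamp, and overwrite any replica that returned something older. A toy sketch with plain objects standing in for replicas (names are illustrative, not Cassandra's API):

```javascript
// Read-repair sketch: each replica is an object mapping key -> { value, ts }.
function readWithRepair(replicas, key) {
  const answers = replicas.map((r) => r[key]).filter(Boolean);
  // Keep the answer with the highest timestamp.
  const newest = answers.reduce((a, b) => (a.ts >= b.ts ? a : b));
  for (const r of replicas) {
    // Repair any replica that is missing the key or holds an older version.
    if (!r[key] || r[key].ts < newest.ts) r[key] = { ...newest };
  }
  return newest.value;
}

const r1 = { msg: { value: "hello", ts: 2 } };
const r2 = { msg: { value: "hi", ts: 1 } }; // stale replica
const v = readWithRepair([r1, r2], "msg"); // returns "hello", repairs r2
```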
However, Cassandra's consistency is tunable. With QUORUM reads and writes, it behaves more like a CP system:
```
-- cqlsh: AP behavior (default)
CONSISTENCY ONE;
SELECT * FROM messages WHERE user_id = 'u123';

-- cqlsh: CP-like behavior
CONSISTENCY QUORUM;
SELECT * FROM messages WHERE user_id = 'u123';
```
Note that modern CQL has no `USING CONSISTENCY` clause; the level is set per session with cqlsh's `CONSISTENCY` command, or per statement through the driver.
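Why QUORUM on both reads and writes behaves CP-like: with a replication factor of N, any write acknowledged by W nodes and any read from R nodes must share at least one node whenever R + W > N, so a quorum read always touches a replica that saw the latest quorum write. A quick arithmetic sketch (illustrative helper, not a driver API):

```javascript
// Quorum overlap: any W-node write set and any R-node read set intersect
// in at least max(0, R + W - N) nodes.
function guaranteedOverlap(n, w, r) {
  return Math.max(0, r + w - n);
}

// N=3 with QUORUM (2) on both sides: every read overlaps every write.
console.log(guaranteedOverlap(3, 2, 2)); // 1
// N=3 with ONE on both sides: no overlap guarantee -> stale reads possible.
console.log(guaranteedOverlap(3, 1, 1)); // 0
```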
When to choose Cassandra: Time-series data, IoT sensor readings, activity feeds — use cases where availability matters more than instant consistency.
DynamoDB
DynamoDB defaults to eventually consistent reads but offers strongly consistent reads as an option. During partitions, eventually consistent reads always succeed (AP), while strongly consistent reads may fail (CP).
```javascript
// AP behavior — eventually consistent (default)
const eventualResult = await dynamodb.getItem({
  TableName: "Messages",
  Key: { messageId: { S: "msg-001" } }
});

// CP behavior — strongly consistent
const strongResult = await dynamodb.getItem({
  TableName: "Messages",
  Key: { messageId: { S: "msg-001" } },
  ConsistentRead: true
});
```
CouchDB
CouchDB uses multi-master replication and embraces conflicts. During a partition, both sides accept writes. When the partition heals, CouchDB detects conflicts and lets the application resolve them.
```
// CouchDB conflict after partition heals:
{
  "_id": "doc123",
  "_rev": "3-abc",
  "_conflicts": ["3-def"]
}
// Application must resolve which version wins
```
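Application-side resolution can be as simple as fetching the conflicting revision IDs and choosing a winner deterministically, so every client picks the same one. A hedged sketch; `pickWinner` and the generation-number tie-break here are illustrative assumptions, not CouchDB's API (though CouchDB's own automatic winner selection works along similar lines):

```javascript
// Deterministic conflict resolution sketch (not a CouchDB client call):
// pick the revision with the highest generation number, then break ties
// lexicographically, so every node independently picks the same winner.
function pickWinner(revs) {
  return [...revs].sort((a, b) => {
    const genA = parseInt(a.split("-")[0], 10);
    const genB = parseInt(b.split("-")[0], 10);
    return genB - genA || b.localeCompare(a); // descending generation, then string
  })[0];
}

const winner = pickWinner(["3-abc", "3-def", "2-zzz"]);
// "3-def": same generation as "3-abc", wins the tie-break
```

In a real CouchDB workflow, the application would then delete the losing revisions so the conflict disappears from `_conflicts`.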
Other AP Systems
- Riak — Uses vector clocks for conflict detection. Highly available.
- Amazon S3 — Historically eventually consistent for overwrite PUTs and DELETEs; strongly consistent since December 2020.
- DNS — The original AP system. Caches propagate updates eventually.
CA Systems: Why They Don't Exist in Distributed Systems
A CA system would provide consistency and availability but not partition tolerance. This means the system would fail completely (not just degrade) when a network partition occurs.
A single-node PostgreSQL database is a CA system. There is no network partition possible because there is only one node. It is always consistent and always available — until the single node fails.
The moment you add a second node, network partitions become possible, and you are forced to choose between C and A during those partitions.
// "CA" system = single node
PostgreSQL (single instance) -> Consistent + Available
-> No partition tolerance needed
-> Single point of failure
// The moment you distribute it:
PostgreSQL (primary + replica) -> Must choose C or A during partition
This is why the CAP theorem is really a choice between CP and AP for any real distributed system.
The PACELC Extension
Daniel Abadi proposed PACELC in 2010 to address a limitation of CAP: it only describes behavior during partitions, but distributed systems spend most of their time without partitions. The full trade-off space matters.
PACELC stands for: if there is a Partition, choose between Availability and Consistency; Else (normal operation), choose between Latency and Consistency.
| System | During Partition (PAC) | Normal Operation (ELC) |
|---|---|---|
| Cassandra | PA (available) | EL (low latency) |
| DynamoDB | PA (available) | EL (low latency) |
| MongoDB (majority write concern) | PC (consistent) | EC (consistent) |
| HBase | PC (consistent) | EC (consistent) |
| PNUTS (Yahoo) | PC (consistent) | EL (low latency) |
| Spanner | PC (consistent) | EC (consistent, higher latency) |
| Cosmos DB | Tunable | Tunable |
This is more useful for real-world decisions because it captures the latency-consistency trade-off that you deal with every day, not just during rare partition events.
Why PACELC Matters in Practice
Consider two systems that are both "CP" under CAP:
- MongoDB (PC/EC) — consistent during partitions AND low-latency consistent reads in normal operation (reads from primary)
- Google Spanner (PC/EC) — consistent always, but uses TrueTime synchronization that adds latency to writes
Both are CP, but they behave very differently in normal operation. PACELC captures this.
Real-World Decision Framework
When choosing a database for a distributed system, walk through this framework:
Step 1: What Happens If Users See Stale Data?
```
If stale data causes financial loss or safety issues:
  -> You need CP (strong consistency)
  -> Accept: unavailability during partitions, higher write latency

If stale data is annoying but not dangerous:
  -> AP is likely fine (eventual consistency)
  -> Accept: temporary inconsistency, conflict resolution complexity
```
Step 2: What Are Your Availability Requirements?
```
If downtime costs more than inconsistency:
  -> Choose AP
  -> Examples: social media feeds, product catalogs, session stores

If inconsistency costs more than downtime:
  -> Choose CP
  -> Examples: bank accounts, inventory counts, booking systems
```
Step 3: What Is Your Partition Profile?
```
Single data center with redundant networking:
  -> Partitions are rare, lean toward consistency
  -> Consider: PostgreSQL with synchronous replication

Multi-region deployment:
  -> Partitions are more likely, cross-region latency is real
  -> Consider: AP with conflict resolution, or CP with regional unavailability

Edge/IoT deployment:
  -> Partitions are the norm, not the exception
  -> Consider: AP with CRDTs or application-level conflict resolution
```
Step 4: Can You Use Tunable Consistency?
Many modern databases let you choose consistency per operation:
```javascript
// Cassandra: tune per query
// Critical reads (CP-like)
await session.execute(query, params, {
  consistency: cassandra.types.consistencies.quorum
});

// Non-critical reads (AP)
await session.execute(query, params, {
  consistency: cassandra.types.consistencies.one
});

// DynamoDB: tune per read
// Account balance (CP)
await dynamodb.getItem({ ...params, ConsistentRead: true });

// User preferences (AP)
await dynamodb.getItem({ ...params, ConsistentRead: false });
```
This is often the right answer: use strong consistency for operations that require it and eventual consistency for everything else.
Common Interview Mistakes
Here are the mistakes I see most often in system design interviews:
Mistake 1: Treating CAP as a Permanent Choice
CAP applies during partitions. The same system can behave differently during normal operation. Saying "MongoDB is CP" without qualifying "during a partition, with majority write concern" is incomplete.
Mistake 2: Ignoring That Partitions Are on a Spectrum
A partition is not binary. You can have:
- Complete network split between data centers
- Partial connectivity (some nodes can communicate, others cannot)
- Intermittent failures (messages delayed but eventually delivered)
- Asymmetric partitions (A can reach B, but B cannot reach A)
Each scenario may trigger different behavior in the same database.
Mistake 3: Forgetting About Latency
A system that is "consistent" but takes 30 seconds to respond during a partition is technically consistent and available by CAP definitions, but it is useless in practice. PACELC addresses this.
Mistake 4: Assuming AP Means "No Consistency"
AP systems are eventually consistent. Given enough time without new writes or partitions, all replicas converge. Many AP systems also support stronger consistency levels for specific operations.
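Convergence is easy to see with a last-writer-wins register, one of the simplest eventually consistent data types: each replica stores a (value, timestamp) pair, and merging always keeps the newer pair, so any two replicas that exchange state end up identical regardless of direction or order. An illustrative sketch:

```javascript
// Last-writer-wins register: merge keeps the entry with the higher
// timestamp; ties are broken by value so the result stays deterministic.
function merge(a, b) {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.value >= b.value ? a : b;
}

const replicaA = { value: "draft", ts: 10 };
const replicaB = { value: "final", ts: 12 };

// Merge order does not matter: both directions converge to the same state.
const ab = merge(replicaA, replicaB);
const ba = merge(replicaB, replicaA);
```

The cost of this simplicity is that LWW silently discards the older concurrent write, which is exactly the trade-off the next section's CRDTs are designed to avoid for counters and sets.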
Practical Patterns for Handling CAP Trade-offs
Pattern 1: Consistency at the Edge, Availability at the Core
Use a CP database for the source of truth and an AP cache layer for reads:
```
Client -> CDN (AP, cached) -> API Gateway
       -> Redis Cache (AP, eventual) -> PostgreSQL (CP, consistent)
```
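A minimal read-through sketch of this pattern, with in-memory Maps standing in for the AP cache and the CP store (all names are illustrative):

```javascript
// Read-through cache: Maps stand in for the AP cache and the CP store.
const db = new Map([["user:1", { name: "Ada" }]]); // source of truth (CP)
const cache = new Map();                           // cache layer (AP)

function readThrough(key) {
  if (cache.has(key)) return { value: cache.get(key), from: "cache" };
  const value = db.get(key); // miss: fall back to the consistent store
  if (value !== undefined) cache.set(key, value); // repopulate the cache
  return { value, from: "db" };
}

const first = readThrough("user:1");  // miss -> db, populates cache
const second = readThrough("user:1"); // hit -> cache (possibly stale later)
```

In production the cache entry would carry a TTL or be invalidated on write; the staleness window between a database update and cache expiry is precisely where this architecture trades consistency for availability and latency.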
Pattern 2: Per-Entity Consistency
Not all data in the same system needs the same consistency level:
- User profile updates: eventual consistency is fine
- Account balance: strong consistency required
- Message read receipts: eventual consistency is fine
- Payment processing: strong consistency required
Pattern 3: Conflict-Free Replicated Data Types (CRDTs)
For AP systems, CRDTs provide automatic conflict resolution for specific data structures:
- Counters — increment-only counters that merge correctly
- Sets — add/remove sets that resolve conflicts deterministically
- Registers — last-writer-wins or multi-value registers
```
// G-Counter CRDT (grow-only counter)
Node A: {A: 5, B: 3}   // A saw 5 increments, knows B had 3
Node B: {A: 2, B: 7}   // B saw 7 increments, knows A had 2

// After merge (take max per node):
Merged: {A: 5, B: 7}   // Total count: 12
```
Riak and Azure Cosmos DB support CRDTs natively.
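The merge rule above fits in a few lines. A minimal G-Counter sketch matching the trace: each node increments only its own slot, and merging takes the per-node maximum (illustrative code, not Riak's or Cosmos DB's API):

```javascript
// G-Counter CRDT: per-node increment counts; merge = element-wise max.
function increment(counter, nodeId, by = 1) {
  // Each node only ever bumps its own slot, so counts never conflict.
  return { ...counter, [nodeId]: (counter[nodeId] || 0) + by };
}

function mergeCounters(a, b) {
  const out = { ...a };
  for (const [node, count] of Object.entries(b)) {
    out[node] = Math.max(out[node] || 0, count); // max is the merge rule
  }
  return out;
}

function total(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

// Reproduce the trace: A = {A: 5, B: 3}, B = {A: 2, B: 7}.
const merged = mergeCounters({ A: 5, B: 3 }, { A: 2, B: 7 });
// merged = { A: 5, B: 7 }, total(merged) = 12
```

Because max is commutative, associative, and idempotent, replicas can merge in any order, any number of times, and still converge; that algebraic property is what makes a data type a CRDT.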
Summary: A Decision Table
| Use Case | Consistency Need | Recommended Approach |
|---|---|---|
| Banking / financial transactions | Strong | CP (PostgreSQL, Spanner) |
| E-commerce inventory | Strong for checkout | CP for writes, AP for catalog reads |
| Social media feed | Eventual | AP (Cassandra, DynamoDB) |
| Real-time chat messages | Causal | AP with ordering guarantees |
| Configuration management | Strong | CP (etcd, ZooKeeper) |
| IoT sensor data | Eventual | AP (Cassandra, TimescaleDB) |
| Shopping cart | Eventual with merge | AP with CRDTs |
| Leader election / locking | Strong | CP (ZooKeeper, etcd) |
The CAP theorem is not an academic exercise. It is the fundamental constraint that determines how your distributed system behaves when things go wrong. Every time you choose a database, configure replication, or set a consistency level, you are making a CAP trade-off. Understanding the theorem means you make that trade-off deliberately instead of discovering it at 3 AM during an outage.