13 min read · Rishi

CAP Theorem Explained: Trade-offs with Real-World Examples

Every system design interview that touches distributed databases eventually arrives at the same question: "What trade-offs does the CAP theorem force you to make?" Most candidates recite the definition — consistency, availability, partition tolerance, pick two — and stop there. That answer is incomplete, and interviewers know it.

The CAP theorem is not a menu where you circle two items. It is a constraint that shapes how every distributed database behaves during network failures. Understanding it properly means understanding why your database does what it does when things go wrong, not just when everything is healthy.

Let's break it down from first principles.

What the CAP Theorem Actually States

In 2000, Eric Brewer conjectured — and in 2002 Seth Gilbert and Nancy Lynch formally proved — that a distributed data store can provide at most two of the following three guarantees simultaneously:

  • Consistency (C) — Every read receives the most recent write or an error. All nodes see the same data at the same time.
  • Availability (A) — Every request receives a non-error response, without guaranteeing it contains the most recent write.
  • Partition Tolerance (P) — The system continues to operate despite arbitrary message loss or failure of part of the network.

The critical insight most people miss: partition tolerance is not optional in a distributed system. Networks fail. Switches die. Data center links drop. If your system runs on more than one node, partitions will happen. The real choice is between consistency and availability when a partition occurs.

Why You Cannot Have All Three

Consider a simple distributed system with two nodes, Node A and Node B, that replicate data between each other. A client writes a value to Node A.

Now a network partition occurs — Node A and Node B cannot communicate.

A client sends a read request to Node B. The system has exactly two options:

  1. Return the old (stale) data — the system remains available but sacrifices consistency
  2. Return an error or wait — the system remains consistent but sacrifices availability

There is no third option. The new data is on Node A. Node B cannot reach Node A. Node B must either serve stale data or refuse to serve.

This is not a theoretical edge case. This is Tuesday afternoon in any distributed system.
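The two-node scenario above can be sketched in a few lines. This is a toy model, not a real database: the names (`Node`, `readCP`, `readAP`, the `partitioned` flag) are all illustrative, and the point is only to show that once replication is cut, Node B must pick one of the two options.

```javascript
// Toy two-node replicated store: during a partition, a read against the
// stale node must either serve old data (AP) or refuse to answer (CP).
class StoreNode {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
}

const nodeA = new StoreNode("A");
const nodeB = new StoreNode("B");
let partitioned = false;

// Writes go to Node A; replication to Node B fails while partitioned.
function write(key, value) {
  nodeA.data.set(key, value);
  if (!partitioned) nodeB.data.set(key, value); // replication link
}

// CP choice: refuse to answer when the node cannot confirm it is current.
function readCP(node, key) {
  if (partitioned && node !== nodeA) {
    throw new Error("unavailable: cannot verify latest value");
  }
  return node.data.get(key);
}

// AP choice: answer with whatever the node has, possibly stale.
function readAP(node, key) {
  return node.data.get(key);
}

write("x", 1);      // replicated to both nodes
partitioned = true; // the network splits
write("x", 2);      // only Node A sees this write

readAP(nodeB, "x"); // 1 — available, but stale
readCP(nodeA, "x"); // 2 — Node A still serves the latest value
// readCP(nodeB, "x") throws — consistent, but unavailable
```

Either branch is defensible; the theorem only says you cannot have Node B return 2 here without talking to Node A.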

Consistency: What It Really Means

CAP consistency is specifically linearizability — the strongest consistency model. It means that once a write completes, all subsequent reads from any node must return that value.

This is different from the "C" in ACID (database transactions). ACID consistency means a transaction takes the database from one valid state to another. CAP consistency is about read-write ordering across distributed nodes.

// Linearizable (CAP-consistent):
Client 1: write(x, 5)     -> OK at time T1
Client 2: read(x)          -> returns 5 (at any time after T1)

// NOT linearizable:
Client 1: write(x, 5)     -> OK at time T1
Client 2: read(x)          -> returns 3 (stale value, after T1)
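The two traces above can be checked mechanically. A full linearizability checker must reason about overlapping operations, but under the simplifying assumption that the history is already sequential, a stale-read detector reduces to one pass (the `op`/`value` record shape here is made up for illustration):

```javascript
// Flag reads that do not return the most recent completed write,
// for a sequential single-register history.
function findStaleReads(history) {
  let latest; // value of the most recent write seen so far
  const violations = [];
  for (const [i, op] of history.entries()) {
    if (op.op === "write") latest = op.value;
    else if (op.value !== latest) violations.push(i); // stale read
  }
  return violations;
}

findStaleReads([{ op: "write", value: 5 }, { op: "read", value: 5 }]); // []
findStaleReads([{ op: "write", value: 5 }, { op: "read", value: 3 }]); // [1]
```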

Levels of Consistency in Practice

Real systems offer a spectrum of consistency levels:

Level                 | Description                           | Example
Strong / Linearizable | Reads always return latest write      | Single-leader with sync replication
Sequential            | Operations appear in some total order | ZooKeeper
Causal                | Causally related ops appear in order  | MongoDB sessions
Eventual              | Replicas converge given enough time   | Cassandra, DynamoDB (default)
Read-your-writes      | You see your own writes               | Session-sticky reads

Most distributed databases operate somewhere between strong and eventual consistency, and many let you tune this per query.

Availability: What It Really Means

CAP availability means every request to a non-failed node must result in a response. Not a fast response — any response. A system that returns errors during a partition is not CAP-available, even if it responds quickly.

This is a very strong definition. In practice, most systems accept degraded availability — slower responses, partial functionality — rather than complete availability or complete unavailability.

CP Systems: Consistency Over Availability

CP systems choose to remain consistent during a partition, which means they may become unavailable (reject requests or time out) when nodes cannot communicate.

HBase

HBase uses a single master architecture for write coordination. During a region server failure or partition, the affected regions become unavailable until failover completes. Every read returns consistent data, but there is a window of unavailability.

// HBase behavior during partition:
Client -> RegionServer (partitioned from Master)
         -> Request fails or times out
         -> Master detects failure, reassigns region
         -> New RegionServer comes online
         -> Requests succeed with consistent data

When to choose HBase: Financial ledgers, inventory systems, any use case where serving stale data is worse than being temporarily unavailable.

MongoDB (with majority write concern)

MongoDB with w: majority and readConcern: majority behaves as a CP system. Writes must be acknowledged by a majority of replica set members. During a partition, the minority side cannot elect a primary and becomes read-only (or unavailable for writes).

// MongoDB CP configuration
const result = await collection.insertOne(
  { accountId: "A123", balance: 5000 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);

// Read with majority concern
const doc = await collection.findOne(
  { accountId: "A123" },
  { readConcern: { level: "majority" } }
);

Redis Cluster

Redis Cluster leans CP during partitions: a master that is cut off from the majority stops accepting writes after the cluster node timeout, so writes to the minority side are rejected. The WAIT command adds synchronous replica acknowledgment on top of this, though because replication is asynchronous by default, Redis Cluster does not guarantee strict linearizability.
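The behavior of rejecting writes that cannot reach enough replicas can be sketched as a synchronous-acknowledgment write. This is an illustrative simulation, not the Redis API: `writeWithAcks`, `reachable`, and `requiredAcks` are made-up names standing in for replica acknowledgment and a WAIT-style threshold.

```javascript
// A write is accepted only if enough replicas acknowledge it; replicas cut
// off by a partition never acknowledge, so minority-side writes fail.
function writeWithAcks(replicas, requiredAcks, key, value) {
  let acks = 0;
  for (const replica of replicas) {
    if (replica.reachable) { // partitioned replicas do not respond
      replica.data.set(key, value);
      acks++;
    }
  }
  if (acks < requiredAcks) {
    throw new Error(`only ${acks}/${requiredAcks} acks — write rejected`);
  }
  return acks;
}

const replicas = [
  { reachable: true,  data: new Map() },
  { reachable: true,  data: new Map() },
  { reachable: false, data: new Map() }, // cut off by the partition
];

writeWithAcks(replicas, 2, "k", "v"); // 2 — majority reached, write accepted
// writeWithAcks(replicas, 3, "k", "v") throws — not enough replicas reachable
```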

Other CP Systems

  • ZooKeeper — Uses ZAB consensus protocol. Unavailable if quorum is lost.
  • etcd — Uses Raft consensus. Requires majority for reads and writes.
  • Google Spanner — Uses TrueTime and Paxos. Consistent but may increase latency during partitions.

AP Systems: Availability Over Consistency

AP systems choose to remain available during a partition, which means different nodes may return different (potentially stale) data.

Cassandra

Cassandra is the canonical AP system. It uses a peer-to-peer architecture with tunable consistency. At its default consistency level (ONE), any single replica can serve reads and accept writes, even during a partition.

// Cassandra with consistency level ONE (AP behavior):
Client -> Any replica node -> Immediate response
          (may be stale during partition)

// After partition heals:
Anti-entropy repair -> All replicas converge
Read repair -> Stale replicas updated on read

However, Cassandra's consistency is tunable. With QUORUM reads and writes, it behaves more like a CP system. (In cqlsh, the consistency level is set per session with the CONSISTENCY command rather than inside the query itself.)

-- AP behavior (default)
CONSISTENCY ONE;
SELECT * FROM messages WHERE user_id = 'u123';

-- CP-like behavior
CONSISTENCY QUORUM;
SELECT * FROM messages WHERE user_id = 'u123';
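Why QUORUM reads see QUORUM writes: with N replicas, R + W > N forces the read set and the write set to overlap in at least one replica, and a Dynamo-style store resolves the read by taking the copy with the highest timestamp. A sketch with made-up replica records (not the Cassandra API):

```javascript
// Read from a chosen subset of replicas and keep the newest copy,
// as a last-write-wins replicated store would.
function latestFrom(replicas, indices) {
  const sample = indices.map((i) => replicas[i]);
  return sample.reduce((best, rep) => (rep.ts > best.ts ? rep : best)).value;
}

// N = 3; a QUORUM write (W = 2) landed on the first two replicas at ts 2.
const replicas = [
  { value: "new", ts: 2 },
  { value: "new", ts: 2 },
  { value: "old", ts: 1 }, // this replica missed the write
];

latestFrom(replicas, [1, 2]); // "new" — any 2-of-3 read overlaps the write set
latestFrom(replicas, [2]);    // "old" — CONSISTENCY ONE can hit the stale replica
```

With R = W = 1 and N = 3, reads and writes need not overlap at all, which is exactly the AP behavior described above.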

When to choose Cassandra: Time-series data, IoT sensor readings, activity feeds — use cases where availability matters more than instant consistency.

DynamoDB

DynamoDB defaults to eventually consistent reads but offers strongly consistent reads as an option. During partitions, eventually consistent reads always succeed (AP), while strongly consistent reads may fail (CP).

// AP behavior — eventually consistent (default); AWS SDK v2 style
const eventualRead = await dynamodb.getItem({
  TableName: "Messages",
  Key: { messageId: { S: "msg-001" } }
}).promise();

// CP behavior — strongly consistent
const strongRead = await dynamodb.getItem({
  TableName: "Messages",
  Key: { messageId: { S: "msg-001" } },
  ConsistentRead: true
}).promise();

CouchDB

CouchDB uses multi-master replication and embraces conflicts. During a partition, both sides accept writes. When the partition heals, CouchDB detects conflicts and lets the application resolve them.

// CouchDB conflict after partition heals:
{
  "_id": "doc123",
  "_rev": "3-abc",
  "_conflicts": ["3-def"]
}
// Application must resolve which version wins
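What "the application resolves it" looks like in practice: rather than letting an arbitrary winning revision silently drop data, the application can apply a domain-specific merge across all conflicting revisions. A shopping-cart sketch with illustrative field names (the `_rev` values mirror the example above; `resolveCart` is not a CouchDB API):

```javascript
// Merge conflicting cart revisions by taking the union of their items,
// so neither side of the partition loses what it accepted.
function resolveCart(revisions) {
  const items = new Set();
  for (const rev of revisions) {
    for (const item of rev.items) items.add(item);
  }
  return { items: [...items].sort() };
}

// Both sides of the partition accepted a different write:
const sideA = { _rev: "3-abc", items: ["book", "pen"] };
const sideB = { _rev: "3-def", items: ["book", "lamp"] };

resolveCart([sideA, sideB]); // { items: ["book", "lamp", "pen"] }
```

After merging, the application writes the merged document back and deletes the losing revisions, ending the conflict.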

Other AP Systems

  • Riak — Uses vector clocks for conflict detection. Highly available.
  • Amazon S3 — Historically eventually consistent for overwrite PUTs and DELETEs; strongly consistent since December 2020.
  • DNS — The original AP system. Caches propagate updates eventually.

CA Systems: Why They Don't Exist in Distributed Systems

A CA system would provide consistency and availability but not partition tolerance. This means the system would fail completely (not just degrade) when a network partition occurs.

A single-node PostgreSQL database is a CA system. There is no network partition possible because there is only one node. It is always consistent and always available — until the single node fails.

The moment you add a second node, network partitions become possible, and you are forced to choose between C and A during those partitions.

// "CA" system = single node
PostgreSQL (single instance) -> Consistent + Available
                             -> No partition tolerance needed
                             -> Single point of failure

// The moment you distribute it:
PostgreSQL (primary + replica) -> Must choose C or A during partition

This is why the CAP theorem is really a choice between CP and AP for any real distributed system.

The PACELC Extension

Daniel Abadi proposed PACELC in 2010 to address a limitation of CAP: it only describes behavior during partitions, but distributed systems spend most of their time without partitions. The full trade-off space matters.

PACELC stands for: if there is a Partition, choose between Availability and Consistency; Else (normal operation), choose between Latency and Consistency.

System        | During Partition (PAC) | Normal Operation (ELC)
Cassandra     | PA (available)         | EL (low latency)
DynamoDB      | PA (available)         | EL (low latency)
MongoDB       | PC (consistent)        | EC (consistent)
HBase         | PC (consistent)        | EC (consistent)
PNUTS (Yahoo) | PA (available)         | EL (low latency)
Spanner       | PC (consistent)        | EC (consistent, higher latency)
Cosmos DB     | Tunable                | Tunable

This is more useful for real-world decisions because it captures the latency-consistency trade-off that you deal with every day, not just during rare partition events.

Why PACELC Matters in Practice

Consider two systems that are both "CP" under CAP:

  • MongoDB (PC/EC) — consistent during partitions AND low-latency consistent reads in normal operation (reads from primary)
  • Google Spanner (PC/EC) — consistent always, but uses TrueTime synchronization that adds latency to writes

Both are CP, but they behave very differently in normal operation. PACELC captures this.

Real-World Decision Framework

When choosing a database for a distributed system, walk through this framework:

Step 1: What Happens If Users See Stale Data?

If stale data causes financial loss or safety issues:
  -> You need CP (strong consistency)
  -> Accept: unavailability during partitions, higher write latency

If stale data is annoying but not dangerous:
  -> AP is likely fine (eventual consistency)
  -> Accept: temporary inconsistency, conflict resolution complexity

Step 2: What Are Your Availability Requirements?

If downtime costs more than inconsistency:
  -> Choose AP
  -> Examples: social media feeds, product catalogs, session stores

If inconsistency costs more than downtime:
  -> Choose CP
  -> Examples: bank accounts, inventory counts, booking systems

Step 3: What Is Your Partition Profile?

Single data center with redundant networking:
  -> Partitions are rare, lean toward consistency
  -> Consider: PostgreSQL with synchronous replication

Multi-region deployment:
  -> Partitions are more likely, cross-region latency is real
  -> Consider: AP with conflict resolution, or CP with regional unavailability

Edge/IoT deployment:
  -> Partitions are the norm, not the exception
  -> Consider: AP with CRDTs or application-level conflict resolution

Step 4: Can You Use Tunable Consistency?

Many modern databases let you choose consistency per operation:

// Cassandra: tune per query
// Critical reads (CP-like)
await session.execute(query, params, { consistency: cassandra.types.consistencies.quorum });

// Non-critical reads (AP)
await session.execute(query, params, { consistency: cassandra.types.consistencies.one });

// DynamoDB: tune per read (AWS SDK v2 style)
// Account balance (CP)
await dynamodb.getItem({ ...params, ConsistentRead: true }).promise();

// User preferences (AP)
await dynamodb.getItem({ ...params, ConsistentRead: false }).promise();

This is often the right answer: use strong consistency for operations that require it and eventual consistency for everything else.

Common Interview Mistakes

Here are the mistakes I see most often in system design interviews:

Mistake 1: Treating CAP as a Permanent Choice

CAP applies during partitions. The same system can behave differently during normal operation. Saying "MongoDB is CP" without qualifying "during a partition, with majority write concern" is incomplete.

Mistake 2: Ignoring That Partitions Are on a Spectrum

A partition is not binary. You can have:

  • Complete network split between data centers
  • Partial connectivity (some nodes can communicate, others cannot)
  • Intermittent failures (messages delayed but eventually delivered)
  • Asymmetric partitions (A can reach B, but B cannot reach A)

Each scenario may trigger different behavior in the same database.

Mistake 3: Forgetting About Latency

A system that is "consistent" but takes 30 seconds to respond during a partition is technically consistent and available by CAP definitions, but it is useless in practice. PACELC addresses this.

Mistake 4: Assuming AP Means "No Consistency"

AP systems are eventually consistent. Given enough time without new writes or partitions, all replicas converge. Many AP systems also support stronger consistency levels for specific operations.

Practical Patterns for Handling CAP Trade-offs

Pattern 1: Consistency at the Edge, Availability at the Core

Use a CP database for the source of truth and an AP cache layer for reads:

Client -> CDN (AP, cached) -> API Gateway
  -> Redis Cache (AP, eventual) -> PostgreSQL (CP, consistent)
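The cache layer in this pattern is usually cache-aside with a TTL: reads hit the AP cache first and fall back to the CP source of truth, so staleness is bounded by the TTL rather than unbounded. A self-contained sketch (all names here are illustrative, with a plain `Map` standing in for Redis and for the database):

```javascript
// Read-through cache: serve from cache while fresh (AP path, possibly
// stale), otherwise load from the consistent store (CP path) and cache it.
function makeReadThrough(sourceOfTruth, ttlMs, now = Date.now) {
  const cache = new Map(); // key -> { value, expiresAt }
  return function read(key) {
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value; // cached, maybe stale
    const value = sourceOfTruth(key);                   // consistent read
    cache.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

const db = new Map([["price:42", 100]]);
const read = makeReadThrough((k) => db.get(k), 60_000);

read("price:42");        // 100 — loaded from the CP store, then cached
db.set("price:42", 120); // source of truth changes
read("price:42");        // still 100 — staleness is bounded by the 60s TTL
```

Picking the TTL is itself a CAP-flavored decision: a longer TTL buys availability and latency at the cost of a wider staleness window.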

Pattern 2: Per-Entity Consistency

Not all data in the same system needs the same consistency level:

  • User profile updates: eventual consistency is fine
  • Account balance: strong consistency required
  • Message read receipts: eventual consistency is fine
  • Payment processing: strong consistency required

Pattern 3: Conflict-Free Replicated Data Types (CRDTs)

For AP systems, CRDTs provide automatic conflict resolution for specific data structures:

  • Counters — increment-only counters that merge correctly
  • Sets — add/remove sets that resolve conflicts deterministically
  • Registers — last-writer-wins or multi-value registers

// G-Counter CRDT (grow-only counter)
Node A: {A: 5, B: 3}  // A saw 5 increments, knows B had 3
Node B: {A: 2, B: 7}  // B saw 7 increments, knows A had 2

// After merge (take max per node):
Merged: {A: 5, B: 7}  // Total count: 12
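The merge above is easy to make concrete. A minimal sketch (class and method names are illustrative): each node increments only its own slot, and merging takes the per-node maximum, which makes merge commutative, associative, and idempotent, so replicas converge no matter how merges are ordered or repeated.

```javascript
// Minimal grow-only counter (G-Counter) CRDT.
class GCounter {
  constructor(nodeId, counts = {}) {
    this.nodeId = nodeId;
    this.counts = { ...counts }; // nodeId -> increments seen from that node
  }
  increment(n = 1) {
    this.counts[this.nodeId] = (this.counts[this.nodeId] ?? 0) + n;
  }
  merge(other) {
    // Take the per-node maximum: safe to apply in any order, any number of times.
    for (const [node, count] of Object.entries(other.counts)) {
      this.counts[node] = Math.max(this.counts[node] ?? 0, count);
    }
  }
  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
}

const a = new GCounter("A", { A: 5, B: 3 });
const b = new GCounter("B", { A: 2, B: 7 });
a.merge(b);
a.counts; // { A: 5, B: 7 } — max per node, matching the trace above
a.value(); // 12
```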

Riak supports CRDTs natively, and Redis Enterprise builds its active-active replication on them.

Summary: A Decision Table

Use Case                         | Consistency Need    | Recommended Approach
Banking / financial transactions | Strong              | CP (PostgreSQL, Spanner)
E-commerce inventory             | Strong for checkout | CP for writes, AP for catalog reads
Social media feed                | Eventual            | AP (Cassandra, DynamoDB)
Real-time chat messages          | Causal              | AP with ordering guarantees
Configuration management         | Strong              | CP (etcd, ZooKeeper)
IoT sensor data                  | Eventual            | AP (Cassandra, TimescaleDB)
Shopping cart                    | Eventual with merge | AP with CRDTs
Leader election / locking        | Strong              | CP (ZooKeeper, etcd)

The CAP theorem is not an academic exercise. It is the fundamental constraint that determines how your distributed system behaves when things go wrong. Every time you choose a database, configure replication, or set a consistency level, you are making a CAP trade-off. Understanding the theorem means you make that trade-off deliberately instead of discovering it at 3 AM during an outage.
