Caching Strategies for High-Performance Systems
Every millisecond counts. When your database query takes 50ms and a cache lookup takes 0.5ms, caching is not an optimization — it is a fundamental architectural decision. The difference between a system that handles 1,000 requests per second and one that handles 100,000 often comes down to how well you cache.
Why Caching Matters: The Latency Numbers
Before diving into strategies, internalize these approximate latency figures:
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| Read 1 MB from memory | 250 µs |
| SSD random read | 0.15 ms |
| Redis GET (same datacenter) | 0.5 ms |
| Round trip within datacenter | 0.5 ms |
| PostgreSQL simple query | 1-50 ms |
| Disk seek | 10 ms |
| Read 1 MB from network | 10 ms |
| Round trip US coast to coast | 40 ms |
A cache hit at the Redis layer can be up to 100x faster than a typical database query. At scale, this difference determines whether your system feels instant or sluggish.
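As a back-of-envelope check of that claim, using the worst-case query latency from the table (purely illustrative arithmetic):

```python
# Worst-case numbers from the table above
redis_hit_ms = 0.5
db_query_ms = 50.0

# Requests per second a single synchronous worker could serve:
assert 1000 / redis_hit_ms == 2000.0   # cache-hit path
assert 1000 / db_query_ms == 20.0      # database path

# Per-lookup speedup of a cache hit over a worst-case query:
assert db_query_ms / redis_hit_ms == 100.0
```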
Cache-Aside (Lazy Loading)
Cache-aside is the most common caching pattern. The application manages both the cache and the database directly.
How It Works
- Application receives a request
- Check the cache for the data
- Cache hit: return the cached data
- Cache miss: query the database, store the result in the cache, return the data
```python
import json

class CacheAsideService:
    def __init__(self, cache, database):
        self.cache = cache
        self.db = database

    def get_user(self, user_id: str) -> dict | None:
        # Step 1: Check cache
        cached = self.cache.get(f"user:{user_id}")
        if cached:
            return json.loads(cached)

        # Step 2: Cache miss — read from database
        user = self.db.query("SELECT * FROM users WHERE id = %s", user_id)
        if user is None:
            return None

        # Step 3: Populate cache for next time
        self.cache.set(
            f"user:{user_id}",
            json.dumps(user),
            ex=3600  # TTL of 1 hour
        )
        return user

    def update_user(self, user_id: str, data: dict):
        # Update database first
        self.db.execute("UPDATE users SET ... WHERE id = %s", user_id)
        # Invalidate cache (do NOT update it — delete it)
        self.cache.delete(f"user:{user_id}")
```
When to Use Cache-Aside
- Read-heavy workloads where data changes infrequently
- When you can tolerate slightly stale data on cache misses
- When you want full control over what gets cached and when
Pitfall: Delete, Don't Update
On writes, delete the cache entry rather than updating it. Updating creates a race condition where two concurrent writes can leave the cache in an inconsistent state.
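The race is easiest to see as a deterministic simulation (writer names and the interleaving are illustrative; plain dicts stand in for the cache and database):

```python
# Two writers, A and B, race on the same key. The DB applies A's write
# first and B's second, but their cache updates land in the opposite
# order, so "update the cache on write" leaves the cache permanently stale.
db = {}
cache = {}

# Interleaving with update-on-write:
db["user:1"] = "v1"          # A commits v1
db["user:1"] = "v2"          # B commits v2 (final DB state is v2)
cache["user:1"] = "v2"       # B updates the cache first...
cache["user:1"] = "v1"       # ...then A's delayed update overwrites it
assert db["user:1"] == "v2" and cache["user:1"] == "v1"  # stale forever

# Same interleaving with delete-on-write:
cache.pop("user:1", None)    # B invalidates
cache.pop("user:1", None)    # A invalidates (delete is idempotent)
value = cache.get("user:1") or db["user:1"]  # next read repopulates from DB
assert value == "v2"         # consistent again
```

With deletion, the worst case after a race is a single extra cache miss, not an indefinitely stale entry.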
Read-Through Cache
Read-through looks similar to cache-aside, but the cache itself is responsible for loading data from the database on a miss. The application only talks to the cache.
```python
class ReadThroughCache:
    """
    The cache layer handles loading on a miss.
    Application code never touches the database directly for reads.
    """
    def __init__(self, cache_client, loader_fn):
        self.cache = cache_client
        self.loader = loader_fn

    def get(self, key: str) -> dict:
        value = self.cache.get(key)
        if value is None:
            # The cache layer, not the application, calls the loader
            value = self.loader(key)
            if value is not None:
                self.cache.set(key, value, ex=3600)
        return value

# Usage
cache = ReadThroughCache(
    redis_client,
    loader_fn=lambda key: db.query_user(key.split(":")[1])
)
user = cache.get("user:42")  # App never calls DB directly
```
Advantage over cache-aside: Simpler application code. The caching logic is encapsulated inside the cache layer, so application developers cannot accidentally skip the cache.
Write-Through Cache
Write-through ensures that every write goes to both the cache and the database synchronously. The write is only considered successful when both operations complete.
```python
import json

class WriteThroughCache:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def write(self, key: str, value: dict):
        # Write to the cache AND the database before returning.
        # Note: these are two separate operations, not an atomic
        # transaction; production code must handle partial failure.
        self.cache.set(key, json.dumps(value))
        self.db.execute(
            "INSERT INTO data (key, value) VALUES (%s, %s) "
            "ON CONFLICT (key) DO UPDATE SET value = %s",
            key, json.dumps(value), json.dumps(value)
        )
        # The write is only reported successful once both complete
```
Pros and Cons
- Pro: Cache is always consistent with the database — no stale reads
- Pro: Simple mental model — write once, it is everywhere
- Con: Higher write latency — every write waits for both cache and DB
- Con: Writes to data that may never be read waste cache space
Write-through works best when combined with read-through, giving you a unified caching layer.
Write-Behind (Write-Back)
Write-behind is the performance-optimized cousin of write-through. Writes go to the cache immediately, and the cache asynchronously flushes changes to the database in the background.
```python
import json
import schedule  # assumed background scheduler, e.g. the `schedule` library

class WriteBehindCache:
    def __init__(self, cache, db, flush_interval=5):
        self.cache = cache
        self.db = db
        self.dirty_keys = set()
        self.flush_interval = flush_interval
        self._start_background_flush()

    def write(self, key: str, value: dict):
        # Write to cache immediately — fast!
        self.cache.set(key, json.dumps(value))
        self.dirty_keys.add(key)
        # Return immediately; the DB write happens asynchronously

    def _flush_to_db(self):
        keys_to_flush = list(self.dirty_keys)
        self.dirty_keys.clear()
        for key in keys_to_flush:
            value = self.cache.get(key)
            if value:
                self.db.upsert(key, json.loads(value))

    def _start_background_flush(self):
        # Runs _flush_to_db every flush_interval seconds
        schedule.every(self.flush_interval).seconds.do(self._flush_to_db)
```
The Risk
If the cache node crashes before flushing to the database, you lose data. Write-behind is only appropriate when:
- You can tolerate some data loss (counters, analytics, session data)
- You batch writes for efficiency (reducing DB load by 10-100x)
- You have cache replication for durability
Cache Invalidation Strategies
Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. Here are your options.
TTL-Based Invalidation
The simplest approach: every cache entry expires after a fixed time.
```python
import random

# Simple TTL
cache.set("product:123", data, ex=300)  # Expires in 5 minutes

# TTL with jitter to prevent a thundering herd
base_ttl = 300
jitter = random.randint(0, 60)
cache.set("product:123", data, ex=base_ttl + jitter)
```
When to use: Data that changes unpredictably and where staleness up to TTL seconds is acceptable.
Event-Based Invalidation
Invalidate cache entries when the underlying data changes, using events or database triggers.
```python
import time

# Using an event bus for cache invalidation
class UserService:
    def update_user(self, user_id, data):
        self.db.update_user(user_id, data)
        # Publish event — cache subscriber will invalidate
        self.event_bus.publish("user.updated", {
            "user_id": user_id,
            "timestamp": time.time()
        })

class CacheInvalidationSubscriber:
    def on_user_updated(self, event):
        user_id = event["user_id"]
        self.cache.delete(f"user:{user_id}")
        self.cache.delete(f"user_profile:{user_id}")
        self.cache.delete(f"user_permissions:{user_id}")
```
When to use: When you need strong consistency and have a reliable event system in place.
Versioned Keys
Instead of invalidating, create a new cache key each time data changes.
```python
# Store a version counter
version = cache.incr(f"user:{user_id}:version")
cache_key = f"user:{user_id}:v{version}"
cache.set(cache_key, user_data, ex=86400)

# On read, always fetch the current version first
current_version = cache.get(f"user:{user_id}:version")
data = cache.get(f"user:{user_id}:v{current_version}")
```
When to use: When you want to avoid delete-then-repopulate races and old cache entries can expire naturally.
Cache Eviction Policies
When the cache is full, something has to go. These policies determine what gets evicted.
LRU (Least Recently Used)
Evicts the entry that has not been accessed for the longest time. Redis supports this via its allkeys-lru and volatile-lru eviction policies (note that Redis's out-of-the-box default is noeviction, so you must set maxmemory-policy explicitly).
- Best for: General-purpose caching where recent access patterns predict future access
- Weakness: A one-time scan of many keys can evict frequently used entries
LFU (Least Frequently Used)
Evicts the entry that has been accessed the fewest times. Redis supports this with allkeys-lfu.
- Best for: Workloads with a stable set of hot keys (product catalog, popular content)
- Weakness: New entries start with low frequency and may be evicted before they prove popular
FIFO (First In, First Out)
Evicts the oldest entry regardless of access patterns.
- Best for: Time-series data where newer entries are always more relevant
- Weakness: Ignores access patterns entirely — a frequently accessed old item gets evicted
Choosing the Right Policy
```
Decision guide:
├── Is access recency a good predictor? → LRU
├── Do you have a stable set of hot keys? → LFU
├── Is data freshness more important than access patterns? → FIFO
└── Not sure? → Start with LRU (it works well for most workloads)
```
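To make the LRU policy concrete, here is a minimal in-process sketch built on `OrderedDict` (the class and capacity are illustrative; Redis itself uses an approximated LRU, not this exact structure):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is reached."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.set("c", 3)  # evicts "b", not "a"
assert cache.get("b") is None and cache.get("a") == 1
```

Swapping `move_to_end` on reads for a per-key access counter turns the same skeleton into an LFU sketch.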
Distributed Caching with Redis and Memcached
Redis
Redis is the Swiss Army knife of caching. Beyond simple key-value storage, it supports:
- Data structures: Strings, hashes, lists, sets, sorted sets, streams
- Persistence: RDB snapshots and AOF logs for durability
- Replication: Primary-replica for read scaling and failover
- Cluster mode: Automatic sharding across multiple nodes
- Pub/Sub: Built-in event system for cache invalidation
```python
# Redis Cluster example with read replicas (redis-py 4.x+)
from redis.cluster import RedisCluster, ClusterNode

rc = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-1", 6379),
        ClusterNode("redis-2", 6379),
        ClusterNode("redis-3", 6379),
    ],
    read_from_replicas=True  # Scale reads across replicas
)
```
Memcached
Memcached is simpler and faster for pure key-value caching:
- Multi-threaded: Better CPU utilization than Redis's mostly single-threaded command execution
- Memory efficient: Less overhead per key for simple string values
- Consistent hashing: Built-in support for scaling the cluster
Choose Redis when you need data structures, persistence, or pub/sub. Choose Memcached when you need raw throughput for simple key-value lookups with large datasets.
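The consistent-hashing idea behind Memcached client libraries can be sketched in a few lines (a toy ring without virtual nodes; real clients such as ketama-based ones place many virtual points per server to smooth the distribution):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # Place each node at a point on a 0..2^32 ring via its hash
        self.ring = sorted(
            (int(hashlib.md5(n.encode()).hexdigest(), 16) % 2**32, n)
            for n in nodes
        )

    def node_for(self, key: str) -> str:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32
        # Walk clockwise to the first node at or after the key's position
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, h) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["mc-1", "mc-2", "mc-3"])
before = {k: ring.node_for(k) for k in ("user:1", "user:2", "user:3")}

# Removing one node only remaps the keys that lived on it:
smaller = HashRing(["mc-1", "mc-2"])
moved = [k for k, n in before.items()
         if n != "mc-3" and n != smaller.node_for(k)]
assert moved == []  # keys not on mc-3 stayed put
```

This is why consistent hashing matters for cache clusters: with naive `hash(key) % num_nodes`, removing one node remaps almost every key and effectively cold-starts the cache.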
Cache Stampede Prevention
A cache stampede (or thundering herd) happens when a popular cache entry expires and hundreds of concurrent requests all miss the cache and hit the database simultaneously.
Locking (Mutex Pattern)
Only one request computes the value while others wait.
```python
import json
import time

def get_with_lock(key: str) -> dict:
    value = cache.get(key)
    if value:
        return json.loads(value)

    # Try to acquire a short-lived lock
    lock_key = f"lock:{key}"
    if cache.set(lock_key, "1", nx=True, ex=10):
        try:
            # This request computes the value
            value = expensive_database_query(key)
            cache.set(key, json.dumps(value), ex=3600)
            return value
        finally:
            cache.delete(lock_key)
    else:
        # Another request is computing — wait briefly and retry
        time.sleep(0.05)
        return get_with_lock(key)
```
Probabilistic Early Expiration
Recompute the value before it actually expires, with increasing probability as expiration approaches.
```python
import math
import random

def get_with_early_recompute(key: str, beta: float = 1.0) -> dict:
    # get_with_metadata is assumed to return the cached value, its
    # remaining TTL, and how long the value took to compute
    value, ttl, compute_time = cache.get_with_metadata(key)
    if value is None:
        # True miss — compute and cache
        return recompute_and_cache(key)

    # Probabilistic early recompute:
    # as the TTL approaches 0, the probability of recomputing approaches 1
    if ttl > 0:
        random_threshold = compute_time * beta * math.log(random.random())
        if -random_threshold >= ttl:
            # Recompute early (shown synchronously here; a real system
            # would usually hand this off to a background task)
            recompute_and_cache(key)
    return value
```
This approach avoids the need for distributed locks and naturally spreads recomputation across time.
Cache Warming
Cold caches can crush your database after a deployment or cache node restart. Cache warming pre-populates the cache before traffic arrives.
```python
import json
from datetime import datetime, timedelta

class CacheWarmer:
    def warm(self):
        """Pre-populate the cache with known hot data."""
        # Load the top 1000 most accessed products
        hot_products = self.db.query(
            "SELECT * FROM products ORDER BY access_count DESC LIMIT 1000"
        )
        pipe = self.cache.pipeline()
        for product in hot_products:
            pipe.set(
                f"product:{product['id']}",
                json.dumps(product),
                ex=3600
            )
        pipe.execute()
        print(f"Warmed {len(hot_products)} product entries")

    def warm_from_access_log(self):
        """Replay recent access logs to warm the cache."""
        recent_keys = self.access_log.get_top_keys(
            since=datetime.now() - timedelta(hours=1),
            limit=5000
        )
        for key in recent_keys:
            self.get_with_cache(key)  # Triggers cache population
```
Best practices for cache warming:
- Run warming scripts as part of your deployment pipeline
- Use access logs to identify hot keys rather than guessing
- Warm progressively — do not slam the database with 100,000 queries at once
- Monitor cache hit rate during and after warming to verify effectiveness
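The "warm progressively" advice above can be sketched as a simple batched loop (the batch size and pause are illustrative tuning knobs, and `load_and_cache` stands in for any cache-aside read):

```python
import time

def warm_in_batches(keys, load_and_cache, batch_size=100, pause_s=0.05):
    """Warm the cache in small batches so the database sees a steady
    trickle of queries rather than one enormous burst."""
    warmed = 0
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        for key in batch:
            load_and_cache(key)  # cache-aside read: populates on miss
        warmed += len(batch)
        time.sleep(pause_s)      # give the database room to breathe
    return warmed

# Usage sketch: warm 10 keys, 4 at a time
count = warm_in_batches([f"k{i}" for i in range(10)],
                        lambda k: None, batch_size=4, pause_s=0)
assert count == 10
```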
Multi-Level Caching
The most performant systems use multiple cache layers, each trading off capacity for speed.
L1: In-Process Cache
Fastest possible access — no network hop. Stored in application memory.
```python
import json
from cachetools import TTLCache

# Simple in-process cache with TTL
l1_cache = TTLCache(maxsize=1000, ttl=60)  # 1000 items, 60s TTL

def get_user(user_id: str) -> dict:
    # L1: Check in-process cache (sub-microsecond)
    if user_id in l1_cache:
        return l1_cache[user_id]

    # L2: Check Redis (~0.5 ms)
    cached = redis.get(f"user:{user_id}")
    if cached:
        user = json.loads(cached)
        l1_cache[user_id] = user  # Promote to L1
        return user

    # L3: Database (5-50 ms)
    user = db.query_user(user_id)
    if user:
        redis.set(f"user:{user_id}", json.dumps(user), ex=3600)
        l1_cache[user_id] = user
    return user
```
L2: Distributed Cache (Redis/Memcached)
Shared across all application instances. Survives individual app restarts.
L3: CDN Cache
For static or semi-static content, push caching to the edge — as close to the user as possible.
```
Cache-Control: public, max-age=300, s-maxage=3600, stale-while-revalidate=86400
```
This tells CDNs to cache for 1 hour, browsers for 5 minutes, and serve stale content for up to 24 hours while revalidating in the background.
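As a small sketch, such a header can be assembled from its directives (the helper name is illustrative; the values mirror the example above):

```python
def cache_control(browser_ttl: int, cdn_ttl: int, stale_ttl: int) -> str:
    """Build a Cache-Control header: max-age governs browsers,
    s-maxage governs shared caches (CDNs), and stale-while-revalidate
    lets caches serve expired content while refreshing in the background."""
    return (f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}, "
            f"stale-while-revalidate={stale_ttl}")

header = cache_control(browser_ttl=300, cdn_ttl=3600, stale_ttl=86400)
assert header == (
    "public, max-age=300, s-maxage=3600, stale-while-revalidate=86400"
)
```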
The Full Picture
Request flow with multi-level caching:
```
User Request
     │
     ▼
┌───────────────┐   HIT
│   CDN (L3)    │ ──────►  Return cached response (< 10 ms)
└───────┬───────┘
        │ MISS
        ▼
┌───────────────┐   HIT
│ App Cache L1  │ ──────►  Return from memory (< 0.1 ms)
│ (in-process)  │
└───────┬───────┘
        │ MISS
        ▼
┌───────────────┐   HIT
│   Redis L2    │ ──────►  Return from Redis (< 1 ms)
│ (distributed) │
└───────┬───────┘
        │ MISS
        ▼
┌───────────────┐
│   Database    │ ──────►  Return from DB (5-50 ms)
│  (source of   │          + populate L1, L2, L3
│    truth)     │
└───────────────┘
```
Real-World Patterns and Lessons
Pattern: Cache-Aside with Event Invalidation
This is the most common pattern in production systems. Use cache-aside for reads and event-based invalidation for writes.
```python
import json

# The "golden pattern" for most applications
class ProductService:
    def get_product(self, product_id):
        # Cache-aside read
        cached = self.cache.get(f"product:{product_id}")
        if cached:
            return json.loads(cached)
        product = self.db.get_product(product_id)
        if product is not None:  # avoid caching null results indefinitely
            self.cache.set(f"product:{product_id}",
                           json.dumps(product), ex=1800)
        return product

    def update_product(self, product_id, data):
        self.db.update_product(product_id, data)
        # Event-based invalidation for other instances
        self.events.publish("product.updated", {"id": product_id})
        # Also delete immediately for this instance
        self.cache.delete(f"product:{product_id}")
```
Pattern: Request Collapsing
Multiple identical concurrent requests are collapsed into a single database query.
```python
import asyncio

class RequestCollapser:
    def __init__(self):
        self._pending = {}

    async def get(self, key: str, loader) -> dict:
        if key in self._pending:
            # Another request is already loading this — wait for it
            return await self._pending[key]

        future = asyncio.get_running_loop().create_future()
        self._pending[key] = future
        try:
            result = await loader(key)
            future.set_result(result)
            return result
        except Exception as exc:
            # Propagate the failure to every waiter, not just this caller
            future.set_exception(exc)
            raise
        finally:
            del self._pending[key]
```
Lessons from Production
- Monitor your cache hit rate. A healthy cache should have 95%+ hit rate for hot data. Below 80%, investigate why.
- Set TTLs on everything. Entries without TTLs accumulate and eventually cause memory pressure.
- Use cache key prefixes and namespaces. `user:42:profile` is infinitely better than `42` for debugging.
- Never cache null results without short TTLs. Otherwise, a temporary database error can fill your cache with negative entries.
- Plan for cache failure. Your system should degrade gracefully when the cache is unavailable, not crash.
- Size your cache based on your working set, not your total data. If 10% of products get 90% of traffic, cache that 10%.
Key Takeaways
- Cache-aside is the most flexible and widely used pattern — start here
- Write-through guarantees consistency but adds write latency
- Write-behind maximizes write throughput but risks data loss
- Multi-level caching (L1 in-process, L2 distributed, L3 CDN) gives you the best of all worlds
- Cache stampede prevention is critical for high-traffic keys — use locking or probabilistic early recompute
- Cache invalidation is the hard part — combine TTL with event-based invalidation for the best results
- Always monitor cache hit rate, memory usage, and eviction rate
Caching is not something you bolt on after the fact. The best systems are designed with caching as a first-class architectural concern from day one.