Caching Strategies for High-Performance Systems
Every millisecond counts. When your database query takes 50ms and a cache lookup takes 0.5ms, caching is not an optimization — it is a fundamental architectural decision. The difference between a system that handles 1,000 requests per second and one that handles 100,000 often comes down to how well you cache.
Why Caching Matters: The Latency Numbers
Before diving into strategies, internalize these approximate latency figures:
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| Read 1 MB from memory | 250 µs |
| SSD random read | 0.15 ms |
| Redis GET (same datacenter) | 0.5 ms |
| Round trip within datacenter | 0.5 ms |
| PostgreSQL simple query | 1-50 ms |
| Disk seek | 10 ms |
| Read 1 MB from network | 10 ms |
| Round trip US coast to coast | 40 ms |
A cache hit at the Redis layer can be up to 100x faster than a typical database query. At scale, this difference determines whether your system feels instant or sluggish.
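As a back-of-envelope check of that claim, using the worst-case query latency from the table (purely illustrative arithmetic):

```python
# Worst-case numbers from the table above
redis_hit_ms = 0.5
db_query_ms = 50.0

# Requests per second a single synchronous worker could serve:
assert 1000 / redis_hit_ms == 2000.0   # cache-hit path
assert 1000 / db_query_ms == 20.0      # database path

# Per-lookup speedup of a cache hit over a worst-case query:
assert db_query_ms / redis_hit_ms == 100.0
```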
Cache-Aside (Lazy Loading)
Cache-aside is the most common caching pattern. The application manages both the cache and the database directly.
How It Works
- Application receives a request
- Check the cache for the data
- Cache hit: return the cached data
- Cache miss: query the database, store the result in the cache, return the data
```python
import json

class CacheAsideService:
    def __init__(self, cache, database):
        self.cache = cache
        self.db = database

    def get_user(self, user_id: str) -> dict | None:
        # Step 1: Check cache
        cached = self.cache.get(f"user:{user_id}")
        if cached:
            return json.loads(cached)

        # Step 2: Cache miss — read from database
        user = self.db.query("SELECT * FROM users WHERE id = %s", user_id)
        if user is None:
            return None

        # Step 3: Populate cache for next time
        self.cache.set(
            f"user:{user_id}",
            json.dumps(user),
            ex=3600  # TTL of 1 hour
        )
        return user

    def update_user(self, user_id: str, data: dict):
        # Update database first
        self.db.execute("UPDATE users SET ... WHERE id = %s", user_id)
        # Invalidate cache (do NOT update it — delete it)
        self.cache.delete(f"user:{user_id}")
```
When to Use Cache-Aside
- Read-heavy workloads where data changes infrequently
- When you can tolerate slightly stale data on cache misses
- When you want full control over what gets cached and when
Pitfall: Delete, Don't Update
On writes, delete the cache entry rather than updating it. Updating creates a race condition where two concurrent writes can leave the cache in an inconsistent state.
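The race is easiest to see as a deterministic simulation (writer names and the interleaving are illustrative; plain dicts stand in for the cache and database):

```python
# Two writers, A and B, race on the same key. The DB applies A's write
# first and B's second, but their cache updates land in the opposite
# order, so "update the cache on write" leaves the cache permanently stale.
db = {}
cache = {}

# Interleaving with update-on-write:
db["user:1"] = "v1"          # A commits v1
db["user:1"] = "v2"          # B commits v2 (final DB state is v2)
cache["user:1"] = "v2"       # B updates the cache first...
cache["user:1"] = "v1"       # ...then A's delayed update overwrites it
assert db["user:1"] == "v2" and cache["user:1"] == "v1"  # stale forever

# Same interleaving with delete-on-write:
cache.pop("user:1", None)    # B invalidates
cache.pop("user:1", None)    # A invalidates (delete is idempotent)
value = cache.get("user:1") or db["user:1"]  # next read repopulates from DB
assert value == "v2"         # consistent again
```

With deletion, the worst case after a race is a single extra cache miss, not an indefinitely stale entry.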
Read-Through Cache
Read-through looks similar to cache-aside, but the cache itself is responsible for loading data from the database on a miss. The application only talks to the cache.
```python
class ReadThroughCache:
    """
    The cache layer handles loading on a miss.
    Application code never touches the database directly for reads.
    """
    def __init__(self, cache_client, loader_fn):
        self.cache = cache_client
        self.loader = loader_fn

    def get(self, key: str) -> dict:
        value = self.cache.get(key)
        if value is None:
            # The cache layer, not the application, calls the loader
            value = self.loader(key)
            if value is not None:
                self.cache.set(key, value, ex=3600)
        return value

# Usage
cache = ReadThroughCache(
    redis_client,
    loader_fn=lambda key: db.query_user(key.split(":")[1])
)
user = cache.get("user:42")  # App never calls DB directly
```
Advantage over cache-aside: Simpler application code. The caching logic is encapsulated inside the cache layer, so application developers cannot accidentally skip the cache.
Write-Through Cache
Write-through ensures that every write goes to both the cache and the database synchronously. The write is only considered successful when both operations complete.
```python
import json

class WriteThroughCache:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def write(self, key: str, value: dict):
        # Write to the cache AND the database before returning.
        # Note: these are two separate operations, not an atomic
        # transaction; production code must handle partial failure.
        self.cache.set(key, json.dumps(value))
        self.db.execute(
            "INSERT INTO data (key, value) VALUES (%s, %s) "
            "ON CONFLICT (key) DO UPDATE SET value = %s",
            key, json.dumps(value), json.dumps(value)
        )
        # The write is only reported successful once both complete
```
Pros and Cons
- Pro: Cache is always consistent with the database — no stale reads
- Pro: Simple mental model — write once, it is everywhere
- Con: Higher write latency — every write waits for both cache and DB
- Con: Writes to data that may never be read waste cache space
Write-through works best when combined with read-through, giving you a unified caching layer.
Write-Behind (Write-Back)
Write-behind is the performance-optimized cousin of write-through. Writes go to the cache immediately, and the cache asynchronously flushes changes to the database in the background.
```python
import json
import schedule  # assumed background scheduler, e.g. the `schedule` library

class WriteBehindCache:
    def __init__(self, cache, db, flush_interval=5):
        self.cache = cache
        self.db = db
        self.dirty_keys = set()
        self.flush_interval = flush_interval
        self._start_background_flush()

    def write(self, key: str, value: dict):
        # Write to cache immediately — fast!
        self.cache.set(key, json.dumps(value))
        self.dirty_keys.add(key)
        # Return immediately; the DB write happens asynchronously

    def _flush_to_db(self):
        keys_to_flush = list(self.dirty_keys)
        self.dirty_keys.clear()
        for key in keys_to_flush:
            value = self.cache.get(key)
            if value:
                self.db.upsert(key, json.loads(value))

    def _start_background_flush(self):
        # Runs _flush_to_db every flush_interval seconds
        schedule.every(self.flush_interval).seconds.do(self._flush_to_db)
```
The Risk
If the cache node crashes before flushing to the database, you lose data. Write-behind is only appropriate when:
- You can tolerate some data loss (counters, analytics, session data)
- You batch writes for efficiency (reducing DB load by 10-100x)
- You have cache replication for durability
Cache Invalidation Strategies
Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. Here are your options.
TTL-Based Invalidation
The simplest approach: every cache entry expires after a fixed time.
```python
import random

# Simple TTL
cache.set("product:123", data, ex=300)  # Expires in 5 minutes

# TTL with jitter to prevent a thundering herd
base_ttl = 300
jitter = random.randint(0, 60)
cache.set("product:123", data, ex=base_ttl + jitter)
```
When to use: Data that changes unpredictably and where staleness up to TTL seconds is acceptable.
Event-Based Invalidation
Invalidate cache entries when the underlying data changes, using events or database triggers.
```python
import time

# Using an event bus for cache invalidation
class UserService:
    def update_user(self, user_id, data):
        self.db.update_user(user_id, data)
        # Publish event — cache subscriber will invalidate
        self.event_bus.publish("user.updated", {
            "user_id": user_id,
            "timestamp": time.time()
        })

class CacheInvalidationSubscriber:
    def on_user_updated(self, event):
        user_id = event["user_id"]
        self.cache.delete(f"user:{user_id}")
        self.cache.delete(f"user_profile:{user_id}")
        self.cache.delete(f"user_permissions:{user_id}")
```
When to use: When you need strong consistency and have a reliable event system in place.
Versioned Keys
Instead of invalidating, create a new cache key each time data changes.
```python
# Store a version counter
version = cache.incr(f"user:{user_id}:version")
cache_key = f"user:{user_id}:v{version}"
cache.set(cache_key, user_data, ex=86400)

# On read, always fetch the current version first
current_version = cache.get(f"user:{user_id}:version")
data = cache.get(f"user:{user_id}:v{current_version}")
```
When to use: When you want to avoid delete-then-repopulate races and old cache entries can expire naturally.
Cache Eviction Policies
When the cache is full, something has to go. These policies determine what gets evicted.
LRU (Least Recently Used)
Evicts the entry that has not been accessed for the longest time. Redis supports this via its allkeys-lru and volatile-lru eviction policies (note that Redis's out-of-the-box default is noeviction, so you must set maxmemory-policy explicitly).
- Best for: General-purpose caching where recent access patterns predict future access
- Weakness: A one-time scan of many keys can evict frequently used entries
LFU (Least Frequently Used)
Evicts the entry that has been accessed the fewest times. Redis supports this with allkeys-lfu.
- Best for: Workloads with a stable set of hot keys (product catalog, popular content)
- Weakness: New entries start with low frequency and may be evicted before they prove popular
FIFO (First In, First Out)
Evicts the oldest entry regardless of access patterns.
- Best for: Time-series data where newer entries are always more relevant
- Weakness: Ignores access patterns entirely — a frequently accessed old item gets evicted
Choosing the Right Policy
```
Decision guide:
├── Is access recency a good predictor? → LRU
├── Do you have a stable set of hot keys? → LFU
├── Is data freshness more important than access patterns? → FIFO
└── Not sure? → Start with LRU (it works well for most workloads)
```
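To make the LRU policy concrete, here is a minimal in-process sketch built on `OrderedDict` (the class and capacity are illustrative; Redis itself uses an approximated LRU, not this exact structure):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is reached."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.set("c", 3)  # evicts "b", not "a"
assert cache.get("b") is None and cache.get("a") == 1
```

Swapping `move_to_end` on reads for a per-key access counter turns the same skeleton into an LFU sketch.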
Distributed Caching with Redis and Memcached
Redis
Redis is the Swiss Army knife of caching. Beyond simple key-value storage, it supports:
- Data structures: Strings, hashes, lists, sets, sorted sets, streams
- Persistence: RDB snapshots and AOF logs for durability
- Replication: Primary-replica for read scaling and failover
- Cluster mode: Automatic sharding across multiple nodes
- Pub/Sub: Built-in event system for cache invalidation
```python
# Redis Cluster example with read replicas (redis-py 4.x+)
from redis.cluster import RedisCluster, ClusterNode

rc = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-1", 6379),
        ClusterNode("redis-2", 6379),
        ClusterNode("redis-3", 6379),
    ],
    read_from_replicas=True  # Scale reads across replicas
)
```
Memcached
Memcached is simpler and faster for pure key-value caching:
- Multi-threaded: Better CPU utilization than Redis's mostly single-threaded command execution
- Memory efficient: Less overhead per key for simple string values
- Consistent hashing: Built-in support for scaling the cluster
Choose Redis when you need data structures, persistence, or pub/sub. Choose Memcached when you need raw throughput for simple key-value lookups with large datasets.
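The consistent-hashing idea behind Memcached client libraries can be sketched in a few lines (a toy ring without virtual nodes; real clients such as ketama-based ones place many virtual points per server to smooth the distribution):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # Place each node at a point on a 0..2^32 ring via its hash
        self.ring = sorted(
            (int(hashlib.md5(n.encode()).hexdigest(), 16) % 2**32, n)
            for n in nodes
        )

    def node_for(self, key: str) -> str:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32
        # Walk clockwise to the first node at or after the key's position
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, h) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["mc-1", "mc-2", "mc-3"])
before = {k: ring.node_for(k) for k in ("user:1", "user:2", "user:3")}

# Removing one node only remaps the keys that lived on it:
smaller = HashRing(["mc-1", "mc-2"])
moved = [k for k, n in before.items()
         if n != "mc-3" and n != smaller.node_for(k)]
assert moved == []  # keys not on mc-3 stayed put
```

This is why consistent hashing matters for cache clusters: with naive `hash(key) % num_nodes`, removing one node remaps almost every key and effectively cold-starts the cache.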
Cache Stampede Prevention
A cache stampede (or thundering herd) happens when a popular cache entry expires and hundreds of concurrent requests all miss the cache and hit the database simultaneously.
Locking (Mutex Pattern)
Only one request computes the value while others wait.
```python
import json
import time

def get_with_lock(key: str) -> dict:
    value = cache.get(key)
    if value:
        return json.loads(value)

    # Try to acquire a short-lived lock
    lock_key = f"lock:{key}"
    if cache.set(lock_key, "1", nx=True, ex=10):
        try:
            # This request computes the value
            value = expensive_database_query(key)
            cache.set(key, json.dumps(value), ex=3600)
            return value
        finally:
            cache.delete(lock_key)
    else:
        # Another request is computing — wait briefly and retry
        time.sleep(0.05)
        return get_with_lock(key)
```
Probabilistic Early Expiration
Recompute the value before it actually expires, with increasing probability as expiration approaches.
```python
import math
import random

def get_with_early_recompute(key: str, beta: float = 1.0) -> dict:
    # get_with_metadata is assumed to return the cached value, its
    # remaining TTL, and how long the value took to compute
    value, ttl, compute_time = cache.get_with_metadata(key)
    if value is None:
        # True miss — compute and cache
        return recompute_and_cache(key)

    # Probabilistic early recompute:
    # as the TTL approaches 0, the probability of recomputing approaches 1
    if ttl > 0:
        random_threshold = compute_time * beta * math.log(random.random())
        if -random_threshold >= ttl:
            # Recompute early (shown synchronously here; a real system
            # would usually hand this off to a background task)
            recompute_and_cache(key)
    return value
```
This approach avoids the need for distributed locks and naturally spreads recomputation across time.
Cache Warming
Cold caches can crush your database after a deployment or cache node restart. Cache warming pre-populates the cache before traffic arrives.
```python
import json
from datetime import datetime, timedelta

class CacheWarmer:
    def warm(self):
        """Pre-populate the cache with known hot data."""
        # Load the top 1000 most accessed products
        hot_products = self.db.query(
            "SELECT * FROM products ORDER BY access_count DESC LIMIT 1000"
        )
        pipe = self.cache.pipeline()
        for product in hot_products:
            pipe.set(
                f"product:{product['id']}",
                json.dumps(product),
                ex=3600
            )
        pipe.execute()
        print(f"Warmed {len(hot_products)} product entries")

    def warm_from_access_log(self):
        """Replay recent access logs to warm the cache."""
        recent_keys = self.access_log.get_top_keys(
            since=datetime.now() - timedelta(hours=1),
            limit=5000
        )
        for key in recent_keys:
            self.get_with_cache(key)  # Triggers cache population
```
Best practices for cache warming:
- Run warming scripts as part of your deployment pipeline
- Use access logs to identify hot keys rather than guessing
- Warm progressively — do not slam the database with 100,000 queries at once
- Monitor cache hit rate during and after warming to verify effectiveness
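The "warm progressively" advice above can be sketched as a simple batched loop (the batch size and pause are illustrative tuning knobs, and `load_and_cache` stands in for any cache-aside read):

```python
import time

def warm_in_batches(keys, load_and_cache, batch_size=100, pause_s=0.05):
    """Warm the cache in small batches so the database sees a steady
    trickle of queries rather than one enormous burst."""
    warmed = 0
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        for key in batch:
            load_and_cache(key)  # cache-aside read: populates on miss
        warmed += len(batch)
        time.sleep(pause_s)      # give the database room to breathe
    return warmed

# Usage sketch: warm 10 keys, 4 at a time
count = warm_in_batches([f"k{i}" for i in range(10)],
                        lambda k: None, batch_size=4, pause_s=0)
assert count == 10
```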
Multi-Level Caching
The most performant systems use multiple cache layers, each trading off capacity for speed.
L1: In-Process Cache
Fastest possible access — no network hop. Stored in application memory.
```python
import json
from cachetools import TTLCache

# Simple in-process cache with TTL
l1_cache = TTLCache(maxsize=1000, ttl=60)  # 1000 items, 60s TTL

def get_user(user_id: str) -> dict:
    # L1: Check in-process cache (sub-microsecond)
    if user_id in l1_cache:
        return l1_cache[user_id]

    # L2: Check Redis (~0.5 ms)
    cached = redis.get(f"user:{user_id}")
    if cached:
        user = json.loads(cached)
        l1_cache[user_id] = user  # Promote to L1
        return user

    # L3: Database (5-50 ms)
    user = db.query_user(user_id)
    if user:
        redis.set(f"user:{user_id}", json.dumps(user), ex=3600)
        l1_cache[user_id] = user
    return user
```
L2: Distributed Cache (Redis/Memcached)
Shared across all application instances. Survives individual app restarts.
L3: CDN Cache
For static or semi-static content, push caching to the edge — as close to the user as possible.
```
Cache-Control: public, max-age=300, s-maxage=3600, stale-while-revalidate=86400
```
This tells CDNs to cache for 1 hour, browsers for 5 minutes, and serve stale content for up to 24 hours while revalidating in the background.
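As a small sketch, such a header can be assembled from its directives (the helper name is illustrative; the values mirror the example above):

```python
def cache_control(browser_ttl: int, cdn_ttl: int, stale_ttl: int) -> str:
    """Build a Cache-Control header: max-age governs browsers,
    s-maxage governs shared caches (CDNs), and stale-while-revalidate
    lets caches serve expired content while refreshing in the background."""
    return (f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}, "
            f"stale-while-revalidate={stale_ttl}")

header = cache_control(browser_ttl=300, cdn_ttl=3600, stale_ttl=86400)
assert header == (
    "public, max-age=300, s-maxage=3600, stale-while-revalidate=86400"
)
```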
The Full Picture
Request flow with multi-level caching:
```
User Request
     │
     ▼
┌───────────────┐   HIT
│   CDN (L3)    │ ──────►  Return cached response (< 10 ms)
└───────┬───────┘
        │ MISS
        ▼
┌───────────────┐   HIT
│ App Cache L1  │ ──────►  Return from memory (< 0.1 ms)
│ (in-process)  │
└───────┬───────┘
        │ MISS
        ▼
┌───────────────┐   HIT
│   Redis L2    │ ──────►  Return from Redis (< 1 ms)
│ (distributed) │
└───────┬───────┘
        │ MISS
        ▼
┌───────────────┐
│   Database    │ ──────►  Return from DB (5-50 ms)
│  (source of   │          + populate L1, L2, L3
│    truth)     │
└───────────────┘
```
Real-World Patterns and Lessons
Pattern: Cache-Aside with Event Invalidation
This is the most common pattern in production systems. Use cache-aside for reads and event-based invalidation for writes.
```python
import json

# The "golden pattern" for most applications
class ProductService:
    def get_product(self, product_id):
        # Cache-aside read
        cached = self.cache.get(f"product:{product_id}")
        if cached:
            return json.loads(cached)
        product = self.db.get_product(product_id)
        if product is not None:  # avoid caching null results indefinitely
            self.cache.set(f"product:{product_id}",
                           json.dumps(product), ex=1800)
        return product

    def update_product(self, product_id, data):
        self.db.update_product(product_id, data)
        # Event-based invalidation for other instances
        self.events.publish("product.updated", {"id": product_id})
        # Also delete immediately for this instance
        self.cache.delete(f"product:{product_id}")
```
Pattern: Request Collapsing
Multiple identical concurrent requests are collapsed into a single database query.
```python
import asyncio

class RequestCollapser:
    def __init__(self):
        self._pending = {}

    async def get(self, key: str, loader) -> dict:
        if key in self._pending:
            # Another request is already loading this — wait for it
            return await self._pending[key]

        future = asyncio.get_running_loop().create_future()
        self._pending[key] = future
        try:
            result = await loader(key)
            future.set_result(result)
            return result
        except Exception as exc:
            # Propagate the failure to every waiter, not just this caller
            future.set_exception(exc)
            raise
        finally:
            del self._pending[key]
```
Lessons from Production
- Monitor your cache hit rate. A healthy cache should have 95%+ hit rate for hot data. Below 80%, investigate why.
- Set TTLs on everything. Entries without TTLs accumulate and eventually cause memory pressure.
- Use cache key prefixes and namespaces. `user:42:profile` is infinitely better than `42` for debugging.
- Never cache null results without short TTLs. Otherwise, a temporary database error can fill your cache with negative entries.
- Plan for cache failure. Your system should degrade gracefully when the cache is unavailable, not crash.
- Size your cache based on your working set, not your total data. If 10% of products get 90% of traffic, cache that 10%.
Key Takeaways
- Cache-aside is the most flexible and widely used pattern — start here
- Write-through guarantees consistency but adds write latency
- Write-behind maximizes write throughput but risks data loss
- Multi-level caching (L1 in-process, L2 distributed, L3 CDN) gives you the best of all worlds
- Cache stampede prevention is critical for high-traffic keys — use locking or probabilistic early recompute
- Cache invalidation is the hard part — combine TTL with event-based invalidation for the best results
- Always monitor cache hit rate, memory usage, and eviction rate
Caching is not something you bolt on after the fact. The best systems are designed with caching as a first-class architectural concern from day one.