June 20, 2026 · 7 min read · Rishi

Backpressure: What Happens When Your System Can't Keep Up

A data pipeline reads events from a queue and writes them to a database. It runs fine for months. Then the database has a slow afternoon — an index rebuild, a noisy neighbor, whatever. The pipeline keeps reading from the queue at full speed but can't write fast enough. The unwritten events pile up in an in-memory buffer. The buffer grows. Memory climbs. The process gets OOM-killed. It restarts, replays from the queue, fills the buffer again, and dies again. A temporary database slowdown has become a permanent crash loop.

The missing ingredient is backpressure: a way for the slow consumer to push back on the fast producer and say "slow down, I'm full." It's one of those concepts that's invisible when present and catastrophic when absent, and it shows up the moment any two components in your system can run at different speeds.

The fundamental mismatch

Any time a producer can generate work faster than a consumer can process it, you have a choice to make about what happens to the excess. There are only a few options, and pretending the problem doesn't exist silently picks the worst one.

Producer  ──events──▶  [ buffer ]  ──▶  Consumer
 (fast)                              (slow)

When the consumer falls behind, the buffer between them fills. What then?

Unbounded buffer — the buffer grows until you run out of memory. This is the default if you don't think about it, and it's a time bomb. "We'll just queue it" without a bound is how the crash loop above happens.
Drop data — when the buffer is full, throw new items away (or evict old ones). Bounded memory, but you lose data. Fine for some workloads (metrics samples, live video frames), unacceptable for others (financial transactions).
Block / slow the producer — when the buffer is full, make the producer wait until there's room. This is true backpressure: the slowness propagates upstream until it reaches something that can legitimately absorb it.

Backpressure is the discipline of choosing deliberately instead of defaulting to the time bomb. Almost always, the right design is a bounded buffer plus an explicit policy for what happens when it's full.
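To make the three policies concrete, here is a minimal sketch using Python's standard library. The capacity of 3 and the helper names offer_or_drop and offer_or_block are made up for illustration; the point is only that each policy is an explicit decision at the moment the buffer is full.

```python
import queue
from collections import deque

CAPACITY = 3  # hypothetical bound, kept tiny so the behavior is easy to see


def offer_or_drop(buf: queue.Queue, item) -> bool:
    """Drop-newest policy: refuse the item rather than grow the buffer."""
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False


def offer_or_block(buf: queue.Queue, item, timeout: float) -> bool:
    """Blocking policy: make the producer wait up to `timeout` seconds for room."""
    try:
        buf.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False


# Drop-newest: once three items are queued, further offers are refused.
bounded = queue.Queue(maxsize=CAPACITY)
accepted = [offer_or_drop(bounded, i) for i in range(5)]

# Evict-oldest: a deque with maxlen silently discards from the left when full.
latest = deque(maxlen=CAPACITY)
for i in range(5):
    latest.append(i)
```

With nothing draining the queue, accepted ends up True for the first three offers and False for the rest, while latest keeps only the three most recent items. A blocking put with no timeout is the pure backpressure case: the producer simply stops until the consumer makes room.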

Pull beats push for flow control

A deep reason this problem is so common: many systems are built push-based, where the producer sends data whenever it has data, and the consumer must cope. Push has no natural brake — the producer doesn't know or care whether the consumer is keeping up.

Pull-based systems invert this. The consumer requests the next item when it's ready, and the producer only sends in response to demand. The brake is built in: if the consumer stops asking, the producer stops sending. No item moves without the consumer signaling it has capacity.

This is exactly the model behind Reactive Streams and its implementations (Project Reactor, RxJava, Akka Streams): the subscriber signals demand with request(n), and the publisher must never send more than has been requested. It's a formal backpressure protocol written into the API contract. The lesson generalizes even if you never use those libraries: design the consumer to control the rate, not the producer.
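The demand-driven idea fits in a few lines. This is a deliberately simplified, synchronous sketch, not the Reactive Streams API: the Subscription class and count_up source here are hypothetical stand-ins, and only the request(n) name is borrowed from the real protocol.

```python
from typing import Callable, Iterator, List


class Subscription:
    """Hypothetical, simplified stand-in for a demand-driven subscription."""

    def __init__(self, source: Iterator[int], on_next: Callable[[int], None]):
        self._source = source
        self._on_next = on_next
        self.sent = 0  # total items delivered so far

    def request(self, n: int) -> None:
        """Deliver at most n items; nothing moves without demand."""
        for _ in range(n):
            try:
                item = next(self._source)
            except StopIteration:
                return  # source exhausted
            self.sent += 1
            self._on_next(item)


def count_up() -> Iterator[int]:
    """An infinite producer. It only advances when something pulls from it."""
    n = 0
    while True:
        yield n
        n += 1


received: List[int] = []
sub = Subscription(count_up(), received.append)
sub.request(2)  # consumer has room for two items
sub.request(3)  # later, room for three more
```

Even though the producer is infinite, exactly five items have moved after the two calls, because the consumer asked for five. If the consumer never calls request again, the producer never runs again.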

A pull-based message queue gives you this naturally. Kafka consumers pull batches at their own pace; the broker doesn't shove messages at them. A consumer that's behind simply fetches more slowly, and the unprocessed data sits durably in the log, which is a far better place for a backlog than your process's heap. This is a major reason durable queues are the backbone of resilient pipelines: the queue is the persistent buffer (bounded by its retention settings rather than by your RAM), and pulling from it is the backpressure mechanism.

Propagating the signal all the way up

The subtle part of backpressure is that slowing the immediate producer often isn't enough — the pressure has to travel all the way back to a component that can actually absorb it without harm.

Consider an HTTP service that accepts requests, puts work on an internal queue, and a worker pool drains the queue into a database. If the database slows down:

1. The workers slow down (they're blocked on the DB).
2. The internal queue fills.
3. If nothing pushes back, the HTTP layer keeps accepting requests and the queue grows unbounded — the time bomb again.

Proper backpressure changes step 3: when the internal queue is full, the HTTP layer stops accepting new work and returns 503 Service Unavailable (ideally with a Retry-After header). Now the pressure has reached the system boundary, where the client can absorb it, either by retrying with backoff or by being a human who can wait. That's the right place. The pressure traveled from the database all the way to the edge instead of accumulating in a heap somewhere in the middle.

DB slow → workers block → queue fills → HTTP returns 503 → client backs off
          (pressure propagates upstream to a place that can absorb it)

A system without this propagation doesn't reject load — it accepts load it can't handle and then dies, which is strictly worse than rejecting it. Shedding load gracefully is a feature. A 503 is a healthy system protecting itself; an OOM kill is an unhealthy one that didn't.
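Here is a rough sketch of that edge behavior, stripped of any real web framework. The handle_request function, the queue bound of 2, the 202 success status, and the one-second Retry-After value are all assumptions made for illustration; a real service would wire the same check into its own request handler.

```python
import queue
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, headers)

# Hypothetical internal queue between the HTTP layer and the worker pool.
work_queue: queue.Queue = queue.Queue(maxsize=2)
RETRY_AFTER_SECONDS = "1"  # hypothetical hint to well-behaved clients


def handle_request(payload: bytes) -> Response:
    """Accept work only if the internal queue has room; otherwise shed load."""
    try:
        work_queue.put_nowait(payload)
    except queue.Full:
        # Push the pressure to the client instead of buffering it in memory.
        return 503, {"Retry-After": RETRY_AFTER_SECONDS}
    return 202, {}


# With no worker draining the queue, the third request is rejected.
responses = [handle_request(b"job") for _ in range(3)]
```

The rejection is cheap and immediate, which is the point: the handler never blocks and never allocates more than the bound allows. As soon as a worker takes an item off the queue, the next request is accepted again.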

Practical mechanisms

You rarely build backpressure from scratch; you assemble it from primitives:

Bounded queues / blocking buffers. Use a fixed-capacity queue. When full, either block the producer or reject. Java's ArrayBlockingQueue, Go's buffered channels, and most async frameworks give you this. The bound is the whole point — never use an unbounded queue between components with different speeds.
Semaphores / concurrency limits. Cap the number of in-flight operations. A new request can't start until an in-flight one finishes. This naturally limits how much work piles up.
Token buckets / rate limiters. Apply them on the producer side to cap the input rate before it ever becomes a backlog.
Credit-based flow control. The consumer grants the producer a budget of items it's allowed to send, replenished as the consumer makes progress. This is what TCP does with its receive window, and what HTTP/2 (and gRPC, which runs on top of it) does with per-stream flow-control windows, so backpressure is literally built into the transport you already use.
Timeouts and 503s at the edge. Reject excess load fast rather than queueing it forever.
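Two of these primitives are small enough to sketch directly. The ConcurrencyLimit and TokenBucket classes below are hypothetical illustrations built on the standard library, not production implementations; the clock is injectable only so the refill behavior can be exercised without sleeping.

```python
import threading
import time
from typing import Callable


class ConcurrencyLimit:
    """Cap in-flight operations; callers that can't get a slot are turned away."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_start(self) -> bool:
        # Non-blocking acquire: returns False immediately when all slots are taken.
        return self._slots.acquire(blocking=False)

    def finish(self) -> None:
        self._slots.release()


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket's capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Both classes answer the same question, "may this unit of work proceed right now?", and both return a plain boolean so the caller decides whether to wait, drop, or reject. Neither is thread-safe beyond what the semaphore itself provides; a shared TokenBucket would need a lock around try_acquire.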

Designing for it from the start

The trap with backpressure is that you only discover its absence during an incident, because everything works fine until something downstream gets slow — and something downstream always eventually gets slow. By then the buffer is already overflowing in production.

So bake it in from the start. The questions to ask of every producer/consumer boundary in your design:

What's the bound on the buffer between these two? (If the answer is "unbounded," that's a bug, not a design.)
When the buffer is full, do we block, drop, or reject — and is that the right choice for this data?
Does the pressure propagate to somewhere that can absorb it, or does it just accumulate in memory?
Can we observe the buffer depth, so we see pressure building before it becomes an outage?
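The last question is the cheapest to answer in code. As a small sketch, assuming the buffer is a standard-library queue.Queue and that 70% and 90% are reasonable alert thresholds (both numbers are made up here), buffer depth can be turned into a signal like this:

```python
import queue


def buffer_utilization(buf: queue.Queue) -> float:
    """Fraction of capacity in use. qsize() is approximate under concurrency."""
    if buf.maxsize <= 0:
        # queue.Queue treats maxsize <= 0 as unbounded, so there is no capacity.
        raise ValueError("unbounded queue: no capacity to measure against")
    return buf.qsize() / buf.maxsize


def pressure_level(utilization: float,
                   warn_at: float = 0.7,
                   critical_at: float = 0.9) -> str:
    """Map utilization onto a coarse level suitable for a metric label or alert."""
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warn"
    return "ok"
```

Exporting that utilization as a gauge on a timer shows the buffer filling well before producers start getting blocked or rejected. Note that the helper refuses to measure an unbounded queue, which is the first question on the list showing up as an error.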

Get those four answers right at every boundary and your system degrades gracefully under load instead of falling off a cliff. Backpressure isn't a feature you add later; it's the difference between a system that gets slower when overwhelmed and one that dies. Build it in while the diagram is still on the whiteboard, because a 2 a.m. crash loop is a much worse time to learn the lesson.
