Idempotency Keys: The Pattern That Saves Your Payment System
A retry in a payment system is terrifying. Your client sends a POST /charges for $500, the request hits the server, the server processes the charge, the response fails to arrive (network glitch, timeout, whatever). The client retries. Without protection, the customer is now $1,000 poorer and someone on your team has a very bad afternoon.
The protection is an idempotency key. The pattern is well-known and widely implemented. The failure modes are less well-known, and they are what cause the worst incidents. This post walks through the mechanism and the traps.
The Core Pattern
The client generates a unique key per logical operation (not per attempt) and sends it with the request:
POST /charges
Idempotency-Key: a3f1-8b42-...
{ "amount": 500, "customer_id": "..." }
The server:
- Receives the request.
- Looks up the key in a dedicated store.
- If the key is new, processes the request and stores (key, request_fingerprint, response) atomically.
- If the key exists with a completed response, returns the stored response without re-executing.
- If the key exists but the request is still in flight, returns a "still processing" response.
That is the shape. The interesting details are in each step.
Scope: Per-Endpoint, Per-Customer
An idempotency key is not globally unique — it is unique within a scope. The scope usually combines endpoint and tenant:
key_scope = (endpoint, customer_id)
This matters because two customers can legitimately send the same key value for different requests without conflict, and because a key from one endpoint should not interact with another endpoint. Storing idempotency state under a flat key across the whole system is a footgun that creates weird cross-tenant collisions.
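The scoping rule above can be sketched in a few lines. This is illustrative only; the names `storage_key`, `endpoint`, and `customer_id` are assumptions about your schema, not a prescribed API:

```python
# Compose the storage key from endpoint, tenant, and the client-supplied
# value, so equal client keys from different customers (or different
# endpoints) never collide in the idempotency store.
def storage_key(endpoint: str, customer_id: str, idempotency_key: str) -> tuple:
    return (endpoint, customer_id, idempotency_key)

# Same client-chosen key, two different customers: two distinct records.
a = storage_key("POST /charges", "cust_1", "a3f1-8b42")
b = storage_key("POST /charges", "cust_2", "a3f1-8b42")
assert a != b
```

In SQL terms, this usually means a composite unique constraint on (endpoint, customer_id, key) rather than a unique constraint on the key alone.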
Fingerprinting: Detect Client Bugs
Store a hash of the request body alongside each key. On a repeat request with the same key, compare fingerprints. If the body is different, the client has a bug — same key, different payload — and you should return an error rather than silently returning the old response.
import hashlib, json

def fingerprint(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
The error usually looks like 409 Conflict with a body saying "idempotency key was used with a different request body." Real client bugs will surface as this error in your monitoring; real retries will hit the happy path.
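Concretely, the fingerprint check behaves like this (the function is repeated here so the example runs standalone):

```python
import hashlib, json

def fingerprint(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# A genuine retry matches: key order in the JSON does not matter,
# because the body is canonicalized before hashing.
assert fingerprint({"amount": 500, "customer_id": "c1"}) == \
       fingerprint({"customer_id": "c1", "amount": 500})

# A different body under the same key is a client bug -> return 409.
assert fingerprint({"amount": 500}) != fingerprint({"amount": 600})
```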
The In-Flight Problem
The subtlest part: what do you do when a key has been received but the first request has not finished yet?
Three wrong answers:
- Let the second request through. You just lost idempotency. Both requests execute.
- Return the cached response. There is no cached response yet — the first hasn't finished.
- Silently wait indefinitely. You gave the client a pending connection they might not recover from.
The right answer: return an explicit "in progress" response (usually 409 Conflict or 425 Too Early) and tell the client to retry after a brief delay. The client can poll or back off. This keeps your in-flight count bounded.
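From the client's side, "retry after a brief delay" looks roughly like the sketch below. The transport function `post` is a hypothetical stand-in returning `(status, body)`; the key point is that the same key is resent on every attempt:

```python
import time

def send_with_retry(post, key, body, max_attempts=5, delay=0.5):
    """Client-side sketch: resend the SAME idempotency key until the
    server reports a terminal state. `post` is a hypothetical
    transport function returning (status, body)."""
    for attempt in range(max_attempts):
        status, resp = post(key, body)
        if status != 409 or resp.get("error") != "request_in_progress":
            return status, resp  # terminal: cached or fresh result
        time.sleep(delay * (2 ** attempt))  # back off while in flight
    raise TimeoutError("request still in progress after retries")
```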
Implementation-wise, this means your idempotency record has a state column: pending, completed, failed. The atomic insert transitions nothing → pending. The response handler transitions pending → completed (or failed) under a transaction that also writes the response.
The Transaction Boundary
The idempotency record and the business-effect record must be written in the same transaction. If they are not, you have a window where the charge happened but the idempotency record does not know about it — or the idempotency record says "completed" but the charge never ran.
BEGIN;
INSERT INTO idempotency_keys (key, scope, state, fingerprint)
VALUES ($1, $2, 'completed', $3);
INSERT INTO charges (id, amount, customer_id, ...)
VALUES ($4, $5, $6, ...);
COMMIT;
For operations that span multiple services — charge a card via Stripe, then write a charge record locally — you need more care. The typical pattern: store the idempotency key with a pending state, execute the external call with its own idempotency key (passed through), record the result. If the external call has its own idempotency guarantee (Stripe does), the retry path is safe end-to-end.
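The pass-through pattern can be sketched as three phases. Everything here is illustrative: `external_charge` stands in for a provider call (e.g., Stripe's, which accepts its own idempotency key), and the `db` methods are hypothetical wrappers around the SQL shown earlier:

```python
def charge_with_external(db, key, scope, body, external_charge):
    """Sketch of the multi-service pattern under assumed interfaces.
    `external_charge` represents a provider call that accepts an
    idempotency key of its own; `db` is a hypothetical store."""
    # Phase 1: claim the key locally before any side effect.
    db.insert_pending(key, scope)
    # Phase 2: pass a derived key through, so a retry of OUR request
    # replays as a retry of THEIR request too.
    result = external_charge(body, idempotency_key=f"{scope}:{key}")
    # Phase 3: record the outcome. If we crash between phases 2 and 3,
    # the provider's own idempotency guarantee makes re-running
    # phase 2 safe on the next attempt.
    db.complete(key, scope, result)
    return result
```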
Key Lifetime
Keys do not live forever. They cost space, they complicate queries, and they rarely matter beyond the retry window.
A common pattern: keep keys for 24 hours and run a nightly cleanup; this is what Stripe does. For operations with longer retry windows (e.g., batch-processing workflows), 7 days is reasonable.
Two things to get right:
- Cleanup must not delete records that point to data the key might still protect. If your key TTL is 24 hours but a client retry could come 26 hours later, you are exposed. Match the TTL to the actual retry window, not a random round number.
- Do not rely on TTL-on-insert semantics that differ between stores. Some stores use lazy expiration; keys appear to exist for minutes after their TTL. If you are checking key existence, make sure you are also checking the stored expiry time.
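The second point can be made concrete with a liveness check that trusts only the stored timestamp, never the store's lazy TTL. The record shape (`created_at` as a unix timestamp) and the 26-hour window are illustrative assumptions:

```python
import time

def key_is_live(record, now=None, ttl_seconds=26 * 3600):
    """Sketch: treat a key as live only if it exists AND its stored
    expiry has not passed. Do not rely on the store having already
    evicted it -- lazy expiration means it may still be readable."""
    if record is None:
        return False
    now = now if now is not None else time.time()
    return now < record["created_at"] + ttl_seconds
```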
The Mistakes That Cause Real Incidents
Generating the key on the server instead of the client. A server-generated key changes on retry; the whole point of the key is that the client keeps the same value across attempts. Spec the key as a required header from the client.
Using non-unique keys. Some teams use a request ID that is actually a sequence number or a timestamp. A timestamp collides under concurrency. A UUID v4 is the safe default. For clients that cannot generate UUIDs, accept a client-chosen string with at least 128 bits of entropy.
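In Python, either of the standard-library options below gives the required entropy; the important discipline is that the key is generated once per logical operation and reused across retries:

```python
import uuid
import secrets

# Safe defaults for client-side key generation. Both carry at least
# 128 bits of entropy, so collisions are not a practical concern.
key = str(uuid.uuid4())          # canonical 36-character UUID string
alt = secrets.token_urlsafe(16)  # 16 random bytes -> 128 bits

# Generate ONCE per logical operation, then reuse the same value on
# every retry attempt -- never regenerate per attempt.
```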
Forgetting to handle the "success with retry" path. The original request might succeed, but the response fails to reach the client. The client retries. Your server must return the exact same response — same status code, same body — not an "already processed" message. Store the full response, not just the fact of completion.
Checking idempotency after side effects. I have seen systems that do the business logic, then check the key, then write the key. On a crash between the logic and the key write, you get re-execution. The check must come first, and the write must be atomic with the logic.
Different retry behavior on different error classes. A 5xx server error from an upstream is retryable; a 4xx business error usually is not. But your idempotency store should not care — store the response regardless. Clients and intermediaries will sometimes retry after 4xx, and your key protects both cases.
A Minimal Working Implementation
Putting it together, in a pseudo-Python shape that translates to most real stacks:
def handle_request(key: str, scope: str, body: dict) -> Response:
    fp = fingerprint(body)
    with db.transaction():
        record = db.select_for_update(
            "SELECT state, fingerprint, response FROM idempotency_keys "
            "WHERE key = %s AND scope = %s",
            (key, scope),
        )
        if record is None:
            # A unique constraint on (key, scope) makes concurrent first
            # requests safe: the loser of the race hits a constraint
            # violation and should re-read the row.
            db.insert(
                "INSERT INTO idempotency_keys (key, scope, state, fingerprint) "
                "VALUES (%s, %s, 'pending', %s)",
                (key, scope, fp),
            )
        else:
            if record.fingerprint != fp:
                return Response(409, {"error": "idempotency_key_reused_with_different_body"})
            if record.state == "pending":
                return Response(409, {"error": "request_in_progress"})
            if record.state == "completed":
                return Response(record.response.status, record.response.body)
            # state == "failed": fall through and re-execute.
    try:
        result = do_the_actual_work(body)
        with db.transaction():
            db.update(
                "UPDATE idempotency_keys SET state = 'completed', response = %s "
                "WHERE key = %s AND scope = %s",
                (serialize(result), key, scope),
            )
        return result
    except Exception as e:
        with db.transaction():
            db.update(
                "UPDATE idempotency_keys SET state = 'failed', response = %s "
                "WHERE key = %s AND scope = %s",
                (serialize({"error": str(e)}), key, scope),
            )
        raise
That is the pattern. It fits in one function, and it handles each of the tricky cases above.
What Idempotency Keys Do Not Do
Worth being explicit: idempotency keys handle retry safety for write operations. They do not help with:
- Concurrent duplicate intents. If a user double-clicks a "Pay" button and the UI fires two different idempotency keys, both charges will go through. Fix this in the client by debouncing, not in the server.
- Deduplication of out-of-band events. If your charges can come from webhooks, scheduled jobs, and the API, idempotency keys only protect the API path. The webhook path needs its own dedup, usually keyed on the event ID.
- Exactly-once semantics in the queue sense. Idempotency at the API boundary does not give you exactly-once processing downstream — you need dedup at every write boundary, keyed appropriately.
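For the webhook path specifically, dedup keyed on the provider's event ID looks roughly like this sketch. The store and handler interfaces are assumptions; in a real system the mark-as-seen write and the business effect should share a transaction, for the same reason discussed earlier:

```python
def handle_webhook(event, seen_event_ids, process):
    """Sketch: dedup the webhook path on the provider's event ID,
    independently of API idempotency keys. `seen_event_ids` stands in
    for a persistent store with a unique constraint; `process` is the
    business handler."""
    if event["id"] in seen_event_ids:
        return "duplicate_ignored"
    seen_event_ids.add(event["id"])
    process(event)
    return "processed"
```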
Where This Should Live in Your Stack
An idempotency middleware that wraps your write endpoints is the cleanest shape. Clients pass Idempotency-Key, the middleware handles the lookup, pending-state, fingerprint check, and response caching. Your business logic stays clean. You get the same protection across every endpoint for free.
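As a shape, the middleware can be a decorator around each write handler. The sketch below is deliberately minimal (it omits the pending-state and fingerprint logic, which a real middleware would fold in as shown earlier); `store` is a hypothetical dict-like idempotency store:

```python
import functools

def idempotent(store):
    """Sketch of idempotency middleware as a decorator. `store` is a
    hypothetical dict-like store; real code would use the DB-backed
    logic with pending-state and fingerprint checks."""
    def wrap(handler):
        @functools.wraps(handler)
        def inner(key, scope, body):
            record = store.get((key, scope))
            if record is not None:
                return record  # cached response, no re-execution
            response = handler(body)
            store[(key, scope)] = response
            return response
        return inner
    return wrap
```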
I would rather have a simple middleware implementing all of the above than a clever framework feature that implements most of it with a subtle bug. The pattern is small enough to write yourself, and writing it yourself is how you make sure it is right.