Caching Strategies in Distributed Systems

Imagine you run an e-commerce website. You cache your most popular product's data in Redis with a 5-minute expiration. For 5 minutes, everything is blazing fast. Then the cache expires. 10,000 users, all browsing that product at the same instant, slam into your database at once.

Your database wasn't built for that. It buckles, slows down, and the whole site goes down. This is called a cache stampede, and it has caused real outages at companies of every size from startups to Netflix.

This article walks you through five strategies that prevent this, from the simplest (adding randomness) to the most elegant (probabilistic math).

Why Basic TTL Caching Is Not Enough

TTL (Time-To-Live) is the simplest caching strategy. You store data with an expiration time. After that time, the entry disappears, and the next request fetches fresh data from the database.

SET product:42 → {name: "iPhone 16", price: "₹79,900"}  TTL = 300 seconds

// For 300 seconds, every read is instant.
// At second 301 — cache miss — hit the database.
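
The behaviour above can be sketched in a few lines of Python. This is a minimal in-memory stand-in for Redis, not a real client; `TTLCache`, its injectable `clock`, and `get_or_load` are hypothetical names used only for illustration.

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be exercised in tests
        self._entries = {}           # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: behave like a miss
            return None
        return value

def get_or_load(cache, key, loader, ttl_seconds):
    """Cache-aside read: return the cached value, or load it and store it."""
    value = cache.get(key)
    if value is None:
        value = loader(key)          # this is the database hit
        cache.set(key, value, ttl_seconds)
    return value
```

Every caller that arrives between the expiry and the next `set` takes the `loader` path, which is exactly the weakness the problems below describe.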

This works fine for small apps. But at scale, four problems emerge.

Problem 1: The Stampede

When a hot key expires, every concurrent request misses the cache and queries the database simultaneously. A key serving 10,000 requests/second creates 10,000 database queries in one instant. This is the thundering herd.

Problem 2: Staleness Dilemma

Short TTL = frequent database hits. Long TTL = users see outdated data. If a user changes their profile name, other users may see the old name for the entire TTL duration. With TTL alone, there is no way to say "invalidate this right now."

Problem 3: Cold Start

After a deployment or cache restart, everything is a miss. Your database suddenly handles 100% of traffic. Worse, if all keys were set around the same time, they expire together, producing a repeating spike every TTL cycle.

Problem 4: Memory Waste

A key accessed once gets the same lifetime as a key accessed a million times. You waste memory on rarely-used entries while potentially evicting popular ones.

How Cache Expiry Causes Traffic Spikes

Let's trace exactly what happens when a popular cache key expires.

The Single-Key Spike

Second 59:   5,000 req → cache HIT   → DB sees 0 queries      ✓
Second 60:   5,000 req → cache MISS  → DB sees 5,000 queries  ✗
Second 61:   5,000 req → cache HIT   → DB sees 0 queries      ✓

That one-second window is the thundering herd. Now imagine thousands of keys all set at the same time. They all expire together, creating a repeating sawtooth pattern:

Synchronized TTL Expiry - Sawtooth Pattern

DB Load
  ▲
  │   ╱│       ╱│       ╱│
  │  ╱ │      ╱ │      ╱ │     ← periodic spikes every 5 minutes
  │ ╱  │     ╱  │     ╱  │
  │╱   │    ╱   │    ╱   │
  └────┴───────┴───────┴────▶ time
    5min    10min    15min

The Death Spiral

This is where it gets truly dangerous. The sequence is:

1. Hot keys expire simultaneously. Thousands of cache misses at once.
2. Database overloads. Queries that normally take 10ms now take 2 seconds.
3. Timeouts fire. Application threads are blocked. Request queues fill up.
4. Retries double the load. Clients and load balancers retry timed-out requests, piling onto an already struggling database.
5. Full outage. The DB is so slow that cache repopulation fails. More keys expire. Death spiral.

cache expiry → DB overload → slow responses → timeouts & retries → more DB load → full outage

TTL Jitter — Adding Randomness to Expiration

The simplest fix. Instead of every key expiring at exactly the same time, add a small random offset. Expirations spread across a window instead of a single moment.

Without Jitter vs. With Jitter

Without jitter:
  Key A: TTL = 300s  → expires at t+300  ┐
  Key B: TTL = 300s  → expires at t+300  ├── all at once 
  Key C: TTL = 300s  → expires at t+300  ┘

With jitter:
  Key A: 300 + random(-30, 30) = 317s → expires at t+317  ┐
  Key B: 300 + random(-30, 30) = 282s → expires at t+282  ├── spread out 
  Key C: 300 + random(-30, 30) = 305s → expires at t+305  ┘

Instead of 10,000 cache misses in one second, you get roughly 167 misses per second over 60 seconds. Your database barely notices.
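
A quick simulation makes the spreading effect concrete. This is a rough sketch, assuming 10,000 keys that were all written at the same moment with a 300-second base TTL; `expiries_per_second` is a hypothetical helper, and the fixed seed is only there to make runs repeatable.

```python
import random
from collections import Counter

def expiries_per_second(num_keys, base_ttl, jitter, seed=0):
    """Count how many keys expire in each one-second bucket after a simultaneous fill."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_keys):
        ttl = base_ttl + rng.uniform(-jitter, jitter)
        counts[int(ttl)] += 1        # bucket by whole second
    return counts

without = expiries_per_second(10_000, base_ttl=300, jitter=0)
with_jitter = expiries_per_second(10_000, base_ttl=300, jitter=30)

print("worst second without jitter:", max(without.values()))
print("worst second with jitter:   ", max(with_jitter.values()))
```

Without jitter, every key lands in the same bucket. With a ±30 second window the expirations spread over about 60 buckets, so the worst second sees only a small fraction of the keys.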

Jitter Strategies

Additive Uniform: Simple default; adds spread on top of the base TTL. It does nothing for staleness: the average TTL rises by half the range and the worst case by the full range.
Formula: base + random(0, range)

Symmetric: Preserves the average TTL exactly; best general default.
Formula: base + random(-r, +r)

Proportional: For systems with varied TTLs. Scales the jitter window relative to the TTL itself.
Formula: base * (1 + random(-p, +p))

Full Jitter: Maximum spread; favors stampede prevention over a predictable TTL.
Formula: random(min, base)
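
The four formulas translate directly into code. The function and parameter names below are illustrative, and `rng` defaults to Python's `random` module so a seeded `random.Random` can be passed in for repeatable tests.

```python
import random

def additive_jitter(base, spread, rng=random):
    """base + random(0, range): TTLs fall in [base, base + spread]."""
    return base + rng.uniform(0, spread)

def symmetric_jitter(base, r, rng=random):
    """base + random(-r, +r): the average TTL stays at base."""
    return base + rng.uniform(-r, r)

def proportional_jitter(base, p, rng=random):
    """base * (1 + random(-p, +p)): the window scales with the TTL."""
    return base * (1 + rng.uniform(-p, p))

def full_jitter(base, minimum, rng=random):
    """random(min, base): widest spread, TTL never exceeds base."""
    return rng.uniform(minimum, base)
```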

How to Choose the Jitter Window

The window must be large enough to absorb the spike:

// If 10,000 keys expire together
// and DB can handle 500 queries/sec
// then spread over at least 20 seconds

min_jitter_range = expiring_keys / db_capacity
                 = 10,000 / 500 = 20 seconds

// High traffic systems
ttl = base_ttl + random(-min_jitter_range/2, min_jitter_range/2)

// Low traffic systems
// Rule of thumb: 10–20% of your base TTL
ttl = base_ttl + random(-base_ttl * 0.1, base_ttl * 0.1)
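
The sizing rule can be wrapped in a small helper. This is a sketch under the same assumptions as the arithmetic above: the helper names are hypothetical, and the fallback of 10% on each side of the base TTL is the rule of thumb from the text, not a universal constant.

```python
import random

def min_jitter_range(expiring_keys, db_queries_per_sec):
    """Seconds needed to spread the expirations so the DB stays within capacity."""
    return expiring_keys / db_queries_per_sec

def jittered_ttl(base_ttl, expiring_keys=None, db_queries_per_sec=None, rng=random):
    """Pick a TTL centered on base_ttl, with a window sized for the workload."""
    if expiring_keys is not None and db_queries_per_sec is not None:
        # High traffic: size the window from the expected burst and DB capacity.
        half_window = min_jitter_range(expiring_keys, db_queries_per_sec) / 2
    else:
        # Low traffic: fall back to 10% of the base TTL on each side.
        half_window = 0.1 * base_ttl
    return base_ttl + rng.uniform(-half_window, half_window)

print(min_jitter_range(10_000, 500))   # 20.0 seconds, matching the worked example
```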

Jitter is the cheapest fix. It should be on by default everywhere. But it only reduces spike height; it doesn't eliminate spikes. Think of it as the first layer of defense.

Probabilistic Early Re-Computation

Jitter is still reactive: it spreads expirations out, but every key still expires, and some request still pays for the miss. What if the cache could refresh itself before expiring, with no background workers and no coordination between servers?

The Idea

As a cache entry gets closer to its expiry, each incoming request independently rolls a die to decide whether to refresh it. The closer to expiry, the higher the chance. In practice, typically only a request or two volunteers to refresh the value just before it expires.

Analogy: Imagine a potluck dinner. The chips bowl is getting low. Every guest who walks past independently thinks "should I refill it?" The emptier the bowl, the more likely each person is to volunteer. Nobody talks to anyone. Statistically, one person refills it right before it empties.

How It Works

Each cache entry stores three things: the value, the expiry time, and delta (how long the last recomputation took). On every read, a single line of math decides:

time_remaining = expiry - now()
gap = delta × beta × ln(random())    // always negative

if time_remaining + gap ≤ 0:
    recompute_and_cache()     // this request volunteers
else:
    return cached_value        // serve normally
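
Here is one way the read path could look in Python. It is a sketch, not a reference implementation: the cache is a plain dict holding `(value, expiry, delta)` tuples, and `fetch`, `should_refresh_early`, and the injectable `clock` and `rand` parameters are hypothetical names. `beta` is passed explicitly rather than defaulted.

```python
import math
import random
import time

def should_refresh_early(expiry, delta, beta, now, rand=random.random):
    """Return True if this request should volunteer to recompute the value."""
    time_remaining = expiry - now
    # 1 - rand() lies in (0, 1], which avoids math.log(0).
    gap = delta * beta * math.log(1.0 - rand())   # zero or negative
    return time_remaining + gap <= 0

def fetch(cache, key, recompute, ttl, beta, clock=time.time, rand=random.random):
    """Read through the cache, occasionally refreshing shortly before expiry."""
    entry = cache.get(key)
    if entry is not None:
        value, expiry, delta = entry
        if not should_refresh_early(expiry, delta, beta, clock(), rand):
            return value                          # serve normally
    # Missing, expired, or volunteered: recompute and remember how long it took.
    start = clock()
    value = recompute(key)
    delta = clock() - start
    cache[key] = (value, clock() + ttl, delta)
    return value
```

An already expired entry always has `time_remaining <= 0`, so it falls through to the recompute path without any special case.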

Why This Works

ln(random()) is always negative, and large negative values are exponentially rare. When lots of time remains (say 200 seconds), the gap would have to be enormous to trigger a refresh, so the probability is near zero. When only 2 seconds remain, even a modest negative value triggers it, so the probability is high.
The delta (computation cost) makes the algorithm self-adapting. An expensive 5-second query starts refreshing earlier because it needs the head start. A cheap 10ms query waits until the last moment.
beta controls how aggressively early recomputation kicks in. Values above 1 refresh earlier; values below 1 refresh later. The original formulation of this algorithm uses 1.0 as its default, so 0.5 is a conservative setting that triggers fewer early recomputations at the cost of a slightly higher chance of reaching expiry.
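
The per-request refresh probability has a closed form. A request refreshes when delta × beta × ln(u) ≤ -time_remaining for a uniform random u, which is the same as u ≤ exp(-time_remaining / (delta × beta)), so the probability is that exponential. The helper below is a hypothetical illustration of this; the sample values of delta and beta are assumptions chosen to match the numbers in the text.

```python
import math

def refresh_probability(time_remaining, delta, beta):
    """Chance that a single request volunteers, given seconds left before expiry."""
    if time_remaining <= 0:
        return 1.0                                # already expired: always recompute
    return math.exp(-time_remaining / (delta * beta))

# delta = 1 second, beta = 1.0
print(refresh_probability(200, delta=1.0, beta=1.0))   # vanishingly small
print(refresh_probability(2, delta=1.0, beta=1.0))     # about 0.135 per request

# An expensive 5-second query is far more eager at the same distance from expiry.
print(refresh_probability(2, delta=5.0, beta=1.0))     # about 0.67 per request
```

These are per-request probabilities. On a hot key serving thousands of requests per second, even a small per-request chance means some request will volunteer almost immediately once the window opens.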

Multiple Servers, Zero Coordination

Each of your 10 application servers rolls its own die on every request. No server knows what any other is doing. Yet across all servers, the math makes it very likely that only one or a few requests volunteer, and that they do so shortly before expiry. This is what makes probabilistic re-computation so elegant: no locks, no background workers, no stale data, and roughly one extra DB query per key per cycle.

Limitation: Low-traffic keys may not get enough requests near expiry to trigger the probability. For those keys, a regular cache miss occurs — and that's fine, because a single miss on a low-traffic key isn't a stampede.

Mutex / Cache Locking

A hot key expires. Hundreds of requests arrive. Instead of all of them querying the database, only one is allowed to. Everyone else waits for that single result.

Mutex Locking Flow

t=1ms: Request A → cache MISS → acquires lock ✓ → queries DB
t=1ms: Request B → cache MISS → lock taken     → waits...
t=2ms: Request C → cache MISS → lock taken     → waits...
t=2ms: Request D → cache MISS → lock taken     → waits...

t=500ms: Request A → gets result → writes cache → releases lock

t=501ms: Request B → reads fresh cache ✓
t=501ms: Request C → reads fresh cache ✓
t=501ms: Request D → reads fresh cache ✓

// DB saw 1 query, not 500
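
The flow above can be reproduced inside a single process with threads. This is only a sketch of the idea: a real deployment needs a lock shared across servers, and the names here (`get_product`, `slow_db_query`, the 100-thread count) are assumptions for the demo.

```python
import threading
import time

cache = {}
db_queries = 0
lock = threading.Lock()              # one lock for one hot key keeps the sketch short

def slow_db_query():
    global db_queries
    db_queries += 1                  # only ever runs while `lock` is held
    time.sleep(0.05)                 # stand-in for a slow query
    return {"name": "iPhone 16"}

def get_product():
    value = cache.get("product:42")
    if value is not None:
        return value                          # cache hit
    with lock:                                # miss: one thread at a time gets through
        value = cache.get("product:42")       # re-check: an earlier holder may have filled it
        if value is None:
            value = slow_db_query()
            cache["product:42"] = value
        return value

threads = [threading.Thread(target=get_product) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db_queries)   # 1
```

The re-check inside the lock is what turns 100 misses into one query: every waiter finds the value already cached when its turn comes.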

Three Things You Must Get Right

1. The lock must auto-expire. If the server holding the lock crashes, the lock must disappear on its own. Without this, a crash causes permanent deadlock.

2. Only the owner can release. Store a unique ID as the lock value. Before deleting, check "is this still my lock?" and delete in the same atomic step. Otherwise, if Server A's lock expires mid-query and Server B acquires it, Server A will delete Server B's lock when it finishes.

3. Waiters need a timeout. If the lock holder fails silently, waiters can't wait forever. After a timeout, they either recompute themselves or return an error.
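
The three rules can be sketched with an in-memory lock table. With Redis, acquisition is commonly done with `SET key token NX PX <ttl>` and release with a small script that compares the token before deleting; the code below only imitates those semantics so it runs without a server. `ExpiringLocks` and `wait_for_value` are hypothetical names, and the class is not thread-safe.

```python
import time
import uuid

class ExpiringLocks:
    """In-memory stand-in for a lock store with auto-expiring, owner-checked locks."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}                 # key -> (owner_token, expires_at)

    def acquire(self, key, ttl_seconds):
        """Return an owner token if the lock was free or expired, else None."""
        now = self._clock()
        holder = self._locks.get(key)
        if holder is not None and holder[1] > now:
            return None                  # still held by someone else
        token = uuid.uuid4().hex         # unique ID identifies the owner
        self._locks[key] = (token, now + ttl_seconds)   # rule 1: auto-expiry
        return token

    def release(self, key, token):
        """Rule 2: delete the lock only if `token` still owns it."""
        holder = self._locks.get(key)
        if holder is not None and holder[0] == token:
            del self._locks[key]
            return True
        return False

def wait_for_value(cache, key, timeout_seconds, poll_seconds=0.05,
                   clock=time.monotonic, sleep=time.sleep):
    """Rule 3: poll the cache, but give up after `timeout_seconds`."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if key in cache:
            return cache[key]
        sleep(poll_seconds)
    return None   # caller decides: recompute, serve stale, or return an error
```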

What Should Waiters Do?

Strategy	Behavior	Best For
Spin/Poll	Check cache every 50ms until value appears	Short recomputations (<1s)
Return Stale	Serve the previous expired value immediately	User-facing APIs (latency matters)
Pub/Sub	Subscribe to a channel; holder notifies on completion	Very high concurrency (10K+/sec)

The Return Stale strategy is the most common in production for user-facing apps. Nobody waits, nobody sees an error, and the data is at most one cycle stale. This naturally combines mutex with Stale-While-Revalidate (our next method).

Stale-While-Revalidate (SWR)

Every strategy so far has the same limitation: at some point, someone waits for the recomputation. SWR says: what if nobody waits, ever?

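The key idea: each cache entry gets two TTLs, a soft one and a hard one.

SWR: Two TTLs

Before the soft TTL, the entry is fresh and is served as-is. Between the soft and hard TTL, the entry is stale but still served, and the first request to notice triggers a background refresh. After the hard TTL, the entry is gone and the next request is an ordinary miss. The timestamps in the walkthrough below are consistent with a soft TTL of 300 seconds.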

Walk Through It Step by Step

t = 310s: Request arrives. Cache is stale but exists. Return the stale value immediately (fast!). Spawn a background thread to refresh from the database.
t = 311s: Background refresh completes. Cache now has the fresh value with a new TTL.
t = 312s: Next request gets the fresh value. The user never waited. The DB saw one gentle query.
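
A compact sketch of the soft and hard TTL logic follows. It assumes a caller-supplied `spawn` function that runs a zero-argument callable in the background (for example by starting a thread); `SWRCache` and its parameters are hypothetical names, and the class would need a lock around its state before being shared between threads.

```python
import time

class SWRCache:
    """Serve stale values between the soft and hard TTL while refreshing in the background."""

    def __init__(self, loader, soft_ttl, hard_ttl, spawn, clock=time.monotonic):
        self._loader = loader
        self._soft_ttl = soft_ttl
        self._hard_ttl = hard_ttl
        self._spawn = spawn              # runs a callable in the background
        self._clock = clock
        self._entries = {}               # key -> (value, stored_at)
        self._refreshing = set()         # keys with a refresh already in flight

    def _refresh(self, key):
        try:
            value = self._loader(key)
            self._entries[key] = (value, self._clock())
            return value
        finally:
            self._refreshing.discard(key)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = self._clock() - stored_at
            if age < self._soft_ttl:
                return value                          # fresh
            if age < self._hard_ttl:
                if key not in self._refreshing:       # at most one refresh per key
                    self._refreshing.add(key)
                    self._spawn(lambda: self._refresh(key))
                return value                          # stale, but nobody waits
        return self._refresh(key)                     # missing or past hard TTL: block
```

One possible `spawn` is `lambda fn: threading.Thread(target=fn, daemon=True).start()`. Only the very first read of a key, or a read after the hard TTL, blocks on the loader.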

The Tradeoff: SWR trades a bounded window of staleness for the guarantee that no user ever blocks on a cache miss. This is perfect for product catalogs, user profiles, feeds, and dashboards. It is not suitable for account balances, inventory counts during checkout, or anything where a stale read causes incorrect actions.

Real World: Picture a streaming service such as Netflix releasing a new season. Millions of users can be served cached show metadata instantly while the cache refreshes behind the scenes. Users get a fast experience; the database sees smooth, gentle traffic.

Cache Warming / Pre-Warming

Every strategy above assumes the cache already has data. But what happens when the cache is completely empty? If a marketing push notification goes out while the cache is cold, 100% of the resulting traffic hits the database at once.

Cache warming means proactively loading data before traffic arrives.

The iPhone Launch Example

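Flipkart knows the iPhone 17 goes on sale at 8:00 PM. Millions of users will hit the same product page within seconds. So a scheduled job loads that product's data into the cache shortly before 8:00 PM, and the first wave of requests is served from the cache instead of the database.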

The same logic applies everywhere: warm the show metadata before a Netflix season drops, warm match stats before an IPL game starts, warm the landing page before a marketing email goes out.

The key principle: if you know what users will request and when, fill the cache before they arrive.

You can't warm everything. In most systems, the top 5–20% of keys handle 80–90% of traffic. Warm those and the long tail handles itself. A single miss on a rarely-accessed key isn't a stampede.
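
Choosing what to warm can be as simple as ranking keys by recent request counts. The sketch below assumes such counts are available (for example from access logs); `keys_to_warm`, `warm_cache`, and the 80% target are illustrative, and the cache is a plain dict.

```python
def keys_to_warm(request_counts, traffic_share=0.8):
    """Pick the most-requested keys that together cover `traffic_share` of traffic."""
    total = sum(request_counts.values())
    chosen, covered = [], 0
    ranked = sorted(request_counts.items(), key=lambda kv: kv[1], reverse=True)
    for key, count in ranked:
        if covered >= traffic_share * total:
            break                        # target reached; leave the long tail cold
        chosen.append(key)
        covered += count
    return chosen

def warm_cache(cache, keys, loader):
    """Load each chosen key before traffic arrives."""
    for key in keys:
        cache[key] = loader(key)

counts = {"product:42": 700, "product:7": 150, "product:9": 100, "product:3": 50}
print(keys_to_warm(counts))   # ['product:42', 'product:7']
```

In a real job the loop in `warm_cache` would usually be paced or batched so that warming itself does not overload the database.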

Tradeoffs: Freshness, Latency & Consistency

In caching, you're always balancing three competing properties. You can optimize for any two, but the third must give way.

The Impossible Triangle (figure: freshness, latency, and consistency at the three corners)

The Tradeoff

Optimize For	Sacrifice	Strategy	Example
Freshness + Consistency	Latency	Write-through, sync invalidation	Bank balance after transfer
Freshness + Latency	Consistency	SWR, background refresh	Social media trending feed
Latency + Consistency	Freshness	Long TTL, aggressive caching	Product catalog descriptions

Different Data, Different Strategies

A single application should use different caching strategies for different data types. Product descriptions tolerate staleness (SWR). Inventory counts need strong freshness (write-through). Recommendations are expensive to compute and tolerant of staleness (probabilistic recomputation). Auth tokens need both freshness and low latency (write-through with local cache).

When to Use Which Strategy

The Decision Path

Is it read-heavy? (read:write > 10:1) → Cache it. Write-heavy → consider write-through or skip caching.
How stale can it be? Zero → Write-through. Seconds → Event invalidation. Minutes+ → TTL + Jitter.
How hot is the key? Extremely popular → add Probabilistic Recomputation + Mutex + SWR.
Are there predictable spikes? Sales, launches, matches → add Cache Warming.
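
The decision path can be written down as a small function. This is only a sketch of the four questions above: `choose_strategies` is a hypothetical helper, and the 60-second cutoff between "seconds" and "minutes" is an assumption, since the text does not give an exact boundary.

```python
def choose_strategies(read_write_ratio, max_staleness_seconds,
                      is_hot_key=False, has_predictable_spikes=False):
    """Map the four decision-path questions to a list of strategy names."""
    if read_write_ratio <= 10:
        # Not read-heavy: caching may not pay off.
        return ["write-through or no caching"]

    if max_staleness_seconds == 0:
        plan = ["write-through"]
    elif max_staleness_seconds < 60:     # assumed cutoff between "seconds" and "minutes"
        plan = ["event invalidation"]
    else:
        plan = ["TTL + jitter"]

    if is_hot_key:
        plan += ["probabilistic recomputation", "mutex", "SWR"]
    if has_predictable_spikes:
        plan.append("cache warming")
    return plan

print(choose_strategies(100, 300, is_hot_key=True))
# ['TTL + jitter', 'probabilistic recomputation', 'mutex', 'SWR']
```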

Defense in Depth: The Full Stack

In production, these strategies layer together. Each one is a safety net for the layer above:

Layer 0 — Prevention:       Cache Warming
  Fill the cache before traffic arrives.

Layer 1 — Smoothing:        TTL Jitter
  Spread expirations so they don't synchronize.

Layer 2 — Proactive:        Probabilistic Early Re-computation
  Refresh before expiry with zero coordination.

Layer 3 — User Experience:  Stale-While-Revalidate
  Even if expiry happens, no user ever waits.

Layer 4 — DB Safety:        Mutex / Cache Lock
  Even if background refresh fires from multiple
  servers, only one hits the database.

Real-World Example: E-Commerce Product Page

One page load. Six data types. Six different strategies.

Component	Strategy	Staleness	Why
Product images	Browser cache (30d) + CDN (24h)	Hours	Images rarely change
Product description	SWR + Jitter	Minutes	Updates are infrequent
Product price	Event-driven invalidation + Mutex	Seconds	Users expect accuracy
Inventory count	Write-through	Zero	Stale count = overselling
Recommendations	Probabilistic + Jitter	Minutes	Expensive, staleness OK
Flash sale items	Scheduled warming	Zero at launch	Must be warm at spike

The End

Cache stampede prevention is not about picking one technique. It's about layering multiple strategies so that each one is a safety net for the others. Jitter prevents synchronized expiry. Probabilistic recomputation refreshes before keys die. SWR ensures no user ever waits. Mutex deduplicates database queries. Warming prevents cold starts. Together, they make cache-related outages far less likely.
