Caching Strategies in Distributed Systems
Beyond TTL: How Production Systems Survive Cache Expiry at Scale

Imagine you run an e-commerce website. You cache your most popular product's data in Redis with a 5-minute expiration. For 5 minutes, everything is blazing fast. Then the cache expires. 10,000 users, all browsing that product at the same instant, slam into your database at once.
Your database wasn't built for that. It buckles, slows down, and the whole site goes down. This is called a cache stampede, and it has caused real outages at companies of every size, from startups to Netflix.
This article walks you through five strategies that prevent this, from the simplest (adding randomness) to the most elegant (probabilistic math).
Why Basic TTL Caching Is Not Enough
TTL (Time-To-Live) is the simplest caching strategy. You store data with an expiration time. After that time, the entry disappears, and the next request fetches fresh data from the database.
SET product:42 → {name: "iPhone 16", price: "₹79,900"} TTL = 300 seconds
// For 300 seconds, every read is instant.
// At second 301 — cache miss — hit the database.
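The read path above can be sketched as a small in-process cache. This is only an illustration: `TTLCache`, its `get_or_load` method, and the injectable `clock` are hypothetical names, and a production system would typically use Redis or a similar store instead of a Python dict.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache where every entry lives for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        # key -> (value, absolute expiry time)
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        entry = self._store.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]  # cache hit: no database work
        value = loader(key)  # cache miss: fetch from the database
        self._store[key] = (value, self.clock() + self.ttl)
        return value
```

Every caller that arrives after the expiry time and before the entry is rewritten runs `loader`, which is exactly the window the rest of this article is about.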
This works fine for small apps. But at scale, four problems emerge.
Problem 1: The Stampede
When a hot key expires, every concurrent request misses the cache and queries the database simultaneously. A key serving 10,000 requests/second creates 10,000 database queries in one instant. This is the thundering herd.
Problem 2: Staleness Dilemma
Short TTL = frequent database hits. Long TTL = users see outdated data. If a user changes their profile name, other users may see the old name for the entire TTL duration. With TTL alone, there is no way to say "invalidate this right now."
Problem 3: Cold Start
After a deployment or cache restart, everything is a miss. Your database suddenly handles 100% of traffic. Worse, if all keys were set around the same time, they expire together. This is a repeating spike every TTL cycle.
Problem 4: Memory Waste
A key accessed once gets the same lifetime as a key accessed a million times. You waste memory on rarely-used entries while potentially evicting popular ones.
How Cache Expiry Causes Traffic Spikes
Let's trace exactly what happens when a popular cache key expires.
The Single-Key Spike
Second 59: 5,000 req → cache HIT → DB sees 0 queries ✓
Second 60: 5,000 req → cache MISS → DB sees 5,000 queries ✗
Second 61: 5,000 req → cache HIT → DB sees 0 queries ✓
That one-second window is the thundering herd. Now imagine thousands of keys all set at the same time. They all expire together, creating a repeating sawtooth pattern:
Synchronized TTL Expiry - Sawtooth Pattern
[Chart: DB load over time. Load ramps up to a sharp peak and drops, with periodic spikes every 5 minutes (at 5 min, 10 min, and 15 min).]
The Death Spiral
This is where it gets truly dangerous. The sequence is:
1. Hot keys expire simultaneously. Thousands of cache misses at once.
2. Database overloads. Queries that normally take 10ms now take 2 seconds.
3. Timeouts fire. Application threads are blocked. Request queues fill up.
4. Retries double the load. Clients and load balancers retry timed-out requests, piling onto an already struggling database.
5. Full outage. The DB is so slow that cache repopulation fails. More keys expire. Death spiral.
cache expiry → DB overload → slow responses → timeouts & retries → more DB load → full outage
TTL Jitter — Adding Randomness to Expiration
The simplest fix. Instead of every key expiring at exactly the same time, add a small random offset. Expirations spread across a window instead of a single moment.
Without Jitter vs. With Jitter
Without jitter:
Key A: TTL = 300s → expires at t+300 ┐
Key B: TTL = 300s → expires at t+300 ├── all at once
Key C: TTL = 300s → expires at t+300 ┘
With jitter:
Key A: 300 + random(-30, 30) = 317s → expires at t+317 ┐
Key B: 300 + random(-30, 30) = 282s → expires at t+282 ├── spread out
Key C: 300 + random(-30, 30) = 305s → expires at t+305 ┘
Instead of 10,000 cache misses in one second, you get roughly 167 misses per second over 60 seconds. Your database barely notices.
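A quick simulation makes the arithmetic concrete. This is a sketch under simple assumptions: all keys are written at the same instant, jitter is drawn uniformly, and expirations are counted in one-second buckets. The function name `peak_expirations_per_second` is made up for this example.

```python
import random
from collections import Counter


def peak_expirations_per_second(num_keys: int, base_ttl: float,
                                jitter: float, seed: int = 0) -> int:
    """Return the largest number of keys expiring inside any one-second bucket."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    buckets: Counter = Counter()
    for _ in range(num_keys):
        ttl = base_ttl + rng.uniform(-jitter, jitter)
        buckets[int(ttl)] += 1  # truncate the expiry time to a whole second
    return max(buckets.values())


# No jitter: all 10,000 keys land in the same bucket.
print(peak_expirations_per_second(10_000, 300, jitter=0))
# +/-30s jitter: about 10,000 / 60 keys per bucket on average, with a somewhat higher peak.
print(peak_expirations_per_second(10_000, 300, jitter=30))
```

The second number is the one your database has to survive, so it is worth running this with your own key counts and TTLs.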
Jitter Strategies
Additive Uniform: The simple default. Adds a random amount on top of the base TTL, so no entry expires earlier than the base, but entries live slightly longer on average, which makes staleness marginally worse.
Formula: base + random(0, range)
Symmetric: Preserves the average TTL exactly; the best general default.
Formula: base + random(-r, +r)
Proportional: For systems with varied TTLs. Scales the jitter window relative to the TTL itself.
Formula: base * (1 + random(-p, +p))
Full Jitter: Maximum spread. Favors stampede prevention over a predictable TTL, at the cost of more frequent refreshes.
Formula: random(min, base)
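The four formulas translate directly into code. The function names below are illustrative, and `random.uniform` is used as the source of randomness; any uniform generator would do.

```python
import random


def additive_jitter(base: float, spread: float) -> float:
    """base + random(0, range): never shorter than the base TTL."""
    return base + random.uniform(0, spread)


def symmetric_jitter(base: float, r: float) -> float:
    """base + random(-r, +r): the average TTL stays equal to base."""
    return base + random.uniform(-r, r)


def proportional_jitter(base: float, p: float) -> float:
    """base * (1 + random(-p, +p)): the window scales with the TTL."""
    return base * (1 + random.uniform(-p, p))


def full_jitter(minimum: float, base: float) -> float:
    """random(min, base): the whole TTL is random between a floor and the base."""
    return random.uniform(minimum, base)
```

For example, `proportional_jitter(300, 0.1)` yields a TTL between 270 and 330 seconds, while `proportional_jitter(3600, 0.1)` yields one between 3240 and 3960 seconds, so short-lived and long-lived keys both get a sensible window from the same setting.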
How to Choose the Jitter Window
The window must be large enough to absorb the spike:
// If 10,000 keys expire together
// and DB can handle 500 queries/sec
// then spread over at least 20 seconds
min_jitter_range = expiring_keys / db_capacity
= 10,000 / 500 = 20 seconds
// High traffic systems
ttl = base_ttl + random(-min_jitter_range/2, min_jitter_range/2)
// Low traffic systems
// Rule of thumb: 10–20% of your base TTL
ttl = base_ttl + random(-base_ttl * 0.1, base_ttl * 0.1)
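The sizing rule above can be written as runnable Python. The function names and the `fraction=0.1` default are assumptions made for this sketch; the numbers mirror the worked example (10,000 keys, 500 queries/sec).

```python
import random


def min_jitter_window(expiring_keys: int, db_capacity_qps: float) -> float:
    """Seconds over which expirations must be spread to stay within DB capacity."""
    return expiring_keys / db_capacity_qps


def ttl_for_high_traffic(base_ttl: float, expiring_keys: int,
                         db_capacity_qps: float) -> float:
    """Symmetric jitter sized from the capacity calculation."""
    window = min_jitter_window(expiring_keys, db_capacity_qps)
    return base_ttl + random.uniform(-window / 2, window / 2)


def ttl_for_low_traffic(base_ttl: float, fraction: float = 0.1) -> float:
    """Rule-of-thumb jitter: plus or minus a fraction of the base TTL."""
    return base_ttl + random.uniform(-base_ttl * fraction, base_ttl * fraction)


print(min_jitter_window(10_000, 500))  # 20.0 seconds, matching the example above
```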
Jitter is the cheapest fix. It should be on by default everywhere. But it only reduces spike height; it doesn't eliminate spikes. Think of it as the first layer of defense.
Probabilistic Early Re-Computation
Jitter spreads expirations out, but it still deals with expiry after it happens: every key eventually expires, and some request pays for the miss. What if the cache could refresh itself before expiring, with no background workers and no coordination between servers?
The Idea
As a cache entry gets closer to its TTL, each incoming request independently "rolls a die" to decide whether to refresh it. The closer to expiry, the higher the chance. Statistically, only one or two requests typically volunteer to refresh the value just before it expires.
Analogy: Imagine a potluck dinner. The chips bowl is getting low. Every guest who walks past independently thinks "should I refill it?" The emptier the bowl, the more likely each person is to volunteer. Nobody talks to anyone. Statistically, one person refills it right before it empties.
How It Works
Each cache entry stores three things: the value, the expiry time, and delta (how long the last recomputation took). On every read, a single line of math decides:
time_remaining = expiry - now()
gap = delta × beta × ln(random()) // always negative
if time_remaining + gap ≤ 0:
recompute_and_cache() // this request volunteers
else:
return cached_value // serve normally
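Here is a minimal single-process sketch of that read path. The class and field names (`EarlyRefreshCache`, `Entry`) and the injectable `clock` and `rand` are hypothetical, `beta` is passed in by the caller, and the sketch ignores concurrency so the decision rule stays visible.

```python
import math
import random
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entry:
    value: Any
    expiry: float  # absolute time at which the entry expires
    delta: float   # seconds the last recomputation took


class EarlyRefreshCache:
    def __init__(self, ttl: float, beta: float,
                 clock: Callable[[], float] = time.monotonic,
                 rand: Callable[[], float] = random.random):
        self.ttl, self.beta, self.clock, self.rand = ttl, beta, clock, rand
        self._store: Dict[str, Entry] = {}

    def _should_refresh(self, entry: Entry) -> bool:
        time_remaining = entry.expiry - self.clock()
        u = 1.0 - self.rand()  # in (0, 1], which avoids log(0)
        gap = entry.delta * self.beta * math.log(u)  # zero or negative
        return time_remaining + gap <= 0

    def get(self, key: str, recompute: Callable[[str], Any]) -> Any:
        entry = self._store.get(key)
        if entry is None or self._should_refresh(entry):
            start = self.clock()
            value = recompute(key)  # this request volunteers
            delta = self.clock() - start
            self._store[key] = Entry(value, self.clock() + self.ttl, delta)
            return value
        return entry.value  # serve normally
```

An already expired entry always has `time_remaining <= 0`, so it is always recomputed; the random term only matters in the run-up to expiry.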
Why This Works
ln(random()) produces a negative number. When lots of time remains (say 200 seconds), the negative value would need to be astronomically large to trigger a refresh — probability is near zero. When only 2 seconds remain, even a small negative value triggers it — probability is high.
The delta (computation cost) makes the algorithm self-adapt. An expensive 5-second query starts refreshing earlier because it needs the head start. A cheap 10ms query waits until the last moment.
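beta controls how aggressively the early recomputation kicks in. Values above 1 refresh earlier, and values below 1 refresh later. A value of 1.0 is the commonly cited default; 0.5 is a more conservative setting that refreshes later and keeps extra database queries to a minimum.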
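The decision rule can be rearranged into a closed-form probability. A request refreshes when time_remaining + delta × beta × ln(u) ≤ 0, which is the same as u ≤ exp(-time_remaining / (delta × beta)), and for a uniform u that happens with exactly that probability. The helper name below is made up for illustration.

```python
import math


def refresh_probability(time_remaining: float, delta: float, beta: float) -> float:
    """Chance that a single request volunteers to refresh the entry."""
    if time_remaining <= 0:
        return 1.0  # already expired: every request recomputes
    if delta * beta <= 0:
        return 0.0  # no head start requested: never refresh early
    return math.exp(-time_remaining / (delta * beta))


# Same 2 seconds remaining, different recomputation costs (beta = 1.0):
cheap = refresh_probability(2, delta=0.01, beta=1.0)    # 10ms query
costly = refresh_probability(2, delta=5.0, beta=1.0)    # 5-second query
print(cheap < costly)  # the expensive query is far more likely to refresh early
```

This is a per-request probability. On a hot key with thousands of requests per second, even a small value means some request volunteers almost immediately.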
Multiple Servers, Zero Coordination
Each of your 10 application servers rolls its own die on every request. No server knows what any other is doing. Yet because the refresh probability rises sharply only near expiry, it is very likely that just one or two requests across all servers volunteer at the right time. This is what makes probabilistic re-computation so elegant: no locks, no background workers, no stale data, and roughly one extra DB query per key per cycle.
Limitation: Low-traffic keys may not get enough requests near expiry to trigger the probability. For those keys, a regular cache miss occurs — and that's fine, because a single miss on a low-traffic key isn't a stampede.
Mutex / Cache Locking
A hot key expires. Hundreds of requests arrive. Instead of all of them querying the database, only one is allowed to. Everyone else waits for that single result.
Mutex Locking Flow
t=1ms: Request A → cache MISS → acquires lock ✓ → queries DB
t=1ms: Request B → cache MISS → lock taken → waits...
t=2ms: Request C → cache MISS → lock taken → waits...
t=2ms: Request D → cache MISS → lock taken → waits...
t=500ms: Request A → gets result → writes cache → releases lock
t=501ms: Request B → reads fresh cache ✓
t=501ms: Request C → reads fresh cache ✓
t=501ms: Request D → reads fresh cache ✓
// DB saw 1 query, not 500
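The flow above can be sketched inside a single process with one lock per key. This is an assumption-laden illustration: `SingleFlightCache` is a hypothetical name, entries never expire here, and across multiple servers the lock would have to live in a shared store such as Redis rather than in `threading.Lock`.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """On a miss, only one thread per key runs the loader; the rest wait for its result."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        if key in self._values:
            return self._values[key]  # cache hit
        with self._lock_for(key):  # waiters block here
            if key in self._values:  # filled while we were waiting
                return self._values[key]
            value = loader(key)  # exactly one caller reaches the database
            self._values[key] = value
            return value
```

The second `if key in self._values` check after acquiring the lock is what turns 500 misses into one query: every waiter finds the value already written by the time it gets the lock.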
Three Things You Must Get Right
1. The lock must auto-expire. If the server holding the lock crashes, the lock must disappear on its own. Without this, a crash causes permanent deadlock.
2. Only the owner can release. Store a unique ID as the lock value. Before deleting, check "is this still my lock?" atomically. Otherwise, if Server A's lock expires while it is still working and Server B then acquires it, Server A will delete Server B's lock when it finishes.
3. Waiters need a timeout. If the lock holder fails silently, waiters can't wait forever. After a timeout, they either recompute themselves or return an error.
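The three rules can be illustrated with an in-memory stand-in for a distributed lock. Everything here is hypothetical (`ExpiringLocks`, `wait_for_value`, the parameter names), and the class is single-threaded on purpose; with Redis the same rules are usually implemented with an atomic set-if-absent plus expiry and an atomic compare-and-delete.

```python
import time
import uuid
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringLocks:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._held: Dict[str, Tuple[str, float]] = {}  # key -> (owner token, expires_at)

    def acquire(self, key: str, ttl: float) -> Optional[str]:
        """Rule 1: the lock carries its own expiry, so a crashed holder cannot block forever."""
        now = self.clock()
        held = self._held.get(key)
        if held is not None and now < held[1]:
            return None  # someone else holds a live lock
        token = uuid.uuid4().hex
        self._held[key] = (token, now + ttl)
        return token

    def release(self, key: str, token: str) -> bool:
        """Rule 2: only the owner's token can delete the lock."""
        held = self._held.get(key)
        if held is not None and held[0] == token:
            del self._held[key]
            return True
        return False


def wait_for_value(read: Callable[[], Any], timeout: float, poll_interval: float = 0.05,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Rule 3: waiters poll the cache but give up after `timeout` seconds."""
    deadline = clock() + timeout
    while clock() < deadline:
        value = read()
        if value is not None:
            return value
        sleep(poll_interval)
    return None  # caller decides: recompute itself or return an error
```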
What Should Waiters Do?
| Strategy | Behavior | Best For |
|---|---|---|
| Spin/Poll | Check cache every 50ms until value appears | Short recomputations (<1s) |
| Return Stale | Serve the previous expired value immediately | User-facing APIs (latency matters) |
| Pub/Sub | Subscribe to a channel; holder notifies on completion | Very high concurrency (10K+/sec) |
The Return Stale strategy is the most common in production for user-facing apps. Nobody waits, nobody sees an error, and the data is at most one cycle stale. This naturally combines mutex with Stale-While-Revalidate (our next method).
Stale-While-Revalidate (SWR)
Every strategy so far has the same limitation: at some point, someone waits for the recomputation. SWR says: what if nobody waits, ever?
The key idea: each cache entry gets two TTLs — a soft one and a hard one.
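SWR: Two TTLs
Before the soft TTL, the entry is fresh and is served directly. Between the soft TTL and the hard TTL, the entry is stale but still served, and a refresh is triggered in the background. After the hard TTL, the entry is gone and the next request is a true cache miss.
Walk Through It Step by Step
Assume a soft TTL of 300 seconds and a longer hard TTL.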
t = 310s: Request arrives. Cache is stale but exists. Return the stale value immediately (fast!). Spawn a background thread to refresh from the database.
t = 311s: Background refresh completes. Cache now has the fresh value with a new TTL.
t = 312s: Next request gets the fresh value. The user never waited. The DB saw one gentle query.
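The walk-through maps onto a small amount of code. This is a sketch, not a reference implementation: `SWRCache`, its parameters, and the choice to run one background thread per stale key are all assumptions made for the example.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


class SWRCache:
    def __init__(self, soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.soft_ttl, self.hard_ttl, self.clock = soft_ttl, hard_ttl, clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._refreshing: Set[str] = set()  # keys with a refresh in flight
        self._lock = threading.Lock()

    def _refresh(self, key: str, loader: Callable[[str], Any]) -> None:
        try:
            value = loader(key)
            with self._lock:
                self._store[key] = (value, self.clock())
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        with self._lock:
            entry = self._store.get(key)
            if entry is not None:
                age = self.clock() - entry[1]
                if age < self.soft_ttl:
                    return entry[0]  # fresh
                if age < self.hard_ttl:
                    if key not in self._refreshing:  # at most one refresh per key
                        self._refreshing.add(key)
                        threading.Thread(target=self._refresh,
                                         args=(key, loader), daemon=True).start()
                    return entry[0]  # stale, but returned immediately
        value = loader(key)  # past the hard TTL or never cached: a true miss
        with self._lock:
            self._store[key] = (value, self.clock())
        return value
```

With `SWRCache(soft_ttl=300, hard_ttl=600)` (the 600 is an arbitrary choice for illustration), a request at t = 310s returns the old value at once and triggers a single background load, matching the timeline above.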
The Tradeoff: SWR trades a bounded window of staleness for the guarantee that no user ever blocks on a cache miss. This is perfect for product catalogs, user profiles, feeds, and dashboards. It is not suitable for account balances, inventory counts during checkout, or anything where a stale read causes incorrect actions.
Illustration: When a streaming service such as Netflix releases a new season, millions of users can be served cached show metadata instantly while the cache refreshes behind the scenes. Users get a fast experience; the database sees smooth, gentle traffic.
Cache Warming / Pre-Warming
Every strategy above assumes the cache already has data. But what happens when the cache is completely empty, for example after a restart, or when a marketing push notification sends users to a page that was never cached? 100% of that traffic hits the database at once.
Cache warming means proactively loading data before traffic arrives.
The iPhone Launch Example
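Flipkart knows the iPhone 17 goes on sale at 8:00 PM. Millions of users will hit the same product page within seconds. A scheduled job that loads that product's data into the cache shortly before 8:00 PM means the first wave of requests lands on a warm cache instead of the database.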
The same logic applies everywhere: warm the show metadata before a Netflix season drops, warm match stats before an IPL game starts, warm the landing page before a marketing email goes out.
The key principle: if you know what users will request and when, fill the cache before they arrive.
You can't warm everything. In most systems, the top 5–20% of keys handle 80–90% of traffic. Warm those and the long tail handles itself. A single miss on a rarely-accessed key isn't a stampede.
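One way to act on this is to pick the hottest keys from recent access logs and load them ahead of time. The sketch below assumes the access log is available as a plain sequence of keys and the cache behaves like a dict; `hottest_keys` and `warm_cache` are illustrative names.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List


def hottest_keys(access_log: Iterable[str], fraction: float) -> List[str]:
    """Return the most frequently accessed `fraction` of distinct keys."""
    counts = Counter(access_log)
    keep = max(1, int(len(counts) * fraction))
    return [key for key, _ in counts.most_common(keep)]


def warm_cache(cache: Dict[str, Any], keys: Iterable[str],
               loader: Callable[[str], Any]) -> int:
    """Load each missing key before traffic arrives; return how many were loaded."""
    loaded = 0
    for key in keys:
        if key not in cache:
            cache[key] = loader(key)
            loaded += 1
    return loaded
```

In practice the warming job would also throttle its own database reads, so that the warm-up does not become the spike it is meant to prevent.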
Tradeoffs: Freshness, Latency & Consistency
In caching, you're always balancing three competing properties. You can optimize for any two, but the third must give way.
The Impossible Triangle
The Tradeoff
| Optimize For | Sacrifice | Strategy | Example |
|---|---|---|---|
| Freshness + Consistency | Latency | Write-through, sync invalidation | Bank balance after transfer |
| Freshness + Latency | Consistency | SWR, background refresh | Social media trending feed |
| Latency + Consistency | Freshness | Long TTL, aggressive caching | Product catalog descriptions |
Different Data, Different Strategies
A single application should use different caching strategies for different data types. Product descriptions tolerate staleness (SWR). Inventory counts need strong freshness (write-through). Recommendations are expensive to compute and tolerant of staleness (probabilistic recomputation). Auth tokens need both freshness and low latency (write-through with local cache).
When to Use Which Strategy
The Decision Path
1. Is it read-heavy? (read:write > 10:1) → Cache it. Write-heavy → consider write-through or skip caching.
2. How stale can it be? Zero → Write-through. Seconds → Event invalidation. Minutes+ → TTL + Jitter.
3. How hot is the key? Extremely popular → add Probabilistic Recomputation + Mutex + SWR.
4. Are there predictable spikes? Sales, launches, matches → add Cache Warming.
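The decision path can be encoded as a small function. The thresholds are assumptions for illustration: the 10:1 ratio comes from the text, while treating anything under 60 seconds as "seconds" is an arbitrary cut-off chosen here.

```python
from typing import List


def choose_strategies(reads_per_write: float, max_staleness_s: float,
                      hot_key: bool, predictable_spikes: bool) -> List[str]:
    """Walk the four questions in order and collect the suggested strategies."""
    if reads_per_write <= 10:
        return ["write-through or no cache"]  # not read-heavy enough
    if max_staleness_s <= 0:
        plan = ["write-through"]
    elif max_staleness_s < 60:  # assumed boundary between "seconds" and "minutes+"
        plan = ["event invalidation"]
    else:
        plan = ["TTL + jitter"]
    if hot_key:
        plan += ["probabilistic recomputation", "mutex", "SWR"]
    if predictable_spikes:
        plan.append("cache warming")
    return plan
```

For a popular product description that can be five minutes stale and is featured in a scheduled sale, `choose_strategies(1000, 300, True, True)` returns TTL + jitter, the three hot-key protections, and cache warming.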
Defense in Depth: The Full Stack
In production, these strategies layer together. Each one is a safety net for the layer above:
Layer 0 — Prevention: Cache Warming
Fill the cache before traffic arrives.
Layer 1 — Smoothing: TTL Jitter
Spread expirations so they don't synchronize.
Layer 2 — Proactive: Probabilistic Early Re-computation
Refresh before expiry with zero coordination.
Layer 3 — User Experience: Stale-While-Revalidate
Even if expiry happens, no user ever waits.
Layer 4 — DB Safety: Mutex / Cache Lock
Even if background refresh fires from multiple servers, only one hits the database.
Real-World Example: E-Commerce Product Page
One page load. Six data types. Six different strategies.
| Component | Strategy | Staleness | Why |
|---|---|---|---|
| Product images | Browser cache (30d) + CDN (24h) | Hours to days | Images rarely change |
| Product description | SWR + Jitter | Minutes | Updates are infrequent |
| Product price | Event-driven invalidation + Mutex | Seconds | Users expect accuracy |
| Inventory count | Write-through | Zero | Stale count = overselling |
| Recommendations | Probabilistic + Jitter | Minutes | Expensive, staleness OK |
| Flash sale items | Scheduled warming | Zero at launch | Must be warm at spike |
The End
Cache stampede prevention is not about picking one technique. It's about layering multiple strategies so that each one is a safety net for the others. Jitter prevents synchronized expiry. Probabilistic recomputation refreshes before keys die. SWR ensures no user ever waits. Mutex deduplicates database queries. Warming prevents cold starts for traffic you can predict. Together, they make cache-related outages far less likely.

