# Caching in System Design

Elliot Forbes · Jun 20, 2026 · 7 min read

Caching trades memory and freshness for speed and reduced load. By storing a copy of frequently accessed data closer to the consumer, you serve repeated requests without touching the primary data store.

At scale, caching is one of the highest-leverage tools available. A database query that takes around 20ms on a cold path can often be served in under 1ms from an in-memory cache such as Redis. That difference compounds across millions of requests per day.
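As a rough back-of-the-envelope illustration, the average read latency can be estimated from the cache hit ratio. The sketch below reuses the 20ms and 1ms figures above (treating 1ms as an upper bound for the cache). The 90% hit ratio and the function name `expected_latency_ms` are assumptions made for the example.

```python
def expected_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency for a cache-aside read path.

    A hit costs one cache lookup. A miss costs the cache lookup plus the
    database query (the write that repopulates the cache is ignored here).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ms + miss_ratio * (cache_ms + db_ms)


# Assumed numbers: 1 ms cache lookup, 20 ms database query, 90% hit ratio.
average = expected_latency_ms(hit_ratio=0.9, cache_ms=1.0, db_ms=20.0)
print(f"{average:.1f} ms")  # 0.9 * 1 + 0.1 * 21 = 3.0 ms
```

Under these assumptions the average read is roughly seven times faster than always querying the database, and the database sees only a tenth of the read traffic.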

Caches live at many layers. Client-side caches store data in the browser or mobile app. CDN caches hold static assets and rendered pages at the edge.

Application-layer caches (Redis, Memcached) sit between the app and the database. Databases themselves also maintain internal caches, such as buffer pools that keep frequently read pages in memory.

## The Core Concept

The most common caching pattern is cache-aside, also called lazy loading. The application code owns all cache interactions explicitly.

On a read, the app checks the cache first. If the key is present (a cache hit), the data is returned immediately.

On a miss, the app reads from the database, writes the result into the cache, and returns it to the caller.

Figure: cache-aside read flow. The app reads the cache and, on a miss, reads the database and populates the cache.

Cache-aside is simple and resilient. If the cache goes down, the app can fall back to the database instead of failing. The tradeoff is that every cache miss pays for extra round-trips: the failed cache lookup, the database read, and the write that populates the cache.
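The read path described in this section can be sketched as follows. This is a minimal illustration, not a production implementation: plain dictionaries stand in for the cache and the database, and the names `get`, `read_from_database`, and `db_reads` are hypothetical.

```python
from typing import Optional

# Stand-ins for a real cache and database; both are plain dicts here.
cache: dict = {}
database: dict = {"user:1": "Ada", "user:2": "Grace"}
db_reads = 0  # counts how often the primary store is touched


def read_from_database(key: str) -> Optional[str]:
    global db_reads
    db_reads += 1
    return database.get(key)


def get(key: str) -> Optional[str]:
    """Cache-aside read: check the cache, fall back to the database on a miss."""
    if key in cache:                  # cache hit
        return cache[key]
    value = read_from_database(key)   # cache miss
    if value is not None:
        cache[key] = value            # populate for the next reader
    return value


first = get("user:1")   # miss: reads the database and fills the cache
second = get("user:1")  # hit: served from the cache
print(first, second, db_reads)
```

The second call never reaches the database, which is the whole point of the pattern. Keys that do not exist are not cached in this sketch, so repeated lookups for a missing key would hit the database each time.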

## Caching Strategies

Cache-aside (lazy loading) is the default for read-heavy workloads. The cache is populated on demand, so it only ever holds data that has actually been requested. Cold caches start empty and warm up over time.

Write-through keeps the cache and the database in sync by writing to both on every mutation. The write is not acknowledged to the client until both the cache and the database have confirmed it.

This keeps the cache consistent with the database as long as both writes succeed, but every write pays the latency cost of a synchronous database write before returning.
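A minimal sketch of write-through, using dictionaries in place of a real cache and database. The class name `WriteThroughStore` is hypothetical, and writing the database before the cache is one reasonable ordering chosen for the example, not something the pattern mandates.

```python
class WriteThroughStore:
    """Writes go to the database and the cache before the call returns."""

    def __init__(self) -> None:
        self.cache: dict = {}
        self.database: dict = {}

    def write(self, key: str, value: str) -> None:
        # Write the source of truth first; if this raised, the cache
        # would be left untouched.
        self.database[key] = value
        self.cache[key] = value
        # Only after both writes does the caller get its acknowledgement.

    def read(self, key: str):
        return self.cache.get(key)


store = WriteThroughStore()
store.write("user:1", "Ada")
print(store.read("user:1"), store.database["user:1"])
```

Because `write` does not return until both stores are updated, a read that follows a successful write sees the new value in the cache.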

Write-back (write-behind) writes to the cache only and acknowledges success immediately. The cache flushes dirty entries to the database asynchronously in the background.

This gives very fast write responses, but if the cache crashes before the flush completes, any writes that have not yet been flushed are lost permanently.
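The following sketch illustrates write-back and its failure mode. It is a simplification: `flush` is called by hand here, where a real system would run it in the background, and `crash` simply simulates losing the cache node. All names are hypothetical.

```python
class WriteBackStore:
    """Writes land in the cache only; dirty keys are flushed later."""

    def __init__(self) -> None:
        self.cache: dict = {}
        self.database: dict = {}
        self.dirty: set = set()

    def write(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.dirty.add(key)  # acknowledged without touching the database

    def flush(self) -> int:
        """Persist dirty entries and return how many were written."""
        flushed = 0
        for key in list(self.dirty):
            self.database[key] = self.cache[key]
            self.dirty.discard(key)
            flushed += 1
        return flushed

    def crash(self) -> None:
        """Simulate losing the cache: unflushed writes disappear."""
        self.cache.clear()
        self.dirty.clear()


store = WriteBackStore()
store.write("order:1", "paid")
flushed = store.flush()           # "order:1" reaches the database
store.write("order:2", "paid")    # acknowledged, but not yet flushed
store.crash()                     # "order:2" is gone for good
print(flushed, sorted(store.database))
```

After the simulated crash the database contains only `order:1`, even though the write for `order:2` was acknowledged to its caller.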

## Eviction and Expiry

Caches have finite memory. When a cache fills up, it must remove entries to make room. The eviction policy determines which entries are dropped.

LRU (Least Recently Used) is the most common policy. Entries not accessed for the longest time are evicted first.

It performs well when access patterns have temporal locality - recently used items tend to be used again soon.
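A compact way to sketch LRU in Python is with `collections.OrderedDict`, which can move a key to the end on access and pop from the front on eviction. The class name `LRUCache` and its methods are illustrative choices, not a reference to any particular library.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")        # "a" is now more recent than "b"
cache.put("c", "3")   # over capacity: "b" is evicted
print(list(cache.entries))
```

Reading `a` just before inserting `c` is what saves it: the entry that goes is `b`, the one untouched for longest.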

LFU (Least Frequently Used) evicts items that have been accessed the fewest total times. It is better than LRU when some items are consistently hot and should never be evicted, even after a brief period of no access.
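For contrast with LRU, here is a deliberately simple LFU sketch. It scans all counters on eviction, which is linear in the cache size; real implementations use more elaborate structures. The names are hypothetical, and counting a `put` as one access is an assumption of this example.

```python
class LFUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.values: dict = {}
        self.hits: dict = {}  # key -> number of accesses so far

    def get(self, key):
        if key not in self.values:
            return None
        self.hits[key] += 1
        return self.values[key]

    def put(self, key, value) -> None:
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the least frequently used key; ties go to the
            # earliest-inserted key because dicts keep insertion order.
            coldest = min(self.hits, key=self.hits.get)
            del self.values[coldest]
            del self.hits[coldest]
        self.values[key] = value
        self.hits[key] = self.hits.get(key, 0) + 1


cache = LFUCache(capacity=2)
cache.put("hot", "1")
for _ in range(3):
    cache.get("hot")       # "hot" builds up a high access count
cache.put("warm", "2")     # touched more recently than "hot"
cache.put("new", "3")      # evicts "warm", the least frequently used
print(sorted(cache.values))
```

An LRU cache would have evicted `hot` in this sequence because `warm` was touched more recently, which is exactly the case where LFU does better.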

FIFO (First In, First Out) evicts the oldest entries by insertion time regardless of access. It is simple but rarely optimal for caching.

TTL (time-to-live) is expiry by time, not by memory pressure. Each cache entry is assigned a maximum age. After the TTL expires, the entry is considered stale, and the next request for that key triggers a fresh database read.

TTLs are the simplest tool for bounding cache staleness. They cap how long cached data can diverge from the source of truth before it is automatically refreshed.

This is not the same as explicit invalidation, which proactively removes a key the moment its underlying data changes.

Shorter TTLs mean fresher data but higher database load. Longer TTLs reduce load but increase the risk of serving stale data.
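TTL expiry can be sketched by storing an expiry timestamp next to each value and treating expired entries as misses. In the example below the clock is injectable so that expiry can be shown without sleeping; `TTLCache` and the fake clock are assumptions made for illustration.

```python
import time
from typing import Callable


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries: dict = {}  # key -> (value, expires_at)

    def put(self, key, value) -> None:
        self.entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: treat it as a miss
            return None
        return value


# A fake clock makes the expiry visible without waiting a minute.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("price:42", "9.99")
fresh = cache.get("price:42")    # still within the TTL
now[0] = 61.0
expired = cache.get("price:42")  # past the TTL, so the caller must reload
print(fresh, expired)
```

The caller is expected to handle a `None` result the same way as any other miss: read the database and `put` the fresh value back.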

## Tradeoffs

Thundering herd (cache stampede) occurs when a popular cache entry expires and many concurrent requests all miss at once, all read from the database simultaneously, and all attempt to repopulate the cache at the same time.

The database receives a sudden spike of load that can cause cascading failures.

Mitigations include request coalescing (one in-flight request serves all waiting callers on completion), staggered TTLs (random jitter to spread out expirations), and distributed locks (only one process repopulates a key while others wait).
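Two of these mitigations can be sketched in a single process. `jittered_ttl` spreads expirations with random jitter, and `CoalescingCache` uses a per-key lock so that only one caller loads a missing key while the others wait. This is an in-process approximation: a multi-node deployment would need a distributed lock instead of `threading.Lock`, and every name here is hypothetical.

```python
import random
import threading
import time


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Staggered TTL: the base value plus or minus up to `spread` (10%)."""
    return base_seconds * (1 + random.uniform(-spread, spread))


class CoalescingCache:
    """Only one caller per key runs the loader; the rest wait for it."""

    def __init__(self, loader) -> None:
        self.loader = loader
        self.values: dict = {}
        self.key_locks: dict = {}
        self.guard = threading.Lock()  # protects key_locks

    def get(self, key):
        if key in self.values:          # fast path: cache hit
            return self.values[key]
        with self.guard:
            lock = self.key_locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self.values:  # re-check: someone may have loaded it
                self.values[key] = self.loader(key)
            return self.values[key]


load_count = 0


def slow_database_read(key):
    global load_count
    load_count += 1    # only runs while holding the per-key lock
    time.sleep(0.05)   # simulate a slow query
    return f"value-for-{key}"


cache = CoalescingCache(slow_database_read)
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(load_count)
```

Twenty concurrent callers miss on the same key, but the simulated database is queried once. The re-check inside the lock is what prevents each waiting caller from repeating the load.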

Cold-cache penalty is the cost paid when a cache starts empty - after a deployment, a restart, or a failover. Until the cache warms up, every request hits the database. Pre-warming caches before routing traffic is a standard mitigation.
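Pre-warming can be as simple as loading a list of known-hot keys before the instance takes traffic. The sketch below assumes such a list is available (for example, derived from recent access logs); the function name `prewarm` and the dictionary stand-ins are hypothetical.

```python
def prewarm(cache: dict, database: dict, hot_keys: list) -> int:
    """Load known-hot keys into the cache and return how many were loaded."""
    loaded = 0
    for key in hot_keys:
        if key in database:
            cache[key] = database[key]
            loaded += 1
    return loaded


database = {"product:1": "Widget", "product:2": "Gadget", "product:3": "Gizmo"}
cache: dict = {}

# Assumed list of hot keys; "product:9" no longer exists and is skipped.
loaded = prewarm(cache, database, ["product:1", "product:2", "product:9"])
print(loaded, sorted(cache))
```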

Consistency with the source of truth is the hardest problem in caching. Any strategy that does not update the cache synchronously with the database introduces a window where the cache holds stale data.

This is directly analogous to the stale-read problem in replication, where followers lag behind the leader.

How much staleness is acceptable is a product decision. A product inventory count may require tight consistency. A user profile image can tolerate minutes of stale data.

Understanding that boundary is what interviewers are probing when they ask about caching tradeoffs.

These tradeoffs are part of a broader consistency-availability tension described by the CAP theorem.

Loosely speaking, a cache that refuses to serve data it cannot confirm is fresh behaves like a CP system. One that may serve stale data to stay available and fast behaves like an AP system.

## How It Comes Up in Interviews

Interviewers rarely ask you to define cache-aside verbatim. They present a system and ask where caching fits and what risks it introduces.

What caching strategy would you use and why?

It depends on the access pattern. For read-heavy data with tolerable staleness, cache-aside is the default: simple, resilient, and populated on demand.

For write-heavy data where the cache must always be fresh, write-through is safer. Write-back is only justified when write throughput is extreme and you can accept the risk of losing in-flight data on a cache crash.

How do you keep a cache consistent with the database?

You cannot guarantee perfect consistency without paying for it. Write-through keeps the cache and database in sync at the cost of write latency.

For cache-aside, TTLs bound how stale entries can get. For critical data, invalidate the cache key explicitly on every write so the next read repopulates from the database.

Explicit invalidation gives fresher data than TTLs but creates a brief window where multiple readers race to repopulate the same key.
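A minimal sketch of explicit invalidation on top of cache-aside reads, with dictionaries standing in for the cache and the database. The `read` and `write` function names are hypothetical, and updating the database before deleting the key is the ordering assumed here.

```python
cache: dict = {}
database: dict = {"stock:42": 10}


def read(key):
    """Cache-aside read."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value
    return value


def write(key, value) -> None:
    database[key] = value  # update the source of truth first
    cache.pop(key, None)   # then drop the cached copy


before = read("stock:42")   # 10, and now cached
write("stock:42", 9)        # invalidates the cached 10
after = read("stock:42")    # miss: repopulated from the database
print(before, after)
```

The read after the write is the one that pays the miss. If many readers arrive at that moment, they all miss together, which is the race described above.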

What is cache invalidation and why is it hard?

Cache invalidation is the process of removing or updating a cached entry when the underlying data changes. It is hard because a cache is a copy, and keeping a copy in sync requires knowing exactly when the original changes.

In distributed systems, that knowledge is expensive to propagate reliably. The usual fallback is accepting a staleness window via TTLs.

Phil Karlton’s remark that cache invalidation is one of the two hard problems in computer science reflects this accurately.

Where would you place a cache in this design?

Start by identifying the hottest read paths. Add a cache between the application and the database for data with a high read-to-write ratio and a tolerable staleness window.

Add a CDN cache for static or semi-static responses that vary only by URL. Be specific about the cache key and TTL - interviewers want to see you have thought through the details, not just said “add Redis”.
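One way to make the cache key and TTL concrete in an interview is to write them down. The sketch below is purely illustrative: the key layout, the `schema_version` segment, and the TTL values are assumptions chosen to match the staleness examples earlier in the article, not recommendations.

```python
# Hypothetical TTLs in seconds, chosen per data type by tolerable staleness.
TTL_SECONDS = {
    "inventory_count": 5,    # needs tight consistency
    "profile_image": 600,    # minutes of staleness are acceptable
}


def cache_key(entity: str, entity_id: int, schema_version: int = 1) -> str:
    """Build a namespaced key; bumping schema_version abandons old entries."""
    return f"{entity}:v{schema_version}:{entity_id}"


def cache_plan(entity: str, entity_id: int):
    """Return the (key, ttl_seconds) pair to use for one cached object."""
    return cache_key(entity, entity_id), TTL_SECONDS[entity]


key, ttl = cache_plan("profile_image", 42)
print(key, ttl)
```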


## Frequently Asked Questions

What is caching in system design?

Caching is the practice of storing a copy of data in a faster or closer location to reduce the cost of retrieving it repeatedly from the original source.

In system design, caches reduce database load and cut response latency for frequently accessed data.

What is the difference between write-through and write-back caching?

Write-through writes to both the cache and the database synchronously before acknowledging success. The cache is always consistent with the database but every write is slower because it must wait for the database.

Write-back writes to the cache only and acknowledges success immediately, then flushes to the database asynchronously. This is faster for writes but risks data loss if the cache fails before the flush completes.

What is cache-aside?

Cache-aside (lazy loading) is a caching strategy where the application manages all cache interactions. On a read, the app checks the cache first.

On a miss, it reads from the database, populates the cache, and returns the result. The cache fills lazily, on demand, rather than eagerly on every write.

Why is cache invalidation hard?

A cache is a copy of data that lives separately from the source of truth. Keeping a copy in sync requires knowing exactly when the original changes and propagating that reliably across a distributed system.

In practice, you must choose between accepting staleness (via TTLs) or paying the coordination cost of explicit invalidation on every write. Neither option is free.
