3 System Design Patterns That Show Up in Every Interview
Transcript
Hey, I'm R.J. I've sat on both sides of
the interview table in a ton of
interviews. And while system design is
such a huge field, there are three topics that come up in almost every interview.
Let's start with caching. Caching comes
up so often because every system has hot data, meaning data that gets accessed repeatedly. Interviewers want to see if
you can speed up access to that data
without crushing your database. At its
core, caching just means storing data
closer to the user or the application so
you don't have to keep going back to a
slower system or database to get data
every single time. This could be in
memory, on disk, in the browser, or even
out at a CDN edge. The goal is always
the same: faster access. Imagine we're
building a system like Twitter. If every
time somebody loaded their feed, we had
to hit the underlying database, this
would absolutely crush the system.
Instead, we can cache the feed in
memory. Now users can get it instantly.
But caching isn't just one thing. We
might cache in the application layer
with something like Redis or Memcached.
We could also cache on the client side
in the user's browser so the same data
doesn't have to reload over and over
again. Or we could cache at the CDN
edge. This is what makes platforms like
YouTube or Netflix so fast worldwide.
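To make that concrete, here's a minimal cache-aside sketch in Python. It's only an illustration, not something from the video: the feed lookup function is hypothetical, and a plain dictionary with expiry timestamps stands in for a real store like Redis.

```python
import time

# Hypothetical in-memory cache standing in for Redis or Memcached.
_cache = {}               # key -> (value, expires_at)
CACHE_TTL_SECONDS = 60

def fetch_feed_from_db(user_id):
    # Placeholder for the slow path: a real query against the database.
    return [f"tweet for user {user_id}"]

def get_feed(user_id):
    # Cache-aside read: try the cache first, fall back to the database.
    key = f"feed:{user_id}"
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                      # cache hit: no database trip
        del _cache[key]                       # stale entry, drop it
    value = fetch_feed_from_db(user_id)       # cache miss: slow path
    _cache[key] = (value, time.time() + CACHE_TTL_SECONDS)
    return value
```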
Now, while all that might sound amazing,
caching is not without its problems.
The classic one is cache invalidation: what happens when the data in our cache goes stale, and how do we make sure we're not serving outdated information? To deal with this, here are a few common strategies for controlling what's in the cache. First, write-through caching. Every single write goes to both the database and the cache at the same time. This keeps the database and the cache consistent, but it also makes writes a little bit slower. Next, we have write-around caching. Writes go straight to the database, and the cache is only updated when the data is read again later. This avoids filling the cache with new data that might never be read, but it does mean the first time somebody reads that data, it's slower because we have to repopulate the cache. And finally, we have write-behind caching. Writes go to the cache first and are then asynchronously pushed to the database. This gives you super fast writes because we're only writing to the cache in memory, but if the cache fails before syncing, you risk losing data.
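Here's a rough sketch of how those three write paths differ, using plain dictionaries as stand-ins for the database and the cache and a background queue for the asynchronous sync. The names are made up; the point is just where each write goes first.

```python
import queue
import threading

database = {}   # stand-in for the real database
cache = {}      # stand-in for an in-memory cache

def write_through(key, value):
    # Write-through: the database and the cache are updated together.
    database[key] = value
    cache[key] = value

def write_around(key, value):
    # Write-around: only the database is written; the cache fills on the next read.
    database[key] = value
    cache.pop(key, None)   # drop any stale copy

_pending = queue.Queue()

def write_behind(key, value):
    # Write-behind: acknowledge after the cache write; the database catches up later.
    cache[key] = value
    _pending.put((key, value))

def _flush_worker():
    # Background sync. If the process dies before this runs, queued writes are lost.
    while True:
        key, value = _pending.get()
        database[key] = value

threading.Thread(target=_flush_worker, daemon=True).start()
```

Note how write-behind never touches the database on the request path; that's where both the speed and the risk come from.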
Caching can speed things up and make designs way more efficient, but it's not without its trade-offs. If you want to see a video where I talk a little bit more about the downsides of caching in real systems, I'll link it in the description for your next watch. Let's move on to the second
pattern, load balancing. Load balancing
comes up because one bottleneck almost every company faces when scaling up their service is that one server is not enough. If that server crashes,
customers lose access to their product.
If they need to support more users,
well, the hardware for one server is
only so powerful. Imagine we're running
an e-commerce site during Black Friday.
We have millions of people all hitting
the checkout button at the same time. If
we have one server handling all those
requests, it would crash instantly no
matter how big your virtual machine is.
With a load balancer in front, those
requests can be distributed across many
servers and suddenly we can scale
horizontally. Now, there are a few
common ways to actually handle the load
balancing here. The first is round robin. Requests are just sent one
after another to each server in turn.
Simple, but this doesn't account for
uneven workloads. Next, we have least
connections. Traffic goes to the server
handling the fewest active requests.
This is more adaptive, but now we have
to track how many requests each server
is handling. And finally, hash-based. The request gets routed based on a key like a user ID or a shopping cart ID. This is great for keeping related traffic together, but it can create hotspots if keys aren't well distributed.
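As an illustration, here's a hedged sketch of those three strategies as small Python functions that pick a backend from a pool. The server names and connection counts are hypothetical.

```python
import hashlib
import itertools

servers = ["server-a", "server-b", "server-c"]    # hypothetical backend pool

# Round robin: hand requests to each server in turn.
_rotation = itertools.cycle(servers)
def round_robin():
    return next(_rotation)

# Least connections: pick the server with the fewest active requests.
active_connections = {s: 0 for s in servers}      # the balancer would track this
def least_connections():
    return min(servers, key=lambda s: active_connections[s])

# Hash-based: route by a stable key (user ID, cart ID) so related traffic sticks together.
def hash_based(key):
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```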
Load balancers can also live at different layers. Layer 4 load balancers only see network-level information like the IP and port. Layer 7 load balancers can go a lot deeper and route based on things like HTTP headers and URLs. Layer 7 gives you more control, but it also adds additional overhead.
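To show the difference in what each layer can actually see, here's a tiny illustrative sketch. The pools, header names, and routing rules are all made up: a Layer 4 decision only has the connection information, while a Layer 7 decision can look inside the HTTP request.

```python
import hashlib

app_pool = ["app-1", "app-2"]
api_pool = ["api-1", "api-2"]

def layer4_route(client_ip, client_port):
    # Layer 4: only network information (IP and port) is visible, so hash the tuple.
    digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).hexdigest()
    return app_pool[int(digest, 16) % len(app_pool)]

def layer7_route(path, headers):
    # Layer 7: the HTTP request itself is visible, so we can route on URLs or headers.
    if path.startswith("/api/"):
        return api_pool[0]
    if headers.get("X-Beta-User") == "true":
        return "canary-1"
    return app_pool[0]
```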
Load balancing is one of those patterns
that you just have to know. I've never
been in a design interview where it
didn't come up. Let's dive into the
third pattern. Database sharding. This
pretty much always comes up because at
some point your database just gets too
big. Database sharding basically means
splitting your data across multiple
databases so no single machine has to
store or handle all of the load. Each
shard is responsible for just a portion
of the data. Imagine we've been asked to
design a healthcare system that manages
digital medical records. Hospitals
across the country are all writing patient data at the same time: visits, prescriptions, lab results. If we try to
store everything in one database,
performance will quickly suffer.
Instead, you might shard by patient ID.
So each shard only holds a slice of the
total patients and their records. This
way, queries stay fast even as the
system scales to millions of patients.
There are a few common ways to shard.
The first is called range-based sharding. We can split up the data by ranges of patient IDs. For example, shard one could store IDs 1 through 1 million, shard two could store IDs 1 million through 2 million, and so on. It's simple, but it can cause hotspots if most of the activity clusters in one range. Next, hash-based sharding. We can apply a hash to the patient ID and route that record to a shard. With something like consistent hashing, we can spread the load evenly, but range queries, or queries that need to touch related data, may now have to span multiple partitions, which is a really bad access pattern. One other strategy that can work really well is geo-sharding. We split up the data by region. In this example, patients on the east coast could be in one shard and patients on the west coast in another. This works really well when queries are always scoped to a single region, but if data needs to be shared across regions, it can quickly become a problem.
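As a rough sketch of those three approaches, here's some illustrative Python. The shard counts, ID ranges, and region mapping are invented, and the simple hash stands in for something more careful like consistent hashing.

```python
import hashlib

NUM_SHARDS = 4
PATIENTS_PER_RANGE = 1_000_000
REGION_TO_SHARD = {"east": 0, "west": 1}   # hypothetical mapping

def range_shard(patient_id):
    # Range-based: IDs 1..1,000,000 on shard 0, the next million on shard 1, and so on.
    return (patient_id - 1) // PATIENTS_PER_RANGE

def hash_shard(patient_id):
    # Hash-based: spreads IDs evenly, at the cost of scattering range queries.
    digest = hashlib.sha256(str(patient_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def geo_shard(region):
    # Geo-sharding: all of a region's patients live on the same shard.
    return REGION_TO_SHARD[region]
```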
Sharding is super powerful, but it also introduces some risks. Whenever you're in an interview, make sure you think about how to minimize cross-shard queries, what you'll do if you have to reshard because one shard grows too large, and how you'll keep hot keys from overloading a single shard.
Sharding is pretty much always going to
show up in interviews where you're
designing for millions or billions of
users. But it's not without its
trade-offs. If you found this helpful,
feel free to subscribe. I'll be breaking down more system design concepts over the coming weeks.