3 System Design Patterns That Show Up in Every Interview
Transcript
Hey, I'm R.J. I've sat on both sides of
the interview table in a ton of
interviews. And while system design is
such a huge field, there are three topics that come up in almost every interview.
Let's start with caching. Caching comes
up so often because every system has hot data, meaning data that gets accessed repeatedly. Interviewers want to see if
you can speed up access to that data
without crushing your database. At its
core, caching just means storing data
closer to the user or the application so
you don't have to keep going back to a
slower system or database to get data
every single time. This could be in
memory, on disk, in the browser, or even
out at a CDN edge. The goal is always
the same: faster access. Imagine we're
building a system like Twitter. If every
time somebody loaded their feed, we had
to hit the underlying database, this
would absolutely crush the system.
Instead, we can cache the feed in
memory. Now users can get it instantly.
But caching isn't just one thing. We
might cache in the application layer
with something like Redis or Memcached.
We could also cache on the client side
in the user's browser so the same data
doesn't have to reload over and over
again. Or we could cache at the CDN
edge. This is what makes platforms like
YouTube or Netflix so fast worldwide.
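To make that concrete, here's a minimal cache-aside sketch in Python. It's only an illustration, not something from the video: the feed lookup function is hypothetical, and a plain dictionary with expiry timestamps stands in for a real store like Redis.

```python
import time

# Hypothetical in-memory cache standing in for Redis or Memcached.
_cache = {}               # key -> (value, expires_at)
CACHE_TTL_SECONDS = 60

def fetch_feed_from_db(user_id):
    # Placeholder for the slow path: a real query against the database.
    return [f"tweet for user {user_id}"]

def get_feed(user_id):
    # Cache-aside read: try the cache first, fall back to the database.
    key = f"feed:{user_id}"
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                      # cache hit: no database trip
        del _cache[key]                       # stale entry, drop it
    value = fetch_feed_from_db(user_id)       # cache miss: slow path
    _cache[key] = (value, time.time() + CACHE_TTL_SECONDS)
    return value
```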
Now, while all that might sound amazing,
caching is not without its problems.
The classic one is cache invalidation: what happens when the data in our cache goes stale, and how do we make sure we're not serving outdated information? To deal with this, here are a few common strategies for controlling what's in the cache. First, write-through caching. Every single write goes to both the database and the cache at the same time. This keeps the database and the cache consistent, but it also makes writes a little bit slower. Next, we have write-around caching. Writes go straight to the database, and the cache is only updated when the data is read again later. This avoids filling the cache with new data that might never be read, but it does mean the first time somebody reads that data, it's slower because we have to repopulate the cache. And finally, we have write-behind caching. Writes go to the cache first and are then asynchronously pushed to the database. This gives you super fast writes because we're only writing to the cache in memory, but if the cache fails before syncing, you risk losing data.
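Here's a rough sketch of how those three write paths differ, using plain dictionaries as stand-ins for the database and the cache and a background queue for the asynchronous sync. The names are made up; the point is just where each write goes first.

```python
import queue
import threading

database = {}   # stand-in for the real database
cache = {}      # stand-in for an in-memory cache

def write_through(key, value):
    # Write-through: the database and the cache are updated together.
    database[key] = value
    cache[key] = value

def write_around(key, value):
    # Write-around: only the database is written; the cache fills on the next read.
    database[key] = value
    cache.pop(key, None)   # drop any stale copy

_pending = queue.Queue()

def write_behind(key, value):
    # Write-behind: acknowledge after the cache write; the database catches up later.
    cache[key] = value
    _pending.put((key, value))

def _flush_worker():
    # Background sync. If the process dies before this runs, queued writes are lost.
    while True:
        key, value = _pending.get()
        database[key] = value

threading.Thread(target=_flush_worker, daemon=True).start()
```

Note how write-behind never touches the database on the request path; that's where both the speed and the risk come from.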
Caching can speed things up and make designs way more efficient, but it's not without its trade-offs. If you want to see a video where I talk a little bit more about the downsides of caching in real systems, I'll link it in the description for your next watch. Let's move on to the second
pattern, load balancing. Load balancing
comes up because one bottleneck almost every company faces when scaling up their service is that one server is not enough. If that server crashes,
customers lose access to their product.
If they need to support more users,
well, the hardware for one server is
only so powerful. Imagine we're running
an e-commerce site during Black Friday.
We have millions of people all hitting
the checkout button at the same time. If
we have one server handling all those
requests, it would crash instantly no
matter how big your virtual machine is.
With a load balancer in front, those
requests can be distributed across many
servers and suddenly we can scale
horizontally. Now, there are a few
common ways to actually handle the load
balancing here. The first is round robin. Requests are just sent one
after another to each server in turn.
Simple, but this doesn't account for
uneven workloads. Next, we have least
connections. Traffic goes to the server
handling the fewest active requests.
This is more adaptive, but now we have
to track how many requests each server
is handling. And finally, hash-based. The request gets routed based on a key like a user ID or a shopping cart ID. This is great for keeping related traffic together, but it can create hotspots if keys aren't well distributed.
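As an illustration, here's a hedged sketch of those three strategies as small Python functions that pick a backend from a pool. The server names and connection counts are hypothetical.

```python
import hashlib
import itertools

servers = ["server-a", "server-b", "server-c"]    # hypothetical backend pool

# Round robin: hand requests to each server in turn.
_rotation = itertools.cycle(servers)
def round_robin():
    return next(_rotation)

# Least connections: pick the server with the fewest active requests.
active_connections = {s: 0 for s in servers}      # the balancer would track this
def least_connections():
    return min(servers, key=lambda s: active_connections[s])

# Hash-based: route by a stable key (user ID, cart ID) so related traffic sticks together.
def hash_based(key):
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```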
Load balancers can also live at different layers. Layer 4 load balancers only see network-level information like the IP and port. Layer 7 load balancers can go a lot deeper and route based on things like HTTP headers and URLs. Layer 7 gives you more control, but it also adds additional overhead.
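To show the difference in what each layer can actually see, here's a tiny illustrative sketch. The pools, header names, and routing rules are all made up: a Layer 4 decision only has the connection information, while a Layer 7 decision can look inside the HTTP request.

```python
import hashlib

app_pool = ["app-1", "app-2"]
api_pool = ["api-1", "api-2"]

def layer4_route(client_ip, client_port):
    # Layer 4: only network information (IP and port) is visible, so hash the tuple.
    digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).hexdigest()
    return app_pool[int(digest, 16) % len(app_pool)]

def layer7_route(path, headers):
    # Layer 7: the HTTP request itself is visible, so we can route on URLs or headers.
    if path.startswith("/api/"):
        return api_pool[0]
    if headers.get("X-Beta-User") == "true":
        return "canary-1"
    return app_pool[0]
```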
Load balancing is one of those patterns
that you just have to know. I've never
been in a design interview where it
didn't come up. Let's dive into the
third pattern. Database sharding. This
pretty much always comes up because at
some point your database just gets too
big. Database sharding basically means
splitting your data across multiple
databases so no single machine has to
store or handle all of the load. Each
shard is responsible for just a portion
of the data. Imagine we've been asked to
design a healthcare system that manages
digital medical records. Hospitals
across the country are all writing patient data at the same time: visits, prescriptions, lab results. If we try to
store everything in one database,
performance will quickly suffer.
Instead, you might shard by patient ID.
So each shard only holds a slice of the
total patients and their records. This
way, queries stay fast even as the
system scales to millions of patients.
There are a few common ways to shard.
The first is called range-based sharding. We can split up the data by ranges of patient IDs. For example, shard one could store IDs 1 through 1 million, shard two could store IDs 1 million through 2 million, and so on. It's simple, but it can cause hotspots if most of the activity clusters in one range. Next, hash-based sharding. We can apply a hash to the patient ID and route that record to a shard. With something like consistent hashing, we can spread the load evenly, but range queries, or queries that need to touch related data, may now have to span multiple partitions, which is a really bad access pattern. One other strategy that can work really well is geo-sharding. We split up the data by region. In this example, patients on the east coast could be in one shard and patients on the west coast in another. This works really well when queries are always scoped to a single region, but if data needs to be shared across regions, it can quickly become a problem.
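As a rough sketch of those three approaches, here's some illustrative Python. The shard counts, ID ranges, and region mapping are invented, and the simple hash stands in for something more careful like consistent hashing.

```python
import hashlib

NUM_SHARDS = 4
PATIENTS_PER_RANGE = 1_000_000
REGION_TO_SHARD = {"east": 0, "west": 1}   # hypothetical mapping

def range_shard(patient_id):
    # Range-based: IDs 1..1,000,000 on shard 0, the next million on shard 1, and so on.
    return (patient_id - 1) // PATIENTS_PER_RANGE

def hash_shard(patient_id):
    # Hash-based: spreads IDs evenly, at the cost of scattering range queries.
    digest = hashlib.sha256(str(patient_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def geo_shard(region):
    # Geo-sharding: all of a region's patients live on the same shard.
    return REGION_TO_SHARD[region]
```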
Sharding is super powerful, but it also introduces some risks. Whenever you're in an interview, make sure you think about how to minimize cross-shard queries, what you'll do if you have to reshard because one shard grows too large, and how you'll keep hot keys from overloading a single shard.
Sharding is pretty much always going to
show up in interviews where you're
designing for millions or billions of
users. But it's not without its
trade-offs. If you found this helpful,
feel free to subscribe. I'll be breaking down more system design concepts over the coming weeks.