Do Not Blindly Assume You Need Multiple Machines in System Design Interview! Measure First!
Table of Contents
- Introduction
- What is Back-of-the-Envelope Calculation?
- When Do I Need Back-of-the-Envelope Calculation?
- How Do I Do Back-of-the-Envelope Calculation?
- Estimation for Simple Backend Server + Database
- Conclusion
Introduction
Imagine you are in a system design interview.
Q. Have you ever blindly assumed you need multiple machines for the system design without any measurements?
Do not introduce all the complexity and failure modes of distributed systems just to make the design look impressive. Introduce them only when you actually need them!
You do not always need data partitioning and multiple machines! Avoid them when you can, and it may even earn you extra bonus points!!
What is Back-of-the-Envelope Calculation?
A back-of-the-envelope calculation is a rough calculation, typically jotted down on any available scrap of paper such as an envelope. It is more than a guess but less than an accurate calculation or mathematical proof.
This is all we need!!
You do not need accurate measurements during the system design interview. Interviewers do not expect you to be a genius mathematician!
When Do I Need Back-of-the-Envelope Calculation?
Typically, during the system design interviews, you need the back-of-the-envelope calculation to answer the following two questions:
- Q. Do I need multiple machines for processing requests?
- Q. Do I need data partitioning?
We will look at how to answer the first question in this blog!
The calculation lets you determine whether a single machine is enough or whether you need multiple machines, plus the approximate number of servers required to support the system requirements.
A lot of people have the misconception that this calculation is only useful for estimating the number of servers required to support the load on the system. But it is equally important for deciding between a single machine and multiple machines.
How Do I Do Back-of-the-Envelope Calculation?
The main objective is to understand whether a single machine can handle the desired RPS (Requests Per Second). For this, we need to check which resources could potentially be the limiting factor:
- CPU Bounds: limited by speed of the CPU
- Memory Bounds: limited by memory space to hold the working data
- I/O Bounds: limited by input/output operations to be completed
Once we find the limiting factor, we can compare with the desired RPS to see whether we need multiple machines or not.
CPU Bound
A CPU (Central Processing Unit) core is one of the CPU’s processors. In the old days, every processor had just one core that could focus on one task at a time. Nowadays, you can have CPUs with 8 or 16 cores (some CPUs even have 24 cores).
Modern CPUs also support hardware threads, which allow the CPU to make progress on multiple tasks at once. With simultaneous multithreading (e.g., Intel’s Hyper-Threading), each core can run two threads, so a processor with 8 cores exposes 16 hardware threads.
The maximum RPS that a CPU can handle depends on the number of cores it has and the CPU time used per request. We can use the following formula to estimate the RPS bounded by CPU:
- Max RPS = (# of cores * threads per core) / CPU time per request
Imagine you have a CPU with 8 cores and each request takes about 10ms. Then, the maximum RPS you can support would be:
- (8 cores * 2) / 10 ms = 1600 RPS
Note: the tricky part about this calculation is figuring out the task duration. To make a reasonable estimate, you need to understand what’s involved in processing the request.
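The CPU-bound formula above can be sketched as a tiny helper. This is just the arithmetic from the example; the 2-threads-per-core factor is the same simultaneous-multithreading assumption used there:

```python
def cpu_bound_rps(cores: int, cpu_time_per_request_s: float,
                  threads_per_core: int = 2) -> float:
    """Max RPS limited by CPU: each request occupies one hardware
    thread for cpu_time_per_request_s seconds."""
    return (cores * threads_per_core) / cpu_time_per_request_s

# 8 cores * 2 threads, 10 ms of CPU work per request -> ~1600 RPS
print(cpu_bound_rps(8, 0.010))
```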
Memory Bound
Your computer’s main memory is called RAM (Random Access Memory). You can think of it as a workspace the computer uses to get work done.
The maximum RPS that RAM can handle depends on the total RAM, the memory usage per request, and the duration of the task. We can use the following formula to estimate the RPS bounded by memory:
- Max RPS = Total RAM / (Memory per request * Task duration)
Imagine you have a RAM with 16GB, and your task takes 10MB of memory and 100ms to process:
- 16GB / (10MB * 100 ms) = 16k RPS
Note: the task duration can differ from the CPU usage time, because a request might be context-switched out and not use the CPU while waiting on I/O (e.g., while making a request to the database). However, its context must still be held in memory even while it is switched out.
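The memory-bound formula can be sketched the same way. The intuition: RAM caps how many requests can be in flight at once, and each request stays in flight for the task duration (the numbers mirror the example above):

```python
def memory_bound_rps(total_ram_mb: float, mem_per_request_mb: float,
                     task_duration_s: float) -> float:
    """Max RPS limited by memory: total RAM caps the number of
    in-flight requests, each held for task_duration_s seconds."""
    max_in_flight = total_ram_mb / mem_per_request_mb
    return max_in_flight / task_duration_s

# 16 GB RAM, 10 MB per request, 100 ms per request -> ~16k RPS
print(memory_bound_rps(16_000, 10, 0.100))
```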
I/O Bound
In order to talk to another computer, you need a file descriptor to send requests and receive responses. A computer has a limited number of file descriptors, and this can bound the maximum number of requests you can handle:
- Max RPS = Max concurrent connections / Task duration
If a machine can handle 10k concurrent connections and each task takes 100ms to complete, the I/O bound comes out to be:
- 10k / 100ms = 100k RPS
The well-known C10K problem write-up describes handling 10k concurrent connections as a hard problem. We can use this evidence to assume a maximum of roughly 10–20k concurrent connections per machine. In a tech talk, the Netflix engineering team mentioned that they managed to keep ~20k open connections per machine for their notification service.
This means a single machine can sustain ~10–20k concurrent connections, which, together with the task duration, bounds the RPS.
Note: the exact number of file descriptors is not important. You just need to pick a reasonable number (e.g., 10k).
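The I/O-bound formula is the simplest of the three. A sketch, using the same assumed 10k connection cap:

```python
def io_bound_rps(max_connections: int, task_duration_s: float) -> float:
    """Max RPS limited by I/O: file descriptors cap the number of
    concurrent connections, each held for task_duration_s seconds."""
    return max_connections / task_duration_s

# 10k concurrent connections, 100 ms per task -> ~100k RPS
print(io_bound_rps(10_000, 0.100))
```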
Estimation for Simple Backend Server + Database
Let’s use this simple system design to apply what we have learned so far! The system is responsible for storing and retrieving key-value pairs. We have a backend server talking to a database, with the following requirements:
- Desired QPS/RPS: ~10k
- Value/Key Data Size: ~10MB
CPU Bound
One thing to note is that the backend server is not doing much work: it forwards requests to the database instance to create/update/get key-value pairs. This implies that most requests will be blocked on I/O to the database, not using the CPU.
Hence, the CPU usage time would be heavily influenced by the context switch time and memory access time:
- average context switch time: ~20μs
- memory access time: at most a few microseconds
Assuming we have 8 CPU cores and a CPU time of ~50μs per request (a rough estimate, which is okay), the maximum RPS, using the formula, comes out to be:
- (8 cores * 2) / 50 μs = 320k RPS
Note: this is a pretty big number, indicating the system is not bounded by CPU.
Memory Bound
The task duration would be dominated by the network latency, which is significantly larger than the other factors (e.g., context switch, memory access).
Network latency depends on a lot of factors (e.g., network congestion, bandwidth), but you can just pick a latency you have seen in the past for a single network call: ~400ms.
Assuming we have 16GB memory and maximum data size of key-value pair is 10MB, our formula comes out to:
- (16 GB) / (10MB * 400ms) = 4000 RPS
I/O Bound
Assuming a single machine can support up to 10k concurrent connections and using the same task duration as above, the I/O bound would come out to be:
- (10k connection) / 400ms = 25k RPS
Single Machine vs. Multiple Machines
Overall, this simple system design is bounded by memory, and the maximum RPS it can support per machine turns out to be 4000. To support the desired 10k RPS, we would need multiple machines!
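Putting the three bounds together for this example can be sketched as below. All numbers are the rough assumptions from the walkthrough, not measurements; taking the minimum bound and dividing the desired RPS by it gives a rough machine count:

```python
import math

# Rough assumptions from the walkthrough above (not measurements)
cores, threads_per_core = 8, 2
cpu_time_s = 50e-6            # ~50 us of CPU work per request
ram_mb, req_mb = 16_000, 10   # 16 GB RAM, ~10 MB per key-value pair
duration_s = 0.400            # ~400 ms, dominated by network latency
connections = 10_000          # assumed concurrent-connection cap
desired_rps = 10_000

cpu_rps = (cores * threads_per_core) / cpu_time_s   # ~320k RPS
mem_rps = (ram_mb / req_mb) / duration_s            # ~4k RPS
io_rps = connections / duration_s                   # ~25k RPS

bottleneck = min(cpu_rps, mem_rps, io_rps)          # memory bound
machines = math.ceil(desired_rps / bottleneck)
print(round(bottleneck), machines)
```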
Even if the desired RPS is not bounded by any of the resources, you should still consider having backup machines in case something fails, to improve the availability of your system.
Conclusion
In a system design interview, you should be able to justify everything you say! Use the back-of-the-envelope calculation we learned today to justify why you need multiple machines instead of a single machine!
It is better not to say anything unless you know how to justify it!