System Design Primer
More advanced: Microservice Design Patterns Course
- DNS takes a domain name and returns an IP address
- NoSQL databases generally don’t offer joins
- Use NoSQL for unstructured data, data flows where you’re only serializing and deserializing data, or storing massive amounts of data
- It’s common to have multiple copies of your db to enable failover
Master-slave replication
- A master db will only support writes. Slave dbs will get copies of the data from the master and only support reads
- This lets you distribute load for reads to serve more reads in parallel (see the routing sketch after this list)
- You usually have more slaves than masters since reads are more common
- If only one slave is available and it goes offline, reads will be temporarily directed to the master until a replacement slave is brought online
- If the master goes offline, a slave will be promoted
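A minimal sketch of how an application might split reads and writes in this setup; the `master`/`slaves` connection objects and their `execute` method are hypothetical placeholders, not a specific driver API.

```python
import random

class ReplicatedDB:
    """Routes writes to the master and reads to a random slave,
    falling back to the master when no slaves are available."""

    def __init__(self, master, slaves):
        self.master = master            # hypothetical write connection
        self.slaves = list(slaves)      # hypothetical read-only replica connections

    def execute_write(self, query, params=()):
        return self.master.execute(query, params)

    def execute_read(self, query, params=()):
        # If every slave is offline, reads are temporarily directed to the master
        target = random.choice(self.slaves) if self.slaves else self.master
        return target.execute(query, params)
```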
- Horizontal scaling of databases
- Each database is known as a shard. Instead of replicating the db, it’s split up over multiple databases
- We can use a basic hash function to decide which database to use for a user, ex. user_id % num_dbs (see the sketch after the issues list below)
Issues
- Resharding - needed when a single shard can no longer hold more data
- Celebrity problem - ex. multiple celebrities’ Twitter accounts end up on the same shard
- It makes it hard to perform joins
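A minimal sketch of the user_id % num_dbs routing and why resharding is painful; the shard names and counts are assumptions.

```python
NUM_SHARDS = 4  # assumption: shards named user_db_0 .. user_db_3

def shard_for(user_id: int) -> str:
    """Pick a shard with the basic hash: user_id % num_dbs."""
    return f"user_db_{user_id % NUM_SHARDS}"

print(shard_for(10))   # user_db_2 with 4 shards
# Resharding problem: changing NUM_SHARDS to 5 maps user 10 to user_db_0,
# so most existing rows would have to move between databases.
```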
- Use vertical scaling if you have low traffic
- Vertical scaling doesn’t allow for failover and has a hard limit
- A load balancer lets you distribute traffic among servers
- The load balancer communicates with the servers via a private IP
CDNs cache static content like CSS, JS files
Ex. a user in Hong Kong requests data from the CDN. It isn’t there, so we pull it from S3, then cache the data in the CDN (see the cache-aside sketch below)
Considerations
- Set an expiration time
- It’s recommended to keep user state data in a shared DB
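A minimal cache-aside sketch of that read-through flow, assuming a plain dict stands in for the CDN edge cache and `fetch_from_origin` stands in for the S3 pull; the TTL value is an assumption.

```python
import time

cdn_cache = {}          # stand-in for the CDN's edge cache
TTL_SECONDS = 3600      # expiration-time consideration from above (assumed value)

def fetch_from_origin(path: str) -> bytes:
    """Placeholder for pulling the object from origin storage such as S3."""
    return b"...static file bytes..."

def get_asset(path: str) -> bytes:
    entry = cdn_cache.get(path)
    if entry and entry["expires_at"] > time.time():
        return entry["body"]                    # hit: serve the cached copy
    body = fetch_from_origin(path)              # miss: pull from origin
    cdn_cache[path] = {"body": body, "expires_at": time.time() + TTL_SECONDS}
    return body
```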
- You can split your whole web server operation between geographical regions
Ex. you have copies of your web servers, dbs, and caches over different geographical regions
Considerations
- You may need copies of your data in each region in case one data center goes out
A message queue supports asynchronous communication: a producer publishes messages, and a consumer subscribes to the queue and consumes them
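A minimal in-process sketch of that producer/consumer pattern using Python's standard queue module in place of a real broker; a production system would use something like RabbitMQ or Kafka.

```python
import queue
import threading

message_queue = queue.Queue()      # stand-in for a real message broker

def producer():
    for i in range(3):
        message_queue.put({"job_id": i})   # publish without waiting for the consumer

def consumer():
    while True:
        msg = message_queue.get()          # blocks until a message arrives
        print("processing", msg)
        message_queue.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
message_queue.join()                       # wait until every published message is processed
```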
| Power of 2 | Approximate value | Name |
|---|---|---|
| 10 | 1 thousand | Kilobyte |
| 20 | 1 million | Megabyte |
| 30 | 1 billion | Gigabyte |
| 40 | 1 trillion | Terabyte |
| 50 | 1 quadrillion | Petabyte |
- L1 cache - inside the CPU core; smallest and fastest
- L2 cache - larger and slower than L1
- L3 cache - larger still, usually shared across cores
Latency numbers - page 36
- Don’t over-engineer
- Always keep the interviewer in the loop of what you’re thinking
- Understand the problem and establish design scope
- Don’t give quick answers. Think through and fully understand the requirements
- What features are we going to build?
- How many users?
- How fast does the company anticipate scaling?
- What’s the tech stack?
- What are the most important features?
- What’s the traffic volume?
- Propose a high-level design and get their feedback
- Design deep dive
Focus on bottlenecks. Some interviewers want you to focus on high-level design
- Wrap Up
- Discuss potential improvements, give a recap
A rate limiter controls the rate of traffic sent by a client or a service
Ex. Number of accounts from the same ip, number of writes per second
Client-side rate limiting is unreliable since requests can be forged by malicious actors
- So we should rate limit on the server side
- The rate limiter should sit between the client and servers and return HTTP 429 (Too Many Requests) errors when the limit is exceeded
API Gateways are managed services that provide rate-limiting
Rate Limiting
We use Redis to store the counters, since it’s fast and provides INCR (increment a counter) and EXPIRE (set a time-to-live) - see the sketch after this list
- If two requests read the counter concurrently before either writes back, both write the same incremented value, so the counter undercounts and extra requests get through
- Or we may need multiple rate-limiter servers
- We can have two clients with two rate limiters, both using a shared Redis store
- Synchronize data with an eventual consistency model
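A minimal fixed-window sketch built on those two commands, using the third-party redis-py client; the limit, window, and key format are assumptions. Because INCR is atomic on the shared store, the read-then-write race above doesn’t apply here.

```python
import redis

r = redis.Redis()        # shared store, so multiple rate-limiter servers see the same counters
LIMIT = 100              # assumed: 100 requests per window per client
WINDOW_SECONDS = 60

def is_allowed(client_ip: str) -> bool:
    key = f"rate:{client_ip}"
    count = r.incr(key)                    # atomic increment of the counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)      # start the window on the first request
    return count <= LIMIT

# The rate limiter would return HTTP 429 whenever is_allowed(...) is False.
```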
- Multi-data centers are crucial because latency will be high for users far geographically from the data center
- Consistent hashing: a technique to spread requests evenly across servers, even as servers are added and removed
Basic technique: hash(key) % n, where n is the number of servers. This fails, however, when servers are added or removed, because most keys get remapped.
- Instead, let’s picture a hash ring, where the hash space from 0 to 2^160 - 1 is connected into a circle
- Now, we put our servers evenly spaced out on the ring
- To determine which server a key goes to, we start at the key’s hash position and go clockwise until a server is found
- Keys are placed onto the ring between servers, e.g. between s0 (server 0) and s1, then s1 and s2, and so on
- It’s impossible to keep the servers evenly spaced as servers are added and removed
- It’s possible to have non-uniform key distribution - lots of data mapped to the same server
Virtual nodes - each server has multiple virtual nodes on the ring. Because there’s a higher count per server, the spacing becomes more even (see the ring sketch below)
Partitions of the ring - a partition is the hash space between adjacent servers
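A minimal sketch of such a ring with virtual nodes; the virtual-node count and server names are assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = []                              # sorted (position, server) pairs
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")   # each virtual node gets its own spot
                bisect.insort(self.ring, (pos, server))

    @staticmethod
    def _hash(key: str) -> int:
        # SHA-1 gives the 0 .. 2^160 - 1 hash space mentioned above
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def get_server(self, key: str) -> str:
        # Start at the key's position and walk clockwise to the first server found
        idx = bisect.bisect_right(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["s0", "s1", "s2", "s3"])
print(ring.get_server("user:42"))   # adding/removing a server only remaps nearby keys
```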
Single-Server approach: Store key-value pairs in a hash table that keeps everything in RAM
Optimize by: storing the most-used data in RAM and the rest on disk
- Data compression
CAP Theorem - a distributed data store can provide only two of Consistency, Availability, and Partition tolerance; since network partitions can’t be avoided, the real trade-off is between consistency and availability
We thus need a consistent hashing algorithm to spread the traffic
- We should spread our replicas over various data centers in different geographic regions
We need to keep data in sync over various replicas
- N - number of replicas
- W - write quorum: the number of replicas that must confirm a write before it succeeds. W = 1 means the coordinator only waits for confirmation from 1 node that the data was saved
- R - read quorum: the number of replica responses a read must wait for
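A small worked check of the usual quorum rule: if W + R > N, every read quorum overlaps every write quorum, so reads see the latest write; the values below are just examples.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Read and write quorums overlap when W + R > N."""
    return w + r > n

print(is_strongly_consistent(3, 2, 2))   # True: some replica in the read set has the latest write
print(is_strongly_consistent(3, 1, 1))   # False: fast reads and writes, but they may be stale
```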
- Strong consistency - any read is the most recent write
- Weak consistency
- Eventual consistency - give it time
If we have two writes on different servers that modify the same data:
- we use a vector clock to determine which came first - this stores server id and version
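A minimal sketch of that bookkeeping, with a vector clock modeled as a dict of server id to version; two clocks where neither dominates the other signal a conflict that needs resolution.

```python
def increment(clock: dict, server_id: str) -> dict:
    """Record that server_id handled a write by bumping its version."""
    new = dict(clock)
    new[server_id] = new.get(server_id, 0) + 1
    return new

def happened_before(a: dict, b: dict) -> bool:
    """True if a is an ancestor of b: every version in a is <= the one in b."""
    return a != b and all(v <= b.get(k, 0) for k, v in a.items())

def in_conflict(a: dict, b: dict) -> bool:
    # Neither clock dominates the other, so the writes were concurrent
    return a != b and not happened_before(a, b) and not happened_before(b, a)

v1 = increment({}, "s0")        # {"s0": 1}
v2 = increment(v1, "s1")        # {"s0": 1, "s1": 1}   descends from v1
v3 = increment(v1, "s2")        # {"s0": 1, "s2": 1}   concurrent with v2
print(happened_before(v1, v2))  # True
print(in_conflict(v2, v3))      # True: same data modified on different servers
```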
- If a server goes down, the system needs to detect the failure and handle it
Detecting failures
- If at least two servers report that a server is down, we trust the report (a single report could be a false alarm)
Gossip Protocol
- Each node maintains a node membership list - contains member IDs (other nodes) and heartbeat counts
- Each node periodically increments its own heartbeat counter and gossips it. If a node’s heartbeat counter hasn’t increased for a while, that node is considered down
- Once a node notices that another node (say s2) is down, it sends heartbeats containing s2’s info to random nodes; once they confirm s2’s heartbeat hasn’t advanced, s2 is marked down
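A minimal sketch of the membership list and heartbeat bookkeeping described above; the fan-out and timeout values are assumptions, and real gossip protocols add more machinery.

```python
import random
import time

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = []                                  # other Node objects, filled in below
        # membership list: node id -> (heartbeat count, last time the count advanced)
        self.members = {node_id: (0, time.time())}

    def tick(self):
        """Bump our own heartbeat and gossip the membership list to a couple of random peers."""
        count, _ = self.members[self.node_id]
        self.members[self.node_id] = (count + 1, time.time())
        for peer in random.sample(self.peers, min(2, len(self.peers))):
            peer.receive(dict(self.members))

    def receive(self, remote_members):
        """Merge a gossiped list, keeping the highest heartbeat count seen for each node."""
        for node_id, (count, _) in remote_members.items():
            local = self.members.get(node_id)
            if local is None or count > local[0]:
                self.members[node_id] = (count, time.time())

    def suspected_down(self, timeout=10.0):
        """Nodes whose heartbeat count hasn't advanced within the timeout are considered down."""
        now = time.time()
        return [nid for nid, (_, seen) in self.members.items() if now - seen > timeout]

nodes = [Node(f"s{i}") for i in range(4)]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]
```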
- A technique for high availability
The system chooses the first W healthy servers for writes and the first R healthy servers for reads on the hash ring, ignoring offline servers (a sloppy quorum)
If a server is down, another will temporarily process requests
- When the server comes back, the temporary server hands off the data it held (hinted handoff)
Compare each piece of data on the replicas and update each replica to have the newest version
Reads
- Go through a memory cache first, then a bloom filter (if not present in cache) to determine which SSTable holds the data
Writes
- Go into the WAL (write-ahead log), then the in-memory table, which is flushed to an SSTable (see the sketch below)
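A toy sketch of that write path and the read path above; real LSM-tree stores add bloom filters, compaction, and on-disk SSTables, and the flush threshold here is an assumption.

```python
class TinyLSM:
    """Toy LSM-style store: writes hit a write-ahead log and an in-memory table,
    which is flushed to an immutable, sorted 'SSTable' when it fills up."""

    def __init__(self, memtable_limit=2):
        self.wal = []                # stand-in for an append-only log on disk
        self.memtable = {}
        self.sstables = []           # newest first; each is a frozen, sorted snapshot
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))            # 1. append to the WAL for durability
        self.memtable[key] = value               # 2. update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.insert(0, dict(sorted(self.memtable.items())))  # 3. flush to an SSTable
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                 # check memory first
            return self.memtable[key]
        for table in self.sstables:              # then SSTables, newest to oldest
            if key in table:                     # a bloom filter would let us skip tables cheaply
                return table[key]
        return None
```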
- A coordinator node acts as a proxy between the client and the storage servers, using consistent hashing to pick which servers hold the data
- Maintain heartbeats between nodes to keep servers up to date
| Goal | Technique |
|---|---|
| Big data | Consistent hashing |
| High-availability reads | Data replication |
| High-availability writes | Vector clocks |
| Dataset partitioning | Consistent hashing |
| Tunable consistency | Quorum consensus |
- In the API gateway, do IP-based rate limiting