System Design Primer
More advanced: Microservice Design Patterns Course
- DNS takes a domain name and returns an IP address
- NoSQL databases generally don’t offer joins
- Use NoSQL for unstructured data, data flows where you’re only serializing and deserializing data, or storing massive amounts of data
- It’s common to have multiple copies of your db to enable failover
Master-slave replication
- A master db will only support writes. Slave dbs will get copies of the data from the master and only support reads
- This lets you distribute load for reads to serve more reads in parallel (see the routing sketch after this list)
- You usually have more slaves than masters since reads are more common
- If only one slave is available and it goes offline, reads will be temporarily directed to the master until a replacement slave is brought online
- If the master goes offline, a slave will be promoted
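A minimal sketch of how an application might split reads and writes in this setup; the `master`/`slaves` connection objects and their `execute` method are hypothetical placeholders, not a specific driver API.

```python
import random

class ReplicatedDB:
    """Routes writes to the master and reads to a random slave,
    falling back to the master when no slaves are available."""

    def __init__(self, master, slaves):
        self.master = master            # hypothetical write connection
        self.slaves = list(slaves)      # hypothetical read-only replica connections

    def execute_write(self, query, params=()):
        return self.master.execute(query, params)

    def execute_read(self, query, params=()):
        # If every slave is offline, reads are temporarily directed to the master
        target = random.choice(self.slaves) if self.slaves else self.master
        return target.execute(query, params)
```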
- Horizontal scaling of databases
- Each database is known as a shard. Instead of replicating the db, it’s split up over multiple databases
- We can use a basic hash function to decide which database to use for a user, ex. user_id % num_dbs (see the sketch after the issues list below)
Issues
- Resharding - needed when a single shard can no longer hold more data
- Celebrity problem - ex. multiple celebrities’ Twitter accounts end up on the same shard
- It makes it hard to perform joins
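A minimal sketch of the user_id % num_dbs routing and why resharding is painful; the shard names and counts are assumptions.

```python
NUM_SHARDS = 4  # assumption: shards named user_db_0 .. user_db_3

def shard_for(user_id: int) -> str:
    """Pick a shard with the basic hash: user_id % num_dbs."""
    return f"user_db_{user_id % NUM_SHARDS}"

print(shard_for(10))   # user_db_2 with 4 shards
# Resharding problem: changing NUM_SHARDS to 5 maps user 10 to user_db_0,
# so most existing rows would have to move between databases.
```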
- Use vertical scaling if you have low traffic
- Vertical scaling doesn’t allow for failover and has a hard limit
- A load balancer lets you distribute traffic among servers
- The load balancer communicates with the servers via a private IP
CDNs cache static content like CSS, JS files
Ex. a user in Hong Kong requests data from the CDN. It isn’t there, so we pull it from S3, then cache the data in the CDN (see the cache-aside sketch below)
Considerations
- Set an expiration time
- It’s recommended to keep user state data in a shared DB
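A minimal cache-aside sketch of that read-through flow, assuming a plain dict stands in for the CDN edge cache and `fetch_from_origin` stands in for the S3 pull; the TTL value is an assumption.

```python
import time

cdn_cache = {}          # stand-in for the CDN's edge cache
TTL_SECONDS = 3600      # expiration-time consideration from above (assumed value)

def fetch_from_origin(path: str) -> bytes:
    """Placeholder for pulling the object from origin storage such as S3."""
    return b"...static file bytes..."

def get_asset(path: str) -> bytes:
    entry = cdn_cache.get(path)
    if entry and entry["expires_at"] > time.time():
        return entry["body"]                    # hit: serve the cached copy
    body = fetch_from_origin(path)              # miss: pull from origin
    cdn_cache[path] = {"body": body, "expires_at": time.time() + TTL_SECONDS}
    return body
```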
- You can split your whole web server operation between geographical regions
Ex. you have copies of your web servers, dbs, and caches over different geographical regions
Considerations
- You may need copies of your data in each region in case one data center goes out
A message queue supports asynchronous communication: a producer publishes messages, and a consumer subscribes to the queue and consumes them
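A minimal in-process sketch of that producer/consumer pattern using Python's standard queue module in place of a real broker; a production system would use something like RabbitMQ or Kafka.

```python
import queue
import threading

message_queue = queue.Queue()      # stand-in for a real message broker

def producer():
    for i in range(3):
        message_queue.put({"job_id": i})   # publish without waiting for the consumer

def consumer():
    while True:
        msg = message_queue.get()          # blocks until a message arrives
        print("processing", msg)
        message_queue.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
message_queue.join()                       # wait until every published message is processed
```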
| Power of 2 | Approximate value | Name |
|---|---|---|
| 10 | 1 thousand | Kilobyte |
| 20 | 1 million | Megabyte |
| 30 | 1 billion | Gigabyte |
| 40 | 1 trillion | Terabyte |
| 50 | 1 quadrillion | Petabyte |
- L1 cache - inside the CPU core; smallest and fastest
- L2 cache - larger and slower than L1
- L3 cache - larger still, usually shared across cores
Latency numbers - page 36
- Don’t over-engineer
- Always keep the interviewer in the loop of what you’re thinking
- Understand the problem and establish design scope
- Don’t give quick answers. Think through and fully understand the requirements
- What features are we going to build?
- How many users?
- How fast does the company anticipate scaling?
- What’s the tech stack?
- What are the most important features?
- What’s the traffic volume?
- Propose a high-level design and get their feedback
- Design deep dive
Focus on bottlenecks. Some interviewers want you to focus on high-level design
- Wrap Up
- Discuss potential improvements, give a recap
A rate limiter controls the rate of traffic sent by a client or a service
Ex. Number of accounts from the same ip, number of writes per second
Client-side rate limiting is unreliable since requests can be forged by malicious actors
- So we should rate limit on the server side
- The rate limiter should sit between the client and servers and return HTTP 429 (Too Many Requests) errors when the limit is exceeded
API Gateways are managed services that provide rate-limiting
Rate Limiting
We use Redis to store the counters, since it’s fast and provides INCR (increment a counter) and EXPIRE (set a time-to-live) - see the sketch after this list
- If two requests read the counter concurrently before either writes back, both write the same incremented value, so the counter undercounts and extra requests get through
- Or we may need multiple rate-limiter servers
- We can have two clients with two rate limiters, both using a shared Redis store
- Synchronize data with an eventual consistency model
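A minimal fixed-window sketch built on those two commands, using the third-party redis-py client; the limit, window, and key format are assumptions. Because INCR is atomic on the shared store, the read-then-write race above doesn’t apply here.

```python
import redis

r = redis.Redis()        # shared store, so multiple rate-limiter servers see the same counters
LIMIT = 100              # assumed: 100 requests per window per client
WINDOW_SECONDS = 60

def is_allowed(client_ip: str) -> bool:
    key = f"rate:{client_ip}"
    count = r.incr(key)                    # atomic increment of the counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)      # start the window on the first request
    return count <= LIMIT

# The rate limiter would return HTTP 429 whenever is_allowed(...) is False.
```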
- Multi-data centers are crucial because latency will be high for users far geographically from the data center
- Consistent hashing: a technique to spread requests evenly across servers, even as servers are added and removed
Basic technique: hash(key) % n, where n is the number of servers. This fails, however, when servers are added or removed, because most keys get remapped.
- Instead, let’s picture a hash ring, where the hash space from 0 to 2^160 - 1 is connected into a circle
- Now, we put our servers evenly spaced out on the ring
- To determine which server a key goes to, we start at the key’s hash position and go clockwise until a server is found
- Keys are placed onto the ring between servers, e.g. between s0 (server 0) and s1, then s1 and s2, and so on
- It’s impossible to keep the servers evenly spaced as servers are added and removed
- It’s possible to have non-uniform key distribution - lots of data mapped to the same server
Virtual nodes - each server has multiple virtual nodes on the ring. Because there’s a higher count per server, the spacing becomes more even (see the ring sketch below)
Partitions of the ring - a partition is the hash space between adjacent servers
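A minimal sketch of such a ring with virtual nodes; the virtual-node count and server names are assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = []                              # sorted (position, server) pairs
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")   # each virtual node gets its own spot
                bisect.insort(self.ring, (pos, server))

    @staticmethod
    def _hash(key: str) -> int:
        # SHA-1 gives the 0 .. 2^160 - 1 hash space mentioned above
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def get_server(self, key: str) -> str:
        # Start at the key's position and walk clockwise to the first server found
        idx = bisect.bisect_right(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["s0", "s1", "s2", "s3"])
print(ring.get_server("user:42"))   # adding/removing a server only remaps nearby keys
```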
Single-Server approach: Store key-value pairs in a hash table that keeps everything in RAM
Optimize by: storing the most-used data in RAM and the rest on disk
- Data compression
CAP Theorem - a distributed data store can provide only two of Consistency, Availability, and Partition tolerance; since network partitions can’t be avoided, the real trade-off is between consistency and availability
We thus need a consistent hashing algorithm to spread the traffic
- We should spread our replicas over various data centers in different geographic regions
We need to keep data in sync over various replicas
- N - number of replicas
- W - write quorum: the number of replicas that must confirm a write before it succeeds. W = 1 means the coordinator only waits for confirmation from 1 node that the data was saved
- R - read quorum: the number of replica responses a read must wait for
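A small worked check of the usual quorum rule: if W + R > N, every read quorum overlaps every write quorum, so reads see the latest write; the values below are just examples.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Read and write quorums overlap when W + R > N."""
    return w + r > n

print(is_strongly_consistent(3, 2, 2))   # True: some replica in the read set has the latest write
print(is_strongly_consistent(3, 1, 1))   # False: fast reads and writes, but they may be stale
```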
- Strong consistency - any read is the most recent write
- Weak consistency
- Eventual consistency - give it time
If we have two writes on different servers that modify the same data:
- we use a vector clock to determine which came first - this stores server id and version
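A minimal sketch of that bookkeeping, with a vector clock modeled as a dict of server id to version; two clocks where neither dominates the other signal a conflict that needs resolution.

```python
def increment(clock: dict, server_id: str) -> dict:
    """Record that server_id handled a write by bumping its version."""
    new = dict(clock)
    new[server_id] = new.get(server_id, 0) + 1
    return new

def happened_before(a: dict, b: dict) -> bool:
    """True if a is an ancestor of b: every version in a is <= the one in b."""
    return a != b and all(v <= b.get(k, 0) for k, v in a.items())

def in_conflict(a: dict, b: dict) -> bool:
    # Neither clock dominates the other, so the writes were concurrent
    return a != b and not happened_before(a, b) and not happened_before(b, a)

v1 = increment({}, "s0")        # {"s0": 1}
v2 = increment(v1, "s1")        # {"s0": 1, "s1": 1}   descends from v1
v3 = increment(v1, "s2")        # {"s0": 1, "s2": 1}   concurrent with v2
print(happened_before(v1, v2))  # True
print(in_conflict(v2, v3))      # True: same data modified on different servers
```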
- If a server goes down, the system needs to detect the failure and handle it
Detecting failures
- If at least two servers report that a server is down, we trust the report (a single report could be a false alarm)
Gossip Protocol
- Each node maintains a node membership list - contains member IDs (other nodes) and heartbeat counts
- Each node periodically increments its own heartbeat counter and gossips it. If a node’s heartbeat counter hasn’t increased for a while, that node is considered down
- Once a node notices that another node (say s2) is down, it sends heartbeats containing s2’s info to random nodes; once they confirm s2’s heartbeat hasn’t advanced, s2 is marked down
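A minimal sketch of the membership list and heartbeat bookkeeping described above; the fan-out and timeout values are assumptions, and real gossip protocols add more machinery.

```python
import random
import time

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = []                                  # other Node objects, filled in below
        # membership list: node id -> (heartbeat count, last time the count advanced)
        self.members = {node_id: (0, time.time())}

    def tick(self):
        """Bump our own heartbeat and gossip the membership list to a couple of random peers."""
        count, _ = self.members[self.node_id]
        self.members[self.node_id] = (count + 1, time.time())
        for peer in random.sample(self.peers, min(2, len(self.peers))):
            peer.receive(dict(self.members))

    def receive(self, remote_members):
        """Merge a gossiped list, keeping the highest heartbeat count seen for each node."""
        for node_id, (count, _) in remote_members.items():
            local = self.members.get(node_id)
            if local is None or count > local[0]:
                self.members[node_id] = (count, time.time())

    def suspected_down(self, timeout=10.0):
        """Nodes whose heartbeat count hasn't advanced within the timeout are considered down."""
        now = time.time()
        return [nid for nid, (_, seen) in self.members.items() if now - seen > timeout]

nodes = [Node(f"s{i}") for i in range(4)]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]
```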
- A technique for high availability
The system chooses the first W healthy servers for writes and the first R healthy servers for reads on the hash ring, ignoring offline servers (a sloppy quorum)
If a server is down, another will temporarily process requests
- When the server comes back, the temporary server hands off the data it held (hinted handoff)
Compare each piece of data on the replicas and update each replica to have the newest version
Reads
- Go through a memory cache first, then a bloom filter (if not present in cache) to determine which SSTable holds the data
Writes
- Go into the WAL (write-ahead log), then the in-memory table, which is flushed to an SSTable (see the sketch below)
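A toy sketch of that write path and the read path above; real LSM-tree stores add bloom filters, compaction, and on-disk SSTables, and the flush threshold here is an assumption.

```python
class TinyLSM:
    """Toy LSM-style store: writes hit a write-ahead log and an in-memory table,
    which is flushed to an immutable, sorted 'SSTable' when it fills up."""

    def __init__(self, memtable_limit=2):
        self.wal = []                # stand-in for an append-only log on disk
        self.memtable = {}
        self.sstables = []           # newest first; each is a frozen, sorted snapshot
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))            # 1. append to the WAL for durability
        self.memtable[key] = value               # 2. update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.insert(0, dict(sorted(self.memtable.items())))  # 3. flush to an SSTable
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                 # check memory first
            return self.memtable[key]
        for table in self.sstables:              # then SSTables, newest to oldest
            if key in table:                     # a bloom filter would let us skip tables cheaply
                return table[key]
        return None
```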
- A coordinator node acts as a proxy between the client and the storage servers, using consistent hashing to pick which servers hold the data
- Maintain heartbeats between nodes to keep servers up to date
| Goal | Technique |
|---|---|
| Big data | Consistent hashing |
| High-availability reads | Data replication |
| High-availability writes | Vector clocks |
| Dataset partitioning | Consistent hashing |
| Tunable consistency | Quorum consensus |
- In the API gateway, do IP-based rate limiting