This project demonstrates a partitioned signup flow based on an example from the book "Designing Data-Intensive Applications" by Martin Kleppmann @martinkl (Thank you!). Users sign up at the Account service, which requires a username. So many people want to register that a single PostgreSQL database can't hold all account records, but three servers are enough for this hypothetical load. Therefore we split (partition) user accounts across three databases and make sure a username is unique across all of them.
The idea is to write signup requests into the account.signup_request Kafka topic, which is partitioned by username. Hence all attempts to claim the username Bob are stored in the same Kafka partition, chosen by hashing the username. For example, the message {username: Bob, request_id: 13rUw7cUfrGO9Go9xbZearzuuAu} is written to partition hash('Bob') % partitions_count.
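The partition choice can be sketched in a few lines of Go. This is only an illustration: the function name partitionFor is hypothetical, and FNV-1a is an assumed hash function, since the Kafka client used in this project may hash keys differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a username to a partition index in [0, partitionsCount).
// FNV-1a is an assumption made for this sketch; a real Kafka client may use
// a different hash function for message keys.
func partitionFor(username string, partitionsCount uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(username)) // Write on a hash.Hash never returns an error
	return h.Sum32() % partitionsCount
}

func main() {
	// The same username always lands in the same partition.
	for _, name := range []string{"Bob", "Bob", "alice"} {
		fmt.Printf("%s -> partition %d\n", name, partitionFor(name, 3))
	}
}
```

Note that the hash is case-sensitive, so Bob and bob are different keys unless usernames are normalized first.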
Since we have three PostgreSQL instances, we need to split the account.signup_request topic into three partitions (0, 1, 2).
Each signup-server process sequentially reads Kafka messages from its own partition and stores user accounts in its own PostgreSQL database. If the username Bob already exists in Postgres, the program emits a failure message to the account.signup_response topic. Otherwise, Bob's account is created and a success message is written to that topic. For example:
{username: Bob, success: false, request_id: 13rVCgpmD0UgKH6zNHdfcPG63Df}
{username: Bob, success: true, request_id: 13rUw7cUfrGO9Go9xbZearzuuAu}
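The per-partition logic can be sketched as follows. This is a minimal illustration, not the project's actual code: the type names are hypothetical, and an in-memory map stands in for one PostgreSQL instance. It relies on the property described above, namely that a single process reads its partition sequentially, so the check-then-insert step does not race with other writers.

```go
package main

import "fmt"

// SignupRequest and SignupResponse mirror the example messages above.
// The Go type names are assumptions made for this sketch.
type SignupRequest struct {
	Username  string
	RequestID string
}

type SignupResponse struct {
	Username  string
	Success   bool
	RequestID string
}

// accountStore is a hypothetical in-memory stand-in for one PostgreSQL instance.
type accountStore map[string]struct{}

// handle processes one request: the username is claimed if it is still free,
// and the response reports whether the claim succeeded.
func (s accountStore) handle(req SignupRequest) SignupResponse {
	_, taken := s[req.Username]
	if !taken {
		s[req.Username] = struct{}{}
	}
	return SignupResponse{Username: req.Username, Success: !taken, RequestID: req.RequestID}
}

func main() {
	store := accountStore{}
	requests := []SignupRequest{
		{Username: "Bob", RequestID: "13rUw7cUfrGO9Go9xbZearzuuAu"},
		{Username: "Bob", RequestID: "13rVCgpmD0UgKH6zNHdfcPG63Df"},
	}
	// Requests are handled strictly in partition order.
	for _, req := range requests {
		fmt.Printf("%+v\n", store.handle(req))
	}
}
```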
Note that request_id is generated by the client that sends the signup request. Request IDs are needed to deduplicate messages. IDs are kept for a certain duration (until a message ages out) or until a storage size limit is reached. I have not tried deduplication in this project, although I was curious which storage would be the way to go. For instance, Segment shared how they leverage RocksDB in Delivering Billions of Messages Exactly Once, while CockroachDB uses RocksDB as a Storage Layer. Now I know what to try next!
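Since deduplication is not implemented in this project, the following is only a sketch of the time-window idea, with an in-memory map instead of a durable store such as RocksDB. The deduper type and its methods are hypothetical names, and timestamps are plain Unix seconds to keep the example short.

```go
package main

import (
	"fmt"
	"time"
)

// deduper remembers request IDs for ttlSeconds after they are first seen.
// It is a hypothetical in-memory stand-in for a durable key-value store.
type deduper struct {
	ttlSeconds int64
	seen       map[string]int64 // request ID -> Unix time when first seen
}

func newDeduper(ttlSeconds int64) *deduper {
	return &deduper{ttlSeconds: ttlSeconds, seen: make(map[string]int64)}
}

// firstSeen reports whether requestID has not been seen within the window,
// and records it in that case.
func (d *deduper) firstSeen(requestID string, nowUnix int64) bool {
	// Evict aged-out IDs. A linear scan keeps the sketch short.
	for id, t := range d.seen {
		if nowUnix-t >= d.ttlSeconds {
			delete(d.seen, id)
		}
	}
	if _, dup := d.seen[requestID]; dup {
		return false
	}
	d.seen[requestID] = nowUnix
	return true
}

func main() {
	d := newDeduper(60)
	now := time.Now().Unix()
	id := "13rUw7cUfrGO9Go9xbZearzuuAu"
	fmt.Println(d.firstSeen(id, now))     // first delivery
	fmt.Println(d.firstSeen(id, now+1))   // duplicate within the window
	fmt.Println(d.firstSeen(id, now+120)) // the ID has aged out by now
}
```

A size-bounded variant would evict the oldest IDs once the map grows past a limit instead of relying on time alone.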
A word about artificial keys in PostgreSQL. UUID v4 is a common choice for generating a random unique ID for an entity, e.g., an invoice ID. Indexing highly randomized values causes write amplification, so INSERTs become slow. In SQL Keys in Depth the author shows the superior performance of the UUID v1 algorithm, which combines a timestamp with the node MAC address to produce monotonically increasing values. In this demo I used K-Sortable Unique IDentifiers (timestamp + randomly generated payload) to assign user IDs in PostgreSQL. Segment goes into KSUID details in A Brief History of the UUID.
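To illustrate why a timestamp prefix makes IDs roughly sortable by creation time, here is a simplified sketch of the "timestamp + random payload" layout. It is not the KSUID library used in this demo: the function name newSortableID is hypothetical, and the real KSUID format differs in details such as its epoch and text encoding (hex is used here for brevity).

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"time"
)

// newSortableID builds a 20-byte ID from a 4-byte big-endian timestamp
// followed by 16 random bytes, and returns it hex-encoded.
// The big-endian prefix makes IDs created in later seconds compare greater.
func newSortableID(unixSeconds uint32) (string, error) {
	var id [20]byte
	binary.BigEndian.PutUint32(id[:4], unixSeconds)
	if _, err := rand.Read(id[4:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(id[:]), nil
}

func main() {
	now := uint32(time.Now().Unix())
	for _, ts := range []uint32{now, now + 1} {
		id, err := newSortableID(ts)
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```

IDs generated within the same second are ordered only by their random payload, which is why such IDs are called k-sortable rather than strictly sortable.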
Let's run three PostgreSQL Docker containers on ports 5433, 5434, and 5435, each with an account database created. We also need Kafka with the account.signup_request and account.signup_response topics, each with 3 partitions and 1 replica. Docker Compose will take care of that. The only caveat is that you should set KAFKA_ADVERTISED_HOST_NAME (the command below takes the address from the en0 interface on macOS).
$ cd ./docker/
$ KAFKA_ADVERTISED_HOST_NAME=$(ipconfig getifaddr en0) docker-compose up
Install dependencies using the dep package manager and build all commands.
$ dep ensure
$ make build
Create the PostgreSQL schema in every database with the schema command.
$ ./schema -pgport=5433 && ./schema -pgport=5434 && ./schema -pgport=5435
Run three signup-server processes, one per account.signup_request partition, to process signup requests.
$ ./signup-server -partition=0 -pgport=5433
$ ./signup-server -partition=1 -pgport=5434
$ ./signup-server -partition=2 -pgport=5435
Finally, run signup-ctl and type usernames to send signup requests. Note that both programs have a debug mode that shows more logs.
$ ./signup-ctl
bob
2:0 13rUw7cUfrGO9Go9xbZearzuuAu bob ✅
alice
2:1 13rUwm0PI5tMT3FEx4OwW905yWw alice ✅
john
2:2 13rUyyeODTy1GDdvRhtgLjC5sbG john ✅
lloyd
0:0 13rV46Yp6Ng0uEPuUmsF51S5pi2 lloyd ✅
aaron
0:1 13rV4lXEoSQrcSpaFYamWMBaDWt aaron ✅
peter
1:0 13rVCAFeRJxK671227gtFSq069F peter ✅
bob
2:3 13rVCgpmD0UgKH6zNHdfcPG63Df bob ❌
lloyd
0:2 13rVEyTTAh7P76aKQ3ZEomDEzqX lloyd ❌
sam
2:4 13rVFmwyaw2u5UXXNKIKMplycqb sam ✅
Signup responses are printed in the format partition_id:offset request_id username, followed by a success (✅) or failure (❌) mark.
As you can see, bob registered successfully, and the second attempt to sign up as bob failed.
2:0 13rUw7cUfrGO9Go9xbZearzuuAu bob ✅
...
2:3 13rVCgpmD0UgKH6zNHdfcPG63Df bob ❌
To run the tests you will need Postgres running and the test environment variables set up.
$ make docker_run_postgres
$ make test