A tinyurl clone service. A classic system design interview question:
How would you design TinyURL?
Visit demo to see the project in action.
Clone the project and run it locally.
# Setup project
./bin/setup.sh
# Run locally
./bin/local.sh
- Environments: local tests, local deploy, staging deploy, production deploy
- Unit and integration tests
- Bootstrap front-end
- Deployable "serverless" app
- Flask app configuration management
- Python linter
- More doc strings
- Continuous Integration (Circle CI or Travis CI)
- Internationalization (i18n)
- Lambda cold starts
- Choose at least 2 subnets for Lambda to run your functions in high availability mode
- Automatic API docs
- Disable push to master and require all changes via pull request
- Analytics dashboard
- Viral alerts
- Tracking tags: email, blog, etc.
- More tests
- Flask app error handlers
- Tests for Redis return types
- Staging integration tests
- Bugs
- Duplicate logs in CloudWatch Logs, Lambda appears to modify root / flask logger
- https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener
- https://serverless.com/blog/flask-python-rest-api-serverless-lambda-dynamodb/
- https://pypi.org/project/redis/
- https://stackoverflow.com/questions/1119722/base-62-conversion
- https://stackoverflow.com/questions/22340676/find-or-create-idiom-in-rest-api-design
- https://flask.palletsprojects.com/en/1.1.x/patterns/apierrors/
- http://werkzeug.palletsprojects.com/en/0.16.x/exceptions/
- https://flask.palletsprojects.com/en/1.1.x/appcontext/
- https://flask.palletsprojects.com/en/1.1.x/logging/
- https://flask.palletsprojects.com/en/1.1.x/testing/
- https://serverless.com/blog/serverless-api-gateway-domain/
- https://serverless-stack.com/chapters/stages-in-serverless-framework.html
- https://serverless.com/framework/docs/dashboard/testing/
- https://serverless-stack.com/chapters/load-secrets-from-env.html
I use AWS Lambda (FaaS) to host the application for a few good reasons. Because this is a relatively small project that is seldom used, Lambda is very cost effective. Hosting the application on EC2, whether through an ASG (Auto Scaling Group) or ECS (Elastic Container Service), requires keeping at least one instance ready at all times regardless of traffic, and running an EC2 instance 24 hours a day costs money. Lambda, by contrast, does not require any permanently provisioned machines (although it does have cold starts) and is very scalable too.
URLs can go viral, which means traffic across unique URLs will not be uniformly distributed. Assume an 80-20 rule: 80% of the traffic is generated by 20% of the URLs. This application is read heavy (redirect from TinyURL) and will have significantly fewer writes (create TinyURL).
DynamoDB can be highly available for the right price, and highly scalable with the right design. However, it is not suitable for TinyURL, since the partition key, which should uniquely identify the URL, will inevitably reach its provisioned throughput when a URL becomes hot. Once this capacity is reached, the application can no longer be serviced by DynamoDB, and requests fail until someone intervenes manually. Because a viral event might happen at any point in the day, reacting to read traffic after the fact is not acceptable.
Redis supports hset and hget, which do not suffer from hot partitions under heavy reads and consistently perform at O(1) time complexity. Redis can be persisted, clustered, and scaled as storage needs increase. With the right monitoring solution, capacity planning allows Redis to be scaled proactively as more storage is required.
tl;dr:
- DynamoDB is highly available for the right price, but does not scale well under heavy read loads on a hot partition.
- Redis does well under heavy reads, and can be scaled proactively.
In the following examples, please note that a table is used as an abstraction to help illustrate how data is persisted. In reality there are two dictionary-like associations: short-to-long and long-to-short. The following technique is then used to determine the short ID.
A unique sequential number is tracked simply by querying the size of a hash (see hset) with time complexity O(1). In this case, this will be the cardinality of the short ID to long URL mapping. The long-to-short mapping also helps de-duplicate records if a URL is submitted more than once.
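The find-or-create step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `InMemoryHashes` is a hypothetical stand-in that only mirrors the `hget`, `hset`, and `hlen` method names of a Redis client, the key names `short_to_long` and `long_to_short` are assumptions, and the sketch stores the plain sequential number rather than its base-62 form.

```python
class InMemoryHashes:
    """Hypothetical stand-in for a Redis client (hget/hset/hlen only)."""

    def __init__(self):
        self._hashes = {}

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = value

    def hlen(self, name):
        return len(self._hashes.get(name, {}))


# Assumed key names; the real project may use different ones.
SHORT_TO_LONG = "short_to_long"
LONG_TO_SHORT = "long_to_short"


def find_or_create_id(store, long_url):
    """Return the sequential number for long_url, creating it if needed."""
    existing = store.hget(LONG_TO_SHORT, long_url)
    if existing is not None:
        # Duplicate submission: reuse the number assigned earlier.
        return int(existing)
    # The current cardinality of the short-to-long hash is the next number.
    next_id = store.hlen(SHORT_TO_LONG)
    store.hset(SHORT_TO_LONG, str(next_id), long_url)
    store.hset(LONG_TO_SHORT, long_url, str(next_id))
    return next_id
```

Note that this sketch ignores concurrent writers, so two callers could observe the same cardinality.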
When a new association is to be written, there is a potential race condition among Lambda functions. To resolve this, a watch is set up for conditional execution of a transaction. The write is re-attempted if the cardinality of the hash changes before the transaction completes, which means another Lambda function successfully created an association with the same short ID. In this situation, the URLs submitted by the two Lambda functions may be identical or distinct, but they would otherwise share the same short ID.
Given a URL, insert it into the table, assuming the ID is automatically assigned by the database.
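The check-and-retry idea can be illustrated with an in-memory simulation. Everything here is a hypothetical stand-in: `VersionedStore.commit_if_size` plays the role of a watched Redis transaction that aborts when the observed size has changed, `ConflictError` plays the role of the aborted transaction, and `before_commit` is only a hook for simulating a competing writer. It is a sketch of the optimistic pattern, not the project's implementation.

```python
import threading


class ConflictError(Exception):
    """The watched size changed before the write was committed."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a watched transaction on one hash."""

    def __init__(self):
        self.short_to_long = {}
        self._lock = threading.Lock()

    def size(self):
        return len(self.short_to_long)

    def commit_if_size(self, expected_size, short_id, long_url):
        # Atomically apply the write only if nobody wrote since the size was read.
        with self._lock:
            if len(self.short_to_long) != expected_size:
                raise ConflictError
            self.short_to_long[short_id] = long_url


def create_with_retry(store, long_url, max_attempts=5, before_commit=None):
    """Claim the next sequential ID, retrying if another writer got there first."""
    for _ in range(max_attempts):
        observed = store.size()  # the value being "watched"
        if before_commit is not None:
            before_commit()  # hook used only to simulate a competing writer
        try:
            store.commit_if_size(observed, observed, long_url)
            return observed
        except ConflictError:
            continue  # cardinality changed: re-read it and try again
    raise RuntimeError("gave up after too many conflicting writes")
```

A bounded number of attempts is a design choice made for this sketch; the source does not say whether the real retry loop is bounded.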
id | long | short |
---|---|---|
125 | 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | NULL |
Get an id (125) (an auto-incremented unique identifier). Convert id into a base-62 string ('cb'), which will be the short ID of the long-form URL. Update the table at id, setting the short ID. This can be done in the same transaction: insert, then update.
id | long | short |
---|---|---|
125 | 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | 'cb' |
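As a rough sketch of the conversion step, the function below encodes a non-negative integer as a base-62 string. The alphabet order (lowercase, then uppercase, then digits) is an assumption: the example above only confirms that 'b' maps to 1 and 'c' maps to 2, since 125 = 2 × 62 + 1.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9. Only the positions of 'b' and 'c'
# are confirmed by the 125 -> 'cb' example.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode_base62(number):
    """Convert a non-negative integer into its base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


print(encode_base62(125))  # cb
```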
Convert the short base-62 string ('cb') into a base-10 integer, which is used to look up the entry. Select from the table given id (125), and return the long-form URL.
id | long | short |
---|---|---|
125 | 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | 'cb' |
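The reverse conversion for the lookup can be sketched in the same way. As before, the alphabet order is an assumption that merely agrees with the 'cb' = 125 example.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9 (consistent with 'cb' -> 125).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def decode_base62(short_id):
    """Convert a base-62 string back into the integer it encodes."""
    if not short_id:
        raise ValueError("short_id must not be empty")
    number = 0
    for char in short_id:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * BASE + ALPHABET.index(char)
    return number


print(decode_base62("cb"))  # 125
```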
# Local development: `./bin/local.sh`
export TINYURL_ENDPOINT=http://localhost:5000
# OR, in the cloud: `./bin/provision.sh --stage staging`
export TINYURL_ENDPOINT=https://tinyurl-staging.7okyo.com
curl \
--write-out '%{http_code}\n' \
--request POST "${TINYURL_ENDPOINT}/api" \
--header 'Content-Type: application/json' \
--data '{"url": "http://example.com"}'
curl \
--write-out '%{http_code}\n' \
--request GET "${TINYURL_ENDPOINT}/api?url=http://example.com"
curl \
--write-out '%{http_code}\n' \
--request GET "${TINYURL_ENDPOINT}/api?id=a"
curl \
--write-out '%{http_code}\n' \
--request GET "${TINYURL_ENDPOINT}/a"
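For scripting against the API from Python instead of curl, the first three requests above could be built along these lines. This is only a sketch using the standard library: the helper names are hypothetical, the paths and JSON body are taken from the curl examples, and query values are URL-encoded here even though the curl example passes the URL unencoded.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_create_request(endpoint, long_url):
    """POST /api with a JSON body, mirroring the first curl example."""
    body = json.dumps({"url": long_url}).encode("utf-8")
    return Request(
        f"{endpoint}/api",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_lookup_request(endpoint, **query):
    """GET /api?url=... or GET /api?id=..., mirroring the next two examples."""
    return Request(f"{endpoint}/api?{urlencode(query)}", method="GET")


request = build_create_request("http://localhost:5000", "http://example.com")
print(request.get_method(), request.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would actually send the request; the sketch stops short of that so it can run without a server.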
Command | Wrapper for | Description |
---|---|---|
./bin/setup.sh | N/A | Set up the project; run this before all others |
./bin/test.sh | pytest | Run tests |
./bin/local.sh | serverless wsgi | Run locally |
./bin/provision.sh | serverless deploy | Provision cloud |
./bin/deprovision.sh | serverless remove | De-provision cloud |
./bin/logs.sh | serverless logs | Get logs from cloud |
The pytest and serverless arguments can be passed through to the underlying CLI tools. For example, to deploy to production, run ./bin/provision.sh --stage production, since ./bin/provision.sh is a wrapper for serverless deploy.
AWS DNS is unable to resolve the S3 path for the deploy. To continue developing, try switching the provider region.
Serverless: Recoverable error occurred (Inaccessible host:
*.s3.amazonaws.com'. This service may not be available in the
us-east-1' region.), sleeping for 5 seconds. Try 4 of 4
Lambda log collection is not supported in ca-central-1.
ServerlessError: No existing streams for the function