tinyurl

A tinyurl clone service. A classic system design interview question:

How would you design TinyURL?

Visit demo to see the project in action.

Quick Start

Clone the project and run it locally.

# Setup project
./bin/setup.sh

# Run locally
./bin/local.sh

Features

Environments: local tests, local deploy, staging deploy, production deploy
Unit and integration tests
Bootstrap front-end
Deployable "serverless" app

TODO

Flask app configuration management
Python linter
More doc strings
Continuous Integration (Circle CI or Travis CI)
Internationalization (i18n)
Lambda cold starts
Choose at least 2 subnets for Lambda to run your functions in high availability mode
Automatic API docs
Disable push to master and require all changes via pull request
Analytics dashboard
Viral alerts
Tracking tags: email, blog, etc.
More tests
- Flask app error handlers
- Tests for Redis return types
- Staging integration tests
Bugs
- Duplicate logs in CloudWatch Logs, Lambda appears to modify root / flask logger

References

Design decision

I use Lambda (FaaS) to host the application for a few good reasons. This is a relatively small project that is seldom used, so Lambda is very cost effective. Hosting the application on EC2, whether through an ASG (Auto Scaling Group) or ECS (Elastic Container Service), requires keeping at least one instance ready at all times regardless of traffic, and running an EC2 instance 24 hours a day costs money. Lambda does not require any permanently provisioned machines (though it does have cold starts) and is very scalable too.

Database

URLs can go viral, which means traffic will not be uniformly distributed across unique URLs. Assume an 80-20 rule: 80% of the traffic is generated by 20% of the URLs. This application is read heavy (redirect from TinyURL) and will have significantly fewer writes (create TinyURL).

DynamoDB vs ElastiCache: Redis

DynamoDB can be highly available for the right price, and highly scalable with the right design. However, it is not suitable for TinyURL: the partition key, which should uniquely identify the URL, concentrates the reads for a viral URL on a single partition, which will inevitably reach its provisioned throughput. When this capacity is reached, DynamoDB can no longer serve the application's requests without manual intervention. Since a viral event might happen at any point in the day, it is not acceptable to react to read traffic after the fact.

Redis supports hset and hget, which do not suffer from hot partitions under heavy reads and consistently perform at O(1) time complexity. Redis can be persisted, clustered, and scaled as storage needs increase. With the right monitoring solution, capacity planning lets Redis be scaled proactively as more storage is required.

tl;dr:

DynamoDB is highly available for the right price, but does not scale well under heavy read loads on a hot partition.
Redis does well under heavy reads, and can be scaled proactively.

Implementation

In the following examples, please note that a table is used as an abstraction to help illustrate how data is persisted. In reality there are two dictionary-like associations: short-to-long and long-to-short. The following technique is then used to determine the short ID.

A unique sequential number is tracked simply by querying the size of a hash (see hset), which has time complexity O(1). In this case, this is the cardinality of the short ID to long URL mapping. This also helps de-duplicate records if a URL is submitted more than once.
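The idea above can be sketched with plain Python dictionaries standing in for the two Redis hashes. This is a minimal illustration, not the project's code: the class name `UrlStore`, the method `shorten`, and the choice to start numbering at 0 are assumptions made for the example.

```python
class UrlStore:
    """In-memory stand-in for the two dictionary-like associations.

    Hypothetical helper for illustration; the real service keeps these
    mappings in Redis hashes rather than Python dicts.
    """

    def __init__(self) -> None:
        self.short_to_long: dict[int, str] = {}
        self.long_to_short: dict[str, int] = {}

    def shorten(self, long_url: str) -> int:
        """Return the sequential number for long_url, creating it if needed."""
        # De-duplicate: a URL that was already submitted keeps its number.
        existing = self.long_to_short.get(long_url)
        if existing is not None:
            return existing

        # The next number is the current cardinality of the short-to-long
        # mapping. len() on a dict is O(1).
        next_id = len(self.short_to_long)
        self.short_to_long[next_id] = long_url
        self.long_to_short[long_url] = next_id
        return next_id


store = UrlStore()
first = store.shorten("https://example.com/a")
second = store.shorten("https://example.com/b")
repeat = store.shorten("https://example.com/a")
print(first, second, repeat)  # 0 1 0
```

The sequential number would then be converted to a base-62 string to form the short ID. Note that this sketch ignores concurrent writers entirely.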

When a new association is to be written, there is a potential race condition among Lambda functions. To resolve this, a watch is set up for conditional execution of a transaction. The write is re-attempted if the cardinality of the set changes before the transaction completes, meaning another Lambda function successfully created an association with the same short ID. In this situation, the URLs submitted by the competing Lambda functions can be either identical or distinct, but they would share the same short ID.
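The retry loop can be illustrated without a Redis server. The sketch below is an in-memory simulation of the control flow only: `Store`, `ConflictError`, `create`, and the `before_commit` hook are hypothetical names, and the size check in `commit` stands in for what a Redis watch plus transaction would enforce. It is not thread-safe and makes no Redis calls.

```python
class ConflictError(Exception):
    """Raised when the mapping changed between reading its size and writing."""


class Store:
    """Hypothetical in-memory stand-in for the two Redis hashes."""

    def __init__(self) -> None:
        self.short_to_long: dict[int, str] = {}
        self.long_to_short: dict[str, int] = {}

    def commit(self, expected_size: int, long_url: str) -> None:
        # Conditional write: succeed only if no one else has written since
        # the caller observed the size.
        if len(self.short_to_long) != expected_size:
            raise ConflictError
        self.short_to_long[expected_size] = long_url
        self.long_to_short[long_url] = expected_size


def create(store: Store, long_url: str, before_commit=None, max_attempts: int = 5) -> int:
    """Create an association, retrying if another writer got there first."""
    for _ in range(max_attempts):
        existing = store.long_to_short.get(long_url)
        if existing is not None:
            return existing  # someone already stored this URL
        observed = len(store.short_to_long)  # the "watched" cardinality
        if before_commit is not None:
            before_commit()  # simulation hook: lets a rival writer sneak in
        try:
            store.commit(observed, long_url)
            return observed
        except ConflictError:
            continue  # cardinality changed, so re-read and try again
    raise RuntimeError("gave up after repeated conflicts")


store = Store()
rival_ran = []


def rival() -> None:
    # Simulate another Lambda function winning the race exactly once.
    if not rival_ran:
        rival_ran.append(True)
        store.commit(len(store.short_to_long), "https://example.org/other")


result = create(store, "https://example.com", before_commit=rival)
print(result)  # 1
```

Here the first attempt observes a size of 0, loses the race to the rival, and the second attempt succeeds with the next number.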

Example: Long to short

Given a URL, insert it into the table, assuming the ID is automatically assigned by the database.

id	long	short
125	'https://www.youtube.com/watch?v=dQw4w9WgXcQ'	NULL

Get the id (125), an auto-incremented unique identifier. Convert the id into a base-62 string ('cb'), which will be the short ID of the long form URL. Update the table row at that id with the short ID. This can be done in the same transaction: insert then update.

id	long	short
125	'https://www.youtube.com/watch?v=dQw4w9WgXcQ'	'cb'
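The conversion from id to short ID might look like the following sketch. The function name `encode_base62` is hypothetical, and the alphabet order (lowercase, then uppercase, then digits) is an assumption: the example above only implies that 'a', 'b', 'c' map to 0, 1, 2.

```python
import string

# Assumed alphabet order. Only the leading "a", "b", "c" are implied by the
# README example (125 -> 'cb'); the order of the remaining symbols is a guess.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    """Convert a non-negative integer into a base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        # Peel off the least significant base-62 digit each iteration.
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_base62(125))  # cb
```

Since 125 = 2 * 62 + 1, the digits are (2, 1), which map to 'c' and 'b'.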

Example: Short to long

Convert the short base-62 string ('cb') into a base-10 integer, which is used to look up the entry. Select from the table given the id (125), and return the long form URL.

id	long	short
125	'https://www.youtube.com/watch?v=dQw4w9WgXcQ'	'cb'
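The reverse conversion could be sketched as follows. As with any base-62 scheme, the result depends on the alphabet; the order used here (lowercase, then uppercase, then digits) and the function name `decode_base62` are assumptions for illustration.

```python
import string

# Assumed alphabet order; only "a", "b", "c" -> 0, 1, 2 is implied by the
# README example ('cb' -> 125).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
INDEX = {char: value for value, char in enumerate(ALPHABET)}


def decode_base62(short_id: str) -> int:
    """Convert a base-62 string back into a base-10 integer."""
    if not short_id:
        raise ValueError("short_id must not be empty")
    number = 0
    for char in short_id:
        if char not in INDEX:
            raise ValueError(f"invalid base-62 character: {char!r}")
        # Shift the accumulated value one base-62 place, then add the digit.
        number = number * len(ALPHABET) + INDEX[char]
    return number


print(decode_base62("cb"))  # 125
```

Rejecting characters outside the alphabet up front lets a caller respond with a "not found" style error before touching the datastore.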

API

Set Endpoint

# Local development: `./bin/local.sh`
export TINYURL_ENDPOINT=http://localhost:5000

# OR, in the cloud: `./bin/provision.sh --stage staging`
export TINYURL_ENDPOINT=https://tinyurl-staging.7okyo.com

Make TinyURL

curl \
    --write-out '%{http_code}\n' \
    --request POST "${TINYURL_ENDPOINT}/api" \
    --header 'Content-Type: application/json' \
    --data '{"url": "http://example.com"}'

Search TinyURL

With Long URL

curl \
    --write-out '%{http_code}\n' \
    --request GET "${TINYURL_ENDPOINT}/api?url=http://example.com"

With Short ID

curl \
    --write-out '%{http_code}\n' \
    --request GET "${TINYURL_ENDPOINT}/api?id=a"

Redirect from TinyURL

curl \
    --write-out '%{http_code}\n' \
    --request GET "${TINYURL_ENDPOINT}/a"

Commands

Command	Wrapper for	Description
`./bin/setup.sh`	N/A	Set up project -- run this before all others
`./bin/test.sh`	pytest	Run tests
`./bin/local.sh`	serverless wsgi	Run locally
`./bin/provision.sh`	serverless deploy	Provision cloud
`./bin/deprovision.sh`	serverless remove	De-provision cloud
`./bin/logs.sh`	serverless logs	Get logs from cloud

Arguments

The pytest and serverless arguments can be passed through to the underlying CLI tools. For example, to deploy to production, run ./bin/provision.sh --stage production, since ./bin/provision.sh is a wrapper for serverless deploy.

Troubleshooting

AWS DNS is unable to resolve the S3 path for the deploy. To continue developing, try switching the provider region.

Serverless: Recoverable error occurred (Inaccessible host: *.s3.amazonaws.com'. This service may not be available in the us-east-1' region.), sleeping for 5 seconds. Try 4 of 4

Lambda log collection is not supported in ca-central-1.

ServerlessError: No existing streams for the function
