Run large-scale tasks on AWS Spot Instances.
Have you ever needed to run tasks at a large scale, but felt constrained by the CPU, memory, and execution-time limits of serverless Lambda functions? If so, you're in the right place.
aws-shotgun offers a solution that's particularly effective for:
- Handling a high volume of HTTP requests
- Large-scale web scraping
- CPU-intensive tasks on a large scale
The best part? It leverages AWS Spot Instances, allowing you to maximize cost savings.
To get started with aws-shotgun, you need to understand two basic concepts: the Producer and the Consumers.
- Producer: A Lambda function responsible for sending inputs to an SQS queue.
- Consumers: AWS Spot Instances that retrieve tasks from the queue, execute them, and store the results in S3 buckets.
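To make the flow concrete, here is a hypothetical sketch of what a producer might look like. This is not the repository's actual code: the `QUEUE_URL` environment variable and the message shape are assumptions for illustration.

```js
// Hypothetical producer sketch -- the real one ships with aws-shotgun.
// Reads inputs and enqueues them as SQS messages in batches of 10
// (the SQS batch maximum). QUEUE_URL is assumed to be injected by Terraform.
const { SQSClient, SendMessageBatchCommand } = require("@aws-sdk/client-sqs");
const urls = require("./urls.json");

const sqs = new SQSClient({});

exports.handler = async () => {
  for (let i = 0; i < urls.length; i += 10) {
    const entries = urls.slice(i, i + 10).map((url, j) => ({
      Id: String(i + j),
      MessageBody: JSON.stringify({ url }),
    }));
    await sqs.send(
      new SendMessageBatchCommand({
        QueueUrl: process.env.QUEUE_URL,
        Entries: entries,
      })
    );
  }
};
```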
With aws-shotgun, you don't need to worry about setting up or cleaning up the infrastructure. Your main focus remains on defining the business logic that you want to execute at scale. So, let's dive in and start using aws-shotgun!
Make sure you have the following tools installed:

| Name | Version |
| --- | --- |
| AWS CLI v2 | >= 2.12.1 |
| Terraform | >= 1.5.5 |
| Node.js | ~18.16.0 |
| jq | >= 1.6 |
It is preferable to use a fresh AWS account so you don't mix this infrastructure with your other running AWS environments. Make sure to set the following environment variables before running any Terraform commands:
```sh
export AWS_ACCESS_KEY_ID="<your access key>"
export AWS_SECRET_ACCESS_KEY="<your secret key>"
export AWS_REGION="<your region>"
```

Alternatively, run `aws configure` to set your credentials.
Your AWS account must have permissions to create/update/destroy resources for the following AWS services:
- Amazon EC2
- Amazon S3
- Amazon SQS
- Amazon VPC
- AWS Identity and Access Management (IAM)
- AWS Lambda
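If you are not using a fresh account with administrator access, a broad policy covering these services might look like the sketch below. This is an assumption for convenience, not a least-privilege recommendation; note that VPC actions fall under the `ec2:` namespace.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:*", "s3:*", "sqs:*", "iam:*", "lambda:*"],
      "Resource": "*"
    }
  ]
}
```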
When the Amazon EC2 Spot Instances are launched, they automatically start polling SQS for messages. You can define custom processing for the response received from the target endpoint by updating `src/consumer/index.js`. This handler is invoked for each message that is processed, and should return the JSON object that will be written to S3. By default, the response body is written to S3 as-is.
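For example, a handler that keeps only selected fields might look like the following sketch. The handler signature and response shape shown here are assumptions; check the shipped `src/consumer/index.js` for the real contract.

```js
// src/consumer/index.js -- hypothetical sketch, not the shipped handler.
// Assumes the handler receives the raw HTTP response and that the object
// it returns is what gets written to S3.
module.exports = async (response) => {
  const body = JSON.parse(response.body); // assumes a JSON response body

  // Return only the fields worth persisting.
  return {
    fetchedAt: new Date().toISOString(),
    status: response.statusCode,
    data: body,
  };
};
```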
The `settings.json` file defines the following configuration values:
| Name | Description | Default Value |
| --- | --- | --- |
| `aws_region` | AWS region to deploy to | `us-east-1` |
| `aws_spot_instance_bid_usd` | Spot Instance bid price (USD) | `0.015` |
| `aws_spot_instance_type` | Spot Instance type | `t2.micro` |
| `aws_spot_instance_count` | Number of Spot Instances | `2` |
| `aws_sqs_batch_size` | Batch size for receiving SQS messages (max: 10) | `10` |
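Assuming a flat JSON object keyed by the names above (the layout is an assumption; check the file in the repository), a `settings.json` using the defaults would look like:

```json
{
  "aws_region": "us-east-1",
  "aws_spot_instance_bid_usd": 0.015,
  "aws_spot_instance_type": "t2.micro",
  "aws_spot_instance_count": 2,
  "aws_sqs_batch_size": 10
}
```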
The `src/producer/urls.json` file is where you define the inputs to your consumers. The example `src/producer/urls.json` is configured to send HTTP requests to several mock API endpoints.
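The exact schema is set by the example file in the repository; as a purely illustrative sketch, assuming a simple array of URL strings:

```json
[
  "https://httpbin.org/get",
  "https://httpbin.org/uuid",
  "https://httpbin.org/headers"
]
```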
Run the following command to deploy the infrastructure:

```sh
script/start
```
Once complete, the Lambda function will populate the SQS queue with messages for each of the endpoints to test. The Spot Instances will then poll the SQS queue for messages and send requests to the endpoints. The responses will be written to S3.
You can check the current status of the process by running the following command:

```sh
script/status
```
This will output the approximate number of messages in the SQS queue and objects in the S3 bucket:

```
Message count: 56
Object count: 44
```
When the message count is `0` and the object count is stable, you can destroy the infrastructure by running the following command:

```sh
script/stop
```
This will copy the output from the S3 bucket to the `output` directory and destroy the infrastructure.
Note: The S3 bucket is not deleted by default. You must delete the bucket manually after destroying the infrastructure.
- Show better status output
- Surface errors from the EC2 instances
- Local test run on 5 random samples
- Price estimation
- Use a Spot Fleet instead of individual Spot Instance requests