This module creates resources required to run the GitHub action runner on AWS EC2 spot instances. The life cycle of the runners on AWS is managed by two lambda functions. One function will handle scaling up, the other scaling down.
The action runners are created via a launch template; in the launch template only the subnet needs to be provided. During launch the installation is handled via a user data script. The configuration is fetched from SSM parameter store.
The scale up lambda is triggered by events on an SQS queue. Events on this queue are delayed, which gives the workflow some time to start running on available runners. For each event the lambda checks whether the workflow is still queued and whether no other limits are reached. If so, the lambda creates a new EC2 instance. The lambda only needs to know which launch template to use and which subnets are available; it picks one of the available subnets at random. Once the instance is created the event is considered handled, and we assume the workflow will start at some point once the created instance is ready.
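The decision described above can be sketched as follows. This is a minimal illustration, not the module's actual lambda code: the `ScaleUpInput` shape and the `pickSubnet` and `decideScaleUp` helpers are hypothetical names, and the only limit modelled here is a maximum runner count.

```typescript
// Hypothetical input shape; the real lambda's types and helper names may differ.
interface ScaleUpInput {
  workflowStillQueued: boolean; // outcome of checking the workflow status on GitHub
  currentRunners: number; // runners currently managed by this deployment
  maximumRunners: number; // assumed to correspond to runners_maximum_count
  subnetIds: string[]; // assumed to correspond to subnet_ids
}

// Pick one subnet uniformly at random.
// `random` is injectable so the choice can be made deterministic in tests.
function pickSubnet(subnetIds: string[], random: () => number = Math.random): string {
  if (subnetIds.length === 0) {
    throw new Error('no subnets configured');
  }
  return subnetIds[Math.floor(random() * subnetIds.length)];
}

// Returns the subnet to launch the new instance in,
// or null when no instance should be created for this event.
function decideScaleUp(input: ScaleUpInput, random: () => number = Math.random): string | null {
  if (!input.workflowStillQueued) {
    return null; // the workflow was already picked up by an available runner
  }
  if (input.currentRunners >= input.maximumRunners) {
    return null; // limit reached, do not create another instance
  }
  return pickSubnet(input.subnetIds, random);
}
```

Keeping the decision in a pure function like this makes it easy to test without mocking AWS; the actual instance creation would happen only when a subnet is returned.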
The scale down lambda is triggered via a CloudWatch event. The event is triggered by a cron expression defined in the variable scale_down_schedule_expression (https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html). GitHub does not yet provide a good API for scaling down, so the scale down check runs on this schedule (every 5 minutes by default). Each time the lambda is triggered it tries to remove all runners managed in this deployment that are older than a configurable number of minutes (see minimum_running_time_in_minutes). If a runner can be removed from GitHub, which means it is not executing a workflow, the lambda terminates the EC2 instance.
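The selection step of the scale down run can be sketched as below. This is an illustration under assumptions, not the module's implementation: `RunnerInfo` and `selectRunnersToTerminate` are hypothetical names, and the `busy` flag stands in for "GitHub refuses to remove the runner because it is executing a workflow".

```typescript
// Hypothetical runner shape; field names are illustrative only.
interface RunnerInfo {
  instanceId: string;
  launchTime: Date;
  busy: boolean; // assumption: true while the runner is executing a workflow
}

// Select the instances that are older than the minimum running time and not busy.
function selectRunnersToTerminate(
  runners: RunnerInfo[],
  minimumRunningTimeInMinutes: number,
  now: Date = new Date(),
): string[] {
  // Runners launched before this point in time are old enough to be removed.
  const cutoff = now.getTime() - minimumRunningTimeInMinutes * 60 * 1000;
  return runners
    .filter((runner) => runner.launchTime.getTime() < cutoff)
    .filter((runner) => !runner.busy)
    .map((runner) => runner.instanceId);
}
```

In the flow described above, the busy check is the outcome of trying to remove the runner from GitHub rather than a field known in advance; the sketch collapses that into a boolean to keep the example self-contained.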
Usage examples are available in the root module. By default the root module will assume local zip files containing the lambda distribution are available. See the download lambda module for more information.
The Lambda function is written in TypeScript and requires Node 12.x and yarn. Sources are located in [./lambdas/runners]. The two lambda functions share the same sources; there is one entry point for scaleDown and another one for scaleUp.
cd lambdas/runners
yarn install
Tests are implemented with Jest; calls to AWS and GitHub are mocked.
yarn run test
ncc is used to compile all TypeScript/JavaScript sources into a single file.
yarn run dist
Name | Version |
---|---|
terraform | >= 0.14.1 |
aws | ~> 4.0 |
Name | Version |
---|---|
aws | ~> 4.0 |
Name | Source | Version |
---|---|---|
pool | ./pool | n/a |
Name | Description | Type | Default | Required |
---|---|---|---|---|
ami_filter | Map of lists used to create the AMI filter for the action runner AMI. | map(list(string)) | null | no |
ami_owners | The list of owners used to select the AMI of action runner instances. | list(string) | [ | no |
aws_partition | (optional) Partition for the base ARN if not 'aws'. | string | "aws" | no |
aws_region | AWS region. | string | n/a | yes |
block_device_mappings | The EC2 instance block device configuration. Takes the following keys: device_name, delete_on_termination, volume_type, volume_size, encrypted, iops | list(object({ | [ | no |
cloudwatch_config | (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | string | null | no |
create_service_linked_role_spot | (optional) Create the service linked role for spot instances that is required by the scale-up lambda. | bool | false | no |
disable_runner_autoupdate | Disable the auto update of the GitHub runner agent. Be aware there is a grace period of 30 days, see also the GitHub article | bool | false | no |
egress_rules | List of egress rules for the GitHub runner instances. | list(object({ | [ | no |
enable_cloudwatch_agent | Enables the cloudwatch agent on the EC2 runner instances; the runner contains a default config. Configuration can be overridden via cloudwatch_config. | bool | true | no |
enable_ephemeral_runners | Enable ephemeral runners, runners will only be used once. | bool | false | no |
enable_job_queued_check | Only scale if the job event received by the scale up lambda is in the state queued. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | bool | null | no |
enable_managed_runner_security_group | Enables the default managed security group creation. Unmanaged security groups can be specified via runner_additional_security_group_ids. | bool | true | no |
enable_organization_runners | n/a | bool | n/a | yes |
enable_runner_detailed_monitoring | Enable detailed monitoring for runners | bool | false | no |
enable_ssm_on_runners | Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances. | bool | n/a | yes |
enabled_userdata | Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI | bool | true | no |
environment | A name that identifies the environment, used as prefix and for tagging. | string | null | no |
ghes_ssl_verify | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | bool | true | no |
ghes_url | GitHub Enterprise Server URL. DO NOT SET IF USING PUBLIC GITHUB | string | null | no |
github_app_parameters | Parameter Store for GitHub App Parameters. | object({ | n/a | yes |
idle_config | List of time periods that can be defined as cron expressions to keep a minimum number of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | list(object({ | [] | no |
instance_allocation_strategy | The allocation strategy for spot instances. AWS recommends using capacity-optimized; however, the AWS default is lowest-price. | string | "lowest-price" | no |
instance_max_spot_price | Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | string | null | no |
instance_profile_path | The path that will be added to the instance_profile; if not set, the prefix will be used. | string | null | no |
instance_target_capacity_type | Default lifecycle used for runner instances, can be either spot or on-demand. | string | "spot" | no |
instance_type | [DEPRECATED] See instance_types. | string | "m5.large" | no |
instance_types | List of instance types for the action runner. Defaults are based on runner_os (amzn2 for linux and Windows Server Core for win). | list(string) | null | no |
key_name | Key pair name | string | null | no |
kms_key_arn | Optional CMK Key ARN to be used for Parameter Store. | string | null | no |
lambda_architecture | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86_64' functions. | string | "x86_64" | no |
lambda_runtime | AWS Lambda runtime. | string | "nodejs16.x" | no |
lambda_s3_bucket | S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. | any | null | no |
lambda_security_group_ids | List of security group IDs associated with the Lambda function. | list(string) | [] | no |
lambda_subnet_ids | List of subnets in which the lambda will be launched; the subnets need to be subnets in the vpc_id. | list(string) | [] | no |
lambda_timeout_scale_down | Time out for the scale down lambda in seconds. | number | 60 | no |
lambda_timeout_scale_up | Time out for the scale up lambda in seconds. | number | 60 | no |
lambda_zip | File location of the lambda zip file. | string | null | no |
log_level | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | string | "info" | no |
log_type | Logging format for lambda logging. Valid values are 'json', 'pretty', 'hidden'. | string | "pretty" | no |
logging_kms_key_id | Specifies the kms key id to encrypt the logs with | string | null | no |
logging_retention_in_days | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | number | 180 | no |
market_options | DEPRECATED: Replaced by instance_target_capacity_type. | string | null | no |
metadata_options | Metadata options for the ec2 runner instances. | map(any) | { | no |
minimum_running_time_in_minutes | The minimum time an EC2 action runner should be running before it is terminated if not busy. If not set, the default is calculated based on the OS. | number | null | no |
overrides | This map provides the possibility to override some defaults. The following attributes are supported: name_sg overrides the Name tag for all security groups created by this module. name_runner_agent_instance overrides the Name tag for the ec2 instance defined in the auto launch configuration. name_docker_machine_runners overrides the Name tag for spot instances created by the runner agent. | map(string) | { | no |
pool_config | The configuration for updating the pool. The pool is adjusted to pool_size by the events triggered by the schedule_expression. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. | list(object({ | [] | no |
pool_lambda_reserved_concurrent_executions | Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | number | 1 | no |
pool_lambda_timeout | Time out for the pool lambda in seconds. | number | 60 | no |
pool_runner_owner | The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported. | string | null | no |
prefix | The prefix used for naming resources | string | "github-actions" | no |
role_path | The path that will be added to the role; if not set, the prefix will be used. | string | null | no |
role_permissions_boundary | Permissions boundary that will be added to the created role for the lambda. | string | null | no |
runner_additional_security_group_ids | (optional) List of additional security groups IDs to apply to the runner | list(string) | [] | no |
runner_architecture | The platform architecture of the runner instance_type. | string | "x64" | no |
runner_as_root | Run the action runner under the root user. Variable runner_run_as will be ignored. | bool | false | no |
runner_boot_time_in_minutes | The minimum time for an EC2 runner to boot and register as a runner. | number | 5 | no |
runner_ec2_tags | Map of tags that will be added to the launch template instance tag specifications. | map(string) | {} | no |
runner_extra_labels | Extra labels for the runners (GitHub). Separate each label by a comma | string | "" | no |
runner_group_name | Name of the runner group. | string | "Default" | no |
runner_iam_role_managed_policy_arns | Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role | list(string) | [] | no |
runner_log_files | (optional) List of logfiles to send to CloudWatch, will only be used if enable_cloudwatch_agent is set to true. Object description: log_group_name: Name of the log group, prefix_log_group: If true, the log group name will be prefixed with /github-self-hosted-runners/<var.prefix>, file_path: path to the log file, log_stream_name: name of the log stream. | list(object({ | null | no |
runner_os | The EC2 Operating System type to use for action runner instances (linux,windows). | string | "linux" | no |
runner_run_as | Run the GitHub actions agent as user. | string | "ec2-user" | no |
runners_lambda_s3_key | S3 key for runners lambda function. Required if using S3 bucket to specify lambdas. | any | null | no |
runners_lambda_s3_object_version | S3 object version for runners lambda function. Useful if S3 versioning is enabled on source bucket. | any | null | no |
runners_maximum_count | The maximum number of runners that will be created. | number | 3 | no |
s3_bucket_runner_binaries | n/a | object({ | n/a | yes |
s3_location_runner_binaries | S3 location of runner distribution. | string | n/a | yes |
scale_down_schedule_expression | Scheduler expression that determines how often to check for scale down. | string | "cron(*/5 * * * ? *)" | no |
scale_up_reserved_concurrent_executions | Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | number | 1 | no |
sqs_build_queue | SQS queue to consume accepted build events. | object({ | n/a | yes |
subnet_ids | List of subnets in which the action runners will be launched; the subnets need to be subnets in the vpc_id. | list(string) | n/a | yes |
tags | Map of tags that will be added to created resources. By default resources will be tagged with name. | map(string) | {} | no |
userdata_post_install | User-data script snippet to insert after GitHub action runner install | string | "" | no |
userdata_pre_install | User-data script snippet to insert before GitHub action runner install | string | "" | no |
userdata_template | Alternative user-data template, replacing the default template. By providing your own user_data you have to take care of installing all required software, including the action runner. Variables userdata_pre/post_install are ignored. | string | null | no |
vpc_id | The VPC for the security groups. | string | n/a | yes |
Name | Description |
---|---|
lambda_scale_down | n/a |
lambda_scale_up | n/a |
launch_template | n/a |
role_pool | n/a |
role_runner | n/a |
role_scale_down | n/a |
role_scale_up | n/a |
This module is part of the Philips Forest.
Talk to the forestkeepers in the forest-channel on Slack.