diff --git a/.github/lint/tflint.tfvars b/.github/lint/tflint.tfvars index d1f5d686eb..a2b52671e5 100644 --- a/.github/lint/tflint.tfvars +++ b/.github/lint/tflint.tfvars @@ -1,4 +1,4 @@ -aws_region = null +aws_region = "eu-west-1" github_app = { id = "0" key_base64 = "0" diff --git a/README.md b/README.md index 3215f4ddc7..8f7f552136 100644 --- a/README.md +++ b/README.md @@ -35,14 +35,14 @@ This [Terraform](https://www.terraform.io/) module creates the required infrastr - [Sub modules](#sub-modules) - [Logging](#logging) - [Debugging](#debugging) -- [Security Consideration](#security-consideration) +- [Security Considerations](#security-considerations) - [Requirements](#requirements) - [Providers](#providers) - [Modules](#modules) - [Resources](#resources) - [Inputs](#inputs) - [Outputs](#outputs) -- [Contribution](#contribution) +- [Contributing](#contributing) - [Philips Forest](#philips-forest) ## Motivation @@ -66,7 +66,7 @@ In AWS an [API gateway](https://docs.aws.amazon.com/apigateway/index.html) endpo The "scale up runner" lambda listens to the SQS queue and picks up events. The lambda runs various checks to decide whether a new EC2 spot instance needs to be created. For example, the instance is not created if the build is already started by an existing runner, or the maximum number of runners is reached. -The Lambda first requests a registration token from GitHub, which is needed later by the runner to register itself. This avoids the case that the EC2 instance, which later in the process will install the agent, needs administration permissions to register the runner. Next, the EC2 spot instance is created via the launch template. The launch template defines the specifications of the required instance and contains a [`user_data`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) script. This script will install the required software and configure it. 
The registration token for the action runner is stored in the parameter store (SSM), from which the user data script will fetch it and delete it once it has been retrieved. Once the user data script is finished, the action runner should be online, and the workflow will start in seconds. +The Lambda first requests a JIT configuration or registration token from GitHub, which is needed later by the runner to register itself. This avoids the case that the EC2 instance, which later in the process will install the agent, needs administration permissions to register the runner. Next, the EC2 spot instance is created via the launch template. The launch template defines the specifications of the required instance and contains a [`user_data`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) script. This script will install the required software and configure it. The registration token for the action runner is stored in the parameter store (SSM), from which the user data script will fetch it and delete it once it has been retrieved. Once the user data script is finished, the action runner should be online, and the workflow will start in seconds. Scaling down the runners is at the moment brute-forced, every configurable amount of minutes a lambda will check every runner (instance) if it is busy. In case the runner is not busy it will be removed from GitHub and the instance terminated in AWS. At the moment there seems to be no other option to scale down more smoothly. @@ -92,7 +92,7 @@ To be able to support a number of use-cases the module has quite a lot of config - Multi-Runner module. This modules allows you to create multiple runner configurations with a single webhook and single GitHub App to simplify deployment of different types of runners. Refer to the [ReadMe](.modules/../modules/multi-runner/README.md) for more information to understand the functionality. - Workflow job event. 
You can configure the webhook in GitHub to send workflow job events to the webhook. Workflow job events were introduced by GitHub in September 2021 and are designed to support scalable runners. We advise using the workflow job event when possible. - Linux vs Windows. You can configure the OS types linux and win. Linux will be used by default. -- Re-use vs Ephemeral. By default runners are re-used, until detected idle. Once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners are only working in combination with the workflow job event. We also suggest using a pre-build AMI to improve the start time of jobs. +- Re-use vs Ephemeral. By default runners are re-used, until detected idle. Once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners are only working in combination with the workflow job event. For ephemeral runners the lambda requests a JIT (just in time) configuration object via the GitHub API to register the runner. [JIT configuration](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-just-in-time-runners) is limited to ephemeral runners; for non-ephemeral runners a registration token is requested. In both cases the configuration is made available to the instance via the same SSM parameter. We also suggest using a pre-built AMI to improve the start time of jobs. - GitHub Cloud vs GitHub Enterprise Server (GHES). The runners support GitHub Cloud as well GitHub Enterprise Server. For GHES we rely on our community for support and testing. We have no possibility to test ourselves on GHES. - Spot vs on-demand. The runners use either the EC2 spot or on-demand life cycle. Runners will be created via the AWS [CreateFleet API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html). 
The module (scale up lambda) will request via the CreateFleet API to create instances in one of the subnets and of the specified instance types. - ARM64 support via Graviton/Graviton2 instance-types. When using the default example or top-level module, specifying `instance_types` that match a Graviton/Graviton 2 (ARM64) architecture (e.g. a1, t4g or any 6th-gen `g` or `gd` type), you must also specify `runner_architecture = "arm64"` and the sub-modules will be automatically configured to provision with ARM64 AMIs and leverage GitHub's ARM64 action runner. See below for more details. @@ -105,7 +105,7 @@ The module uses the AWS System Manager Parameter Store to store configuration fo | ----------- | ----------- | | `ssm_paths.root/var.prefix?/app/` | App secrets used by Lambda's | | `ssm_paths.root/var.prefix?/runners/config/` | Configuration parameters used by runner start script | -| `ssm_paths.root/var.prefix?/runners/tokens/` | Registration tokens for the runners generated by the scale-up lambda, consumed by the start script on the runner. | +| `ssm_paths.root/var.prefix?/runners/tokens/` | Either JIT configuration (ephemeral runners) or registration tokens (non-ephemeral runners) generated by the control plane (scale-up lambda), and consumed by the start script on the runner to activate / register the runner. | Available configuration parameters: @@ -330,7 +330,7 @@ You can configure runners to be ephemeral, runners will be used only for one job - All events in the queue will lead to a new runner created by the lambda. By setting `enable_job_queued_check` to `true` you can enforce a rule of only creating a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example when jobs got cancelled before a runner was created or if the job was already picked up by another runner. We suggest using this in combination with a pool. 
- To ensure runners are created in the same order GitHub sends the events, by default we use a FIFO queue. This is mainly relevant for repo level runners. For ephemeral runners you can set `enable_fifo_build_queue` to `false`. - Errors related to scaling should be retried via SQS. You can configure `job_queue_retention_in_seconds` and `redrive_build_queue` to tune the behavior. We have no mechanism to avoid events never being processed, which means potentially no runner gets created and the job in GitHub times out in 6 hours. - + The example for [ephemeral runners](./examples/ephemeral) is based on the [default example](./examples/default). Have look at the diff to see the major configuration differences. ### Prebuilt Images @@ -438,7 +438,9 @@ In case the setup does not work as intended follow the trace of events: ## Security Considerations -This module creates resources in your AWS infrastructure, and EC2 instances for hosting the self-hosted runners on-demand. IAM permissions are set to a minimal level, and could be further limited by using permission boundaries. Instances permissions are limited to retrieve and delete the registration token, access the instance's own tags, and terminate the instance itself. +This module creates resources in your AWS infrastructure, and EC2 instances for hosting the self-hosted runners on-demand. IAM permissions are set to a minimal level, and could be further limited by using permission boundaries. Instances permissions are limited to retrieve and delete the registration token, access the instance's own tags, and terminate the instance itself. Instances are short-lived by nature; we strongly suggest using ephemeral runners to ensure a safe build environment for each workflow job execution. + +Ephemeral runners use the JIT configuration, a configuration that can be used only once to activate a runner. For non-ephemeral runners this option is not provided by GitHub. 
For non-ephemeral runners a registration token is passed via SSM. After the token is used, it is deleted, but it remains valid and is potentially available in memory on the runner. For ephemeral runners this problem is avoided by using just-in-time tokens. The examples are using standard AMI's for different operation systems. Instances are not hardened, and sudo operation are not blocked. To provide an out of the box working experience by default the module installs and configures the runner. However secrets are not hard coded, they finally end up in the memory of the instances. You can harden the instance by providing your own AMI and overwriting the cloud-init script. diff --git a/examples/default/.terraform.lock.hcl b/examples/default/.terraform.lock.hcl index 98416d29d1..e58658f11f 100644 --- a/examples/default/.terraform.lock.hcl +++ b/examples/default/.terraform.lock.hcl @@ -3,9 +3,10 @@ provider "registry.terraform.io/hashicorp/aws" { version = "5.2.0" - constraints = ">= 3.73.0, ~> 5.2" + constraints = ">= 5.0.0, ~> 5.2" hashes = [ "h1:KMTZaIU2/3+cQICzsReZf2mE9FytDc7/iA1i669gyHI=", + "h1:NB70SyXuleF41iKoGtzs61f8xZtGJ+M4bxFI0DeHOzg=", "zh:0e48449e9f29b64663e7ff641a3ba1da434608460c33a20bdb45efeb1e067d4a", "zh:0ec657a1e586087368cc3051ccb8bdf67e8763e50eece76b8dc4695f8d349ebb", "zh:1cff541e792477c4dc8b8405a6f76a56d1292e23d6fc367993efed2b3988c208", @@ -29,6 +30,7 @@ provider "registry.terraform.io/hashicorp/local" { constraints = "~> 2.0" hashes = [ "h1:7RnIbO3CFakblTJs7o0mUiY44dc9xGYsLhSNFSNS1Ds=", + "h1:ZUEYUmm2t4vxwzxy1BvN1wL6SDWrDxfH7pxtzX8c6d0=", "zh:53604cd29cb92538668fe09565c739358dc53ca56f9f11312b9d7de81e48fab9", "zh:66a46e9c508716a1c98efbf793092f03d50049fa4a83cd6b2251e9a06aca2acf", "zh:70a6f6a852dd83768d0778ce9817d81d4b3f073fab8fa570bff92dcb0824f732", @@ -49,6 +51,7 @@ provider "registry.terraform.io/hashicorp/random" { constraints = "~> 3.0" hashes = [ "h1:3hjTP5tQBspPcFAJlfafnWrNrKnr7J4Cp0qB9jbqf30=", + 
"h1:IL9mSatmwov+e0+++YX2V6uel+dV6bn+fC/cnGDK3Ck=", "zh:04e3fbd610cb52c1017d282531364b9c53ef72b6bc533acb2a90671957324a64", "zh:119197103301ebaf7efb91df8f0b6e0dd31e6ff943d231af35ee1831c599188d", "zh:4d2b219d09abf3b1bb4df93d399ed156cadd61f44ad3baf5cf2954df2fba0831", diff --git a/examples/multi-runner/.terraform.lock.hcl b/examples/multi-runner/.terraform.lock.hcl index 0e4a36729b..e58658f11f 100644 --- a/examples/multi-runner/.terraform.lock.hcl +++ b/examples/multi-runner/.terraform.lock.hcl @@ -3,9 +3,10 @@ provider "registry.terraform.io/hashicorp/aws" { version = "5.2.0" - constraints = ">= 3.63.0, ~> 5.2" + constraints = ">= 5.0.0, ~> 5.2" hashes = [ "h1:KMTZaIU2/3+cQICzsReZf2mE9FytDc7/iA1i669gyHI=", + "h1:NB70SyXuleF41iKoGtzs61f8xZtGJ+M4bxFI0DeHOzg=", "zh:0e48449e9f29b64663e7ff641a3ba1da434608460c33a20bdb45efeb1e067d4a", "zh:0ec657a1e586087368cc3051ccb8bdf67e8763e50eece76b8dc4695f8d349ebb", "zh:1cff541e792477c4dc8b8405a6f76a56d1292e23d6fc367993efed2b3988c208", @@ -29,6 +30,7 @@ provider "registry.terraform.io/hashicorp/local" { constraints = "~> 2.0" hashes = [ "h1:7RnIbO3CFakblTJs7o0mUiY44dc9xGYsLhSNFSNS1Ds=", + "h1:ZUEYUmm2t4vxwzxy1BvN1wL6SDWrDxfH7pxtzX8c6d0=", "zh:53604cd29cb92538668fe09565c739358dc53ca56f9f11312b9d7de81e48fab9", "zh:66a46e9c508716a1c98efbf793092f03d50049fa4a83cd6b2251e9a06aca2acf", "zh:70a6f6a852dd83768d0778ce9817d81d4b3f073fab8fa570bff92dcb0824f732", @@ -49,6 +51,7 @@ provider "registry.terraform.io/hashicorp/random" { constraints = "~> 3.0" hashes = [ "h1:3hjTP5tQBspPcFAJlfafnWrNrKnr7J4Cp0qB9jbqf30=", + "h1:IL9mSatmwov+e0+++YX2V6uel+dV6bn+fC/cnGDK3Ck=", "zh:04e3fbd610cb52c1017d282531364b9c53ef72b6bc533acb2a90671957324a64", "zh:119197103301ebaf7efb91df8f0b6e0dd31e6ff943d231af35ee1831c599188d", "zh:4d2b219d09abf3b1bb4df93d399ed156cadd61f44ad3baf5cf2954df2fba0831", diff --git a/examples/multi-runner/main.tf b/examples/multi-runner/main.tf index 2b23867fb5..527142716f 100644 --- a/examples/multi-runner/main.tf +++ 
b/examples/multi-runner/main.tf @@ -17,31 +17,8 @@ module "base" { } module "multi-runner" { - source = "../../modules/multi-runner" - multi_runner_config = local.multi_runner_config - # Alternative to loading runner configuration from Yaml files is using static configuration: - # multi_runner_config = { - # "linux-x64" = { - # matcherConfig : { - # labelMatchers = [["self-hosted", "linux", "x64", "amazon"]] - # exactMatch = false - # } - # fifo = true - # delay_webhook_event = 0 - # runner_config = { - # runner_os = "linux" - # runner_architecture = "x64" - # runner_name_prefix = "amazon-x64_" - # create_service_linked_role_spot = true - # enable_ssm_on_runners = true - # instance_types = ["m5ad.large", "m5a.large"] - # runner_extra_labels = "amazon" - # runners_maximum_count = 1 - # enable_ephemeral_runners = true - # scale_down_schedule_expression = "cron(* * * * ? *)" - # } - # } - # } + source = "../../modules/multi-runner" + multi_runner_config = local.multi_runner_config aws_region = local.aws_region vpc_id = module.base.vpc.vpc_id subnet_ids = module.base.vpc.private_subnets @@ -68,5 +45,4 @@ module "multi-runner" { # Enable debug logging for the lambda functions # log_level = "debug" - } diff --git a/lambdas/functions/control-plane/package.json b/lambdas/functions/control-plane/package.json index 50d7c4dfce..93f286c501 100644 --- a/lambdas/functions/control-plane/package.json +++ b/lambdas/functions/control-plane/package.json @@ -41,8 +41,8 @@ "@aws-sdk/client-ec2": "^3.350.0", "@aws-sdk/types": "^3.347.0", "@octokit/auth-app": "4.0.13", - "@octokit/rest": "^19.0.7", - "@octokit/types": "^9.0.0", + "@octokit/rest": "19.0.12", + "@octokit/types": "^10.0.0", "@terraform-aws-github-runner/aws-powertools-util": "*", "@terraform-aws-github-runner/aws-ssm-util": "*", "cron-parser": "^4.8.1", diff --git a/lambdas/functions/control-plane/src/aws/runners.d.ts b/lambdas/functions/control-plane/src/aws/runners.d.ts index 63c50b4327..da15866b75 100644 --- 
a/lambdas/functions/control-plane/src/aws/runners.d.ts +++ b/lambdas/functions/control-plane/src/aws/runners.d.ts @@ -26,11 +26,9 @@ export interface ListRunnerFilters { } export interface RunnerInputParameters { - runnerServiceConfig: string[]; environment: string; runnerType: RunnerType; runnerOwner: string; - ssmTokenPath: string; subnets: string[]; launchTemplateName: string; ec2instanceCriteria: { diff --git a/lambdas/functions/control-plane/src/aws/runners.test.ts b/lambdas/functions/control-plane/src/aws/runners.test.ts index 94c49455d7..ddb7098e54 100644 --- a/lambdas/functions/control-plane/src/aws/runners.test.ts +++ b/lambdas/functions/control-plane/src/aws/runners.test.ts @@ -12,7 +12,6 @@ import { import { GetParameterCommand, GetParameterResult, PutParameterCommand, SSMClient } from '@aws-sdk/client-ssm'; import { mockClient } from 'aws-sdk-client-mock'; import 'aws-sdk-client-mock-jest'; -import { performance } from 'perf_hooks'; import ScaleError from './../scale-runners/ScaleError'; import { createRunner, listEC2Runners, terminateRunner } from './runners'; @@ -26,7 +25,6 @@ const LAUNCH_TEMPLATE = 'lt-1'; const ORG_NAME = 'SomeAwesomeCoder'; const REPO_NAME = `${ORG_NAME}/some-amazing-library`; const ENVIRONMENT = 'unit-test-environment'; -const SSM_TOKEN_PATH = '/github-action-runners/default/runners/tokens'; const RUNNER_NAME_PREFIX = ''; const RUNNER_TYPES: RunnerType[] = ['Repo', 'Org']; @@ -60,7 +58,6 @@ describe('list instances', () => { it('returns a list of instances', async () => { mockEC2Client.on(DescribeInstancesCommand).resolves(mockRunningInstances); - // mockDescribeInstances.promise.mockReturnValue(mockRunningInstances); const resp = await listEC2Runners(); expect(resp.length).toBe(1); expect(resp).toContainEqual({ @@ -182,7 +179,6 @@ describe('create runner', () => { //mockEC2.createFleet.mockImplementation(() => mockCreateFleet); mockEC2Client.on(CreateFleetCommand).resolves({ Instances: [{ InstanceIds: ['i-1234'] }] }); - 
mockSSMClient.on(PutParameterCommand).resolves({}); mockSSMClient.on(GetParameterCommand).resolves({}); }); @@ -192,7 +188,6 @@ describe('create runner', () => { expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { ...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, type: type }), }); - expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 1); }); it('calls create fleet of 2 instances with the correct config for org ', async () => { @@ -205,83 +200,6 @@ describe('create runner', () => { expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { ...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, totalTargetCapacity: 2 }), }); - expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 2); - - for (const instance of instances[0].InstanceIds) { - expect(mockSSMClient).toHaveReceivedCommandWith(PutParameterCommand, { - Name: `${SSM_TOKEN_PATH}/${instance}`, - Type: 'SecureString', - Value: '--token foo --url http://github.com', - }); - } - }); - - it('calls create fleet of 40 instances (ssm rate limit condition) to test time delay ', async () => { - const startTime = performance.now(); - const instances = [ - { - InstanceIds: [ - 'i-1234', - 'i-5678', - 'i-5567', - 'i-5569', - 'i-5561', - 'i-5560', - 'i-5566', - 'i-5536', - 'i-5526', - 'i-5516', - 'i-122', - 'i-123', - 'i-124', - 'i-125', - 'i-126', - 'i-127', - 'i-128', - 'i-129', - 'i-130', - 'i-131', - 'i-132', - 'i-133', - 'i-134', - 'i-135', - 'i-136', - 'i-137', - 'i-138', - 'i-139', - 'i-140', - 'i-141', - 'i-142', - 'i-143', - 'i-144', - 'i-145', - 'i-146', - 'i-147', - 'i-148', - 'i-149', - 'i-150', - 'i-151', - ], - }, - ]; - mockEC2Client.on(CreateFleetCommand).resolves({ Instances: instances }); - - await createRunner({ ...createRunnerConfig(defaultRunnerConfig), numberOfRunners: 40 }); - const endTime = performance.now(); - - expect(endTime - startTime).toBeGreaterThan(1000); - 
expect(mockEC2Client).toHaveReceivedCommandWith( - CreateFleetCommand, - expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, totalTargetCapacity: 40 }), - ); - expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 40); - for (const instance of instances[0].InstanceIds) { - expect(mockSSMClient).toHaveReceivedCommandWith(PutParameterCommand, { - Name: `${SSM_TOKEN_PATH}/${instance}`, - Type: 'SecureString', - Value: '--token foo --url http://github.com', - }); - } }); it('calls create fleet of 1 instance with the on-demand capacity', async () => { @@ -289,7 +207,6 @@ describe('create runner', () => { expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, { ...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, capacityType: 'on-demand' }), }); - expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 1); }); it('calls run instances with the on-demand capacity', async () => { @@ -299,15 +216,6 @@ describe('create runner', () => { }); }); - it('creates ssm parameters for each created instance', async () => { - await createRunner(createRunnerConfig(defaultRunnerConfig)); - expect(mockSSMClient).toHaveReceivedCommandWith(PutParameterCommand, { - Name: `${SSM_TOKEN_PATH}/i-1234`, - Type: 'SecureString', - Value: '--token foo --url http://github.com', - }); - }); - it('does not create ssm parameters when no instance is created', async () => { mockEC2Client.on(CreateFleetCommand).resolves({ Instances: [] }); await expect(createRunner(createRunnerConfig(defaultRunnerConfig))).rejects.toThrowError(Error); @@ -393,7 +301,6 @@ describe('create runner with errors', () => { CreateFleetCommand, expectedCreateFleetRequest(defaultExpectedFleetRequestValues), ); - expect(mockSSMClient).toHaveReceivedCommand(PutParameterCommand); }); it('test error by create fleet call is thrown.', async () => { @@ -447,11 +354,9 @@ interface RunnerConfig { function createRunnerConfig(runnerConfig: RunnerConfig): 
RunnerInputParameters { return { - runnerServiceConfig: ['--token foo', '--url http://github.com'], environment: ENVIRONMENT, runnerType: runnerConfig.type, runnerOwner: REPO_NAME, - ssmTokenPath: SSM_TOKEN_PATH, launchTemplateName: LAUNCH_TEMPLATE, ec2instanceCriteria: { instanceTypes: ['m5.large', 'c5.large'], diff --git a/lambdas/functions/control-plane/src/aws/runners.ts b/lambdas/functions/control-plane/src/aws/runners.ts index 6a4911b45f..b65885fc3d 100644 --- a/lambdas/functions/control-plane/src/aws/runners.ts +++ b/lambdas/functions/control-plane/src/aws/runners.ts @@ -8,7 +8,7 @@ import { TerminateInstancesCommand, } from '@aws-sdk/client-ec2'; import { createChildLogger } from '@terraform-aws-github-runner/aws-powertools-util'; -import { getParameter, putParameter } from '@terraform-aws-github-runner/aws-ssm-util'; +import { getParameter } from '@terraform-aws-github-runner/aws-ssm-util'; import moment from 'moment'; import ScaleError from './../scale-runners/ScaleError'; @@ -116,24 +116,11 @@ function generateFleetOverrides( return result; } -function removeTokenForLogging(config: string[]): string[] { - const result: string[] = []; - config.forEach((e) => { - if (e.startsWith('--token')) { - result.push('--token '); - } else { - result.push(e); - } - }); - return result; -} - -export async function createRunner(runnerParameters: Runners.RunnerInputParameters): Promise { +export async function createRunner(runnerParameters: Runners.RunnerInputParameters): Promise { logger.debug('Runner configuration.', { runner: { configuration: { ...runnerParameters, - runnerServiceConfig: removeTokenForLogging(runnerParameters.runnerServiceConfig), }, }, }); @@ -238,22 +225,7 @@ export async function createRunner(runnerParameters: Runners.RunnerInputParamete logger.info(`Created instance(s): ${instances.join(',')}`); - const delay = async (ms: number) => new Promise((resolve) => setTimeout(resolve, ms)); - const ssmParameterStoreMaxThroughput = 40; - const isDelay = 
instances.length >= ssmParameterStoreMaxThroughput ? true : false; - - for (const instance of instances) { - await putParameter( - `${runnerParameters.ssmTokenPath}/${instance}`, - runnerParameters.runnerServiceConfig.join(' '), - true, - ); - - if (isDelay) { - // Delay to prevent AWS ssm rate limits by being within the max throughput limit - await delay(25); - } - } + return instances; } // If launchTime is undefined, this will return false diff --git a/lambdas/functions/control-plane/src/pool/pool.ts b/lambdas/functions/control-plane/src/pool/pool.ts index d5a4beb82c..ca72f1d517 100644 --- a/lambdas/functions/control-plane/src/pool/pool.ts +++ b/lambdas/functions/control-plane/src/pool/pool.ts @@ -18,12 +18,13 @@ interface RunnerStatus { export async function adjust(event: PoolEvent): Promise { logger.info(`Checking current pool size against pool of size: ${event.poolSize}`); - const runnerExtraLabels = process.env.RUNNER_EXTRA_LABELS; - const runnerGroup = process.env.RUNNER_GROUP_NAME; - const runnerNamePrefix = process.env.RUNNER_NAME_PREFIX; + const runnerLabels = process.env.RUNNER_LABELS || ''; + const runnerGroup = process.env.RUNNER_GROUP_NAME || ''; + const runnerNamePrefix = process.env.RUNNER_NAME_PREFIX || ''; const environment = process.env.ENVIRONMENT; const ghesBaseUrl = process.env.GHES_URL; const ssmTokenPath = process.env.SSM_TOKEN_PATH; + const ssmConfigPath = process.env.SSM_CONFIG_PATH || ''; const subnets = process.env.SUBNET_IDS.split(','); const instanceTypes = process.env.INSTANCE_TYPES.split(','); const instanceTargetTargetCapacityType = process.env.INSTANCE_TARGET_CAPACITY_TYPE; @@ -94,11 +95,14 @@ export async function adjust(event: PoolEvent): Promise { { ephemeral, ghesBaseUrl, - runnerExtraLabels, + runnerLabels, runnerGroup, runnerOwner, + runnerNamePrefix, runnerType: 'Org', disableAutoUpdate: disableAutoUpdate, + ssmTokenPath, + ssmConfigPath, }, { ec2instanceCriteria: { @@ -109,7 +113,6 @@ export async function adjust(event: 
PoolEvent): Promise { }, environment, launchTemplateName, - ssmTokenPath, subnets, numberOfRunners: topUp, amiIdSsmParameterName, diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts b/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts index 5a3ea531e5..db9b849710 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts @@ -1,6 +1,10 @@ +import { GetParameterCommand, PutParameterCommand, SSMClient } from '@aws-sdk/client-ssm'; import { Octokit } from '@octokit/rest'; +import { mockClient } from 'aws-sdk-client-mock'; +import 'aws-sdk-client-mock-jest'; import { mocked } from 'jest-mock'; import nock from 'nock'; +import { performance } from 'perf_hooks'; import * as ghAuth from '../gh-auth/gh-auth'; import { createRunner, listEC2Runners } from './../aws/runners'; @@ -9,17 +13,23 @@ import ScaleError from './ScaleError'; import * as scaleUpModule from './scale-up'; const mockOctokit = { + paginate: jest.fn(), checks: { get: jest.fn() }, actions: { createRegistrationTokenForOrg: jest.fn(), createRegistrationTokenForRepo: jest.fn(), getJobForWorkflowRun: jest.fn(), + generateRunnerJitconfigForOrg: jest.fn(), + generateRunnerJitconfigForRepo: jest.fn(), }, apps: { getOrgInstallation: jest.fn(), getRepoInstallation: jest.fn(), }, }; +const mockCreateRunner = mocked(createRunner); +const mockListRunners = mocked(listEC2Runners); +const mockSSMClient = mockClient(SSMClient); jest.mock('@octokit/rest', () => ({ Octokit: jest.fn().mockImplementation(() => mockOctokit), @@ -27,6 +37,10 @@ jest.mock('@octokit/rest', () => ({ jest.mock('./../aws/runners'); jest.mock('./../gh-auth/gh-auth'); +export type RunnerType = 'ephemeral' | 'non-ephemeral'; + +// for ephemeral and non-ephemeral runners +const RUNNER_TYPES: RunnerType[] = ['ephemeral', 'non-ephemeral']; const mocktokit = Octokit as jest.MockedClass; const mockedAppAuth = 
mocked(ghAuth.createGithubAppAuth, { shallow: false }); @@ -54,7 +68,6 @@ const cleanEnv = process.env; const EXPECTED_RUNNER_PARAMS: RunnerInputParameters = { environment: 'unit-test-environment', - runnerServiceConfig: [`--url https://github.enterprise.something/${TEST_DATA.repositoryOwner}`, '--token 1234abcd'], runnerType: 'Org', runnerOwner: TEST_DATA.repositoryOwner, launchTemplateName: 'lt-1', @@ -63,7 +76,6 @@ const EXPECTED_RUNNER_PARAMS: RunnerInputParameters = { targetCapacityType: 'spot', instanceAllocationStrategy: 'lowest-price', }, - ssmTokenPath: '/github-action-runners/default/runners/config', subnets: ['subnet-123'], }; let expectedRunnerParams: RunnerInputParameters; @@ -80,7 +92,6 @@ beforeEach(() => { process.env.RUNNERS_MAXIMUM_COUNT = '3'; process.env.ENVIRONMENT = EXPECTED_RUNNER_PARAMS.environment; process.env.LAUNCH_TEMPLATE_NAME = 'lt-1'; - process.env.SSM_TOKEN_PATH = EXPECTED_RUNNER_PARAMS.ssmTokenPath; process.env.SUBNET_IDS = 'subnet-123'; process.env.INSTANCE_TYPES = 'm5.large'; process.env.INSTANCE_TARGET_CAPACITY_TYPE = 'spot'; @@ -90,7 +101,22 @@ beforeEach(() => { status: 'queued', }, })); - + mockOctokit.paginate.mockImplementation(() => [ + { + id: 1, + name: 'Default', + }, + ]); + mockOctokit.actions.generateRunnerJitconfigForOrg.mockImplementation(() => ({ + data: { + encoded_jit_config: 'TEST_JIT_CONFIG_ORG', + }, + })); + mockOctokit.actions.generateRunnerJitconfigForRepo.mockImplementation(() => ({ + data: { + encoded_jit_config: 'TEST_JIT_CONFIG_REPO', + }, + })); mockOctokit.checks.get.mockImplementation(() => ({ data: { status: 'queued', @@ -116,7 +142,9 @@ beforeEach(() => { mockOctokit.actions.createRegistrationTokenForRepo.mockImplementation(() => mockTokenReturnValue); mockOctokit.apps.getOrgInstallation.mockImplementation(() => mockInstallationIdReturnValueOrgs); mockOctokit.apps.getRepoInstallation.mockImplementation(() => mockInstallationIdReturnValueRepos); - const mockListRunners = mocked(listEC2Runners); + 
mockCreateRunner.mockImplementation(async () => { + return ['i-12345']; + }); mockListRunners.mockImplementation(async () => [ { instanceId: 'i-1234', @@ -176,8 +204,16 @@ describe('scaleUp with GHES', () => { describe('on org level', () => { beforeEach(() => { process.env.ENABLE_ORGANIZATION_RUNNERS = 'true'; - process.env.RUNNER_NAME_PREFIX = 'unit-test'; + process.env.ENABLE_EPHEMERAL_RUNNERS = 'true'; + process.env.RUNNER_NAME_PREFIX = 'unit-test-'; + process.env.RUNNER_GROUP_NAME = 'Default'; + process.env.SSM_CONFIG_PATH = '/github-action-runners/default/runners/config'; + process.env.SSM_TOKEN_PATH = '/github-action-runners/default/runners/config'; + process.env.RUNNER_LABELS = 'label1,label2'; + expectedRunnerParams = { ...EXPECTED_RUNNER_PARAMS }; + mockSSMClient.reset(); + mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); }); it('gets the current org level runners', async () => { @@ -191,12 +227,14 @@ describe('scaleUp with GHES', () => { it('does not create a token when maximum runners has been reached', async () => { process.env.RUNNERS_MAXIMUM_COUNT = '1'; + process.env.ENABLE_EPHEMERAL_RUNNERS = 'false'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.actions.createRegistrationTokenForOrg).not.toBeCalled(); expect(mockOctokit.actions.createRegistrationTokenForRepo).not.toBeCalled(); }); it('creates a token when maximum runners has not been reached', async () => { + process.env.ENABLE_EPHEMERAL_RUNNERS = 'false'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.actions.createRegistrationTokenForOrg).toBeCalledWith({ org: TEST_DATA.repositoryOwner, @@ -232,14 +270,9 @@ describe('scaleUp with GHES', () => { }); it('creates a runner with labels in a specific group', async () => { - process.env.RUNNER_EXTRA_LABELS = 'label1,label2'; + process.env.RUNNER_LABELS = 'label1,label2'; process.env.RUNNER_GROUP_NAME = 'TEST_GROUP'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - 
expectedRunnerParams.runnerServiceConfig = [ - ...expectedRunnerParams.runnerServiceConfig, - '--labels label1,label2', - '--runnergroup TEST_GROUP', - ]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); @@ -248,8 +281,127 @@ describe('scaleUp with GHES', () => { await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(createRunner).toBeCalledWith({ ...expectedRunnerParams, amiIdSsmParameterName: 'my-ami-id-param' }); }); - }); + it('Throws an error if runner group doesnt exist for ephemeral runners', async () => { + process.env.RUNNER_GROUP_NAME = 'test-runner-group'; + mockSSMClient.on(GetParameterCommand).rejects(); + await expect(scaleUpModule.scaleUp('aws:sqs', TEST_DATA)).rejects.toBeInstanceOf(Error); + expect(mockOctokit.paginate).toHaveBeenCalledTimes(1); + }); + + it('create SSM parameter for runner group id if it doesnt exist', async () => { + mockSSMClient.on(GetParameterCommand).rejects(); + await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); + expect(mockOctokit.paginate).toHaveBeenCalledTimes(1); + expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 2); + expect(mockSSMClient).toHaveReceivedNthSpecificCommandWith(1, PutParameterCommand, { + Name: `${process.env.SSM_CONFIG_PATH}/runner-group/${process.env.RUNNER_GROUP_NAME}`, + Value: '1', + Type: 'String', + }); + }); + + it('Doesnt create SSM parameter for runner group id if it exists', async () => { + mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); + await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); + expect(mockOctokit.paginate).toHaveBeenCalledTimes(0); + expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 1); + }); + + it('create start runner config for ephemeral runners ', async () => { + process.env.RUNNERS_MAXIMUM_COUNT = '2'; + mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); + await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); + 
expect(mockOctokit.actions.generateRunnerJitconfigForOrg).toBeCalledWith({ + org: TEST_DATA.repositoryOwner, + name: 'unit-test-i-12345', + runner_group_id: 1, + labels: ['label1', 'label2'], + }); + expect(mockSSMClient).toHaveReceivedNthSpecificCommandWith(1, PutParameterCommand, { + Name: '/github-action-runners/default/runners/config/i-12345', + Value: 'TEST_JIT_CONFIG_ORG', + Type: 'SecureString', + }); + }); + + it('create start runner config for non-ephemeral runners ', async () => { + process.env.ENABLE_EPHEMERAL_RUNNERS = 'false'; + process.env.RUNNERS_MAXIMUM_COUNT = '2'; + await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); + expect(mockOctokit.actions.generateRunnerJitconfigForOrg).not.toBeCalled(); + expect(mockOctokit.actions.createRegistrationTokenForOrg).toBeCalled(); + expect(mockSSMClient).toHaveReceivedNthSpecificCommandWith(1, PutParameterCommand, { + Name: '/github-action-runners/default/runners/config/i-12345', + Value: + '--url https://github.enterprise.something/Codertocat --token 1234abcd ' + + '--labels label1,label2 --runnergroup Default', + Type: 'SecureString', + }); + }); + it.each(RUNNER_TYPES)( + 'calls create start runner config of 40' + ' instances (ssm rate limit condition) to test time delay ', + async (type: RunnerType) => { + process.env.ENABLE_EPHEMERAL_RUNNERS = type === 'ephemeral' ? 
'true' : 'false'; + process.env.RUNNERS_MAXIMUM_COUNT = '40'; + mockCreateRunner.mockImplementation(async () => { + return instances; + }); + mockListRunners.mockImplementation(async () => { + return []; + }); + mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); + const startTime = performance.now(); + const instances = [ + 'i-1234', + 'i-5678', + 'i-5567', + 'i-5569', + 'i-5561', + 'i-5560', + 'i-5566', + 'i-5536', + 'i-5526', + 'i-5516', + 'i-122', + 'i-123', + 'i-124', + 'i-125', + 'i-126', + 'i-127', + 'i-128', + 'i-129', + 'i-130', + 'i-131', + 'i-132', + 'i-133', + 'i-134', + 'i-135', + 'i-136', + 'i-137', + 'i-138', + 'i-139', + 'i-140', + 'i-141', + 'i-142', + 'i-143', + 'i-144', + 'i-145', + 'i-146', + 'i-147', + 'i-148', + 'i-149', + 'i-150', + 'i-151', + ]; + await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); + const endTime = performance.now(); + expect(endTime - startTime).toBeGreaterThan(1000); + expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 40); + }, + 10000, + ); + }); describe('on repo level', () => { beforeEach(() => { process.env.ENABLE_ORGANIZATION_RUNNERS = 'false'; @@ -257,10 +409,9 @@ describe('scaleUp with GHES', () => { expectedRunnerParams = { ...EXPECTED_RUNNER_PARAMS }; expectedRunnerParams.runnerType = 'Repo'; expectedRunnerParams.runnerOwner = `${TEST_DATA.repositoryOwner}/${TEST_DATA.repositoryName}`; - expectedRunnerParams.runnerServiceConfig = [ - `--url https://github.enterprise.something/${TEST_DATA.repositoryOwner}/${TEST_DATA.repositoryName}`, - `--token 1234abcd`, - ]; + // `--url https://github.enterprise.something/${TEST_DATA.repositoryOwner}/${TEST_DATA.repositoryName}`, + // `--token 1234abcd`, + // ]; }); it('gets the current repo level runners', async () => { @@ -280,6 +431,7 @@ describe('scaleUp with GHES', () => { }); it('creates a token when maximum runners has not been reached', async () => { + process.env.ENABLE_EPHEMERAL_RUNNERS = 'false'; await 
scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.actions.createRegistrationTokenForOrg).not.toBeCalled(); expect(mockOctokit.actions.createRegistrationTokenForRepo).toBeCalledWith({ @@ -324,23 +476,15 @@ describe('scaleUp with GHES', () => { }); it('creates a runner with correct config and labels', async () => { - process.env.RUNNER_EXTRA_LABELS = 'label1,label2'; + process.env.RUNNER_LABELS = 'label1,label2'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [ - ...expectedRunnerParams.runnerServiceConfig, - `--labels label1,label2`, - ]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); it('creates a runner and ensure the group argument is ignored', async () => { - process.env.RUNNER_EXTRA_LABELS = 'label1,label2'; + process.env.RUNNER_LABELS = 'label1,label2'; process.env.RUNNER_GROUP_NAME = 'TEST_GROUP_IGNORED'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [ - ...expectedRunnerParams.runnerServiceConfig, - `--labels label1,label2`, - ]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); @@ -407,10 +551,6 @@ describe('scaleUp with public GH', () => { process.env.ENABLE_ORGANIZATION_RUNNERS = 'true'; process.env.RUNNER_NAME_PREFIX = 'unit-test'; expectedRunnerParams = { ...EXPECTED_RUNNER_PARAMS }; - expectedRunnerParams.runnerServiceConfig = [ - `--url https://github.com/${TEST_DATA.repositoryOwner}`, - `--token 1234abcd`, - ]; }); it('gets the current org level runners', async () => { @@ -443,14 +583,9 @@ describe('scaleUp with public GH', () => { }); it('creates a runner with labels in s specific group', async () => { - process.env.RUNNER_EXTRA_LABELS = 'label1,label2'; + process.env.RUNNER_LABELS = 'label1,label2'; process.env.RUNNER_GROUP_NAME = 'TEST_GROUP'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [ - ...expectedRunnerParams.runnerServiceConfig, - `--labels 
label1,label2`, - `--runnergroup TEST_GROUP`, - ]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); }); @@ -462,10 +597,6 @@ describe('scaleUp with public GH', () => { expectedRunnerParams = { ...EXPECTED_RUNNER_PARAMS }; expectedRunnerParams.runnerType = 'Repo'; expectedRunnerParams.runnerOwner = `${TEST_DATA.repositoryOwner}/${TEST_DATA.repositoryName}`; - expectedRunnerParams.runnerServiceConfig = [ - `--url https://github.com/${TEST_DATA.repositoryOwner}/${TEST_DATA.repositoryName}`, - `--token 1234abcd`, - ]; }); it('gets the current repo level runners', async () => { @@ -514,23 +645,15 @@ describe('scaleUp with public GH', () => { }); it('creates a runner with correct config and labels', async () => { - process.env.RUNNER_EXTRA_LABELS = 'label1,label2'; + process.env.RUNNER_LABELS = 'label1,label2'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [ - ...expectedRunnerParams.runnerServiceConfig, - `--labels label1,label2`, - ]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); it('creates a runner and ensure the group argument is ignored', async () => { - process.env.RUNNER_EXTRA_LABELS = 'label1,label2'; + process.env.RUNNER_LABELS = 'label1,label2'; process.env.RUNNER_GROUP_NAME = 'TEST_GROUP_IGNORED'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [ - ...expectedRunnerParams.runnerServiceConfig, - `--labels label1,label2`, - ]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); @@ -549,7 +672,6 @@ describe('scaleUp with public GH', () => { process.env.ENABLE_EPHEMERAL_RUNNERS = 'true'; process.env.ENABLE_JOB_QUEUED_CHECK = 'false'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [...expectedRunnerParams.runnerServiceConfig, `--ephemeral`]; expect(mockOctokit.actions.getJobForWorkflowRun).not.toBeCalled(); expect(createRunner).toBeCalledWith(expectedRunnerParams); }); @@ 
-559,14 +681,12 @@ describe('scaleUp with public GH', () => { process.env.ENABLE_JOB_QUEUED_CHECK = 'true'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.actions.getJobForWorkflowRun).toBeCalled(); - expectedRunnerParams.runnerServiceConfig = [...expectedRunnerParams.runnerServiceConfig, `--ephemeral`]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); it('disable auto update on the runner.', async () => { process.env.DISABLE_RUNNER_AUTOUPDATE = 'true'; await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); - expectedRunnerParams.runnerServiceConfig = [...expectedRunnerParams.runnerServiceConfig, `--disableupdate`]; expect(createRunner).toBeCalledWith(expectedRunnerParams); }); diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-up.ts b/lambdas/functions/control-plane/src/scale-runners/scale-up.ts index b0516a3587..f9159e18d0 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-up.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-up.ts @@ -1,5 +1,6 @@ import { Octokit } from '@octokit/rest'; import { addPersistentContextToChildLogger, createChildLogger } from '@terraform-aws-github-runner/aws-powertools-util'; +import { getParameter, putParameter } from '@terraform-aws-github-runner/aws-ssm-util'; import yn from 'yn'; import { createGithubAppAuth, createGithubInstallationAuth, createOctoClient } from '../gh-auth/gh-auth'; @@ -9,6 +10,17 @@ import ScaleError from './ScaleError'; const logger = createChildLogger('scale-up'); +export interface RunnerGroup { + name: string; + id: number; +} + +interface EphemeralRunnerConfig { + runnerName: string; + runnerGroupId: number; + runnerLabels: string[]; +} + export interface ActionRequestMessage { id: number; eventType: 'check_run' | 'workflow_job'; @@ -20,16 +32,18 @@ export interface ActionRequestMessage { interface CreateGitHubRunnerConfig { ephemeral: boolean; ghesBaseUrl: string; - runnerExtraLabels: string | undefined; - runnerGroup: 
string | undefined; + runnerLabels: string; + runnerGroup: string; + runnerNamePrefix: string; runnerOwner: string; runnerType: 'Org' | 'Repo'; disableAutoUpdate: boolean; + ssmTokenPath: string; + ssmConfigPath: string; } interface CreateEC2RunnerConfig { environment: string; - ssmTokenPath: string; subnets: string[]; launchTemplateName: string; ec2instanceCriteria: RunnerInputParameters['ec2instanceCriteria']; @@ -43,12 +57,8 @@ function generateRunnerServiceConfig(githubRunnerConfig: CreateGitHubRunnerConfi `--token ${token}`, ]; - if (githubRunnerConfig.runnerExtraLabels !== undefined) { - config.push(`--labels ${githubRunnerConfig.runnerExtraLabels}`); - } - - if (githubRunnerConfig.ephemeral) { - config.push(`--ephemeral`); + if (githubRunnerConfig.runnerLabels !== undefined) { + config.push(`--labels ${githubRunnerConfig.runnerLabels}`); } if (githubRunnerConfig.disableAutoUpdate) { @@ -73,6 +83,18 @@ async function getGithubRunnerRegistrationToken(githubRunnerConfig: CreateGitHub return registrationToken.data.token; } +function removeTokenFromLogging(config: string[]): string[] { + const result: string[] = []; + config.forEach((e) => { + if (e.startsWith('--token')) { + result.push('--token '); + } else { + result.push(e); + } + }); + return result; +} + async function getInstallationId( ghesApiUrl: string, enableOrgLevel: boolean, @@ -116,21 +138,72 @@ async function isJobQueued(githubInstallationClient: Octokit, payload: ActionReq return isQueued; } +async function getRunnerGroupId(githubRunnerConfig: CreateGitHubRunnerConfig, ghClient: Octokit): Promise<number> { + // if the runnerType is Repo, then runnerGroupId is default to 1 + let runnerGroupId: number | undefined = 1; + if (githubRunnerConfig.runnerType === 'Org' && githubRunnerConfig.runnerGroup !== undefined) { + let runnerGroup: string | undefined; + // check if runner group id is already stored in SSM Parameter Store and + // use it if it exists to avoid API call to GitHub + try { + runnerGroup = await
getParameter( + `${githubRunnerConfig.ssmConfigPath}/runner-group/${githubRunnerConfig.runnerGroup}`, + ); + } catch (err) { + logger.debug('Handling error:', err as Error); + logger.warn( + `SSM Parameter "${githubRunnerConfig.ssmConfigPath}/runner-group/${githubRunnerConfig.runnerGroup}" + for Runner group ${githubRunnerConfig.runnerGroup} does not exist`, + ); + } + if (runnerGroup === undefined) { + // get runner group id from GitHub + runnerGroupId = await GetRunnerGroupByName(ghClient, githubRunnerConfig); + // store runner group id in SSM + try { + await putParameter( + `${githubRunnerConfig.ssmConfigPath}/runner-group/${githubRunnerConfig.runnerGroup}`, + runnerGroupId.toString(), + false, + ); + } catch (err) { + logger.debug('Error storing runner group id in SSM Parameter Store', err as Error); + throw err; + } + } else { + runnerGroupId = parseInt(runnerGroup); + } + } + return runnerGroupId; +} + +async function GetRunnerGroupByName(ghClient: Octokit, githubRunnerConfig: CreateGitHubRunnerConfig): Promise<number> { + const runnerGroups: RunnerGroup[] = await ghClient.paginate(`GET /orgs/{org}/actions/runner-groups`, { + org: githubRunnerConfig.runnerOwner, + per_page: 100, + }); + const runnerGroupId = runnerGroups.find((runnerGroup) => runnerGroup.name === githubRunnerConfig.runnerGroup)?.id; + + if (runnerGroupId === undefined) { + throw new Error(`Runner group ${githubRunnerConfig.runnerGroup} does not exist`); + } + + return runnerGroupId; +} + export async function createRunners( githubRunnerConfig: CreateGitHubRunnerConfig, ec2RunnerConfig: CreateEC2RunnerConfig, ghClient: Octokit, ): Promise<void> { - const token = await getGithubRunnerRegistrationToken(githubRunnerConfig, ghClient); - - const runnerServiceConfig = generateRunnerServiceConfig(githubRunnerConfig, token); - - await createRunner({ - runnerServiceConfig, + const instances = await createRunner({ runnerType: githubRunnerConfig.runnerType, runnerOwner: githubRunnerConfig.runnerOwner,
...ec2RunnerConfig, }); + if (instances.length !== 0) { + await createStartRunnerConfig(githubRunnerConfig, instances, ghClient); + } } export async function scaleUp(eventSource: string, payload: ActionRequestMessage): Promise<void> { @@ -139,8 +212,8 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage if (eventSource !== 'aws:sqs') throw Error('Cannot handle non-SQS events!'); const enableOrgLevel = yn(process.env.ENABLE_ORGANIZATION_RUNNERS, { default: true }); const maximumRunners = parseInt(process.env.RUNNERS_MAXIMUM_COUNT || '3'); - const runnerExtraLabels = process.env.RUNNER_EXTRA_LABELS; - const runnerGroup = process.env.RUNNER_GROUP_NAME; + const runnerLabels = process.env.RUNNER_LABELS; + const runnerGroup = process.env.RUNNER_GROUP_NAME || 'Default'; const environment = process.env.ENVIRONMENT; const ghesBaseUrl = process.env.GHES_URL; const ssmTokenPath = process.env.SSM_TOKEN_PATH; @@ -154,6 +227,8 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage const instanceAllocationStrategy = process.env.INSTANCE_ALLOCATION_STRATEGY || 'lowest-price'; // same as AWS default const enableJobQueuedCheck = yn(process.env.ENABLE_JOB_QUEUED_CHECK, { default: true }); const amiIdSsmParameterName = process.env.AMI_ID_SSM_PARAMETER_NAME; + const runnerNamePrefix = process.env.RUNNER_NAME_PREFIX || ''; + const ssmConfigPath = process.env.SSM_CONFIG_PATH || ''; if (ephemeralEnabled && payload.eventType !== 'workflow_job') { logger.warn(`${payload.eventType} event is not supported in combination with ephemeral runners.`); @@ -187,7 +262,6 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage const installationId = await getInstallationId(ghesApiUrl, enableOrgLevel, payload); const ghAuth = await createGithubInstallationAuth(installationId, ghesApiUrl); const githubInstallationClient = await createOctoClient(ghAuth.token, ghesApiUrl); - if (!enableJobQueuedCheck || (await
isJobQueued(githubInstallationClient, payload))) { const currentRunners = await listEC2Runners({ environment, @@ -203,11 +277,14 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage { ephemeral, ghesBaseUrl, - runnerExtraLabels, + runnerLabels, runnerGroup, + runnerNamePrefix, runnerOwner, runnerType, disableAutoUpdate, + ssmTokenPath, + ssmConfigPath, }, { ec2instanceCriteria: { @@ -218,7 +295,6 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage }, environment, launchTemplateName, - ssmTokenPath, subnets, amiIdSsmParameterName, }, @@ -232,3 +308,90 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage } } } +async function createStartRunnerConfig( + githubRunnerConfig: CreateGitHubRunnerConfig, + instances: string[], + ghClient: Octokit, +) { + if (githubRunnerConfig.ephemeral) { + await createStartRunnerConfigForEphemeralRunners(githubRunnerConfig, instances, ghClient); + } else { + await createStartRunnerConfigForNonEphemeralRunners(githubRunnerConfig, instances, ghClient); + } +} + +function addDelay(instances: string[]) { + const delay = async (ms: number) => new Promise((resolve) => setTimeout(resolve, ms)); + const ssmParameterStoreMaxThroughput = 40; + const isDelay = instances.length >= ssmParameterStoreMaxThroughput ? 
true : false; + return { isDelay, delay }; +} + +async function createStartRunnerConfigForNonEphemeralRunners( + githubRunnerConfig: CreateGitHubRunnerConfig, + instances: string[], + ghClient: Octokit, +) { + const { isDelay, delay } = addDelay(instances); + const token = await getGithubRunnerRegistrationToken(githubRunnerConfig, ghClient); + const runnerServiceConfig = generateRunnerServiceConfig(githubRunnerConfig, token); + + logger.debug('Runner service config for non-ephemeral runners', { + runner_service_config: removeTokenFromLogging(runnerServiceConfig), + }); + + for (const instance of instances) { + await putParameter(`${githubRunnerConfig.ssmTokenPath}/${instance}`, runnerServiceConfig.join(' '), true); + if (isDelay) { + // Delay to prevent AWS ssm rate limits by being within the max throughput limit + await delay(25); + } + } +} + +async function createStartRunnerConfigForEphemeralRunners( + githubRunnerConfig: CreateGitHubRunnerConfig, + instances: string[], + ghClient: Octokit, +) { + const runnerGroupId = await getRunnerGroupId(githubRunnerConfig, ghClient); + const { isDelay, delay } = addDelay(instances); + const runnerLabels = githubRunnerConfig.runnerLabels.split(','); + + logger.debug(`Runner group id: ${runnerGroupId}`); + logger.debug(`Runner labels: ${runnerLabels}`); + for (const instance of instances) { + // generate jit config for runner registration + const ephemeralRunnerConfig: EphemeralRunnerConfig = { + runnerName: `${githubRunnerConfig.runnerNamePrefix}${instance}`, + runnerGroupId: runnerGroupId, + runnerLabels: runnerLabels, + }; + logger.debug(`Runner name: ${ephemeralRunnerConfig.runnerName}`); + const runnerConfig = + githubRunnerConfig.runnerType === 'Org' + ? 
await ghClient.actions.generateRunnerJitconfigForOrg({ + org: githubRunnerConfig.runnerOwner, + name: ephemeralRunnerConfig.runnerName, + runner_group_id: ephemeralRunnerConfig.runnerGroupId, + labels: ephemeralRunnerConfig.runnerLabels, + }) + : await ghClient.actions.generateRunnerJitconfigForRepo({ + owner: githubRunnerConfig.runnerOwner.split('/')[0], + repo: githubRunnerConfig.runnerOwner.split('/')[1], + name: ephemeralRunnerConfig.runnerName, + runner_group_id: ephemeralRunnerConfig.runnerGroupId, + labels: ephemeralRunnerConfig.runnerLabels, + }); + + // store jit config in ssm parameter store + logger.debug('Runner JIT config for ephemeral runner generated.', { + instance: instance, + }); + await putParameter(`${githubRunnerConfig.ssmTokenPath}/${instance}`, runnerConfig.data.encoded_jit_config, true); + if (isDelay) { + // Delay to prevent AWS ssm rate limits by being within the max throughput limit + await delay(25); + } + } +} diff --git a/lambdas/functions/gh-agent-syncer/package.json b/lambdas/functions/gh-agent-syncer/package.json index f8ba83ce3b..ebe3ad6784 100644 --- a/lambdas/functions/gh-agent-syncer/package.json +++ b/lambdas/functions/gh-agent-syncer/package.json @@ -16,7 +16,7 @@ "all": "yarn build && yarn format && yarn lint && yarn test" }, "devDependencies": { - "@octokit/rest": "^19.0.7", + "@octokit/rest": "^19.0.12", "@trivago/prettier-plugin-sort-imports": "^4.1.1", "@types/aws-lambda": "^8.10.114", "@types/jest": "^29.5.0", diff --git a/lambdas/functions/webhook/package.json b/lambdas/functions/webhook/package.json index cedca35926..3083e6b106 100644 --- a/lambdas/functions/webhook/package.json +++ b/lambdas/functions/webhook/package.json @@ -39,10 +39,11 @@ }, "dependencies": { "@aws-sdk/client-sqs": "^3.315.0", - "@octokit/rest": "19.0.7", + "@octokit/rest": "^19.0.12", + "@octokit/types": "^10.0.0", "@octokit/webhooks": "^11.0.0", - "@terraform-aws-github-runner/aws-ssm-util": "*", 
"@terraform-aws-github-runner/aws-powertools-util": "*", + "@terraform-aws-github-runner/aws-ssm-util": "*", "aws-lambda": "^1.0.7" } } diff --git a/lambdas/functions/webhook/src/webhook/handler.ts b/lambdas/functions/webhook/src/webhook/handler.ts index d78f88b1e6..7b37b14b12 100644 --- a/lambdas/functions/webhook/src/webhook/handler.ts +++ b/lambdas/functions/webhook/src/webhook/handler.ts @@ -11,7 +11,7 @@ const supportedEvents = ['workflow_job']; const logger = createChildLogger('handler'); export async function handle(headers: IncomingHttpHeaders, body: string): Promise { - const { environment, repositoryWhiteList, queuesConfig } = readEnvironmentVariables(); + const { repositoryWhiteList, queuesConfig } = readEnvironmentVariables(); // ensure header keys lower case since github headers can contain capitals. for (const key in headers) { diff --git a/lambdas/libs/aws-powertools-util/src/logger/index.ts b/lambdas/libs/aws-powertools-util/src/logger/index.ts index ee3241eed2..aac25d4ab0 100644 --- a/lambdas/libs/aws-powertools-util/src/logger/index.ts +++ b/lambdas/libs/aws-powertools-util/src/logger/index.ts @@ -4,7 +4,7 @@ import { Context } from 'aws-lambda'; const childLoggers: Logger[] = []; const defaultValues = { - region: process.env.AWS_REGION || 'N/A', + region: process.env.AWS_REGION, environment: process.env.ENVIRONMENT || 'N/A', }; diff --git a/lambdas/package.json b/lambdas/package.json index 15819939ba..0eeaa66f26 100644 --- a/lambdas/package.json +++ b/lambdas/package.json @@ -16,5 +16,8 @@ }, "devDependencies": { "lerna": "^6.6.1" + }, + "resolutions": { + "@octokit/types": "^10.0.0" } } diff --git a/lambdas/yarn.lock b/lambdas/yarn.lock index 6bed0c7d52..2f20452b1c 100644 --- a/lambdas/yarn.lock +++ b/lambdas/yarn.lock @@ -1917,7 +1917,7 @@ dependencies: "@octokit/types" "^9.0.0" -"@octokit/core@^4.0.0", "@octokit/core@^4.1.0": +"@octokit/core@^4.0.0": version "4.2.0" resolved "https://registry.npmjs.org/@octokit/core/-/core-4.2.0.tgz" integrity 
sha512-AgvDRUg3COpR82P7PBdGZF/NNqGmtMq2NiPqeSsDIeCfYFOZ9gddqWNQHnFdEUf+YwOj4aZYmJnlPp7OXmDIDg== @@ -1930,6 +1930,19 @@ before-after-hook "^2.2.0" universal-user-agent "^6.0.0" +"@octokit/core@^4.2.1": + version "4.2.4" + resolved "https://registry.yarnpkg.com/@octokit/core/-/core-4.2.4.tgz#d8769ec2b43ff37cc3ea89ec4681a20ba58ef907" + integrity sha512-rYKilwgzQ7/imScn3M9/pFfUf4I1AZEH3KhyJmtPdE2zfaXAn2mFfUy4FbKewzc2We5y/LlKLj36fWJLKC2SIQ== + dependencies: + "@octokit/auth-token" "^3.0.0" + "@octokit/graphql" "^5.0.0" + "@octokit/request" "^6.0.0" + "@octokit/request-error" "^3.0.0" + "@octokit/types" "^9.0.0" + before-after-hook "^2.2.0" + universal-user-agent "^6.0.0" + "@octokit/endpoint@^7.0.0": version "7.0.5" resolved "https://registry.npmjs.org/@octokit/endpoint/-/endpoint-7.0.5.tgz" @@ -1964,20 +1977,10 @@ "@octokit/types" "^9.0.0" btoa-lite "^1.0.0" -"@octokit/openapi-types@^12.11.0": - version "12.11.0" - resolved "https://registry.npmjs.org/@octokit/openapi-types/-/openapi-types-12.11.0.tgz" - integrity sha512-VsXyi8peyRq9PqIz/tpqiL2w3w80OgVMwBHltTml3LmVvXiphgeqmY9mvBw9Wu7e0QWk/fqD37ux8yP5uVekyQ== - -"@octokit/openapi-types@^14.0.0": - version "14.0.0" - resolved "https://registry.npmjs.org/@octokit/openapi-types/-/openapi-types-14.0.0.tgz" - integrity sha512-HNWisMYlR8VCnNurDU6os2ikx0s0VyEjDYHNS/h4cgb8DeOxQ0n72HyinUtdDVxJhFy3FWLGl0DJhfEWk3P5Iw== - -"@octokit/openapi-types@^17.1.2": - version "17.1.2" - resolved "https://registry.yarnpkg.com/@octokit/openapi-types/-/openapi-types-17.1.2.tgz#b7bc1cc5d3581adac9dce197a21f0e5f2ceaabf1" - integrity sha512-OaS7Ol4Y+U50PbejfzQflGWRMxO04nYWO5ZBv6JerqMKE2WS/tI9VoVDDPXHBlRMGG2fOdKwtVGlFfc7AVIstw== +"@octokit/openapi-types@^18.0.0": + version "18.0.0" + resolved "https://registry.yarnpkg.com/@octokit/openapi-types/-/openapi-types-18.0.0.tgz#f43d765b3c7533fd6fb88f3f25df079c24fccf69" + integrity sha512-V8GImKs3TeQRxRtXFpG2wl19V7444NIOTDF24AWuIbmNaNYOQMWRbjcGDXV5B+0n887fgDcuMNOmlul+k+oJtw== 
"@octokit/plugin-enterprise-rest@6.0.1": version "6.0.1" @@ -1991,12 +1994,21 @@ dependencies: "@octokit/types" "^6.41.0" -"@octokit/plugin-paginate-rest@^6.0.0": - version "6.0.0" - resolved "https://registry.npmjs.org/@octokit/plugin-paginate-rest/-/plugin-paginate-rest-6.0.0.tgz" - integrity sha512-Sq5VU1PfT6/JyuXPyt04KZNVsFOSBaYOAq2QRZUwzVlI10KFvcbUo8lR258AAQL1Et60b0WuVik+zOWKLuDZxw== +"@octokit/plugin-paginate-rest@^6.1.2": + version "6.1.2" + resolved "https://registry.yarnpkg.com/@octokit/plugin-paginate-rest/-/plugin-paginate-rest-6.1.2.tgz#f86456a7a1fe9e58fec6385a85cf1b34072341f8" + integrity sha512-qhrmtQeHU/IivxucOV1bbI/xZyC/iOBhclokv7Sut5vnejAIAEXVcGQeRpQlU39E0WwK9lNvJHphHri/DB6lbQ== dependencies: - "@octokit/types" "^9.0.0" + "@octokit/tsconfig" "^1.0.2" + "@octokit/types" "^9.2.3" + +"@octokit/plugin-paginate-rest@^7.0.0": + version "7.1.2" + resolved "https://registry.yarnpkg.com/@octokit/plugin-paginate-rest/-/plugin-paginate-rest-7.1.2.tgz#1e25e88acab8198858960f6ef7ae02e6c9f7139f" + integrity sha512-Jx8KuKqEAVRsK6fMzZKv3h6UH9/NRDHsDRtUAROqqmZlCptM///Uef7A1ViZ/cbDplekz7VbDWdFLAZ/mpuDww== + dependencies: + "@octokit/tsconfig" "^2.0.0" + "@octokit/types" "^9.3.2" "@octokit/plugin-request-log@^1.0.4": version "1.0.4" @@ -2011,13 +2023,12 @@ "@octokit/types" "^8.1.1" deprecation "^2.3.1" -"@octokit/plugin-rest-endpoint-methods@^7.0.0": - version "7.0.1" - resolved "https://registry.npmjs.org/@octokit/plugin-rest-endpoint-methods/-/plugin-rest-endpoint-methods-7.0.1.tgz" - integrity sha512-pnCaLwZBudK5xCdrR823xHGNgqOzRnJ/mpC/76YPpNP7DybdsJtP7mdOwh+wYZxK5jqeQuhu59ogMI4NRlBUvA== +"@octokit/plugin-rest-endpoint-methods@^7.1.2": + version "7.2.3" + resolved "https://registry.yarnpkg.com/@octokit/plugin-rest-endpoint-methods/-/plugin-rest-endpoint-methods-7.2.3.tgz#37a84b171a6cb6658816c82c4082ac3512021797" + integrity sha512-I5Gml6kTAkzVlN7KCtjOM+Ruwe/rQppp0QU372K1GP7kNOYEKe8Xn5BW4sE62JAHdwpq95OQK/qGNyKQMUzVgA== dependencies: - "@octokit/types" "^9.0.0" - 
deprecation "^2.3.1" + "@octokit/types" "^10.0.0" "@octokit/request-error@^3.0.0", "@octokit/request-error@^3.0.3": version "3.0.3" @@ -2040,6 +2051,16 @@ node-fetch "^2.6.7" universal-user-agent "^6.0.0" +"@octokit/rest@19.0.12": + version "19.0.12" + resolved "https://registry.yarnpkg.com/@octokit/rest/-/rest-19.0.12.tgz#8e1e4319c73a434f29bee2440fd4a1b3c641d69a" + integrity sha512-oOeaUb9mmtLej6r0hjyT0KGN0NaT3BOMKLaloRrETWGeiLBWQaxaf8Z5dh7bduxxntxZoXSW5Sb5oyl0JKKAxg== + dependencies: + "@octokit/core" "^4.2.1" + "@octokit/plugin-paginate-rest" "^7.0.0" + "@octokit/plugin-request-log" "^1.0.4" + "@octokit/plugin-rest-endpoint-methods" "^7.1.2" + "@octokit/rest@19.0.3": version "19.0.3" resolved "https://registry.npmjs.org/@octokit/rest/-/rest-19.0.3.tgz" @@ -2050,36 +2071,32 @@ "@octokit/plugin-request-log" "^1.0.4" "@octokit/plugin-rest-endpoint-methods" "^6.0.0" -"@octokit/rest@19.0.7", "@octokit/rest@^19.0.7": - version "19.0.7" - resolved "https://registry.npmjs.org/@octokit/rest/-/rest-19.0.7.tgz" - integrity sha512-HRtSfjrWmWVNp2uAkEpQnuGMJsu/+dBr47dRc5QVgsCbnIc1+GFEaoKBWkYG+zjrsHpSqcAElMio+n10c0b5JA== +"@octokit/rest@^19.0.12": + version "19.0.13" + resolved "https://registry.yarnpkg.com/@octokit/rest/-/rest-19.0.13.tgz#e799393264edc6d3c67eeda9e5bd7832dcf974e4" + integrity sha512-/EzVox5V9gYGdbAI+ovYj3nXQT1TtTHRT+0eZPcuC05UFSWO3mdO9UY1C0i2eLF9Un1ONJkAk+IEtYGAC+TahA== dependencies: - "@octokit/core" "^4.1.0" - "@octokit/plugin-paginate-rest" "^6.0.0" + "@octokit/core" "^4.2.1" + "@octokit/plugin-paginate-rest" "^6.1.2" "@octokit/plugin-request-log" "^1.0.4" - "@octokit/plugin-rest-endpoint-methods" "^7.0.0" + "@octokit/plugin-rest-endpoint-methods" "^7.1.2" -"@octokit/types@^6.41.0": - version "6.41.0" - resolved "https://registry.npmjs.org/@octokit/types/-/types-6.41.0.tgz" - integrity sha512-eJ2jbzjdijiL3B4PrSQaSjuF2sPEQPVCPzBvTHJD9Nz+9dw2SGH4K4xeQJ77YfTq5bRQ+bD8wT11JbeDPmxmGg== - dependencies: - "@octokit/openapi-types" "^12.11.0" 
+"@octokit/tsconfig@^1.0.2":
+  version "1.0.2"
+  resolved "https://registry.yarnpkg.com/@octokit/tsconfig/-/tsconfig-1.0.2.tgz#59b024d6f3c0ed82f00d08ead5b3750469125af7"
+  integrity sha512-I0vDR0rdtP8p2lGMzvsJzbhdOWy405HcGovrspJ8RRibHnyRgggUSNO5AIox5LmqiwmatHKYsvj6VGFHkqS7lA==
-"@octokit/types@^8.1.1":
-  version "8.2.1"
-  resolved "https://registry.npmjs.org/@octokit/types/-/types-8.2.1.tgz"
-  integrity sha512-8oWMUji8be66q2B9PmEIUyQm00VPDPun07umUWSaCwxmeaquFBro4Hcc3ruVoDo3zkQyZBlRvhIMEYS3pBhanw==
-  dependencies:
-    "@octokit/openapi-types" "^14.0.0"
+"@octokit/tsconfig@^2.0.0":
+  version "2.0.0"
+  resolved "https://registry.yarnpkg.com/@octokit/tsconfig/-/tsconfig-2.0.0.tgz#9f316995c2fd49516b2d2ffb2e02a4341e347fe8"
+  integrity sha512-tWnrai3quGt8+gRN2edzo9fmraWekeryXPeXDomMw2oFSpu/lH3VSWGn/q4V+rwjTRMeeXk/ci623/01Zet4VQ==
-"@octokit/types@^9.0.0":
-  version "9.2.2"
-  resolved "https://registry.yarnpkg.com/@octokit/types/-/types-9.2.2.tgz#d111d33928f288f48083bfe49d8a9a5945e67db1"
-  integrity sha512-9BjDxjgQIvCjNWZsbqyH5QC2Yni16oaE6xL+8SUBMzcYPF4TGQBXGA97Cl3KceK9mwiNMb1mOYCz6FbCCLEL+g==
+"@octokit/types@^10.0.0", "@octokit/types@^6.41.0", "@octokit/types@^8.1.1", "@octokit/types@^9.0.0", "@octokit/types@^9.2.3", "@octokit/types@^9.3.2":
+  version "10.0.0"
+  resolved "https://registry.yarnpkg.com/@octokit/types/-/types-10.0.0.tgz#7ee19c464ea4ada306c43f1a45d444000f419a4a"
+  integrity sha512-Vm8IddVmhCgU1fxC1eyinpwqzXPEYu0NrYzD3YZjlGjyftdLBTeqNblRC0jmJmgxbJIsQlyogVeGnrNaaMVzIg==
   dependencies:
-    "@octokit/openapi-types" "^17.1.2"
+    "@octokit/openapi-types" "^18.0.0"
 "@octokit/webhooks-definitions@^3.67.3":
   version "3.67.3"
diff --git a/main.tf b/main.tf
index dbb704cadb..c9492e8130 100644
--- a/main.tf
+++ b/main.tf
@@ -215,7 +215,7 @@ module "runners" {
   scale_down_schedule_expression = var.scale_down_schedule_expression
   minimum_running_time_in_minutes = var.minimum_running_time_in_minutes
   runner_boot_time_in_minutes = var.runner_boot_time_in_minutes
-  runner_extra_labels = var.runner_extra_labels
+  runner_labels = local.runner_labels
   runner_as_root = var.runner_as_root
   runner_run_as = var.runner_run_as
   runners_maximum_count = var.runners_maximum_count
diff --git a/modules/multi-runner/runners.tf b/modules/multi-runner/runners.tf
index aea680bddd..698876556e 100644
--- a/modules/multi-runner/runners.tf
+++ b/modules/multi-runner/runners.tf
@@ -42,7 +42,7 @@ module "runners" {
   scale_down_schedule_expression = each.value.runner_config.scale_down_schedule_expression
   minimum_running_time_in_minutes = each.value.runner_config.minimum_running_time_in_minutes
   runner_boot_time_in_minutes = each.value.runner_config.runner_boot_time_in_minutes
-  runner_extra_labels = each.value.runner_config.runner_extra_labels
+  runner_labels = "self-hosted,${each.value.runner_config.runner_os},${each.value.runner_config.runner_architecture},${each.value.runner_config.runner_extra_labels}"
   runner_as_root = each.value.runner_config.runner_as_root
   runner_run_as = each.value.runner_config.runner_run_as
   runners_maximum_count = each.value.runner_config.runners_maximum_count
diff --git a/modules/runners/README.md b/modules/runners/README.md
index 1e0faa817d..3873e4d391 100644
--- a/modules/runners/README.md
+++ b/modules/runners/README.md
@@ -184,9 +184,9 @@ yarn run dist
 | [runner\_as\_root](#input\_runner\_as\_root) | Run the action runner under the root user. Variable `runner_run_as` will be ignored. | `bool` | `false` | no |
 | [runner\_boot\_time\_in\_minutes](#input\_runner\_boot\_time\_in\_minutes) | The minimum time for an EC2 runner to boot and register as a runner. | `number` | `5` | no |
 | [runner\_ec2\_tags](#input\_runner\_ec2\_tags) | Map of tags that will be added to the launch template instance tag specifications. | `map(string)` | `{}` | no |
-| [runner\_extra\_labels](#input\_runner\_extra\_labels) | Extra labels for the runners (GitHub). Separate each label by a comma | `string` | `""` | no |
 | [runner\_group\_name](#input\_runner\_group\_name) | Name of the runner group. | `string` | `"Default"` | no |
 | [runner\_iam\_role\_managed\_policy\_arns](#input\_runner\_iam\_role\_managed\_policy\_arns) | Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role | `list(string)` | `[]` | no |
+| [runner\_labels](#input\_runner\_labels) | All the labels for the runners (GitHub) including the default one's(e.g: self-hosted, linux, x64, label1, label2). Separate each label by a comma | `string` | n/a | yes |
 | [runner\_log\_files](#input\_runner\_log\_files) | (optional) List of logfiles to send to CloudWatch, will only be used if `enable_cloudwatch_agent` is set to true. Object description: `log_group_name`: Name of the log group, `prefix_log_group`: If true, the log group name will be prefixed with `/github-self-hosted-runners/`, `file_path`: path to the log file, `log_stream_name`: name of the log stream. |
list(object({
log_group_name = string
prefix_log_group = bool
file_path = string
log_stream_name = string
}))
| `null` | no |
 | [runner\_name\_prefix](#input\_runner\_name\_prefix) | The prefix used for the GitHub runner name. The prefix will be used in the default start script to prefix the instance name when register the runner in GitHub. The value is availabe via an EC2 tag 'ghr:runner\_name\_prefix'. | `string` | `""` | no |
 | [runner\_os](#input\_runner\_os) | The EC2 Operating System type to use for action runner instances (linux,windows). | `string` | `"linux"` | no |
diff --git a/modules/runners/main.tf b/modules/runners/main.tf
index 6da3b4f0e7..1738215639 100644
--- a/modules/runners/main.tf
+++ b/modules/runners/main.tf
@@ -41,6 +41,8 @@ locals {
   ami_filter = merge(local.default_ami[var.runner_os], var.ami_filter)
   enable_job_queued_check = var.enable_job_queued_check == null ? !var.enable_ephemeral_runners : var.enable_job_queued_check
+
+  arn_ssm_parameters_path_config = "arn:${var.aws_partition}:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter${var.ssm_paths.root}/${var.ssm_paths.config}"
 }
 data "aws_ami" "runner" {
diff --git a/modules/runners/policies-runner.tf b/modules/runners/policies-runner.tf
index 2702bdb0d9..0fbdba99f8 100644
--- a/modules/runners/policies-runner.tf
+++ b/modules/runners/policies-runner.tf
@@ -27,7 +27,7 @@ resource "aws_iam_role_policy" "ssm_parameters" {
   policy = templatefile("${path.module}/policies/instance-ssm-parameters-policy.json",
     {
       arn_ssm_parameters_path_tokens = "arn:${var.aws_partition}:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter${var.ssm_paths.root}/${var.ssm_paths.tokens}"
-      arn_ssm_parameters_path_config = "arn:${var.aws_partition}:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter${var.ssm_paths.root}/${var.ssm_paths.config}"
+      arn_ssm_parameters_path_config = local.arn_ssm_parameters_path_config
     }
   )
 }
diff --git a/modules/runners/policies/lambda-scale-up.json b/modules/runners/policies/lambda-scale-up.json
index d6ec6d8561..5892de68c3 100644
--- a/modules/runners/policies/lambda-scale-up.json
+++ b/modules/runners/policies/lambda-scale-up.json
@@ -33,7 +33,8 @@
       ],
       "Resource": [
         "${github_app_key_base64_arn}",
-        "${github_app_id_arn}"
+        "${github_app_id_arn}",
+        "${ssm_config_path}/*"
       ]
     },
     {
diff --git a/modules/runners/pool.tf b/modules/runners/pool.tf
index ce31f90faa..14a6024569 100644
--- a/modules/runners/pool.tf
+++ b/modules/runners/pool.tf
@@ -38,7 +38,7 @@ module "pool" {
       disable_runner_autoupdate = var.disable_runner_autoupdate
       ephemeral = var.enable_ephemeral_runners
       boot_time_in_minutes = var.runner_boot_time_in_minutes
-      extra_labels = var.runner_extra_labels
+      labels = var.runner_labels
       launch_template = aws_launch_template.runner
       group_name = var.runner_group_name
       name_prefix = var.runner_name_prefix
@@ -47,9 +47,11 @@ module "pool" {
     }
     subnet_ids = var.subnet_ids
     ssm_token_path = "${var.ssm_paths.root}/${var.ssm_paths.tokens}"
+    ssm_config_path = "${var.ssm_paths.root}/${var.ssm_paths.config}"
     ami_id_ssm_parameter_name = var.ami_id_ssm_parameter_name
     ami_id_ssm_parameter_read_policy_arn = var.ami_id_ssm_parameter_name != null ? aws_iam_policy.ami_id_ssm_parameter_read[0].arn : null
     tags = local.tags
+    arn_ssm_parameters_path_config = local.arn_ssm_parameters_path_config
   }
   aws_partition = var.aws_partition
diff --git a/modules/runners/pool/README.md b/modules/runners/pool/README.md
index 0b590a2bc7..3e50e60b47 100644
--- a/modules/runners/pool/README.md
+++ b/modules/runners/pool/README.md
@@ -46,7 +46,7 @@ No modules.
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
 | [aws\_partition](#input\_aws\_partition) | (optional) partition for the arn if not 'aws' | `string` | `"aws"` | no |
-| [config](#input\_config) | n/a |
object({
lambda = object({
log_level = string
logging_retention_in_days = number
logging_kms_key_id = string
reserved_concurrent_executions = number
s3_bucket = string
s3_key = string
s3_object_version = string
security_group_ids = list(string)
runtime = string
architecture = string
timeout = number
zip = string
subnet_ids = list(string)
})
tags = map(string)
ghes = object({
url = string
ssl_verify = string
})
github_app_parameters = object({
key_base64 = map(string)
id = map(string)
})
subnet_ids = list(string)
runner = object({
disable_runner_autoupdate = bool
ephemeral = bool
boot_time_in_minutes = number
extra_labels = string
launch_template = object({
name = string
})
group_name = string
name_prefix = string
pool_owner = string
role = object({
arn = string
})
})
instance_types = list(string)
instance_target_capacity_type = string
instance_allocation_strategy = string
instance_max_spot_price = string
prefix = string
pool = list(object({
schedule_expression = string
size = number
}))
role_permissions_boundary = string
kms_key_arn = string
ami_kms_key_arn = string
role_path = string
ssm_token_path = string
ami_id_ssm_parameter_name = string
ami_id_ssm_parameter_read_policy_arn = string
})
| n/a | yes |
+| [config](#input\_config) | n/a |
object({
lambda = object({
log_level = string
logging_retention_in_days = number
logging_kms_key_id = string
reserved_concurrent_executions = number
s3_bucket = string
s3_key = string
s3_object_version = string
security_group_ids = list(string)
runtime = string
architecture = string
timeout = number
zip = string
subnet_ids = list(string)
})
tags = map(string)
ghes = object({
url = string
ssl_verify = string
})
github_app_parameters = object({
key_base64 = map(string)
id = map(string)
})
subnet_ids = list(string)
runner = object({
disable_runner_autoupdate = bool
ephemeral = bool
boot_time_in_minutes = number
labels = string
launch_template = object({
name = string
})
group_name = string
name_prefix = string
pool_owner = string
role = object({
arn = string
})
})
instance_types = list(string)
instance_target_capacity_type = string
instance_allocation_strategy = string
instance_max_spot_price = string
prefix = string
pool = list(object({
schedule_expression = string
size = number
}))
role_permissions_boundary = string
kms_key_arn = string
ami_kms_key_arn = string
role_path = string
ssm_token_path = string
ssm_config_path = string
ami_id_ssm_parameter_name = string
ami_id_ssm_parameter_read_policy_arn = string
arn_ssm_parameters_path_config = string
})
| n/a | yes |
 | [lambda\_tracing\_mode](#input\_lambda\_tracing\_mode) | Enable X-Ray tracing for the lambda functions. | `string` | `null` | no |
 ## Outputs
diff --git a/modules/runners/pool/main.tf b/modules/runners/pool/main.tf
index f0b3e3491d..319b40ed69 100644
--- a/modules/runners/pool/main.tf
+++ b/modules/runners/pool/main.tf
@@ -33,12 +33,13 @@ resource "aws_lambda_function" "pool" {
       PARAMETER_GITHUB_APP_KEY_BASE64_NAME = var.config.github_app_parameters.key_base64.name
       POWERTOOLS_LOGGER_LOG_EVENT = var.config.lambda.log_level == "debug" ? "true" : "false"
       RUNNER_BOOT_TIME_IN_MINUTES = var.config.runner.boot_time_in_minutes
-      RUNNER_EXTRA_LABELS = var.config.runner.extra_labels
+      RUNNER_LABELS = var.config.runner.labels
       RUNNER_GROUP_NAME = var.config.runner.group_name
       RUNNER_NAME_PREFIX = var.config.runner.name_prefix
       RUNNER_OWNER = var.config.runner.pool_owner
       SERVICE_NAME = "runners-pool"
       SSM_TOKEN_PATH = var.config.ssm_token_path
+      SSM_CONFIG_PATH = var.config.ssm_config_path
       SUBNET_IDS = join(",", var.config.subnet_ids)
     }
   }
@@ -78,11 +79,12 @@ resource "aws_iam_role_policy" "pool" {
   name = "${var.config.prefix}-lambda-pool-policy"
   role = aws_iam_role.pool.name
   policy = templatefile("${path.module}/policies/lambda-pool.json", {
-    arn_runner_instance_role = var.config.runner.role.arn
-    github_app_id_arn = var.config.github_app_parameters.id.arn
-    github_app_key_base64_arn = var.config.github_app_parameters.key_base64.arn
-    kms_key_arn = var.config.kms_key_arn
-    ami_kms_key_arn = var.config.ami_kms_key_arn
+    arn_ssm_parameters_path_config = var.config.arn_ssm_parameters_path_config
+    arn_runner_instance_role = var.config.runner.role.arn
+    github_app_id_arn = var.config.github_app_parameters.id.arn
+    github_app_key_base64_arn = var.config.github_app_parameters.key_base64.arn
+    kms_key_arn = var.config.kms_key_arn
+    ami_kms_key_arn = var.config.ami_kms_key_arn
   })
 }
diff --git a/modules/runners/pool/policies/lambda-pool.json b/modules/runners/pool/policies/lambda-pool.json
index 3306892f8c..5bd7bb82f7 100644
--- a/modules/runners/pool/policies/lambda-pool.json
+++ b/modules/runners/pool/policies/lambda-pool.json
@@ -26,6 +26,18 @@
       ],
       "Resource": "*"
     },
+    {
+      "Effect": "Allow",
+      "Action": [
+        "ssm:GetParameter",
+        "ssm:GetParameters",
+        "ssm:GetParametersByPath"
+      ],
+      "Resource": [
+        "${arn_ssm_parameters_path_config}",
+        "${arn_ssm_parameters_path_config}/*"
+      ]
+    },
     {
       "Effect": "Allow",
       "Action": [
diff --git a/modules/runners/pool/variables.tf b/modules/runners/pool/variables.tf
index 29be761d1e..8377ba25dd 100644
--- a/modules/runners/pool/variables.tf
+++ b/modules/runners/pool/variables.tf
@@ -29,7 +29,7 @@ variable "config" {
       disable_runner_autoupdate = bool
       ephemeral = bool
       boot_time_in_minutes = number
-      extra_labels = string
+      labels = string
       launch_template = object({
         name = string
       })
@@ -54,8 +54,10 @@ variable "config" {
     ami_kms_key_arn = string
     role_path = string
     ssm_token_path = string
+    ssm_config_path = string
     ami_id_ssm_parameter_name = string
     ami_id_ssm_parameter_read_policy_arn = string
+    arn_ssm_parameters_path_config = string
   })
 }
diff --git a/modules/runners/scale-up.tf b/modules/runners/scale-up.tf
index c95ce304cd..102c78c5a9 100644
--- a/modules/runners/scale-up.tf
+++ b/modules/runners/scale-up.tf
@@ -34,12 +34,13 @@ resource "aws_lambda_function" "scale_up" {
       PARAMETER_GITHUB_APP_ID_NAME = var.github_app_parameters.id.name
       PARAMETER_GITHUB_APP_KEY_BASE64_NAME = var.github_app_parameters.key_base64.name
       POWERTOOLS_LOGGER_LOG_EVENT = var.log_level == "debug" ? "true" : "false"
-      RUNNER_EXTRA_LABELS = lower(var.runner_extra_labels)
+      RUNNER_LABELS = lower(var.runner_labels)
       RUNNER_GROUP_NAME = var.runner_group_name
       RUNNER_NAME_PREFIX = var.runner_name_prefix
       RUNNERS_MAXIMUM_COUNT = var.runners_maximum_count
       SERVICE_NAME = "runners-scale-up"
       SSM_TOKEN_PATH = "${var.ssm_paths.root}/${var.ssm_paths.tokens}"
+      SSM_CONFIG_PATH = "${var.ssm_paths.root}/${var.ssm_paths.config}"
       SUBNET_IDS = join(",", var.subnet_ids)
     }
   }
@@ -97,6 +98,7 @@ resource "aws_iam_role_policy" "scale_up" {
     sqs_arn = var.sqs_build_queue.arn
     github_app_id_arn = var.github_app_parameters.id.arn
     github_app_key_base64_arn = var.github_app_parameters.key_base64.arn
+    ssm_config_path = "arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter${var.ssm_paths.root}/${var.ssm_paths.config}"
     kms_key_arn = local.kms_key_arn
     ami_kms_key_arn = local.ami_kms_key_arn
   })
diff --git a/modules/runners/templates/start-runner.sh b/modules/runners/templates/start-runner.sh
index 03e1782e50..278b473213 100644
--- a/modules/runners/templates/start-runner.sh
+++ b/modules/runners/templates/start-runner.sh
@@ -80,9 +80,6 @@ fi
 chown -R $run_as .
-echo "Configure GH Runner as user $run_as"
-sudo --preserve-env=RUNNER_ALLOW_RUNASROOT -u "$run_as" -- ./config.sh --unattended --name "$runner_name_prefix$instance_id" --work "_work" $${config}
-
 info_arch=$(uname -p)
 info_os=$(( lsb_release -ds || cat /etc/*release || uname -om ) 2>/dev/null | head -n1 | cut -d "=" -f2- | tr -d '"')
@@ -112,7 +109,8 @@ if [[ $agent_mode = "ephemeral" ]]; then
 cat >/opt/start-runner-service.sh <<-EOF
   echo "Starting the runner in ephemeral mode"
-  sudo --preserve-env=RUNNER_ALLOW_RUNASROOT -u "$run_as" -- ./run.sh
+
+  sudo --preserve-env=RUNNER_ALLOW_RUNASROOT -u "$run_as" -- ./run.sh --jitconfig $${config}
   echo "Runner has finished"
   echo "Stopping cloudwatch service"
@@ -125,6 +123,9 @@ EOF
   nohup /opt/start-runner-service.sh &
 else
+  echo "Configure GH Runner as user $run_as"
+  sudo --preserve-env=RUNNER_ALLOW_RUNASROOT -u "$run_as" -- ./config.sh --unattended --name "$runner_name_prefix$instance_id" --work "_work" $${config}
+
   echo "Installing the runner as a service"
   ./svc.sh install "$run_as"
   echo "Starting the runner in persistent mode"
diff --git a/modules/runners/variables.tf b/modules/runners/variables.tf
index 157a571ee0..d4b369b5c0 100644
--- a/modules/runners/variables.tf
+++ b/modules/runners/variables.tf
@@ -214,10 +214,9 @@ variable "runner_boot_time_in_minutes" {
   default = 5
 }
-variable "runner_extra_labels" {
-  description = "Extra labels for the runners (GitHub). Separate each label by a comma"
+variable "runner_labels" {
+  description = "All the labels for the runners (GitHub) including the default one's(e.g: self-hosted, linux, x64, label1, label2). Separate each label by a comma"
   type = string
-  default = ""
 }
 variable "runner_group_name" {