# This is a combination of 2 commits.
# This is the 1st commit message:

Rebase

# This is the commit message #2:

fix: terraform error
npalm committed Aug 5, 2021
1 parent 1a6324d commit 510db35
Showing 11 changed files with 552 additions and 126 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -36,11 +36,11 @@ A logical question would be why not Kubernetes? In the current approach we stay

## Overview

The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. This module reacts to GitHub's [`check_run` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#check_run) for the triggered workflow and creates a new runner if necessary.
The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. This module reacts to GitHub's [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) for the triggered workflow and creates a new runner if necessary.

For receiving the `check_run` event, a GitHub App needs to be created with a webhook to which the event will be published. Installing the GitHub App in a specific repository or all repositories ensures the `check_run` event will be sent to the webhook.
For receiving the `workflow_job` event, a Webhook needs to be created. The webhook hook can be defined on enterprise, org, repo, or app level. When using the GitHub app ensure the app is installed in the specific repository or all repositories.

In AWS a [API gateway](https://docs.aws.amazon.com/apigateway/index.html) endpoint is created that is able to receive the GitHub webhook events via HTTP post. The gateway triggers the webhook lambda which will verify the signature of the event. This check guarantees the event is sent by the GitHub App. The lambda only handles `check_run` events with status `created`. The accepted events are posted on a SQS queue. Messages on this queue will be delayed for a configurable amount of seconds (default 30 seconds) to give the available runners time to pick up this build.
In AWS a [API gateway](https://docs.aws.amazon.com/apigateway/index.html) endpoint is created that is able to receive the GitHub webhook events via HTTP post. The gateway triggers the webhook lambda which will verify the signature of the event. This check guarantees the event is sent by the GitHub App. The lambda only handles `workflow_job` events with status `queued` and matching the runner labels. The accepted events are posted on a SQS queue. Messages on this queue will be delayed for a configurable amount of seconds (default 30 seconds) to give the available runners time to pick up this build.

The "scale up runner" lambda is listening to the SQS queue and picks up events. The lambda runs various checks to decide whether a new EC2 spot instance needs to be created. For example, the instance is not created if the build is already started by an existing runner, or the maximum number of runners is reached.

@@ -56,7 +56,7 @@ Secrets and private keys are stored in SSM Parameter Store. These values are enc

Permission are managed on several places. Below the most important ones. For details check the Terraform sources.

- The GitHub App requires access to actions and publish `check_run` events to AWS.
- The GitHub App requires access to actions and publish `workflow_job` events to the AWS webhook (API gateway).
- The scale up lambda should have access to EC2 for creating and tagging instances.
- The scale down lambda should have access to EC2 to terminate instances.

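The Overview text in this README hunk describes a webhook lambda that verifies the event signature and only accepts `workflow_job` events that are `queued`. A minimal sketch of those two checks follows, assuming the standard GitHub `X-Hub-Signature-256` and `X-GitHub-Event` headers. The function names `verifySignature` and `shouldEnqueue` and the trimmed `WorkflowJobEvent` type are hypothetical and are not taken from this commit; the sketch checks the payload's `action` field for `queued`.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Hypothetical, trimmed-down shape of the payload fields used in this sketch.
interface WorkflowJobEvent {
  action: string;
  workflow_job: { id: number; labels: string[] };
}

// Returns true when the `X-Hub-Signature-256` header value equals the
// HMAC-SHA256 of the raw request body, keyed with the shared webhook secret.
function verifySignature(secret: string, rawBody: string, signatureHeader: string): boolean {
  const expected = 'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Accepts only `workflow_job` events (name taken from the `X-GitHub-Event`
// header) whose action is `queued`. Label matching is left out here.
function shouldEnqueue(eventName: string, event: WorkflowJobEvent): boolean {
  return eventName === 'workflow_job' && event.action === 'queued';
}
```

The signature must be computed over the raw request body, before any JSON parsing, since re-serializing the payload can change the bytes.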
25 changes: 15 additions & 10 deletions examples/default/main.tf
@@ -7,12 +7,17 @@ resource "random_password" "random" {
length = 28
}

module "runners" {
source = "../../"

aws_region = local.aws_region
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
################################################################################
### Hybrid acccount
################################################################################

module "runners" {
source = "../../"
create_service_linked_role_spot = true
aws_region = local.aws_region
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets

environment = local.environment
tags = {
@@ -27,11 +32,11 @@ module "runners" {
webhook_secret = random_password.random.result
}

webhook_lambda_zip = "lambdas-download/webhook.zip"
runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
runners_lambda_zip = "lambdas-download/runners.zip"
enable_organization_runners = false
runner_extra_labels = "default,example"
# webhook_lambda_zip = "lambdas-download/webhook.zip"
# runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
# runners_lambda_zip = "lambdas-download/runners.zip"
enable_organization_runners = true
runner_extra_labels = "default,example"

# enable access to the runners via SSM
enable_ssm_on_runners = true
35 changes: 26 additions & 9 deletions modules/runners/lambdas/runners/src/scale-runners/scale-up.test.ts
@@ -10,6 +10,7 @@ const mockOctokit = {
actions: {
createRegistrationTokenForOrg: jest.fn(),
createRegistrationTokenForRepo: jest.fn(),
getJobForWorkflowRun: jest.fn(),
},
apps: {
getOrgInstallation: jest.fn(),
@@ -30,15 +31,15 @@ const mockCreateClient = mocked(ghAuth.createOctoClient, true);

const TEST_DATA: scaleUpModule.ActionRequestMessage = {
id: 1,
eventType: 'check_run',
eventType: 'workflow_job',
repositoryName: 'hello-world',
repositoryOwner: 'Codertocat',
installationId: 2,
};

const TEST_DATA_WITHOUT_INSTALL_ID: scaleUpModule.ActionRequestMessage = {
id: 3,
eventType: 'check_run',
eventType: 'workflow_job',
repositoryName: 'hello-world',
repositoryOwner: 'Codertocat',
installationId: 0,
@@ -69,6 +70,12 @@ beforeEach(() => {
process.env.ENVIRONMENT = 'unit-test-environment';
process.env.LAUNCH_TEMPLATE_NAME = 'lt-1,lt-2';

mockOctokit.actions.getJobForWorkflowRun.mockImplementation(() => ({
data: {
status: 'queued',
},
}));

mockOctokit.checks.get.mockImplementation(() => ({
data: {
status: 'queued',
@@ -126,16 +133,16 @@ describe('scaleUp with GHES', () => {

it('checks queued workflows', async () => {
await scaleUpModule.scaleUp('aws:sqs', TEST_DATA);
expect(mockOctokit.checks.get).toBeCalledWith({
check_run_id: TEST_DATA.id,
expect(mockOctokit.actions.getJobForWorkflowRun).toBeCalledWith({
job_id: TEST_DATA.id,
owner: TEST_DATA.repositoryOwner,
repo: TEST_DATA.repositoryName,
});
});

it('does not list runners when no workflows are queued', async () => {
mockOctokit.checks.get.mockImplementation(() => ({
data: { total_count: 0, runners: [] },
mockOctokit.actions.getJobForWorkflowRun.mockImplementation(() => ({
data: { total_count: 0 },
}));
await scaleUpModule.scaleUp('aws:sqs', TEST_DATA);
expect(listRunners).not.toBeCalled();
@@ -200,6 +207,11 @@ describe('scaleUp with GHES', () => {
expect(createRunner).toBeCalledWith(expectedRunnerParams, 'lt-1');
});

it('creates a runner with legacy event check_run', async () => {
await scaleUpModule.scaleUp('aws:sqs', { ...TEST_DATA, eventType: 'check_run' });
expect(createRunner).toBeCalledWith(expectedRunnerParams, 'lt-1');
});

it('creates a runner with labels in a specific group', async () => {
process.env.RUNNER_EXTRA_LABELS = 'label1,label2';
process.env.RUNNER_GROUP_NAME = 'TEST_GROUP';
@@ -339,8 +351,8 @@ describe('scaleUp with public GH', () => {

it('checks queued workflows', async () => {
await scaleUpModule.scaleUp('aws:sqs', TEST_DATA);
expect(mockOctokit.checks.get).toBeCalledWith({
check_run_id: TEST_DATA.id,
expect(mockOctokit.actions.getJobForWorkflowRun).toBeCalledWith({
job_id: TEST_DATA.id,
owner: TEST_DATA.repositoryOwner,
repo: TEST_DATA.repositoryName,
});
@@ -363,7 +375,7 @@ describe('scaleUp with public GH', () => {
});

it('does not list runners when no workflows are queued', async () => {
mockOctokit.checks.get.mockImplementation(() => ({
mockOctokit.actions.getJobForWorkflowRun.mockImplementation(() => ({
data: { status: 'completed' },
}));
await scaleUpModule.scaleUp('aws:sqs', TEST_DATA);
@@ -406,6 +418,11 @@ describe('scaleUp with public GH', () => {
expect(createRunner).toBeCalledWith(expectedRunnerParams, LAUNCH_TEMPLATE);
});

it('creates a runner with legacy event check_run', async () => {
await scaleUpModule.scaleUp('aws:sqs', { ...TEST_DATA, eventType: 'check_run' });
expect(createRunner).toBeCalledWith(expectedRunnerParams, LAUNCH_TEMPLATE);
});

it('creates a runner with labels in s specific group', async () => {
process.env.RUNNER_EXTRA_LABELS = 'label1,label2';
process.env.RUNNER_GROUP_NAME = 'TEST_GROUP';
60 changes: 40 additions & 20 deletions modules/runners/lambdas/runners/src/scale-runners/scale-up.ts
@@ -1,10 +1,11 @@
import { listRunners, createRunner, RunnerInputParameters } from './runners';
import { createOctoClient, createGithubAuth } from './gh-auth';
import yn from 'yn';
import { Octokit } from '@octokit/rest';

export interface ActionRequestMessage {
id: number;
eventType: string;
eventType: 'check_run' | 'workflow_job';
repositoryName: string;
repositoryOwner: string;
installationId: number;
@@ -30,31 +31,27 @@ export const scaleUp = async (eventSource: string, payload: ActionRequestMessage
const githubClient = await createOctoClient(ghAuth.token, ghesApiUrl);
installationId = enableOrgLevel
? (
await githubClient.apps.getOrgInstallation({
org: payload.repositoryOwner,
})
).data.id
await githubClient.apps.getOrgInstallation({
org: payload.repositoryOwner,
})
).data.id
: (
await githubClient.apps.getRepoInstallation({
owner: payload.repositoryOwner,
repo: payload.repositoryName,
})
).data.id;
await githubClient.apps.getRepoInstallation({
owner: payload.repositoryOwner,
repo: payload.repositoryName,
})
).data.id;
}

const ghAuth = await createGithubAuth(installationId, 'installation', ghesApiUrl);

const githubInstallationClient = await createOctoClient(ghAuth.token, ghesApiUrl);
const checkRun = await githubInstallationClient.checks.get({
check_run_id: payload.id,
owner: payload.repositoryOwner,
repo: payload.repositoryName,
});

const runnerType = enableOrgLevel ? 'Org' : 'Repo';
const runnerOwner = enableOrgLevel ? payload.repositoryOwner : `${payload.repositoryOwner}/${payload.repositoryName}`;

if (checkRun.data.status === 'queued') {
const isQueued = await getJobStatus(githubInstallationClient, payload);
if (isQueued) {
const currentRunners = await listRunners({
environment,
runnerType,
@@ -67,9 +64,9 @@ export const scaleUp = async (eventSource: string, payload: ActionRequestMessage
const registrationToken = enableOrgLevel
? await githubInstallationClient.actions.createRegistrationTokenForOrg({ org: payload.repositoryOwner })
: await githubInstallationClient.actions.createRegistrationTokenForRepo({
owner: payload.repositoryOwner,
repo: payload.repositoryName,
});
owner: payload.repositoryOwner,
repo: payload.repositoryName,
});
const token = registrationToken.data.token;

const labelsArgument = runnerExtraLabels !== undefined ? `--labels ${runnerExtraLabels}` : '';
@@ -81,7 +78,7 @@ export const scaleUp = async (eventSource: string, payload: ActionRequestMessage
runnerServiceConfig: enableOrgLevel
? `--url ${configBaseUrl}/${payload.repositoryOwner} --token ${token} ${labelsArgument}${runnerGroupArgument}`
: `--url ${configBaseUrl}/${payload.repositoryOwner}/${payload.repositoryName} ` +
`--token ${token} ${labelsArgument}`,
`--token ${token} ${labelsArgument}`,
runnerOwner,
runnerType,
});
@@ -91,6 +88,29 @@ export const scaleUp = async (eventSource: string, payload: ActionRequestMessage
}
};

async function getJobStatus(githubInstallationClient: Octokit, payload: ActionRequestMessage): Promise<boolean> {
let isQueued = false;
if (payload.eventType === 'workflow_job') {
const jobForWorkflowRun = await githubInstallationClient.actions.getJobForWorkflowRun({
job_id: payload.id,
owner: payload.repositoryOwner,
repo: payload.repositoryName,
});
isQueued = jobForWorkflowRun.data.status === 'queued';
} else if (payload.eventType === 'check_run') {
const checkRun = await githubInstallationClient.checks.get({
check_run_id: payload.id,
owner: payload.repositoryOwner,
repo: payload.repositoryName,
});
isQueued = checkRun.data.status === 'queued';
} else {
throw Error(`Event ${payload.eventType} is not supported`);
}

return isQueued;
}

export async function createRunnerLoop(runnerParameters: RunnerInputParameters): Promise<void> {
const launchTemplateNames = process.env.LAUNCH_TEMPLATE_NAME?.split(',') as string[];
let launched = false;
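The `getJobStatus` helper added in this file dispatches on `eventType` and throws for anything else. Because `eventType` is now the union `'check_run' | 'workflow_job'`, the same dispatch could also be written as a `switch` with a `never` check, so that a new event type without a matching case fails at compile time. The sketch below illustrates that alternative. `StatusClient`, `Message`, and `isQueued` are hypothetical stand-ins and not part of this commit; the real code calls Octokit directly.

```typescript
type EventType = 'check_run' | 'workflow_job';

// Hypothetical subset of ActionRequestMessage used by this sketch.
interface Message {
  id: number;
  eventType: EventType;
  repositoryOwner: string;
  repositoryName: string;
}

// Hypothetical minimal client: each lookup resolves to a status string.
interface StatusClient {
  jobStatus(owner: string, repo: string, jobId: number): Promise<string>;
  checkRunStatus(owner: string, repo: string, checkRunId: number): Promise<string>;
}

async function isQueued(client: StatusClient, msg: Message): Promise<boolean> {
  switch (msg.eventType) {
    case 'workflow_job':
      return (await client.jobStatus(msg.repositoryOwner, msg.repositoryName, msg.id)) === 'queued';
    case 'check_run':
      return (await client.checkRunStatus(msg.repositoryOwner, msg.repositoryName, msg.id)) === 'queued';
    default: {
      // Exhaustiveness check: this assignment stops compiling if EventType
      // gains a member that has no case above. At runtime it still guards
      // against untyped input such as a malformed queue message.
      const unsupported: never = msg.eventType;
      throw new Error(`Event ${String(unsupported)} is not supported`);
    }
  }
}
```

A stub that implements `StatusClient` with fixed return values is enough to exercise both branches without network access.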
1 change: 1 addition & 0 deletions modules/webhook/README.md
@@ -78,6 +79,7 @@ No Modules.
| repository\_white\_list | List of repositories allowed to use the github app | `list(string)` | `[]` | no |
| role\_path | The path that will be added to the role, if not set the environment name will be used. | `string` | `null` | no |
| role\_permissions\_boundary | Permissions boundary that will be added to the created role for the lambda. | `string` | `null` | no |
| runner\_extra\_labels | Extra labels for the runners (GitHub). Separate each label by a comma | `string` | `""` | no |
| sqs\_build\_queue | SQS queue to publish accepted build events. | <pre>object({<br> id = string<br> arn = string<br> })</pre> | n/a | yes |
| tags | Map of tags that will be added to created resources. By default resources will be tagged with name and environment. | `map(string)` | `{}` | no |
| webhook\_lambda\_s3\_key | S3 key for webhook lambda function. Required if using S3 bucket to specify lambdas. | `any` | `null` | no |
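The new `runner_extra_labels` input is a comma-separated string, and the README change says the webhook only accepts jobs "matching the runner labels". The webhook source is not part of the hunks shown here, so the rule below is an assumption made purely for illustration: a job matches when every label it requests is either `self-hosted` or one of the extra labels, compared case-insensitively. `parseLabels` and `labelsMatch` are hypothetical names.

```typescript
// Splits a comma-separated label string and drops empty entries, so the
// default value "" yields an empty list.
function parseLabels(runnerExtraLabels: string): string[] {
  return runnerExtraLabels
    .split(',')
    .map((label) => label.trim())
    .filter((label) => label.length > 0);
}

// Assumed rule (not confirmed by this commit): every label requested by the
// job must be offered by the runners. `self-hosted` is treated as implicit.
function labelsMatch(jobLabels: string[], runnerExtraLabels: string): boolean {
  const offered = new Set(
    ['self-hosted', ...parseLabels(runnerExtraLabels)].map((label) => label.toLowerCase()),
  );
  return jobLabels.every((label) => offered.has(label.toLowerCase()));
}
```

Under this assumed rule, a job that requests `[self-hosted, default]` would match runners configured with `"default,example"`, while a job that also requests an unlisted label would be ignored by the webhook.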