feat: support AWS EventBridge #4188

Merged on Oct 28, 2024 (40 commits).

Commits:
- d309e3d feat: Adding support eventbridge (npalm, Oct 16, 2024)
- 9c54c68 add tests (npalm, Oct 16, 2024)
- cabcb69 add tests (npalm, Oct 16, 2024)
- 9cff596 add tests (npalm, Oct 16, 2024)
- e41af5e refactor / clean terraform code for webhook (npalm, Oct 17, 2024)
- c0fe919 update toplevel modules (npalm, Oct 17, 2024)
- 6649aa9 docs: auto update terraform docs (Oct 17, 2024)
- bd15fc9 rename entry points lambda (npalm, Oct 17, 2024)
- 0c342d6 clean (npalm, Oct 17, 2024)
- 55ccf5d code cleanup (npalm, Oct 17, 2024)
- 1041336 adjust permissions for lambda (npalm, Oct 17, 2024)
- d1ad024 enable eventbride mode for multi-runner (npalm, Oct 17, 2024)
- 8ef42b5 format code (npalm, Oct 17, 2024)
- 2f8eebc docs: auto update terraform docs (Oct 17, 2024)
- 2f509fe pass allowed events to lambda (npalm, Oct 17, 2024)
- 36a224f docs: auto update terraform docs (Oct 17, 2024)
- 49db66c reset default example (npalm, Oct 17, 2024)
- 3b5b3d5 update docs (npalm, Oct 17, 2024)
- 0c16c55 adjust outputs (npalm, Oct 18, 2024)
- 0c8e93f remove commented code (npalm, Oct 21, 2024)
- 5f1f0fe Introduce object to configure eventbridge (npalm, Oct 21, 2024)
- 27c92a9 docs: auto update terraform docs (Oct 21, 2024)
- 6ac6580 Update lambdas/functions/webhook/src/lambda.test.ts (npalm, Oct 21, 2024)
- c7e7d3b Update docs/index.md (npalm, Oct 21, 2024)
- 8aa9d9e Apply suggestions from code review (npalm, Oct 22, 2024)
- 5c107e3 review suggestions (npalm, Oct 22, 2024)
- f5001db docs: auto update terraform docs (Oct 22, 2024)
- f329b4c Merge branch 'main' into npalm/eventbridge (npalm, Oct 22, 2024)
- 47cb781 review suggestions (npalm, Oct 23, 2024)
- 0deeee8 docs: auto update terraform docs (Oct 23, 2024)
- 94ccd52 Merge branch 'main' into npalm/eventbridge (stuartp44, Oct 24, 2024)
- 1d224e8 Fix lock file (stuartp44, Oct 24, 2024)
- ec8fe8b Apply suggestions from code review (npalm, Oct 24, 2024)
- 21abef5 docs: auto update terraform docs (Oct 24, 2024)
- 8526ef0 add logging for dispatching (npalm, Oct 24, 2024)
- c3ea002 dedupe lock file, and fix dependency warnings (npalm, Oct 24, 2024)
- 8279322 typos (npalm, Oct 24, 2024)
- 2d17a5b docs: auto update terraform docs (Oct 24, 2024)
- b8b33ae Merge branch 'main' into npalm/eventbridge (npalm, Oct 25, 2024)
- 8841c26 docs: auto update terraform docs (Oct 25, 2024)
22 changes: 21 additions & 1 deletion .terraform.lock.hcl

1 change: 1 addition & 0 deletions README.md
@@ -157,6 +157,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
| <a name="input_enable_user_data_debug_logging_runner"></a> [enable\_user\_data\_debug\_logging\_runner](#input\_enable\_user\_data\_debug\_logging\_runner) | Option to enable debug logging for user-data, this logs all secrets as well. | `bool` | `false` | no |
| <a name="input_enable_userdata"></a> [enable\_userdata](#input\_enable\_userdata) | Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI. | `bool` | `true` | no |
| <a name="input_enable_workflow_job_events_queue"></a> [enable\_workflow\_job\_events\_queue](#input\_enable\_workflow\_job\_events\_queue) | Enabling this experimental feature will create a secondary SQS queue to which a copy of the workflow\_job event will be delivered. | `bool` | `false` | no |
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. When this feature is enabled, the webhook puts events on the EventBridge<br/> instead of dispatching them directly to the queues for scaling.<br/><br/> `enable`: Enable the EventBridge feature.<br/> `accept_events`: List that can be used to allow only specific events to be put on the EventBridge. By default (`null`) all events are accepted; an empty list is also interpreted as all events. | <pre>object({<br/> enable = optional(bool, false)<br/> accept_events = optional(list(string), null)<br/> })</pre> | `{}` | no |
| <a name="input_ghes_ssl_verify"></a> [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no |
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br/> key_base64 = string<br/> id = string<br/> webhook_secret = string<br/> })</pre> | n/a | yes |
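The `accept_events` semantics described in the `eventbridge` row above can be sketched as a small predicate. This is a hypothetical helper (`isEventAccepted` is not a function from the module), assuming the event type is the GitHub event name such as `workflow_job`; the real webhook code may be structured differently.

```typescript
// Hypothetical predicate illustrating the documented `accept_events` behaviour.
// Not taken from the module; the real webhook implementation may differ.
function isEventAccepted(
  eventType: string,
  acceptEvents: string[] | null | undefined,
): boolean {
  // The default (null) and an empty list are both interpreted as "all events".
  if (acceptEvents == null || acceptEvents.length === 0) {
    return true;
  }
  // Otherwise only explicitly listed event types are put on the EventBridge.
  return acceptEvents.includes(eventType);
}

// Usage: with accept_events = ["workflow_job"], a `push` event is dropped.
console.log(isEventAccepted("push", ["workflow_job"])); // false
console.log(isEventAccepted("push", null)); // true
```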
Binary file modified docs/assets/aws-architecture.dark.png
Binary file modified docs/assets/aws-architecture.light.png
77 changes: 76 additions & 1 deletion docs/configuration.md
@@ -6,7 +6,7 @@ To be able to support a number of use-cases, the module has quite a lot of confi

- Org vs Repo level. You can configure the module to connect the runners in GitHub on an org level and share the runners in your org, or set the runners on repo level and the module will install the runner to the repo. There can be multiple repos but runners are not shared between repos.
- Multi-Runner module. This module allows you to create multiple runner configurations with a single webhook and single GitHub App to simplify deployment of different types of runners. Check the detailed module [documentation](modules/public/multi-runner.md) for more information or check out the [multi-runner example](examples/multi-runner.md).
- Workflow job event. You can configure the webhook in GitHub to send workflow job events to the webhook. Workflow job events were introduced by GitHub in September 2021 and are designed to support scalable runners. We advise using the workflow job event when possible.
- Webhook mode. The module can be deployed in `direct` or `eventbridge` (Experimental) mode. The `direct` mode is the default and sends events directly to SQS for the scale-up lambda. The `eventbridge` mode publishes the event to an event bus; a rule on the bus sends the events to a dispatch lambda, which sends them to the SQS queue. The `eventbridge` mode is useful when you want more control over the events and potentially want to filter them. The `eventbridge` mode is disabled by default. We expect the `eventbridge` mode to be the future direction for building a data lake, building metrics, acting on `workflow_job` job started events, etc.
- Linux vs Windows. You can configure the OS types linux and win. Linux will be used by default.
- Re-use vs Ephemeral. By default runners are re-used until detected idle. Once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners only work in combination with the workflow job event. For ephemeral runners the lambda requests a JIT (just in time) configuration via the GitHub API to register the runner. [JIT configuration](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-just-in-time-runners) is limited to ephemeral runners (and currently not supported by GHES). For non-ephemeral runners, a registration token is always requested. In both cases the configuration is made available to the instance via the same SSM parameter. To disable JIT configuration for ephemeral runners set `enable_jit_config` to `false`. We also suggest using a prebuilt AMI to improve the start time of jobs for ephemeral runners.
- Job retry (**Beta**). By default the scale-up lambda discards the message once it is handled. In the ephemeral use case this means an instance is created, but the created runner asks GitHub for a job, and there is no guarantee it will run the job for which it was scaled. As a result, after a small system hiccup a job can be left waiting for a runner. Enabling a pool (org runners) is one option to avoid this problem. Another option is to enable the job retry function, which retries the job after a delay for a configured number of times.
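The difference between the two webhook modes described in the list above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the module's code: `FakeQueue` and `FakeEventBus` are hypothetical stand-ins for SQS and EventBridge, label matching is left out, and the dispatcher is assumed to forward only `workflow_job` events.

```typescript
// Minimal in-memory model of the two webhook modes. FakeQueue and FakeEventBus
// are hypothetical stand-ins for SQS and EventBridge, not AWS SDK clients.
type WebhookMode = "direct" | "eventbridge";

interface GitHubEvent {
  eventType: string; // e.g. the value of the X-GitHub-Event header
  payload: Record<string, unknown>;
}

class FakeQueue {
  readonly messages: GitHubEvent[] = [];
  send(event: GitHubEvent): void {
    this.messages.push(event);
  }
}

class FakeEventBus {
  // Each target stands in for one EventBridge rule together with its target.
  private readonly targets: Array<(event: GitHubEvent) => void> = [];
  addTarget(target: (event: GitHubEvent) => void): void {
    this.targets.push(target);
  }
  put(event: GitHubEvent): void {
    for (const target of this.targets) {
      target(event);
    }
  }
}

// Webhook: `direct` sends straight to the scale-up queue, `eventbridge` publishes
// to the bus and leaves routing to the rules. Label matching is omitted here.
function handleWebhook(
  mode: WebhookMode,
  event: GitHubEvent,
  queue: FakeQueue,
  bus: FakeEventBus,
): void {
  if (mode === "direct") {
    queue.send(event);
  } else {
    bus.put(event);
  }
}

// Dispatch lambda (assumption: it forwards only workflow_job events to the queue).
function createDispatcher(queue: FakeQueue): (event: GitHubEvent) => void {
  return (event) => {
    if (event.eventType === "workflow_job") {
      queue.send(event);
    }
  };
}

// Usage: one rule feeds the dispatcher, a second rule archives every event.
const queue = new FakeQueue();
const bus = new FakeEventBus();
const archive: GitHubEvent[] = [];
bus.addTarget(createDispatcher(queue));
bus.addTarget((event) => {
  archive.push(event);
});

handleWebhook("eventbridge", { eventType: "workflow_job", payload: { action: "queued" } }, queue, bus);
handleWebhook("eventbridge", { eventType: "push", payload: {} }, queue, bus);
console.log(queue.messages.length, archive.length); // 1 2
```

The second rule shows why the `eventbridge` mode gives more control: additional consumers can subscribe to the bus without changing the webhook or the scale-up path.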
@@ -259,8 +259,83 @@ Below an example of the the log messages created.
}
```

### EventBridge

The module can be deployed in `eventbridge` mode (Experimental). In this mode the webhook publishes the event to an event bus; a rule on the bus sends the events to a dispatch lambda, which sends them on to the SQS queue. The `eventbridge` mode is disabled by default. We expect the `eventbridge` mode to be the future direction for building a data lake, building metrics, acting on `workflow_job` job started events, etc.

Example of using the EventBridge:

```hcl
module "runners" {
  source = "philips-labs/github-runners/aws"

  # ...
  eventbridge = {
    enable = true
  }
  # ...
}

locals {
  # `local.prefix` is assumed to be defined elsewhere in your configuration.
  event_bus_name = module.runners.webhook.eventbridge.event_bus.name
}

resource "aws_cloudwatch_event_rule" "example" {
  name           = "${local.prefix}-github-events-all"
  description    = "Capture all GitHub events"
  event_bus_name = local.event_bus_name
  event_pattern  = <<EOF
{
  "source": [{
    "prefix": "github"
  }]
}
EOF
}

resource "aws_cloudwatch_event_target" "main" {
  rule           = aws_cloudwatch_event_rule.example.name
  arn            = "<arn of target>" # replace with the ARN of your target
  event_bus_name = local.event_bus_name
  role_arn       = aws_iam_role.event_rule_firehose_role.arn
}

data "aws_iam_policy_document" "event_rule_firehose_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "event_rule_firehose_role" {
  name               = "${local.prefix}-eventbridge-github-rule"
  assume_role_policy = data.aws_iam_policy_document.event_rule_firehose_role.json
}

data "aws_iam_policy_document" "firehose_stream" {
  statement {
    # Insert your policy here to grant access to the target, for example
    # for a Kinesis Data Firehose delivery stream:
    actions   = ["firehose:PutRecord", "firehose:PutRecordBatch"]
    resources = ["<arn of target>"]
  }
}

resource "aws_iam_role_policy" "event_rule_firehose_role" {
  name   = "target-event-rule-firehose"
  role   = aws_iam_role.event_rule_firehose_role.name
  policy = data.aws_iam_policy_document.firehose_stream.json
}
```
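The rule in the example above selects events by applying an EventBridge `prefix` filter to the `source` field. To illustrate what that pattern captures, the sketch below implements only exact-string and `prefix` filters on top-level string fields; EventBridge supports many more pattern operators, and the `source` value `github.com` used here is a hypothetical example, not necessarily what the module publishes.

```typescript
// Illustrative subset of EventBridge event-pattern matching: exact string values
// and {"prefix": ...} filters on top-level string fields only. Not a full matcher.
type FieldFilter = string | { prefix: string };
type EventPattern = Record<string, FieldFilter[]>;

function matchesPattern(event: Record<string, unknown>, pattern: EventPattern): boolean {
  // Every field named in the pattern must match (AND across fields) ...
  return Object.entries(pattern).every(([field, filters]) => {
    const value = event[field];
    if (typeof value !== "string") {
      return false;
    }
    const text: string = value;
    // ... and within one field, any single filter may match (OR across filters).
    return filters.some((filter) =>
      typeof filter === "string" ? filter === text : text.startsWith(filter.prefix),
    );
  });
}

// The same pattern as the event_pattern of the aws_cloudwatch_event_rule above.
const captureAllGitHub: EventPattern = { source: [{ prefix: "github" }] };

// "github.com" is a hypothetical source value used only for this example.
console.log(matchesPattern({ source: "github.com", "detail-type": "workflow_job" }, captureAllGitHub)); // true
console.log(matchesPattern({ source: "aws.ec2" }, captureAllGitHub)); // false
```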

### Queue to publish workflow job events

!!! warning "Deprecated"

    This feature will be removed now that the EventBridge has been introduced. The same functionality can be implemented by adding a rule to the EventBridge that forwards `workflow_job` events to the SQS queue.

This queue is an experimental feature that allows you to receive a copy of the `workflow_job` events sent by the GitHub App. This can be used to calculate a matrix or monitor the system.

To enable the feature set `enable_workflow_job_events_queue = true`. Be aware though, this feature is experimental!
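As an illustration of the monitoring use case mentioned above, the sketch below tallies received `workflow_job` events by their `action` field. It assumes each message body is the GitHub webhook payload; the `WorkflowJobEvent` type is a hypothetical simplification, and any envelope the queue or the EventBridge adds around the payload is not modelled.

```typescript
// Hypothetical, simplified shape of a workflow_job webhook payload; the real
// payload (and any envelope added on the queue) contains many more fields.
interface WorkflowJobEvent {
  action: string; // "queued", "waiting", "in_progress" or "completed"
  workflow_job: { id: number; labels: string[] };
}

// Count events per action, for example to see how many jobs are still queued.
function tallyByAction(events: WorkflowJobEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    counts.set(event.action, (counts.get(event.action) ?? 0) + 1);
  }
  return counts;
}

const sample: WorkflowJobEvent[] = [
  { action: "queued", workflow_job: { id: 1, labels: ["self-hosted", "linux"] } },
  { action: "in_progress", workflow_job: { id: 1, labels: ["self-hosted", "linux"] } },
  { action: "queued", workflow_job: { id: 2, labels: ["self-hosted", "linux"] } },
];
console.log(tallyByAction(sample).get("queued")); // 2
```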
2 changes: 1 addition & 1 deletion docs/index.md
@@ -31,7 +31,7 @@ The diagram below shows the architecture of the module, groups are indicating th

### Webhook

The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. This module reacts to GitHub's [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) for the triggered workflow and creates a new runner if necessary.
The moment a GitHub action workflow requiring a `self-hosted` runner is triggered, GitHub will try to find a runner which can execute the workload. See [additional notes](additional_notes.md) for how the selection is made. The module can be deployed in two modes. In the first mode, `direct`, the module accepts the [`workflow_job` event](https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/webhook-events-and-payloads#workflow_job) and dispatches it to an SQS queue on which the scale-up function acts. In the second mode, `eventbridge`, events are funneled via the AWS EventBridge. The EventBridge makes it possible to act on events other than the `workflow_job` event with status `queued`. Besides that, the EventBridge supports replay functionality. For future extensions that act on events or create a data lake, we will rely on the EventBridge.
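The scale-up path described above only reacts to a `workflow_job` event whose action is `queued`. A minimal sketch of that check, assuming the event name comes from the `X-GitHub-Event` header and the status from the payload's `action` field; `shouldScaleUp` is a hypothetical name, not a function from the module.

```typescript
// Hypothetical check: only a queued workflow_job should trigger a scale-up.
interface IncomingEvent {
  eventName: string; // value of the X-GitHub-Event header
  payload: { action?: string };
}

function shouldScaleUp(event: IncomingEvent): boolean {
  return event.eventName === "workflow_job" && event.payload.action === "queued";
}

console.log(shouldScaleUp({ eventName: "workflow_job", payload: { action: "queued" } })); // true
console.log(shouldScaleUp({ eventName: "workflow_job", payload: { action: "completed" } })); // false
```

Events rejected by such a check are simply dropped in `direct` mode, whereas in `eventbridge` mode they remain on the bus, where other rules can still act on them.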

To receive the `workflow_job` event in the webhook (lambda), a webhook needs to be created in GitHub. The same app used for API calls can be used to create the webhook, or a dedicated webhook can be defined.

19 changes: 18 additions & 1 deletion examples/default/main.tf
@@ -97,8 +97,21 @@ module "runners" {
# prefix GitHub runners with the environment name
runner_name_prefix = "${local.environment}_"

# The webhook supports two modes, either direct or via the EventBridge; uncomment to enable the EventBridge
# eventbridge = {
# enable = true
# # adjust the accepted events to only allow specific events, like workflow_job
# # accept_events = ["workflow_job"]
# }

# Enable debug logging for the lambda functions
log_level = "info"
# log_level = "debug"

# tracing_config = {
# mode = "Active"
# capture_error = true
# capture_http_requests = true
# }

enable_ami_housekeeper = true
ami_housekeeper_cleanup_config = {
@@ -158,3 +171,7 @@ module "webhook_github_app" {
# name = "alias/github/action-runners"
# target_key_id = aws_kms_key.github.key_id
# }
# moved {
# from = module.runners.module.webhook.aws_lambda_function.webhook
# to = module.runners.module.webhook.module.webhook.aws_lambda_function.webhook
# }
8 changes: 8 additions & 0 deletions examples/multi-runner/main.tf
@@ -77,6 +77,14 @@ module "runners" {
id = var.github_app.id
webhook_secret = random_id.random.hex
}

# Deploy webhook using the EventBridge
eventbridge = {
enable = true
# adjust the accepted events to only allow specific events, like workflow_job
accept_events = ["workflow_job"]
}

# enable this section for tracing
# tracing_config = {
# mode = "Active"
2 changes: 1 addition & 1 deletion lambdas/functions/ami-housekeeper/package.json
@@ -41,7 +41,7 @@
"@aws-github-runner/aws-ssm-util": "*",
"@aws-sdk/client-ec2": "^3.670.0",
"@aws-sdk/client-ssm": "^3.670.0",
"@aws-sdk/types": "^3.664.0",
"@aws-sdk/types": "^3.667.0",
"cron-parser": "^4.9.0",
"typescript": "^5.5.4"
},
2 changes: 1 addition & 1 deletion lambdas/functions/control-plane/package.json
@@ -43,7 +43,7 @@
"@aws-lambda-powertools/parameters": "^2.9.0",
"@aws-sdk/client-ec2": "^3.670.0",
"@aws-sdk/client-sqs": "^3.670.0",
"@aws-sdk/types": "^3.664.0",
"@aws-sdk/types": "^3.667.0",
"@middy/core": "^4.7.0",
"@octokit/auth-app": "6.1.2",
"@octokit/core": "5.2.0",
2 changes: 1 addition & 1 deletion lambdas/functions/gh-agent-syncer/package.json
@@ -39,7 +39,7 @@
"@aws-github-runner/aws-powertools-util": "*",
"@aws-sdk/client-s3": "^3.673.0",
"@aws-sdk/lib-storage": "^3.673.0",
"@aws-sdk/types": "^3.664.0",
"@aws-sdk/types": "^3.667.0",
"@middy/core": "^4.7.0",
"@octokit/rest": "20.1.1",
"axios": "^1.7.7"
2 changes: 1 addition & 1 deletion lambdas/functions/termination-watcher/package.json
@@ -37,7 +37,7 @@
"dependencies": {
"@aws-github-runner/aws-powertools-util": "*",
"@aws-sdk/client-ec2": "^3.670.0",
"@aws-sdk/types": "^3.664.0",
"@aws-sdk/types": "^3.667.0",
"@middy/core": "^4.7.0",
"typescript": "^5.5.4"
},
4 changes: 2 additions & 2 deletions lambdas/functions/webhook/jest.config.ts
@@ -6,10 +6,10 @@ const config: Config = {
...defaultConfig,
coverageThreshold: {
global: {
statements: 99.2,
statements: 99.58,
branches: 100,
functions: 100,
lines: 99.25,
lines: 99.57,
},
},
};
1 change: 1 addition & 0 deletions lambdas/functions/webhook/package.json
@@ -16,6 +16,7 @@
"all": "yarn build && yarn format && yarn lint && yarn test"
},
"devDependencies": {
"@aws-sdk/client-eventbridge": "^3.670.0",
"@trivago/prettier-plugin-sort-imports": "^4.3.0",
"@types/aws-lambda": "^8.10.145",
"@types/express": "^4.17.21",