feat: mark orphan runners before removing them #4001
expect(mockTagRunners).toHaveBeenCalledWith(orphanRunner.instanceId, [
  {
    Key: 'orphan',
Suggested change:
- Key: 'orphan',
+ Key: 'Orphan',
As above?
try {
  await terminateRunner(instanceId);
  await tag(instanceId, [{ Key: 'orphan', Value: 'true' }]);
Suggested change:
- await tag(instanceId, [{ Key: 'orphan', Value: 'true' }]);
+ await tag(instanceId, [{ Key: 'Orphan', Value: 'true' }]);
As above.
in fact this should be ghr:orphan
Created #4026 for the formatting of the keys.
for (const runner of orphanRunners) {
  await terminateRunner(runner.instanceId).catch((e) => {
    logger.error(`Failed to terminate orphan runner '${runner.instanceId}'`, { error: e });
Is there any need to do a catch twice?
Yes, the inner try/catch catches errors during termination. The outer one catches any other error, including those from listEC2Runners.
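The two levels of error handling described in this reply can be sketched as follows. This is only an illustration under assumptions: `terminateOrphans`, `listOrphanIds`, `terminate` and `log` are hypothetical names, not the functions used in the PR. The per-runner `.catch` keeps one failed termination from stopping the loop, while the outer `try/catch` covers everything else, such as a failure while listing runners.

```typescript
// Hypothetical helper names; the PR's real functions may differ.
type Log = (message: string, error: unknown) => void;

async function terminateOrphans(
  listOrphanIds: () => Promise<string[]>,
  terminate: (instanceId: string) => Promise<void>,
  log: Log,
): Promise<string[]> {
  const failed: string[] = [];
  try {
    // A rejection here is handled by the outer catch.
    const ids = await listOrphanIds();
    for (const id of ids) {
      // Inner catch: a failure for one runner is logged and the loop continues.
      await terminate(id).catch((e: unknown) => {
        failed.push(id);
        log(`Failed to terminate orphan runner '${id}'`, e);
      });
    }
  } catch (e) {
    // Outer catch: anything that is not a single termination failure.
    log('Failure during orphan termination processing', e);
  }
  return failed;
}
```

With only the outer catch, the first failed termination would skip all remaining runners in that cycle.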
const environment = process.env.ENVIRONMENT;
const scaleDownConfigs = JSON.parse(process.env.SCALE_DOWN_CONFIG) as [ScalingDownConfig];
I noticed that we parse this without sanity-checking it first?
The config is typed in Terraform, but it is a loose coupling.
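As a possible follow-up to this point, the environment variable could be validated before it is cast. The sketch below is not part of the PR. The field names on `ScalingDownConfig` (`cron`, `idleCount`, `timeZone`) are assumptions made for illustration, and `parseScaleDownConfigs` and `isScalingDownConfig` are hypothetical helpers.

```typescript
// Assumed shape for illustration; check it against the real ScalingDownConfig type.
interface ScalingDownConfig {
  cron: string;
  idleCount: number;
  timeZone: string;
}

// Hypothetical type guard for a single config entry.
function isScalingDownConfig(value: unknown): value is ScalingDownConfig {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.cron === 'string' && typeof v.idleCount === 'number' && typeof v.timeZone === 'string';
}

// Hypothetical parser: fails with a clear message instead of trusting the cast.
function parseScaleDownConfigs(raw: string | undefined): ScalingDownConfig[] {
  if (raw === undefined || raw.trim() === '') {
    throw new Error('SCALE_DOWN_CONFIG is not set');
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error(`SCALE_DOWN_CONFIG is not valid JSON: ${(e as Error).message}`);
  }
  if (!Array.isArray(parsed) || !parsed.every(isScalingDownConfig)) {
    throw new Error('SCALE_DOWN_CONFIG must be a JSON array of scale-down configs');
  }
  return parsed;
}
```

A call such as `parseScaleDownConfigs(process.env.SCALE_DOWN_CONFIG)` would then replace the direct `JSON.parse(...) as [ScalingDownConfig]`, so a mismatch with the Terraform side surfaces as an explicit error.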
for (const runner of orphanRunners) {
  await terminateRunner(runner.instanceId).catch((e) => {
    logger.error(`Failed to terminate orphan runner '${runner.instanceId}'`, { error: e });
Is there any point in doing a catch twice? Maybe it is worth doing it once in the catch block.
I did quite some testing with it, and it is needed because of all the async execution.
LGTM!
🤖 I have created a release *beep* *boop*

---

## [5.14.0](philips-labs/terraform-aws-github-runner@v5.13.0...v5.14.0) (2024-08-01)

### Features

* mark orphan runners before removing them ([#4001](https://github.com/philips-labs/terraform-aws-github-runner/issues/4001)) ([6cde62c](philips-labs/terraform-aws-github-runner@6cde62c))

### Bug Fixes

* upgrade aws powertools to v2 ([#4027](https://github.com/philips-labs/terraform-aws-github-runner/issues/4027)) ([900217b](philips-labs/terraform-aws-github-runner@900217b))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: forest-releaser[bot] <80285352+forest-releaser[bot]@users.noreply.github.com>
Problem
Orphan runners are deleted right after detection. This can clash with self-terminating (ephemeral) runners. Typically, the runner waits a few seconds before executing a self-termination.
Solution
In this solution we first mark a runner as orphan but do not delete it. In the next cycle of the scale-down function, all runners marked as orphan are terminated first.
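The mark-then-terminate flow can be sketched roughly as below. This is a simplified illustration, not the PR's implementation: `Runner`, `RunnerApi`, `scaleDownCycle` and `inMemoryApi` are hypothetical names, the `orphan` flag stands in for the instance tag, and error handling is left out to keep the two phases visible.

```typescript
// Hypothetical runner shape and API; none of these names come from the PR.
interface Runner {
  instanceId: string;
  orphan: boolean; // stands in for the orphan tag set in an earlier cycle
}

interface RunnerApi {
  list(): Promise<Runner[]>;
  isRegistered(instanceId: string): Promise<boolean>;
  markOrphan(instanceId: string): Promise<void>;
  terminate(instanceId: string): Promise<void>;
}

async function scaleDownCycle(api: RunnerApi): Promise<void> {
  const runners = await api.list();

  // Phase 1: terminate runners that were marked as orphan in a previous cycle.
  for (const runner of runners.filter((r) => r.orphan)) {
    await api.terminate(runner.instanceId);
  }

  // Phase 2: mark newly detected orphans, but leave them running for now.
  for (const runner of runners.filter((r) => !r.orphan)) {
    if (!(await api.isRegistered(runner.instanceId))) {
      await api.markOrphan(runner.instanceId);
    }
  }
}

// In-memory stand-in for the cloud and GitHub calls, only for trying out the flow.
function inMemoryApi(
  instanceIds: string[],
  registered: Set<string>,
): RunnerApi & { state: Map<string, Runner> } {
  const state = new Map<string, Runner>();
  for (const id of instanceIds) state.set(id, { instanceId: id, orphan: false });
  return {
    state,
    list: async () => Array.from(state.values()).map((r) => ({ ...r })),
    isRegistered: async (id: string) => registered.has(id),
    markOrphan: async (id: string) => {
      const runner = state.get(id);
      if (runner) runner.orphan = true;
    },
    terminate: async (id: string) => {
      state.delete(id);
    },
  };
}
```

Because a marked runner survives at least one full cycle, an ephemeral runner that is about to self-terminate gets time to do so before the scale-down function removes it.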
Improvements
Todo
Example of log