Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: mark orphan runners before removing them #4001

Merged
merged 9 commits into from
Aug 1, 2024
Merged

Conversation

npalm
Copy link
Collaborator

@npalm npalm commented Jul 17, 2024

Problem

Orphan runners are deleted right after detection. This can be clash with self termination (ephemeral) runners. Typically the runner is waiting a few sseconds before exectuing a self termination.

Solution

In this solution we first mark a runner orphan, but not delete the runner. In a next cycle of the scale down function. First all orphan runners are terminated.

Improvements

  • Improved logging, only logging the main flow once at info. All other logs moved to debug
  • Scale-down write permissions limitted to the envirnoment

Todo

  • Update docs
  • Test default runner deployment
  • Test mult runner deployment

Example of log

  • Two instances
  • One made orphan by removing the runner from GitHub
  • In the log
    • Idle runner got removed
    • Orphan get marked as orphan
    • Next cycle orphan terminated.
image

@npalm npalm force-pushed the npalm/scale-orphan branch from 3eb522f to 3172cb2 Compare July 17, 2024 15:43

expect(mockTagRunners).toHaveBeenCalledWith(orphanRunner.instanceId, [
{
Key: 'orphan',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Key: 'orphan',
Key: 'Orphan',

As above?

try {
await terminateRunner(instanceId);
await tag(instanceId, [{ Key: 'orphan', Value: 'true' }]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
await tag(instanceId, [{ Key: 'orphan', Value: 'true' }]);
await tag(instanceId, [{ Key: 'Orphan', Value: 'true' }]);

as above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact this should be ghr:orphan

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #4026 for formatting of the keuys.


for (const runner of orphanRunners) {
await terminateRunner(runner.instanceId).catch((e) => {
logger.error(`Failed to terminate orphan runner '${runner.instanceId}'`, { error: e });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to do a catch twice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the inner try/catch is catchting error during termination. The outer any other inlcuding the listEC2Runners

const environment = process.env.ENVIRONMENT;
const scaleDownConfigs = JSON.parse(process.env.SCALE_DOWN_CONFIG) as [ScalingDownConfig];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have noted here that we parse without sanity checking it first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config is typed in Terraform, but it is a loose coupling.


for (const runner of orphanRunners) {
await terminateRunner(runner.instanceId).catch((e) => {
logger.error(`Failed to terminate orphan runner '${runner.instanceId}'`, { error: e });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any point doing a catch twice? maybe worth doing it once in the catch block

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did quite some testing with it, but it needed due to all the async executing.

@npalm npalm requested a review from stuartp44 July 31, 2024 16:11
@npalm npalm requested a review from Brend-Smits August 1, 2024 06:03
@npalm npalm marked this pull request as draft August 1, 2024 07:13
@npalm npalm marked this pull request as ready for review August 1, 2024 13:24
Copy link
Contributor

@Brend-Smits Brend-Smits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@npalm npalm merged commit 6cde62c into main Aug 1, 2024
43 of 44 checks passed
@npalm npalm deleted the npalm/scale-orphan branch August 1, 2024 14:59
npalm pushed a commit that referenced this pull request Aug 1, 2024
🤖 I have created a release *beep* *boop*
---


##
[5.14.0](philips-labs/terraform-aws-github-runner@v5.13.0...v5.14.0)
(2024-08-01)


### Features

* mark orphan runners before removing them
([#4001](https://github.com/philips-labs/terraform-aws-github-runner/issues/4001))
([6cde62c](philips-labs/terraform-aws-github-runner@6cde62c))


### Bug Fixes

* upgrade aws powertools to v2
([#4027](https://github.com/philips-labs/terraform-aws-github-runner/issues/4027))
([900217b](philips-labs/terraform-aws-github-runner@900217b))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: forest-releaser[bot] <80285352+forest-releaser[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants