Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(runners): add configurable eviction strategy to idle config #3375

Merged
merged 7 commits into from
Aug 8, 2023

Conversation

maschwenk
Copy link
Contributor

We do some on-instance caching so when we scale down we'd prefer to keep the older instances around instead of the new ones (because they will have a hotter cache). This adds a configurable setting to the idleConfig to pick a sorting strategy. Never contributed to this repo, so please tell me if I'm doing something wrong!

@maschwenk maschwenk changed the title Add eviction strategy to idle config [feature] Add configurable eviction strategy to idle config Jul 20, 2023
@maschwenk maschwenk changed the title [feature] Add configurable eviction strategy to idle config [feat] Add configurable eviction strategy to idle config Jul 20, 2023
@maschwenk maschwenk changed the title [feat] Add configurable eviction strategy to idle config feat: Add configurable eviction strategy to idle config Jul 20, 2023
@npalm npalm self-requested a review July 21, 2023 04:19
@maschwenk maschwenk force-pushed the main branch 2 times, most recently from 2274baf to 0019895 Compare July 21, 2023 17:47
@maschwenk
Copy link
Contributor Author

@npalm I rebased and fixed the prettier failures as well as one typecheck failure. I'm running into failures for tests because of missing type coverage. Will wait to make sure you approve of the approach before proceeding to write tests.

@npalm
Copy link
Collaborator

npalm commented Jul 23, 2023

@npalm I rebased and fixed the prettier failures as well as one typecheck failure. I'm running into failures for tests because of missing type coverage. Will wait to make sure you approve of the approach before proceeding to write tests.

Thanks for the PR. WIll check the approach ASAP.

@GuptaNavdeep1983
Copy link
Contributor

@maschwenk one of the test case is failing for this PR. Can you please fix that?

Copy link
Collaborator

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good to me. I understand the idea of keeping caches aroudn. But we see this as a risk as well. Supporting this eviction strategy is fine as long you esnure you add a test case.

@maschwenk
Copy link
Contributor Author

@npalm 👍🏼 sounds good. Will fix.

@maschwenk
Copy link
Contributor Author

@npalm I've added a test to make sure the default config is extracted correctly

@GuptaNavdeep1983 took on your suggestion as well 👍🏼

@maschwenk
Copy link
Contributor Author

@npalm Ahh I just realized it's failing due to the test coverage? I couldn't figure out why before. Happy to add a bit of coverage.

@maschwenk maschwenk force-pushed the main branch 2 times, most recently from ff3cbf0 to b24497b Compare July 26, 2023 01:45
@maschwenk
Copy link
Contributor Author

@npalm Added a test in lambdas/functions/control-plane/src/scale-runners/scale-down.test.ts to get the coverage level to acceptable

@maschwenk
Copy link
Contributor Author

@npalm 😅 the Code health thing did not like some duplication I had, cleaned that up. This should be ready for review! 🙏🏼

@maschwenk
Copy link
Contributor Author

@npalm should be ready for another look 🙏🏼 sorry for the churn

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link
Collaborator

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks in gneral good, a few remarks. Will test asap

@maschwenk
Copy link
Contributor Author

@npalm I think my comment above the eviction strategy was more confusing so I just moved it up into the paragraph above to give a more thoughtful explanation. Completely aside from the point of this PR, your note of:

This helps keep your environment up-to-date and reduce problems like running out of disk space or RAM

The disk space clause makes sense to me, but I'd be curious how often your instances are carrying around "dead" RAM after jobs run on them? We definitely have RAM issues but because we don't run things containerized it often just toasts the box completely and we take it out of rotation versus leaking over time. Just curious about your experience there. Thanks for reviewing!

@maschwenk
Copy link
Contributor Author

@npalm were you able to take a look?

@npalm
Copy link
Collaborator

npalm commented Aug 4, 2023

@npalm were you able to take a look?

Sorry not yet, seems also the coerage is dropped slightly. I wil have a look early next week

@npalm
Copy link
Collaborator

npalm commented Aug 4, 2023

Just ran the test suite locally, but see no coverage error. Will have a check later.

@npalm
Copy link
Collaborator

npalm commented Aug 5, 2023

Just ran the test suite locally, but see no coverage error. Will have a check later.

Re-ran the build, seems no fine.

@npalm npalm changed the title feat: Add configurable eviction strategy to idle config feat(runners): Add configurable eviction strategy to idle config Aug 5, 2023
@npalm npalm changed the title feat(runners): Add configurable eviction strategy to idle config feat(runners): add configurable eviction strategy to idle config Aug 8, 2023
Copy link
Collaborator

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maschwenk thx for your contribution

@npalm npalm merged commit 896f473 into github-aws-runners:main Aug 8, 2023
npalm pushed a commit that referenced this pull request Aug 8, 2023
🤖 I have created a release *beep* *boop*
---


##
[4.1.0](philips-labs/terraform-aws-github-runner@v4.0.2...v4.1.0)
(2023-08-08)


### Features

* **runners:** add configurable eviction strategy to idle config
([#3375](https://github.com/philips-labs/terraform-aws-github-runner/issues/3375))
([896f473](philips-labs/terraform-aws-github-runner@896f473))


### Bug Fixes

* **lambda:** bump the aws group in /lambdas with 5 updates
([#3413](https://github.com/philips-labs/terraform-aws-github-runner/issues/3413))
([1acc8ba](philips-labs/terraform-aws-github-runner@1acc8ba))
* **runners:** retry aws metadata token download on Linux
([#3408](https://github.com/philips-labs/terraform-aws-github-runner/issues/3408))
([ef46827](philips-labs/terraform-aws-github-runner@ef46827))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: forest-releaser[bot] <80285352+forest-releaser[bot]@users.noreply.github.com>
@maschwenk
Copy link
Contributor Author

@npalm I'm seeing those failures you are getting on master, I think I have an idea of why they are happening. Will upstream a change if I figure it out.

@npalm
Copy link
Collaborator

npalm commented Aug 8, 2023

Do you mean the release build on main? For some reason the build is randomly failing. I have no clue why (yet). The documentation build is another problem. And a less a big issue. Do you have any clue why the release build is failing when building all lambda's?

@maschwenk
Copy link
Contributor Author

Do you mean the release build on main? For some reason the build is randomly failing. I have no clue why (yet). The documentation build is another problem. And a less a big issue. Do you have any clue why the release build is failing when building all lambda's?

@npalm I don't yet, but my test was mutating process.env, so I was thinking perhaps the mutation was polluting global state. Haven't been able to repro locally. It's also odd because the stack-trace is pretty useless, not clear what the actual error is, just shows where it's coming from.

npalm added a commit that referenced this pull request Aug 9, 2023
- fix for randomly failing unit test after merging #3375
- add workspace config for vscode
- increase coverage
@npalm
Copy link
Collaborator

npalm commented Aug 9, 2023

I think I have solved the issue in #3418

@npalm
Copy link
Collaborator

npalm commented Aug 9, 2023

Not yet, still failing

@npalm
Copy link
Collaborator

npalm commented Aug 9, 2023

@maschwenk I have tried earlier today to fix the test. I am not able to reproduce the failing test on my local. Assume it has something to do with the order. I added the SCALE_DOWN_CONFIG to other test to ensure the value is set right. At some moment got a error the coverage was not metting our settings. Fixed that as well. THought it was all good. But now the main is failing again. So ondering if you have still any ieda.

@maschwenk
Copy link
Contributor Author

@npalm still at a little bit of a loss. Is there any way we can get better logs out of the failing tests perhaps? Or maybe try forcing the Jest test to run in an order that will deterministically fail?

@npalm
Copy link
Collaborator

npalm commented Aug 10, 2023

Yes we can set a few things in jest (at least)

  • verbose -> true
  • silent -> false
  • define reporters. and save as artiefact (retention 1 day).

I will also dig in a bit more next week.

@npalm
Copy link
Collaborator

npalm commented Aug 15, 2023

This PR should fix the issue: #3432

@maschwenk
Copy link
Contributor Author

@npalm ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants