Improve HTCondor example and integration tests #1440

Merged

tpdownes
Member

  • Convert HTCondor example to multigroup deployment with custom image
  • Modify integration test to use deploy/destroy commands
  • Modify example to autoscale 2 idle nodes and add a test to check the autoscaling
  • Increase yum/dnf timeout to 300 seconds to mitigate a race condition with yum-cron/dnf-automatic installing packages during boot

Submission Checklist

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cloud HPC Toolkit Contribution guidelines

tpdownes added 2 commits June 12, 2023 23:14
- yum-cron.service or dnf-automatic.service can often start execution
  in the middle of google-startup-scripts. This can cause problems
  executing package or GPG key operations through ansible. Resolve by
  adopting ansible.builtin.yum (which uses dnf as a backend when
  available) and adding a default lock_timeout of 5 minutes.
- Remove use of ansible.builtin.gpg_key as it does not honor any timeout
  and it doesn't appear to solve the problem it was thought to solve.
- Improve HTCondor example through building a custom image
- Modify integration test to use deploy/destroy commands
- Modify example to autoscale 2 idle nodes and add a test to check the autoscaling
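
For illustration, the lock-timeout approach described in the first commit message could look roughly like the task below. This is a minimal sketch, not the task from this pull request: the play layout and the package name `example-package` are placeholders assumed for the example. Only the `ansible.builtin.yum` module and a `lock_timeout` of 5 minutes (300 seconds) come from the commit message.

```yaml
# Illustrative sketch only; not copied from the Toolkit repository.
- name: Install a package while tolerating a concurrent yum/dnf run
  hosts: all
  become: true
  tasks:
    - name: Install package, waiting up to 5 minutes for the package manager lock
      ansible.builtin.yum:
        name: example-package   # hypothetical package name
        state: present
        lock_timeout: 300       # seconds to wait for the lock to be released
```

With a setting like this, a task that starts while yum-cron or dnf-automatic holds the lock waits for it to be released, up to the timeout, rather than failing after the module's much shorter default wait.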
@tpdownes tpdownes requested a review from nick-stroud June 13, 2023 04:40
@tpdownes tpdownes marked this pull request as ready for review June 13, 2023 15:39
@nick-stroud nick-stroud assigned tpdownes and unassigned nick-stroud Jun 13, 2023
@tpdownes tpdownes assigned nick-stroud and unassigned tpdownes Jun 13, 2023
@tpdownes tpdownes requested a review from nick-stroud June 13, 2023 19:20
@nick-stroud nick-stroud assigned tpdownes and unassigned nick-stroud Jun 13, 2023
@tpdownes tpdownes merged commit 644e688 into GoogleCloudPlatform:develop Jun 13, 2023
@tpdownes tpdownes deleted the htcondor_test_autoscaling branch June 13, 2023 20:58
rohitramu pushed a commit to rohitramu/hpc-toolkit that referenced this pull request Jun 14, 2023
2 participants