chore: add nightly gke cluster cleanup job #9031
fe5d76d
to
080d7ac
Compare
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #9031 +/- ##
==========================================
- Coverage 46.64% 46.63% -0.01%
==========================================
Files 1172 1172
Lines 143619 143619
Branches 2410 2410
==========================================
- Hits 66986 66983 -3
- Misses 76428 76431 +3
Partials 205 205
Flags with carried forward coverage won't be shown.
LGTM
f26f546
to
fc9c543
Compare
Also LGTM! 😄
cfc0dbf
to
e93351b
Compare
e93351b
to
d282817
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have the users @djanicekpach, @jesse-amano-hpe, @determined-ci on file. In order for us to review and merge your code, please start the CLA process at https://determined.ai/cla. After we approve your CLA, we will update the contributors list (private) and comment |
Approving for the sake of moving forward, but there are some changes I'd like to see longer-term. :)
    - run:
        name: Delete GKE CI Cluster Namespaces
        command: |
          kubectl get namespace | grep -Eo "^test-cpu-[a-z0-9]+-[a-z0-9]+-[0-9]" | xargs -L1 kubectl delete namespace || true
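To see what the name pattern in this step selects, here is a minimal sketch that runs the same `grep -Eo` expression over made-up `kubectl get namespace` output. The namespace names are hypothetical, not taken from the real cluster. Because `-o` prints only the matched text and the pattern ends in a single `[0-9]`, a name with a multi-digit suffix would be cut short before it reaches `xargs`; whether CI ever produces such names is an assumption to verify.

```shell
# Same pattern as the cleanup step, wrapped in a function for reuse.
match_ci_namespaces() {
  grep -Eo '^test-cpu-[a-z0-9]+-[a-z0-9]+-[0-9]'
}

# Hypothetical listing in the shape of `kubectl get namespace` output.
sample='NAME                    STATUS   AGE
default                 Active   90d
kube-system             Active   90d
test-cpu-ab12-cd34-0    Active   3h
test-cpu-ab12-cd34-12   Active   3h
test-gpu-ab12-cd34-1    Active   3h'

matched=$(printf '%s\n' "$sample" | match_ci_namespaces)
printf '%s\n' "$matched"
# prints:
# test-cpu-ab12-cd34-0
# test-cpu-ab12-cd34-1   (truncated from test-cpu-ab12-cd34-12)
```

If multi-digit suffixes can occur, ending the pattern with `[0-9]+` would keep the full name.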
The right-er way to mass delete from Kubernetes is to apply labels, and then delete items which have a given label value. If the namespaces were created with a label which was, for example, like "determined.ai/ci" and an arbitrary value (perhaps the ID of the test), then this could be as simple as:
-     kubectl get namespace | grep -Eo "^test-cpu-[a-z0-9]+-[a-z0-9]+-[0-9]" | xargs -L1 kubectl delete namespace || true
+     kubectl delete namespace -l determined.ai/ci
That's a great idea. I will look into adding a unique label to each namespace to simplify this cleanup job in the future.
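A rough sketch of the label-based approach discussed in this thread, assuming the test setup creates the namespaces itself. The label key, the function names, and the idea of using the CI run ID as the label value are illustrative assumptions, not something this PR implements.

```shell
# Hypothetical label key; the value would be something like the CI run ID.
CI_LABEL_KEY="determined.ai/ci"

# Create a namespace and tag it so it can be found later by label.
# $1 = namespace name, $2 = label value (for example the CI run ID).
create_ci_namespace() {
  kubectl create namespace "$1" &&
    kubectl label namespace "$1" "${CI_LABEL_KEY}=$2"
}

# Delete every namespace that carries the label key, whatever its value.
cleanup_ci_namespaces() {
  kubectl delete namespace -l "${CI_LABEL_KEY}"
}
```

With this in place, the nightly job would call `cleanup_ci_namespaces` instead of matching names with `grep`, and a single run's namespaces could be removed with `-l "determined.ai/ci=<run id>"`.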
.circleci/real_config.yml
Outdated
    gcloud compute networks get-effective-firewalls $GCP_NETWORK_NAME --project "$GCP_PROJECT_ID" \
      --format="table(name)" | tail -n +2 | \
      while read fw; do
        if [[ $fw =~ "k8s" ]]; then
          gcloud compute firewall-rules delete "$fw" --quiet --project "$GCP_PROJECT_ID"
        fi
      done
The gcloud delete commands mostly take multiple targets for delete, so this could be done (faster) with a single command along the lines of this:
-     gcloud compute networks get-effective-firewalls $GCP_NETWORK_NAME --project "$GCP_PROJECT_ID" \
-       --format="table(name)" | tail -n +2 | \
-       while read fw; do
-         if [[ $fw =~ "k8s" ]]; then
-           gcloud compute firewall-rules delete "$fw" --quiet --project "$GCP_PROJECT_ID"
-         fi
-       done
+     gcloud compute firewall-rules delete --quiet --project "$GCP_PROJECT_ID" $( \
+       gcloud compute networks get-effective-firewalls $GCP_NETWORK_NAME --project "$GCP_PROJECT_ID" --regexp=".*k8s.*" \
+         --format="table(name)" | tail -n +2
+     )
Piping into a while loop like that means any command inside the loop which reads from stdin will consume the remaining piped output before read gets another chance, so it's generally best to avoid that structure for maintainability (it works now, but what if someone adds a second command in the loop later?).
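The stdin pitfall described above can be reproduced without gcloud. In this sketch, `list_rules` and the rule names are hypothetical stand-ins for the firewall listing, and `cat` stands in for any command that happens to read stdin.

```shell
# Hypothetical rule names standing in for the gcloud listing.
list_rules() {
  printf 'k8s-fw-a\nk8s-fw-b\nk8s-fw-c\n'
}

# Pitfall: `cat` shares the loop's stdin and drains the pipe,
# so `read` finds nothing left after the first iteration.
greedy_loop() {
  list_rules | while read -r fw; do
    echo "deleting $fw"
    cat > /dev/null
  done
}

# One fix: give the inner command its own stdin.
safe_loop() {
  list_rules | while read -r fw; do
    echo "deleting $fw"
    cat < /dev/null > /dev/null
  done
}

greedy_loop   # prints only: deleting k8s-fw-a
safe_loop     # prints a line for each of the three rules
```

Passing all names to one delete command, as suggested above, sidesteps the loop entirely.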
This would actually be even easier with a cloud-custodian yaml file which could just select the stuff to remove. :D
It seems like the kubectl delete namespace command removes firewall rules, target pools, and forwarding rules (since those resources are namespaced). So, I'll remove these three steps from the cleanup job, since they are deleted automatically when the namespace is deleted!
Awesome, even better!
d282817
to
16d7008
Compare
16d7008
to
ea01a59
Compare
ea01a59
to
2f63f83
Compare
Description
Create nightly cleanup job that removes all namespaces (and therefore all k8s objects within them) from the CircleCI GKE cluster and deletes all buckets created by CI jobs used for testing with that cluster.
Successful run with resources to delete: https://app.circleci.com/pipelines/github/determined-ai/determined/52752/workflows/ef46cdef-ca8d-4345-a1e7-49e15d7473aa/jobs/2370042
Successful run with no resources to delete: https://app.circleci.com/pipelines/github/determined-ai/determined/52764/workflows/0a6c0b3f-8e4b-427e-ba92-d6f7b5e930a6/jobs/2370726
Test Plan
CI passes.
Checklist
docs/release-notes/. See Release Note for details.
Ticket
DET-10117