[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530

VannTen · 2024-09-13T10:58:37Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
We regularly have CI flakes where the job failed to delete k8s namespace in the CI cluster.
It's not much, but it's a little hiccup in the PR process which I'd like to eliminate.

I'm not sure what the exact reason is, probably some race between the jobs and the time between fetching the list of namespace and the deletion.
Regardless, a simpler way to delete the VMs is to let them be dependants (in the kubernetes sense) of the job pod. This way, once the job pod is deleted, kubernetes garbage collection in the CI cluster will take care of removing the associated VMs

Special notes for your reviewer:
PR on the ci infra kubespray/kspray-infra#1 (private repo, maintainers have access)

Does this PR introduce a user-facing change?:

NONE

/label tide/merge-method-merge

k8s-ci-robot · 2024-09-13T10:58:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [VannTen]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

VannTen · 2024-09-13T11:02:25Z

/ok-to-test

VannTen · 2024-09-13T11:03:12Z

/cc @ant31

VannTen · 2024-09-13T12:44:52Z

/retest
(now that I fixed the gitlab-runner config)

VannTen · 2024-09-13T19:27:40Z

/retest

VannTen · 2024-09-20T13:46:10Z

/label ci-full

(To test it works correctly for everything)

k8s-ci-robot · 2024-09-20T13:46:13Z

@VannTen: The label(s) /label ci-full cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label ci-full

(To test it works correctly for everything)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

VannTen · 2024-09-20T13:53:35Z

@ant31 What's the process to add the ci-{extended,full} label ? Can't find them with a quick grepping

ant31 · 2024-09-23T09:05:55Z

I think the PR to add them via /label is still in review

For now you can add them manually

VannTen · 2024-09-23T09:11:11Z

I'll do that once the initial set of tests pass then 👍

ant31 · 2024-11-05T08:35:43Z

we probably want to add some delay here and there and have some kind of retry at this step

VannTen · 2024-11-05T09:04:48Z

That's what the last commit does. But if the retry works and then the IP disappears, we're back where we started ^. Can't think of an effective workaround for now 🤔

ant31 · 2024-11-05T12:56:32Z

maybe upgrading kubevirt ?

VannTen · 2024-11-05T12:58:42Z

Maybe. But the linked bug being still open is not making me very hopeful... (there is always a change we're not hitting that specifically but I'm not counting on it).

VMI in Kubevirt are the abstraction below VirtualMachine. - We don't really need the extra abstraction of VirtualMachine objects - Convert the waiting for VMs ip address to use kubernetes.core.k8s_info and no shell pipeline

Not constraining the inventory to .ini allows us to use dynamic inventory, which is needed for simplifying kubevirt jobs inventory. Also reduces the scope of the ANSIBLE_INVENTORY variable.

VannTen · 2024-11-13T16:41:49Z

/label ci-extended

VannTen · 2024-11-13T16:44:42Z

I believe I have found a way to make this work: instead of using the kubevirt inventory plugin, I'm creating a static inventory out the the valid state
(which we have in "Wait for vms to have IP addresses")

VannTen · 2024-11-13T17:09:39Z

Btw, I kept the commit with kubevirt because it's helpful context, and if that bug is resolved later on we might switch to it.

VannTen · 2024-11-13T23:29:48Z

/retest-failed The failure seems to be related to the kubeadm stuff fixed earlier

VannTen · 2024-11-14T08:19:00Z

/hold cancel

This allows a single source of truth for the virtual machines in a kubevirt ci-run. `etcd_member_name` should be correctly handled in kubespray-defaults for testing the recover cases.

VirtualMachineInstance resources sometimes temporarily loose their IP (at least as far as the kubevirt controllers can see). See kubevirt/kubevirt#12698 for the upstream bug. This does not seems to affect actual connection (if it did, our current CI would not work). However, our CI execute multiple playbooks, and in particular: 1. The provisioning playbook (which checks that the IPs have been provisioned by querying the K8S API) 2. Kubespray itself If any of the VirtualMachineInstance looses its IP between after 1 checked for it, and before 2 starts, the dynamic inventory (which is invoked when the playbook is launched by ansible-playbook) will not have an ip for that host, and will try to use the name for ssh, which of course will not work. Instead, when we have a valid state during provisioning (all IPs presents), use it to construct a static inventory which will be used for the rest of the CI run.

We should not rollback our test setup during upgrade test. The only reason to do that would be for incompatible changes in the test inventory, and we already checkout master for those (${CI_JOB_NAME}.yml) Also do some cleanup by removing unnecessary intermediary variables

The new CI does not define k8s_cluster group, so it relies on kubernetes-sigs#11559. This does not work for upgrade testing (which use the previous release). We can revert this commit after 2.27.0

https://docs.ansible.com/ansible/latest/reference_appendices/config.html#envvar-ANSIBLE_VERBOSITY

VannTen · 2024-11-14T09:06:15Z

@ant31 @tico88612 this should be ready for review (finally ^^)

ant31 · 2024-11-14T10:47:15Z

/lgtm

ant31 · 2024-11-14T10:47:22Z

great work, thanks @VannTen

k8s-ci-robot requested review from tico88612 and yankay September 13, 2024 10:58

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 13, 2024

k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Sep 13, 2024

k8s-ci-robot requested a review from ant31 September 13, 2024 11:03

VannTen force-pushed the ci/cleanup_with_k8s_gc branch 2 times, most recently from fd43216 to d70f8e2 Compare September 20, 2024 13:45

VannTen force-pushed the ci/cleanup_with_k8s_gc branch from d70f8e2 to bbfd93a Compare September 20, 2024 13:50

VannTen force-pushed the ci/cleanup_with_k8s_gc branch 2 times, most recently from b67b0b2 to 25bb5d0 Compare September 20, 2024 14:56

VannTen mentioned this pull request Sep 21, 2024

Only require minimum structure in inventory, compute the rest #11559

Merged

VannTen force-pushed the ci/cleanup_with_k8s_gc branch from 25bb5d0 to 32a0dfc Compare September 23, 2024 10:17

VannTen force-pushed the ci/cleanup_with_k8s_gc branch 3 times, most recently from 81ac78a to d1ca52f Compare October 4, 2024 07:18

rrajendran17 mentioned this pull request Nov 12, 2024

vmi Interface pod IP disappers intermittently kubevirt/kubevirt#12698

Open

VannTen added 2 commits November 13, 2024 17:32

CI: use VirtualMachineInstance for VMs

c46e5dc

VMI in Kubevirt are the abstraction below VirtualMachine. - We don't really need the extra abstraction of VirtualMachine objects - Convert the waiting for VMs ip address to use kubernetes.core.k8s_info and no shell pipeline

CI: make the ansible inventory a directory

a8e7238

Not constraining the inventory to .ini allows us to use dynamic inventory, which is needed for simplifying kubevirt jobs inventory. Also reduces the scope of the ANSIBLE_INVENTORY variable.

k8s-ci-robot added the ci-extended Run additional tests label Nov 13, 2024

VannTen force-pushed the ci/cleanup_with_k8s_gc branch from 13f0332 to a4e517d Compare November 13, 2024 16:41

VannTen force-pushed the ci/cleanup_with_k8s_gc branch from a4e517d to f0f437e Compare November 13, 2024 16:55

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2024

VannTen added 6 commits November 14, 2024 09:40

CI: use kubevirt.core dynamic inventory

329ffd4

This allows a single source of truth for the virtual machines in a kubevirt ci-run. `etcd_member_name` should be correctly handled in kubespray-defaults for testing the recover cases.

CI: workaround for upgrade test backward compatibility

b5464af

The new CI does not define k8s_cluster group, so it relies on kubernetes-sigs#11559. This does not work for upgrade testing (which use the previous release). We can revert this commit after 2.27.0

CI: directly use ANSIBLE_VERBOSITY instead of tweaking command line

236a748

https://docs.ansible.com/ansible/latest/reference_appendices/config.html#envvar-ANSIBLE_VERBOSITY

CI: (packet-ci) simplify role layout

47f6781

VannTen force-pushed the ci/cleanup_with_k8s_gc branch from f0f437e to 47f6781 Compare November 14, 2024 08:41

k8s-ci-robot assigned ant31 Nov 14, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2024

k8s-ci-robot merged commit 05e2b47 into kubernetes-sigs:master Nov 14, 2024
52 checks passed

VannTen deleted the ci/cleanup_with_k8s_gc branch November 29, 2024 10:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530

[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530

VannTen commented Sep 13, 2024

k8s-ci-robot commented Sep 13, 2024

VannTen commented Sep 13, 2024

VannTen commented Sep 13, 2024

VannTen commented Sep 13, 2024

VannTen commented Sep 13, 2024 via email

VannTen commented Sep 20, 2024

k8s-ci-robot commented Sep 20, 2024

VannTen commented Sep 20, 2024 via email

ant31 commented Sep 23, 2024

VannTen commented Sep 23, 2024

ant31 commented Nov 5, 2024 •

edited

Loading

VannTen commented Nov 5, 2024 via email

ant31 commented Nov 5, 2024

VannTen commented Nov 5, 2024 via email

VannTen commented Nov 13, 2024

VannTen commented Nov 13, 2024

VannTen commented Nov 13, 2024 via email

VannTen commented Nov 13, 2024 via email

VannTen commented Nov 14, 2024

VannTen commented Nov 14, 2024

ant31 commented Nov 14, 2024

ant31 commented Nov 14, 2024

[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530

[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530

Conversation

VannTen commented Sep 13, 2024

k8s-ci-robot commented Sep 13, 2024

VannTen commented Sep 13, 2024

VannTen commented Sep 13, 2024

VannTen commented Sep 13, 2024

VannTen commented Sep 13, 2024 via email

VannTen commented Sep 20, 2024

k8s-ci-robot commented Sep 20, 2024

VannTen commented Sep 20, 2024 via email

ant31 commented Sep 23, 2024

VannTen commented Sep 23, 2024

ant31 commented Nov 5, 2024 • edited Loading

VannTen commented Nov 5, 2024 via email

ant31 commented Nov 5, 2024

VannTen commented Nov 5, 2024 via email

VannTen commented Nov 13, 2024

VannTen commented Nov 13, 2024

VannTen commented Nov 13, 2024 via email

VannTen commented Nov 13, 2024 via email

VannTen commented Nov 14, 2024

VannTen commented Nov 14, 2024

ant31 commented Nov 14, 2024

ant31 commented Nov 14, 2024

ant31 commented Nov 5, 2024 •

edited

Loading