[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: VannTen. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test
/cc @ant31
/retest
/retest
Force-pushed fd43216 to d70f8e2
/label ci-full (to test it works correctly for everything)
@VannTen: The label(s) could not be applied. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed d70f8e2 to bbfd93a
@ant31
What's the process to add the ci-{extended,full} labels? I can't find them with a quick grep.
Force-pushed b67b0b2 to 25bb5d0
I'll do that once the initial set of tests pass then 👍
Force-pushed 25bb5d0 to 32a0dfc
Force-pushed 81ac78a to d1ca52f
We probably want to add some delay here and there and have some kind of retry at this step.
That's what the last commit does. But if the retry works and then the IP disappears, we're back where we started ^. Can't think of an effective workaround for now 🤔
Maybe upgrading kubevirt?
Maybe. But the linked bug still being open is not making me very hopeful... (there is always a chance we're not hitting that specific case, but I'm not counting on it).
VMIs in KubeVirt are the abstraction below VirtualMachine:
- We don't really need the extra abstraction of VirtualMachine objects.
- Convert waiting for the VMs' IP addresses to use kubernetes.core.k8s_info instead of a shell pipeline.
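As a hedged illustration of that conversion (not the literal task from this PR), such a check based on kubernetes.core.k8s_info could look roughly like the following; the ci_namespace variable, the retry counts, and the exact until condition are assumptions:

```yaml
# Query all VirtualMachineInstances in the CI namespace and retry until every
# one of them reports at least one network interface. A stricter variant could
# also assert that ipAddress is set on the first interface.
- name: Wait for all VMIs to report their network interfaces
  kubernetes.core.k8s_info:
    api_version: kubevirt.io/v1
    kind: VirtualMachineInstance
    namespace: "{{ ci_namespace }}"  # hypothetical variable for the CI namespace
  register: vmis
  until: >-
    vmis.resources | length > 0
    and (vmis.resources
         | map(attribute='status.interfaces', default=[])
         | reject
         | list | length) == 0
  retries: 30
  delay: 10
```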
Not constraining the inventory to .ini allows us to use a dynamic inventory, which is needed to simplify the kubevirt jobs' inventory. It also reduces the scope of the ANSIBLE_INVENTORY variable.
/label ci-extended
Force-pushed 13f0332 to a4e517d
I believe I have found a way to make this work: instead of using the kubevirt inventory plugin, I'm creating a static inventory out of the valid state.
Force-pushed a4e517d to f0f437e
Btw, I kept the commit with kubevirt because it's helpful context, and if that bug is resolved later on we might switch to it.
/retest-failed
The failure seems to be related to the kubeadm stuff fixed earlier.
/hold cancel
This allows a single source of truth for the virtual machines in a kubevirt CI run. `etcd_member_name` should be correctly handled in kubespray-defaults for testing the recovery cases.
VirtualMachineInstance resources sometimes temporarily lose their IP (at least as far as the kubevirt controllers can see). See kubevirt/kubevirt#12698 for the upstream bug. This does not seem to affect actual connections (if it did, our current CI would not work). However, our CI executes multiple playbooks, in particular:
1. The provisioning playbook (which checks that the IPs have been provisioned by querying the K8S API)
2. Kubespray itself
If any VirtualMachineInstance loses its IP after 1 checked for it and before 2 starts, the dynamic inventory (which is invoked when the playbook is launched by ansible-playbook) will not have an IP for that host, and will try to use the name for ssh, which of course will not work. Instead, when we have a valid state during provisioning (all IPs present), use it to construct a static inventory which is used for the rest of the CI run.
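To illustrate the idea (a minimal sketch, not the exact implementation in this PR): assuming the provisioning playbook registered the healthy VMI list as vmis, as in the earlier sketch, and writes to a hypothetical path, a task along these lines could render the static inventory:

```yaml
# Render a static YAML inventory from the VMI state captured while all IPs
# were present, so later playbooks no longer depend on the live API view.
- name: Write a static inventory from the known-good VMI state
  ansible.builtin.copy:
    dest: "{{ playbook_dir }}/static-inventory.yml"  # hypothetical location
    content: |
      all:
        hosts:
      {% for vmi in vmis.resources %}
          {{ vmi.metadata.name }}:
            ansible_host: {{ vmi.status.interfaces[0].ipAddress }}
      {% endfor %}
```

Subsequent playbooks in the CI run would then be pointed at this file instead of the kubevirt dynamic inventory plugin.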
We should not roll back our test setup during upgrade tests. The only reason to do that would be for incompatible changes in the test inventory, and we already check out master for those (${CI_JOB_NAME}.yml). Also do some cleanup by removing unnecessary intermediary variables.
The new CI does not define the k8s_cluster group, so it relies on kubernetes-sigs#11559. This does not work for upgrade testing (which uses the previous release). We can revert this commit after 2.27.0.
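For context, a kubespray-style inventory conventionally defines k8s_cluster as a parent group of the node groups, roughly like this sketch (host names are placeholders):

```yaml
all:
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
        node2:
    etcd:
      hosts:
        node1:
    # the parent group that the new CI no longer defines explicitly
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```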
Force-pushed f0f437e to 47f6781
@ant31 @tico88612 this should be ready for review (finally ^^)
/lgtm
Great work, thanks @VannTen
What type of PR is this?
/kind feature
What this PR does / why we need it:
We regularly have CI flakes where the job fails to delete its k8s namespace in the CI cluster.
It's not much, but it's a little hiccup in the PR process which I'd like to eliminate.
I'm not sure what the exact reason is; probably some race between the jobs, in the window between fetching the list of namespaces and deleting them.
Regardless, a simpler way to delete the VMs is to make them dependents (in the Kubernetes sense) of the job pod. This way, once the job pod is deleted, Kubernetes garbage collection in the CI cluster will take care of removing the associated VMs.
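For illustration, the dependency is a standard Kubernetes ownerReference on each VM object, roughly like the sketch below (the VMI name, pod name, and UID are placeholders rather than values from the actual CI manifests):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: instance-1                 # placeholder VM name
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: ci-job-pod             # placeholder: the CI job pod's name
      uid: 00000000-0000-0000-0000-000000000000  # the job pod's real UID goes here
spec:
  # ... VMI spec omitted for brevity ...
```

Once the job pod is deleted, the garbage collector removes dependents that have no remaining owners. Owner references are only honored within a single namespace, so this assumes the job pod and the VMs share the CI namespace.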
Special notes for your reviewer:
PR on the ci infra kubespray/kspray-infra#1 (private repo, maintainers have access)
Does this PR introduce a user-facing change?:
/label tide/merge-method-merge