Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kind, vgpu: Bump vgpu kind to k8s-1.25 #979

Merged
merged 2 commits into from
Mar 9, 2023

Conversation

oshoval
Copy link
Contributor

@oshoval oshoval commented Mar 8, 2023

Since we don't use CPU manager on vgpu lane we can bump to k8s-1.25
and remove the cpu manager configuration due to
kubernetes-sigs/kind#2999 (affects k8s-1.24+ with cpu manager)

Rename vgpu lane to reflect the k8s version bump.

Kubevirtci Infra: kubevirt/project-infra#2653
Kubvirt infra: kubevirt/project-infra#2654

@kubevirt-bot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Mar 8, 2023
@oshoval
Copy link
Contributor Author

oshoval commented Mar 8, 2023

/test check-up-kind-1.23-sriov

lets see it didn't affect sriov (since a common file was changed)

Note - didn't try to just bump to 1.25 and do config cpu manager, but without using it
which is basically an option, but it is more implicit, as it hides a broken feature.

@oshoval
Copy link
Contributor Author

oshoval commented Mar 8, 2023

/test check-up-kind-1.23-sriov

@oshoval
Copy link
Contributor Author

oshoval commented Mar 8, 2023

It breaks SRIOV tests that run on kubevirt
kubevirtci presubmit doesnt run them

Do not config cpu manager for vgpu,
because kind 1.24+ has this bug for cpu manager:
kubernetes-sigs/kind#2999

Since we don't use cpu manager on vgpu lane we can bump to k8s-1.25
and remove cpu manager.

Rename lane.

Signed-off-by: Or Shoval <[email protected]>
The functions can add extra mounts / cpu manager to non worker nodes,
it depends where it is called.
If it is called before the worker snippet in the manifest
it will configure it for the control-plane, otherwise
for the worker node.

Rename it to reflect it.

Signed-off-by: Or Shoval <[email protected]>
@oshoval oshoval marked this pull request as ready for review March 8, 2023 14:38
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 8, 2023
@kubevirt-bot kubevirt-bot requested review from ormergi and stu-gott March 8, 2023 14:38
@oshoval
Copy link
Contributor Author

oshoval commented Mar 8, 2023

@vladikr @brianmcarey @dhiller @xpivarc
can you please take a look ?

this allows bumping vgpu lane to k8s-1.25
tested on kubevirt (both sriov and vgpu)
created infra PR as well

@kubevirt-bot
Copy link
Contributor

@oshoval: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
check-up-kind-1.23-vgpu 9b14eed link false /test check-up-kind-1.23-vgpu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@oshoval
Copy link
Contributor Author

oshoval commented Mar 8, 2023

https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/pr-logs/pull/kubevirt_kubevirtci/979/check-up-kind-1.23-vgpu/1633477265524264960
fails because provider was renamed
tested under kubevirt with local hack
infra PR attached

@oshoval oshoval changed the title kind: Bump vgpu kind to k8s-1.25 kind, vgpu: Bump vgpu kind to k8s-1.25 Mar 8, 2023
Copy link
Member

@brianmcarey brianmcarey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2023
@xpivarc
Copy link
Member

xpivarc commented Mar 9, 2023

/lgtm
@dhiller Can I borrow your approve?

@ormergi
Copy link
Contributor

ormergi commented Mar 9, 2023

Please note that the issues with kind 1.24+ could still occur w/o CPU manager see kubernetes-sigs/kind#2999 (comment).

On the the SRIOV take, it turned out to be flaky, tests were failing from time to time due to the missing permissions to /dev/null.
I suggest to run the GPU lanes with this PR changes on kubevirt/kubevirt few times (about 5) verifying that it's actually stable.

@oshoval
Copy link
Contributor Author

oshoval commented Mar 9, 2023

Please note that the issues with kind 1.24+ could still occur w/o CPU manager see kubernetes-sigs/kind#2999 (comment).

Interesting

On the the SRIOV take, it can down to be flaky, tests were failing from time to time due to the missing permissions to /dev/null. I suggest to run the GPU lanes with this PR changes on kubevirt/kubevirt few times (about 5) verifying that it's actually stable.

I ran at least 4 times already and it was stable, I believe that as long as we don't touch the /dev/null we are fine
(which can be other bug of selinux imo, as we had on vm based providers).
In case we do see it after it is merged imo, we can simply move to k3d (assuming that it will work, as SR-IOV did).

Note that kind SR-IOV was not affected by this PR, kind SR-IOV still has cpu manager and k8s-1.23.

Copy link
Contributor

@dhiller dhiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhiller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2023
@kubevirt-bot kubevirt-bot merged commit 3e52bb0 into kubevirt:main Mar 9, 2023
@oshoval
Copy link
Contributor Author

oshoval commented Mar 9, 2023

Btw the lane of vpgu on kubevirtci is optional, this is why it was merged even that the old lane failed
(running it correctly passed)

Thanks

kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Mar 9, 2023
[3e52bb0 kind, vgpu: Bump vgpu kind to k8s-1.25](kubevirt/kubevirtci#979)
[7e486e5 k3d: Introduce k3d SR-IOV provider](kubevirt/kubevirtci#972)
[42c3f70 Fix some typos](kubevirt/kubevirtci#971)
[e37ca14 Remove the centos8 based k8s-1.26 provider](kubevirt/kubevirtci#969)
[46a9824 Run bazelisk run //robots/cmd/kubevirtci-bumper:kubevirtci-bumper -- -ensure-last-three-minor-of v1 --k8s-provider-dir /home/prow/go/src/github.com/kubevirt/project-infra/../kubevirtci/cluster-provision/k8s](kubevirt/kubevirtci#974)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <[email protected]>
germag pushed a commit to germag/kubevirt that referenced this pull request Mar 13, 2023
[3e52bb0 kind, vgpu: Bump vgpu kind to k8s-1.25](kubevirt/kubevirtci#979)
[7e486e5 k3d: Introduce k3d SR-IOV provider](kubevirt/kubevirtci#972)
[42c3f70 Fix some typos](kubevirt/kubevirtci#971)
[e37ca14 Remove the centos8 based k8s-1.26 provider](kubevirt/kubevirtci#969)
[46a9824 Run bazelisk run //robots/cmd/kubevirtci-bumper:kubevirtci-bumper -- -ensure-last-three-minor-of v1 --k8s-provider-dir /home/prow/go/src/github.com/kubevirt/project-infra/../kubevirtci/cluster-provision/k8s](kubevirt/kubevirtci#974)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants