
k3d: Introduce k3d SR-IOV provider #972

Merged (2 commits) Mar 8, 2023
Conversation

@oshoval (Contributor) commented Feb 27, 2023

What this PR does / why we need it:
Since kubernetes-sigs/kind#2999 blocks us from updating
to newer k8s versions using kind (because we use cpu manager),
we are introducing k3d.

Current k8s version: v1.25.6+k3s1 (K3D_TAG=v5.4.7)

Changes:

  • Removed support for local multi-instance clusters; we are not using it,
    and it shouldn't affect multi-instance support on CI
    once we want to introduce it.
  • Added graceful release of the SR-IOV NICs.
    This nicely reduces the downtime between cluster-down and cluster-up,
    as the NICs otherwise disappear for a few minutes.
  • Only one PF per node is supported; we don't need more for now.
  • Use the k3d local registry instead of one of our own.
  • The provider supports only the following configuration:
    1 server (control-plane node) and 2 agents (workers).
    If we need another configuration it can be added in a follow-up PR;
    for now there is no reason to support other configurations.
  • A few cosmetic refactorings.
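For readers unfamiliar with k3d, the hardcoded topology above could be sketched roughly as the following invocation. This is an illustrative assumption, not the exact code in provider.sh; the cluster name, registry name, and port are hypothetical, and the flags should be checked against the k3d version in use (K3D_TAG=v5.4.7).

```shell
# Hypothetical sketch: 1 server, 2 agents, plus a k3d-managed local
# registry, matching the configuration described above.
# Names, port, and image tag are illustrative.
K8S_IMAGE=rancher/k3s:v1.25.6-k3s1

k3d registry create registry.localhost --port 5000

k3d cluster create sriov \
  --servers 1 \
  --agents 2 \
  --image "${K8S_IMAGE}" \
  --registry-use k3d-registry.localhost:5000
```

Note that k3d prefixes the registry container name with `k3d-`, which is why the `--registry-use` argument differs from the name passed to `k3d registry create`.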

Notes for reviewers:
To ease review, the first commit is a pure copy of the current kind folders;
the rest are the changes themselves.
The main files are provider.sh and common.sh; it may be better to look at them
as a whole rather than at the diff, because they have a lot of changes.

Project infra PR:

Tested on kubevirt itself as well:

Action items on follow PRs:

  • Use in-memory etcd
  • Add CI proxy support
  • Support Podman on CI (works locally at the moment)
  • We might need to mirror the k3s image to quay.

Will be done according to severity.

Potential nice to have follow PRs:

  • Use cluster.yaml for the cluster creation
  • Preprovision the image to reduce local setup time and potential pull flakes.
  • Gracefully detach the NICs from the CI job / update the CNI DEL command to do it.
  • Reduce prints during cluster-up (perhaps add a verbose/debug mode).

These can be extracted to an issue if needed.
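The graceful NIC release mentioned in the Changes list could look roughly like the sketch below: moving each node's PF back to the host network namespace before the node container is destroyed, so the device does not vanish with the container's netns. This is an assumption about one way to implement it, not the actual code in provider.sh; the container-name filter and the PF name are hypothetical, and it needs root.

```shell
# Hypothetical sketch: before cluster-down, hand each node's SR-IOV PF
# back to the host network namespace so it does not disappear for a few
# minutes while the container's netns is torn down.
# The node-name filter and PF name below are illustrative.
PF_NAME=${PF_NAME:-eth1}

for node in $(docker ps --filter 'name=k3d-sriov' --format '{{.Names}}'); do
  pid=$(docker inspect -f '{{.State.Pid}}' "$node")
  # Run `ip` inside the node's network namespace and move the PF to the
  # host (PID 1's) network namespace.
  nsenter -t "$pid" -n ip link set "$PF_NAME" netns 1 || true
done
```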

@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2023
@kubevirt-bot (Contributor) commented:

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/XXL labels Feb 27, 2023
@oshoval (Contributor, Author) commented Feb 27, 2023

@oshoval oshoval force-pushed the k3d branch 4 times, most recently from c4183bd to 7626450 Compare February 28, 2023 11:22
@oshoval oshoval force-pushed the k3d branch 3 times, most recently from 0738126 to 4586936 Compare February 28, 2023 12:41
@oshoval oshoval changed the title WIP: K3d SRIOV WIP: k3d: Introduce k3d SR-IOV provider Feb 28, 2023
@oshoval oshoval force-pushed the k3d branch 2 times, most recently from acad0a4 to 1447c12 Compare February 28, 2023 13:51
@enp0s3 (Contributor) commented Feb 28, 2023

/uncc
@oshoval Feel free to CC me if needed

@kubevirt-bot kubevirt-bot removed the request for review from enp0s3 February 28, 2023 13:55
@oshoval (Contributor, Author) commented Feb 28, 2023

> /uncc
> @oshoval Feel free to CC me if needed

Thanks; note that /uncc will still generate email notifications. You can
unsubscribe instead, which would help reduce the noise.
Sorry for the noise; even if I had unccd you, it wouldn't have stopped the
emails, IIRC.
@enp0s3

@oshoval oshoval force-pushed the k3d branch 2 times, most recently from ddf34f4 to faf635d Compare March 1, 2023 09:03
@oshoval oshoval changed the title WIP: k3d: Introduce k3d SR-IOV provider k3d: Introduce k3d SR-IOV provider Mar 1, 2023
@oshoval oshoval marked this pull request as ready for review March 1, 2023 09:30
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 1, 2023
@kubevirt-bot kubevirt-bot requested review from awels and ormergi March 1, 2023 09:30
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 6, 2023
@oshoval (Contributor, Author) commented Mar 6, 2023

Added info about bumping the moving parts;
once someone actually bumps them, they can update those docs if needed.

Since kubernetes-sigs/kind#2999
blocks us from updating to newer k8s versions using kind,
we are introducing k3d.

Changes:

* Removed support for local multi-instance clusters; we are not using it,
and it shouldn't affect multi-instance support on CI
once we want to introduce it.
* Added graceful release of the SR-IOV NICs.
This nicely reduces the downtime between cluster-down and cluster-up,
as the NICs otherwise disappear for a few minutes.
* Only one PF per node is supported; we don't need more for now.
* Use the k3d local registry instead of one of our own.
* The provider is hardcoded with 1 server (control-plane node) and 2 agents (workers).
If we need another configuration it can be added in a follow-up PR;
for now there is no reason to support other configurations.

Signed-off-by: Or Shoval <[email protected]>
@EdDev (Member) left a comment:

Thank you!

It will be nice as a follow-up to drop the use of Calico, as we do not really need it.

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 6, 2023
@oshoval (Contributor, Author) commented Mar 6, 2023

> Thank you!
>
> It will be nice as a follow-up to drop the use of Calico, as we do not really need it.

It is a bit problematic: k3s ships an internal flannel, so even if you want
flannel, it needs to be disabled first and deployed manually (possibly with
some changes), because we are using Multus and the internal flannel's
settings don't play well with our Multus.
It is doable, but will need deeper investigation into how to deploy flannel
such that it works with Multus.
k8snetworkplumbingwg/multus-cni#564
k8snetworkplumbingwg/multus-cni#549
It didn't work for me out of the box yet, so I used the suggested Calico and
it worked great, at least for now.
Maybe with a fresher look it will be easier in the future.

EDIT: managed to advance; not sure everything is perfect yet, though.
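For context, disabling k3s's built-in flannel (so an external CNI such as Calico, or a manually deployed flannel, can be installed instead) is typically done by passing k3s arguments through k3d. A hedged sketch follows; the cluster name and node counts are illustrative, and the flags should be verified against the k3s version in use.

```shell
# Hypothetical sketch: create a k3d cluster with the built-in flannel
# and the built-in network policy controller disabled, leaving room for
# an external CNI to be deployed afterwards.
# Cluster name and topology are illustrative.
k3d cluster create sriov \
  --servers 1 --agents 2 \
  --k3s-arg '--flannel-backend=none@server:*' \
  --k3s-arg '--disable-network-policy@server:*'
```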

@oshoval (Contributor, Author) commented Mar 6, 2023

> Thank you!
>
> It will be nice as a follow-up to drop the use of Calico, as we do not really need it.

Update:
With 5ab5de3
all the tests are passing,
but we need to mirror the images first and I want to validate a few things;
we can do it in a follow-up PR.
Note that it isn't faster, it is lighter; but if, for example, we want to use
k3d for developing more non-SR-IOV tests, Calico might be better, no?

@EdDev (Member) commented Mar 6, 2023

> With 5ab5de3
> all the tests are passing,
> but we need to mirror the images first and I want to validate a few things;
> we can do it in a follow-up PR.
> Note that it isn't faster, it is lighter; but if, for example, we want to use k3d for developing more non-SR-IOV tests,
> Calico might be better, no?

I do not know why Calico would be better.
What feature of it are we using that requires it?
My interest in dropping it is to depend on fewer components and have a simpler setup.

@oshoval (Contributor, Author) commented Mar 6, 2023

> I do not know why Calico would be better. What feature of it are we using that requires it? My interest in dropping it is to depend on fewer components and have a simpler setup.

Dual stack? Single-stack IPv6?
Not sure; maybe flannel also has them.
If so, we must use flannel instead and install it manually because of the
issues mentioned above, so it is 1:1, unless you prefer flannel in case
Calico doesn't have interesting features that may help us for developing
locally etc. in the future; anyhow, we can always switch if needed.

Cons of flannel:
We would need to mirror the images to quay; it seems Calico already has
mirrors, at least for some versions.
I also wonder whether flannel will ship fixes as quickly as Calico when
needed; it seems Calico adapts faster (but I might be wrong here).

@qinqon (Contributor) left a comment:
First pass. In general, can we move the SR-IOV things to cluster-up/cluster/k3d/sriov.sh?

@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2023
@oshoval oshoval marked this pull request as ready for review March 8, 2023 09:38
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 8, 2023
@qinqon (Contributor) commented Mar 8, 2023

/lgtm
/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 8, 2023
@kubevirt-bot (Contributor) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 8, 2023
@kubevirt-bot kubevirt-bot merged commit 7e486e5 into kubevirt:main Mar 8, 2023
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Mar 9, 2023
[7e486e5 k3d: Introduce k3d SR-IOV provider](kubevirt/kubevirtci#972)
[42c3f70 Fix some typos](kubevirt/kubevirtci#971)
[e37ca14 Remove the centos8 based k8s-1.26 provider](kubevirt/kubevirtci#969)
[46a9824 Run bazelisk run //robots/cmd/kubevirtci-bumper:kubevirtci-bumper -- -ensure-last-three-minor-of v1 --k8s-provider-dir /home/prow/go/src/github.com/kubevirt/project-infra/../kubevirtci/cluster-provision/k8s](kubevirt/kubevirtci#974)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <[email protected]>
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Mar 9, 2023
[3e52bb0 kind, vgpu: Bump vgpu kind to k8s-1.25](kubevirt/kubevirtci#979)
[7e486e5 k3d: Introduce k3d SR-IOV provider](kubevirt/kubevirtci#972)
[42c3f70 Fix some typos](kubevirt/kubevirtci#971)
[e37ca14 Remove the centos8 based k8s-1.26 provider](kubevirt/kubevirtci#969)
[46a9824 Run bazelisk run //robots/cmd/kubevirtci-bumper:kubevirtci-bumper -- -ensure-last-three-minor-of v1 --k8s-provider-dir /home/prow/go/src/github.com/kubevirt/project-infra/../kubevirtci/cluster-provision/k8s](kubevirt/kubevirtci#974)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <[email protected]>
germag pushed a commit to germag/kubevirt that referenced this pull request Mar 13, 2023
[3e52bb0 kind, vgpu: Bump vgpu kind to k8s-1.25](kubevirt/kubevirtci#979)
[7e486e5 k3d: Introduce k3d SR-IOV provider](kubevirt/kubevirtci#972)
[42c3f70 Fix some typos](kubevirt/kubevirtci#971)
[e37ca14 Remove the centos8 based k8s-1.26 provider](kubevirt/kubevirtci#969)
[46a9824 Run bazelisk run //robots/cmd/kubevirtci-bumper:kubevirtci-bumper -- -ensure-last-three-minor-of v1 --k8s-provider-dir /home/prow/go/src/github.com/kubevirt/project-infra/../kubevirtci/cluster-provision/k8s](kubevirt/kubevirtci#974)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <[email protected]>
@oshoval oshoval mentioned this pull request Mar 30, 2023
Labels
approved, dco-signoff: yes, lgtm, size/XXL
5 participants