Add 'IB VF GUID Configuration' design doc #653

almaslennikov
Contributor

No description provided.

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Collaborator

@ykulazhenkov ykulazhenkov left a comment

Thx @almaslennikov.

I added a few comments.

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from e69cac9 to 4727e2f Compare March 13, 2024 07:20
@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from 4727e2f to b02243e Compare March 13, 2024 07:22

it’s proposed not to return an error in this case but assign as many GUIDs as possible.
To ensure that nothing breaks when users add/remove VFs, the GUID distribution order should always be the same for each individual host.

If there are fewer GUIDs than VFs, then all the GUIDs should be assigned.
Collaborator

Do we fail in this case, or do we randomly generate GUIDs for the rest?

If you have 5 GUIDs and 10 VFs, what will the "other" 5 GUIDs be?

Contributor Author

They will be assigned randomly, as is done now. I noted that in the doc.
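
For readers following this thread, here is a minimal Python sketch of the rule as discussed: GUIDs from the configured pool are handed out in a fixed order, and any VFs beyond the pool size fall back to a random GUID. The function names, the GUID string format, and the use of the VF index as the ordering key are assumptions made for illustration; the operator's actual implementation and its random generator may differ.

```python
import secrets


def random_guid() -> str:
    """Return a random 64-bit GUID as eight colon-separated hex bytes.

    Hypothetical helper: stands in for whatever random generation the
    operator already performs.
    """
    return ":".join(f"{byte:02x}" for byte in secrets.token_bytes(8))


def assign_guids(pool, vf_ids):
    """Map each VF index to a GUID.

    pool   -- ordered sequence of GUID strings configured for this host
    vf_ids -- iterable of VF indices currently present on the host

    VFs are visited in sorted order so the same VF always receives the
    same pool entry, no matter how many VFs exist at the moment.
    """
    assignment = {}
    for position, vf in enumerate(sorted(vf_ids)):
        if position < len(pool):
            # Pool entry available: deterministic assignment.
            assignment[vf] = pool[position]
        else:
            # Pool exhausted: fall back to a random GUID.
            assignment[vf] = random_guid()
    return assignment
```

With a pool of 5 GUIDs and 10 VFs, the first 5 VFs (by index) get the configured GUIDs and the remaining 5 get random ones; with 10 GUIDs and 5 VFs, only the first 5 pool entries are used.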

### Goals

* IB GUID configuration can be read from a static json file on the host
* IB GUID configuration is static and created in advance
Collaborator

Please add "per node" here, since you need unique pkeys in the cluster, right?

Collaborator

@almaslennikov did you add this clarification to the design doc?

There can be fewer VFs created than GUIDs. To persist the dynamic nature of the SR-IOV Network operator,
it’s proposed not to return an error in this case but assign as many GUIDs as possible.
To ensure that nothing breaks when users add/remove VFs, the GUID distribution order should always be the same for each individual host.
Collaborator

How will this work with the parallel NIC configuration that was implemented in the SR-IOV operator?

Contributor Author

I believe that if we synchronize access to the GUID pool, we shouldn't face any issues.

Collaborator

Please add this one to the test or validation section so we remember to test this :)

Contributor Author

Actually, we have changed the implementation so that the pool object is immutable. No additional synchronization measures are needed. We tested that on a bare-metal cluster.
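
To illustrate why an immutable pool needs no locking, here is a small Python sketch. The `GuidPool` class, its `guid_for` method, and the sample GUID values are hypothetical names chosen for this example, not the operator's types; the point is only that a read-only object can be shared by parallel workers safely.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GuidPool:
    """Read-only GUID pool (hypothetical type for illustration)."""

    guids: tuple  # tuple of GUID strings, fixed at construction time

    def guid_for(self, vf_index):
        """Return the GUID for a VF index, or None if the pool is too small."""
        if 0 <= vf_index < len(self.guids):
            return self.guids[vf_index]
        return None


pool = GuidPool(
    guids=(
        "00:00:00:00:00:00:00:01",
        "00:00:00:00:00:00:00:02",
        "00:00:00:00:00:00:00:03",
    )
)


def configure(vf_index):
    """Stand-in for per-VF configuration work; it only reads from the pool."""
    return vf_index, pool.guid_for(vf_index)


# Several workers look up GUIDs concurrently; no lock is needed because
# nothing ever writes to the pool after it is created.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = dict(executor.map(configure, range(4)))
```

Any attempt to reassign `pool.guids` raises `FrozenInstanceError`, so a worker cannot accidentally change what another worker sees.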


If there are fewer GUIDs than VFs, then all the GUIDs should be assigned.

### Config file
Collaborator

What will be the known location of this file?

Contributor Author

I propose using /var/opt/infiniband_guids. It should be writable across different cloud platforms. I noted that in the doc.

Collaborator

We already have a folder that is in use by the SR-IOV operator. Maybe we can use that one instead of creating another?

Contributor Author

Let's store the file in /etc/sriov-operator/infiniband/guids then

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from b02243e to 9347961 Compare March 18, 2024 07:02

@ykulazhenkov
Collaborator

thx, looks good to me

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from 9347961 to 8da9cce Compare April 16, 2024 06:35

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from 8da9cce to e71fe7d Compare April 29, 2024 07:23

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from e71fe7d to 854311c Compare April 29, 2024 07:29

Collaborator

@adrianchiris adrianchiris left a comment

Overall looks OK. I added a few small comments; once they are addressed, LGTM.

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from 854311c to a7c1294 Compare May 15, 2024 07:01

Collaborator

@adrianchiris adrianchiris left a comment

One minor nit, otherwise LGTM.

@coveralls

coveralls commented May 16, 2024

Pull Request Test Coverage Report for Build 9203390833

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 39.615%

Files with Coverage Reduction    New Missed Lines    %
controllers/drain_controller.go  1                   68.06%

Totals (Coverage Status)
Change from base Build 9203023396: 0.02%
Covered Lines: 5123
Relevant Lines: 12932

💛 - Coveralls

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from a7c1294 to 93fa079 Compare May 21, 2024 06:06

Member

@zeeke zeeke left a comment

Left one minor comment. Other than that, LGTM

@almaslennikov almaslennikov force-pushed the guid-config-design-doc branch from 93fa079 to 7ead95e Compare May 23, 2024 06:38

Member

@zeeke zeeke left a comment

LGTM

@adrianchiris
Collaborator

LGTM, merging.

@adrianchiris adrianchiris merged commit d0f214e into k8snetworkplumbingwg:master Jun 5, 2024
12 checks passed
6 participants