Suggestion: Update DiscoverSriovDevices to use /sys/class/net in order to support netns isolation #432

Closed
oshoval opened this issue Dec 1, 2020 · 2 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@oshoval
Contributor

oshoval commented Dec 1, 2020

We found a case in which a user needs to distribute PFs exclusively across several network namespaces.
For example, when running two clusters, each in its own netns,
each netns owns one PF exclusively (assigned with ip link set <PF> netns <NS>).
One use case is running two prow jobs on the same node, each with its own PF and netns.

Since the config-daemon's DiscoverSriovDevices currently detects interfaces via /sys/devices/pci*,
all PFs are visible to it, because the daemon runs in the host netns.
As a result, unconfigured PFs are reset by resetSriovDevice, which is called from SyncNodeState.
This lets one cluster corrupt the other by resetting a PF that is not in its own netns.

Please consider discovering devices via /sys/class/net/*/device/uevent instead.
I tested this in the scenario above and it fixed the problem:
I could run two clusters, each with its own PF, side by side on the same node.

As we discussed, it remains to be decided whether there are use cases where the daemon still needs to discover all interfaces via /sys/devices/pci*. If so, a flag should be added to select the desired discovery method.

See upstream: k8snetworkplumbingwg/sriov-network-operator#2

/cc @zshi-redhat

@oshoval oshoval changed the title Suggestion: Update DiscoverSriovDevices to use /sys.class/net in order to support netns isolation Suggestion: Update DiscoverSriovDevices to use /sys/class/net in order to support netns isolation Dec 1, 2020
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 21, 2021
@oshoval
Contributor Author

oshoval commented Mar 22, 2021

Fixed upstream.

@oshoval oshoval closed this as completed Mar 22, 2021
zeeke pushed a commit to zeeke/sriov-network-operator that referenced this issue Jul 6, 2023
Remove imagePullPolicy altogether and use k8s defaults