OCPBUGS-36734,OCPBUGS-36733,OCPBUGS-36731,OCPBUGS-36730: 4.15 critical bugs #954
The generic plugin applied configuration changes only if the desired spec of the interfaces differed from the last applied spec. This logic differs from the one in OnNodeStateChange, which uses the real status of the interfaces to detect changes. By removing the LastState parameter (and related code), the generic plugin also uses the real status of the interfaces to decide whether to apply changes. The SyncNodeState function contains this logic.
Users can modify the settings of VFs that have been configured by the SR-IOV operator. This PR starts the reconciliation loop when the generic plugin detects such changes.
Signed-off-by: Marcelo Guerrero <[email protected]>
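Both changes above come down to comparing the desired spec with the state actually observed on the host, rather than with the last applied spec. The following Go sketch shows why the old comparison misses out-of-band changes. The `PF` type and `needsApply` function are hypothetical simplifications. They are not the operator's real types or its SyncNodeState implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// PF is a hypothetical, trimmed-down view of a physical function; the real
// operator types carry many more fields (MTU, link type, VF groups, ...).
type PF struct {
	PciAddress string
	NumVfs     int
	VfDriver   string // driver every VF of this PF should be bound to
}

// needsApply reports whether any desired PF differs from what is observed on
// the host. Comparing with the observed status, instead of the last applied
// spec, also catches changes made outside the operator.
func needsApply(desired, observed []PF) bool {
	byAddr := make(map[string]PF, len(observed))
	for _, o := range observed {
		byAddr[o.PciAddress] = o
	}
	for _, d := range desired {
		o, ok := byAddr[d.PciAddress]
		if !ok {
			// A device missing from the status cannot be configured; skip it.
			continue
		}
		if d.NumVfs != o.NumVfs || d.VfDriver != o.VfDriver {
			return true
		}
	}
	return false
}

func main() {
	desired := []PF{{PciAddress: "0000:3b:00.0", NumVfs: 4, VfDriver: "vfio-pci"}}
	lastApplied := []PF{{PciAddress: "0000:3b:00.0", NumVfs: 4, VfDriver: "vfio-pci"}}
	// Someone rebound the VFs to a kernel driver by hand.
	observed := []PF{{PciAddress: "0000:3b:00.0", NumVfs: 4, VfDriver: "iavf"}}

	// Old approach: spec vs last applied spec sees no difference (false).
	fmt.Println("spec vs last applied differs:", !reflect.DeepEqual(desired, lastApplied))
	// New approach: spec vs observed status detects the drift (true).
	fmt.Println("spec vs observed differs:", needsApply(desired, observed))
}
```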
The logic that checks for missing kernel arguments is moved into a method shared by OnNodeStateChange and CheckStatusChanges.
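The following sketch shows one helper shared by both entry points, so the two cannot disagree about which arguments are missing. It assumes the desired arguments are plain `key=value` tokens and that the current command line is read from something like `/proc/cmdline`. `GenericPlugin`, `missingKernelArgs`, and the simplified method signatures are hypothetical. The operator's real methods take the node state object.

```go
package main

import (
	"fmt"
	"strings"
)

// GenericPlugin is a hypothetical stand-in for the operator's generic plugin.
type GenericPlugin struct {
	desiredKernelArgs []string // e.g. "intel_iommu=on", "iommu=pt"
}

// missingKernelArgs returns the desired arguments that do not appear in the
// given kernel command line (for example, the contents of /proc/cmdline).
func (p *GenericPlugin) missingKernelArgs(cmdline string) []string {
	present := make(map[string]bool)
	for _, arg := range strings.Fields(cmdline) {
		present[arg] = true
	}
	var missing []string
	for _, arg := range p.desiredKernelArgs {
		if !present[arg] {
			missing = append(missing, arg)
		}
	}
	return missing
}

// OnNodeStateChange (hypothetical signature) requests a reboot when
// arguments are missing.
func (p *GenericPlugin) OnNodeStateChange(cmdline string) (needReboot bool) {
	return len(p.missingKernelArgs(cmdline)) > 0
}

// CheckStatusChanges (hypothetical signature) reports drift using the same helper.
func (p *GenericPlugin) CheckStatusChanges(cmdline string) (changed bool) {
	return len(p.missingKernelArgs(cmdline)) > 0
}

func main() {
	p := &GenericPlugin{desiredKernelArgs: []string{"intel_iommu=on", "iommu=pt"}}
	cmdline := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on"
	fmt.Println(p.missingKernelArgs(cmdline)) // [iommu=pt]
	fmt.Println(p.OnNodeStateChange(cmdline), p.CheckStatusChanges(cmdline))
}
```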
/hold waiting to link the correct Jira backports before merging
Webhook resources (`ValidatingWebhookConfiguration` and `MutatingWebhookConfiguration`) in OpenShift are configured with the `service.beta.openshift.io/inject-cabundle` annotation, so that another component fills in the `ClientConfig.CABundle` field of the webhook. When reconciling webhooks, do not override this field. Overriding it causes flakiness, because there can be a time window in which the API server is not configured with a valid client certificate:

```
Error from server (InternalError): error when creating "policies": Internal error occurred: failed calling webhook "operator-webhook.sriovnetwork.openshift.io": failed to call webhook: Post "https://operator-webhook-service.openshift-sriov-network-operator.svc:443/mutating-custom-resource?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
```

The same behavior also happens when using CertManager.

Refs:
- https://docs.openshift.com/container-platform/4.15/security/certificates/service-serving-certificate.html
- https://issues.redhat.com/browse/OCPBUGS-32139
- https://cert-manager.io/docs/concepts/ca-injector/

Signed-off-by: Andrea Panattoni <[email protected]>
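The reconciliation rule above can be sketched as follows. When the desired webhook leaves `CABundle` empty, copy the value already present on the cluster object before applying. The types below are hypothetical stand-ins that only mirror the relevant fields of `k8s.io/api/admissionregistration/v1` (`Webhooks[].ClientConfig.CABundle`). This is an illustration, not the operator's actual code.

```go
package main

import "fmt"

// Hypothetical stand-ins mirroring only the fields that matter here.
type WebhookClientConfig struct {
	CABundle []byte
}

type Webhook struct {
	Name         string
	ClientConfig WebhookClientConfig
}

type WebhookConfiguration struct {
	Webhooks []Webhook
}

// preserveCABundle copies the CABundle of each existing webhook into the
// desired object when the desired one leaves it empty, so the reconciler does
// not wipe a bundle injected by another component (OpenShift service-ca or
// cert-manager's CA injector).
func preserveCABundle(desired, existing *WebhookConfiguration) {
	if existing == nil {
		return // first creation: nothing to preserve yet
	}
	current := make(map[string][]byte, len(existing.Webhooks))
	for _, w := range existing.Webhooks {
		current[w.Name] = w.ClientConfig.CABundle
	}
	for i := range desired.Webhooks {
		w := &desired.Webhooks[i]
		if ca, ok := current[w.Name]; ok && len(ca) > 0 && len(w.ClientConfig.CABundle) == 0 {
			w.ClientConfig.CABundle = ca
		}
	}
}

func main() {
	name := "operator-webhook.sriovnetwork.openshift.io"
	existing := &WebhookConfiguration{Webhooks: []Webhook{{
		Name:         name,
		ClientConfig: WebhookClientConfig{CABundle: []byte("injected-ca")},
	}}}
	desired := &WebhookConfiguration{Webhooks: []Webhook{{Name: name}}}

	preserveCABundle(desired, existing)
	fmt.Println(string(desired.Webhooks[0].ClientConfig.CABundle)) // injected-ca
}
```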
We need to be consistent with the policy order.
Signed-off-by: Sebastian Sch <[email protected]>
Hi @zeeke, please remove the functional test for MTU, which applies only to 4.16+. Everything else looks good :)
When the MTU set in the SriovNetworkNodePolicy is lower than the actual MTU of the PF, it triggers the reconcile loop for the node state indefinitely, which prevents the configuration from completing.
Signed-off-by: amaslennikov <[email protected]>
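One way to avoid the endless loop is sketched below. It assumes a policy MTU is treated as a lower bound, so a PF that already runs a larger MTU counts as compliant, and it reports an update only when the desired MTU exceeds the current one. By contrast, if the lower value is never actually applied to the PF, a plain `!=` comparison keeps reporting a difference on every pass. Both functions are hypothetical and not taken from the operator.

```go
package main

import "fmt"

// mtuNeedsUpdateNaive flags any difference, so a policy MTU below the PF's
// current MTU is reported on every pass if the PF keeps its larger value.
func mtuNeedsUpdateNaive(desired, current int) bool {
	return desired > 0 && desired != current
}

// mtuNeedsUpdate treats the policy MTU as a lower bound (an assumption of
// this sketch): only a PF running a smaller MTU than requested needs work.
func mtuNeedsUpdate(desired, current int) bool {
	if desired <= 0 { // MTU not set in the policy
		return false
	}
	return desired > current
}

func main() {
	// Policy asks for 1500 while the PF already runs 9000.
	fmt.Println("naive:", mtuNeedsUpdateNaive(1500, 9000)) // naive: true
	fmt.Println("lower bound:", mtuNeedsUpdate(1500, 9000)) // lower bound: false
}
```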
If a Virtual Function is configured with a DPDK driver (e.g. `vfio-pci`) and is not referenced by any SriovNetworkNodePolicy, the `NeedToUpdateSriov` function must not trigger a reconfiguration. This can happen if a PF is configured by multiple policies (via PF partitioning) and the user deletes one of them. In these cases, the VF is not reconfigured [1] and a drain loop starts. The same logic applies to VDPA devices.

refs:
[1] https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/5f3c4e903f789aa177fe54686efd6c18576b7ab1/pkg/host/internal/sriov/sriov.go#L457

Signed-off-by: Andrea Panattoni <[email protected]>
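Below is a simplified sketch of the rule described above. Only VFs covered by some policy group are compared with the desired state. The rest are skipped, including VFs still bound to `vfio-pci` after their policy was deleted. According to the description, vDPA devices get the same treatment, but the sketch omits them. `VfGroup`, `VF`, `findGroup`, and this `needToUpdateSriov` are hypothetical simplifications, not the operator's code.

```go
package main

import "fmt"

// VfGroup is a hypothetical, simplified policy VF group covering VF IDs
// Start..End (the real type expresses the range as a string such as "0-1").
type VfGroup struct {
	Start, End int
	Driver     string
}

// VF is a hypothetical view of a VF's observed state.
type VF struct {
	ID     int
	Driver string
}

// findGroup returns the group whose range contains the given VF ID.
func findGroup(id int, groups []VfGroup) (VfGroup, bool) {
	for _, g := range groups {
		if id >= g.Start && id <= g.End {
			return g, true
		}
	}
	return VfGroup{}, false
}

// needToUpdateSriov compares only VFs that belong to some policy group.
// VFs outside every group are skipped even when still bound to a DPDK driver:
// the apply step does not reconfigure them, so flagging them would request
// the same update (and drain) on every pass.
func needToUpdateSriov(vfs []VF, groups []VfGroup) bool {
	for _, vf := range vfs {
		group, ok := findGroup(vf.ID, groups)
		if !ok {
			continue
		}
		if vf.Driver != group.Driver {
			return true
		}
	}
	return false
}

func main() {
	// PF partitioned by two policies; the one owning VFs 2-3 was deleted,
	// but those VFs are still bound to vfio-pci.
	vfs := []VF{
		{ID: 0, Driver: "vfio-pci"}, {ID: 1, Driver: "vfio-pci"},
		{ID: 2, Driver: "vfio-pci"}, {ID: 3, Driver: "vfio-pci"},
	}
	groups := []VfGroup{{Start: 0, End: 1, Driver: "vfio-pci"}}
	fmt.Println("needs update:", needToUpdateSriov(vfs, groups)) // needs update: false
}
```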
There is a possible race in the VFIsReady function. A VF netdevice can have the default name eth0, and by the time we call the netlink syscall to get the device information, eth0 can refer to a different device. This causes duplicate MAC allocation for the VF admin MAC address.
Signed-off-by: Sebastian Sch <[email protected]>
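The race comes from identifying the VF by its interface name, which can be reassigned between two calls. The sketch below uses an in-memory link table to illustrate the hazard and one common mitigation: looking the device up by a stable identifier such as its ifindex. It is not necessarily how this commit fixes the race. Netlink libraries such as `github.com/vishvananda/netlink` provide both `LinkByName` and `LinkByIndex`. `Link`, `linkTable`, and its methods here are hypothetical.

```go
package main

import "fmt"

// Link is a hypothetical snapshot of a network device.
type Link struct {
	Index int    // kernel ifindex: stable for the lifetime of the device
	Name  string // can change at any time (e.g. udev renaming eth0)
	MAC   string
}

// linkTable is a hypothetical in-memory stand-in for the kernel's link list.
type linkTable []Link

func (t linkTable) byName(name string) (Link, bool) {
	for _, l := range t {
		if l.Name == name {
			return l, true
		}
	}
	return Link{}, false
}

func (t linkTable) byIndex(index int) (Link, bool) {
	for _, l := range t {
		if l.Index == index {
			return l, true
		}
	}
	return Link{}, false
}

func main() {
	// The VF is first seen with the default name eth0 and ifindex 10.
	vfIndex, vfName := 10, "eth0"

	// Before the next query, the VF is renamed and another new device
	// takes the name eth0.
	links := linkTable{
		{Index: 10, Name: "ens1f0v0", MAC: "aa:bb:cc:00:00:01"},
		{Index: 11, Name: "eth0", MAC: "aa:bb:cc:00:00:02"},
	}

	byName, _ := links.byName(vfName)
	byIndex, _ := links.byIndex(vfIndex)
	fmt.Println("lookup by name:", byName.MAC)   // the other device's MAC
	fmt.Println("lookup by index:", byIndex.MAC) // the VF's own MAC
}
```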
Signed-off-by: Andrea Panattoni <[email protected]>
@zeeke: The following test failed, say
/jira cherrypick OCPBUGS-36507
@zeeke: Jira Issue OCPBUGS-36507 has been cloned as Jira Issue OCPBUGS-36730. Will retitle bug to link to clone. In response to this:
/jira cherrypick OCPBUGS-36308
@zeeke: Jira Issue OCPBUGS-36308 has been cloned as Jira Issue OCPBUGS-36731. Will retitle bug to link to clone. In response to this:
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: SchSeba, zeeke
/label cherry-pick-approved
/hold cancel
@zeeke: This pull request references Jira Issue OCPBUGS-36734, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36733, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36731, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact:
This pull request references Jira Issue OCPBUGS-36730, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact:
In response to this:
Merged commit 944b717 into openshift:release-4.15
/cherrypick release-4.14
@zeeke: #954 failed to apply on top of branch "release-4.14":
In response to this:
[ART PR BUILD NOTIFIER] This PR has been included in build sriov-network-operator-container-v4.15.0-202407091338.p0.g944b717.assembly.stream.el9 for distgit sriov-network-operator.
4.15 Backport of:
Webhook.ClientConfig.CABundle
k8snetworkplumbingwg/sriov-network-operator#711

cc @SchSeba, @mlguerrero12