This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

test flake: updating labels in default deployment while running – fedora-1_19.olm-operator.API #742

Closed
pohly opened this issue Sep 21, 2020 · 10 comments
Labels
0.8 needs to be fixed in 0.8.x

Comments

@pohly
Contributor

pohly commented Sep 21, 2020

From https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/blue/organizations/jenkins/pmem-csi/detail/PR-738/2/tests:

/mnt/workspace/pmem-csi_PR-738/test/e2e/operator/deployment_api.go:461
Sep 21 10:17:28.182: validate driver after update and restart
Unexpected error:
<*errors.errorString | 0xc003051a70>: {
s: "timed out waiting for deployment, last error: deployed driver different from expected deployment:\nlabel foo missing for \"pmem-csi-with-defaults-external-provisioner-cfg\" of type \"rbac.authorization.k8s.io/v1, Kind=Role\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-external-provisioner-runner\" of type \"rbac.authorization.k8s.io/v1, Kind=ClusterRole\" in namespace \"\"\nlabel foo missing for \"pmem-csi-with-defaults-csi-provisioner-role-cfg\" of type \"rbac.authorization.k8s.io/v1, Kind=RoleBinding\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-csi-provisioner-role\" of type \"rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding\" in namespace \"\"\nlabel foo missing for \"pmem-csi-with-defaults-controller\" of type \"/v1, Kind=ServiceAccount\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-controller\" of type \"/v1, Kind=Service\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-metrics\" of type \"/v1, Kind=Service\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults\" of type \"storage.k8s.io/v1beta1, Kind=CSIDriver\" in namespace \"\"",
}
timed out waiting for deployment, last error: deployed driver different from expected deployment:
label foo missing for "pmem-csi-with-defaults-external-provisioner-cfg" of type "rbac.authorization.k8s.io/v1, Kind=Role" in namespace "default"
label foo missing for "pmem-csi-with-defaults-external-provisioner-runner" of type "rbac.authorization.k8s.io/v1, Kind=ClusterRole" in namespace ""
label foo missing for "pmem-csi-with-defaults-csi-provisioner-role-cfg" of type "rbac.authorization.k8s.io/v1, Kind=RoleBinding" in namespace "default"
label foo missing for "pmem-csi-with-defaults-csi-provisioner-role" of type "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding" in namespace ""
label foo missing for "pmem-csi-with-defaults-controller" of type "/v1, Kind=ServiceAccount" in namespace "default"
label foo missing for "pmem-csi-with-defaults-controller" of type "/v1, Kind=Service" in namespace "default"
label foo missing for "pmem-csi-with-defaults-metrics" of type "/v1, Kind=Service" in namespace "default"
label foo missing for "pmem-csi-with-defaults" of type "storage.k8s.io/v1beta1, Kind=CSIDriver" in namespace ""
occurred
/mnt/workspace/pmem-csi_PR-738/test/e2e/operator/deployment_api.go:458
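Each line of the failure names one object that lacks the expected label. Below is a minimal Go sketch of a check that would produce messages in this format. It assumes a hypothetical `deployedObject` summary type and a hypothetical `checkLabels` helper that only tests whether each expected label key is present. It is not the actual validation in deployment_api.go.

```go
package main

import (
	"fmt"
	"strings"
)

// deployedObject is a hypothetical summary of one object created by the operator.
type deployedObject struct {
	Name      string
	Type      string // e.g. "rbac.authorization.k8s.io/v1, Kind=Role"
	Namespace string // empty for cluster-scoped objects
	Labels    map[string]string
}

// checkLabels reports every object that lacks one of the expected label keys,
// formatted like the E2E failure. Label values are not compared.
func checkLabels(objects []deployedObject, expectedKeys []string) error {
	var problems []string
	for _, obj := range objects {
		for _, key := range expectedKeys {
			// Indexing a nil map is safe and reports the key as absent.
			if _, ok := obj.Labels[key]; !ok {
				problems = append(problems, fmt.Sprintf(
					"label %s missing for %q of type %q in namespace %q",
					key, obj.Name, obj.Type, obj.Namespace))
			}
		}
	}
	if len(problems) == 0 {
		return nil
	}
	return fmt.Errorf("deployed driver different from expected deployment:\n%s",
		strings.Join(problems, "\n"))
}

func main() {
	objects := []deployedObject{
		{
			Name:      "pmem-csi-with-defaults-metrics",
			Type:      "/v1, Kind=Service",
			Namespace: "default",
			Labels:    map[string]string{"foo": "bar"}, // hypothetical label value
		},
		{
			// Cluster-scoped object that was not relabeled.
			Name: "pmem-csi-with-defaults",
			Type: "storage.k8s.io/v1beta1, Kind=CSIDriver",
		},
	}
	fmt.Println(checkLabels(objects, []string{"foo"}))
}
```

Only the unlabeled CSIDriver is reported. The namespace prints as `""` because cluster-scoped objects have none, which matches the ClusterRole, ClusterRoleBinding and CSIDriver lines above.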
@pohly
Contributor Author

pohly commented Sep 21, 2020

@avalluri: can you have a look at this? Is the timeout perhaps simply too short?
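The error has the shape "timed out waiting for deployment, last error: ...". That suggests a poll-until-deadline loop that keeps only the most recent error. Below is a minimal Go sketch of such a loop, built around a hypothetical `waitForDeployment` helper rather than the actual test code. If the check keeps failing for a persistent reason, a longer timeout only delays the same error. A rollout that is merely slow passes once the check succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLabelMissing stands in for a validation failure that never goes away.
var errLabelMissing = errors.New(`label foo missing for "pmem-csi-with-defaults"`)

// waitForDeployment is a hypothetical helper: it calls check until it
// returns nil or the timeout expires, sleeping interval between attempts.
// On timeout it wraps only the most recent error.
func waitForDeployment(check func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for deployment, last error: %w", err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// A persistent failure: a longer timeout would only delay this error.
	stuck := func() error { return errLabelMissing }
	fmt.Println(waitForDeployment(stuck, 50*time.Millisecond, 10*time.Millisecond))

	// A slow but converging rollout succeeds once the check passes.
	attempts := 0
	eventually := func() error {
		attempts++
		if attempts < 3 {
			return errors.New("not ready yet")
		}
		return nil
	}
	err := waitForDeployment(eventually, time.Second, 10*time.Millisecond)
	fmt.Println(err, "after", attempts, "attempts")
}
```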

@pohly pohly changed the title from "test flake:" to "test flake: updating labels in default deployment while running – fedora-1_19.olm-operator.API" Sep 21, 2020
@avalluri
Contributor

I doubt this failure is caused by the timeout. For some reason, the changes the operator detected for that deployment are different:

pmem-csi-operator-fbd749dcb-7j4vw/[email protected]: I0921 10:12:33.820351 1 controller_driver.go:97] Deployment: "pmem-csi-with-defaults", state "Running", changes map[deviceMode:{} imagePullPolicy:{} logLevel:{} controllerResources:{} nodeResources:{} nodeSelector:{} pmemPercentage:{} kubeletDir:{}], in cache true

I suspect this is result of some other previous test failures.

@avalluri
Contributor

Okay, there are no other test failures, so something tricky is going on.

@pohly
Contributor Author

pohly commented Sep 23, 2020

And another one:
https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/blue/organizations/jenkins/pmem-csi/detail/version-skew-2/22/pipeline/93

[Fail] olm-operator API updating provisionerImage in deployment with specific values [It] while running 

@pohly
Contributor Author

pohly commented Sep 23, 2020

In #723 (comment), @avalluri wrote:

We could revert 1a433b7. I see that is causing the operator to detect default values as changes.
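The operator log quoted earlier lists exactly the fields that have defaults. Below is a minimal Go sketch of how that can happen. It assumes a hypothetical `DeploymentSpec` with four string fields, a hypothetical `setDefaults` with illustrative default values, and a field-by-field `detectChanges`. When one side of the comparison has the defaults applied and the other does not, every defaulted field is reported as changed, even though the user changed nothing.

```go
package main

import "fmt"

// DeploymentSpec is a hypothetical, trimmed-down stand-in for the operator's
// deployment spec. An empty string means "not set by the user".
type DeploymentSpec struct {
	DeviceMode      string
	ImagePullPolicy string
	LogLevel        string
	KubeletDir      string
}

// setDefaults returns a copy with unset fields filled in.
// The default values are illustrative assumptions.
func setDefaults(s DeploymentSpec) DeploymentSpec {
	if s.DeviceMode == "" {
		s.DeviceMode = "lvm"
	}
	if s.ImagePullPolicy == "" {
		s.ImagePullPolicy = "IfNotPresent"
	}
	if s.LogLevel == "" {
		s.LogLevel = "3"
	}
	if s.KubeletDir == "" {
		s.KubeletDir = "/var/lib/kubelet"
	}
	return s
}

// detectChanges compares two specs field by field, without defaulting,
// and collects the names of differing fields in a set.
func detectChanges(old, cur DeploymentSpec) map[string]struct{} {
	changes := map[string]struct{}{}
	if old.DeviceMode != cur.DeviceMode {
		changes["deviceMode"] = struct{}{}
	}
	if old.ImagePullPolicy != cur.ImagePullPolicy {
		changes["imagePullPolicy"] = struct{}{}
	}
	if old.LogLevel != cur.LogLevel {
		changes["logLevel"] = struct{}{}
	}
	if old.KubeletDir != cur.KubeletDir {
		changes["kubeletDir"] = struct{}{}
	}
	return changes
}

func main() {
	cached := DeploymentSpec{}     // the user set nothing; no defaults applied
	current := setDefaults(cached) // the same deployment after defaulting
	// Every defaulted field is reported as a change.
	fmt.Println(detectChanges(cached, current))
	// prints: map[deviceMode:{} imagePullPolicy:{} kubeletDir:{} logLevel:{}]
}
```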

@pohly
Contributor Author

pohly commented Sep 23, 2020

While all of the examples above are from PR #723, that's simply because I've been working with that PR the most. It also occurs in "devel":
https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/job/pmem-csi/job/devel/242/execution/node/110/log/?consoleFull

02:29:11.403  Sep 22 13:58:27.783: FAIL: validate driver after update and restart
02:29:11.403  Unexpected error:
02:29:11.403      <*errors.errorString | 0xc003a7be80>: {
02:29:11.403          s: "timed out waiting for deployment, last error: deployed driver different from expected deployment:\nlabel foo missing for \"pmem-csi-with-defaults-external-provisioner-cfg\" of type \"rbac.authorization.k8s.io/v1, Kind=Role\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-external-provisioner-runner\" of type \"rbac.authorization.k8s.io/v1, Kind=ClusterRole\" in namespace \"\"\nlabel foo missing for \"pmem-csi-with-defaults-csi-provisioner-role-cfg\" of type \"rbac.authorization.k8s.io/v1, Kind=RoleBinding\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-csi-provisioner-role\" of type \"rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding\" in namespace \"\"\nlabel foo missing for \"pmem-csi-with-defaults-controller\" of type \"/v1, Kind=ServiceAccount\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-controller\" of type \"/v1, Kind=Service\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults-metrics\" of type \"/v1, Kind=Service\" in namespace \"default\"\nlabel foo missing for \"pmem-csi-with-defaults\" of type \"storage.k8s.io/v1beta1, Kind=CSIDriver\" in namespace \"\"",
02:29:11.403      }
02:29:11.403      timed out waiting for deployment, last error: deployed driver different from expected deployment:
02:29:11.403      label foo missing for "pmem-csi-with-defaults-external-provisioner-cfg" of type "rbac.authorization.k8s.io/v1, Kind=Role" in namespace "default"
02:29:11.403      label foo missing for "pmem-csi-with-defaults-external-provisioner-runner" of type "rbac.authorization.k8s.io/v1, Kind=ClusterRole" in namespace ""
02:29:11.403      label foo missing for "pmem-csi-with-defaults-csi-provisioner-role-cfg" of type "rbac.authorization.k8s.io/v1, Kind=RoleBinding" in namespace "default"
02:29:11.403      label foo missing for "pmem-csi-with-defaults-csi-provisioner-role" of type "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding" in namespace ""
02:29:11.403      label foo missing for "pmem-csi-with-defaults-controller" of type "/v1, Kind=ServiceAccount" in namespace "default"
02:29:11.403      label foo missing for "pmem-csi-with-defaults-controller" of type "/v1, Kind=Service" in namespace "default"
02:29:11.403      label foo missing for "pmem-csi-with-defaults-metrics" of type "/v1, Kind=Service" in namespace "default"
02:29:11.403      label foo missing for "pmem-csi-with-defaults" of type "storage.k8s.io/v1beta1, Kind=CSIDriver" in namespace ""
02:29:11.403  occurred

@pohly
Contributor Author

pohly commented Sep 23, 2020

@avalluri wrote: "We could revert 1a433b7. I see that is causing the operator to detect default values as changes."

Why is this flaky?

If you think that you have a fix, then please prepare a PR. But it needs to have a good explanation of why the revised code is correct and why the old one wasn't.

@avalluri
Copy link
Contributor

I submitted #743 to fix the deployment comparison issue. The problem came from the recent change that reverts deployment spec changes: that revert also resets the deployment from its cached copy. In the next reconcile loop, comparing the cached deployment against the updated deployment, which has the defaults set, then reports all of those defaults as changes.
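One way to make such a comparison robust is to apply defaults to both sides before comparing, so that "unset" and "set to the default" compare equal. Below is a minimal Go sketch under that assumption. It is not necessarily what #743 does. `Spec`, `withDefaults`, `changedFields` and the default values are all hypothetical.

```go
package main

import "fmt"

// Spec is a hypothetical two-field stand-in for the deployment spec;
// an empty string means "not set".
type Spec struct {
	DeviceMode string
	LogLevel   string
}

// withDefaults returns a copy with unset fields filled in.
// The default values are illustrative assumptions.
func withDefaults(s Spec) Spec {
	if s.DeviceMode == "" {
		s.DeviceMode = "lvm"
	}
	if s.LogLevel == "" {
		s.LogLevel = "3"
	}
	return s
}

// changedFields defaults both specs before comparing them, so a field that
// is unset on one side and set to its default on the other is not a change.
func changedFields(cached, current Spec) []string {
	a, b := withDefaults(cached), withDefaults(current)
	var changed []string
	if a.DeviceMode != b.DeviceMode {
		changed = append(changed, "deviceMode")
	}
	if a.LogLevel != b.LogLevel {
		changed = append(changed, "logLevel")
	}
	return changed
}

func main() {
	cached := Spec{}                // copy restored from the cache, no defaults
	current := withDefaults(Spec{}) // same deployment after defaulting
	fmt.Println(changedFields(cached, current))                    // no changes
	fmt.Println(changedFields(cached, Spec{DeviceMode: "direct"})) // a real change
}
```

This relies on the defaulting function being idempotent: defaulting an already-defaulted spec must not change it.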

pohly added a commit to pohly/pmem-CSI that referenced this issue Sep 24, 2020
The operator had a bug where its change detection caused it to update
objects unnecessarily. This wasn't caught before by the tests because
the modified objects still had the expected content (as far as we
know, at least - intel#742 still
lacks a proper explanation).

Now the update unit test catches that bug:

    TestDeploymentController/Kubernetes_1.18/updating/pmemPercentage_in_default_deployment: deployment_controller_test.go:264:
        	Error Trace:	deployment_controller_test.go:264
        	            				deployment_controller_test.go:559
        	            				deployment_controller_test.go:584
        	Error:      	Received unexpected error:
        	            	deployed driver different from expected deployment:
        	            	object was modified unnecessarily: "pmem-csi-with-defaults-node" of type "apps/v1, Kind=DaemonSet" in namespace "test-namespace"
        	            	object was modified unnecessarily: "pmem-csi-with-defaults-controller" of type "apps/v1, Kind=StatefulSet" in namespace "test-namespace"
        	Test:       	TestDeploymentController/Kubernetes_1.18/updating/pmemPercentage_in_default_deployment
        	Messages:   	validate deployment

We cannot use the same validation during E2E testing because the app
controllers also modify the objects by setting their status.
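The "object was modified unnecessarily" check can be sketched as follows. Start from objects that already match the desired state, run one reconcile, and flag every object whose resource version changed. This Go sketch uses a hypothetical in-memory `store` instead of a real API server, plus two hypothetical reconcile functions. It also shows why the check needs a controlled environment: any other writer would bump the resource version as well. In a real cluster, that includes the app controllers updating status.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical minimal stand-in for a Kubernetes object.
// ResourceVersion changes on every write. Status writes by other
// controllers are not modeled here.
type object struct {
	Spec            string
	ResourceVersion int
}

// store is a hypothetical in-memory stand-in for the API server.
type store map[string]*object

// update writes a new spec and bumps the resource version, like any API write.
func (s store) update(name, spec string) {
	s[name].Spec = spec
	s[name].ResourceVersion++
}

// snapshot records the current resource version of every object.
func (s store) snapshot() map[string]int {
	versions := make(map[string]int, len(s))
	for name, o := range s {
		versions[name] = o.ResourceVersion
	}
	return versions
}

// reconcile writes only the objects whose spec differs from the desired one.
func reconcile(s store, desired map[string]string) {
	for name, spec := range desired {
		if s[name].Spec != spec {
			s.update(name, spec)
		}
	}
}

// buggyReconcile writes every object, mimicking change detection that
// reports differences where there are none.
func buggyReconcile(s store, desired map[string]string) {
	for name, spec := range desired {
		s.update(name, spec)
	}
}

// newStore builds a store that already matches desired.
func newStore(desired map[string]string) store {
	s := store{}
	for name, spec := range desired {
		s[name] = &object{Spec: spec, ResourceVersion: 1}
	}
	return s
}

// modifiedUnnecessarily runs one reconcile against a store that is already
// in the desired state and returns the sorted names of objects written anyway.
func modifiedUnnecessarily(s store, desired map[string]string, rec func(store, map[string]string)) []string {
	before := s.snapshot()
	rec(s, desired)
	var names []string
	for name, o := range s {
		if o.ResourceVersion != before[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	desired := map[string]string{
		"pmem-csi-with-defaults-node":       "daemonset spec",
		"pmem-csi-with-defaults-controller": "statefulset spec",
	}
	fmt.Println(modifiedUnnecessarily(newStore(desired), desired, reconcile))
	fmt.Println(modifiedUnnecessarily(newStore(desired), desired, buggyReconcile))
}
```

The first call reports nothing. The second reports both objects, much like the DaemonSet and StatefulSet in the unit test output above.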
@pohly
Contributor Author

pohly commented Oct 2, 2020

Not seen anymore. We don't know why it's gone now, but let's close the issue.

@pohly pohly closed this as completed Oct 2, 2020