[V2] fix: fix issue where file is read before written in e2e tests 🐞 #1326

jmclong · 2022-05-10T21:12:50Z

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:
This change fixes a test issue in the several e2e tests, where we are looking for a file in pods to contain a certain string. The framework.LookForStringInPodExec method fails the test automatically if the command executed has a non-zero exit code. This can happen if the pod has not completed writing to the file.

There are some additional methods like framework.RunHostCmd that we might consider using, but these assume the pod is running Linux. Because of this, I decided to roll our own method.

Requirements:

uses conventional commit messages
includes documentation
adds unit tests
tested upgrade from previous version

Special notes for your reviewer:
I changed name from Exec => PollForStringInPodsExec to make it clearer that the command passed in will be executed multiple times.

There is some duplicated code in test/resources/deployment.go and test/resources/statefulset.go, but I think it is better to do this in a separate PR if we want to refactor that.

Fixes e2e test failure symptoms similar to this:

{ Failure /home/prow/go/src/sigs.k8s.io/azuredisk-csi-driver/test/e2e/dynamic_provisioning_test.go:436
Unexpected error:
    <exec.CodeExitError>: {
        Err: <*errors.errorString | 0xc0000a3740>{
            s: "error running /usr/local/bin/kubectl --server=https://kubetest-wuw9nwbh.westus2.cloudapp.azure.com --kubeconfig=/root/tmp1662410290/kubeconfig/kubeconfig.westus2.json --namespace=azuredisk-4663 exec azuredisk-volume-tester-p797m-6495d86c45-74zxb --namespace=azuredisk-4663 -- cmd /c type C:\\mnt\\test-1\\data.txt:\nCommand stdout:\n\nstderr:\nThe system cannot find the file specified.\r\ncommand terminated with exit code 1\n\nerror:\nexit status 1",
        },
        Code: 1,
    }
    error running /usr/local/bin/kubectl --server=https://kubetest-wuw9nwbh.westus2.cloudapp.azure.com --kubeconfig=/root/tmp1662410290/kubeconfig/kubeconfig.westus2.json --namespace=azuredisk-4663 exec azuredisk-volume-tester-p797m-6495d86c45-74zxb --namespace=azuredisk-4663 -- cmd /c type C:\mnt\test-1\data.txt:
    Command stdout:
    
    stderr:
    The system cannot find the file specified.
    command terminated with exit code 1
    
    error:
    exit status 1
occurred
/home/prow/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/k8s.io/kubernetes/test/e2e/framework/util.go:603}

Release note:

none

k8s-ci-robot · 2022-05-10T21:12:59Z

Hi @jmclong. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

edreed · 2022-05-10T23:25:46Z

/ok-to-test

test/resources/utils.go

jmclong · 2022-05-11T13:40:58Z

First test failure was in
Dynamic Provisioning [single-az] should create a deployment object, write and read to it, delete the pod and write and read to it again [kubernetes.io/azure-disk] [disk.csi.azure.com] [Windows] [csi-azuredisk-scheduler-extender].

This is a test impacted by my change, but the issue appears unrelated. The first pod is deleted, but the second pod is never able to attach or mount the volume because it thinks it is still in use. If you look when the test is trying to clean up the PV on test teardown, it polls for the full 10 minutes. Ginkgo does not seem to report errors that occur during defers after the initial error.

May 11 00:07:29.291: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: { } Scheduled: Successfully assigned azuredisk-1498/azuredisk-volume-tester-9dqfp-6666c46988-n6dpl to k8s-agentpool1-37254516-0
May 11 00:07:29.291: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-sk4q9: { } Scheduled: Successfully assigned azuredisk-1498/azuredisk-volume-tester-9dqfp-6666c46988-sk4q9 to k8s-agentpool1-37254516-1
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:34 +0000 UTC - event for azuredisk-volume-tester-9dqfp: {deployment-controller } ScalingReplicaSet: Scaled up replica set azuredisk-volume-tester-9dqfp-6666c46988 to 1
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:34 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988: {replicaset-controller } SuccessfulCreate: Created pod: azuredisk-volume-tester-9dqfp-6666c46988-n6dpl
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:34 +0000 UTC - event for pvc-5x68d: {persistentvolume-controller } ExternalProvisioning: waiting for a volume to be created, either by external provisioner "disk.csi.azure.com" or manually created by system administrator
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:34 +0000 UTC - event for pvc-5x68d: {disk.csi.azure.com_k8s-agentpool1-37254516-0_ae2584ae-f27f-4714-b649-ce129d0e36cc } Provisioning: External provisioner is provisioning volume for claim "azuredisk-1498/pvc-5x68d"
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:34 +0000 UTC - event for pvc-5x68d: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:36 +0000 UTC - event for pvc-5x68d: {disk.csi.azure.com_k8s-agentpool1-37254516-0_ae2584ae-f27f-4714-b649-ce129d0e36cc } ProvisioningSucceeded: Successfully provisioned volume pvc-0ba2a2e8-b74b-4f48-bbcb-24418260bf11
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:37 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: {attachdetach-controller } SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-0ba2a2e8-b74b-4f48-bbcb-24418260bf11" 
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:51 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: {kubelet k8s-agentpool1-37254516-0} Pulling: Pulling image "k8s.gcr.io/e2e-test-images/busybox:1.29-2"
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:52 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: {kubelet k8s-agentpool1-37254516-0} Pulled: Successfully pulled image "k8s.gcr.io/e2e-test-images/busybox:1.29-2" in 428.757518ms
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:52 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: {kubelet k8s-agentpool1-37254516-0} Started: Started container volume-tester
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:52 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: {kubelet k8s-agentpool1-37254516-0} Created: Created container volume-tester
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:54 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988: {replicaset-controller } SuccessfulCreate: Created pod: azuredisk-volume-tester-9dqfp-6666c46988-sk4q9
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:54 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-n6dpl: {kubelet k8s-agentpool1-37254516-0} Killing: Stopping container volume-tester
May 11 00:07:29.291: INFO: At 2022-05-10 23:51:54 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-sk4q9: {attachdetach-controller } FailedAttachVolume: Multi-Attach error for volume "pvc-0ba2a2e8-b74b-4f48-bbcb-24418260bf11" Volume is already used by pod(s) azuredisk-volume-tester-9dqfp-6666c46988-n6dpl
May 11 00:07:29.291: INFO: At 2022-05-10 23:53:57 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-sk4q9: {kubelet k8s-agentpool1-37254516-1} FailedMount: Unable to attach or mount volumes: unmounted volumes=[test-volume-1], unattached volumes=[kube-api-access-w24f5 test-volume-1]: timed out waiting for the condition
May 11 00:07:29.291: INFO: At 2022-05-10 23:56:16 +0000 UTC - event for azuredisk-volume-tester-9dqfp-6666c46988-sk4q9: {kubelet k8s-agentpool1-37254516-1} FailedMount: Unable to attach or mount volumes: unmounted volumes=[test-volume-1], unattached volumes=[test-volume-1 kube-api-access-w24f5]: timed out waiting for the condition

The second failure is in Dynamic Provisioning [multi-az] should receive FailedMount event with invalid mount options [kubernetes.io/azure-disk] [disk.csi.azure.com] [csi-azuredisk-scheduler-extender] .

This test finds the 'FailedMount' event, but the PV never gets deleted as it polls for the full 10 minutes.

test/resources/utils.go

jmclong · 2022-05-11T16:56:19Z

Failures might be due (or related) to #1300? The PVs get released but never deleted.

edreed · 2022-05-11T23:03:43Z

/approve
/lgtm

k8s-ci-robot · 2022-05-11T23:04:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edreed, jmclong, landreasyan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [edreed]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 10, 2022

k8s-ci-robot requested review from andyzhangx and edreed May 10, 2022 21:13

jmclong changed the title ~~fix: fix issue where file is read before written in e2e tests 🐞~~ [V2] fix: fix issue where file is read before written in e2e tests 🐞 May 10, 2022

jmclong marked this pull request as ready for review May 10, 2022 21:45

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 10, 2022

k8s-ci-robot requested a review from ZeroMagic May 10, 2022 21:45

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 10, 2022

landreasyan reviewed May 11, 2022

View reviewed changes

test/resources/utils.go Outdated Show resolved Hide resolved

jmclong force-pushed the fix-sporadic-looking-for-file branch from 7a67d6c to a410368 Compare May 11, 2022 14:25

jmclong requested a review from landreasyan May 11, 2022 14:27

landreasyan reviewed May 11, 2022

View reviewed changes

test/resources/utils.go Outdated Show resolved Hide resolved

landreasyan approved these changes May 11, 2022

View reviewed changes

fix: fix issue where file is read before written in e2e tests

b0e613e

jmclong force-pushed the fix-sporadic-looking-for-file branch from a410368 to b0e613e Compare May 11, 2022 17:19

k8s-ci-robot assigned edreed May 11, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2022

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2022

k8s-ci-robot merged commit 708c938 into kubernetes-sigs:main_v2 May 11, 2022

jmclong deleted the fix-sporadic-looking-for-file branch May 12, 2022 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[V2] fix: fix issue where file is read before written in e2e tests 🐞 #1326

[V2] fix: fix issue where file is read before written in e2e tests 🐞 #1326

jmclong commented May 10, 2022 •

edited

Loading

k8s-ci-robot commented May 10, 2022

edreed commented May 10, 2022

jmclong commented May 11, 2022

jmclong commented May 11, 2022

edreed commented May 11, 2022

k8s-ci-robot commented May 11, 2022

[V2] fix: fix issue where file is read before written in e2e tests 🐞 #1326

[V2] fix: fix issue where file is read before written in e2e tests 🐞 #1326

Conversation

jmclong commented May 10, 2022 • edited Loading

k8s-ci-robot commented May 10, 2022

edreed commented May 10, 2022

jmclong commented May 11, 2022

jmclong commented May 11, 2022

edreed commented May 11, 2022

k8s-ci-robot commented May 11, 2022

jmclong commented May 10, 2022 •

edited

Loading