Add liveness probe #225
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: leakingtapan. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
deploy/kubernetes/controller.yaml (outdated)

```yaml
initialDelaySeconds: 10
timeoutSeconds: 3
periodSeconds: 2
failureThreshold: 1
```
`failureThreshold: 1` might be too aggressive; in their example they actually have:

```yaml
failureThreshold: 5
...
```
failureThreshold: 1
Looks like the test failures could have been caused by the node driver restarting:

```
Logging pods the kubelet thinks is on node ip-172-20-50-17.us-west-2.compute.internal
Feb 26 02:54:34.076: INFO: kube-proxy-ip-172-20-50-17.us-west-2.compute.internal started at <nil> (0+0 container statuses recorded)
Feb 26 02:54:34.076: INFO: ebs-csi-node-xn2c8 started at 2019-02-26 02:33:06 +0000 UTC (0+3 container statuses recorded)
Feb 26 02:54:34.076: INFO: Container ebs-plugin ready: true, restart count 6
```
All 4 failures were on volumes that take a long time to format; even when formatting is attempted a second time, the 15-minute timeout is not enough.
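To see why the original settings restart the driver so easily: the kubelet restarts a container after `failureThreshold` consecutive probe failures, spaced `periodSeconds` apart, each given `timeoutSeconds` to respond. A rough back-of-the-envelope sketch (the exact kubelet timing is more nuanced, so treat this as an approximation, not the official formula):

```python
def tolerated_downtime(period_seconds: int, timeout_seconds: int, failure_threshold: int) -> int:
    """Approximate worst-case window (seconds) of unresponsiveness the
    kubelet tolerates before restarting the container."""
    return failure_threshold * period_seconds + timeout_seconds

# Original config in this PR: a single slow probe (~5s window) triggers a restart.
print(tolerated_downtime(2, 3, 1))   # 5

# Tuned config discussed below: roughly 53s of unresponsiveness tolerated,
# enough to survive long filesystem formats under load.
print(tolerated_downtime(10, 3, 5))  # 53
```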
Notice the driver keeps being restarted many times, even without any traffic, due to probe timeouts. Updated `periodSeconds` to 10 for less frequent probes and `failureThreshold` to 5 for higher failure tolerance. With this change, the driver is stable again.
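A sketch of the tuned `livenessProbe` block as it might appear in `deploy/kubernetes/controller.yaml`; the `httpGet` path and port are assumptions based on the typical CSI livenessprobe sidecar setup, not confirmed by this thread:

```yaml
livenessProbe:
  httpGet:
    path: /healthz       # assumed endpoint exposed by the livenessprobe sidecar
    port: healthz
  initialDelaySeconds: 10
  timeoutSeconds: 3
  periodSeconds: 10      # probe less frequently
  failureThreshold: 5    # tolerate transient slowness before restarting
```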
Force-pushed from da040ad to 11535a4.
Force-pushed from 11535a4 to f80776c.
/lgtm
…t/cherry-pick-224-to-release-4.13 [release-4.13] OCPBUGS-13811: Volume unmount repeats after successful unmount, preventing pod delete
Is this a bug fix or does it add a new feature?
Fixes: #159
What is this PR about? / Why do we need it?
What testing is done?