Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add liveness probe #225

Merged
merged 2 commits into from
Mar 1, 2019
Merged

Conversation

leakingtapan
Copy link
Contributor

Is this a bug fix or adding new feature?
Fixes: #159

What is this PR about? / Why do we need it?

What testing is done?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leakingtapan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2019
@k8s-ci-robot k8s-ci-robot requested a review from d-nishi February 26, 2019 02:20
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 26, 2019
@coveralls
Copy link

coveralls commented Feb 26, 2019

Pull Request Test Coverage Report for Build 486

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 67.773%

Totals Coverage Status
Change from base Build 484: 0.0%
Covered Lines: 980
Relevant Lines: 1446

💛 - Coveralls

initialDelaySeconds: 10
timeoutSeconds: 3
periodSeconds: 2
failureThreshold: 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failureThreshold: 1 might be too aggressive, in their example they actually have 2:

failureThreshold: 5
...
failureThreshold: 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test failures could have been caused because the the node driver restarted:

Logging pods the kubelet thinks is on node ip-172-20-50-17.us-west-2.compute.internal
Feb 26 02:54:34.076: INFO: kube-proxy-ip-172-20-50-17.us-west-2.compute.internal started at <nil> (0+0 container statuses recorded)
Feb 26 02:54:34.076: INFO: ebs-csi-node-xn2c8 started at 2019-02-26 02:33:06 +0000 UTC (0+3 container statuses recorded)
Feb 26 02:54:34.076: INFO: 	Container ebs-plugin ready: true, restart count 6

All 4 failures were on the volumes that take a long time to format and even if the formatting is attempted a second time, the 15 min timeout is not enough time.

Copy link
Contributor Author

@leakingtapan leakingtapan Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice the driver keeps being restarted many time even without any traffic due to probe timeout. Update the periodSeconds to 10s for less frequent probe, failureThreshold to 5 for higher failure tolerance. With the change, driver is back stable.

@bertinatto
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit 084be25 into kubernetes-sigs:master Mar 1, 2019
@leakingtapan leakingtapan deleted the livenessprobe branch March 4, 2019 07:10
dobsonj pushed a commit to dobsonj/aws-ebs-csi-driver that referenced this pull request Oct 17, 2023
…t/cherry-pick-224-to-release-4.13

[release-4.13] OCPBUGS-13811: Volume unmount repeats after successful unmount, preventing pod delete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integration with k8s liveness probe
5 participants