-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor health checks and wait until NGINX process ends #4487
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4487 +/- ##
=========================================
Coverage ? 59.14%
=========================================
Files ? 89
Lines ? 6780
Branches ? 0
=========================================
Hits ? 4010
Misses ? 2337
Partials ? 433
Continue to review full report at Codecov.
|
Shutdown works like this
|
Changes required:
With these changes, after a stop is triggered:
|
f6ac5c9
to
cb7009e
Compare
12ca913
to
624af90
Compare
ad21cbc
to
71d9a0b
Compare
9f2bbb3
to
087f47c
Compare
/retest |
// StatusSocket defines the location of the unix socket used by NGINX for the status server | ||
var StatusSocket = "/tmp/nginx-status-server.sock" | ||
// StatusPort port used by NGINX for the status server | ||
var StatusPort = 10256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason you're switching to tcp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason you're switching to tcp?
If the node is smaller than should be or during spikes of CPU, the request fails (socket). I cannot reproduce this consistently but when I can is only related to CPU utilization.
After this is merged I will add an e2e test so show how this can be reproduced (using stress)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aledbf did this e2e test ever get added? I'm running into a similar situation when CPU spikes and I'm wondering if the switch away from a socket actually resolves it
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aledbf, ElvinEfendi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Question for @aledbf : Was there any reason why not to use Referring to this line: https://github.com/aledbf/ingress-nginx/blob/master/cmd/waitshutdown/main.go#L29 |
@fred I know your question is quite old, but for anyone who is wondering, it sends SIGTERM to nginx-ingress-controller, which in turn sends SIQUIT to nginx. |
What this PR does / why we need it:
Removes the go pprof endpoint from the healthz port (10254).
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #4006 #4052Special notes for your reviewer: