Change upstream on error when sticky session balancer is used #4048
@fedunineyu thanks for your PR, can you add unit tests for this?
Here you are: 0d7029f9468722671342fa0446740dbc29662443 |
I've run several load tests following the scenario in #4035 and noticed that the proposed fix needs updating: it doesn't work well when the failing request arrives without a sticky cookie. I'm going to obtain failing upstream from |
@ElvinEfendi The recent e2e test run failed:
Should I restart it? If so, how can I do that without pushing a new commit?
/retest |
@fedunineyu: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
/ok-to-test |
@@ -59,31 +67,82 @@ local function set_cookie(self, value)
  end
end

function _M.get_last_failure()
Does this have to be public function?
This function is made "public" to change its behavior for request failure simulation (see this test).
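The testing approach described in this reply can be sketched as follows. This is a minimal illustration, assuming a stand-in module table; only the name `get_last_failure` comes from the diff, while `previous_attempt_failed` and the stub bodies are hypothetical.

```lua
-- Hypothetical stand-in for the sticky balancer module.
local _M = {}

-- Stub: the real function would report the state of the previous attempt.
-- Here it always reports "no failure".
function _M.get_last_failure()
  return nil
end

-- Hypothetical internal helper. It calls through the module table, so a
-- replacement installed by a test is picked up.
function _M.previous_attempt_failed()
  return _M.get_last_failure() ~= nil
end

-- Test-style usage: swap the function, run the check, then restore it.
local original = _M.get_last_failure
_M.get_last_failure = function() return "failed" end
local failed_when_stubbed = _M.previous_attempt_failed()
_M.get_last_failure = original
local failed_after_restore = _M.previous_attempt_failed()
```

A function declared with `local function` would not be reachable from the test file, which is the trade-off behind exposing it on the module table.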
-- use previous upstream if this is the first attempt or previous attempt succeeded
if state_name == nil and upstream_from_cookie ~= nil then
  do return upstream_from_cookie end
Why `do ... end`?
Yeah, right. They are not required.
removed in ddf738fddadfb7ea9090ed2bd6fe277a34dcb81e
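For context on the exchange above: Lua requires `return` to be the last statement of its block, so `do return ... end` is only needed when more statements follow in the same block. Inside an `if ... then ... end` the `return` already closes the block. A small sketch, with hypothetical function names `pick` and `short_circuit`:

```lua
-- `return` is already the last statement of the `then` block,
-- so no `do ... end` wrapper is needed.
local function pick(state_name, upstream_from_cookie)
  if state_name == nil and upstream_from_cookie ~= nil then
    return upstream_from_cookie
  end
  return nil
end

-- The wrapper only matters when code follows the return in the same block,
-- for example when short-circuiting a function while debugging.
local function short_circuit()
  do return "early" end
  return "late" -- never reached
end
```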
@fedunineyu I did not have time to extensively review yet, but will get to this sometime this week. |
Looking more at this PR, it's suggesting a fundamental change - you're kinda doing passive healthchecking. Please read #4035 (comment). |
I've posted a reply to your comment. |
What if there was a network blip and the request to the sticky endpoint fails? With your PR you will generate a new cookie and pick a new endpoint. But it is possible that, had you retried the same endpoint, the request would have succeeded, so you broke stickiness unnecessarily.
Nginx's standard behaviour does not dictate how you choose the upstream/endpoint on retry; that is left to the balancer. That is why I'm saying you are conceptually changing the ingress-nginx sticky balancer implementation: you now break stickiness on the first failure you see, and I'm not sure this is what most people expect. The current idea behind the stickiness implementation is simple: proxy to the same endpoint as long as it is healthy, where health is defined by the Kubernetes readiness probe. Nginx Plus seems to have
But that does not define when it deems a server as "cannot process a request". Is it based on healthchecking? Is it based on the first failure it sees? Is it based of |
You are right. Stickiness will be broken for requests that were issued during a network blip, but they will be processed by another upstream. Since applications should tolerate session loss anyway (containers can stop working, nodes can restart, and so on), I don't think this is a serious issue.
Ah, I see. Now I understand your comment about fundamental change. Returning to the people expectations... |
I'd like to mention HAProxy here. It has a redispatch option:
So adding a similar configuration flag (annotation) is not such a bad idea? |
@fedunineyu @yadolov I like the idea of new configuration option using annotation 👍 |
In 8b0944345e808c4f00a6a2bbb1b1c3fbefcecf4e I've added support for annotation |
@ElvinEfendi I'd like to fix it in a separate PR but to avoid conflicts it would be better to merge this PR first. |
@fedunineyu that was an intentional change. We decided that, since it reveals no sensitive information and poses no security risk, there is no reason to hash it. Let me know if you think that's concerning. |
@ElvinEfendi |
/ok-to-test |
@fedunineyu please squash the commits and rebase |
Codecov Report
@@ Coverage Diff @@
## master #4048 +/- ##
=========================================
+ Coverage 57.76% 57.8% +0.04%
=========================================
Files 87 87
Lines 6459 7037 +578
=========================================
+ Hits 3731 4068 +337
- Misses 2296 2512 +216
- Partials 432 457 +25
1. Session cookie is updated on previous attempt failure when `session-cookie-change-on-failure = true` (default value is `false`).
2. Added tests to check both cases.
3. Updated docs.
Co-Authored-By: Vladimir Grishin <[email protected]>
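The decision this commit message describes can be sketched as a small predicate. This is an illustration under assumptions, not the PR's code: the helper name `should_reuse_cookie_upstream` and its parameters are hypothetical, and `change_on_failure` stands for the value of the `session-cookie-change-on-failure` option.

```lua
-- Hypothetical helper: should the upstream stored in the cookie be reused?
-- `last_failure` is nil on the first attempt or when the previous attempt succeeded.
local function should_reuse_cookie_upstream(upstream_from_cookie, last_failure, change_on_failure)
  if upstream_from_cookie == nil then
    return false -- nothing to reuse, a new upstream must be picked
  end
  if last_failure == nil then
    return true -- first attempt or previous attempt succeeded
  end
  -- Previous attempt failed: stay sticky unless the option is enabled.
  return not change_on_failure
end
```

With the option left at its default of `false`, a failure does not change the choice, which matches the earlier behaviour discussed in the thread.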
8b09443 to 254629c
Done! |
/retest |
/approve |
Give me a few more days on this. |
/lgtm Thanks @fedunineyu ! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aledbf, ElvinEfendi, fedunineyu The full list of commands accepted by this bot can be found here. The pull request process is described here
What this PR does / why we need it:
Currently, with sticky sessions enabled, the next upstream is not tried on error (details are in #4035).
In this PR the Lua script for the sticky session balancer is modified so that, on error, the key for the consistent hash is regenerated to point to another upstream.
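The idea of regenerating the key until it maps to a different upstream can be sketched as below. Everything here is an assumption made for illustration: `lookup` is a toy modulo hash standing in for the balancer's consistent-hash lookup, and `pick_new_key`, `new_key`, and the retry limit are hypothetical names and values, not the PR's code.

```lua
-- Toy stand-in for the consistent-hash lookup: a string hash taken modulo
-- the number of upstreams. The real balancer's hashing is not reproduced here.
local function lookup(upstreams, key)
  local h = 0
  for i = 1, #key do
    h = (h * 31 + key:byte(i)) % 2147483647
  end
  return upstreams[h % #upstreams + 1]
end

-- Hypothetical helper: generate new keys until one maps away from the failed
-- upstream, giving up after max_tries. new_key is an injected key generator.
local function pick_new_key(upstreams, failed_upstream, new_key, max_tries)
  for _ = 1, max_tries do
    local key = new_key()
    local upstream = lookup(upstreams, key)
    if upstream ~= failed_upstream then
      return key, upstream
    end
  end
  return nil, nil
end

local upstreams = { "10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080" }
local counter = 0
local function new_key()
  counter = counter + 1
  return "key-" .. counter
end

local key, upstream = pick_new_key(upstreams, "10.0.0.1:8080", new_key, 20)
```

The returned key is what would be written back to the session cookie, so later requests from the same client hash to the new upstream.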
Which issue this PR fixes: fixes #4035