introduce a balancer interface #2543
Conversation
force-pushed from 84d0006 to 8e42ddd
force-pushed from c711cfa to 422ef31
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2543      +/-   ##
==========================================
+ Coverage   40.63%   40.77%   +0.13%
==========================================
  Files          74       74
  Lines        5094     5094
==========================================
+ Hits         2070     2077       +7
+ Misses       2743     2737       -6
+ Partials      281      280       -1
```
Continue to review full report at Codecov.
/hold

/assign @aledbf
@ElvinEfendi this looks really good. Just one question: can you add information about the ingress/service in the log messages? Not necessarily in this PR, but I think we need more information than just IP/port in case of failures. WDYT?
@andrewlouis93 @csfrancis @ibawt it would be great if some of you could take a look and give feedback as well.
@aledbf Later on I'd like to make
/lgtm

/assign @andrewlouis93
@andrewlouis93: GitHub didn't allow me to assign the following users: andrewlouis93. Note that only kubernetes members and repo collaborators can be assigned. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
rootfs/etc/nginx/lua/balancer.lua
Outdated
```lua
  return backend
end

local function get_implementation(backend)
  local name = backend["load-balance"] or DEFAULT_LB_ALG
```
Maybe out of the scope of this PR, but could we have the `DEFAULT_LB_ALG` set per ingress? If a user wanted to use a different default lb algorithm, it would currently require each app behind the ingress to update its `load-balance` attribute. WDYT?
@andrewlouis93 `load-balance` is also a configmap property, so you can have a default value for it per ingress deployment and then customize further per app using the annotation.
Right, but currently this Lua code is not reading the `load-balance` configmap property, is it?
backend["load-balance"]
is not necessary only load-balance
annotation. It is whatever is set to the Backend
struct in controller side. So yeah Lua will get custom load-balance algorithm when you set it in configmap.
Gotcha, makes sense :)
rootfs/etc/nginx/lua/balancer.lua
Outdated
```diff
   if not backend then
     return nil
+  if backend["sessionAffinityConfig"] and backend["sessionAffinityConfig"]["name"] == "cookie" then
+    name = "sticky"
```
If a backend's `load-balance` algorithm is set and is overridden here because `sessionAffinity` is required, should we emit a log message indicating that the configured load-balance implementation is being ignored?
This is the default behaviour in non-dynamic mode as well. We can maybe document better how this annotation overrides the `load-balance` annotation and config, but I don't see how logging here would add any value.

That being said, we can do `INFO` logging if you think it is valuable. But I'd rather see this become a necessity in practice first and then add the logging.
> That being said, we can do INFO logging if you think it is valuable. But I'd rather see this become a necessity in practice first and then add the logging.

💯
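Putting the two outdated hunks above together, the selection logic under discussion looks roughly like this (a sketch reconstructed from the diff fragments; the `IMPLEMENTATIONS` lookup table and the fallback warning are assumptions, not verbatim PR code):

```lua
local function get_implementation(backend)
  -- an explicit annotation/configmap value wins, otherwise use the default
  local name = backend["load-balance"] or DEFAULT_LB_ALG

  -- cookie-based session affinity forces the sticky implementation,
  -- overriding whatever load-balance was set to
  if backend["sessionAffinityConfig"] and backend["sessionAffinityConfig"]["name"] == "cookie" then
    name = "sticky"
  end

  local implementation = IMPLEMENTATIONS[name]
  if not implementation then
    ngx.log(ngx.WARN, name .. " is not supported, falling back to " .. DEFAULT_LB_ALG)
    implementation = IMPLEMENTATIONS[DEFAULT_LB_ALG]
  end
  return implementation
end
```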
Code LGTM. @ElvinEfendi do you have any specific worries you'd like me to go over in depth?
```lua
function _M.balance()
  local balancer = get_balancer()
  if not balancer then
    return
```
Do we want to log something if not balancer here?
@andrewlouis93 https://github.com/kubernetes/ingress-nginx/pull/2543/files#diff-b00d77a6df9c8c05a483044b08e6bc50R103 short-circuits when the balancer does not exist, so this code does not even get executed and we get a 503 response code.
Right, I thought about that. But consider the case where a `sync_backends` run that results in `balancers = {}` occurs between a request's rewrite and balance phases?
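For reference, the rewrite-phase short-circuit ElvinEfendi points to looks roughly like this (a sketch; the `balancers` table and the `proxy_upstream_name` variable name are assumptions based on the discussion):

```lua
function _M.rewrite()
  -- resolve the backend for this request (variable name assumed)
  local backend_name = ngx.var.proxy_upstream_name
  local balancer = balancers[backend_name]

  -- no balancer: fail fast with a 503 while we are still in a phase
  -- where ngx.status may be modified
  if not balancer then
    ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
    return ngx.exit(ngx.status)
  end
end
```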
rootfs/etc/nginx/lua/balancer.lua
Outdated
```lua
  end
end

function _M.log()
  local balancer = get_balancer()
  if not balancer then
```
Same question here
@ibawt thanks for looking into it; reviewing the general design approach is enough for now!
I've only really looked at `balancer.lua`, but I mostly approve of the interface. I question whether `rewrite` is necessary, but everything else looks good.
rootfs/etc/nginx/lua/balancer.lua
Outdated
```lua
    return
  end

  balancer:after_balance()
```
I think this should be checking if the function exists, since it can be optional.
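A minimal sketch of the guard being suggested here, assuming `after_balance()` remains optional in the interface:

```lua
  -- after_balance is optional, so only call it when the
  -- implementation actually defines it
  if balancer.after_balance then
    balancer:after_balance()
  end
```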
rootfs/etc/nginx/lua/balancer.lua
Outdated
```diff
   end

-  local host, port = balance()
+  local host, port = balancer:balance()
   if not host then
```
Should you be checking that `port` is set as well?
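The follow-up commit further down ("also require port in addition to host") picks this up; a sketch of what the combined check might look like:

```lua
  local host, port = balancer:balance()
  if not (host and port) then
    ngx.log(ngx.WARN, "host or port is missing, balancer: " .. tostring(balancer.name))
    return
  end
```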
rootfs/etc/nginx/lua/balancer.lua
Outdated
```diff
   if not host then
-    ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
-    return ngx.exit(ngx.status)
+    ngx.log(ngx.WARN, "no host returned, balancer: " .. balancer.name)
```
I'm curious why you got rid of the 503 here? To me it seems like it would be better to be more explicit about this error case.
As mentioned at https://github.com/kubernetes/ingress-nginx/pull/2543/files#r191262645, it is not possible to modify the status at this phase.
```diff
   end

   ngx_balancer.set_more_tries(1)

   local ok, err = ngx_balancer.set_current_peer(host, port)
   if not ok then
-    ngx.log(ngx.ERR, string.format("error while setting current upstream peer to %s", tostring(err)))
+    ngx.log(ngx.ERR, "error while setting current upstream peer to " .. tostring(err))
```
Similar comment here ... should this return a 503?
```lua
  local balancer = get_balancer()
  if not balancer then
    ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
    return ngx.exit(ngx.status)
```
I'm not super familiar with the ingress-nginx config, but I'm guessing this means that every configured `upstream` will be configured to use Lua load balancing? Likewise, will `rewrite_by_lua` be called for every configured site? Why not move this logic into `balance` instead?
@csfrancis this was in `balance()` before. But it turns out modifying `ngx.status` is not allowed in the `balance` phase.

> but I'm guessing this means that every configured upstream will be configured to use Lua load balancing

that's correct, it's either all or none
rootfs/etc/nginx/lua/balancer.lua
Outdated
```lua
    return
  end

  if getmetatable(balancer) ~= implementation then
```
This is weird .. why are you calling `getmetatable` here?
@csfrancis because of the way OOP is implemented in Lua. Another option would be to compare `.name`, but IMO this is more elegant.

If you notice, the `new()` function in the implementations returns a new Lua table and sets its metatable to `self`.

So this logic is to detect the situation where an app starts with, let's say, EWMA, then in the next deploy switches to Round Robin. This condition will detect that and replace the EWMA instance with a new RR instance for the backend.
Weird .. that was not immediately obvious to me. More elegant, but I feel like `.name` is more clear. But that's just my opinion, man.
Maybe a comment would make this clearer?
@csfrancis I wrote this with `.name` initially. Another reason why I switched to the current version is that if we use `.name` for this, then we have to make that field mandatory for every implementation. Currently `.name` is more for logging purposes and optional.

I'll extract this into a function with a meaningful name :)
or maybe just add a comment for now
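For readers unfamiliar with the pattern: implementations create instances via `setmetatable`, so comparing an instance's metatable against the implementation module detects an algorithm switch. A sketch of both sides (module and field names assumed from the discussion, not verbatim PR code):

```lua
-- inside an implementation module, e.g. balancer/round_robin.lua
local _M = {}

function _M.new(self, backend)
  local instance = { name = "round_robin" }
  setmetatable(instance, self)
  self.__index = self
  return instance
end

-- inside balancer.lua's sync logic: if the backend's configured algorithm
-- changed (say, EWMA to round robin), the existing instance's metatable no
-- longer matches the desired implementation module, so it gets replaced
if getmetatable(balancer) ~= implementation then
  balancers[backend.name] = implementation:new(backend)
end
```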
add inline comment to make LB algorithm change detection logic clearer; also require port in addition to host
force-pushed from 5d12848 to b4e6513
/hold cancel

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What this PR does / why we need it:
This PR refactors the balancer logic to be more modular, extensible and testable. It also fixes a bug with returning 503 by using a `rewrite_by_lua` handler. Now balancer.lua has four public methods, namely `init_worker()`, `rewrite()`, `balance()` and `log()`, which get called in their respective phases.

`init_worker()` kicks in when an Nginx worker starts. It calls the private `sync_backends()` function, which takes care of initializing, updating and deleting the respective balancer instances for the backends. `sync_backends()` also gets called periodically, every 1 second, to keep the backend balancers in sync for the given Nginx worker.

`rewrite()` takes care of returning 503 when there is no balancer for the backend the current request corresponds to.

`balance()` gets the balancer instance for the given request/backend and calls its `:balance()` function, which executes the respective load balancing algorithm and returns a `host`, `port` pair. `balance()` then takes care of calling the Nginx API function to set the upstream peer.

`log()` gets the balancer instance for the given request/backend and calls its `:after_balance()`. Currently only the EWMA algorithm implements `after_balance()`. This is useful when the LB algorithm needs to store stats related to e.g. response times or response status codes.
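A sketch of the `init_worker()` flow described above (the 1-second resync interval is from the description; `ngx.timer.every` is the standard OpenResty API for periodic timers, though the PR's exact timer code is not shown here):

```lua
local function sync_backends()
  -- initialize, update and delete balancer instances so they
  -- match the currently configured backends (details elided)
end

function _M.init_worker()
  -- sync once at worker start so traffic can be served immediately
  sync_backends()

  -- then resync every second to pick up backend changes
  local ok, err = ngx.timer.every(1, sync_backends)
  if not ok then
    ngx.log(ngx.ERR, "error setting up timer.every for sync_backends: " .. tostring(err))
  end
end
```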
With this PR, if someone wants to introduce a new load balancing algorithm, they can create the respective Lua module in the `balancer/` folder and register it in `balancer.lua`. Every new load balancing module has to implement at least `new(backend)`, `sync(backend)` and `balance()`. Optionally they can implement `after_balance()` as well.
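A skeleton of what such a module could look like (a sketch of the described interface; the instance fields and the naive round-robin logic are illustrative, not taken from the PR):

```lua
-- balancer/example.lua: a hypothetical implementation of the interface
local _M = {}

function _M.new(self, backend)
  local o = { name = "example", endpoints = backend.endpoints, index = 0 }
  setmetatable(o, self)
  self.__index = self
  return o
end

-- called by sync_backends() when the backend definition changes
function _M.sync(self, backend)
  self.endpoints = backend.endpoints
end

-- must return a host, port pair; here, naive round robin
function _M.balance(self)
  self.index = self.index % #self.endpoints + 1
  local endpoint = self.endpoints[self.index]
  return endpoint.address, endpoint.port
end

-- optional hook, called from the log phase; useful for recording
-- per-request stats such as response times
function _M.after_balance(self)
end

return _M
```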
Once this PR gets merged I'll add more unit tests, which will most likely make some of the related e2e tests redundant enough that we can just delete them.
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer: