Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(balancer) async initialization of balancer objects #3187

Merged
merged 2 commits into from
Feb 14, 2018

Conversation

hishamhm
Copy link
Contributor

@hishamhm hishamhm commented Jan 29, 2018

When upstreams include targets using hostnames instead of IPs, the initialization of balancer objects needs to perform DNS resolution, which involves opening a socket, which is not available in the init_worker phase.

This patch delays balancer initialization to the access phase.

Updated: This patch performs balancer initialization in a timer, which makes initialization asynchronous to worker initialization. For this reason, it adds a locking system to create_balancer to avoid race conditions when requests happen while the balancers are being initialized.

The locking system uses a simplified version of the resty.lock logic which does
not use the shm, since balancers are per-worker and we are concerned here only
with intra-worker concurrency control.

@Tieske
Copy link
Member

Tieske commented Jan 29, 2018

this test runs on every request. Would it be possible to use a timer, started from init_worker to do the initialization?

@hishamhm
Copy link
Contributor Author

@Tieske can one ensure that the timer will be serviced before the first access? I don't want to risk it ever running in an inconsistent state.

@Tieske
Copy link
Member

Tieske commented Jan 31, 2018

if the timer is set at 0, it will run immediately

@hishamhm hishamhm force-pushed the fix/balancer-initialization branch from b4efc00 to 096c7b5 Compare February 6, 2018 18:27
Copy link
Member

@thibaultcha thibaultcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with the approach in general, just left a few concerns!

if not balancer_initialized then
local ok, err = balancer.init()
if not ok then
log(ngx.ERR, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add some context to this error message, like "failed to initialize balancers: " ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might be worth to bump this to an EMERG message (instead of ERR)? Error generally is used for something that impacts the execution of a particular request. An error such as this one has much wider implications to many of the subsequent requests that would be made.

@@ -616,11 +637,16 @@ local function init()
if ok ~= nil then
oks = oks + 1
else
log(ngx.STDERR, "failed creating balancer for ", name, ": ", err)
log(ngx.ERR, "failed creating balancer for ", name, ": ", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe my below comment was the reason why this was an STDERR level (probably best to use CRIT though)

@@ -27,6 +28,7 @@ local _load_targets_into_memory

-- table holding our balancer objects, indexed by upstream name
local balancers = {}
local balancers_initialized = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be sweet to declare this in a do end block wrapping the below init function?

return true
end

local lock, err = resty_lock:new("kong", { timeout = 3600 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this timeout a bit excessive? Something like 30 might be safer? Is there a particular intent why it is so high? Anyway, it doesn't make much sense for it to be higher than the default exptime (which is 30s), and actually, it would be modified to 30 right now, see https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L104

@hishamhm hishamhm force-pushed the fix/balancer-initialization branch 4 times, most recently from 4c672d5 to b0346d6 Compare February 8, 2018 17:32
@hishamhm hishamhm changed the title fix(balancer) delay initialization of balancer objects fix(balancer) async initialization of balancer objects Feb 8, 2018
@hishamhm hishamhm force-pushed the fix/balancer-initialization branch 2 times, most recently from f413715 to 7c65baa Compare February 8, 2018 18:15
@hishamhm hishamhm force-pushed the fix/balancer-initialization branch from 7c65baa to 4afa415 Compare February 8, 2018 18:41
Copy link
Member

@thibaultcha thibaultcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach looks good to me - I am of the opinion that we propose this patch to @aaronhmiller so that he can give it a try! :)

@@ -470,7 +517,7 @@ local function get_balancer(target, no_create)
return nil, "balancer not found"
else
log(ERR, "balancer not found for ", upstream.name, ", will create it")
return create_balancer(upstream)
return create_balancer(upstream), upstream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dammit, seems like this already was an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wasn't -- I mean, it was with regard to the function signature but it didn't happen because the get_balancer -> create_balancer code path was never hit when all balancers were created at startup. Now that it is async and get_balancer can run before init finishes running, then this code path can happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My one concern here is that this can be confusing for users of get_balancer and increase cognitive load: create_balancer can return 1 or 2 values, making get_balancer return 1, 2, or 3 values. Usually, we deal with this by catching errors like:

local balancer, err = create_balancer(upstream)
if not balancer then
  return nil, err
end

return balancer, upstream -- still somewhat annoying to me, I prefer `balancer, nil, upstream` but up to you!

if ok then
return balancers[upstream.id]
else
return nil, "timeout waiting for balancer for " .. upstream.id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: this goes against our code style which instructs to avoid subsequent branches when the previous one already returns :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok, I had read the style doc differently! my vague recollection is that it instructed to return early on the ground of performing validations and avoiding unneeded computation -- since there's no difference in computation here I thought it didn't make a difference here (I tend to find this more readable because it reads as "or else, fail!" instead of "the normal code flow is to fail" -- I'll just invert the test condition)

will change it, no probs! :)

@@ -289,9 +293,50 @@ do
end
end

local creating = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we scope this and other create_balancer dependencies under a do ... end block? It'd be a little bit saner/easier to skim through imho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already inside a do-block and far down so it's only visible to the two relevant functions that need it. Can we leave this one as-is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I missed it; quite large for do block, but that's fine!

local upstreams, err = get_all_upstreams()
if not upstreams then
log(ngx.STDERR, "failed loading initial list of upstreams: ", err)
return
return nil, "failed loading initial list of upstreams: " .. err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we don't log this returned error in our balancer.init() call in init_worker? Maybe we should keep this as a CRIT log + early return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok! 👍

When upstreams include targets using hostnames instead of IPs, the
initialization of balancer objects needs to perform DNS resolution, which
involves opening a socket, which is not available in the init_worker phase.

This patch performs balancer initialization in a timer, which makes
initialization asynchronous to worker initialization. For this reason,
it adds a locking system to create_balancer to avoid race conditions
when requests happen while the balancers are being initialized.

The locking system uses a simplified version of `resty.lock` which does
not use the shm, since balancers are per-worker and we are concerned only
with intra-worker concurrency control.
@hishamhm hishamhm force-pushed the fix/balancer-initialization branch from 4afa415 to 1d5a10e Compare February 14, 2018 13:33
local upstreams, err = get_all_upstreams()
if not upstreams then
log(ngx.STDERR, "failed loading initial list of upstreams: ", err)
log(CRIT, "failed loading initial list of upstreams: " .. err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but better not make use of Lua-land string concatenation in those error messages, and rely on the variadic arguments of ngx.log instead

@@ -470,7 +517,7 @@ local function get_balancer(target, no_create)
return nil, "balancer not found"
else
log(ERR, "balancer not found for ", upstream.name, ", will create it")
return create_balancer(upstream)
return create_balancer(upstream), upstream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My one concern here is that this can be confusing for users of get_balancer and increase cognitive load: create_balancer can return 1 or 2 values, making get_balancer return 1, 2, or 3 values. Usually, we deal with this by catching errors like:

local balancer, err = create_balancer(upstream)
if not balancer then
  return nil, err
end

return balancer, upstream -- still somewhat annoying to me, I prefer `balancer, nil, upstream` but up to you!

@hishamhm hishamhm merged commit 1d63abe into master Feb 14, 2018
@hishamhm hishamhm deleted the fix/balancer-initialization branch February 14, 2018 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants