Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store hook secret in shared dict, don't overwrite it on reload #68

Merged
merged 4 commits into from
May 15, 2017

Conversation

luto
Copy link
Collaborator

@luto luto commented May 10, 2017

See #66 for details.

@luto
Copy link
Collaborator Author

luto commented May 10, 2017

The testing framework really doesn't like it when the server gets reloaded during a test. Might be very tricky to get coverage on this one..

@luto
Copy link
Collaborator Author

luto commented May 10, 2017

Instead of actually reloading the server, I now just call auto_ssl:init() again. I hope that's a fair compromise.

Now that this is finished and tested, I'd be happy to hear your feedback on this, @GUI! :)

Copy link
Collaborator

@GUI GUI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooof, sorry about that AUTO_SSL_HOOK_SECRET global variable change having this unintended consequence during reloads. But thank you very much for tracking this down and for this pull request!

I had thought about a similar approach with a second shared dict, but I had avoided it when I initially updated things (back in 274b160) just to try and keep things backwards compatible (so users wouldn't need to worry about adding the second shared dict configuration). But I think this second shared dict solution is probably best, we'll just want to flag it in the upgrade notes (and the error handling you added will also be helpful).

I think all of this looks great, and your approach to testing and triggering init() manually seems good (Test::Nginx can be a little weird and not flexible for these type of tests, but at least there's tests!).

I just left a few comments for some very small tweaks, but let me know if you have any other thoughts or different opinions. I'm also happy to make those tweaks after merging if they sound good to you.

Thanks again!

.gitignore Outdated
/t/tmp
/t/logs
/t/vendor
/t/servroot*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted I don't think these gitignore changes should be needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README.md Outdated
# The "auto_ssl" shared dict is used to temporarily store various settings
# like the secret used by the hook server on port 8999. Do not change or
# omit it.
lua_shared_dict auto_ssl_settings 1m;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be inclined to change this recommended size down to something like 64k. While nobody's probably going to miss 1MB of RAM on a modern server, since this settings dict is only storing a single secret token (64 bytes long) and a single boolean value, even 64KB gives us plenty of space in case we want to store other things in here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- Generate the secret token.
local random = resty_random.bytes(32)
AUTO_SSL_HOOK_SECRET = str.to_hex(random)
ngx.shared.auto_ssl_settings:set("hook_server:secret", str.to_hex(random))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the issues we saw earlier with not a lot of error logging when shard dicts run out of memory, I've just been trying to wrap any :set with additional error logging, like in 3eb12de

So maybe something like:

local _, set_err, set_forcible = ngx.shared.auto_ssl_settings:set("hook_server:secret", str.to_hex(random))
if set_err then
  ngx.log(ngx.ERR, "auto-ssl: failed to set shdict for hook_server:secret: ", set_err)
elseif set_forcible then
  ngx.log(ngx.ERR, "auto-ssl: 'lua_shared_dict auto_ssl_settings' might be too small - consider increasing its configured size (old entries were removed while adding hook_server:secret)")
end

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -82,6 +89,10 @@ local function setup_storage(auto_ssl_instance)
end

return function(auto_ssl_instance)
if not ngx.shared.auto_ssl_settings then
ngx.log(ngx.ERR, "auto-ssl: dict auto_ssl_settings could not be found. Please add it to your configuration: `lua_shared_dict auto_ssl_settings 1m;`")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This should help with upgrades. I'd just make sure this size in this recommendation aligns with whatever the README says (which I had proposed lowering to 64k).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:white

@@ -5,7 +5,7 @@ local function start()
local exit_status = os.execute("umask 0022 && " .. auto_ssl.lua_root .. "/bin/resty-auto-ssl/start_sockproc")
-- Lua 5.2+ returns boolean. Prior versions return status code.
if exit_status == 0 or exit_status == true then
local _, set_err, set_forcible = ngx.shared.auto_ssl:set("sockproc_started", true)
local _, set_err, set_forcible = ngx.shared.auto_ssl_settings:set("sockproc_started", true)
if set_err then
ngx.log(ngx.ERR, "auto-ssl: failed to set shdict for sockproc_started: ", set_err)
elseif set_forcible then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the error message below, I think we'll just want to updated it to reference the auto_ssl_settings dict (instead of auto_ssl).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

luto added 4 commits May 11, 2017 11:30
This dictionary will be used for various internal values like
`hook_server:secret` or `sockproc_started`. They must survive server
reloads and can thus not be stored in global variables.
@luto
Copy link
Collaborator Author

luto commented May 11, 2017

Thanks for your notes! The changes you requested are now in this PR. Just let me know if you need anything else. :)

I had thought about a similar approach with a second shared dict, but I had avoided it when I initially updated things (back in 274b160) just to try and keep things backwards compatible (so users wouldn't need to worry about adding the second shared dict configuration).

Of course I'd prefer a backwards compatible solution as well. But I just could not think of a clean one. If you come up with one, just let me know. I'm happy to change this around, so it's usable for everyone without much hassle.

@GUI GUI merged commit 15b180d into auto-ssl:master May 15, 2017
@GUI
Copy link
Collaborator

GUI commented May 15, 2017

Thanks! I'll try to get this wrapped up in the next release later this week.

@luto
Copy link
Collaborator Author

luto commented May 15, 2017

Great. Thank you!

@luto luto deleted the fix-66 branch May 15, 2017 09:36
@GUI
Copy link
Collaborator

GUI commented May 24, 2017

Sorry for the delay in getting a release out, I've been a bit swamped these past few weeks.

(Also a big thanks for all the contributions and helping out in responding to issues. Let me know if you'd be interested in commit access on this repo, and I'd be happy to set you up.)

I actually had another thought about a way to do this in a backwards compatible way. Instead of splitting things out into a separate shdict for the certs versus settings, we could use safe_set everywhere on the existing shdict. This would allow us to keep everything in a single shdict (both settings and certs) without having to worry about the settings unexpectedly getting pushed out when things run low on memory.

The primary difference in behavior with this approach would be that if you ran out of configured memory in the shdict, it would be a hard error, and new certificates could not be stored in memory (but old certs would always remain in place, rather than being forced out of memory). I think originally I had thought it might be beneficial for less frequently used certs to automatically get pushed out of memory if newer certs came along, but in reality that might not be the best approach. While pushing less frequently used certs out of memory might allow things to continue to work if you run out of configured memory, performance will likely degrade (since different domains might be fighting over what's in memory), and this might not be very obvious to a system admin. So a hard fail for new certificates might be more obvious to admins (in which case bumping the memory is a straightforward task), and it has more predictable performance guarantees.

So in any case, do you have any thoughts about this? I'm happy to put together a pull request using safe_set, if you want to see how that would look.

Splitting the settings out into a separate shdict still seems like it makes some sense, and I'm not too worried about the backwards compatibility issue (adding the second shdict is quite easy), so I think we could still go with the changes you have in this PR. But after running into some shdict memroy issues on another project last week, it had me rethinking the usage of safe_set, and I was wondering if that might be a better approach here. But I'd be interested in your take on it. Thanks!

@luto
Copy link
Collaborator Author

luto commented May 24, 2017

Sorry for the delay in getting a release out, I've been a bit swamped these past few weeks.

No worries. We've got a workaround very specific to our setup in place, so there is no real rush.

(Also a big thanks for all the contributions and helping out in responding to issues. Let me know if you'd be interested in commit access on this repo, and I'd be happy to set you up.)

That would certainly help with closing duplicate issues and so on. I'd still just submit pull requests for non-trivial changes.

we could use safe_set everywhere on the existing shdict. This would allow us to keep everything in a single shdict (both settings and certs) without having to worry about the settings unexpectedly getting pushed out when things run low on memory.
(...)
So in any case, do you have any thoughts about this? I'm happy to put together a pull request using safe_set, if you want to see how that would look.

I view the shared dict as a cache. If that cache is not big enough, make it bigger. If that cache can not be made bigger, because you're simply running out of RAM, you'll need a architectural changes anyway. If someone has enough certificates so that a shared dict (no matter which size) is no longer feasible to store them, I'd opt of storing the certificates in redis, mongodb or wherever. A miss in this configuration would not be that expensive, since no direct file access is needed.

Since there are already warning messages when certificates get pushed out of the cache, recognizing the situation should not be too hard.

@GUI
Copy link
Collaborator

GUI commented May 31, 2017

Yup, good points. I've updated things to use safe_set when interacting with the new auto_ssl_settings shared dict (since items should really never be removed from this one), but continue using set for the auto_ssl shared dict (since it is more of a cache, like you mentioned, so it's okay if things get evicted when full).

And sorry again for the delay in getting this released. For some reasons the tests on CircleCI have been periodically failing with the more recent commits, and I'm not sure why (the tests seem to run fine locally). The tests don't always fail, and I'm not entirely sure if it's related to the more recent commits, but I'm going to try to sort out what's happening in the CI environment this week and then get things released.

(And you should have an invite for push access. Thanks again for your help and contributions!)

GUI added a commit that referenced this pull request Jun 18, 2017
If the `auto_ssl` shared dict ran out of memory, and then nginx were
reloaded, then a race condition existed where multiple "sockproc"
processes could try to be started at the same time.

While I think this situation would be unlikely to affect a production
system in a negative way (since the race condition only occurs during
nginx reloads and 1 of the sockproc processes should still succeed
and allow things to work), it did lead to some errors being logged,
which would intermittently cause our tests to fail (it would crop up on
the test following our t/memory.t test, since that next test would be
the first reload following our test to explicitly exhaust the memory).

This is fixed by using the `auto_ssl_settings` shared dict for storing
the resty-lock details (the lock prevents multiple processes from being
started at once). This smaller shared dict (introduced in #68) is used
for storing bits of data that won't grow in size so we can better ensure
the data will never be evicted from the cache. I'm now able to
repeatedly run the test suite in a loop without hitting this edge case.

Note that we are still using the `auto_ssl` shared dict for storing
resty-lock details for domain registrations, since the memory
requirements for that may grow (since there's a lock per domain, it's
dynamic in size). But that should be okay, because similar to the SSL
certs stored in `auto_ssl`, we're okay with cache evictions for old data
in those cases (along with warnings being logged).

Also possibly relevant is that currently resty-lock always uses `add`
for the shared dict (so it will evict old data to make room if
necessary), but there's a pull request for allowing use of `safe_add`:
openresty/lua-resty-lock#6 While we should be
okay by switching things to `auto_ssl_settings` (since we should never
have enough stored data in there to need evicting old data), it's
something to keep an eye on.

Related to:
- #68 (comment)
- #48 (comment)
@GUI GUI added this to the v0.11.0 milestone Jun 19, 2017
@GUI
Copy link
Collaborator

GUI commented Jun 19, 2017

Apologies (again) for the delay, but thanks for your patience. v0.11.0 has been released with these changes, and I noted the need for the extra shared dict in the upgrade notes.

Thanks!

@luto
Copy link
Collaborator Author

luto commented Jun 19, 2017

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants