-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store hook secret in shared dict, don't overwrite it on reload #68
Conversation
The testing framework really doesn't like it when the server gets reloaded during a test. Might be very tricky to get coverage on this one.. |
Instead of actually reloading the server, I now just call Now that this is finished and tested, I'd be happy to hear your feedback on this, @GUI! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooof, sorry about that AUTO_SSL_HOOK_SECRET
global variable change having this unintended consequence during reloads. But thank you very much for tracking this down and for this pull request!
I had thought about a similar approach with a second shared dict, but I had avoided it when I initially updated things (back in 274b160) just to try and keep things backwards compatible (so users wouldn't need to worry about adding the second shared dict configuration). But I think this second shared dict solution is probably best, we'll just want to flag it in the upgrade notes (and the error handling you added will also be helpful).
I think all of this looks great, and your approach to testing and triggering init()
manually seems good (Test::Nginx can be a little weird and not flexible for these type of tests, but at least there's tests!).
I just left a few comments for some very small tweaks, but let me know if you have any other thoughts or different opinions. I'm also happy to make those tweaks after merging if they sound good to you.
Thanks again!
.gitignore
Outdated
/t/tmp | ||
/t/logs | ||
/t/vendor | ||
/t/servroot* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I noted I don't think these gitignore changes should be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
README.md
Outdated
# The "auto_ssl" shared dict is used to temporarily store various settings | ||
# like the secret used by the hook server on port 8999. Do not change or | ||
# omit it. | ||
lua_shared_dict auto_ssl_settings 1m; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be inclined to change this recommended size down to something like 64k
. While nobody's probably going to miss 1MB of RAM on a modern server, since this settings dict is only storing a single secret token (64 bytes long) and a single boolean value, even 64KB gives us plenty of space in case we want to store other things in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
lib/resty/auto-ssl/init_master.lua
Outdated
-- Generate the secret token. | ||
local random = resty_random.bytes(32) | ||
AUTO_SSL_HOOK_SECRET = str.to_hex(random) | ||
ngx.shared.auto_ssl_settings:set("hook_server:secret", str.to_hex(random)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the issues we saw earlier with not a lot of error logging when shard dicts run out of memory, I've just been trying to wrap any :set
with additional error logging, like in 3eb12de
So maybe something like:
local _, set_err, set_forcible = ngx.shared.auto_ssl_settings:set("hook_server:secret", str.to_hex(random))
if set_err then
ngx.log(ngx.ERR, "auto-ssl: failed to set shdict for hook_server:secret: ", set_err)
elseif set_forcible then
ngx.log(ngx.ERR, "auto-ssl: 'lua_shared_dict auto_ssl_settings' might be too small - consider increasing its configured size (old entries were removed while adding hook_server:secret)")
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
lib/resty/auto-ssl/init_master.lua
Outdated
@@ -82,6 +89,10 @@ local function setup_storage(auto_ssl_instance) | |||
end | |||
|
|||
return function(auto_ssl_instance) | |||
if not ngx.shared.auto_ssl_settings then | |||
ngx.log(ngx.ERR, "auto-ssl: dict auto_ssl_settings could not be found. Please add it to your configuration: `lua_shared_dict auto_ssl_settings 1m;`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This should help with upgrades. I'd just make sure this size in this recommendation aligns with whatever the README says (which I had proposed lowering to 64k).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:white
@@ -5,7 +5,7 @@ local function start() | |||
local exit_status = os.execute("umask 0022 && " .. auto_ssl.lua_root .. "/bin/resty-auto-ssl/start_sockproc") | |||
-- Lua 5.2+ returns boolean. Prior versions return status code. | |||
if exit_status == 0 or exit_status == true then | |||
local _, set_err, set_forcible = ngx.shared.auto_ssl:set("sockproc_started", true) | |||
local _, set_err, set_forcible = ngx.shared.auto_ssl_settings:set("sockproc_started", true) | |||
if set_err then | |||
ngx.log(ngx.ERR, "auto-ssl: failed to set shdict for sockproc_started: ", set_err) | |||
elseif set_forcible then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the error message below, I think we'll just want to updated it to reference the auto_ssl_settings
dict (instead of auto_ssl
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
This dictionary will be used for various internal values like `hook_server:secret` or `sockproc_started`. They must survive server reloads and can thus not be stored in global variables.
Thanks for your notes! The changes you requested are now in this PR. Just let me know if you need anything else. :)
Of course I'd prefer a backwards compatible solution as well. But I just could not think of a clean one. If you come up with one, just let me know. I'm happy to change this around, so it's usable for everyone without much hassle. |
Thanks! I'll try to get this wrapped up in the next release later this week. |
Great. Thank you! |
Sorry for the delay in getting a release out, I've been a bit swamped these past few weeks. (Also a big thanks for all the contributions and helping out in responding to issues. Let me know if you'd be interested in commit access on this repo, and I'd be happy to set you up.) I actually had another thought about a way to do this in a backwards compatible way. Instead of splitting things out into a separate shdict for the certs versus settings, we could use The primary difference in behavior with this approach would be that if you ran out of configured memory in the shdict, it would be a hard error, and new certificates could not be stored in memory (but old certs would always remain in place, rather than being forced out of memory). I think originally I had thought it might be beneficial for less frequently used certs to automatically get pushed out of memory if newer certs came along, but in reality that might not be the best approach. While pushing less frequently used certs out of memory might allow things to continue to work if you run out of configured memory, performance will likely degrade (since different domains might be fighting over what's in memory), and this might not be very obvious to a system admin. So a hard fail for new certificates might be more obvious to admins (in which case bumping the memory is a straightforward task), and it has more predictable performance guarantees. So in any case, do you have any thoughts about this? I'm happy to put together a pull request using safe_set, if you want to see how that would look. Splitting the settings out into a separate shdict still seems like it makes some sense, and I'm not too worried about the backwards compatibility issue (adding the second shdict is quite easy), so I think we could still go with the changes you have in this PR. But after running into some shdict memroy issues on another project last week, it had me rethinking the usage of |
No worries. We've got a workaround very specific to our setup in place, so there is no real rush.
That would certainly help with closing duplicate issues and so on. I'd still just submit pull requests for non-trivial changes.
I view the shared dict as a cache. If that cache is not big enough, make it bigger. If that cache can not be made bigger, because you're simply running out of RAM, you'll need a architectural changes anyway. If someone has enough certificates so that a shared dict (no matter which size) is no longer feasible to store them, I'd opt of storing the certificates in redis, mongodb or wherever. A miss in this configuration would not be that expensive, since no direct file access is needed. Since there are already warning messages when certificates get pushed out of the cache, recognizing the situation should not be too hard. |
Yup, good points. I've updated things to use And sorry again for the delay in getting this released. For some reasons the tests on CircleCI have been periodically failing with the more recent commits, and I'm not sure why (the tests seem to run fine locally). The tests don't always fail, and I'm not entirely sure if it's related to the more recent commits, but I'm going to try to sort out what's happening in the CI environment this week and then get things released. (And you should have an invite for push access. Thanks again for your help and contributions!) |
If the `auto_ssl` shared dict ran out of memory, and then nginx were reloaded, then a race condition existed where multiple "sockproc" processes could try to be started at the same time. While I think this situation would be unlikely to affect a production system in a negative way (since the race condition only occurs during nginx reloads and 1 of the sockproc processes should still succeed and allow things to work), it did lead to some errors being logged, which would intermittently cause our tests to fail (it would crop up on the test following our t/memory.t test, since that next test would be the first reload following our test to explicitly exhaust the memory). This is fixed by using the `auto_ssl_settings` shared dict for storing the resty-lock details (the lock prevents multiple processes from being started at once). This smaller shared dict (introduced in #68) is used for storing bits of data that won't grow in size so we can better ensure the data will never be evicted from the cache. I'm now able to repeatedly run the test suite in a loop without hitting this edge case. Note that we are still using the `auto_ssl` shared dict for storing resty-lock details for domain registrations, since the memory requirements for that may grow (since there's a lock per domain, it's dynamic in size). But that should be okay, because similar to the SSL certs stored in `auto_ssl`, we're okay with cache evictions for old data in those cases (along with warnings being logged). Also possibly relevant is that currently resty-lock always uses `add` for the shared dict (so it will evict old data to make room if necessary), but there's a pull request for allowing use of `safe_add`: openresty/lua-resty-lock#6 While we should be okay by switching things to `auto_ssl_settings` (since we should never have enough stored data in there to need evicting old data), it's something to keep an eye on. Related to: - #68 (comment) - #48 (comment)
Apologies (again) for the delay, but thanks for your patience. v0.11.0 has been released with these changes, and I noted the need for the extra shared dict in the upgrade notes. Thanks! |
Thank you! |
See #66 for details.