-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve service discovery to only reload checks that need it #2702
Conversation
ef2c7af
to
9d3d27b
Compare
@@ -200,7 +200,7 @@ def get_hostname(config=None): | |||
return gce_hostname | |||
|
|||
# Try to get the docker hostname | |||
docker_util = DockerUtil() | |||
docker_util = DockerUtil(agentConfig=config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing agentConfig
doesn't change anything here, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's used in DockerUtil to get the instance of the config_store. We could pass a subset of it instead since we basically just need the sd_config_backend
field, but that would be more confusing than useful imho.
Added in a few comments, but looks nice overall! 📈 |
2fae878
to
60933bd
Compare
Thanks @olivielpeau, that was a kick-ass review! |
60933bd
to
ebf1fd9
Compare
Responded to one of your comments, but other than that LGTM Feel free to merge once that's addressed 🎆 |
👍 looking good, feel free to merge! (once the commit history is cleaned up a bit :) ) |
The `res.etcd_index` value that was previously used is the `X-Etcd-Index` header of etcd's http response. According to etcd's docs this index is the current index of the whole cluster, so any modification on any key increments this value. Using it as a reference to determine if the datastore has changed would trigger a config reload on every modification in the datastore (whereas we want a config reload only when the `sd_template_dir` changes). To fix this, we parse the `modifiedIndex` of all the returned keys instead, and we take the max index as the correct index.
7441f84
to
98c6b40
Compare
It's failing on a cassandra timeout. Merging as it's unrelated. |
What does this PR do?
It makes configuration reload smarter.
The gist of it is that the agent now maintains a mapping
container_identifier <--> checks
and only reloads checks for which a docker event associated with an identifier in this mapping happened since the last collector run.Motivation
In environments where a lot of docker events happen, full config reload on any container event is not a good design. We noticed several cases where the agent triggered a reload at each run, making rate collection impossible (calculating rates requires values from several runs, and these are forgotten on each check reload).
Individual reload is a better option.
Additional notes
In more details:
identifier_to_checks
cache is built based on the content of the config store and theauto_conf
folder. This cache is invalidated and refreshed when a template is modified in the K/V store.SIGHUP
is received by the collector or the agent detects a change in the config store.auto_conf
files are not detected yet so an agent restart or a manualSIGHUP
is still needed here.load_check
function that does the same thing asload_check_directory
but for a single check which name was passed as a third parameter.