Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve service discovery to only reload checks that need it #2702

Merged
merged 3 commits into from
Aug 19, 2016

Conversation

hkaj
Copy link
Member

@hkaj hkaj commented Jul 25, 2016

What does this PR do?

It makes configuration reload smarter.
The gist of it is that the agent now maintains a mapping container_identifier <--> checks and only reloads checks for which a docker event associated with an identifier in this mapping happened since the last collector run.

Motivation

In environments where a lot of docker events happen, full config reload on any container event is not a good design. We noticed several cases where the agent triggered a reload at each run, making rate collection impossible (calculating rates requires values from several runs, and these are forgotten on each check reload).
Individual reload is a better option.

Additional notes

In more details:

  • the identifier_to_checks cache is built based on the content of the config store and the auto_conf folder. This cache is invalidated and refreshed when a template is modified in the K/V store.
  • a full reload is triggered on two events: a SIGHUP is received by the collector or the agent detects a change in the config store.
  • Changes to auto_conf files are not detected yet so an agent restart or a manual SIGHUP is still needed here.
  • Incidentally this PR adds a load_check function that does the same thing as load_check_directory but for a single check which name was passed as a third parameter.

@hkaj hkaj force-pushed the haissam/fine-grained-reload branch 5 times, most recently from ef2c7af to 9d3d27b Compare July 28, 2016 08:53
@@ -200,7 +200,7 @@ def get_hostname(config=None):
return gce_hostname

# Try to get the docker hostname
docker_util = DockerUtil()
docker_util = DockerUtil(agentConfig=config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing agentConfig doesn't change anything here, or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's used in DockerUtil to get the instance of the config_store. We could pass a subset of it instead since we basically just need the sd_config_backend field, but that would be more confusing than useful imho.

@olivielpeau
Copy link
Member

Added in a few comments, but looks nice overall! 📈

@hkaj hkaj force-pushed the haissam/fine-grained-reload branch from 2fae878 to 60933bd Compare July 29, 2016 06:23
@hkaj
Copy link
Member Author

hkaj commented Jul 29, 2016

Thanks @olivielpeau, that was a kick-ass review!
Updated according to your comments.

@hkaj hkaj force-pushed the haissam/fine-grained-reload branch from 60933bd to ebf1fd9 Compare July 29, 2016 07:37
@hkaj hkaj added this to the 5.8.6 milestone Jul 29, 2016
@olivielpeau
Copy link
Member

Responded to one of your comments, but other than that LGTM

Feel free to merge once that's addressed 🎆

@olivielpeau
Copy link
Member

👍 looking good, feel free to merge! (once the commit history is cleaned up a bit :) )

hkaj and others added 3 commits August 19, 2016 16:27
The `res.etcd_index` value that was previously used is the
`X-Etcd-Index` header of etcd's http response.

According to etcd's docs this index is the current index of the whole
cluster, so any modification on any key increments this value.

Using it as a reference to determine if the datastore has changed
would trigger a config reload on every modification in the datastore
(whereas we want a config reload only when the `sd_template_dir`
changes).

To fix this, we parse the `modifiedIndex` of all the returned keys
instead, and we take the max index as the correct index.
@hkaj hkaj force-pushed the haissam/fine-grained-reload branch from 7441f84 to 98c6b40 Compare August 19, 2016 17:31
@hkaj
Copy link
Member Author

hkaj commented Aug 19, 2016

It's failing on a cassandra timeout. Merging as it's unrelated.

@hkaj hkaj merged commit 21a282f into master Aug 19, 2016
@hkaj hkaj deleted the haissam/fine-grained-reload branch August 23, 2016 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants