Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Service Discovery] create a template cache to reduce calls to the KV store #3060

Merged
merged 1 commit into from
Jan 25, 2017

Conversation

hkaj
Copy link
Member

@hkaj hkaj commented Dec 5, 2016

Note: Please remember to review the Datadog Contribution Guidelines
if you have not yet done so.

What does this PR do?

Create a cache for configuration templates

  • it keeps auto_configuration templates and templates coming from the KV store
  • it invalides the KV templates when the agent detects an update in the config store
  • to reload auto_conf or JMX templates we still need to restart the collector or send it a SIGHUP.

Minor clean ups and test updates

Motivation

Using this cache avoids querying the config store at every container update.

Testing Guidelines

Tests were updated accordingly. A custom docker image using this build is also available at the TODO tag.

@hkaj hkaj force-pushed the haissam/sd-cache-kv-tpls branch from 9122643 to 6defada Compare December 5, 2016 18:23
@hkaj hkaj added this to the 5.11.0 milestone Dec 5, 2016
@hkaj hkaj force-pushed the haissam/sd-cache-kv-tpls branch from 6defada to e3094d3 Compare December 6, 2016 14:34
@hkaj hkaj removed the checks label Dec 7, 2016
Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. Just a few minor comments/questions.

init_tpl = auto_conf.get('init_config') or {}
instance_tpl = auto_conf.get('instances', [])
if image not in auto_conf_images:
auto_conf_images[image] = [[check_name], [init_tpl], [instance_tpl]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it end up being less error-prone to have the init and instance configs/templates in a dict with check_name as the key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean like auto_conf_images[(image, check_name)] = [init_tpl, instance_tpl]?

Problem with that is if users have several instances of the same check for the same image (and a customer actually asked how to do that yesterday so it happens). In this case we still need to wrap init_tpl and instance_tpl in lists and end up with the same logic as we have now.

But maybe I didn't understand correctly your suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I originally meant auto_conf_images[image] = {'checkname': {'init': init_tpl, 'instances: [intance_tpl_0, instance_tpl_1,...]}}

But then I read on, and I saw it wasn't necessarily a good idea given the current structure/logic. I felt it would make the code a little more maintainable (perhaps). We can maybe just keep in mind for the future (it'd be a pretty big refactor), there's nothing wrong with the current logic.

log.warning("Conflicting templates in auto_conf for image %s and check %s. "
"Please check your template files." % (image, check_name))
continue
self.auto_conf_templates[image][0].append(check_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I'm not sure if this wouldn't be better as self.auto_conf_templates[image][check_name] = (init_tpl, instance_tpl)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I get it, we stick to lists for "compatibility" across the board when dealing with the configs. I think we could maybe refactor in the future, but it's a legit choice.

@@ -50,7 +50,7 @@ def get_check_class(agentConfig, check_name):
return None


def get_auto_conf(agentConfig, check_name):
def get_auto_conf(check_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like get_auto_conf() and get_auto_config() as names are a bit too close to each other...

# if _issue_read doesn't return anything we still create a key in the cache
# so that subsequent reads don't trigger issue_read every time
else:
self.kv_templates[identifier] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a NOOP but I noticed that we do include in the results templates[CONFIG_FROM_TEMPLATE] = self.kv_templates[identifier] even when self.kv_templates[identifier] is None if we have a hit, but we don't do that with a miss.

Just saying because it occured to me we could refactor this as:

if identifier not in self.kv_templates:
    tpls = self._issue_read(identifier)
    if tpls:
        templates[CONFIG_FROM_TEMPLATE] = tpls
        self.kv_templates[identifier] = tpls
    else:
        self.kv_templates[identifier] = None

templates[CONFIG_FROM_TEMPLATE] = self.kv_templates[identifier]

because we always populate after a hit or fetch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

man that snippet is on point! 👌

# lastly fallback to auto_conf
return set(self.identifier_to_checks[self._get_image_ident(identifier)])
# lastly fallback to legacy name for auto-conf
return self.template_cache.get_check_names(self._get_image_ident(identifier))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: can we have checks under the legacy name that should also be applied? So a case where identifier and the legacy self._get_image_ident(identifier) would both apply to an image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! That's a use case we didn't support before (if user-defined templates were found we skipped auto config) but I added logic for that here and forgot to update this part. Will do it now.

@mock.patch.object(AbstractConfigStore, '_issue_read', side_effect=issue_read)
def test_read_config_from_store(self, issue_read):
@mock.patch.object(_TemplateCache, '_issue_read', side_effect=issue_read)
def test_read_config_from_store(self, args):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big 🙇 for all these awesome tests!

@hkaj hkaj force-pushed the haissam/sd-cache-kv-tpls branch from e3094d3 to 999f7bd Compare December 21, 2016 14:41
Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see anything wrong with this! :shipit:

@hkaj hkaj force-pushed the haissam/sd-cache-kv-tpls branch 8 times, most recently from 7543301 to 2eaab69 Compare January 10, 2017 00:05
Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of nits, but I approve! ;)

@@ -245,7 +245,7 @@ def run(self, config=None):
if self._agentConfig.get('service_discovery'):
self.sd_backend = get_sd_backend(self._agentConfig)

if _is_affirmative(self._agentConfig.get('sd_jmx_enable')):
if _is_affirmative(self._agentConfig.get('sd_jmx_enable', '')):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be more readable as: _is_affirmative(self._agentConfig.get('sd_jmx_enable', False)).... Am I missing something?

log.exception('Failed to retrieve a template for %s.' % identifier)
# create a key in the cache even if _issue_read doesn't return a tpl
# so that subsequent reads don't trigger issue_read
self.kv_templates[identifier] = tpls or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need the or None? tpls will already be something if self._issue_read(identifier) returns either a list or None, or the None we've set it in the except blocks.

@masci masci modified the milestones: 5.11.0, 5.12.0 Jan 24, 2017
This cache is invalidated when a template change is detected.
It doesn't detect changes to the auto_conf folder or JMX checks,
only to the KV store.
Refreshing configs from files still requires a SIGHUP or a restart.
@hkaj hkaj force-pushed the haissam/sd-cache-kv-tpls branch from 2eaab69 to 13d8a24 Compare January 25, 2017 16:28
@hkaj
Copy link
Member Author

hkaj commented Jan 25, 2017

Thanks @truthbk ! I addressed your comments and squashed the commits. Will merge once the tests pass 🎉

@hkaj hkaj merged commit df1a506 into master Jan 25, 2017
@hkaj hkaj deleted the haissam/sd-cache-kv-tpls branch January 25, 2017 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants