Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docker_deamon] docker healthcheck as service checks. #2859

Merged
merged 3 commits into from
Nov 21, 2016

Conversation

truthbk
Copy link
Member

@truthbk truthbk commented Sep 21, 2016

What does this PR do?

This PR allows us to send service checks according to the healthcheck reported by docker for containers. This is a new feature introduced with Docker 1.12, and not all containers will implement it. Currently we will send an UNKNOWN status in that case. Maybe we should only submit the service check when available.

Because grabbing the health status requires inspecting the containers, and that's a potentially expensive call, for now this feature will be hid behind a feature flag health_service_checks.

Also added a simple test. More work could be done on this front.

Motivation

Provides yet another point of view regarding a container's health, as intended by the container's maintainers.

I have also noticed a couple CI/tooling issues - we can sneak them in here. Or address them in a separate PR which probably makes more sense. One of the issues affects contributors as there's a problem with the old setuptools and rake setup_env.

@truthbk truthbk changed the title Jaime/docker health sc [docker_deamon] docker healthcheck as service checks. Sep 21, 2016
@truthbk truthbk force-pushed the jaime/docker_health_sc branch from dfb3fd3 to 905c881 Compare September 21, 2016 23:03
@truthbk truthbk added the checks label Sep 23, 2016
@truthbk truthbk added this to the 5.10.0 milestone Sep 23, 2016
Copy link
Member

@hkaj hkaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 minor comments about the config, otherwise LGTM

# Report docker container healthcheck events as service checks
# Note: enabling this option modifies the way in which we inspect the containers and causes
# some overhead - if you run a high volume of containers we may timeout. Please only
# enable if absolutely necessary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the last sentence. The slow part of inspecting a container is the lock acquisition over said container, and this is also done in docker ps, so we may timeout the check but it could also happen without the inspect part.
Same for the custom_cgroups option actually, the last sentence should be dropped.

# Report docker container healthcheck events as service checks
# Note: enabling this option modifies the way in which we inspect the containers and causes
# some overhead - if you run a high volume of containers we may timeout. Please only
# enable if absolutely necessary.
Copy link
Member

@hkaj hkaj Oct 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should also be a mention that this option is available starting from Docker 1.12 and enabling it with an older version will result in an UNKNOWN state for the service check. It should be obvious but I'd rather be extra explicit here.

@truthbk truthbk force-pushed the jaime/docker_health_sc branch from 32d4345 to 99bedbe Compare October 14, 2016 14:12
@hkaj
Copy link
Member

hkaj commented Oct 14, 2016

Actually I wonder if we shouldn't ask for a list of container name instead and only enable service checks for those listed. This could get spammy, and health checks in dynamic clusters where containers cease to exist and appear very often this may not have a lot of value. What do you think?

@@ -268,6 +272,11 @@ def check(self, instance):
if self.collect_disk_stats:
self._report_disk_stats()

# Report docker healthcheck SC's where available
if health_service_checks:
health_scs_containers = instance.get('health_service_check_whitelist', [])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter is not documented

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry forgot to push.

@remh remh modified the milestones: 5.11.0, 5.10.0 Nov 2, 2016
@truthbk
Copy link
Member Author

truthbk commented Nov 10, 2016

@hkaj so I tested this out... my only lingering question would be. Should we make the whitelist by image name (current behavior) or container name or both? Taking into account tags or not? Should we support wildcards? Let me know your thoughts - as you can see right now it's a straight match.

@truthbk truthbk force-pushed the jaime/docker_health_sc branch from 5ce326c to 95b886c Compare November 10, 2016 18:57
Copy link
Member

@hkaj hkaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@truthbk ideally I think we should support the same behavior as for include/exclude containers, so regex support and on all container tags.
But that'll require more work. Maybe we can ship a first iteration with regex support but only on the image? And then transition to all-tags support.
If we go this way I'd suggest using another name for the yaml field, something like healthcheck_image_whitelist. This way when we improve it to support all tags we can use the new, more generic parameter healthcheck_whitelist and slowly deprecate the old one.
Of course you can also go hard and support all tags from the start if you feel like it 😄

@@ -250,8 +251,11 @@ def check(self, instance):
# containers running with custom cgroups?
custom_cgroups = _is_affirmative(instance.get('custom_cgroups', False))

# submit healtcheck service checks?
health_service_checks = _is_affirmative(instance.get('health_service_checks', False))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this parameter anymore. If the list is there, enable health checks, otherwise disable them. So you can look for the white list param directly, maybe check if it isn't empty.

Copy link
Member

@hkaj hkaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once conflicts are fixed and tests pass

[docker_daemon] set proper tagging and SC name for healthcheck.

[docker_daemon] put healtcheck sc behind feature flag - it requires inspect.

[ci] fixing docker_daemon CI.

[ci] fixing old setuptools setup_env issues. Bumping to newest version.

[docker_daemon] adding test for container inspeciton logic.

[docker_daemon] amending config yaml.
[docker_daemon] whitelist health service checks yaml update.

[docker_daemon] whitelist is a set for O(1) average complexity.
@truthbk truthbk force-pushed the jaime/docker_health_sc branch from 3567ef2 to 8b1cc24 Compare November 18, 2016 20:53
@truthbk truthbk merged commit 35a8788 into master Nov 21, 2016
@truthbk truthbk deleted the jaime/docker_health_sc branch November 21, 2016 15:17
@masci masci modified the milestones: 5.11.0, 5.12.0 Jan 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants