-
Notifications
You must be signed in to change notification settings - Fork 814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docker_deamon] docker healthcheck as service checks. #2859
Conversation
dfb3fd3
to
905c881
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 minor comments about the config, otherwise LGTM
# Report docker container healthcheck events as service checks | ||
# Note: enabling this option modifies the way in which we inspect the containers and causes | ||
# some overhead - if you run a high volume of containers we may timeout. Please only | ||
# enable if absolutely necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the last sentence. The slow part of inspecting a container is the lock acquisition over said container, and this is also done in docker ps
, so we may timeout the check but it could also happen without the inspect part.
Same for the custom_cgroups
option actually, the last sentence should be dropped.
# Report docker container healthcheck events as service checks | ||
# Note: enabling this option modifies the way in which we inspect the containers and causes | ||
# some overhead - if you run a high volume of containers we may timeout. Please only | ||
# enable if absolutely necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should also be a mention that this option is available starting from Docker 1.12 and enabling it with an older version will result in an UNKNOWN state for the service check. It should be obvious but I'd rather be extra explicit here.
32d4345
to
99bedbe
Compare
Actually I wonder if we shouldn't ask for a list of container name instead and only enable service checks for those listed. This could get spammy, and health checks in dynamic clusters where containers cease to exist and appear very often this may not have a lot of value. What do you think? |
@@ -268,6 +272,11 @@ def check(self, instance): | |||
if self.collect_disk_stats: | |||
self._report_disk_stats() | |||
|
|||
# Report docker healthcheck SC's where available | |||
if health_service_checks: | |||
health_scs_containers = instance.get('health_service_check_whitelist', []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter is not documented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry forgot to push.
@hkaj so I tested this out... my only lingering question would be. Should we make the whitelist by image name (current behavior) or container name or both? Taking into account tags or not? Should we support wildcards? Let me know your thoughts - as you can see right now it's a straight match. |
5ce326c
to
95b886c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@truthbk ideally I think we should support the same behavior as for include/exclude containers, so regex support and on all container tags.
But that'll require more work. Maybe we can ship a first iteration with regex support but only on the image? And then transition to all-tags support.
If we go this way I'd suggest using another name for the yaml field, something like healthcheck_image_whitelist
. This way when we improve it to support all tags we can use the new, more generic parameter healthcheck_whitelist
and slowly deprecate the old one.
Of course you can also go hard and support all tags from the start if you feel like it 😄
@@ -250,8 +251,11 @@ def check(self, instance): | |||
# containers running with custom cgroups? | |||
custom_cgroups = _is_affirmative(instance.get('custom_cgroups', False)) | |||
|
|||
# submit healtcheck service checks? | |||
health_service_checks = _is_affirmative(instance.get('health_service_checks', False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this parameter anymore. If the list is there, enable health checks, otherwise disable them. So you can look for the white list param directly, maybe check if it isn't empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once conflicts are fixed and tests pass
[docker_daemon] set proper tagging and SC name for healthcheck. [docker_daemon] put healtcheck sc behind feature flag - it requires inspect. [ci] fixing docker_daemon CI. [ci] fixing old setuptools setup_env issues. Bumping to newest version. [docker_daemon] adding test for container inspeciton logic. [docker_daemon] amending config yaml.
[docker_daemon] whitelist health service checks yaml update. [docker_daemon] whitelist is a set for O(1) average complexity.
3567ef2
to
8b1cc24
Compare
What does this PR do?
This PR allows us to send service checks according to the healthcheck reported by docker for containers. This is a new feature introduced with Docker
1.12
, and not all containers will implement it. Currently we will send anUNKNOWN
status in that case. Maybe we should only submit the service check when available.Because grabbing the health status requires inspecting the containers, and that's a potentially expensive call, for now this feature will be hid behind a feature flag
health_service_checks
.Also added a simple test. More work could be done on this front.
Motivation
Provides yet another point of view regarding a container's health, as intended by the container's maintainers.
I have also noticed a couple CI/tooling issues - we can sneak them in here. Or address them in a separate PR which probably makes more sense. One of the issues affects contributors as there's a problem with the old
setuptools
andrake setup_env
.