-
-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(api): Do not log healthcheck error if downfile exists #6635
Conversation
Kubernetes tells the old snuba pods to mark itself as unhealthy, so that envoy stops sending traffic to it. Snuba reports itself as unhealthy to Sentry. Gocd checks Sentry for new errors. Gocd stalls the deployment because there are new errors.
snuba/utils/health_info.py
Outdated
if not down_file_exists: | ||
logger.error(f"Snuba health check failed! Tags: {metric_tags}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just info log it if the down file exists, it's useful log info, just not an error
I found this additional health_envoy check. If we already check the downfile for envoy this way, what other services need to react to the downfile? Could they use the same endpoint instead? Lines 212 to 213 in 120c358
|
@@ -107,7 +107,10 @@ def get_health_info(thorough: Union[bool, str]) -> HealthInfo: | |||
payload = json.dumps(body) | |||
if status != 200: | |||
metrics.increment("healthcheck_failed", tags=metric_tags) | |||
logger.error(f"Snuba health check failed! Tags: {metric_tags}") | |||
if down_file_exists: | |||
logger.error("Snuba health check failed! Tags: %s", metric_tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving away from f-strings to logger-native string formatting. The reason we got a new issue at all is because we were interpolating the string before logging. Now we only get one issue instead of N for every possible tag combination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works because then it's not a "new" issue in Sentry, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's the goal. now it's grouping by the literal string "Snuba health check failed! Tags: %s"
snuba/utils/health_info.py
Outdated
if down_file_exists: | ||
logger.error("Snuba health check failed! Tags: %s", metric_tags) | ||
else: | ||
logger.info("Snuba health check failed! Tags: %s", metric_tags) | ||
|
||
if status != 200 or down_file_exists: | ||
logger.info(payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@untitaker do we need this log then? if you added the check above?
Kubernetes tells the old snuba pods to mark itself as unhealthy, so that
envoy stops sending traffic to it.
Snuba reports itself as unhealthy to Sentry.
Gocd checks Sentry for new errors.
Gocd stalls the deployment because there are new errors.