Log results of cluster health check #1203

Merged

danielmitterdorfer
Member

When the cluster health check fails, e.g. due to an invalid setup where
we check for a green cluster on a single node but indices have been
configured with replicas, the benchmark appears to hang indefinitely.
The reason is that the cluster health check is retried by default, but
the logs give no indication of this.

With this commit we log the result of the cluster health check. We also
raise the log level of retry log messages so users can analyze the
behavior by inspecting logs.

Closes #1150
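
For illustration, the retry behavior described above can be sketched roughly as follows. This is not Rally's actual code: `run_with_retries`, its parameters, the "has returned with an error" wording, and the `retry_sketch` logger name are hypothetical. It only shows the idea of logging each retry at INFO level so that a benchmark that seems to hang leaves a visible trace.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")  # hypothetical logger name


def run_with_retries(name, operation, retries=3, wait_period=0.5, sleep=time.sleep):
    """Call operation() until it reports success or the retries are used up.

    Hypothetical helper: `operation` returns a dict with a boolean "success"
    key and may raise TimeoutError when the request times out.
    """
    result = None
    for attempt in range(retries + 1):
        last_attempt = attempt == retries
        try:
            result = operation()
            if result.get("success", False):
                return result
            reason = "has returned with an error"
        except TimeoutError:
            if last_attempt:
                # No retries left: let the caller see the timeout.
                raise
            reason = "has timed out"
        if not last_attempt:
            # Logged at INFO (not DEBUG) so it shows up in default logs.
            logger.info("[%s] %s. Retrying in [%.2f] seconds.", name, reason, wait_period)
            time_to_wait = wait_period
            sleep(time_to_wait)
    # All attempts finished without success; return the last result as-is.
    return result
```

Passing `sleep` as a parameter is only a convenience for testing the sketch without real delays.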

@danielmitterdorfer danielmitterdorfer added the enhancement (Improves the status quo) and :Usability (Makes Rally easier to use) labels Mar 9, 2021
@danielmitterdorfer danielmitterdorfer added this to the 2.1.0 milestone Mar 9, 2021
@danielmitterdorfer danielmitterdorfer self-assigned this Mar 9, 2021
@danielmitterdorfer
Member Author

Here are some examples of the behavior from my local testing:

In the success case we see in the logs:

PID:10002 esrally.driver.runner INFO cluster-health: expected status=[green], actual status=[green], relocating shards=[0], success=[True].

When the cluster health status is not reached, we get:

PID:11154 esrally.driver.runner INFO [cluster-health] has timed out. Retrying in [0.50] seconds.

and additionally, the Python client issues a warning right before the log line above:

PID:11154 elasticsearch WARNING GET http://127.0.0.1:39200/_cluster/health/logs-*?wait_for_status=green&wait_for_no_relocating_shards=true [status:408 request:30.006s]
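
As a rough sketch of how a result line like the success case above could be produced from a cluster health response: the function names and the `health_sketch` logger are hypothetical and this is not taken from Rally's source. It assumes the response carries the `status` and `relocating_shards` fields of the Elasticsearch cluster health API, and that a status at least as good as the expected one counts as success.

```python
import logging

logger = logging.getLogger("health_sketch")  # hypothetical logger name

# Health states ordered from worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def format_health_message(expected_status, actual_status, relocating_shards, success):
    """Build a log line in the same shape as the example shown above."""
    return (
        f"cluster-health: expected status=[{expected_status}], "
        f"actual status=[{actual_status}], "
        f"relocating shards=[{relocating_shards}], success=[{success}]."
    )


def evaluate_cluster_health(response, expected_status="green", expected_relocating=0):
    """Compare a cluster health response with the expectation and log the outcome."""
    actual_status = response["status"]
    relocating_shards = response["relocating_shards"]
    success = (
        STATUS_RANK[actual_status] >= STATUS_RANK[expected_status]
        and relocating_shards <= expected_relocating
    )
    logger.info(format_health_message(expected_status, actual_status, relocating_shards, success))
    return success
```

With a single node and indices configured with replicas, the response status would stay `yellow`, so `evaluate_cluster_health` would keep returning `False` when `green` is expected, which is the situation that previously looked like a hang.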

Contributor

@DJRickyB DJRickyB left a comment

This is great, thank you. Upping the log level for retries is a good idea.

@danielmitterdorfer
Member Author

Thanks for the review!

@danielmitterdorfer danielmitterdorfer merged commit 5d93d2b into elastic:master Mar 9, 2021
@danielmitterdorfer danielmitterdorfer deleted the log-cluster-health branch March 9, 2021 14:35