-
Notifications
You must be signed in to change notification settings - Fork 814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Es shard level metrics #1752
Es shard level metrics #1752
Conversation
d7cf065
to
29d99cf
Compare
shard_role = "primary" | ||
elif count_replicas: | ||
replica_number += 1 | ||
shard_name = 'R' + i_str + '_{0}'.format(replica_number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could write shard_name = 'R{0}_{1}'.format(i_str, replica_number)
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed :) Thanks.
e6d0b3b
to
a20a1dc
Compare
# The "shard_level_metrics" flag enables metrics and service checks on a per- | ||
# shard basis (all the information is fetched under the /_stats?level=shards | ||
# endpoint). The metrics and service check sent for each shard are named as | ||
# such: elasticsearch.shard.metric.name . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add a line about how the metrics are tagged, so that people know how to graph per shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the one six lines below suitable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
a20a1dc
to
7dfdecf
Compare
@@ -297,3 +346,87 @@ def test_health_event(self): | |||
tags=['host:localhost', 'port:9200'], | |||
count=1 | |||
) | |||
|
|||
def test_pshard_metrics(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're testing for new metrics you should add a coverage_report at the end of this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good remark indeed @hkaj .
@hkaj Test coverage report has been added to the new tests. |
c6b7d99
to
13f580b
Compare
Awesome! You can rebase and 🚢 then. Congrats 🎉 |
Added statistics over primary shards only to our check. They're basically retrieved under the `_stats` endpoint and are aggregated metrics on the primary shards. It means that they don't take into account metrics from replica shards. So for instance `primaries.docs.count` will contain the total number of documents in the cluster without replicas. Computing replica's would be as simple as summing the "standard" document count metric over all nodes in the cluster. A test for those new metrics in particular has been added.
This adds shard-level metrics to elasticsearch. The feature is disabled by default. A flag has to be set in elastic.yaml to enable it. It fetches, for every shard (primary or replica) of every index, a set of metrics restricted to those shards. It also create a `state` service check on a per-shard basis. Metrics and state are fetched from the `/_stats?level=shard` endpoint. A test has also been added. The metrics and service checks are sent with a bunch of new tags: `es_node:node_name`, `es_shard:shard_name`, `es_index:index_name`, `es_role:(primary|replica)` to make it easy to define a scope to monitor/graph in our backend. The `shard_specific` tag is also added to avoid confusion with other stats metrics, aggregated on elasticsearch's end.
Following Haissam's suggestion, I added a call to `AgentCheckTest.coverage_report()` at the end the new elastic tests since they involve testing the presence of new metrics.
38a0707
to
0c7bbbb
Compare
Thanks, I just wait for Travis to run and I'll 🚢 it :) |
Looks like it's good to be shipped :) |
Can't wait for tomorrow to see how much want we can have with these new metrics on staging :) |
…metrics" This reverts commit b19f251, reversing changes made to 3e6c0b1. This reverts BOTH the addition of shard-level metrics to Elasticsearch as well as the addition of PShard-related statistics. Pshard statistics will be readded for the 5.5 release, shard level statistics will be discussed for 5.6.0 because right now, they obviously create way to much contexts in our backend.
Add shard level metrics to Elasticsearch integration (Careful: branch has been created on top of etienne/es-pshard-metrics)