Es shard level metrics #1752

Merged
merged 3 commits into from
Jul 14, 2015

Contributor

@elafarge elafarge commented Jul 7, 2015

Add shard level metrics to Elasticsearch integration (Careful: branch has been created on top of etienne/es-pshard-metrics)

@elafarge elafarge force-pushed the etienne/es-shard-level-metrics branch 3 times, most recently from d7cf065 to 29d99cf Compare July 8, 2015 04:45
    shard_role = "primary"
elif count_replicas:
    replica_number += 1
    shard_name = 'R' + i_str + '_{0}'.format(replica_number)
Member

you could write shard_name = 'R{0}_{1}'.format(i_str, replica_number) instead
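
For anyone comparing the two spellings, a quick sanity check that they build the same string. The values of `i_str` and `replica_number` below are made up for the comparison and are not taken from the check:

```python
# Hypothetical values, chosen only to compare the two spellings.
i_str = "3"
replica_number = 2

# Original form: string concatenation plus a one-field format string.
concatenated = 'R' + i_str + '_{0}'.format(replica_number)

# Suggested form: a single format string with two fields.
formatted = 'R{0}_{1}'.format(i_str, replica_number)

# Both spellings produce the same shard name.
assert concatenated == formatted
```

The single format string keeps the whole naming pattern visible in one place, which is presumably the point of the suggestion.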

Contributor Author

Indeed :) Thanks.

@elafarge elafarge force-pushed the etienne/es-shard-level-metrics branch 2 times, most recently from e6d0b3b to a20a1dc Compare July 8, 2015 22:29
# The "shard_level_metrics" flag enables metrics and service checks on a per-
# shard basis (all the information is fetched under the /_stats?level=shards
# endpoint). The metrics and service check sent for each shard are named as
# such: elasticsearch.shard.metric.name .
Member

you could add a line about how the metrics are tagged, so that people know how to graph per shard.

Contributor Author

Isn't the one six lines below suitable?

Member

👍

@elafarge elafarge force-pushed the etienne/es-shard-level-metrics branch from a20a1dc to 7dfdecf Compare July 10, 2015 22:45
@@ -297,3 +346,87 @@ def test_health_event(self):
            tags=['host:localhost', 'port:9200'],
            count=1
        )

    def test_pshard_metrics(self):
Member

Since you're testing for new metrics you should add a coverage_report at the end of this method.

Contributor Author

Very good remark indeed, @hkaj.

@elafarge
Contributor Author

@hkaj Test coverage report has been added to the new tests.

@elafarge elafarge force-pushed the etienne/es-shard-level-metrics branch from c6b7d99 to 13f580b Compare July 13, 2015 18:35
@hkaj
Member

hkaj commented Jul 14, 2015

Awesome! You can rebase and 🚢 then. Congrats 🎉

Etienne LAFARGE added 3 commits July 14, 2015 15:39
Added statistics over primary shards only to our check. They're
basically retrieved under the `_stats` endpoint and are aggregated
metrics on the primary shards. It means that they don't take into
account metrics from replica shards. So for instance
`primaries.docs.count` will contain the total number of documents in the
cluster without replicas. Computing the count including replicas would
be as simple as summing the "standard" document count metric over all
nodes in the cluster.

A test for those new metrics in particular has been added.
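
To illustrate the primaries-only aggregation described in this commit message, here is a minimal sketch that reads `primaries.docs.count` out of a hand-written sample shaped like a `_stats` response. The sample numbers and the `get_nested` helper are assumptions made for illustration, not code from the check. The sketch also reads the `total` section of the same response (primaries plus replicas) rather than summing per-node counts as the message suggests:

```python
# Hand-written sample shaped like an Elasticsearch `_stats` response.
# The numbers are invented for illustration.
sample_stats = {
    "_all": {
        "primaries": {"docs": {"count": 1000}},
        "total": {"docs": {"count": 3000}},
    }
}


def get_nested(data, dotted_path):
    """Walk a nested dict along a dotted path such as 'primaries.docs.count'."""
    value = data
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Documents held by primary shards only (replicas excluded).
primary_docs = get_nested(sample_stats["_all"], "primaries.docs.count")

# Documents across every shard copy, primaries and replicas together.
all_copies_docs = get_nested(sample_stats["_all"], "total.docs.count")

# The difference is what the replica copies contribute.
replica_docs = all_copies_docs - primary_docs
```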

This adds shard-level metrics to Elasticsearch. The feature is disabled
by default. A flag has to be set in elastic.yaml to enable it.

It fetches, for every shard (primary or replica) of every index, a set
of metrics restricted to those shards. It also creates a `state` service
check on a per-shard basis. Metrics and state are fetched from the
`/_stats?level=shards` endpoint. A test has also been added.

The metrics and service checks are sent with a bunch of new tags:
`es_node:node_name`, `es_shard:shard_name`, `es_index:index_name`,
`es_role:(primary|replica)` to make it easy to define a scope to
monitor/graph in our backend. The `shard_specific` tag is also added to
avoid confusion with other stats metrics, aggregated on elasticsearch's
end.
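
The tagging scheme described in this commit message can be sketched roughly as follows. This is not the code from the PR: the `shard_tags` function, the `P<n>` naming for primaries, and the sample response are assumptions. Only the tag names and the `R<n>_<k>` replica naming come from the discussion above. The sample is hand-written to resemble a shard-level stats response, where each shard number maps to a list of copies carrying a `routing` section:

```python
def shard_tags(stats):
    """Yield (tags, state) for every shard copy in a shard-level stats dict."""
    for index_name, index_data in sorted(stats["indices"].items()):
        for shard_id, copies in sorted(index_data["shards"].items()):
            replica_number = 0
            for shard_copy in copies:
                routing = shard_copy["routing"]
                if routing["primary"]:
                    role = "primary"
                    # Hypothetical naming for primaries; not shown in the PR excerpt.
                    shard_name = "P{0}".format(shard_id)
                else:
                    role = "replica"
                    replica_number += 1
                    shard_name = "R{0}_{1}".format(shard_id, replica_number)
                tags = [
                    "es_node:{0}".format(routing["node"]),
                    "es_shard:{0}".format(shard_name),
                    "es_index:{0}".format(index_name),
                    "es_role:{0}".format(role),
                    "shard_specific",
                ]
                yield tags, routing["state"]


# Invented sample: one index, one shard, one primary and one replica copy.
sample = {
    "indices": {
        "logs": {
            "shards": {
                "0": [
                    {"routing": {"state": "STARTED", "primary": True, "node": "node-a"}},
                    {"routing": {"state": "STARTED", "primary": False, "node": "node-b"}},
                ]
            }
        }
    }
}

results = list(shard_tags(sample))
```

Each shard copy gets its own tag set, so a graph can be scoped to one index, one node, or one role by filtering on the corresponding tag.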

Following Haissam's suggestion, I added a call to
`AgentCheckTest.coverage_report()` at the end of the new elastic tests
since they involve testing the presence of new metrics.
@elafarge elafarge force-pushed the etienne/es-shard-level-metrics branch from 38a0707 to 0c7bbbb Compare July 14, 2015 19:40
@elafarge
Contributor Author

Thanks, I'll just wait for Travis to run and then I'll 🚢 it :)

@elafarge
Contributor Author

Looks like it's good to be shipped :)

elafarge added a commit that referenced this pull request Jul 14, 2015
@elafarge elafarge merged commit b19f251 into master Jul 14, 2015
@elafarge
Contributor Author

Can't wait for tomorrow to see how much fun we can have with these new metrics on staging :)

@LeoCavaille LeoCavaille deleted the etienne/es-shard-level-metrics branch August 11, 2015 19:00
elafarge pushed a commit that referenced this pull request Aug 27, 2015
…metrics"

This reverts commit b19f251, reversing
changes made to 3e6c0b1.

This reverts BOTH the addition of shard-level metrics to Elasticsearch
as well as the addition of PShard-related statistics.

Pshard statistics will be re-added for the 5.5 release; shard-level
statistics will be discussed for 5.6.0 because right now they obviously
create way too many contexts in our backend.
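
A back-of-the-envelope sketch of why per-shard tagging multiplies contexts, taking a context to be one metric name combined with one distinct tag set. The `estimate_contexts` function and every number below are invented for illustration. They are not measurements from the backend mentioned in the revert message:

```python
def estimate_contexts(metric_count, index_count, shards_per_index, replicas_per_shard):
    """Rough count of distinct (metric, tag set) pairs for shard-level metrics."""
    # Every shard copy (one primary plus its replicas) carries its own tag set.
    shard_copies = index_count * shards_per_index * (1 + replicas_per_shard)
    return metric_count * shard_copies


# Invented cluster shape: 100 indices, 5 shards each, 1 replica, 20 metrics per shard.
estimate = estimate_contexts(
    metric_count=20,
    index_count=100,
    shards_per_index=5,
    replicas_per_shard=1,
)
```

The count also grows whenever a shard moves to another node, since the `es_node` tag changes and the same metric then appears under a new tag set.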