Add Shard Stats to _nodes/stats #33696

pickypg · 2018-09-14T04:29:41Z

This adds some useful shard stats, like the number of shards, as well as adding all shard details to the response (when requested).

The goal is to enable Monitoring, both internally and from Beats, to fetch the shard stats for the node in one request and thus also dramatically simplify and reduce the overall number of documents
stored in the .monitoring-es-* indices.

Part of elastic/kibana#19704 (comment)

server/src/main/java/org/elasticsearch/action/admin/cluster/node/stats/NodeStats.java

pickypg · 2018-09-14T16:45:43Z

Jenkins test this

server/src/main/java/org/elasticsearch/action/admin/cluster/node/stats/NodeStats.java

ycombinator

Functionally this LGTM. I verified that the _nodes/stats API now returns shards as well, and I also verified that nodes_stats documents in .monitoring-es-* contain the same shards.

The code makes sense to me, but I'd prefer if @tlrx could do a "proper" code review from a Java/ES POV.

@pickypg I imagine that at some point we'll want to update the UI to use the new fields and then remove the code under https://github.com/elastic/elasticsearch/tree/master/x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/collector/shard. Are there issues for this work already?

pickypg · 2018-09-17T15:35:41Z

Thanks @ycombinator.

> I imagine that at some point we'll want to update the UI to use the new fields

It's the second part of elastic/kibana#19704 (comment). I should probably create a separate UI for it. We cannot remove the shard documents until we add the equivalent data to the index_stats documents. Once we do that, the UI can be updated to look for the shard data as part of the document and otherwise fall back to the current approach (granting full BWC).
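The fallback described here could be sketched roughly as follows. This is only an illustration of the idea, not the Monitoring UI's actual code: the class name, the `resolveShards` method, the `"shards"` key, and the string representation of a shard are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class ShardSourceDemo {
    /**
     * Returns shard data embedded in the stats document when present;
     * otherwise falls back to the legacy lookup of separate shard documents.
     * The "shards" key is an assumed field name used only for this sketch.
     */
    public static List<String> resolveShards(Map<String, List<String>> statsDoc,
                                             Supplier<List<String>> legacyShardDocs) {
        List<String> embedded = statsDoc.get("shards");
        return embedded != null ? embedded : legacyShardDocs.get();
    }

    public static void main(String[] args) {
        Supplier<List<String>> legacy = () -> List.of("legacy-shard-doc");
        // New-style document: shard data travels with the stats document.
        System.out.println(resolveShards(Map.of("shards", List.of("[my-index][0]")), legacy));
        // Old-style document: no embedded shards, so the legacy path is used.
        System.out.println(resolveShards(Map.of(), legacy));
    }
}
```

Because the legacy lookup is passed as a `Supplier`, it only runs when the embedded data is missing, which is what keeps older documents readable.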

tlrx · 2018-09-18T08:13:07Z

Could we split this into 2 independent changes: one for adding the shards stats in node stats and another one that makes use of it in Monitoring? That would make the review easier.

pickypg · 2018-09-18T14:35:26Z

@tlrx Sure. I'll back out the Monitoring changes from this PR today.


pickypg · 2018-09-26T17:27:06Z

@tlrx Ended up getting distracted, but I've gone ahead and updated it and rebased onto the latest master due to a conflict.


tlrx

The change looks good to me, but this must be reviewed by someone from the @elastic/es-core-infra team

tlrx · 2018-09-28T06:57:21Z

.../src/main/java/org/elasticsearch/action/admin/cluster/stats/TransportClusterStatsAction.java

-                }
-            }
-        }
+                true, true, true, false, true, false, false, false, false, false, false, false, true);


Are we sure that we want to always return shard stats in the cluster stats?

pickypg · 2018-09-29

The old code was always doing the equivalent lookup of shard stats without any conditions to block it, so I think it's the right thing to do.

tlrx · 2018-10-01

The old code in Cluster Stats or in Monitoring?

pickypg · 2018-10-01

TransportClusterStatsAction - it's the loop that this is replacing.

pickypg · 2018-10-15T19:49:20Z

Can someone from @elastic/es-core-infra take a look at this PR?

GlenRSmith · 2018-12-27T18:41:45Z

The current workaround in Metricbeat is to use the http module, e.g.:

- module: http
  metricsets:
    - json
  namespace: "node_stats"
  path: "_stats"
  dedot.enabled: true
  query:
    level: shards

I haven't been able to figure out how to accomplish this in a plugin, though, without resorting to creating an HTTP client inside the plugin, which seems trappy to me, and possibly susceptible to being unintentionally blocked by permissions.

Tangentially, I also can't figure out how to get immutable source_node UUID from an HTTP call.

Edit: the case that led me to the workaround above was actually index stats, in order to get shard level stats with the ability to know which node each shard resides upon, to help discover hot nodes.

Without looking at the rest (no pun intended) of the proposed change set, I see that the rest-api-spec for indices.stats.json is untouched, so I infer that that endpoint is unchanged. Should I create an ER issue for that (and stop making noise in this PR)?
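As a rough illustration of the "hot nodes" use case mentioned in the edit above, the following Java sketch sums a per-shard metric by node. It assumes the shard-level stats have already been fetched and flattened into rows that carry a node ID; the `ShardRow` record, its fields, and the choice of indexing operations as the metric are hypothetical and not taken from the PR or from any Elasticsearch class.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HotNodeDemo {
    /** One flattened shard-level entry; all fields are illustrative. */
    public record ShardRow(String index, int shard, String nodeId, long indexingOps) {}

    /** Sums the per-shard metric for each node so unusually busy nodes stand out. */
    public static Map<String, Long> indexingOpsByNode(List<ShardRow> rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (ShardRow row : rows) {
            totals.merge(row.nodeId(), row.indexingOps(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<ShardRow> rows = List.of(
            new ShardRow("logs", 0, "nodeA", 100L),
            new ShardRow("logs", 1, "nodeB", 10L),
            new ShardRow("metrics", 0, "nodeA", 50L));
        System.out.println(indexingOpsByNode(rows));
    }
}
```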

pickypg · 2018-12-27T19:32:25Z

> Tangentially, I also can't figure out how to get immutable source_node UUID from an HTTP call.

That's something generated by Monitoring and it's also something we're probably going to leave behind (see elastic/kibana#23720).

> Should I create an ER issue for that (and stop making noise in this PR)?

This is the other "part" of the issue that this PR is targeting, so I think we're good once this PR gets in. The goal was to do both in one PR, but I stopped doing both in one PR because it was going to make a much larger jumble of changes that would be harder to review.

pickypg · 2018-12-27T19:41:10Z

@GlenRSmith Not to distract from this issue, but it may be worthwhile for you to check out the work being done in Metricbeat's Elasticsearch module, which is what will inevitably be calling this endpoint. It may simplify whatever it is you're doing.

GlenRSmith · 2019-01-09T01:50:59Z

@pickypg Thanks for the reply, and sorry for the lag. I expected GH would notify me of a mention, but it didn't.

> That's something generated by Monitoring and it's also something we're probably going to leave behind

Thanks for the heads-up. This surprises me. I see that the field is part of DiscoveryNode, but I wasn't sure where the initial value was populated. Rather than code dive, I tested a fresh OS instance and found that clusterService.localNode().getId() does return a UUID. Moreover, the UUID is retained between restarts. Should this be expected unless, say, something like the IP of the host changes?

> Not to distract from this issue, but it may be worthwhile for you to check out the work being done in Metricbeat's Elasticsearch module, which is what will inevitably be calling this endpoint. It may simplify whatever it is you're doing.

Completely agree; that's the direction I would prefer to head (which probably comes as no surprise). As mentioned above, I'm concurrently trying to tease relevant info out of Metricbeat, so I'm happy to hear more effort is being directed that way.

pickypg · 2019-01-09T02:10:35Z

> Moreover, the UUID is retained between restarts. Should this be expected unless, say, something like the IP of the host changes?

That's the persistent UUID, so it's safe/right to use. The only thing that will change it is changing/losing your path.data directory. It is the UUID used throughout the cluster state and any node responses, except where explicitly called out as the ephemeral ID (which, as the name suggests, changes between restarts).

Relating it to this PR, GET /_nodes/stats already uses that ID to key the node objects in the response, as does GET /_nodes:

GET /_nodes/stats
{
  "nodes" : {
    "6BqcjkTqSjeHR3zmZymoKA" : { },
    "D7vXhrLZRLipcxdejcToyg" : { }
  }
}

6BqcjkTqSjeHR3zmZymoKA and D7vXhrLZRLipcxdejcToyg are persistent UUIDs of two nodes.
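To show why keying on the persistent ID matters for a consumer of this data, here is a minimal Java sketch that keeps the latest sample per persistent ID, so a restarted node is still treated as the same node. The `NodeSample` record, the method name, the ephemeral ID strings, and the shard counts are made up for illustration; only the two persistent IDs come from the example response above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PersistentIdDemo {
    /** One stats sample for a node; the ephemeral IDs below are placeholder values. */
    public record NodeSample(String persistentId, String ephemeralId, long shardCount) {}

    /** Keeps the most recent sample per persistent ID, ignoring ephemeral ID changes. */
    public static Map<String, NodeSample> latestByPersistentId(List<NodeSample> samples) {
        Map<String, NodeSample> latest = new LinkedHashMap<>();
        for (NodeSample sample : samples) {
            latest.put(sample.persistentId(), sample); // later samples overwrite earlier ones
        }
        return latest;
    }

    public static void main(String[] args) {
        List<NodeSample> samples = List.of(
            new NodeSample("6BqcjkTqSjeHR3zmZymoKA", "ephemeral-before-restart", 12),
            new NodeSample("6BqcjkTqSjeHR3zmZymoKA", "ephemeral-after-restart", 12),
            new NodeSample("D7vXhrLZRLipcxdejcToyg", "ephemeral-other-node", 9));
        System.out.println(latestByPersistentId(samples).keySet());
    }
}
```

Keying the same map on the ephemeral ID would instead yield three entries for two physical nodes, since that ID changes on every restart.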

Probably worth moving this to discuss if there are any other questions.

tlrx · 2019-09-25T12:09:56Z

@pickypg @ycombinator do we want to revive this or should we close?

pickypg · 2019-09-25T12:28:28Z

I’d love to see this picked up. I had hoped that the @elastic/stack-monitoring team would pick it up so that they could enrich node stats (and then index stats) with shard data rather than having to aggregate everything in the UI.

But in its current form this can only serve as a blueprint at this point.

pickypg added >enhancement :Data Management/Stats Statistics tracking and retrieval APIs v7.0.0 :Data Management/Monitoring v6.5.0 labels Sep 14, 2018

pickypg requested review from ycombinator and tlrx September 14, 2018 04:29

ycombinator reviewed Sep 14, 2018

server/src/main/java/org/elasticsearch/action/admin/cluster/node/stats/NodeStats.java

ycombinator reviewed Sep 17, 2018

server/src/main/java/org/elasticsearch/action/admin/cluster/node/stats/NodeStats.java

ycombinator approved these changes Sep 17, 2018

pickypg added 11 commits September 26, 2018 10:26

Remove commented out code

8bcde4b

Remove dead code / unused imports

2fbb609

Use shard ID for shard path

04b301d

Remove now-unused import

e5a855d

improve NodeStatsTests

8003663

use correct BWC version in serialization prior to backport

4abd5ed

use proper version

495298e

exempt check

da8041b

expand test rather than simply not accepting the value

0549f73

remove monitoring changes from this PR

aa9aa26

pickypg force-pushed the feature/shard-stats-for-node-stats branch from 2e0a607 to aa9aa26 Compare September 26, 2018 17:26

update test with missing parameter (now that the test no longer is a part of this PR)

78179af

tlrx reviewed Sep 28, 2018

chrisronline mentioned this pull request Oct 18, 2018

[WIP] [Monitoring] Use single monitoring index to avoid shard duplicates elastic/kibana#24231

Closed

3 tasks

colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018

jasontedor added v6.7.0 and removed v6.6.0 labels Dec 19, 2018

jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019

danielmitterdorfer added v7.2.0 and removed v6.7.0 labels Feb 7, 2019

ycombinator mentioned this pull request Apr 18, 2019

[Monitoring] Duplicated Shards elastic/kibana#19704

Closed

jakelandis added v7.3.0 and removed v7.2.0 labels Jun 17, 2019

jpountz removed :Data Management/Monitoring v7.3.0 v8.0.0 labels Jul 5, 2019

pickypg mentioned this pull request Jul 12, 2019

[Stack Monitoring] Logstash Pipelines view can trigger OOM elastic/kibana#37246

Closed

ycombinator mentioned this pull request Sep 19, 2019

[Stack Monitoring] Shards per Node is not accurate from Metricbeat elastic/kibana#46185

Closed

pickypg closed this Sep 25, 2019
