Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Shard Stats to _nodes/stats #33696

Closed

Conversation

pickypg
Copy link
Member

@pickypg pickypg commented Sep 14, 2018

This adds some useful shard stats, like the number of shards, as well adding all shard details to the response (when requested).

The goal is to enable Monitoring, both internally and from Beats, to fetch the shard stats for the node in one request and thus also dramatically simplify and reduce the overall number of documents
stored in the .monitoring-es-* indices.

Part of elastic/kibana#19704 (comment)

@pickypg
Copy link
Member Author

pickypg commented Sep 14, 2018

Jenkins test this

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this LGTM. I verified that the _nodes/stats API now returns shards as well, and I also verified that nodes_stats documents in .monitoring-es-* contain the same shards.

The code makes sense to me but I'd prefer if @tlrx could do a "proper" code review from an Java/ES POV.

@pickypg I imagine that at some point we'll want to update the UI to use the new fields and then remove the code under https://github.com/elastic/elasticsearch/tree/master/x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/collector/shard. Are there issues for this work already?

@pickypg
Copy link
Member Author

pickypg commented Sep 17, 2018

Thanks @ycombinator.

I imagine that at some point we'll want to update the UI to use the new fields

It's the second part of elastic/kibana#19704 (comment). I should probably create a separate UI for it. We cannot remove the shard documents until we add the equivalent data to the index_stats documents. But once we do that, then the UI can be updated to look for the shard data as part of the document or otherwise fallback to the current approach (granting full BWC).

@tlrx
Copy link
Member

tlrx commented Sep 18, 2018

Could we split this into 2 independent changes: one for adding the shards stats in node stats and another one that makes use of it in Monitoring? That would make the review easier.

@pickypg
Copy link
Member Author

pickypg commented Sep 18, 2018

@tlrx Sure. I'll back out the Monitoring changes from this PR today.

This adds some useful shard stats, like the number of shards, as well
adding all shard details to the response (when requested).

The goal is to enable Monitoring, both internally and from Beats, to
fetch the shard stats for the node in one request and thus also
dramatically simplify and reduce the overall number of documents
stored in the `.monitoring-es-*` indices.
@pickypg pickypg force-pushed the feature/shard-stats-for-node-stats branch from 2e0a607 to aa9aa26 Compare September 26, 2018 17:26
@pickypg
Copy link
Member Author

pickypg commented Sep 26, 2018

@tlrx Ended up getting distracted, but I've gone ahead and updated it and rebased onto the latest master due to a conflict.

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me, but this must be reviewed by someone from the @elastic/es-core-infra team

}
}
}
true, true, true, false, true, false, false, false, false, false, false, false, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that we want to always return shard stats in the cluster stats?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code was always doing the equivalent lookup of shard stats without any conditions to block it, so I think it's the right thing to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code in Cluster Stats or and in Monitoring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransportClusterStatsAction - it's the loop that this is replacing.

@pickypg
Copy link
Member Author

pickypg commented Oct 15, 2018

Can someone @elastic/es-core-infra take a look at this PR?

@GlenRSmith
Copy link
Contributor

GlenRSmith commented Dec 27, 2018

Current workaround in Metricbeats is to use the http module, e.g.

- module: http
  metricsets:
    - json
  namespace: "node_stats"
  path: "_stats"
  dedot.enabled: true
  query:
    level: shards

I haven't been able to figure out how to accomplish this in a plugin, though, without resorting to creating an HTTP client inside the plugin, which seems trappy to me, and possibly susceptible to being unintentionally blocked by permissions.

Tangentially, I also can't figure out how to get immutable source_node UUID from an HTTP call.

Edit: the case that led me to the workaround above was actually index stats, in order to get shard level stats with the ability to know which node each shard resides upon, to help discover hot nodes.

Without looking at the rest (no pun intended) of the proposed change set, I see that the rest-api-spec for indices.stats.json is untouched, so I infer that that endpoint is unchanged. Should I create an ER issue for that (and stop making noise in this PR)?

@pickypg
Copy link
Member Author

pickypg commented Dec 27, 2018

Tangentially, I also can't figure out how to get immutable source_node UUID from an HTTP call.

That's something generated by Monitoring and it's also something we're probably going to leave behind (see elastic/kibana#23720).

Should I create an ER issue for that (and stop making noise in this PR)?

This is the other "part" of the issue that this PR is targeting, so I think we're good once this PR gets in. The goal was to do both in one PR, but I stopped doing both in one PR because it was going to make a much larger jumble of changes that would be harder to review.

@pickypg
Copy link
Member Author

pickypg commented Dec 27, 2018

@GlenRSmith Not to distract from this issue, but it may be worthwhile for you to checkout the work being done in Metricbeat's Elasticsearch module, which is what will inevitably be calling this endpoint. It may simplify whatever it is you're doing.

@GlenRSmith
Copy link
Contributor

@pickypg Thanks for the reply, and sorry for the lag. I expected GH would notify me of a mention, but it didn't.

That's something generated by Monitoring and it's also something we're probably going to leave behind

Thanks for the head's up. This surprises me. I see that the field is part of DiscoveryNode, but I wasn't sure where the initial value was populated. Rather than code dive, I tested a fresh OS instance, and found that clusterService.localNode().getId() does return a UUID. Moreover, the UUID is retained between restarts. Should this be expected unless, say, something like the IP of the host changes?

Not to distract from this issue, but it may be worthwhile for you to checkout the work being done in Metricbeat's Elasticsearch module, which is what will inevitably be calling this endpoint. It may simplify whatever it is you're doing.

Completely agree; that's the direction I would prefer to head (which probably comes as no surprise). As mentioned above, I'm concurrently trying to tease relevant info out of metricbeats, so I'm happy to hear more effort is being directed that way.

@pickypg
Copy link
Member Author

pickypg commented Jan 9, 2019

Moreover, the UUID is retained between restarts. Should this be expected unless, say, something like the IP of the host changes?

That's the persistent UUID, so it's safe/right to use. The only thing that will change it is changing/losing your path.data directory. It is the UUID used throughout the cluster state and any node responses, except where explicitly called out as the ephemeral ID (which, as the name suggests, changes between restarts).

Relating it to this PR, GET /_nodes/stats already uses that ID to key the node objects in the response, as does GET /_nodes:

GET /_nodes/stats
{
  "nodes" : {
    "6BqcjkTqSjeHR3zmZymoKA" : { },
    "D7vXhrLZRLipcxdejcToyg": { },

6BqcjkTqSjeHR3zmZymoKA and D7vXhrLZRLipcxdejcToyg are persistent UUIDs of two nodes.

Probably worth moving this to discuss if there are any other questions.

@tlrx
Copy link
Member

tlrx commented Sep 25, 2019

@pickypg @ycombinator do we want to revive this or should we close?

@pickypg
Copy link
Member Author

pickypg commented Sep 25, 2019

I’d love to see this picked up. I had hoped that the @elastic/stack-monitoring team would so that they could enrich node stats (and then index stats) with shard data rather than have to aggregate everything in the UI.

But in its current form this can only serve as a blueprint at this point.

@pickypg pickypg closed this Sep 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants