Track network metrics between nodes #19335

PhaedrusTheGreek · 2016-07-08T13:49:45Z

Typically Elasticsearch doesn't work well in cross-datacentre architectures, but how can you define that? So long as there is a reliable and ample network connection between the two sites, why not? If Elasticsearch had insight into the reliability of its connections to the other nodes in the cluster, this could serve as a vital cluster health metric.

To know that, it would be great if each ES node could track metrics such as shard transfer rate, ping time, packet loss, and relative uptime against any or all other known nodes. Tracking minimum_masters stable time from each node's perspective would be useful too.

The resulting metrics could be used to diagnose or flag stability problems caused by network issues. The availability metrics would be skewed by node restarts and similar events, but they would still be highly useful. The transfer rate data would always be consistent.
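
As a rough illustration of the per-peer tracking described above, here is a minimal Python sketch. It is not Elasticsearch code: the `PeerStats` class, its fields, and the `peers` registry are all hypothetical names chosen for the example, and it assumes something else on the node supplies ping round-trip times and shard transfer sizes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class PeerStats:
    """Rolling network observations for one remote node (hypothetical)."""
    window: int = 100                       # how many recent pings to keep
    ping_ms: deque = field(default_factory=deque)
    pings_sent: int = 0
    pings_lost: int = 0
    bytes_transferred: int = 0
    transfer_seconds: float = 0.0

    def record_ping(self, rtt_ms):
        """Record one ping; rtt_ms=None means no reply arrived."""
        self.pings_sent += 1
        if rtt_ms is None:
            self.pings_lost += 1
            return
        self.ping_ms.append(rtt_ms)
        if len(self.ping_ms) > self.window:
            self.ping_ms.popleft()          # drop the oldest sample

    def record_transfer(self, num_bytes, seconds):
        """Accumulate one shard transfer's size and duration."""
        self.bytes_transferred += num_bytes
        self.transfer_seconds += seconds

    def summary(self):
        """Return the derived metrics, or None where nothing was observed."""
        return {
            "avg_ping_ms": mean(self.ping_ms) if self.ping_ms else None,
            "ping_loss_ratio": (self.pings_lost / self.pings_sent
                                if self.pings_sent else None),
            "transfer_bytes_per_sec": (self.bytes_transferred / self.transfer_seconds
                                       if self.transfer_seconds else None),
        }


# One PeerStats per remote node id, created on first use.
peers = defaultdict(PeerStats)
```

Each node would keep its own `peers` map, so the numbers reflect that node's view of every other node rather than a cluster-wide average.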

clintongormley · 2016-07-15T09:50:25Z

Discussed in FixItFriday. Agreed that at least some of these metrics would be good to have, but it would be a time-consuming and tedious job to add these stats. Nice to have, but maybe not worth the effort?

I'll mark it as adoptme and high hanging fruit.

ywelsch · 2016-07-15T10:07:55Z

A simpler way to get started here might be to log warnings on the node (similar to slow logs). If pinging takes longer than a (user-definable) threshold, we could for example log a warning. Same for slow shard transfer rates etc.
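
A minimal Python sketch of that threshold idea, under stated assumptions: the `timed_ping` helper, the `transport.slowlog` logger name, and the 500 ms default are all hypothetical and are not existing Elasticsearch settings or APIs. The caller supplies whatever function actually performs the ping.

```python
import logging
import time

logger = logging.getLogger("transport.slowlog")  # hypothetical logger name


def timed_ping(node_id, ping_fn, warn_threshold_ms=500.0, clock=time.monotonic):
    """Run ping_fn() and log a warning if it took longer than the threshold.

    Returns the elapsed time in milliseconds.
    """
    start = clock()
    ping_fn()
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms > warn_threshold_ms:
        logger.warning(
            "slow ping to [%s]: took [%.1fms], threshold [%.1fms]",
            node_id, elapsed_ms, warn_threshold_ms,
        )
    return elapsed_ms
```

The same wrapper shape would apply to shard transfers, with a bytes-per-second floor in place of the latency ceiling.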

inqueue · 2016-10-20T14:40:59Z

+1 @ywelsch

jpcarey · 2016-10-20T14:50:47Z

+1 @ywelsch

martijnvg · 2018-03-19T14:54:11Z

This issue has been open for a while, but not a lot has happened with it. I will close it for now because it is high hanging fruit and there are currently no plans to work on this improvement. The alternative approach that @ywelsch suggested is also easier to get started with. Please feel free to leave feedback on the proposal (including +1s).

clintongormley added discuss and :Data Management/Stats (Statistics tracking and retrieval APIs) labels Jul 8, 2016

clintongormley added >enhancement high hanging fruit help wanted adoptme and removed discuss labels Jul 15, 2016

martijnvg closed this as completed Mar 19, 2018

PhaedrusTheGreek mentioned this issue Dec 6, 2018

Search API response time breakdown #21073

Open

ywelsch mentioned this issue Dec 7, 2018

Improve Elasticsearch network monitoring #36127

Closed

22 tasks
