Track network metrics between nodes #19335

Closed
PhaedrusTheGreek opened this issue Jul 8, 2016 · 5 comments

@PhaedrusTheGreek
Contributor

Elasticsearch typically doesn't work well in cross-datacentre architectures, but how do you define that? As long as there is a reliable, ample network connection between two sites, why not? If Elasticsearch had insight into the reliability of its connections to the other nodes in the cluster, this could serve as a vital cluster health metric.

To know that, it would be great if each ES node could track metrics such as shard transfer rate, ping time, packet loss, and relative uptime against any or all other known nodes. Tracking how long minimum_masters has been stable from each node's perspective would be useful too.

The results of the metrics could be used in diagnosing or indicating stability problems due to network issues. The availability metrics would be skewed by node restarts, etc, but it would still be highly useful. The transfer rate data would always be consistent.
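
To make the request above more concrete, here is a minimal sketch of the per-peer bookkeeping a node might keep. It is plain Python rather than Elasticsearch code, and every name in it (`PeerStats`, `NodeNetworkStats`, `record_ping`, `record_transfer`, `summary`) is hypothetical; nothing here reflects an existing Elasticsearch API. It assumes a lost ping is reported as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PeerStats:
    """What one node remembers about a single peer (hypothetical structure)."""
    pings_sent: int = 0
    pings_lost: int = 0
    rtt_total_ms: float = 0.0
    bytes_transferred: int = 0
    transfer_seconds: float = 0.0


class NodeNetworkStats:
    """Per-peer network metrics from the local node's point of view."""

    def __init__(self) -> None:
        self._peers: Dict[str, PeerStats] = {}

    def _peer(self, peer_id: str) -> PeerStats:
        # Create the record lazily the first time a peer is seen.
        return self._peers.setdefault(peer_id, PeerStats())

    def record_ping(self, peer_id: str, rtt_ms: Optional[float]) -> None:
        """Record one ping; rtt_ms=None means the ping got no reply."""
        peer = self._peer(peer_id)
        peer.pings_sent += 1
        if rtt_ms is None:
            peer.pings_lost += 1
        else:
            peer.rtt_total_ms += rtt_ms

    def record_transfer(self, peer_id: str, num_bytes: int, seconds: float) -> None:
        """Record one shard transfer to or from the peer."""
        peer = self._peer(peer_id)
        peer.bytes_transferred += num_bytes
        peer.transfer_seconds += seconds

    def summary(self, peer_id: str) -> Dict[str, Optional[float]]:
        """Return derived metrics; a value is None when there is no data yet."""
        peer = self._peers.get(peer_id, PeerStats())
        answered = peer.pings_sent - peer.pings_lost
        return {
            "ping_loss_ratio": peer.pings_lost / peer.pings_sent if peer.pings_sent else None,
            "avg_rtt_ms": peer.rtt_total_ms / answered if answered else None,
            "transfer_bytes_per_sec": (
                peer.bytes_transferred / peer.transfer_seconds
                if peer.transfer_seconds > 0 else None
            ),
        }
```

Keeping running totals instead of raw samples keeps memory use constant per peer, at the cost of not being able to report percentiles.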

clintongormley added the discuss and :Data Management/Stats (Statistics tracking and retrieval APIs) labels on Jul 8, 2016
@clintongormley
Contributor

Discussed in FixItFriday. Agreed that at least some of these metrics would be good to have, but it would be a time-consuming and tedious job to add these stats. Nice to have, but maybe not worth the effort?

I'll mark it as adoptme and high hanging fruit

@ywelsch
Contributor

ywelsch commented Jul 15, 2016

A simpler way to get started here might be to log warnings on the node (similar to slow logs). If pinging takes longer than a (user-definable) threshold, we could for example log a warning. Same for slow shard transfer rates etc.
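
As a rough illustration of the threshold idea, something along these lines could work. This is a Python sketch, not Elasticsearch code: the logger name `slow_network`, the functions `check_ping` and `check_transfer_rate`, and the default threshold values are all assumptions made up for the example, and the message format only loosely imitates a slow log.

```python
import logging

# Hypothetical logger name; a real implementation would pick its own.
logger = logging.getLogger("slow_network")


def check_ping(peer: str, rtt_ms: float, warn_threshold_ms: float = 500.0) -> bool:
    """Log a warning if a ping took longer than the user-definable threshold.

    Returns True when a warning was logged.
    """
    if rtt_ms > warn_threshold_ms:
        logger.warning(
            "slow ping to [%s]: took [%.1fms], threshold [%.1fms]",
            peer, rtt_ms, warn_threshold_ms,
        )
        return True
    return False


def check_transfer_rate(peer: str, num_bytes: int, seconds: float,
                        min_bytes_per_sec: float = 1_000_000.0) -> bool:
    """Log a warning if a shard transfer ran below the minimum expected rate.

    Returns True when a warning was logged.
    """
    if seconds <= 0:
        return False  # nothing measurable to report
    rate = num_bytes / seconds
    if rate < min_bytes_per_sec:
        logger.warning(
            "slow shard transfer with [%s]: [%.0f bytes/s], minimum [%.0f bytes/s]",
            peer, rate, min_bytes_per_sec,
        )
        return True
    return False
```

With the defaults above, `check_ping("node-b", 750.0)` logs a warning and `check_ping("node-b", 20.0)` stays silent. The thresholds would presumably be exposed as settings, the way the slow log thresholds are.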

@inqueue
Member

inqueue commented Oct 20, 2016

+1 @ywelsch

@jpcarey
Contributor

jpcarey commented Oct 20, 2016

+1 @ywelsch

@martijnvg
Member

This issue has been open for a while, but not a lot has happened with it. I will close it for now because it is high hanging fruit and there are currently no plans to work on this improvement. The alternative approach that @ywelsch suggested is also easier to get started with. Please feel free to leave feedback on the proposal (including +1s).
