Support "cluster" scope in Metricbeat elasticsearch module #18547
Pinging @elastic/integrations (Team:Integrations)
// If we're talking to a set of ES nodes directly, only collect stats from the master node so
// we don't collect the same stats from every node and end up duplicating them.
if m.HostsMode == elasticsearch.HostsModeNode {
	isMaster, err := elasticsearch.IsMaster(m.HTTP, m.GetServiceURI())
	if err != nil {
		return errors.Wrap(err, "error determining if connected Elasticsearch node is master")
	}

	// Not master, no event sent
	if !isMaster {
		m.Logger().Debug("trying to fetch ccr stats from a non-master node")
		return nil
	}
}
This logic repeats across the board; I wonder if it should go into a helper function.
Refactored in 348c9b9.
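For illustration, a shared helper of the kind suggested in this thread might look roughly like the sketch below. It is not the code from 348c9b9: the `Scope` type, the `masterChecker` function type, and the name `shouldSkipClusterStats` are hypothetical stand-ins so the example compiles on its own, without the Metricbeat packages.

```go
package main

import "fmt"

// Scope is a hypothetical stand-in for the module's node/cluster setting.
type Scope int

const (
	ScopeNode    Scope = iota // each host is an individual Elasticsearch node
	ScopeCluster              // each host is a single endpoint for a whole cluster
)

// masterChecker is a hypothetical stand-in for a call such as
// elasticsearch.IsMaster(m.HTTP, m.GetServiceURI()).
type masterChecker func() (bool, error)

// shouldSkipClusterStats reports whether a metricset should skip collection.
// Collection is skipped only when hosts are individual nodes and the
// connected node is not the master, so cluster-level stats are not duplicated.
func shouldSkipClusterStats(scope Scope, isMaster masterChecker) (bool, error) {
	if scope != ScopeNode {
		// A single cluster endpoint is queried once, so nothing is duplicated.
		return false, nil
	}
	master, err := isMaster()
	if err != nil {
		return false, fmt.Errorf("error determining if connected Elasticsearch node is master: %w", err)
	}
	return !master, nil
}
```

Each metricset's fetch method could then call the helper once and return early when it reports `true`, instead of repeating the master check inline.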
// it will provide the data for multiple nodes. This will mean the detection of the
// master node will not be accurate anymore as often in these cases a proxy is in front
// of ES and it's not known if the request will be routed to the same node as before.
// TODO: call GET _nodes/_master?filter_path=nodes.*.name to figure out the ID of the master node
I guess this TODO is done by GetMasterNodeID
Removed in 34cd849.
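As a rough sketch of what the removed TODO described, the response to `GET _nodes/_master?filter_path=nodes.*.name` is a JSON object whose `nodes` map is keyed by node ID, so the master's ID can be read from the single key. The function name `parseMasterNodeID` is hypothetical, and this is not the actual `GetMasterNodeID` implementation from the PR.

```go
package main

import (
	"encoding/json"
	"errors"
)

// parseMasterNodeID extracts the node ID from a response body shaped like
// {"nodes": {"<node ID>": {"name": "<node name>"}}}.
func parseMasterNodeID(body []byte) (string, error) {
	var resp struct {
		Nodes map[string]struct {
			Name string `json:"name"`
		} `json:"nodes"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	// The _master node filter yields at most one entry; return its key.
	for id := range resp.Nodes {
		return id, nil
	}
	return "", errors.New("no master node found in response")
}
```

Knowing the master's ID lets a collector pick out the master by ID from a multi-node response, rather than asking whichever node the proxy happened to route the request to.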
metricbeat/metricbeat.reference.yml
Outdated
# a distinct Elasticsearch cluster, e.g. a load-balancer fronting the cluster.
# Set to node (default) to treat each item in the hosts list as an individual
# node in the Elasticsearch cluster.
#hosts_mode: node
I wonder if `mode` is enough; in other places we have used `scope` too. Anyway, I don't have a strong opinion here.
I was reserving `mode` for this change: #9424 (comment). Using `scope` here instead of `hosts_mode` sounds good to me.
Changed in f2a5618.
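With the rename applied, a module configuration that opts into cluster scope might look like the minimal sketch below. Only the `scope` setting and its two values come from this PR; the host URL and credentials are placeholders, and the use of `xpack.enabled: true` and a 10-second period are assumptions based on the `elasticsearch-xpack` module.

```yaml
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  # Exactly one endpoint per cluster, e.g. a load balancer (placeholder URL).
  hosts: ["https://XXXXX"]
  username: "XXXXX"
  password: "XXXXX"
  # node (default) or cluster
  scope: cluster
```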
This is looking great, Shaunak! Thank you for working on this.
I tested the changes in this PR for heap usage w.r.t. the number of nodes in the ES cluster being monitored. There was no significant difference in heap usage, no matter how many nodes were in the ES cluster being monitored. Details below.
Setup
Results
https://docs.google.com/spreadsheets/d/1Boi0uw846OSY3vnGqC604Dj8Lh28lD3G9OsKFgPGz8I/edit?usp=sharing
Pinging @elastic/stack-monitoring (Stack monitoring)
Co-authored-by: DeDe Morton <[email protected]>
…20413)
* Adding configuration for hosts_mode
* Only perform master check in HostsModeNode
* Only ask the node if it's the master node if we're in HostsModeNode
* Unpack host_mode string into enum
* Adding some specific TODOs in node_stats code
* Updating x-pack/metricbeat reference config
* Set correct service URI
* Get master node ID
* Adding CHANGELOG entry
* Rename hosts_mode => scope
* Removing stale TODO comment
* Adding docs
* Refactoring common code into helper method
* Do not set service URI up front
* Updating documentation per review
* Remove comments from doc examples
* Adding configuration for hosts_mode
* Set correct service URI
* Adding CHANGELOG entry
* Rename hosts_mode => scope
* Do not set service URI up front
* Update metricbeat/docs/modules/elasticsearch.asciidoc
  Co-authored-by: DeDe Morton <[email protected]>
* Update metricbeat/module/elasticsearch/_meta/docs.asciidoc
  Co-authored-by: DeDe Morton <[email protected]>
* Update reference config
* Cleaning up CHANGELOG
* Updating generated files
Co-authored-by: DeDe Morton <[email protected]>
Co-authored-by: DeDe Morton <[email protected]>
…8547)
* Adding configuration for hosts_mode
* Only perform master check in HostsModeNode
* Only ask the node if it's the master node if we're in HostsModeNode
* Unpack host_mode string into enum
* Adding some specific TODOs in node_stats code
* Updating x-pack/metricbeat reference config
* Set correct service URI
* Get master node ID
* Adding CHANGELOG entry
* Rename hosts_mode => scope
* Removing stale TODO comment
* Adding docs
* Refactoring common code into helper method
* Do not set service URI up front
* Updating documentation per review
* Remove comments from doc examples
* Adding configuration for hosts_mode
* Set correct service URI
* Adding CHANGELOG entry
* Rename hosts_mode => scope
* Do not set service URI up front
* Update metricbeat/docs/modules/elasticsearch.asciidoc
  Co-authored-by: DeDe Morton <[email protected]>
* Update metricbeat/module/elasticsearch/_meta/docs.asciidoc
  Co-authored-by: DeDe Morton <[email protected]>
* Update reference config
* Cleaning up CHANGELOG
* Updating generated files
Co-authored-by: DeDe Morton <[email protected]>
What does this PR do?
This PR introduces a new `scope` setting for the `elasticsearch` Metricbeat module. This setting can take one of two values:
* `node` (default): indicates that each item in the `hosts` list points to a distinct Elasticsearch node in a cluster, or
* `cluster`: indicates that each item in the `hosts` list points to a single endpoint for a distinct Elasticsearch cluster (e.g. a load-balancing proxy fronting the cluster).
Why is it important?
Sometimes it may not be possible for Metricbeat to reach individual Elasticsearch nodes. It might only have access to a single endpoint that fronts the entire Elasticsearch cluster.
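To make the two values concrete, the sketch below shows one way a `scope` string could be unpacked into an enum whose zero value is the `node` default. The type and method names are assumptions for illustration and are not taken from the PR's code.

```go
package main

import "fmt"

// Scope is a hypothetical enum for the scope setting. The zero value is
// ScopeNode, which matches the documented default.
type Scope int

const (
	ScopeNode    Scope = iota // each host is a distinct Elasticsearch node
	ScopeCluster              // each host is a single endpoint for a distinct cluster
)

// Unpack converts the configured string into a Scope and rejects any value
// other than "node" or "cluster".
func (s *Scope) Unpack(str string) error {
	switch str {
	case "node":
		*s = ScopeNode
	case "cluster":
		*s = ScopeCluster
	default:
		return fmt.Errorf("invalid scope: %q (expected node or cluster)", str)
	}
	return nil
}
```

Rejecting unknown strings at configuration time means a typo such as `scope: clusters` fails fast instead of silently falling back to node behaviour.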
Checklist
* `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Manual testing
Testing the new `scope: cluster` functionality introduced in this PR requires an Elasticsearch cluster with multiple nodes. The easiest way is probably to spin up an Elastic Cloud cluster.
1. Enable the `elasticsearch-xpack` module.
2. Configure the module (`./modules.d/elasticsearch-xpack.yml`) like so:
   Obviously, replace the `XXXXX` placeholders with your own.
   The key is that there should be exactly one item under `hosts`, pointing to a single endpoint for your cluster. This could be the Elasticsearch endpoint obtained from Elastic Cloud or the address of a single node in your on-prem/local Elasticsearch cluster.
3. Configure Metricbeat to send the collected stats to an Elasticsearch cluster. This will act as your Monitoring Cluster. It could be the same cluster you're using to collect stats from (as configured in your `elasticsearch-xpack` module configuration above) or it could be an entirely separate cluster.
4. Start Metricbeat.
5. Let Metricbeat run for ~30 seconds. Make sure there are no errors in the Metricbeat logs.
6. Perform the following query against your Monitoring Cluster.
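A query along these lines could be built as in the sketch below, which is inferred from the three checks listed after it rather than copied from the PR. The aggregation names (`by_cluster_uuid`, `cluster_stats`, `node_stats`, `by_period`) come from that list; the field names `cluster_uuid`, `type`, and `timestamp`, and sending the body to `.monitoring-es-*/_search`, are assumptions about the monitoring index layout.

```go
package main

import "encoding/json"

// perPeriod builds a filter aggregation for one document type with a nested
// date histogram named by_period that uses 10-second buckets, matching the
// 10-second collection period.
func perPeriod(docType string) map[string]interface{} {
	return map[string]interface{}{
		"filter": map[string]interface{}{
			"term": map[string]interface{}{"type": docType},
		},
		"aggs": map[string]interface{}{
			"by_period": map[string]interface{}{
				"date_histogram": map[string]interface{}{
					"field":          "timestamp", // assumed field name
					"fixed_interval": "10s",
				},
			},
		},
	}
}

// buildVerificationQuery returns the JSON search body. size is 0 because
// only the total hit count and the aggregations are of interest.
func buildVerificationQuery() ([]byte, error) {
	query := map[string]interface{}{
		"size": 0,
		"aggs": map[string]interface{}{
			"by_cluster_uuid": map[string]interface{}{
				"terms": map[string]interface{}{"field": "cluster_uuid"}, // assumed field name
			},
			"cluster_stats": perPeriod("cluster_stats"),
			"node_stats":    perPeriod("node_stats"),
		},
	}
	return json.MarshalIndent(query, "", "  ")
}
```

The returned bytes can be sent as the body of a search request against the Monitoring Cluster, for example from Kibana Dev Tools or with `curl`.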
This query checks 3 things:
1. The `aggs.by_cluster_uuid` aggregation checks that we are only seeing data for a single cluster and that all documents contain that single cluster UUID.
   * `aggregations.by_cluster_uuid.buckets` only contains a single bucket, for the single cluster UUID of the Elasticsearch cluster you are monitoring.
   * `aggregations.by_cluster_uuid.buckets[0].doc_count` is the same as the `hits.total.value`.
2. The `aggs.cluster_stats` aggregation checks that `type: cluster_stats` documents are only indexed once every collection period (10 seconds) and that there is at most one document per collection period. We are using `type: cluster_stats` here as an example; the same should be true for any `type`s other than `type: node_stats`.
   * `aggregations.cluster_stats.by_period.buckets` has several buckets, each corresponding to a time period. Each bucket should be 10 seconds "wide". Within each bucket, verify that `doc_count` is <= 1.
3. The `aggs.node_stats` aggregation checks that `type: node_stats` documents are only indexed once every collection period (10 seconds) and that there are at most `N` documents per collection period, where `N` is the number of nodes in the cluster you are monitoring.
   * `aggregations.node_stats.by_period.buckets` has several buckets, each corresponding to a time period. Each bucket should be 10 seconds "wide". Within each bucket, verify that `doc_count` is <= `N`, where `N` is the number of nodes in the cluster you are monitoring.
Related issues
* `elasticsearch` module should be able to collect from a single cluster endpoint #18539.