[Stack Monitoring] Diagnostic query docs (#127572)
1 parent 5410626, commit f071726
Showing 4 changed files with 121 additions and 3 deletions.
@@ -0,0 +1,21 @@
CPU utilization seems to answer a simple question: how hard are my CPUs working?

But the way CPU resources are managed can complicate the answer, especially when [cgroups](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt) and [CFS](https://www.kernel.org/doc/html/latest/scheduler/sched-design-CFS.html) are in use.

This information may help when you are debugging why a CPU metric doesn't look the way you expect in a Stack Monitoring graph.

At the time of writing, the code path from a system-level CPU metric to a utilization percentage looks like this:

1. `node_cpu_metric` is set to `node_cgroup_quota_as_cpu_utilization` when cgroup is enabled: [node_detail.js](/x-pack/plugins/monitoring/server/routes/api/v1/elasticsearch/node_detail.js#L61-65)
1. `node_cgroup_quota_as_cpu_utilization` is defined as a `QuotaMetric` against `cpu.cfs_quota_micros`: [metrics.ts](/x-pack/plugins/monitoring/server/lib/metrics/elasticsearch/metrics.ts#L798-801)
1. `QuotaMetric` tries to produce a ratio of usage to quota, but returns null when the quota isn't a positive number: [quota_metric.ts](/x-pack/plugins/monitoring/server/lib/metrics/classes/quota_metric.ts#L79-80)

Because this code path only applies when cgroup reporting is enabled, it's important to be aware of the `monitoring.ui.container.elasticsearch.enabled` setting, which defaults to `true` on cloud.elastic.co.

Some values of `cfs_quota_micros` can produce unexpected results. For example, if cgroups are enabled but no quota is set, the Stack Monitoring UI shows "N/A" because Elasticsearch can't directly see how much of the CPU it's using.

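The quota handling described above can be sketched roughly as follows. This is a simplified illustration, not the actual `QuotaMetric` code: the `QuotaBucket` shape and the `quotaUtilizationPercent` name are hypothetical, and the sketch assumes usage is measured in nanoseconds while the quota is expressed in microseconds per CFS period.

```typescript
// Hypothetical, simplified stand-in for the quota-based CPU calculation.
interface QuotaBucket {
  usageDeltaNanos: number | null; // CPU time consumed during the bucket, in nanoseconds
  periodsDelta: number | null; // CFS periods elapsed during the bucket
  quotaMicros: number | null; // cfs_quota_micros; -1 means no quota is set
}

// Returns utilization as a percentage of the quota, or null when it cannot be computed.
function quotaUtilizationPercent(bucket: QuotaBucket): number | null {
  const { usageDeltaNanos, periodsDelta, quotaMicros } = bucket;
  if (!usageDeltaNanos || !periodsDelta || quotaMicros === null || quotaMicros <= 0) {
    return null; // no usable quota, so there is nothing to divide by
  }
  // Convert the quota from microseconds to nanoseconds before dividing.
  const allowedNanos = periodsDelta * quotaMicros * 1000;
  return (usageDeltaNanos / allowedNanos) * 100;
}
```

A quota of `-1` (the cgroup value for "no limit") yields `null` here, which corresponds to the "N/A" case in the UI.
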
You can confirm a point-in-time value of `cfs_quota_micros` for Elasticsearch by using the [node stats API](https://www.elastic.co/guide/en/elasticsearch/reference/master/cluster-nodes-stats.html).

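As a rough sketch of reading that value, the helper below pulls the cgroup CPU fields out of an already-parsed node stats response (for example, from `GET _nodes/stats/os`). The `cgroupQuotaByNode` name and the trimmed-down `NodeStatsResponse` type are illustrative assumptions; only the fields read here are typed.

```typescript
// Minimal slice of the node stats response; only the fields read below are typed.
interface NodeStatsResponse {
  nodes: Record<
    string,
    {
      name?: string;
      os?: { cgroup?: { cpu?: { cfs_quota_micros?: number; cfs_period_micros?: number } } };
    }
  >;
}

interface NodeQuota {
  node: string;
  quotaMicros?: number;
  periodMicros?: number;
}

// Returns one entry per node; the quota is undefined when a node reports no cgroup data.
function cgroupQuotaByNode(stats: NodeStatsResponse): NodeQuota[] {
  return Object.entries(stats.nodes).map(([id, node]) => ({
    node: node.name ?? id, // fall back to the node ID when no name is present
    quotaMicros: node.os?.cgroup?.cpu?.cfs_quota_micros,
    periodMicros: node.os?.cgroup?.cpu?.cfs_period_micros,
  }));
}
```
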
The CPU available on Elastic Cloud is based on the memory size of the instance, and smaller instance sizes get an additional boost via direct adjustments to the `cfs_quota_us` cgroup setting (the value Elasticsearch reports as `cfs_quota_micros`).

For self-hosted deployments, you will likely need to check the cgroup configuration with `docker inspect`.
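
For a Docker-based deployment, one way to interpret the `docker inspect` output is sketched below. It assumes the JSON array that `docker inspect <container>` prints and reads only the `HostConfig.NanoCpus`, `HostConfig.CpuQuota`, and `HostConfig.CpuPeriod` fields; the `cpuLimitFromInspect` helper is a hypothetical name, and the fallback to a 100 ms period assumes the kernel default applies when no period is set.

```typescript
// Subset of `docker inspect` output used here; all other fields are ignored.
interface InspectEntry {
  HostConfig?: { CpuQuota?: number; CpuPeriod?: number; NanoCpus?: number };
}

// Assumed kernel default CFS period (100 ms), used when CpuPeriod is 0 or unset.
const DEFAULT_CFS_PERIOD_MICROS = 100000;

// Returns the CPU limit in cores, or null when no limit is configured.
function cpuLimitFromInspect(inspectJson: string): number | null {
  const [entry] = JSON.parse(inspectJson) as InspectEntry[];
  const host = entry?.HostConfig ?? {};
  if (host.NanoCpus && host.NanoCpus > 0) {
    return host.NanoCpus / 1e9; // typically set by `--cpus`
  }
  if (host.CpuQuota && host.CpuQuota > 0) {
    return host.CpuQuota / (host.CpuPeriod || DEFAULT_CFS_PERIOD_MICROS); // set by `--cpu-quota`
  }
  return null;
}
```

A `null` result means the container has no CPU quota, which is the situation where the quota-based utilization metric has nothing to divide by.
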
@@ -0,0 +1,96 @@
If the Stack Monitoring UI isn't showing data for any cluster, start by surveying the available data with a query like this:

```Kibana Dev Tools
POST .monitoring-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1h",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "clusters": {
      "terms": {
        "field": "cluster_uuid",
        "size": 1000
      },
      "aggs": {
        "indices": {
          "terms": {
            "field": "_index",
            "size": 1000
          },
          "aggs": {
            "documentTypes": {
              "terms": {
                "field": "type",
                "size": 1000
              }
            }
          }
        }
      }
    }
  }
}
```

This shows which document types are available in each index, for each cluster UUID, over the last hour.

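To make that response easier to scan, the nested buckets can be flattened into one row per cluster, index, and document type. This is a minimal sketch, assuming the standard terms-aggregation response shape for the survey query; the `flattenSurvey` helper and its row fields are hypothetical names, not part of Kibana.

```typescript
interface TermsBucket {
  key: string;
  doc_count: number;
}

// Shape of the survey query's aggregations; only the fields read below are typed.
interface SurveyResponse {
  aggregations: {
    clusters: {
      buckets: Array<
        TermsBucket & {
          indices: {
            buckets: Array<TermsBucket & { documentTypes: { buckets: TermsBucket[] } }>;
          };
        }
      >;
    };
  };
}

interface SurveyRow {
  clusterUuid: string;
  index: string;
  type: string;
  docCount: number;
}

// Walks clusters -> indices -> documentTypes and emits one row per leaf bucket.
function flattenSurvey(res: SurveyResponse): SurveyRow[] {
  const rows: SurveyRow[] = [];
  for (const cluster of res.aggregations.clusters.buckets) {
    for (const index of cluster.indices.buckets) {
      for (const docType of index.documentTypes.buckets) {
        rows.push({
          clusterUuid: cluster.key,
          index: index.key,
          type: docType.key,
          docCount: docType.doc_count,
        });
      }
    }
  }
  return rows;
}
```

An empty result means no monitoring documents were indexed in the queried time range. An index that appears with no document-type buckets would not produce any rows in this sketch.
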
The main cluster list requires Elasticsearch cluster stats to be available. You can use this query to check whether cluster stats exist for a given cluster (replace the `<CLUSTER UUID>` placeholder in the query with the cluster's UUID).

```Kibana Dev Tools
POST .monitoring-*,*:.monitoring-*,metrics-*,*:metrics-*/_search
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "type": "cluster_stats"
                }
              },
              {
                "term": {
                  "metricset.name": "cluster_stats"
                }
              }
            ]
          }
        },
        {
          "term": {
            "cluster_uuid": "<CLUSTER UUID>"
          }
        },
        {
          "range": {
            "timestamp": {
              "format": "epoch_millis",
              "gte": "now-7d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "cluster_uuid"
  },
  "sort": {
    "timestamp": {
      "order": "desc",
      "unmapped_type": "long"
    }
  }
}
```
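
If you run this check repeatedly, it can help to generate the request body so the placeholder is never left in by accident. The following is a sketch only: `buildClusterStatsQuery` and `CLUSTER_STATS_INDEX_PATTERN` are hypothetical names, and the body simply mirrors the query above.

```typescript
// Index pattern copied from the query above.
const CLUSTER_STATS_INDEX_PATTERN = ".monitoring-*,*:.monitoring-*,metrics-*,*:metrics-*";

// Builds the search body for one cluster, rejecting an empty or unreplaced placeholder.
function buildClusterStatsQuery(clusterUuid: string) {
  if (!clusterUuid || clusterUuid.includes("<")) {
    throw new Error("Replace the <CLUSTER UUID> placeholder with a real cluster UUID");
  }
  return {
    size: 10,
    query: {
      bool: {
        filter: [
          {
            // Match either the legacy `type` field or the Metricbeat `metricset.name` field.
            bool: {
              should: [
                { term: { type: "cluster_stats" } },
                { term: { "metricset.name": "cluster_stats" } },
              ],
            },
          },
          { term: { cluster_uuid: clusterUuid } },
          { range: { timestamp: { format: "epoch_millis", gte: "now-7d", lte: "now" } } },
        ],
      },
    },
    collapse: { field: "cluster_uuid" },
    sort: { timestamp: { order: "desc", unmapped_type: "long" } },
  };
}
```

The returned object can be serialized with `JSON.stringify` and sent to `CLUSTER_STATS_INDEX_PATTERN + "/_search"` using whichever HTTP client you prefer.
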