[Stack Monitoring] Elasticsearch monitoring with Metricbeat and Filebeat as sidecars #4528
Looks promising! I wonder, however, whether we can model this with just one additional association type (es->es) instead of two separate ones for Metricbeat and Filebeat.
It was my first idea, but I abandoned it too quickly because I did not yet understand very well how associations work. I started again, and you're right: it's a little simpler and cleaner. I will update the PR.
Out of curiosity, I tested with version 6.8 of the Stack, and the Metricbeat containers do not start:
2021-06-03T08:59:11.655Z INFO instance/beat.go:280 Setup Beat: metricbeat; Version: 6.8.16
2021-06-03T08:59:11.657Z INFO elasticsearch/client.go:164 Elasticsearch url: https://monitoring-es-http.demo.svc:9200
2021-06-03T08:59:11.657Z INFO [publisher] pipeline/module.go:110 Beat name: monitored-es-master-0
2021-06-03T08:59:11.657Z INFO instance/beat.go:359 metricbeat stopped.
2021-06-03T08:59:11.657Z ERROR instance/beat.go:906 Exiting: 1 error: The elasticsearch module with xpack.enabled: true must have metricsets: [ccr cluster_stats index index_recovery index_summary ml_job node_stats shard]
Exiting: 1 error: The elasticsearch module with xpack.enabled: true must have metricsets: [ccr cluster_stats index index_recovery index_summary ml_job node_stats shard]
I think it's because the `enrich` metricset is not supported before 7.5 (not sure about the exact version). It seems that we need to generate the Metricbeat configuration according to the Stack version, and plan to add an e2e test to validate version-dependent behaviour (this can be added in a subsequent PR).
Also, reading this compatibility matrix, I'm wondering if we should prevent associating a monitored ES/Beat 6.x with a monitoring ES 7.x?
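Generating the Metricbeat configuration according to the Stack version, as suggested above, could look like the sketch below. It assumes the 7.5 cutoff for `enrich` that the comment itself flags as uncertain; the function names are hypothetical and not part of the PR. The base list is the one required by the 6.8 error message in the log.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// baseMetricsets is the list that Metricbeat 6.8 requires when xpack.enabled is true
// (taken from the error message in the log above).
var baseMetricsets = []string{
	"ccr", "cluster_stats", "index", "index_recovery",
	"index_summary", "ml_job", "node_stats", "shard",
}

// enrichMinVersion is an ASSUMPTION (major, minor) from the review comment, which was not sure about it.
var enrichMinVersion = [2]int{7, 5}

// parseMajorMinor extracts the major and minor numbers from a version such as "7.14.0".
func parseMajorMinor(version string) (int, int, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// metricsetsFor (hypothetical) returns the metricsets to configure for a given Stack version.
func metricsetsFor(version string) ([]string, error) {
	major, minor, err := parseMajorMinor(version)
	if err != nil {
		return nil, err
	}
	// Copy so callers never mutate the shared base list.
	sets := append([]string{}, baseMetricsets...)
	if major > enrichMinVersion[0] || (major == enrichMinVersion[0] && minor >= enrichMinVersion[1]) {
		sets = append(sets, "enrich")
	}
	return sets, nil
}

func main() {
	for _, v := range []string{"6.8.16", "7.14.0"} {
		sets, err := metricsetsFor(v)
		if err != nil {
			panic(err)
		}
		fmt.Println(v, sets)
	}
}
```

An e2e test per Stack version would then only need to check that Metricbeat starts with the generated list.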
A couple of small comments. I also played with it a bit and it works nicely.
Do we plan to include monitoring for other apps (Kibana, Beats) as well?
Remains to be done:
Filebeat keeps crash-looping for me with `one or more modules must be configured`.
I haven't quite figured out what causes it, but my guess is that YAML indentation is to blame somewhere. Otherwise, I think we are almost ready to merge 👍
Bad copy-paste f4759fd 🤦‍♂️
LGTM! Nice work @thbkrkr 👍 I have done a few tests, ran the e2e test you added, and explored the validations. There is probably room for more in-depth testing of edge cases where a resource-constrained Metricbeat or Filebeat brings down a whole Pod, but I haven't tried that myself.
Thanks Peter! I'm going to follow up with Kibana monitoring.
This is a disadvantage of the sidecar pattern: if it happens, you have to increase the compute resources. I did a quick test by restricting memory for Metricbeat:
Elasticsearch is green and reachable, but from the operator's point of view the ES resource is red, and it blocks the deployment of the associated Kibana.
To increase the memory limit, you need to apply the updated manifest with a new limit and kill the Pod.
Per the PR below, add the minimum version to the manual. #4528:
> enable stack logging for the ES container using the environment variable ES_LOG_STYLE=file
> Minimum supported Stack version is 7.14.0 (to benefit from ES_LOG_STYLE=file)
Adds a new `monitoring` field to the Elasticsearch resource to configure one or two different Elasticsearch references, in order to set up stack monitoring with Metricbeat and log delivery with Filebeat. The referenced Elasticsearch clusters receive the data collected by the Beats. This is implemented with a multiple association of ES type (1 es <-> [1|2] es).
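One way to read the "1 es <-> [1|2] es" association is that the monitored cluster gets one association per *distinct* referenced cluster. The sketch below illustrates this with hypothetical types (`ObjectRef`, `Monitoring`) that only mirror the shape of the `monitoring` field. They are not ECK's actual API types.

```go
package main

import "fmt"

// ObjectRef is a hypothetical stand-in for a namespaced Elasticsearch reference.
type ObjectRef struct {
	Name      string
	Namespace string
}

// Monitoring is a hypothetical mirror of the monitoring field:
// metrics and logs each hold a slice of Elasticsearch references.
type Monitoring struct {
	MetricsRefs []ObjectRef
	LogsRefs    []ObjectRef
}

// associatedRefs returns the distinct referenced clusters: the same cluster used for
// metrics and logs yields one association, two different clusters yield two.
func associatedRefs(m Monitoring) []ObjectRef {
	seen := map[ObjectRef]bool{}
	var out []ObjectRef
	for _, refs := range [][]ObjectRef{m.MetricsRefs, m.LogsRefs} {
		for _, r := range refs {
			if !seen[r] {
				seen[r] = true
				out = append(out, r)
			}
		}
	}
	return out
}

func main() {
	same := ObjectRef{Name: "monitoring-es", Namespace: "demo"}
	fmt.Println(associatedRefs(Monitoring{
		MetricsRefs: []ObjectRef{same},
		LogsRefs:    []ObjectRef{same},
	}))
}
```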
The new `stackmon` package contains functions to:
- … `xpack.monitoring.*` settings in the ES config
- enable stack logging for the ES container using the environment variable `ES_LOG_STYLE=file`
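Enabling `ES_LOG_STYLE=file` ties stack monitoring to the minimum Stack version of 7.14.0 mentioned earlier in the conversation. A minimal sketch of that gate follows. `EnvVar` and `withFileLogging` are hypothetical stand-ins, not the `stackmon` package's real functions. The version parser deliberately handles only plain `major.minor.patch` strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// EnvVar is a hypothetical stand-in for a container environment variable.
type EnvVar struct{ Name, Value string }

// minStackVersion is the minimum supported Stack version stated in the PR.
var minStackVersion = [3]int{7, 14, 0}

// parseVersion accepts only plain "major.minor.patch" versions (no pre-release suffix).
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v >= min, comparing component by component.
func atLeast(v, min [3]int) bool {
	for i := 0; i < 3; i++ {
		if v[i] != min[i] {
			return v[i] > min[i]
		}
	}
	return true
}

// withFileLogging appends ES_LOG_STYLE=file so that ES logs to files
// (for the Filebeat sidecar to pick up), rejecting versions below the minimum.
func withFileLogging(version string, env []EnvVar) ([]EnvVar, error) {
	v, err := parseVersion(version)
	if err != nil {
		return nil, err
	}
	if !atLeast(v, minStackVersion) {
		return nil, fmt.Errorf("stack monitoring requires version >= 7.14.0, got %s", version)
	}
	return append(env, EnvVar{Name: "ES_LOG_STYLE", Value: "file"}), nil
}

func main() {
	env, err := withFileLogging("7.14.0", nil)
	fmt.Println(env, err)
}
```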
Each Beats configuration is built from a base configuration merged with an output section that defines the Elasticsearch cluster the data is sent to. For Metricbeat, the base config is a template into which the connection info of the Elasticsearch cluster to collect data from is injected.
A hash of the two Beats config files is added to a Pod label to ensure that Pods are rotated when ES user passwords are rotated.
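The rotation mechanism works because the Beats configs embed the ES user credentials. A rotated password therefore changes the hash, which changes the Pod template and triggers a rollout. A minimal sketch, assuming SHA-256 (the PR may use a different hash function) and a hypothetical label key:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHashLabel is a hypothetical label key, not necessarily the one used by the PR.
const configHashLabel = "example.k8s.elastic.co/beats-config-hash"

// beatsConfigHash hashes both Beats configs together. Any change to either config
// (for example a rotated password) produces a different value.
func beatsConfigHash(metricbeatCfg, filebeatCfg []byte) string {
	h := sha256.New()
	h.Write(metricbeatCfg)
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write(filebeatCfg)
	// Kubernetes label values are limited to 63 characters; hex SHA-256 is 64, so truncate.
	return hex.EncodeToString(h.Sum(nil))[:32]
}

func main() {
	labels := map[string]string{
		configHashLabel: beatsConfigHash([]byte("metricbeat config"), []byte("filebeat config")),
	}
	fmt.Println(labels)
}
```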
YAML example for testing
Limitations
- Custom Elasticsearch images that don't follow the Elastic naming scheme (`$registry/elasticsearch/elasticsearch:$version`) are not supported. To use custom Beat images, you have to override the podTemplate.
- `monitoring.[metrics|logs].elasticsearchRefs` accepts only one Elasticsearch reference. It's a slice to future-proof the API for Elastic Agent.

Relates to #4183.
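The single-reference limitation on `elasticsearchRefs` is the kind of rule a validation would enforce. The sketch below is a hypothetical version of such a check and not the PR's webhook code. `ObjectRef`, `validateRefs`, and `validateMonitoring` are illustrative names.

```go
package main

import "fmt"

// ObjectRef is a hypothetical stand-in for an Elasticsearch reference.
type ObjectRef struct{ Name, Namespace string }

// maxRefs reflects the stated limitation: one reference per field for now,
// even though the field is a slice.
const maxRefs = 1

// validateRefs rejects more than maxRefs references for one field.
func validateRefs(field string, refs []ObjectRef) error {
	if len(refs) > maxRefs {
		return fmt.Errorf("%s: at most %d Elasticsearch reference is supported, got %d", field, maxRefs, len(refs))
	}
	return nil
}

// validateMonitoring checks both the metrics and the logs references.
func validateMonitoring(metricsRefs, logsRefs []ObjectRef) []error {
	var errs []error
	if err := validateRefs("spec.monitoring.metrics.elasticsearchRefs", metricsRefs); err != nil {
		errs = append(errs, err)
	}
	if err := validateRefs("spec.monitoring.logs.elasticsearchRefs", logsRefs); err != nil {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	two := []ObjectRef{{Name: "a"}, {Name: "b"}}
	fmt.Println(validateMonitoring(two, nil))
}
```

Keeping the field a slice while validating its length lets a later release relax `maxRefs` without an API change.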