[Stack Monitoring] Elasticsearch monitoring with Metricbeat and Filebeat as sidecars #4528

Merged
thbkrkr merged 86 commits into elastic:master from stack-monitoring on Jul 6, 2021

Contributor

@thbkrkr thbkrkr commented May 27, 2021

Adds a new monitoring field to the Elasticsearch resource that references one or two Elasticsearch clusters, enabling stack monitoring with Metricbeat and log delivery with Filebeat. The Beats send the data they collect to the referenced clusters.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
spec:
  monitoring:
    metrics:
      elasticsearchRefs: 
        - name: m1
          namespace: m1
    logs:
      elasticsearchRefs:
        - name: m2
          namespace: m2

This is implemented as a multiple association between Elasticsearch resources (1 es <-> [1|2] es).
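As a rough illustration of the shape of this API, here is a minimal Go sketch of the monitoring spec and of how the metrics and logs references could collapse into one or two association targets. The type names, the `ObjectRef` stand-in, and the deduplication in `AssociationTargets` are assumptions made for this example, not the operator's actual types.

```go
package main

import "fmt"

// ObjectRef is a simplified, hypothetical stand-in for the operator's
// reference-to-another-resource type.
type ObjectRef struct {
	Name      string
	Namespace string
}

// Monitoring mirrors the shape of spec.monitoring in the manifest above.
type Monitoring struct {
	Metrics MetricsMonitoring
	Logs    LogsMonitoring
}

// MetricsMonitoring lists the clusters that receive Metricbeat data.
type MetricsMonitoring struct {
	ElasticsearchRefs []ObjectRef
}

// LogsMonitoring lists the clusters that receive Filebeat data.
type LogsMonitoring struct {
	ElasticsearchRefs []ObjectRef
}

// AssociationTargets returns the distinct Elasticsearch clusters referenced
// by the metrics and logs sections, in first-seen order. Deduplicating here
// is an assumption used to illustrate the "1 es <-> [1|2] es" relation.
func (m Monitoring) AssociationTargets() []ObjectRef {
	seen := map[ObjectRef]bool{}
	var targets []ObjectRef
	for _, refs := range [][]ObjectRef{m.Metrics.ElasticsearchRefs, m.Logs.ElasticsearchRefs} {
		for _, ref := range refs {
			if !seen[ref] {
				seen[ref] = true
				targets = append(targets, ref)
			}
		}
	}
	return targets
}

func main() {
	m := Monitoring{
		Metrics: MetricsMonitoring{ElasticsearchRefs: []ObjectRef{{Name: "m1", Namespace: "m1"}}},
		Logs:    LogsMonitoring{ElasticsearchRefs: []ObjectRef{{Name: "m2", Namespace: "m2"}}},
	}
	fmt.Println(m.AssociationTargets())
}
```

When both sections point at the same cluster, the sketch yields a single target; with two different clusters it yields two.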

The new stackmon packages contain functions to:

  • set the xpack.monitoring.* settings in the ES config
  • enable stack logging for the ES container using the environment variable ES_LOG_STYLE=file
  • add volumes to the ES pod to mount the ES CA and the beats configurations in the beats sidecar containers
  • inject Metricbeat and Filebeat as sidecar containers in the ES pod, next to the ES container, with volumeMounts for the ES CA
  • reconcile the Metricbeat and Filebeat configurations in ConfigMaps
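
The pod-level steps in the list above can be sketched roughly as follows. This is an illustration only: the `Pod`, `Container`, `EnvVar`, and `VolumeMount` types are simplified stand-ins for the Kubernetes API types, the volume names and mount paths are hypothetical, and enabling `ES_LOG_STYLE=file` only when logs are collected is an assumption of this example.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Kubernetes pod types.
type EnvVar struct{ Name, Value string }
type VolumeMount struct{ Name, MountPath string }
type Container struct {
	Name         string
	Env          []EnvVar
	VolumeMounts []VolumeMount
}
type Pod struct {
	Containers []Container
	Volumes    []string // volume names only, for brevity
}

// beatSidecar builds a sidecar container plus the volumes it needs: one for
// the monitoring cluster's CA and one for the Beat configuration.
func beatSidecar(beat string) (Container, []string) {
	caVol, cfgVol := beat+"-es-ca", beat+"-config" // hypothetical names
	c := Container{
		Name: beat,
		VolumeMounts: []VolumeMount{
			{Name: caVol, MountPath: "/mnt/" + caVol}, // hypothetical paths
			{Name: cfgVol, MountPath: "/etc/" + beat},
		},
	}
	return c, []string{caVol, cfgVol}
}

// WithMonitoring returns a copy of pod with Metricbeat and/or Filebeat
// sidecars added. File logging is switched on only when logs are collected.
func WithMonitoring(pod Pod, metrics, logs bool) Pod {
	out := Pod{Volumes: append([]string{}, pod.Volumes...)}
	for _, c := range pod.Containers {
		if logs && c.Name == "elasticsearch" {
			// Copy the env slice so the input pod is left untouched.
			c.Env = append(append([]EnvVar{}, c.Env...), EnvVar{Name: "ES_LOG_STYLE", Value: "file"})
		}
		out.Containers = append(out.Containers, c)
	}
	for _, b := range []struct {
		name    string
		enabled bool
	}{{"metricbeat", metrics}, {"filebeat", logs}} {
		if !b.enabled {
			continue
		}
		c, vols := beatSidecar(b.name)
		out.Containers = append(out.Containers, c)
		out.Volumes = append(out.Volumes, vols...)
	}
	return out
}

func main() {
	pod := WithMonitoring(Pod{Containers: []Container{{Name: "elasticsearch"}}}, true, true)
	for _, c := range pod.Containers {
		fmt.Println(c.Name, c.Env, c.VolumeMounts)
	}
}
```

Returning a modified copy rather than mutating the input keeps the function easy to test, which matters for code that runs on every reconciliation.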

Each Beat's configuration is built from a base configuration merged with an output section that defines where to send the data (the monitoring ES). For Metricbeat, the base config is a template into which the details of the monitored ES, the one to collect data from, are injected.

A hash of the two beats config files is added in a pod label to ensure pods are rotated when es user passwords are rotated.
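
To illustrate the idea, here is a small sketch that derives a label value from both config files. The hash function used by the operator is not stated in this PR; SHA-256 is used here as a stand-in, and the function name and truncation length are assumptions. The truncation keeps the value within the 63-character limit Kubernetes imposes on label values.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// configHash returns a label-safe digest of the two Beat config files.
// Any change to either file, such as a rotated password in the output
// section, yields a different value and therefore a different pod template.
func configHash(metricbeatCfg, filebeatCfg []byte) string {
	h := sha256.New()
	h.Write(metricbeatCfg)
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	h.Write(filebeatCfg)
	// Hex-encode and truncate to stay under the 63-character label limit.
	return fmt.Sprintf("%x", h.Sum(nil))[:32]
}

func main() {
	before := configHash([]byte("password: old"), []byte("password: old"))
	after := configHash([]byte("password: new"), []byte("password: old"))
	fmt.Println(before != after)
}
```

Because the label is part of the pod template, a changed hash makes the StatefulSet spec differ, which is what triggers the rotation.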

YAML example for testing
apiVersion: v1
kind: Namespace
metadata:
  name: a
---
apiVersion: v1
kind: Namespace
metadata:
  name: b
---
apiVersion: v1
kind: Namespace
metadata:
  name: c
---
#######################################################################
# Monitored Elasticsearch
#######################################################################
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: test
  namespace: a
spec:
  version: 7.14.0-SNAPSHOT
  # --------------------- #
  monitoring:
    metrics:
      elasticsearchRefs: 
        - name: m1
          namespace: b
    logs:
      elasticsearchRefs:
        - name: m1
          namespace: b
        # - name: m2
        #   namespace: c
  # --------------------- #
  nodeSets:
  - name: master
    count: 2
    config:
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: test
  namespace: a
spec:
  version: 7.14.0-SNAPSHOT
  count: 1
  elasticsearchRef:
    name: test
    namespace: a
---
#######################################################################
# Monitoring clusters
#######################################################################
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: m1
  namespace: b
spec:
  version: 7.14.0-SNAPSHOT
  nodeSets:
  - name: master
    count: 2
    config:
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: m1
  namespace: b
spec:
  version: 7.14.0-SNAPSHOT
  count: 1
  elasticsearchRef:
    name: m1
    namespace: b
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: m2
  namespace: c
spec:
  version: 7.14.0-SNAPSHOT
  nodeSets:
  - name: master
    count: 2
    config:
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: m2
  namespace: c
spec:
  version: 7.14.0-SNAPSHOT
  count: 1
  elasticsearchRef:
    name: m2
    namespace: c

Limitations

  • Minimum supported Stack version is 7.14.0 (to benefit from ES_LOG_STYLE=file)
  • Custom Elasticsearch images that don't follow the Elastic scheme ($registry/elasticsearch/elasticsearch:$version) are not supported. To use custom Beat images, you have to override the podTemplate.
  • The monitored Elasticsearch is not deployed until the monitoring clusters are ready and the associations are configured
  • monitoring.[metrics|logs].elasticsearchRefs accepts only one Elasticsearch reference. It's a slice to future-proof the API for Elastic Agent.
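
Two of these limitations lend themselves to validation. The sketch below shows one way such checks might look, covering the minimum version and the single-reference rule. The function names and error messages are hypothetical; the operator's actual validation code may be structured differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsStackMonitoring reports whether version is at least 7.14.0.
// Pre-release suffixes such as "-SNAPSHOT" are ignored.
func supportsStackMonitoring(version string) (bool, error) {
	core := strings.SplitN(version, "-", 2)[0]
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return false, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("invalid major version in %q", version)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("invalid minor version in %q", version)
	}
	return major > 7 || (major == 7 && minor >= 14), nil
}

// validateMonitoring returns one message per violated limitation, or nil
// when monitoring is not configured or the spec is acceptable.
func validateMonitoring(version string, metricsRefs, logsRefs int) []string {
	if metricsRefs == 0 && logsRefs == 0 {
		return nil // monitoring not configured, nothing to check
	}
	var errs []string
	ok, err := supportsStackMonitoring(version)
	switch {
	case err != nil:
		errs = append(errs, err.Error())
	case !ok:
		errs = append(errs, fmt.Sprintf("unsupported version %s: minimum is 7.14.0", version))
	}
	if metricsRefs > 1 {
		errs = append(errs, "monitoring.metrics.elasticsearchRefs accepts only one reference")
	}
	if logsRefs > 1 {
		errs = append(errs, "monitoring.logs.elasticsearchRefs accepts only one reference")
	}
	return errs
}

func main() {
	fmt.Println(validateMonitoring("7.14.0-SNAPSHOT", 1, 1))
	fmt.Println(validateMonitoring("6.8.16", 1, 0))
}
```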

Relates to #4183.

@thbkrkr thbkrkr added the >feature Adds or discusses adding a feature to the product label May 27, 2021
@thbkrkr thbkrkr force-pushed the stack-monitoring branch 2 times, most recently from b41bf30 to 8cf2dd0 Compare May 27, 2021 14:01
@thbkrkr thbkrkr force-pushed the stack-monitoring branch from 8cf2dd0 to e621963 Compare May 27, 2021 22:32
Collaborator

@pebrc pebrc left a comment

Looks promising! I wonder however if we can model this with just one additional association type es->es instead of two separate ones for metricbeat and filebeat?

pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/controller/elasticsearch/stackmon/volume.go (outdated)
pkg/apis/common/v1/association.go (outdated)
@thbkrkr thbkrkr changed the title Elasticsarch monitoring with Metricbeat and Filebeat as sidecars Elasticsearch monitoring with Metricbeat and Filebeat as sidecars May 31, 2021
Contributor Author

thbkrkr commented May 31, 2021

I wonder however if we can model this with just one additional association type es->es instead of two separate ones for metricbeat and filebeat?

It was my first idea that I too quickly abandoned because I did not yet understand very well how associations work. I started again and you're right it's a little simpler and cleaner. I will update the PR.

Contributor

@barkbay barkbay left a comment

Out of curiosity I did a test with version 6.8 of the stack, and the Metricbeat containers do not start:

2021-06-03T08:59:11.655Z        INFO    instance/beat.go:280    Setup Beat: metricbeat; Version: 6.8.16
2021-06-03T08:59:11.657Z        INFO    elasticsearch/client.go:164     Elasticsearch url: https://monitoring-es-http.demo.svc:9200
2021-06-03T08:59:11.657Z        INFO    [publisher]     pipeline/module.go:110  Beat name: monitored-es-master-0
2021-06-03T08:59:11.657Z        INFO    instance/beat.go:359    metricbeat stopped.
2021-06-03T08:59:11.657Z        ERROR   instance/beat.go:906    Exiting: 1 error: The elasticsearch module with xpack.enabled: true must have metricsets: [ccr cluster_stats index index_recovery index_summary ml_job node_stats shard]
Exiting: 1 error: The elasticsearch module with xpack.enabled: true must have metricsets: [ccr cluster_stats index index_recovery index_summary ml_job node_stats shard]

I think it's because enrich is not supported before 7.5 (not sure about the version). It seems that we need to generate the Metricbeat configuration according to the stack version, and we should plan an e2e test to validate version-dependent behaviour (can be added in a subsequent PR).

Also, reading this compatibility matrix, I'm wondering if we should prevent association between a monitored ES/Beat 6.x and a monitoring ES 7.x?

cmd/manager/main.go (outdated)
pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/controller/elasticsearch/stackmon/config.go (outdated)
Contributor

@david-kow david-kow left a comment

A couple of small comments. I also played with it a bit and it works nicely.

Do we plan to include other apps (Kibana, Beats) monitoring as well?

pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/apis/elasticsearch/v1/elasticsearch_types.go (outdated)
pkg/controller/association/controller/es_es.go (outdated)
pkg/controller/elasticsearch/driver/driver.go (outdated)
pkg/controller/elasticsearch/nodespec/podspec.go (outdated)
Contributor Author

thbkrkr commented Jun 7, 2021

Remains to be done:

  • min supported version (>= 7.14)
  • dedicated role

pkg/controller/common/stackmon/beat.go (outdated)
pkg/controller/common/stackmon/monitoring.go (outdated)
pkg/controller/common/stackmon/monitoring.go (outdated)
pkg/controller/common/stackmon/monitoring.go (outdated)
pkg/controller/elasticsearch/stackmon/validations.go (outdated)
pkg/controller/elasticsearch/stackmon/validations.go (outdated)
Collaborator

@pebrc pebrc left a comment

Filebeat keeps crash looping for me with "one or more modules must be configured". I haven't quite figured out what causes it, but my guess is that YAML indentation is to blame somewhere. Otherwise I think we are almost ready to merge 👍

pkg/controller/elasticsearch/stackmon/beat_config.go (outdated)
pkg/controller/elasticsearch/stackmon/validations.go (outdated)
test/e2e/es/stack_monitoring_test.go
Contributor Author

thbkrkr commented Jul 5, 2021

Filebeat keeps crash looping for me with "one or more modules must be configured". I haven't quite figured out what causes it, but my guess is that YAML indentation is to blame somewhere.

Bad copy-paste f4759fd 🤦‍♂️

Collaborator

@pebrc pebrc left a comment

LGTM! Nice work @thbkrkr 👍 I have done a few tests, run the e2e test you added, and explored the validations. There is probably room for more in-depth testing of edge cases where a resource-constrained Metricbeat or Filebeat brings down a whole Pod, but I haven't tried that myself.

Contributor Author

thbkrkr commented Jul 6, 2021

Thanks Peter! I'm going to follow-up with Kibana monitoring.

There is probably room for more in-depth testing of edge cases where a resource-constrained Metricbeat or Filebeat brings down a whole Pod, but I haven't tried that myself.

This is a disadvantage of the sidecar pattern. If it happens, you will have to increase the compute resources.

I did a quick test by restricting memory for Metricbeat:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: alpha
  namespace: production
spec:
  version: 7.14.0-SNAPSHOT
  monitoring:
    metrics:
      elasticsearchRefs:
        - name: monitoring
          namespace: observability
    logs:
      elasticsearchRefs:
        - name: monitoring
          namespace: observability
  nodeSets:
  - name: master
    count: 2
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
          - name: metricbeat
            resources:
              limits:
                memory: 20M

Elasticsearch is green and reachable but from the operator's point of view, the ES resource is red and it blocks the deployment of the associated Kibana.

> k get elastic,sts,deploy,pods
NAME                                               HEALTH    NODES   VERSION           PHASE             AGE
elasticsearch.elasticsearch.k8s.elastic.co/alpha   unknown           7.14.0-SNAPSHOT   ApplyingChanges   14m

NAME                                 HEALTH   NODES   VERSION           AGE
kibana.kibana.k8s.elastic.co/alpha   red              7.14.0-SNAPSHOT   14m

NAME                               READY   AGE
statefulset.apps/alpha-es-master   0/2     13m

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/alpha-kb   0/1     1            0           13m

NAME                            READY   STATUS             RESTARTS   AGE
pod/alpha-es-master-0           2/3     CrashLoopBackOff   7          13m
pod/alpha-es-master-1           2/3     CrashLoopBackOff   7          13m
pod/alpha-kb-6cf84594b9-rm4wp   0/1     Running            0          13m
> eckurl production alpha '/_cat/health' 
1625568204 10:43:24 alpha green 2 2 2 1 0 0 0 0 - 100.0%

To increase the memory limit, you need to apply the updated manifest with a new limit and kill the pod.

@thbkrkr thbkrkr added the v1.7.0 label Jul 6, 2021
@thbkrkr thbkrkr merged commit a25136b into elastic:master Jul 6, 2021
@barkbay barkbay changed the title Elasticsearch monitoring with Metricbeat and Filebeat as sidecars [Stack Monitoring] Elasticsearch monitoring with Metricbeat and Filebeat as sidecars Jul 20, 2021
@thbkrkr thbkrkr deleted the stack-monitoring branch August 31, 2021 15:28
kunisen added a commit that referenced this pull request Nov 8, 2021
Per the PR below, add the minimum version to manual.

#4528
> enable stack logging for the ES container using the environment variable ES_LOG_STYLE=file
> Minimum supported Stack version is 7.14.0 (to benefit from ES_LOG_STYLE=file)
Labels
>feature Adds or discusses adding a feature to the product v1.7.0
6 participants