
Cannot use Stack Monitoring to have Elasticsearch monitor itself #4709

Closed
thbkrkr opened this issue Jul 29, 2021 · 6 comments · Fixed by #5489
Labels
>enhancement Enhancement of existing functionality

Comments

@thbkrkr
Contributor

thbkrkr commented Jul 29, 2021

Configuring an Elasticsearch cluster to monitor itself is not possible due to a circular dependency issue (#4627).

The Elasticsearch monitoring cluster referenced in monitoring.metrics and monitoring.logs must be a separate cluster.

This is currently documented as a limitation:

CAUTION: You cannot configure an Elasticsearch cluster to monitor itself, the monitoring cluster has to be a separate cluster.

@thbkrkr thbkrkr added the >bug Something isn't working label Jul 29, 2021
@shubhaat
Contributor

I've thought a bit more about this, and on the whole I don't think this is super important for ECK or ECE (though ECE supports it). I think we see this more with ESS customers, mostly because some customers don't want to set up another cluster for monitoring. Our recommendation (best practice) is to set up a separate cluster for monitoring, because it helps when your monitored cluster is overwhelmed and keeps concerns separated. I'd not call this a bug, but just a limitation for now, and see if we get requests for this.

@malcolm061990

Hi, guys.
Firstly, thanks for your product :)
But this issue is still very relevant. Previously we deployed a raw Elasticsearch cluster (not Elastic Cloud) with several Beats, and it monitored itself. Simple.
Please sort out this issue for elastic-cloud.

@pebrc pebrc added >enhancement Enhancement of existing functionality and removed >bug Something isn't working labels Nov 23, 2021
@brsolomon-deloitte

@shubhaat is it also recommended to have a separate Kibana instance dedicated to only displaying Stack Monitoring? (Separate from a 'main' Kibana instance used to discover/search data from a 'main' Elasticsearch instance.)

@shubhaat
Contributor

Yes, that would be the case @brsolomon-deloitte. A self-monitoring cluster is easier to set up, but if your cluster goes down so does your monitoring cluster, which can be inconvenient. For production use cases it is recommended to set up a separate monitoring cluster, so that when the monitored cluster is under stress, the monitoring cluster continues to work and any alerts and such still fire.

@thbkrkr
Contributor Author

thbkrkr commented Feb 17, 2022

Update: with #5339 it almost works, because we now avoid deploying ES with an invalid monitoring config, but a tricky issue remains.

The Elasticsearch controller starts by reconciling the required k8s objects for ES (http/transport secrets and services, user/role secrets, ...).
In parallel, the association controller configures the es->es association and, as soon as the http service and the user secret exist, it sets the association conf annotation on the ES resource.
Since association reconciliation is much faster than ES reconciliation, there is no window in which pods get created without monitoring. ES reconciliation fails when adjusting the discovery config, due to a conflict on update, because the association controller has already updated the resource. Note that for ES we accept reconciling even if an association is not configured; we just requeue if it is not. I think for self-monitoring we should be strict about that. I need to check why we are doing this.
Then, the Elasticsearch controller reconciles the ES resource again, and this time it creates the pods with monitoring.
Everything looks good until the cluster is ready. In the background, ES reconciliation keeps being requeued until we can get the cluster uuid and record it in an annotation.
As soon as we reconcile the cluster uuid into the annotation, monitoring is removed and the last pod is rotated once, until the next reconciliation recreates the pods with monitoring. -The end-.

The bug is caused by the fact that we consider the association configured, and configure monitoring, iff the assocConfs map is populated. Because this map is not persisted and is set only at runtime at the beginning of the reconciliation loop, any update to the ES resource wipes the map.

# Pseudo code flow
Reconcile Elasticsearch (R1)
|- FetchWithAssociations // set assocConfs from the annotation
|- ReconcileNodeSpecs    // prepare pods specs
   |- monitoring.IsReconcilable // yes, monitoring ref is defined and configured (depends on assocConfs)
      |- WithMonitoring // yes, configure monitoring
=> pods are created with monitoring

... when the cluster is started and we get its cluster uuid

Reconcile Elasticsearch (R2)
|- FetchWithAssociations // set assocConfs from the annotation
|- ReconcileClusterUUID  // update ES with the cluster uuid annotation => reset assocConfs
|- ReconcileNodeSpecs    // prepare pods specs
   |- monitoring.IsReconcilable // no, monitoring is not configured because no assocConfs
=> pods are recreated without monitoring, last pod is rotated

...

Reconcile Elasticsearch again (like R1)
// ReconcileClusterUUID is done, it will never update the ES again
=> pods are recreated with monitoring, last pod is rotated

@thbkrkr
Contributor Author

thbkrkr commented Feb 17, 2022

Summary

If you have an Elasticsearch resource whose associations are configured, we populate a map of AssociationConf at the beginning of the reconciliation loop, using the AssociationConf stored as JSON in an annotation on the resource:

requeue, err := r.fetchElasticsearchWithAssociations(ctx, request, &es)

This map is not persisted, only set at runtime:

AssocConfs map[types.NamespacedName]commonv1.AssociationConf `json:"-"`

If there is an update to the ES resource:

return k8sClient.Update(context.Background(), cluster)

It resets the map!
So, if you depend on the map after the update, you will see that the associations are not configured even though they are 💥 .
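
For illustration, here is a minimal, self-contained sketch of the failure mode. The types and the annotation key are hypothetical simplifications, not the actual ECK code:

// Minimal, hypothetical sketch of the failure mode; not the actual ECK types.
package main

import "fmt"

// illustrative annotation key, not necessarily the real one
const assocAnnotation = "association.k8s.elastic.co/es-conf"

type Elasticsearch struct {
	Annotations map[string]string // persisted with the resource
	AssocConfs  map[string]string // runtime-only (json:"-" on the real type)
}

// fetchWithAssociations mimics populating AssocConfs from the annotation
// at the beginning of a reconciliation loop.
func fetchWithAssociations(es *Elasticsearch) {
	if conf, ok := es.Annotations[assocAnnotation]; ok {
		es.AssocConfs = map[string]string{"monitoring": conf}
	}
}

// update mimics the effect of k8sClient.Update refreshing the object:
// the runtime-only map is lost because it is never serialized.
func update(es *Elasticsearch) {
	*es = Elasticsearch{Annotations: es.Annotations}
}

func main() {
	es := &Elasticsearch{Annotations: map[string]string{
		assocAnnotation: `{"url":"https://monitoring-es-http:9200"}`,
	}}

	fetchWithAssociations(es)
	fmt.Println("configured before update:", len(es.AssocConfs) > 0) // true

	update(es) // e.g. ReconcileClusterUUID annotating the cluster
	fmt.Println("configured after update:", len(es.AssocConfs) > 0) // false 💥
}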

Note: there are several places during ES reconciliation where we update the resource for safety reasons. For example, we don't want to re-bootstrap a cluster that is already bootstrapped.

Ideas to solve this

  • Stop updating the ES resource on the fly during reconciliation and just send one update at the end. We would lose the safety benefit of these early updates.
  • Reorder the code so that we never depend on the assocConfs map after an update of the ES resource. Super wonky.
  • When we read the map, always verify that if the annotation has an association conf, the map is populated (see the sketch below).
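
A rough sketch of that third idea, reusing the simplified types from the sketch above (assocConfs and isMonitoringConfigured are hypothetical helper names, not the real ECK functions):

// Hypothetical defensive accessor: treat the annotation as the source of
// truth and lazily repopulate the runtime-only map if it was wiped.
func assocConfs(es *Elasticsearch) map[string]string {
	if len(es.AssocConfs) == 0 {
		if conf, ok := es.Annotations[assocAnnotation]; ok {
			es.AssocConfs = map[string]string{"monitoring": conf}
		}
	}
	return es.AssocConfs
}

// A check like monitoring.IsReconcilable would then go through the accessor
// instead of reading the map directly, so a mid-reconciliation update can no
// longer make the association look unconfigured.
func isMonitoringConfigured(es *Elasticsearch) bool {
	return len(assocConfs(es)) > 0
}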
