Stack monitoring and version upgrade order #4627

Closed
sebgl opened this issue Jul 9, 2021 · 3 comments
Labels: >bug (Something isn't working), discuss (We need to figure this out)

Comments

sebgl (Contributor) commented Jul 9, 2021

We ensure a strict ordering of stack product version upgrades: Elasticsearch first, then Enterprise Search, then Kibana, etc.

This is done through our association mechanism: the associated resource can only be upgraded once all referenced resources have been upgraded:

```go
// AllowVersion returns true if the given resourceVersion is lower or equal to the associations' versions.
// For example: Kibana in version 7.8.0 cannot be deployed if its Elasticsearch association reports version 7.7.0.
// A difference in the patch version is ignored: Kibana 7.8.1+ can be deployed alongside Elasticsearch 7.8.0.
// Referenced resources version is parsed from the association conf annotation.
func AllowVersion(resourceVersion version.Version, associated commonv1.Associated, logger logr.Logger, recorder record.EventRecorder) bool {
	for _, assoc := range associated.GetAssociations() {
		assocRef := assoc.AssociationRef()
		if !assocRef.IsDefined() {
			// no association specified, move on
			continue
		}
		if assoc.AssociationConf() == nil || assoc.AssociationConf().Version == "" {
			// no conf reported yet, this may be the initial resource creation
			logger.Info("Delaying version deployment since the version of an associated resource is not reported yet",
				"version", resourceVersion, "ref_namespace", assocRef.Namespace, "ref_name", assocRef.Name)
			return false
		}
		refVer, err := version.Parse(assoc.AssociationConf().Version)
		if err != nil {
			logger.Error(err, "Invalid version found in association configuration", "association_version", assoc.AssociationConf().Version)
			return false
		}
		compatibleVersions := refVer.GTE(resourceVersion) || ((refVer.Major == resourceVersion.Major) && (refVer.Minor == resourceVersion.Minor))
		if !compatibleVersions {
			// the version of the referenced resource (example: Elasticsearch) is lower than
			// the desired version of the reconciled resource (example: Kibana)
			logger.Info("Delaying version deployment since a referenced resource is not upgraded yet",
				"version", resourceVersion, "ref_version", refVer,
				"ref_type", assoc.AssociationType(), "ref_namespace", assocRef.Namespace, "ref_name", assocRef.Name)
			recorder.Event(associated, corev1.EventTypeWarning, events.EventReasonDelayed,
				fmt.Sprintf("Delaying deployment of version %s since the referenced %s is not upgraded yet", resourceVersion, assoc.AssociationType()))
			return false
		}
	}
	return true
}
```
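The core of the check is the `compatibleVersions` expression. The following is a self-contained sketch of just that rule, so it can be exercised in isolation. `semVer` and `gte` are hypothetical stand-ins for the version type used above (not ECK's actual API), and the comparison ignores pre-release and build metadata.

```go
package main

import "fmt"

// semVer is a hypothetical, minimal stand-in for the version type used by
// AllowVersion; only major, minor and patch are modeled.
type semVer struct {
	Major, Minor, Patch uint64
}

// gte reports whether v >= o, comparing major, then minor, then patch.
func (v semVer) gte(o semVer) bool {
	if v.Major != o.Major {
		return v.Major > o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor > o.Minor
	}
	return v.Patch >= o.Patch
}

// compatibleVersions mirrors the condition in AllowVersion: the referenced
// resource must be at least as recent as the desired version, except that a
// patch-level difference within the same major.minor is tolerated.
func compatibleVersions(refVer, resourceVersion semVer) bool {
	return refVer.gte(resourceVersion) ||
		(refVer.Major == resourceVersion.Major && refVer.Minor == resourceVersion.Minor)
}

func main() {
	fmt.Println(compatibleVersions(semVer{7, 7, 0}, semVer{7, 8, 0})) // false: ES 7.7.0 blocks Kibana 7.8.0
	fmt.Println(compatibleVersions(semVer{7, 8, 0}, semVer{7, 8, 1})) // true: patch difference ignored
	fmt.Println(compatibleVersions(semVer{7, 9, 0}, semVer{7, 8, 0})) // true: ES ahead of Kibana
}
```

The second case is why Kibana 7.8.1 can roll out next to Elasticsearch 7.8.0: `gte` fails on the patch number, but the major.minor equality branch lets it through.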

This function is called by the individual resource controllers, for example during Kibana reconciliation:

```go
if !association.AllowVersion(d.version, kb, logger, d.Recorder()) {
	return results // will eventually retry
}
```

We don't perform that check in the Elasticsearch controller, mostly because until now there was no Elasticsearch -> X association. That is no longer true with stack monitoring.

In the Kibana controller, we do check that any Elasticsearch cluster referenced for monitoring purposes runs a higher or equal version.
For consistency, we should do the same in the Elasticsearch controller.

This is where things get a bit tricky :)

  • Association to itself: Elasticsearch monitoring data can be configured to be sent to the monitored cluster itself, in which case we must make sure we don't end up in a circular wait where the version can never be updated because the controller waits for its own resource to be updated first (🔁).

  • Circular associations: we could configure 2 different clusters to monitor each other (ES A <-> ES B), in which case each would require the other to be updated first.

(Overall, the current situation, where the Elasticsearch controller does not perform that check, may be an acceptable solution to this problem.)

sebgl added the "discuss" label Jul 9, 2021
botelastic bot added the "triage" label Jul 9, 2021
thbkrkr added the ">bug" label Jul 27, 2021
botelastic bot removed the "triage" label Jul 27, 2021
thbkrkr (Contributor) commented Jul 27, 2021

Currently:

  • circular associations (ES A <-> ES B) work, including through a version upgrade
  • an association to itself does not work

thbkrkr (Contributor) commented Jul 28, 2021

Note on version compatibility between monitored and monitoring clusters: ideally, the monitoring cluster and the production cluster run the same Elastic Stack version. However, a monitoring cluster on the latest release of 7.x also works with production clusters that use the same major version, and monitoring clusters on 7.x also work with production clusters on the latest release of 6.x (source: https://www.elastic.co/guide/en/elasticsearch/reference/current/monitoring-production.html#monitoring-production).
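A simplified reading of that guidance, as a sketch. "Latest release" cannot be derived from a version number alone, so the hypothetical `latestMinor` map has to be supplied by the caller, and the "same major" rule is generalized from the 7.x statement to any major, which is an assumption. `version` and `monitoringSupported` are illustrative names, not part of ECK; the function returns true only for the combinations the guidance explicitly lists.

```go
package main

import "fmt"

// version is a hypothetical major.minor pair; patch levels are ignored here.
type version struct{ Major, Minor int }

// monitoringSupported reports whether a monitoring/production pair matches
// one of the combinations listed in the guidance. latestMinor maps a major
// version to its newest minor and is an input supplied by the caller.
func monitoringSupported(monitoring, production version, latestMinor map[int]int) bool {
	isLatest := func(v version) bool {
		m, ok := latestMinor[v.Major]
		return ok && v.Minor == m
	}
	switch {
	case monitoring == production:
		return true // ideal case: same Elastic Stack version
	case monitoring.Major == production.Major:
		return isLatest(monitoring) // latest 7.x monitors any 7.x (generalized to any major: assumption)
	case monitoring.Major == 7 && production.Major == 6:
		return isLatest(production) // any 7.x monitors the latest 6.x
	default:
		return false
	}
}

func main() {
	latest := map[int]int{6: 8, 7: 17} // assumed "latest minor" per major, supplied by the caller
	fmt.Println(monitoringSupported(version{7, 17}, version{7, 10}, latest)) // true
	fmt.Println(monitoringSupported(version{7, 10}, version{7, 17}, latest)) // false
	fmt.Println(monitoringSupported(version{7, 10}, version{6, 8}, latest))  // true
	fmt.Println(monitoringSupported(version{7, 10}, version{6, 7}, latest))  // false
}
```

Under this reading, moving a 7.10/7.10 pair to 7.17 stays covered if the monitoring cluster is upgraded first (7.17 monitoring 7.10), but not if the production cluster goes first (7.10 monitoring 7.17), which matches the direction AllowVersion enforces for other associations.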

thbkrkr closed this as completed Mar 28, 2022