Prometheus integration in Watcher controller #40

Open
amoralej wants to merge 3 commits into main from addprometheus
Conversation

@amoralej (Contributor) commented Jan 16, 2025

Watcher will use Prometheus as its metrics store, so we need to integrate the
Prometheus configuration into the Watcher controller.

In order to configure the Prometheus datasource in Watcher we use a secret which provides the required connection data. This secret will eventually be provided by the telemetry operator, although it could be any secret that has the expected fields: host, port and prometheus_ca.crt.
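
For illustration, a minimal sketch (not this PR's actual code) of how a controller could read those fields from such a secret with controller-runtime; the helper name and the secretName argument are hypothetical, and prometheus_ca.crt is treated as optional:

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// prometheusConfig holds the connection data read from the secret.
type prometheusConfig struct {
	Host   string
	Port   string
	CaCert string // contents of prometheus_ca.crt, empty when TLS is not used
}

// getPrometheusConfig is a hypothetical helper that fetches the secret and
// validates the expected fields (host and port required, CA optional).
func getPrometheusConfig(ctx context.Context, c client.Client, namespace, secretName string) (*prometheusConfig, error) {
	secret := &corev1.Secret{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: secretName}, secret); err != nil {
		return nil, err
	}
	host, ok := secret.Data["host"]
	if !ok {
		return nil, fmt.Errorf("secret %s is missing the required field 'host'", secretName)
	}
	port, ok := secret.Data["port"]
	if !ok {
		return nil, fmt.Errorf("secret %s is missing the required field 'port'", secretName)
	}
	return &prometheusConfig{
		Host:   string(host),
		Port:   string(port),
		CaCert: string(secret.Data["prometheus_ca.crt"]),
	}, nil
}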


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a4c4be5e3fde466491956dbce76f2b96

✔️ noop SUCCESS in 0s
✔️ openstack-meta-content-provider SUCCESS in 1h 45m 20s
watcher-operator-validation FAILURE in 1h 12m 21s
watcher-operator-kuttl RETRY_LIMIT in 29m 12s

@amoralej (Contributor, Author) commented:

recheck

@amoralej force-pushed the addprometheus branch 2 times, most recently from f5f6817 to c51c6e9, on January 17, 2025 10:41

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5f8c8ce0324f4f54b447479fb19e1ddc

✔️ noop SUCCESS in 0s
✔️ openstack-meta-content-provider SUCCESS in 1h 30m 58s
✔️ watcher-operator-validation SUCCESS in 1h 19m 09s
watcher-operator-kuttl RETRY_LIMIT in 51m 03s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bc81efe937e44deda7ff67e1a796a300

✔️ noop SUCCESS in 0s
✔️ openstack-meta-content-provider SUCCESS in 1h 41m 09s
✔️ watcher-operator-validation SUCCESS in 1h 29m 05s
watcher-operator-kuttl RETRY_LIMIT in 1h 07m 18s

@amoralej changed the title from "[WIP] Add prometheus configuration" to "[WIP] Prometheus integration in Watcher controller" on Jan 17, 2025
@amoralej changed the title from "[WIP] Prometheus integration in Watcher controller" to "Prometheus integration in Watcher controller" on Jan 20, 2025

This change depends on a change that failed to merge.

Change openstack-k8s-operators/ci-framework#2658 is needed.

@amoralej (Contributor, Author) commented:

recheck

@raukadah (Contributor) left a comment


/approve

Overall, code-wise it looks good.

api/v1beta1/common_types.go (outdated, resolved)
ci/playbooks/create-coo-subscription.yaml (outdated, resolved)
ci/playbooks/deploy_watcher_service.yaml (outdated, resolved)
}

	if instance.Spec.PrometheusTLSCaCertSecret != nil {
		PrometheusCaCert = watcher.CustomPrometheusCaCertFolderPath + instance.Spec.PrometheusTLSCaCertSecret.Key
A Contributor commented:

Can we use filepath.Join to join these two file paths?

Suggested change
PrometheusCaCert = watcher.CustomPrometheusCaCertFolderPath + instance.Spec.PrometheusTLSCaCertSecret.Key
PrometheusCaCert = filepath.Join(watcher.CustomPrometheusCaCertFolderPath, instance.Spec.PrometheusTLSCaCertSecret.Key)
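
For reference, filepath.Join comes from the standard library's path/filepath package. A tiny self-contained sketch of its behaviour (the folder and key values are made-up examples, not taken from this repo):

package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	// filepath.Join inserts exactly one separator and cleans the result,
	// which avoids doubled or missing "/" compared to string concatenation.
	folder := "/var/lib/config-data/ca" // made-up example value
	key := "prometheus_ca.crt"
	fmt.Println(filepath.Join(folder, key)) // prints /var/lib/config-data/ca/prometheus_ca.crt
}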

templates/watcher/config/00-default.conf (resolved)

openshift-ci bot commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raukadah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -246,7 +253,7 @@ func (r *WatcherReconciler) Reconcile(ctx context.Context, req ctrl.Request) (re

 	subLevelSecretName, err := r.createSubLevelSecret(ctx, helper, instance, transporturlSecret, inputSecret, db)
 	if err != nil {
-		return ctrl.Result{}, nil
+		return ctrl.Result{RequeueAfter: time.Duration(60) * time.Second}, nil
 	}
 	// the subLevelSecretName will be the value for the Secret field in the
A Contributor commented:

I just realized that the subLevelSecretName variable is unused; in ensureAPI we simply rely on the fact that the secret is created with the same name as the instance. I think either way is fine, but in a follow-up we should either change the function to stop returning it if it's not going to be used, or use it when creating the subCR instances.

// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
// +kubebuilder:validation:Optional
PrometheusPort int32 `json:"prometheusPort,omitempty"`
A Collaborator commented:

this is not really the correct interface.

if we were to pass the service endpoint to watcher directly,
we should either be passed the name of a Prometheus CR that we can then use to retrieve a transport-URL equivalent as a secret, or we should be passed the k8s service to consume.

that is not how we should implement this, however.

at the end of the day, the planned way to do the integration is for watcher to look up the Prometheus endpoint in keystone, as aodh does.

that is work that still needs to be completed in the 2025.1 cycle upstream as a prerequisite for FR2.

A Collaborator commented:

i think the correct way to integrate this in the short term would be for the telemetry operator to render a secret with the host/port info and a tls flag, and for us to have a field that takes the secret by name.

we can default the name of that secret so that it does not need to be specified by default.

for the CA secret we can have a separate field, as they do in the telemetry operator with
PrometheusTLSCaCertSecret:
https://github.com/openstack-k8s-operators/telemetry-operator/blob/acd9c11a60caf6ff96d31bb56e290975eb94d785/api/v1beta1/autoscaling_types.go#L176

this is an interface we should try to get right before we merge anything.

exposing the hostname in the CRD is something we should avoid doing.
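
To make that interface concrete, a minimal sketch of what such spec fields could look like; the struct name, the prometheusSecret field, its default value and the exact types are assumptions for illustration (only PrometheusTLSCaCertSecret mirrors an existing field), not what this PR currently implements:

import corev1 "k8s.io/api/core/v1"

// WatcherPrometheusSpec is a hypothetical fragment of the Watcher spec.
type WatcherPrometheusSpec struct {
	// PrometheusSecret is the name of the secret rendered by the telemetry
	// operator with the Prometheus host, port and TLS flag.
	// The default value below is a made-up placeholder.
	// +kubebuilder:validation:Optional
	// +kubebuilder:default=metric-storage-prometheus-config
	PrometheusSecret string `json:"prometheusSecret,omitempty"`

	// PrometheusTLSCaCertSecret selects the CA certificate used to verify the
	// Prometheus endpoint, mirroring the telemetry operator field linked above.
	// +kubebuilder:validation:Optional
	PrometheusTLSCaCertSecret *corev1.SecretKeySelector `json:"prometheusTLSCaCertSecret,omitempty"`
}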

@amoralej (Contributor, Author) commented Jan 20, 2025:

Let me explain how this is supposed to work. I've taken this approach from the autoscaling controller in telemetry-operator, but I'm fine with changing it if we prefer. With this approach we cover two ways of configuring the Prometheus parameters:

  1. Provide a custom host, port and CA certificate set to whatever values the customer wants, which could even be an external Prometheus not managed by openstack-operator. For this case I am creating the three parameters PrometheusHost, PrometheusPort and PrometheusTLSCaCertSecret. These values are not intended to be used with the integrated Prometheus deployed by the telemetry operator.
  2. When using the Prometheus deployed by telemetry-operator, those three parameters are expected to be left empty. In that case the watcher-operator checks that there is a MetricStorage object created and configures host, port and cacert based on the naming conventions used by telemetry-operator:
  • PrometheusHost = fmt.Sprintf("%s-prometheus.%s.svc", telemetryv1.DefaultServiceName, instance.Namespace)
  • PrometheusPort = telemetryv1.DefaultPrometheusPort
  • PrometheusCaCert = tls.DownstreamTLSCABundlePath

If we prefer to only support the configuration of the integrated Prometheus, we can simplify it, get rid of the Prometheus* parameters in the Spec entirely and apply just approach 2.
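
As a complement, a minimal sketch of the fallback logic described in approach 2 above (the function name and the import paths are assumptions, and the MetricStorage existence check is omitted); this is not the PR's actual code:

import (
	"fmt"
	"path/filepath"

	// import paths below are assumptions for illustration
	"github.com/openstack-k8s-operators/lib-common/modules/common/tls"
	telemetryv1 "github.com/openstack-k8s-operators/telemetry-operator/api/v1beta1"

	watcherv1 "github.com/openstack-k8s-operators/watcher-operator/api/v1beta1"
	"github.com/openstack-k8s-operators/watcher-operator/pkg/watcher"
)

// resolvePrometheusEndpoint is a hypothetical helper: use the Spec values when
// the customer provided them (approach 1), otherwise fall back to the naming
// conventions of the telemetry-operator deployment (approach 2).
func resolvePrometheusEndpoint(instance *watcherv1.Watcher) (host string, port int32, caCert string) {
	if instance.Spec.PrometheusHost != "" {
		host = instance.Spec.PrometheusHost
		port = instance.Spec.PrometheusPort
		if instance.Spec.PrometheusTLSCaCertSecret != nil {
			caCert = filepath.Join(watcher.CustomPrometheusCaCertFolderPath, instance.Spec.PrometheusTLSCaCertSecret.Key)
		}
		return host, port, caCert
	}
	host = fmt.Sprintf("%s-prometheus.%s.svc", telemetryv1.DefaultServiceName, instance.Namespace)
	port = telemetryv1.DefaultPrometheusPort
	caCert = tls.DownstreamTLSCABundlePath
	return host, port, caCert
}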

A Collaborator commented:

So, before we merge anything downstream to enable this integration, my hope is that we have completed the keystone integration in the data source plugin so that you do not need to specify any Prometheus host or port in watcher.

to me, having the hostname in the CR is a security-hardening opportunity:
it's not a vulnerability, but we should avoid it. i don't think the current telemetry operator CRD is appropriate and it likely should be changed.

we have no requirement to support external Prometheus instances, and i don't think that is something we should do, at least not in the near term, without a documented use case and further discussion.

in the absence of the keystone integration, my preference would be to modify the telemetry operator to store the required data in a pre-defined secret,
and for us to pass that secret by name to the watcher CR.

if someone wanted to use an external Prometheus, which again i don't think we should support, they could just create a different secret with the correct fields and pass that in instead.

@amoralej (Contributor, Author) commented:

makes sense, I will simplify it and make the operator rely on the values used by the telemetry-operator when deploying Prometheus. Thanks for the feedback

Comment:

just to be clear, this discussion and the decision not to support the external Prometheus case apply to the operator only, right? upstream we will have to support both, as we cannot assume Prometheus will be integrated with keystone via the telemetry operator in the same way


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3f3bf0e6167c4f3496ee9975ee5d65d5

✔️ noop SUCCESS in 0s
✔️ openstack-meta-content-provider SUCCESS in 1h 18m 00s
watcher-operator-validation FAILURE in 1h 06m 10s
watcher-operator-kuttl RETRY_LIMIT in 21m 05s

@marios left a comment

had another pass at this.
Will you now update it to get the host/port via the telemetry operator as discussed?

@amoralej amoralej removed the approved label Jan 23, 2025
@amoralej changed the title from "Prometheus integration in Watcher controller" to "WIP Prometheus integration in Watcher controller" on Jan 23, 2025

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4cbcaf3b338441afb309c7306eb79ddc

✔️ noop SUCCESS in 0s
✔️ openstack-meta-content-provider SUCCESS in 57m 12s
watcher-operator-validation FAILURE in 46m 15s
✔️ watcher-operator-kuttl SUCCESS in 34m 49s

@amoralej (Contributor, Author) commented:

recheck

In order to configure the Prometheus datasource in Watcher we are using a
secret which will provide the required connection data. This secret will
be provided by the telemetry operator at some point, although it could be
any secret which has the expected fields: host, port,
ca_secret and ca_key.

This patch adds the new field into the spec, validates that it provides the
required fields and copies it from Watcher to the SubCRs. It also adds a watch
on the secret in the main controller (and also adds the watch on the top
level .Spec.secret, which was missing).
This patch includes the prometheus configuration data from the provided
secret in the config files in the watcherapi controller.
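
For illustration, a minimal sketch of how a controller-runtime controller can watch Secrets and map them back to the CRs that consume them; the mapping function, the reconciler's Client field and the import paths are assumptions, the signatures assume a recent controller-runtime release, and this is not necessarily how the patch implements it:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	// assumed import path for the Watcher API types
	watcherv1 "github.com/openstack-k8s-operators/watcher-operator/api/v1beta1"
)

// SetupWithManager registers a watch on Secrets so that changes to the
// Prometheus secret (or the top-level .Spec.secret) trigger a reconcile.
func (r *WatcherReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&watcherv1.Watcher{}).
		Watches(
			&corev1.Secret{},
			handler.EnqueueRequestsFromMapFunc(r.findObjectsForSecret),
		).
		Complete(r)
}

// findObjectsForSecret is a hypothetical mapping function: it requeues every
// Watcher instance in the secret's namespace.
func (r *WatcherReconciler) findObjectsForSecret(ctx context.Context, obj client.Object) []reconcile.Request {
	watchers := &watcherv1.WatcherList{}
	if err := r.Client.List(ctx, watchers, client.InNamespace(obj.GetNamespace())); err != nil {
		return nil
	}
	requests := make([]reconcile.Request, 0, len(watchers.Items))
	for _, w := range watchers.Items {
		requests = append(requests, reconcile.Request{
			NamespacedName: types.NamespacedName{Name: w.Name, Namespace: w.Namespace},
		})
	}
	return requests
}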

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5df118c8dd3549f5a5fd6ce3a4393e0d

✔️ noop SUCCESS in 0s
✔️ openstack-meta-content-provider SUCCESS in 1h 43m 42s
✔️ watcher-operator-validation SUCCESS in 1h 12m 35s
watcher-operator-kuttl RETRY_LIMIT in 50m 30s

This patch adds the required prometheus secret for the kuttl jobs to run
and adds validation checks.

Also, it modifies the watcher_deploy make target to create a secret so
that watcher can be deployed until the secret is created by the
telemetry operator.
@amoralej changed the title from "WIP Prometheus integration in Watcher controller" to "Prometheus integration in Watcher controller" on Jan 27, 2025