
[OSPRH-8406] Switch to using ScrapeConfigs #446

Conversation

vyzigold (Contributor) commented:

We recently discovered issues with authentication, IPv6, and ServiceMonitors in STF. This PR proactively switches to ScrapeConfigs instead of ServiceMonitors. The functionality should be equivalent to the previous behavior. Old ServiceMonitors owned by the MetricStorage controller are deleted.

There is a slight difference in the labels associated with the collected metrics.

  • The Node Exporter metrics are now missing the "job" label, which didn't seem useful; this matches how the Ceilometer and RabbitMQ metrics are collected.
  • Ceilometer and RabbitMQ metrics no longer have the "service" label, because ScrapeConfigs don't carry the information needed to create it. They now have the "instance" label instead.

The "instance" label is now used to differentiate between different Rabbit clusters in dashboards instead of the "service" label.

I used this opportunity to move the ScrapeConfig creation code into its own function, following the example of dashboard code.

@openshift-ci openshift-ci bot requested review from elfiesmelfie and jlarriba July 15, 2024 13:19

openshift-ci bot commented Jul 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vyzigold


```go
err = r.Client.Delete(ctx, svcMonitor)
for _, monitor := range monitorList.Items {
	if object.CheckOwnerRefExist(instance.ObjectMeta.UID, monitor.ObjectMeta.OwnerReferences) {
		err = r.Client.Delete(ctx, monitor)
```
vyzigold (Contributor, author) commented:
This will delete all ServiceMonitors owned by the telemetry-operator. It's here to cleanup the ServiceMonitors created by previous versions of the operator. Do we want this here? Or is there some better place to do this?
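The ownership check behind this cleanup can be sketched with plain types. Everything here is illustrative: the real code uses `metav1.OwnerReference` and the `object.CheckOwnerRefExist` helper, and the UIDs and names below are made up.

```go
package main

import "fmt"

// UID and OwnerReference are simplified stand-ins for the Kubernetes types.
type UID string

type OwnerReference struct {
	UID UID
}

// checkOwnerRefExist mirrors the ownership test used before deleting a
// ServiceMonitor: it reports whether any owner reference carries the
// controller instance's UID.
func checkOwnerRefExist(uid UID, refs []OwnerReference) bool {
	for _, ref := range refs {
		if ref.UID == uid {
			return true
		}
	}
	return false
}

func main() {
	instanceUID := UID("1234-abcd") // hypothetical MetricStorage UID
	monitors := []struct {
		name string
		refs []OwnerReference
	}{
		{"telemetry-ceilometer", []OwnerReference{{UID: "1234-abcd"}}},
		{"unrelated-monitor", []OwnerReference{{UID: "ffff-0000"}}},
	}
	for _, m := range monitors {
		if checkOwnerRefExist(instanceUID, m.refs) {
			fmt.Println("would delete:", m.name)
		}
	}
}
```

Only objects whose owner references include the controller's UID are touched, so ServiceMonitors created by other operators survive the cleanup.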

Collaborator commented:

This should be on the reconcileUpdate func, however we are not yet using it.

vyzigold (Contributor, author) replied:

I'll create the function and add it there then since I have the code ready. We can figure out how / when to call the function later I guess.

```go
		}
	}
	if !rabbitmqExists {
		err = r.Client.Delete(ctx, svcMonitor)
```
vyzigold (Contributor, author) commented:

The deletion of ServiceMonitors (or ScrapeConfigs) for non-existing RabbitMQs isn't needed with ScrapeConfigs. Previously each instance of RabbitMQ would have its own ServiceMonitor, which would need to be deleted when the RabbitMQ instance was deleted. Now there is only one ScrapeConfig for all RabbitMQ instances. The targets inside the ScrapeConfig are updated each reconciliation loop, so it's always up to date.
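As a sketch of that approach (the name, namespace, and targets below are illustrative, not taken from the operator), a single ScrapeConfig can enumerate every RabbitMQ endpoint in one static config, and the controller simply rewrites the target list on each reconcile:

```yaml
# Hypothetical example; apiVersion assumes the RHOBS fork of prometheus-operator.
apiVersion: monitoring.rhobs/v1alpha1
kind: ScrapeConfig
metadata:
  name: telemetry-rabbitmq   # illustrative name
  namespace: openstack
spec:
  metricsPath: /metrics
  scheme: HTTPS
  staticConfigs:
    - targets:
        # one entry per existing RabbitMQ cluster; the controller
        # regenerates this list every reconciliation loop
        - rabbitmq.openstack.svc:15691
        - rabbitmq-cell1.openstack.svc:15691
```

When a RabbitMQ cluster disappears, its target is dropped from the list on the next reconcile, so there is no per-cluster object left behind to delete.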

@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch 2 times, most recently from 2683248 to 08754a1 Compare July 15, 2024 19:11
@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch 2 times, most recently from 53f8446 to 37b4032 Compare July 16, 2024 07:08
```diff
 ObjectMeta: metav1.ObjectMeta{
-	Name: fmt.Sprintf("%s-%s", instance.Name, ceilometerServerName),
+	Name: fmt.Sprintf("%s-ceilometer", instance.Name),
```
Collaborator commented:

I think we should be using telemetry.ServiceName here instead of instance.Name. The instance is called metric-storage, so the ScrapeConfigs get created as metric-storage-ceilometer and such. I think it is a much better idea to have them created as telemetry-ceilometer.

vyzigold (Contributor, author) replied:

sure

@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch from 37b4032 to c8316f5 Compare July 16, 2024 14:50

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/3c2df121d625416fadb295798a1f691d

openstack-k8s-operators-content-provider FAILURE in 8m 57s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-autoscaling-tempest SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-default-telemetry SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-logging SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ functional-tests-on-osp18 SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider (non-voting)

vyzigold added 2 commits July 16, 2024 14:01
@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch from c8316f5 to 5eb13cf Compare July 16, 2024 18:02
@jlarriba (Collaborator) commented:

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 17, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 9004ebc into openstack-k8s-operators:main Jul 17, 2024
6 checks passed