
[OSPRH-8406] Switch to using ScrapeConfigs #446

Conversation

vyzigold (Contributor) commented:

We recently discovered issues with authentication, IPv6, and ServiceMonitors in STF. This PR proactively switches to ScrapeConfigs instead of ServiceMonitors. The functionality should be equivalent to the previous behavior. Old ServiceMonitors owned by the MetricStorage controller are deleted.

There is a slight difference in the labels associated with the collected metrics.

  • The Node Exporter metrics are now missing the "job" label, which didn't seem useful; this matches how the Ceilometer and RabbitMQ metrics are collected.
  • Ceilometer and RabbitMQ metrics no longer have the "service" label, because ScrapeConfigs don't carry the information needed to create it. They now have the "instance" label instead.

The "instance" label is now used to differentiate between different Rabbit clusters in dashboards instead of the "service" label.

I used this opportunity to move the ScrapeConfig creation code into its own function, following the example of dashboard code.

@openshift-ci openshift-ci bot requested review from elfiesmelfie and jlarriba July 15, 2024 13:19

openshift-ci bot commented Jul 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vyzigold


```go
err = r.Client.Delete(ctx, svcMonitor)
for _, monitor := range monitorList.Items {
	if object.CheckOwnerRefExist(instance.ObjectMeta.UID, monitor.ObjectMeta.OwnerReferences) {
		err = r.Client.Delete(ctx, monitor)
```
vyzigold (Contributor, author) commented:
This will delete all ServiceMonitors owned by the telemetry-operator. It's here to cleanup the ServiceMonitors created by previous versions of the operator. Do we want this here? Or is there some better place to do this?
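The ownership check behind this cleanup can be sketched with plain types. Everything here is illustrative: the real code uses `metav1.OwnerReference` and the `object.CheckOwnerRefExist` helper, and the UIDs and names below are made up.

```go
package main

import "fmt"

// UID and OwnerReference are simplified stand-ins for the Kubernetes types.
type UID string

type OwnerReference struct {
	UID UID
}

// checkOwnerRefExist mirrors the ownership test used before deleting a
// ServiceMonitor: it reports whether any owner reference carries the
// controller instance's UID.
func checkOwnerRefExist(uid UID, refs []OwnerReference) bool {
	for _, ref := range refs {
		if ref.UID == uid {
			return true
		}
	}
	return false
}

func main() {
	instanceUID := UID("1234-abcd") // hypothetical MetricStorage UID
	monitors := []struct {
		name string
		refs []OwnerReference
	}{
		{"telemetry-ceilometer", []OwnerReference{{UID: "1234-abcd"}}},
		{"unrelated-monitor", []OwnerReference{{UID: "ffff-0000"}}},
	}
	for _, m := range monitors {
		if checkOwnerRefExist(instanceUID, m.refs) {
			fmt.Println("would delete:", m.name)
		}
	}
}
```

Only objects whose owner references include the controller's UID are touched, so ServiceMonitors created by other operators survive the cleanup.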

Collaborator commented:

This should be on the reconcileUpdate func, however we are not yet using it.

vyzigold (Contributor, author) replied:

I'll create the function and add it there then since I have the code ready. We can figure out how / when to call the function later I guess.

```go
		}
	}
	if !rabbitmqExists {
		err = r.Client.Delete(ctx, svcMonitor)
```
vyzigold (Contributor, author) commented:

The deletion of ServiceMonitors (or ScrapeConfigs) for non-existing RabbitMQs isn't needed with ScrapeConfigs. Previously each instance of RabbitMQ would have its own ServiceMonitor, which would need to be deleted when the RabbitMQ instance was deleted. Now there is only one ScrapeConfig for all RabbitMQ instances. The targets inside the ScrapeConfig are updated each reconciliation loop, so it's always up to date.
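As a sketch of that approach (the name, namespace, and targets below are illustrative, not taken from the operator), a single ScrapeConfig can enumerate every RabbitMQ endpoint in one static config, and the controller simply rewrites the target list on each reconcile:

```yaml
# Hypothetical example; apiVersion assumes the RHOBS fork of prometheus-operator.
apiVersion: monitoring.rhobs/v1alpha1
kind: ScrapeConfig
metadata:
  name: telemetry-rabbitmq   # illustrative name
  namespace: openstack
spec:
  metricsPath: /metrics
  scheme: HTTPS
  staticConfigs:
    - targets:
        # one entry per existing RabbitMQ cluster; the controller
        # regenerates this list every reconciliation loop
        - rabbitmq.openstack.svc:15691
        - rabbitmq-cell1.openstack.svc:15691
```

When a RabbitMQ cluster disappears, its target is dropped from the list on the next reconcile, so there is no per-cluster object left behind to delete.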

@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch 2 times, most recently from 2683248 to 08754a1 Compare July 15, 2024 19:11
@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch 2 times, most recently from 53f8446 to 37b4032 Compare July 16, 2024 07:08
```diff
 ObjectMeta: metav1.ObjectMeta{
-	Name: fmt.Sprintf("%s-%s", instance.Name, ceilometerServerName),
+	Name: fmt.Sprintf("%s-ceilometer", instance.Name),
```
Collaborator commented:

I think we should be using telemetry.ServiceName here instead of instance.Name. The instance is called metric-storage, so the ScrapeConfigs get created as metric-storage-ceilometer and such. I think it is a much better idea to have them created as telemetry-ceilometer.

vyzigold (Contributor, author) replied:

sure

@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch from 37b4032 to c8316f5 Compare July 16, 2024 14:50

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/3c2df121d625416fadb295798a1f691d

openstack-k8s-operators-content-provider FAILURE in 8m 57s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-autoscaling-tempest SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-default-telemetry SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ telemetry-operator-multinode-logging SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ functional-tests-on-osp18 SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider (non-voting)

vyzigold added 2 commits July 16, 2024 14:01
@vyzigold vyzigold force-pushed the move_to_scrapeconfigs branch from c8316f5 to 5eb13cf Compare July 16, 2024 18:02
@jlarriba (Collaborator) commented:

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 17, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 9004ebc into openstack-k8s-operators:main Jul 17, 2024
6 checks passed