Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus integration in Watcher controller #40

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -381,15 +381,20 @@ watcher: export CATALOG_IMG=${CATALOG_IMAGE}
watcher: ## Install watcher operator via olm
bash ci/olm.sh
oc apply -f ci/olm.yaml
timeout 300s bash -c "while ! (oc get csv -n openshift-operators -l operators.coreos.com/cluster-observability-operator.openshift-operators -o jsonpath='{.items[*].status.phase}' | grep Succeeded); do sleep 10; done"
SeanMooney marked this conversation as resolved.
Show resolved Hide resolved
timeout 300s bash -c "while ! (oc get csv -n openstack-operators -l operators.coreos.com/watcher-operator.openstack-operators -o jsonpath='{.items[*].status.phase}' | grep Succeeded); do sleep 1; done"

.PHONY: watcher_deploy
watcher_deploy: ## Deploy watcher service
oc apply -f config/samples/watcher_requirements.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we proably shoudl be waiting in watcher_deploy and watcher_deploy_cleanup too

again like we do for kuttle

https://github.com/openstack-k8s-operators/nova-operator/blob/main/Makefile#L400-L417

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I'm adding a wait for watcher CR after creating it.

oc apply -f config/samples/watcher_v1beta1_watcher.yaml
oc wait watcher watcher --for condition=Ready --timeout=600s

.PHONY: watcher_deploy_cleanup
watcher_deploy_cleanup: ## Undeploy watcher service
oc delete -f config/samples/watcher_v1beta1_watcher.yaml
oc delete -f config/samples/watcher_requirements.yaml
timeout 300s bash -c "while (oc get watcher watcher); do sleep 10; done"

.PHONY: watcher_cleanup
watcher_cleanup: export CATALOG_IMG=${CATALOG_IMAGE}
Expand Down
4 changes: 4 additions & 0 deletions api/bases/watcher.openstack.org_watcherapis.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
replicas:
default: 1
description: Replicas of Watcher service to run
Expand Down
4 changes: 4 additions & 0 deletions api/bases/watcher.openstack.org_watcherappliers.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
replicas:
default: 1
description: Replicas of Watcher service to run
Expand Down
4 changes: 4 additions & 0 deletions api/bases/watcher.openstack.org_watcherdecisionengines.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
replicas:
default: 1
description: Replicas of Watcher service to run
Expand Down
4 changes: 4 additions & 0 deletions api/bases/watcher.openstack.org_watchers.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
rabbitMqClusterName:
default: rabbitmq
description: |-
Expand Down
5 changes: 5 additions & 0 deletions api/v1beta1/common_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,11 @@ type WatcherCommon struct {
// to /etc/<service>/<service>.conf.d directory as a custom config file.
CustomServiceConfig string `json:"customServiceConfig,omitempty"`

// +kubebuilder:validation:Optional
// +kubebuilder:default=metric-storage-prometheus-config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

required filed should not have a default.
if they have a default the are optional.

operator lint is meant to catch this but i don't actual see a lint rule in place for this edgecase.

https://github.com/gibizer/operator-lint/tree/main/linters/crd

Copy link
Contributor

@marios marios Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question about the default value ... is there some significance to "metric-storage-prometheus-config" I couldn't find any more references in this repo with grep.

it doesn't "sound" like a secret ;) so just curious

[EDIT]: I couldn't reply to this so adding an edit instead. it seems the convention is to use some variation of -config- in the name for the secrets, especially since we're using them for more than just secret but also to carry the config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That name was proposed by the CloudOps team as the default secret name that will be created by the telemetry-operator with the required info.

I'm doing it optional, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes so any secret the contains configuration used by a service has -config in it

that actually came form the fact that originally they started as config maps and then we moved most config into secrets because they could or will contain passwords or URLs

so the convention of adding "config" to the name stuck when it became a secret

// Secret containing prometheus connection parameters
PrometheusSecret string `json:"prometheusSecret"`

// +kubebuilder:validation:Optional
// +operator-sdk:csv:customresourcedefinitions:type=spec
// TLS - Parameters related to the TLS
Expand Down
2 changes: 2 additions & 0 deletions api/v1beta1/conditions.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,4 +24,6 @@ const (
WatcherAPIReadyMessage = "WatcherAPI successfully created"
// WatcherAPIReadyErrorMessage -
WatcherAPIReadyErrorMessage = "WatcherAPI error occured %s"
// WatcherPrometheusSecretErrorMessage -
WatcherPrometheusSecretErrorMessage = "Error with prometheus config secret"
)
12 changes: 12 additions & 0 deletions ci/olm.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
cat > ci/olm.yaml <<EOF_CAT
SeanMooney marked this conversation as resolved.
Show resolved Hide resolved
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: cluster-observability-operator
namespace: openshift-operators
spec:
channel: development
installPlanApproval: Automatic
name: cluster-observability-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
---
apiVersion: v1
kind: Namespace
metadata:
Expand Down
4 changes: 4 additions & 0 deletions config/crd/bases/watcher.openstack.org_watcherapis.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
replicas:
default: 1
description: Replicas of Watcher service to run
Expand Down
4 changes: 4 additions & 0 deletions config/crd/bases/watcher.openstack.org_watcherappliers.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
replicas:
default: 1
description: Replicas of Watcher service to run
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
replicas:
default: 1
description: Replicas of Watcher service to run
Expand Down
4 changes: 4 additions & 0 deletions config/crd/bases/watcher.openstack.org_watchers.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,10 @@ spec:
description: PreserveJobs - do not delete jobs after they finished
e.g. to check logs
type: boolean
prometheusSecret:
default: metric-storage-prometheus-config
description: Secret containing prometheus connection parameters
type: string
rabbitMqClusterName:
default: rabbitmq
description: |-
Expand Down
10 changes: 10 additions & 0 deletions config/samples/watcher_requirements.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: metric-storage-prometheus-config
namespace: openstack
stringData:
host: metric-storage-prometheus.openstack.svc
port: "9090"
ca_secret: "combined-ca-bundle"
ca_key: "internal-ca-bundle.pem"
13 changes: 12 additions & 1 deletion controllers/watcher_common.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,16 +24,22 @@ import (
)

const (
passwordSecretField = ".spec.secret"
passwordSecretField = ".spec.secret"
prometheusSecretField = ".spec.prometheusSecret"
)

var (
apiWatchFields = []string{
passwordSecretField,
prometheusSecretField,
}
applierWatchFields = []string{
passwordSecretField,
}
watcherWatchFields = []string{
passwordSecretField,
prometheusSecretField,
}
)

const (
Expand All @@ -48,6 +54,11 @@ const (
// DatabaseUsername is the name of key in the secret for the database
// hostname
DatabaseHostname = "database_hostname"
// Prometheus configuration keys in prometheusSecret
PrometheusHost = "host"
PrometheusPort = "port"
PrometheusCaCertSecret = "ca_secret"
PrometheusCaCertKey = "ca_key"

// WatcherAPILabelPrefix - a unique, service binary specific prefix for the
// labels the WatcherAPI controller uses on children objects
Expand Down
131 changes: 131 additions & 0 deletions controllers/watcher_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,16 @@ import (
batchv1 "k8s.io/api/batch/v1"
corev1 "k8s.io/api/core/v1"
rbacv1 "k8s.io/api/rbac/v1"
"k8s.io/apimachinery/pkg/fields"
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/builder"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/handler"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/predicate"
"sigs.k8s.io/controller-runtime/pkg/reconcile"

rabbitmqv1 "github.com/openstack-k8s-operators/infra-operator/apis/rabbitmq/v1beta1"
keystonev1 "github.com/openstack-k8s-operators/keystone-operator/api/v1beta1"
Expand Down Expand Up @@ -215,6 +221,7 @@ func (r *WatcherReconciler) Reconcile(ctx context.Context, req ctrl.Request) (re
// end of TransportURL creation

// Check we have the required inputs
// Top level secret
hash, _, inputSecret, err := ensureSecret(
ctx,
types.NamespacedName{Namespace: instance.Namespace, Name: instance.Spec.Secret},
Expand All @@ -230,6 +237,7 @@ func (r *WatcherReconciler) Reconcile(ctx context.Context, req ctrl.Request) (re
return ctrl.Result{}, errors.New("error retrieving required data from secret")
}

// TransportURL Secret
hashTransporturl, _, transporturlSecret, err := ensureSecret(
ctx,
types.NamespacedName{Namespace: instance.Namespace, Name: transportURL.Status.SecretName},
Expand All @@ -245,6 +253,44 @@ func (r *WatcherReconciler) Reconcile(ctx context.Context, req ctrl.Request) (re
return ctrl.Result{}, errors.New("error retrieving required data from transporturl secret")
}

// Prometheus config secret

hashPrometheus, _, prometheusSecret, err := ensureSecret(
ctx,
types.NamespacedName{Namespace: instance.Namespace, Name: instance.Spec.PrometheusSecret},
[]string{
PrometheusHost,
PrometheusPort,
},
helper.GetClient(),
&instance.Status.Conditions,
r.RequeueTimeout,
)
if err != nil || hashPrometheus == "" {
// Empty hash means that there is some problem retrieving the key from the secret
instance.Status.Conditions.Set(condition.FalseCondition(
condition.InputReadyCondition,
condition.RequestedReason,
condition.SeverityWarning,
watcherv1beta1.WatcherPrometheusSecretErrorMessage))
return ctrl.Result{}, errors.New("error retrieving required data from prometheus secret")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might do this already but this should result in the inputready condition being false i think
we could also do something like

		instance.Status.Conditions.Set(condition.FalseCondition(
			watcherv1beta1.WatcherPrometheusReadyCondition,
			condition.RequestedReason,
			condition.SeverityWarning,
watcherv1beta1.WatcherPrometheusInvalidSecretMessage))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way i noteest that for the transport URL we are doing

ctrl.Result{RequeueAfter: time.Duration(10) * time.Second}, nil

that a latent bug that we should fix separate.

we should not be doing any requeueing manually.

that one fo the anti-patterns that we need to review for and address in our existing code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might do this already but this should result in the inputready condition being false i think

I'll use the regular inputready condition with a custom message for prometheus secret error, is that fine?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep thats perfect

}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way we need to create a finalise on the PrometheusSecret and remove it in our delete code path.

I'm not sure if we should do it before or after we ensure it has the correct content but I'm thinking before we call ensureSecret.

if we do it here then it might have been delete after we validated it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I didn't think on that.

// Add finalizer to prometheus config secret to prevent it from being deleted now that we're using it
if controllerutil.AddFinalizer(&prometheusSecret, helper.GetFinalizer()) {
err := helper.GetClient().Update(ctx, &prometheusSecret)
if err != nil {
instance.Status.Conditions.Set(condition.FalseCondition(
condition.InputReadyCondition,
condition.RequestedReason,
condition.SeverityWarning,
watcherv1beta1.WatcherPrometheusSecretErrorMessage))
return ctrl.Result{}, err
}
}

// End of Prometheus config secret

subLevelSecretName, err := r.createSubLevelSecret(ctx, helper, instance, transporturlSecret, inputSecret, db)
if err != nil {
return ctrl.Result{}, nil
Expand Down Expand Up @@ -787,6 +833,9 @@ func (r *WatcherReconciler) ensureAPI(
// We need to have TLS defined in SubCRs to have some values available
watcherAPISpec.TLS = instance.Spec.TLS

// We need to have the PrometheusSecret in watcherapi
watcherAPISpec.PrometheusSecret = instance.Spec.PrometheusSecret

apiDeployment := &watcherv1beta1.WatcherAPI{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-api", instance.Name),
Expand Down Expand Up @@ -862,13 +911,56 @@ func (r *WatcherReconciler) reconcileDelete(ctx context.Context, instance *watch
}
}

// Remove the finalizer from our Prometheus Secret
prometheusSecret := &corev1.Secret{}
reader := helper.GetClient()
err = reader.Get(ctx,
types.NamespacedName{Namespace: instance.Namespace, Name: instance.Spec.PrometheusSecret},
prometheusSecret)

if err == nil {
if controllerutil.RemoveFinalizer(prometheusSecret, helper.GetFinalizer()) {
err = helper.GetClient().Update(ctx, prometheusSecret)
if err != nil && !k8s_errors.IsNotFound(err) {
return ctrl.Result{}, err
}
util.LogForObject(helper, "Removed finalizer from prometheus config secret", instance)
}
}
//

controllerutil.RemoveFinalizer(instance, helper.GetFinalizer())
Log.Info(fmt.Sprintf("Reconciled Service '%s' delete successfully", instance.Name))
return ctrl.Result{}, nil
}

// SetupWithManager sets up the controller with the Manager.
func (r *WatcherReconciler) SetupWithManager(mgr ctrl.Manager) error {

// index passwordSecretField
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &watcherv1beta1.Watcher{}, passwordSecretField, func(rawObj client.Object) []string {
// Extract the secret name from the spec, if one is provided
cr := rawObj.(*watcherv1beta1.Watcher)
if cr.Spec.Secret == "" {
return nil
}
return []string{cr.Spec.Secret}
}); err != nil {
return err
}

// index prometheusSecretField
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &watcherv1beta1.Watcher{}, prometheusSecretField, func(rawObj client.Object) []string {
// Extract the secret name from the spec, if one is provided
cr := rawObj.(*watcherv1beta1.Watcher)
if cr.Spec.PrometheusSecret == "" {
return nil
}
return []string{cr.Spec.PrometheusSecret}
}); err != nil {
return err
}

return ctrl.NewControllerManagedBy(mgr).
For(&watcherv1beta1.Watcher{}).
Owns(&watcherv1beta1.WatcherAPI{}).
Expand All @@ -883,5 +975,44 @@ func (r *WatcherReconciler) SetupWithManager(mgr ctrl.Manager) error {
Owns(&rbacv1.RoleBinding{}).
Owns(&batchv1.Job{}).
Owns(&corev1.Secret{}).
Watches(
&corev1.Secret{},
handler.EnqueueRequestsFromMapFunc(r.findObjectsForSrc),
builder.WithPredicates(predicate.ResourceVersionChangedPredicate{}),
).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont fully recall how it works so I can't comment on the details but yes we should be watching changes to secrets that we have references too.

Complete(r)
}

func (r *WatcherReconciler) findObjectsForSrc(ctx context.Context, src client.Object) []reconcile.Request {
requests := []reconcile.Request{}

l := log.FromContext(ctx).WithName("Controllers").WithName("Watcher")

for _, field := range watcherWatchFields {
crList := &watcherv1beta1.WatcherList{}
listOps := &client.ListOptions{
FieldSelector: fields.OneTermEqualSelector(field, src.GetName()),
Namespace: src.GetNamespace(),
}
err := r.Client.List(ctx, crList, listOps)
if err != nil {
l.Error(err, fmt.Sprintf("listing %s for field: %s - %s", crList.GroupVersionKind().Kind, field, src.GetNamespace()))
return requests
}

for _, item := range crList.Items {
l.Info(fmt.Sprintf("input source %s changed, reconcile: %s - %s", src.GetName(), item.GetName(), item.GetNamespace()))

requests = append(requests,
reconcile.Request{
NamespacedName: types.NamespacedName{
Name: item.GetName(),
Namespace: item.GetNamespace(),
},
},
)
}
}

return requests
}
Loading