Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce High Level MR metrics #683

Merged
merged 6 commits into from
Apr 24, 2024
Merged

Conversation

ezgidemirel
Copy link
Member

@ezgidemirel ezgidemirel commented Apr 3, 2024

Description of your changes

This pull request introduces the following high level metrics in the MR reconciler and cherry-picks the relevant commit in this PR.

  • crossplane_managed_resource_exists{"gvk"}
  • crossplane_managed_resource_ready{"gvk"}
  • crossplane_managed_resource_synced{"gvk"}
  • crossplane_managed_resource_first_time_to_reconcile_seconds{"gvk"}
  • crossplane_managed_resource_first_time_to_readiness_seconds{"gvk"}
  • crossplane_managed_resource_deletion_seconds{"gvk"}
  • crossplane_managed_resource_drift_seconds{"gvk"}

The crossplane_managed_resource_time_to_first_reconcile_seconds and crossplane_managed_resource_first_time_to_readiness_seconds metrics are pushed only once for each resource, capturing the initial reconciliation time after creation and the first instance the resource becomes ready, respectively. It's important to note that these metrics won't be recorded in the event of a pod restart. One potential solution to this issue is to introduce an additional status subresource, such as "observations," where these values can be stored persistently.

Example provider PR which consumes changes-> crossplane-contrib/provider-kubernetes#224

Fixes #

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

@ezgidemirel ezgidemirel requested review from a team as code owners April 3, 2024 13:26
@ezgidemirel ezgidemirel requested review from bobh66 and ytsarev April 3, 2024 13:26
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
func (r *XRStateRecorder) Record(ctx context.Context, gvk schema.GroupVersionKind) {
l := &unstructured.UnstructuredList{}
l.SetGroupVersionKind(gvk)
err := r.client.List(ctx, l)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't list from a client. List from a cache, i.e. pass in a cache here. Listing can be expensive, imagine 10000 objects.

}

for _, ro := range o {
ro(r)
}

// State recorder is started in the background to record MR states.
go r.stateRecorder.Run(context.Background(), schema.GroupVersionKind(of))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never leak go routines from a constructor. This must be started outside with an availablectx.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made state recorders Runnable and registered to manager to manage go routine's lifecycle with it.

func (r *MRStateRecorder) Record(ctx context.Context, gvk schema.GroupVersionKind) {
l := &unstructured.UnstructuredList{}
l.SetGroupVersionKind(gvk)
err := r.client.List(ctx, l)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here: pass a cache please to make sure this does not hit the apiserver. This can be a very heavy operation, listing thousands of MRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be sufficient to accept client.Reader and add a comment letting the caller know it must be cache-backed?

I don't feel strongly - mostly wondering if there's some way to do this up at the controller-runtime level of abstraction without moving down into client-go things. Would be nice, but not critical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about passing the following option to the NewManager function in provider's main.go?

mgr, err := ctrl.NewManager(ratelimiter.LimitRESTConfig(cfg, *maxReconcileRate), ctrl.Options{
	Client: client.Options{Cache: &client.CacheOptions{Unstructured: true}},
	...
})

This makes shouldByPassCache method to return false in the List call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I no longer use unstructured list, so we'll use client's cache.

@ezgidemirel ezgidemirel force-pushed the mr-metrics branch 2 times, most recently from 7c1aaab to 54a8995 Compare April 15, 2024 11:37
@negz
Copy link
Member

negz commented Apr 15, 2024

Thanks @ezgidemirel! Still needs some work per @sttts's comments but I like this direction much better.

@ezgidemirel ezgidemirel force-pushed the mr-metrics branch 2 times, most recently from 597a61f to 535d7fa Compare April 16, 2024 08:33
Signed-off-by: ezgidemirel <[email protected]>
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
Copy link
Member

@mergenci mergenci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"gvk" seems to be misplaced in these instances.

pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/metrics.go Outdated Show resolved Hide resolved
@negz
Copy link
Member

negz commented Apr 18, 2024

@ezgidemirel Your PR description mentions some crossplane_composite_resource_ metrics, but I don't see where those are implemented. Are you planning for them to be part of this PR? I'd expect core Crossplane to emit these metrics.

@ezgidemirel
Copy link
Member Author

@ezgidemirel Your PR description mentions some crossplane_composite_resource_ metrics, but I don't see where those are implemented. Are you planning for them to be part of this PR? I'd expect core Crossplane to emit these metrics.

I was planning to add the state metrics with this PR, but then removed with this commit. I'll update the PR description.

Signed-off-by: ezgidemirel <[email protected]>

mrs := mrList.GetItems()
if len(mrs) == 0 {
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This early return seems to prevent setting the metric to zero when all resources are deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what we can do at this stage. If we don't have any MRs left, we cannot send the value with the right gvk. I think it's ok to say that, within a time framed monitoring, we'll display the correct metrics. If there are no MRs with a specific GVK in the last 15 mins, users won't see any metrics for that GVK.

Copy link
Member

@turkenh turkenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ezgidemirel 🙏

@negz I see all your comments were resolved and we need to get this in to be able ship this functionality with the upcoming provider releases (i.e. tomorrow), so, I'll merge it now. Feel free add more comments if there are still open points and we can get them resolved with a follow up PR.

@mergenci, similarly, feel free to open a follow up PR regarding your comment on early return.

@eduardo-diaz-em
Copy link

eduardo-diaz-em commented Oct 18, 2024

@ezgidemirel Your PR description mentions some crossplane_composite_resource_ metrics, but I don't see where those are implemented. Are you planning for them to be part of this PR? I'd expect core Crossplane to emit these metrics.

I was planning to add the state metrics with this PR, but then removed with this commit. I'll update the PR description.

Hi @ezgidemirel I am running crossplane on 1.17 and I still can't see the crossplane_managed_resources_* metrics (the rest are working fine). According to the docs they should be available, is this a bug or am I missing something? This is my crossplane configuration from ArgoCD:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: crossplane
  namespace: argocd
spec:
  project: system-apps
  source:
    chart: crossplane
    repoURL: https://charts.crossplane.io/stable
    targetRevision: 1.17.0
    helm:
      values: |
        resourcesRBACManager:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 250m
            memory: 512Mi
        resourcesCrossplane:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 250m
            memory: 512Mi
        metrics:
          enabled: true
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: crossplane
  syncPolicy:
    automated:
      prune: false
      selfHeal: false
    syncOptions:
      - ServerSideApply=true
      - CreateNamespace=true

@mergenci
Copy link
Member

@eduardo-diaz-em, this is a crossplane-runtime PR. crossplane-runtime is used by Crossplane providers. Your provider version and which version of crossplane-runtime it uses is important, rather than your Crossplane version. For instance, provider-upjet-aws uses crossplane-runtime v1.16.0-rc.2.0.

I acknowledge that it would be helpful if the documentation clarified distinction between Crossplane and Crossplane provider metrics.

@pablomdc
Copy link

pablomdc commented Nov 5, 2024

I'm using same xpkg.upbound.io/upbound/provider-aws-sqs:v1.9.0 which is using crossplane-runtime v1.16.0-rc.2.0.2024 along with crossplane v1.16 same happens to me, I can see all metrics but not the crossplane_managed_resources_* ones

@eduardo-diaz-em managed to get this solved?
@ezgidemirel @mergenci will appreciate any clue here

This is my v1.16 chart config :

extraObjects:

  • apiVersion: aws.upbound.io/v1beta1
    kind: ProviderConfig
    metadata:
    name: default
    spec:
    credentials:
    source: IRSA
  • apiVersion: pkg.crossplane.io/v1beta1
    kind: DeploymentRuntimeConfig
    metadata:
    name: irsa-runtimeconfig
    spec:
    serviceAccountTemplate:
    metadata:
    name: crossplane-aws
    annotations:
    eks.amazonaws.com/role-arn: .....
  • apiVersion: pkg.crossplane.io/v1
    kind: Provider
    metadata:
    name: provider-aws-sqs
    spec:
    package: xpkg.upbound.io/upbound/provider-aws-sqs:v1.9.0
    runtimeConfigRef:
    name: irsa-runtimeconfig

@pablomdc
Copy link

pablomdc commented Nov 6, 2024

Please ignore, I found the problem was I was looking at crossplane pod metrics, but not at the provider pods ones :)
metrics are here: provider-aws-sqs-7a356e3a0f47-55bdc59969-vkmpt
Noting here, in case someone faced same confusion

@jbw976
Copy link
Member

jbw976 commented Nov 6, 2024

Thanks for the quick follow up @pablomdc - if anyone else stops by this PR, one helpful resource is also this demo to use the high level metrics that I had put together a few months back with help from @ezgidemirel: https://github.com/jbw976/demo-metrics

@eduardo-diaz-em
Copy link

Buena esa @pablomdc I was also looking at crossplane pods rather than provider ones. Gracias!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants