Tenant Metrics #451

Open
oliverbaehler opened this issue Oct 19, 2021 · 13 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

@oliverbaehler
Collaborator

Describe the feature

We would like to have more metrics exported about the tenants controlled by the operator. Some metrics that would be helpful:

  • Basic info metric (Tenant Active State, Namespace Quota and used Namespaces, so we can count up how many tenants there are)
  • Cordoned tenants (Consider the tenant cordoning label, to tell which tenants are cordoned and which are not)
  • Quota Usage (Evaluate how much of a quota spanning a tenant is used vs. the maximum given quota). The same on a per-namespace basis

These are the most important ones I could think of. Another interesting feature (which most other metric exporters lack) is the possibility to add labels to a resource which are then added as metric labels. Let's say I want to be able to show all tenants of a certain customer: I would add the label metrics.clastix.io/customer=a to a tenant CR, and then the label customer with the value a would show up in the metric. This way everyone has much greater flexibility to organize metrics, even if they come from the same controller. The controller would simply check whether any label matches metrics.clastix.io/* and then register it with the actual metric.

What would the new user story look like?

Doesn't change.

Expected behavior

A clear and concise description of what you expect to happen.

@oliverbaehler oliverbaehler added the blocked-needs-validation Issue need triage and validation label Oct 19, 2021
@bsctl bsctl added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed and removed blocked-needs-validation Issue need triage and validation labels Oct 19, 2021
@bsctl
Member

bsctl commented Oct 19, 2021

@oliverbaehler thanks for submitting this request. Implementing them is pretty easy and straightforward. Would you like to submit a PR too?

@oliverbaehler
Collaborator Author

@bsctl Yes I will try

@bsctl
Member

bsctl commented Nov 16, 2021

@oliverbaehler any progress on this issue? Do you need any help?

@adberger

adberger commented Dec 7, 2021

In consultation with @oliverbaehler I would like to take on this topic. However, the implementation does not seem as easy as @bsctl wrote. How does Capsule implement metrics in general, and how can custom metrics be added?

@adberger

@prometherion Any idea where to get started?

@prometherion
Member

Hey @adberger, thanks for the ping here!

Since Capsule is built on top of controller-runtime, we already have some controller metrics exported: this is really convenient since we don't have to put in place any additional web server for metrics exposure or implement the collector.

I started working on this a few days ago and just pushed the work in progress to the branch issues/451. If you could take a look at it, that would be perfect, so I can share the remarks I've found so far.

Let's take the example of summarizing the total count of cordoned and active Tenants: I would expect a metric named capsule_tenants_status_count with two labels, such as cordoned and active. Exposing this kind of metric using prometheus.NewCounterFunc is not possible, since it does not support the labels that would do the trick.

However, since you're really active on this topic, it would be great to have your feedback on the structure of these metrics: would it be bad to have separate metrics such as capsule_tenants_active_count and capsule_tenants_cordoned_count?

To me, honestly, having two time series is pretty odd, since Prometheus labels solve this kind of problem.

@adberger

adberger commented Dec 27, 2021

Thank you very much.
I'll look into it by January 9th and give feedback.

@adberger

adberger commented Jan 6, 2022

@prometherion I might be doing something wrong but I don't see the metrics yet.

I followed https://capsule.clastix.io/docs/contributing/development/#fork-build-and-deploy-capsule with a kind cluster.
Then I ran kubectl port-forward service/capsule-controller-manager-metrics-service -n capsule-system 8888:8080 and opened http://localhost:8888/metrics in a browser.

Regarding your question: Having two metrics (capsule_tenants_active_count & capsule_tenants_cordoned_count) seems fine to me. You just need different PromQL queries, but the result stays the same.

@prometherion
Member

I might be doing something wrong but I don't see the metrics yet

Are you referring to the custom or basic ones?

Because for the latter:

curl -s localhost:8888/metrics | wc -l
1618

@adberger
I meant the custom ones

gernest added a commit to gernest/capsule that referenced this issue Apr 12, 2022
closes projectcapsule#451

This commit adds pkg/stats that defines prometheus counters for active and
cordoned tenants.

These metrics are exposed to controller-runtime, so they are visible on the
controller metrics endpoint.

This was tested manually: after creating a new tenant, the metrics showed
up on the metrics service endpoint

```
capsule_tenants_status_active_count 1
```
@adberger

Solved with https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md

Example with kube-prometheus-stack Helm Chart (https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack):

kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: [ "capsule.clastix.io" ]
        resources: ["tenants"]
        verbs: [ "list", "watch" ]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: capsule.clastix.io
              kind: "Tenant"
              version: "v1beta2"
            labelsFromPath:
              name: [metadata, name]
            metrics:
              - name: "tenant_size"
                help: "Count of namespaces in the tenant"
                each:
                  type: Gauge
                  gauge:
                    path: [status, size]
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
              - name: "tenant_state"
                help: "The operational state of the Tenant"
                each:
                  type: StateSet
                  stateSet:
                    labelName: state
                    path: [status, state]
                    list: [Active, Cordoned]
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]
              - name: "tenant_namespaces_info"
                help: "Namespaces of a Tenant"
                each:
                  type: Info
                  info:
                    path: [status, namespaces]
                    labelsFromPath:
                      tenant_namespace: []
                commonLabels:
                  custom_metric: "yes"
                labelsFromPath:
                  capsule_tenant: [metadata, name]
                  kind: [ kind ]

@prometherion
Member

This is definitely gold, thanks for sharing @adberger: could we transform this issue from a code-based feature to a documentation one?

Looking forward to reviewing a PR from you!

@adberger

@prometherion Sorry, I currently don't have any intention to make a contribution to the documentation of capsule because I'm quite busy at the moment.
