CustomResource: no metrics if no resource exists for CRD when ksm starts #2142

chrischdi · 2023-08-10T14:40:30Z

What happened:

There are two issues when no resources exist for a CRD at the time kube-state-metrics starts.

If a resource for that CRD gets created after kube-state-metrics has started:
- no HELP or TYPE lines get exported for its metrics
- in a special case, no metrics get exported for it at all

What you expected to happen:

The metrics get exposed without requiring a restart of kube-state-metrics.

How to reproduce it (as minimally and precisely as possible):

1. Apply a CRD.
2. Start KSM with configuration for the CRD.
3. Sleep for 15s.
4. Create a CR of the above CRD type.
5. The metrics don't get exposed.
6. Restart KSM.
7. The metrics get exposed.

Used Configuration:

# This file was auto-generated via: make generate-metrics-config
kind: CustomResourceStateMetrics
spec:
  resources:
  - groupVersionKind:
      group: cluster.x-k8s.io
      kind: Cluster
      version: v1beta1
    labelsFromPath:
      name:
      - metadata
      - name
      namespace:
      - metadata
      - namespace
      uid:
      - metadata
      - uid
    metricNamePrefix: capi_cluster
    metrics:
    - name: created
      help: Unix creation timestamp.
      each:
        gauge:
          path:
          - metadata
          - creationTimestamp
        type: Gauge
    - name: status_condition
      help: The condition of a cluster.
      each:
        stateSet:
          labelName: status
          labelsFromPath:
            type:
            - type
          list:
          - 'True'
          - 'False'
          - Unknown
          path:
          - status
          - conditions
          valueFrom:
          - status
        type: StateSet

Note: in this case, after creating the resource:

- The metric capi_cluster_created gets exposed, but without HELP or TYPE.
- The metric capi_cluster_status_condition does not get exposed at all.

Anything else we need to know?:

I think the issue got introduced in #1851

If the method hasResources which is called here:

kube-state-metrics/internal/store/builder.go

Line 548 in cc06755

if b.hasResources(resourceName, expectedType) {

gets changed to always return true instead of actually trying to list CRs:

kube-state-metrics/internal/store/builder.go

Line 627 in cc06755

var list *unstructured.UnstructuredList

then the behavior meets the expectation: metrics get exposed as soon as the first CR for the CRD gets created.

Environment:

kube-state-metrics version: v2.9.2
Kubernetes version (use kubectl version): v1.27.3
Cloud provider or hardware configuration: kind
Other info:


chrischdi · 2023-08-10T15:16:35Z

I think the overall issue is in

kube-state-metrics/pkg/metrics_store/metrics_writer.go

Line 49 in bb6e9f4

func (m MetricsWriter) WriteAll(w io.Writer) error {

If the first store has no headers set but does have metrics, WriteAll adds a single empty header for the first store:

kube-state-metrics/pkg/metrics_store/metrics_writer.go

Lines 63 to 65 in bb6e9f4

    
    if m.stores[0].headers == nil && m.stores[0].metrics != nil {
    	m.stores[0].headers = []string{""}
    }

Afterwards we loop over the headers of the first store and use the loop index to select which metric family to write:

kube-state-metrics/pkg/metrics_store/metrics_writer.go

Lines 66 to 85 in bb6e9f4

    
    for i, help := range m.stores[0].headers {
    	if help != "" && help != "\n" {
    		help += "\n"
    	}
    	// TODO: This writes out the help text for each metric family, before checking if the metrics for it exist,
    	// TODO: which is not ideal, and furthermore, diverges from the OpenMetrics standard.
    	_, err := w.Write([]byte(help))
    	if err != nil {
    		return fmt.Errorf("failed to write help text: %v", err)
    	}
    	for _, s := range m.stores {
    		for _, metricFamilies := range s.metrics {
    			_, err := w.Write(metricFamilies[i])
    			if err != nil {
    				return fmt.Errorf("failed to write metrics family: %v", err)
    			}
    		}
    	}
    }

In this case the headers slice has exactly one (empty) entry, so the loop runs only for index 0. If a single store holds more than one metric family, only the first one gets exposed.
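The effect can be reproduced with a stripped-down model of the writer. This is only a sketch, not the kube-state-metrics code: the `store` type and the `render` function are hypothetical stand-ins that copy just the header fallback and the index loop from the snippets quoted above, assuming a store keeps one header per metric family and, per object, one rendered byte slice per family.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a simplified, hypothetical stand-in for a metrics store.
type store struct {
	headers []string            // one header per metric family
	metrics map[string][][]byte // per object UID: one rendered slice per family
}

// render mirrors the quoted loop: the number of iterations is taken
// from the headers of the first store only.
func render(stores []*store) string {
	var b strings.Builder
	if len(stores) == 0 {
		return ""
	}
	// Same fallback as in the quoted snippet: a single empty header.
	if stores[0].headers == nil && stores[0].metrics != nil {
		stores[0].headers = []string{""}
	}
	for i, help := range stores[0].headers {
		if help != "" && help != "\n" {
			help += "\n"
		}
		b.WriteString(help)
		for _, s := range stores {
			for _, families := range s.metrics {
				b.Write(families[i])
			}
		}
	}
	return b.String()
}

func main() {
	// No headers were generated at startup, but a CR created later
	// produced samples for two metric families.
	s := &store{
		metrics: map[string][][]byte{
			"uid-1": {
				[]byte("capi_cluster_created{name=\"c1\"} 1\n"),
				[]byte("capi_cluster_status_condition{name=\"c1\",status=\"True\"} 1\n"),
			},
		},
	}
	// Prints only the capi_cluster_created sample, without HELP or TYPE.
	fmt.Print(render([]*store{s}))
}
```

With a single empty header the loop body runs once, so the second family (here the status condition) is never written, which matches the behavior described in the report.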

chrischdi · 2023-08-10T16:10:14Z

Thinking a bit more about it:

What hasResources tries to accomplish is to not expose HELP and TYPE if we have no metrics. But together with the "empty first header" fallback in WriteAll this breaks, and the headers are never re-detected when resources get created later.
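One possible direction, sketched here under stated assumptions, is to always keep the headers and move the "do we have metrics?" decision to write time. The `store` type and `render` function below are hypothetical and simplified; they are not the kube-state-metrics implementation, and the header strings are just the help texts from the configuration above.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical model: headers are always populated from the
// configuration, even when no custom resource exists yet.
type store struct {
	headers []string            // one header per metric family
	metrics map[string][][]byte // per object UID: one rendered slice per family
}

// render writes the header of a metric family only if at least one
// store currently holds a non-empty sample for that family.
func render(stores []*store) string {
	var out strings.Builder
	if len(stores) == 0 {
		return ""
	}
	for i, header := range stores[0].headers {
		var body strings.Builder
		for _, s := range stores {
			for _, families := range s.metrics {
				if i < len(families) { // guard against short slices
					body.Write(families[i])
				}
			}
		}
		if body.Len() == 0 {
			continue // no samples yet: skip the header as well
		}
		out.WriteString(header)
		out.WriteString("\n")
		out.WriteString(body.String())
	}
	return out.String()
}

func main() {
	s := &store{
		headers: []string{
			"# HELP capi_cluster_created Unix creation timestamp.",
			"# HELP capi_cluster_status_condition The condition of a cluster.",
		},
		metrics: map[string][][]byte{},
	}
	fmt.Printf("before CR exists: %q\n", render([]*store{s}))

	// A CR is created after startup; both families now have samples.
	s.metrics["uid-1"] = [][]byte{
		[]byte("capi_cluster_created{name=\"c1\"} 1\n"),
		[]byte("capi_cluster_status_condition{name=\"c1\",status=\"True\"} 1\n"),
	}
	fmt.Print(render([]*store{s}))
}
```

Because the headers no longer depend on what existed at startup, a restart is not needed in this model: nothing is printed while the store is empty, and both families appear with their headers once the first object arrives.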

dashpole · 2023-08-10T16:40:12Z

/assign @dgrisonnet
/triage accpeted

k8s-ci-robot · 2023-08-10T16:40:14Z

@dashpole: The label(s) triage/accpeted cannot be applied, because the repository doesn't have them.

In response to this:

/assign @dgrisonnet
/triage accpeted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dashpole · 2023-08-10T16:41:38Z

/triage accepted

chihshenghuang · 2023-08-10T17:47:39Z

i have the same issue and provide the log at #2141

chrischdi · 2023-08-11T06:59:07Z

i have the same issue and provide the log at #2141

Jep, I've seen that issue and also tried the linked PR. Though I was not sure if that describes the same issue.

chrischdi · 2023-08-14T06:29:44Z

Be aware, generated by AI architect, may contain some faulty statements

Don't know if I should consider this as spam and report. It does not really provide new and/or useful suggestions which are not already there in the above.

buger · 2023-08-14T06:42:31Z

I would argue that it does not contain any new info. At least for me, who was not deep into this project before.
Keep in mind that before making these suggestions, it analysed all the source code, etc. It is not just a GPT bot that rewrites or summarises the original input.

I would ask you not to report it, as it is just an experiment: I picked a few OSS projects to trial my project on real-world problems, and in each project I took up to 5 tickets and PRs. So I'm not going to continue posting these unless the repository owner wants me to continue.

mrueg · 2023-08-14T07:30:18Z

If you run any further "experiments", please reach out to the repository owners first and ask for their okay before targeting their project. Spamming a project (even if it's only 5 tickets) without any sort of opt-in is pretty rude and unfriendly behavior.
This is not helpful for resolving any issue, potentially confuses users, and is not desired by the maintainers of this project, as it wastes the time of everyone reading those generated messages.

To be clear, we do not want you to continue.

mrbobbytables · 2023-08-14T16:30:21Z

@buger K8s GitHub admin here. For context: we are broadly seeing more LLM / genAI driven comments and PRs. 99% of these comments provide bad advice or are generally useless in terms of code contributions, BUT because they're "written well" or have just enough that seems legit on first pass, it takes a lot more effort from the maintainers to deal with them.

To expand a tad further: there was a recent study that showed 52% of ChatGPT answers were incorrect, and 77% were considered too verbose. Admittedly the sample size for it was rather small, so it may not be the best study, but it still rings true anecdotally from what we see.

ZhangsongLee · 2024-01-04T12:14:31Z

If we apply a CRD after starting ksm and then create a CR later, the metrics don't get exposed.
We expect these metrics to be exposed without restarting ksm.

kubernetes/kube-state-metrics#2142

chrischdi added the kind/bug Categorizes issue or PR as related to a bug. label Aug 10, 2023

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 10, 2023

k8s-ci-robot assigned dgrisonnet Aug 10, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 10, 2023

This comment was marked as spam.


chrischdi mentioned this issue Aug 18, 2023

fix: custommetrics: always extract the headers but only write it when we have metrics #2154

Merged

chihshenghuang mentioned this issue Aug 20, 2023

CustomResourceStateMetrics didn't report the custom resource status data to metrics and kube-state-metrics crash if the custom resource property change #2141

Closed

k8s-ci-robot closed this as completed in #2154 Aug 28, 2023

chrischdi mentioned this issue Sep 1, 2023

🌱 hack: bump kube-state-metrics and prometheus charts kubernetes-sigs/cluster-api#9352

Merged

ZhangsongLee mentioned this issue Jan 4, 2024

CustomResource: no metrics if CRD apply after ksm starts #2296

Open

david-martin mentioned this issue Jan 22, 2024

Restart Kube State Metrics after install to ensure all CRDs exist on … Kuadrant/api-quickstart#28

Closed

david-martin added a commit to Kuadrant/multicluster-gateway-controller that referenced this issue Jan 22, 2024

Bump ksm version to include fix for crds

9ca6a87
