Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CustomResource: no metrics if no resource exists for CRD when ksm starts #2142

Closed
chrischdi opened this issue Aug 10, 2023 · 13 comments · Fixed by #2154
Closed

CustomResource: no metrics if no resource exists for CRD when ksm starts #2142

chrischdi opened this issue Aug 10, 2023 · 13 comments · Fixed by #2154
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@chrischdi
Copy link
Member

chrischdi commented Aug 10, 2023

What happened:

There are two issues when no resources exist for a CRD at the time when kube-state-metrics starts:

  1. If a resource for a custom resource gets created after kube-state-metrics starts:
    • no HELP or TYPE will get exported
  2. In a special case, there will also no metrics get exported for it.

What you expected to happen:

  • Not requiring a restart to expose the metrics

How to reproduce it (as minimally and precisely as possible):

  • Applied a CRD
  • Started KSM with configuration for the CRD
  • Sleep for 15s
  • Created a CR of the above CRD type
  • The metrics don't get exposed
  • Restart KSM
  • Metrics will get exposed

Used Configuration:

# This file was auto-generated via: make generate-metrics-config
kind: CustomResourceStateMetrics
spec:
  resources:
  - groupVersionKind:
      group: cluster.x-k8s.io
      kind: Cluster
      version: v1beta1
    labelsFromPath:
      name:
      - metadata
      - name
      namespace:
      - metadata
      - namespace
      uid:
      - metadata
      - uid
    metricNamePrefix: capi_cluster
    metrics:
    - name: created
      help: Unix creation timestamp.
      each:
        gauge:
          path:
          - metadata
          - creationTimestamp
        type: Gauge
    - name: status_condition
      help: The condition of a cluster.
      each:
        stateSet:
          labelName: status
          labelsFromPath:
            type:
            - type
          list:
          - 'True'
          - 'False'
          - Unknown
          path:
          - status
          - conditions
          valueFrom:
          - status
        type: StateSet

Note: in this case: after creating the resource:

  • The metric capi_cluster_created will get exposed, but no HELP or TYPE.
  • The metric for capi_cluster_status_condition will not get exposed

Anything else we need to know?:

I think the issue got introduced in #1851

If the method hasResources which is called here:

if b.hasResources(resourceName, expectedType) {

get's changed to always return true instead of actually tring to list CRs:

var list *unstructured.UnstructuredList

it meets the expectation and metrics get exposed as soon as the first CR for the CRD gets created.

Environment:

  • kube-state-metrics version: v2.9.2
  • Kubernetes version (use kubectl version): v1.27.3
  • Cloud provider or hardware configuration: kind
  • Other info:
@chrischdi chrischdi added the kind/bug Categorizes issue or PR as related to a bug. label Aug 10, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 10, 2023
@chrischdi
Copy link
Member Author

I think the overall issue is in

func (m MetricsWriter) WriteAll(w io.Writer) error {

If the first store has no header set: it will add an empty header for the first store

if m.stores[0].headers == nil && m.stores[0].metrics != nil {
m.stores[0].headers = []string{""}
}

Afterwards we loop over the headers, and use the index there to write the metrics

for i, help := range m.stores[0].headers {
if help != "" && help != "\n" {
help += "\n"
}
// TODO: This writes out the help text for each metric family, before checking if the metrics for it exist,
// TODO: which is not ideal, and furthermore, diverges from the OpenMetrics standard.
_, err := w.Write([]byte(help))
if err != nil {
return fmt.Errorf("failed to write help text: %v", err)
}
for _, s := range m.stores {
for _, metricFamilies := range s.metrics {
_, err := w.Write(metricFamilies[i])
if err != nil {
return fmt.Errorf("failed to write metrics family: %v", err)
}
}
}
}

In this case, if we have more than one metricFamilies in a single store we will only expose the first metric in there.

@chrischdi
Copy link
Member Author

Thinking a bit more about it:

What hasResources tries to accomplish is to not expose HELP and TYPE if we have no metrics. But together with the "empty first header" thing in WriteAll this breaks and there is no re-detection of the headers if resources get created.

@dashpole
Copy link
Contributor

/assign @dgrisonnet
/triage accpeted

@k8s-ci-robot
Copy link
Contributor

@dashpole: The label(s) triage/accpeted cannot be applied, because the repository doesn't have them.

In response to this:

/assign @dgrisonnet
/triage accpeted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dashpole
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 10, 2023
@chihshenghuang
Copy link
Contributor

i have the same issue and provide the log at #2141

@chrischdi
Copy link
Member Author

i have the same issue and provide the log at #2141

Jep, I've seen that issue and also tried the linked PR. Though I was not sure if that describes the same issue.

@buger

This comment was marked as spam.

@chrischdi
Copy link
Member Author

Be aware, geneated by AI architect, may contain some faulty statements

Don't know if I should consider this as spam and report. It does not really provide new and/or useful suggestions which are not already there in the above.

@buger
Copy link

buger commented Aug 14, 2023

I woud argue that it does not contain any new info. At least for me, who was not deep into this project before.
Keep in mind that before doing this suggestions, it analysed all the source code, etc. It is not just GPT bot which rewrites/summarise original input.

I would ask not to report, as it just experiement, and I picked a few OSS project to trial my project on real-world problems. And in each project I took up to 5 tickets and PRs. So I'm not going to continue posting this more, unless repository owner will want me to continue.

@mrueg
Copy link
Member

mrueg commented Aug 14, 2023

If you run any further "experiements", please reach out to any repository owner first and ask them for their okay before targetting their project. Spamming a project (even if it's only 5 tickets) without any sort of opt-in is a pretty rude and non-friendly behavior.
This is not helpful to resolve any issue, potentially confusing users and not desired by the maintainers of this project as it wastes time from everyone reading those generated messages.

To be clear, we do not want you to continue.

@mrbobbytables
Copy link
Member

@buger K8s GitHub admin here - For context - We are broadly seeing more LLM / genAI driven comments and PRs. 99% of these comments provide bad advise or are generally useless in terms of code contributions, BUT because they're "written well" or have just enough that seem legit on first pass it takes a lot more effort on the maintainers to deal with.

To expand a tad further - there was a recent study that showed 52% of ChatGPT answers were incorrect, and 77% considered too verbose. Admittedly the sample size for it was rather small, so may not be the best study but still rings true anecdotally from what we see.

@ZhangsongLee
Copy link

If we apply a CRD after starting ksm and then create a CR later, The metrics don't get exposed.
We expect to expose these metrics without restarting ksm。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
9 participants