-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/customresourcestate implement info and stateSet metric type and refactor configuration file #1777
pkg/customresourcestate implement info and stateSet metric type and refactor configuration file #1777
Conversation
e6eda69
to
fd190ac
Compare
fd190ac
to
770ebd7
Compare
Example config: kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: cluster.x-k8s.io
kind: Cluster
version: v1beta1
namespace: capi
subsystem: cluster
labelsFromPath:
name: [metadata, name]
namespace: [metadata, namespace]
uid: [metadata, uid]
metrics:
- each:
gauge:
path:
- metadata
- creationTimestamp
type: Gauge
name: created
help: Unix creation timestamp.
- each:
gauge:
nilIsZero: true
path:
- spec
- paused
type: Gauge
name: spec_paused
- each:
stateSet:
labelName: status
labelsFromPath:
type:
- type
list:
- 'True'
- 'False'
- Unknown
path:
- status
- conditions
valueFrom:
- status
type: StateSet
name: status_condition
help: The current status conditions of a cluster.
- each:
stateSet:
labelName: phase
list:
- Pending
- Provisioning
- Provisioned
- Deleting
- Failed
- Unknown
path:
- status
- phase
type: StateSet
name: status_phase
- groupVersionKind:
group: controlplane.cluster.x-k8s.io
kind: KubeadmControlPlane
version: v1beta1
namespace: capi
subsystem: kubeadmcontrolplane
labelsFromPath:
name: [metadata, name]
namespace: [metadata, namespace]
uid: [metadata, uid]
cluster_name: [metadata, ownerReferences, "[kind=Cluster]", name]
metrics:
- each:
gauge:
path:
- metadata
- creationTimestamp
type: Gauge
name: created
help: Unix creation timestamp.
- each:
stateSet:
labelName: status
labelsFromPath:
type:
- type
list:
- 'True'
- 'False'
- Unknown
path:
- status
- conditions
valueFrom:
- status
type: StateSet
name: status_condition
help: The current status conditions of a kubeadmcontrolplane.
- each:
gauge:
path:
- status
- replicas
nilIsZero: true
type: Gauge
name: status_replicas
- each:
gauge:
path:
- status
- readyReplicas
nilIsZero: true
type: Gauge
name: status_replicas_ready
- each:
gauge:
path:
- status
- unavailableReplicas
nilIsZero: true
type: Gauge
name: status_replicas_unavailable
- each:
gauge:
path:
- status
- updatedReplicas
nilIsZero: true
type: Gauge
name: status_replicas_updated
- each:
gauge:
path:
- spec
- replicas
type: Gauge
name: spec_replicas
- each:
gauge:
path:
- spec
- rolloutStrategy
- rollingUpdate
- maxSurge
type: Gauge
name: spec_strategy_rollingupdate_max_surge
- each:
info:
labelsFromPath:
owner_is_controller:
- controller
owner_kind:
- kind
owner_name:
- name
owner_uid:
- uid
path:
- metadata
- ownerReferences
type: Info
name: owner
- each:
info:
labelsFromPath:
version:
- spec
- version
type: Info
name: info
- groupVersionKind:
group: cluster.x-k8s.io
kind: Machine
version: v1beta1
namespace: capi
subsystem: machine
labelsFromPath:
name: [metadata, name]
namespace: [metadata, namespace]
uid: [metadata, uid]
cluster_name: [spec, clusterName]
metrics:
- each:
gauge:
path:
- metadata
- creationTimestamp
type: Gauge
name: created
help: Unix creation timestamp.
- each:
stateSet:
labelName: status
labelsFromPath:
type:
- type
list:
- 'True'
- 'False'
- Unknown
path:
- status
- conditions
valueFrom:
- status
type: StateSet
name: status_condition
help: The current status conditions of a machine.
- each:
stateSet:
labelName: phase
list:
- Pending
- Provisioning
- Provisioned
- Running
- Deleting
- Deleted
- Failed
- Unknown
path:
- status
- phase
type: StateSet
name: status_phase
- each:
info:
labelsFromPath:
owner_is_controller:
- controller
owner_kind:
- kind
owner_name:
- name
owner_uid:
- uid
path:
- metadata
- ownerReferences
type: Info
name: owner
- each:
info:
labelsFromPath:
node_name:
- status
- nodeRef
- name
node_uid:
- status
- nodeRef
- uid
type: Info
name: status_noderef
- each:
info:
labelsFromPath:
failure_domain:
- spec
- failureDomain
internal_ip:
- status
- addresses
- "[type=InternalIP]"
- address
provider_id:
- spec
- providerID
version:
- spec
- version
type: Info
name: info
- groupVersionKind:
group: cluster.x-k8s.io
kind: MachineDeployment
version: v1beta1
namespace: capi
subsystem: machinedeployment
labelsFromPath:
name: [metadata, name]
namespace: [metadata, namespace]
uid: [metadata, uid]
cluster_name: [spec, clusterName]
metrics:
- each:
stateSet:
labelName: phase
list:
- ScalingUp
- ScalingDown
- Running
- Failed
- Unknown
path:
- status
- phase
type: StateSet
name: status_phase
- each:
gauge:
path:
- metadata
- creationTimestamp
type: Gauge
name: created
help: Unix creation timestamp.
- each:
gauge:
nilIsZero: true
path:
- spec
- paused
type: Gauge
name: spec_paused
- each:
stateSet:
labelName: status
labelsFromPath:
type:
- type
list:
- 'True'
- 'False'
- Unknown
path:
- status
- conditions
valueFrom:
- status
type: StateSet
name: status_condition
help: The current status conditions of a machinedeployment.
- each:
gauge:
path:
- status
- replicas
nilIsZero: true
type: Gauge
name: status_replicas
- each:
gauge:
path:
- status
- availableReplicas
nilIsZero: true
type: Gauge
name: status_replicas_available
- each:
gauge:
path:
- status
- unavailableReplicas
nilIsZero: true
type: Gauge
name: status_replicas_unavailable
- each:
gauge:
path:
- status
- updatedReplicas
nilIsZero: true
type: Gauge
name: status_replicas_updated
- each:
gauge:
path:
- spec
- replicas
type: Gauge
name: spec_replicas
- each:
gauge:
path:
- spec
- strategy
- rollingUpdate
- maxUnavailable
type: Gauge
name: spec_strategy_rollingupdate_max_unavailable
- each:
gauge:
path:
- spec
- strategy
- rollingUpdate
- maxSurge
type: Gauge
name: spec_strategy_rollingupdate_max_surge
- each:
info:
labelsFromPath:
owner_is_controller:
- controller
owner_kind:
- kind
owner_name:
- name
owner_uid:
- uid
path:
- metadata
- ownerReferences
type: Info
name: owner
- groupVersionKind:
group: cluster.x-k8s.io
kind: MachineHealthCheck
version: v1beta1
namespace: capi
subsystem: machinehealthcheck
labelsFromPath:
name: [metadata, name]
namespace: [metadata, namespace]
uid: [metadata, uid]
cluster_name: [spec, clusterName]
metrics:
- each:
gauge:
path:
- metadata
- creationTimestamp
type: Gauge
name: created
help: Unix creation timestamp.
- each:
info:
labelsFromPath:
owner_is_controller:
- controller
owner_kind:
- kind
owner_name:
- name
owner_uid:
- uid
path:
- metadata
- ownerReferences
type: Info
name: owner
- each:
stateSet:
labelName: status
labelsFromPath:
type:
- type
list:
- 'True'
- 'False'
- Unknown
path:
- status
- conditions
valueFrom:
- status
type: StateSet
name: status_condition
help: The current status conditions of a machinehealthcheck.
- each:
gauge:
path:
- status
- expectedMachines
type: Gauge
name: status_expected_machines
- each:
gauge:
path:
- status
- currentHealthy
type: Gauge
name: status_current_healthy
- each:
gauge:
path:
- status
- remediationsAllowed
type: Gauge
name: status_remediations_allowed
- groupVersionKind:
group: cluster.x-k8s.io
kind: MachineSet
version: v1beta1
namespace: capi
subsystem: machineset
labelsFromPath:
name: [metadata, name]
namespace: [metadata, namespace]
uid: [metadata, uid]
cluster_name: [spec, clusterName]
metrics:
- each:
gauge:
path:
- metadata
- creationTimestamp
type: Gauge
name: created
help: Unix creation timestamp.
- each:
gauge:
path:
- status
- availableReplicas
nilIsZero: true
type: Gauge
name: status_available_replicas
- each:
stateSet:
labelName: status
labelsFromPath:
type:
- type
list:
- 'True'
- 'False'
- Unknown
path:
- status
- conditions
valueFrom:
- status
type: StateSet
name: status_condition
help: The current status conditions of a machineset.
- each:
gauge:
path:
- status
- replicas
nilIsZero: true
type: Gauge
name: status_replicas
- each:
gauge:
path:
- status
- fullyLabeledReplicas
type: Gauge
name: status_fully_labeled_replicas
- each:
gauge:
path:
- status
- readyReplicas
nilIsZero: true
type: Gauge
name: status_ready_replicas
- each:
gauge:
path:
- spec
- replicas
nilIsZero: true
type: Gauge
name: spec_replicas
- each:
info:
labelsFromPath:
owner_is_controller:
- controller
owner_kind:
- kind
owner_name:
- name
owner_uid:
- uid
path:
- metadata
- ownerReferences
type: Info
name: owner
Example resulting metrics:
I did use a cluster-api capd quickstart + added a machinehealthcheck to create the objects. |
770ebd7
to
f2659da
Compare
A thing to note: This PR as of now would currently add a breaking change to the experimental feature configuration file for the custom resources. If wanted we could also migrate the old config to a versioned configuration file and add the new layout as new version so it could leverage conversion funcs to stay compatible to the old configuration. |
I am traveling currently, should be able to look at the PR next week. |
we are really looking forward to getting this merged and starting using kube-stat-metrics in Cluster API, any chance to bump this up in the backlog list? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution, I like the change and I think it takes CRD generation in the right direction.
One thing I did not understand is where the susbsystem
field comes from in the example attached in the PR description.
Thank you for taking a look :-) The
It makes sense to use it to e.g. have consistent metric names while migrating to a other api verison of the CR. |
I see, that maybe should not have gotten through. We should use |
Yes, should be fine to change, otherwise the config refactoring would also not be possible like this. As it defaults not to At the end the field allows to overwrite / replace the calculated Edit: we can't use |
Alternative approaches to the current Current example, results in metric kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: cluster.x-k8s.io
kind: Cluster
version: v1beta1
namespace: capi
subsystem: cluster # overwrites cluster_x_k8s_io_v1beta1_cluster
metrics:
... Alternative 1: rename kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: cluster.x-k8s.io
kind: Cluster
version: v1beta1
namespace: capi
apiVersion: cluster # overwrites cluster_x_k8s_io_v1beta1_cluster
metrics:
... Alternative 2: Toggle GVK defaulting via bool: kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: cluster.x-k8s.io
kind: Cluster
version: v1beta1
namespace: capi_cluster
disableGVKInName: true # disables cluster_x_k8s_io_v1beta1_cluster
metrics:
... Of course there are also others :-) |
One thing I am still not following, if we have a GVK, why do we need a subsystem/apiversion as well? Isn't this information already contained in the GVK? |
The I'll try to make an example, let's consider we have the following config: kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: cluster.x-k8s.io
kind: Cluster
version: v1alpha4
namespace: capi
metrics:
... The resulting metrics would be called: Now a new api version gets introduced kind: CustomResourceStateMetrics
spec:
resources:
- groupVersionKind:
group: cluster.x-k8s.io
kind: Cluster
version: v1beta1
namespace: capi
metrics:
... This results in also changing the metric names to If instead the current |
Got it, thanks for the explanation. In that case, this field seems to be related to the of the metric, and not to the kube api resource itself. From what I understand, the value can be arbitrary and user-defined. So something like I also don't think the CR namespace should end up in the metric name, but rather be exposed as a label. This way if Wdyt? |
I agree, so the idea would be to have a field
👍 because namespace is also misleading because it has nothing to do with a kubernetes namespace of a custom resource I will adjust the PR to do so :-) |
/hold Want to do some more testing tomorrow :-) |
Yes sure 👍 thanks for pointing me to it 😃 |
0cfbf0e
to
f8e4991
Compare
I also updated the docs now. Happy to get another round of feedback :-) Thanks to everyone involved 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extensive work on this feature 👍
/lgtm
/hold
Can you fix linting?
|
…efactor configuration file * Adds detection of booleans in string format to getNum. * Refactors configuration file to allow definition of different metric types having different configuration variables. * Refactor order of types and funcs in pkg/customersourcestate. * Allows info and stateSet metrics to iterate over arrays. * Adds `nilIsZero` config variable to gauge to indicate non-existing values to tread as 0 value instead of returning an error. * Skip adding a label instead of setting value to `<nil>`. * Replace namespace and subsystem by metricsNamePrefix * Adjust docs for customresourcestate metrics to align with new configuration file
f8e4991
to
52d3ab6
Compare
Thanks a lot for your contribution! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chrischdi, fpetkovski, mrueg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@chrischdi Great work, thank you very much! Looking forward to using this feature :) |
What this PR does / why we need it:
having different configuration variables.
I'm opening this PR to start discussing possible solutions :-) I'm open to all feedback 👍
From a user perspective, this PR basically refactors the layout of the custom resource config file.
It introduces a
type
variable as well as typed metric configuration forGauge
,Info
andStateSet
metrics, which aligns its naming from the OpenMetrics specification.The
type
variable aligns with the recommendations of the sig-api-machinery regarding the discriminator field.How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Does not change cardinality.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #1755
Fixes #1762
Example configuration:
Example resulting metrics:
Example object