update metrics
jmdeal committed Nov 30, 2024
1 parent 57417b2 commit c5a1fd2
Showing 3 changed files with 321 additions and 15 deletions.
112 changes: 107 additions & 5 deletions website/content/en/docs/reference/metrics.md
@@ -38,6 +38,30 @@ Number of nodeclaims disrupted in total by Karpenter. Labeled by reason the node
Number of nodeclaims created in total by Karpenter. Labeled by reason the nodeclaim was created and the owning nodepool.
- Stability Level: STABLE

### `operator_nodeclaim_status_condition_transitions_total`
The count of transitions of a nodeclaim, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodeclaim and the namespace.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_count`
The number of conditions for a given nodeclaim, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_termination_current_time_seconds`
The current amount of time in seconds that a nodeclaim has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_nodeclaim_termination_duration_seconds`
The amount of time taken by a nodeclaim to terminate completely.
- Stability Level: BETA
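
As a rough illustration of how the labels described above might be consumed, the following sketch parses a few lines of Prometheus text exposition output and sums `operator_nodeclaim_status_condition_count` for one condition type and status. The sample lines, the label values, and the helper names `parse_samples` and `sum_matching` are all hypothetical; only the metric name and the label names (name, namespace, type, status, reason) come from the descriptions above.

```python
import re

# Hypothetical scrape output: every label value below is invented for illustration.
SAMPLE = """\
operator_nodeclaim_status_condition_count{name="default-abc12",namespace="",type="Initialized",status="True",reason="Initialized"} 1
operator_nodeclaim_status_condition_count{name="default-def34",namespace="",type="Initialized",status="False",reason="NotReady"} 1
operator_nodeclaim_status_condition_count{name="default-ghi56",namespace="",type="Launched",status="True",reason="Launched"} 1
"""

# Simplified patterns: they do not handle escaped quotes or samples without labels.
LINE_RE = re.compile(r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything the simplified pattern cannot read
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("metric"), labels, float(match.group("value"))))
    return samples


def sum_matching(samples, metric, **wanted):
    """Sum the values of `metric` whose labels equal every key/value in `wanted`."""
    return sum(
        value
        for name, labels, value in samples
        if name == metric and all(labels.get(k) == v for k, v in wanted.items())
    )


samples = parse_samples(SAMPLE)
not_initialized = sum_matching(
    samples,
    "operator_nodeclaim_status_condition_count",
    type="Initialized",
    status="False",
)
```

A non-zero `not_initialized` here would correspond to at least one nodeclaim reporting that condition; a real deployment would more likely run the equivalent filter in its monitoring system rather than parse scrape output by hand.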

## Nodes Metrics

### `karpenter_nodes_total_pod_requests`
@@ -92,6 +116,34 @@ Number of nodes created in total by Karpenter. Labeled by owning nodepool.
Node allocatable are the resources allocatable by nodes.
- Stability Level: BETA

### `operator_node_status_condition_transitions_total`
The count of transitions of a node, type and status.
- Stability Level: BETA

### `operator_node_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the node and the namespace.
- Stability Level: BETA

### `operator_node_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the node, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_status_condition_count`
The number of conditions for a given node, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_termination_current_time_seconds`
The current amount of time in seconds that a node has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_node_termination_duration_seconds`
The amount of time taken by a node to terminate completely.
- Stability Level: BETA

### `operator_node_event_count`
The number of events for a node.
- Stability Level: BETA

## Pods Metrics

### `karpenter_pods_state`
@@ -110,7 +162,7 @@ The amount of time taken by an object to terminate completely.

### `operator_termination_current_time_seconds`
The current amount of time in seconds that an object has been in terminating state.
- Stability Level: ALPHA
- Stability Level: DEPRECATED

## Voluntary Disruption Metrics

@@ -158,6 +210,56 @@ Limits specified on the nodepool that restrict the quantity of resources provisi
The number of nodes for a given NodePool that can be concurrently disrupting at a point in time. Labeled by NodePool. Note that allowed disruptions can change very rapidly, as new nodes may be created and others may be deleted at any point.
- Stability Level: ALPHA

### `operator_nodepool_status_condition_transitions_total`
The count of transitions of a nodepool, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_nodepool_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodepool and the namespace.
- Stability Level: BETA

### `operator_nodepool_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodepool, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_status_condition_count`
The number of conditions for a given nodepool, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_termination_current_time_seconds`
The current amount of time in seconds that a nodepool has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_nodepool_termination_duration_seconds`
Duration of NodePool termination in seconds.
- Stability Level: BETA

## EC2NodeClass Metrics

### `operator_ec2nodeclass_status_condition_transitions_total`
The count of transitions of an ec2nodeclass, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the ec2nodeclass and the namespace.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the ec2nodeclass, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_count`
The number of conditions for a given ec2nodeclass, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_ec2nodeclass_termination_current_time_seconds`
The current amount of time in seconds that an ec2nodeclass has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_ec2nodeclass_termination_duration_seconds`
Duration of ec2nodeclass termination in seconds.
- Stability Level: BETA

## Interruption Metrics

### `karpenter_interruption_received_messages_total`
@@ -292,19 +394,19 @@ Total number of adds handled by workqueue

### `operator_status_condition_transitions_total`
The count of transitions of a given object, type and status.
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. e.g. Alarm := P99(Updated=False) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED
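
The `Alarm := P99(Updated=False) > 5 minutes` shorthand in the description above can be read as: take the 99th percentile of the time the `Updated` condition spent in status `False`, and alert when it exceeds 300 seconds. Below is a minimal sketch of that check, assuming the durations have already been collected into a list. The function names and the nearest-rank percentile method are choices made for illustration, not something the source specifies, and a monitoring system would usually compute the percentile server-side.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the nearest-rank percentile of a non-empty list (0 < pct <= 100)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def alarm_fires(durations_seconds, pct=99, threshold_seconds=300):
    """True when the chosen percentile of the durations exceeds the threshold."""
    return percentile_nearest_rank(durations_seconds, pct) > threshold_seconds


# Hypothetical observations, in seconds, of time spent with Updated=False.
mostly_fast = [10] * 99 + [600]      # one slow outlier out of 100
several_slow = [10] * 98 + [600] * 2  # two slow outliers out of 100
```

With these made-up inputs, a single outlier in 100 samples does not move the 99th percentile past the threshold, while two outliers do.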

### `operator_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Alarm := P99(Updated=Unknown) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_count`
The number of conditions for a given object, type, and status. e.g. Alarm := Available=False > 0
- Stability Level: BETA
- Stability Level: DEPRECATED
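
The per-kind metrics added in this commit follow the naming of the generic `operator_*` metrics marked DEPRECATED here, with the resource kind inserted after the `operator_` prefix. The commit does not state that one set replaces the other, so the mapping below is an assumption based on the names alone, and `per_kind_name` is a hypothetical helper.

```python
# Kinds that have per-kind operator metrics in this document.
KINDS = ("nodeclaim", "node", "nodepool", "ec2nodeclass")

# Generic metrics that this document marks as DEPRECATED.
DEPRECATED = (
    "operator_status_condition_transitions_total",
    "operator_status_condition_transition_seconds",
    "operator_status_condition_current_status_seconds",
    "operator_status_condition_count",
    "operator_termination_current_time_seconds",
)

PREFIX = "operator_"


def per_kind_name(deprecated_name: str, kind: str) -> str:
    """Build the per-kind metric name that parallels a deprecated generic one.

    Assumption: the per-kind name is the generic name with the kind inserted
    after the "operator_" prefix, as the names in this document suggest.
    """
    if deprecated_name not in DEPRECATED:
        raise ValueError(f"not a deprecated generic metric: {deprecated_name}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    suffix = deprecated_name[len(PREFIX):]
    return f"{PREFIX}{kind}_{suffix}"


# Candidate per-kind counterparts for one deprecated metric.
candidates = [per_kind_name("operator_status_condition_count", kind) for kind in KINDS]
```

A helper like this could be used to audit dashboards or alert rules for references to the deprecated names and list the per-kind names to check as possible substitutes.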

## Client Go Metrics

112 changes: 107 additions & 5 deletions website/content/en/preview/reference/metrics.md
@@ -38,6 +38,30 @@ Number of nodeclaims disrupted in total by Karpenter. Labeled by reason the node
Number of nodeclaims created in total by Karpenter. Labeled by reason the nodeclaim was created and the owning nodepool.
- Stability Level: STABLE

### `operator_nodeclaim_status_condition_transitions_total`
The count of transitions of a nodeclaim, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodeclaim and the namespace.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodeclaim, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_status_condition_count`
The number of conditions for a given nodeclaim, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodeclaim_termination_current_time_seconds`
The current amount of time in seconds that a nodeclaim has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_nodeclaim_termination_duration_seconds`
The amount of time taken by a nodeclaim to terminate completely.
- Stability Level: BETA

## Nodes Metrics

### `karpenter_nodes_total_pod_requests`
@@ -92,6 +116,34 @@ Number of nodes created in total by Karpenter. Labeled by owning nodepool.
Node allocatable are the resources allocatable by nodes.
- Stability Level: BETA

### `operator_node_status_condition_transitions_total`
The count of transitions of a node, type and status.
- Stability Level: BETA

### `operator_node_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the node and the namespace.
- Stability Level: BETA

### `operator_node_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the node, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_status_condition_count`
The number of conditions for a given node, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_node_termination_current_time_seconds`
The current amount of time in seconds that a node has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_node_termination_duration_seconds`
The amount of time taken by a node to terminate completely.
- Stability Level: BETA

### `operator_node_event_count`
The number of events for a node.
- Stability Level: BETA

## Pods Metrics

### `karpenter_pods_state`
@@ -110,7 +162,7 @@ The amount of time taken by an object to terminate completely.

### `operator_termination_current_time_seconds`
The current amount of time in seconds that an object has been in terminating state.
- Stability Level: ALPHA
- Stability Level: DEPRECATED

## Voluntary Disruption Metrics

@@ -158,6 +210,56 @@ Limits specified on the nodepool that restrict the quantity of resources provisi
The number of nodes for a given NodePool that can be concurrently disrupting at a point in time. Labeled by NodePool. Note that allowed disruptions can change very rapidly, as new nodes may be created and others may be deleted at any point.
- Stability Level: ALPHA

### `operator_nodepool_status_condition_transitions_total`
The count of transitions of a nodepool, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_nodepool_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the nodepool and the namespace.
- Stability Level: BETA

### `operator_nodepool_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the nodepool, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_status_condition_count`
The number of conditions for a given nodepool, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_nodepool_termination_current_time_seconds`
The current amount of time in seconds that a nodepool has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_nodepool_termination_duration_seconds`
Duration of NodePool termination in seconds.
- Stability Level: BETA

## EC2NodeClass Metrics

### `operator_ec2nodeclass_status_condition_transitions_total`
The count of transitions of an ec2nodeclass, type and status. Labeled by the type, reason, and status.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. Labeled by the name of the ec2nodeclass and the namespace.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Labeled by the name of the ec2nodeclass, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_ec2nodeclass_status_condition_count`
The number of conditions for a given ec2nodeclass, type, and status. Labeled by the name, namespace, type, status, and reason.
- Stability Level: BETA

### `operator_ec2nodeclass_termination_current_time_seconds`
The current amount of time in seconds that an ec2nodeclass has been in a terminating state. Labeled by name and namespace.
- Stability Level: BETA

### `operator_ec2nodeclass_termination_duration_seconds`
Duration of ec2nodeclass termination in seconds.
- Stability Level: BETA

## Interruption Metrics

### `karpenter_interruption_received_messages_total`
@@ -292,19 +394,19 @@ Total number of adds handled by workqueue

### `operator_status_condition_transitions_total`
The count of transitions of a given object, type and status.
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_transition_seconds`
The amount of time a condition was in a given state before transitioning. e.g. Alarm := P99(Updated=False) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_current_status_seconds`
The current amount of time in seconds that a status condition has been in a specific state. Alarm := P99(Updated=Unknown) > 5 minutes
- Stability Level: BETA
- Stability Level: DEPRECATED

### `operator_status_condition_count`
The number of conditions for a given object, type, and status. e.g. Alarm := Available=False > 0
- Stability Level: BETA
- Stability Level: DEPRECATED

## Client Go Metrics
