Metric for CloudProvider.Refresh() duration
This function can take a variable amount of time depending on various
conditions (e.g. many node group changes causing forced refreshes,
cache time-to-live expiries, ...).

Monitoring that duration is useful for diagnosing those variations
and for uncovering external issues (e.g. throttling by the cloud provider)
affecting cluster-autoscaler.
bpineau committed May 31, 2021
1 parent 0298597 commit 986fe3a
Showing 2 changed files with 3 additions and 0 deletions.
2 changes: 2 additions & 0 deletions cluster-autoscaler/core/static_autoscaler.go
@@ -260,7 +260,9 @@ func (a *StaticAutoscaler) RunOnce(currentTime time.Time) errors.AutoscalerError
 	}
 
 	// Call CloudProvider.Refresh before any other calls to cloud provider.
+	refreshStart := time.Now()
 	err = a.AutoscalingContext.CloudProvider.Refresh()
+	metrics.UpdateDurationFromStart(metrics.CloudProviderRefresh, refreshStart)
 	if err != nil {
 		klog.Errorf("Failed to refresh cloud provider config: %v", err)
 		return errors.ToAutoscalerError(errors.CloudProviderError, err)
1 change: 1 addition & 0 deletions cluster-autoscaler/metrics/metrics.go
@@ -90,6 +90,7 @@ const (
 	FindUnneeded FunctionLabel = "findUnneeded"
 	UpdateState FunctionLabel = "updateClusterState"
 	FilterOutSchedulable FunctionLabel = "filterOutSchedulable"
+	CloudProviderRefresh FunctionLabel = "cloudProviderRefresh"
 	Main FunctionLabel = "main"
 	Poll FunctionLabel = "poll"
 	Reconfigure FunctionLabel = "reconfigure"