fix CloudProvider metric #1031

cjerad · 2021-12-20T16:08:21Z

1. Issue, if available:
None

2. Description of changes:
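Previously, the latency recorded for the CloudProvider.Create() method could understate the actual duration, because part of the call was not covered by the measurement. Now the latency "start time" is captured at the beginning of Create(), so the whole call is measured.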

3. Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: link to issue
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify · 2021-12-20T16:08:27Z

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: d62d1ee

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61c0f3cc80c6760008f2a987

ellistarn · 2021-12-20T17:08:27Z

pkg/cloudprovider/metrics/cloudprovider.go

@@ -67,9 +67,10 @@ func Decorate(cloudProvider cloudprovider.CloudProvider) cloudprovider.CloudProv
 }

 func (d *decorator) Create(ctx context.Context, constraints *v1alpha5.Constraints, instanceTypes []cloudprovider.InstanceType, quantity int, callback func(*v1.Node) error) <-chan error {
+	recordLatency := metrics.Measure(methodDurationHistogramVec.WithLabelValues(getControllerName(ctx), "Create", d.Name()))


The cloud provider was originally written to be async (i.e. return a chan) in order to support batching on the cloud provider side. Instead, batching was built into the arguments (i.e. quantity).

I don't think it would be crazy to change the cloud provider to return an error instead of a `<-chan error`, which would simplify the metrics implementation.
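
A rough sketch of why the error-returning shape is simpler to instrument. Everything here is hypothetical and simplified for illustration: the `recorder` type stands in for the histogram, the provider interfaces take only a `quantity`, and the fake providers just sleep. With a channel return, the decorator needs a goroutine to observe completion; with an error return, a single `defer` is enough.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// recorder is a hypothetical stand-in for a latency histogram.
type recorder struct{ samples []time.Duration }

// measure captures the start time and returns a function that records
// the elapsed time when called.
func (r *recorder) measure() func() {
	start := time.Now()
	return func() { r.samples = append(r.samples, time.Since(start)) }
}

// asyncProvider mirrors the channel-returning shape (simplified).
type asyncProvider interface{ Create(quantity int) <-chan error }

// syncProvider mirrors the proposed error-returning shape (simplified).
type syncProvider interface{ Create(quantity int) error }

type asyncDecorator struct {
	inner asyncProvider
	rec   *recorder
}

// Create must forward the result through a goroutine to know when to record.
func (d asyncDecorator) Create(quantity int) <-chan error {
	done := d.rec.measure()
	out := make(chan error, 1)
	go func() {
		err := <-d.inner.Create(quantity)
		done() // record before handing the result back
		out <- err
	}()
	return out
}

type syncDecorator struct {
	inner syncProvider
	rec   *recorder
}

// Create starts the timer immediately and records when the call returns.
func (d syncDecorator) Create(quantity int) error {
	defer d.rec.measure()()
	return d.inner.Create(quantity)
}

type fakeSync struct{}

func (fakeSync) Create(quantity int) error {
	if quantity <= 0 {
		return errors.New("quantity must be positive")
	}
	time.Sleep(5 * time.Millisecond) // simulated launch time
	return nil
}

type fakeAsync struct{}

func (fakeAsync) Create(quantity int) <-chan error {
	out := make(chan error, 1)
	go func() { out <- fakeSync{}.Create(quantity) }()
	return out
}

func main() {
	rec := &recorder{}
	a := asyncDecorator{inner: fakeAsync{}, rec: rec}
	s := syncDecorator{inner: fakeSync{}, rec: rec}
	fmt.Println("async error:", <-a.Create(1))
	fmt.Println("sync error: ", s.Create(1))
	fmt.Println("samples recorded:", len(rec.samples))
}
```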

I have a beginner question. Does removing the async design prevent us from performing time-consuming callbacks when a node is launched? For example, can I still wait until the node is ready in the callback function?

fix CloudProvider metric

4b4ce94

ellistarn reviewed Dec 20, 2021

refactor CloudProvider.Create() to return an error

75492a4

cjerad marked this pull request as ready for review December 20, 2021 20:40

remove unused work queue

d62d1ee

ellistarn approved these changes Dec 23, 2021

ellistarn merged commit 09674aa into aws:main Dec 23, 2021

cjerad deleted the fix-metric branch January 3, 2022 15:26

ellistarn Dec 20, 2021

felix-zhe-huang Dec 21, 2021
