fix CloudProvider metric #1031

Merged
merged 3 commits into from
Dec 23, 2021

@cjerad cjerad commented Dec 20, 2021

1. Issue, if available:
None

2. Description of changes:
Previously, the latency recorded for the CloudProvider.Create() method could under-report the time the call actually took. The latency "start time" is now set correctly.

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify bot commented Dec 20, 2021

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: d62d1ee

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61c0f3cc80c6760008f2a987

@@ -67,9 +67,10 @@ func Decorate(cloudProvider cloudprovider.CloudProvider) cloudprovider.CloudProv
}

func (d *decorator) Create(ctx context.Context, constraints *v1alpha5.Constraints, instanceTypes []cloudprovider.InstanceType, quantity int, callback func(*v1.Node) error) <-chan error {
recordLatency := metrics.Measure(methodDurationHistogramVec.WithLabelValues(getControllerName(ctx), "Create", d.Name()))

The cloud provider was originally written to be async (i.e. return a chan) in order to support batching on the cloud provider side. Instead, batching was built into the arguments (i.e. quantity).

I don't think it would be crazy to change the cloud provider to return an error instead of a chan error, which would simplify the metrics implementation.

I have a noob question. Does removing the async design prevent us from performing time-consuming callbacks when a node is launched? For example, can I still wait until the node is ready in the callback function?

@cjerad cjerad marked this pull request as ready for review December 20, 2021 20:40
@ellistarn ellistarn merged commit 09674aa into aws:main Dec 23, 2021
@cjerad cjerad deleted the fix-metric branch January 3, 2022 15:26