Batch EC2 DescribeInstances calls #1947
/lgtm pending scalability tests before approval
- c := newEC2Cloud(region, awsSdkDebugLog, userAgentExtra)
-
- if batching {
-     klog.V(4).InfoS("NewCloud: batching enabled")
-     cloudInstance, ok := c.(*cloud)
-     if !ok {
-         return nil, fmt.Errorf("expected *cloud type but got %T", c)
-     }
-     cloudInstance.bm = newBatcherManager(cloudInstance.ec2)
- }
-
+ c := newEC2Cloud(region, awsSdkDebugLog, userAgentExtra, batching)
Nice! passing the batching status down the helper is much cleaner 👍
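A minimal sketch of the pattern being praised here, assuming simplified stand-in types: when the constructor itself receives the batching flag, the caller no longer needs a type assertion to reach the concrete struct. The names `newCloud` and the empty `batcherManager` are hypothetical and are not the driver's actual definitions.

```go
package main

import "fmt"

// batcherManager is a hypothetical stand-in for the driver's batcher manager.
type batcherManager struct{}

// cloud is a simplified stand-in for the driver's cloud type.
type cloud struct {
	region string
	bm     *batcherManager // nil when batching is disabled
}

// newCloud (hypothetical) wires up batching inside the constructor, so
// callers never have to assert an interface value back to *cloud.
func newCloud(region string, batching bool) *cloud {
	c := &cloud{region: region}
	if batching {
		c.bm = &batcherManager{}
	}
	return c
}

func main() {
	fmt.Println(newCloud("us-west-2", true).bm != nil)  // true
	fmt.Println(newCloud("us-west-2", false).bm != nil) // false
}
```

Keeping the flag in the constructor also removes the "expected *cloud type" error path entirely, since there is no assertion left that could fail.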
pkg/cloud/cloud.go
Outdated
    return nil, r.Err
}
if r.Result == nil {
    return nil, fmt.Errorf("batchDescribeInstances: no instance found %s", task) // TODO Q: Should this be Cloud's ErrNotFound instead?
TODO Q: Should this be Cloud's ErrNotFound instead?
Yes +1 to this, that error type is more appropriate
I will change batchDescribeVolumes as well then, to be consistent.
3e14c19 to ee19d85
/retest
/lgtm
one nitpick
/hold Waiting on scalability tests.
ee19d85 to f320c1a
Settled on a 300ms instanceIDBatcher maxDelay by running scalability tests. We found that this value balanced
The table below shows the total number of DescribeInstances API calls made when attaching and detaching x volumes in a scalability test.
/unhold
/retest
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ConnorJC3
/retest
Is this a bug fix or adding new feature?
Feature
What is this PR about? / Why do we need it?
Coalesces EC2 DescribeInstances calls across ControllerPublishVolume/ControllerUnpublishVolume RPCs.
This decreases the likelihood of exceeding one's non-mutating API request token limit (i.e., being throttled for making too many Describe* calls).
What testing is done?
Manual testing on statefulset to check that DescribeInstances calls were batched across RPCs
CI