-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cap IOPS when calculating from iopsPerGB #809
Conversation
Pull Request Test Coverage Report for Build 1771
💛 - Coveralls |
/retest |
Did I break it? |
That looks related to the recent change I made. I'll take a look today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @AndyXiangLi for verifying EBS limits.
func capIOPS(requestedCapacityGiB int64, requstedIOPSPerGB, minTotalIOPS, maxTotalIOPS, maxIOPSPerGB int64) int64 { | ||
iops := requestedCapacityGiB * requstedIOPSPerGB | ||
|
||
if iops < minTotalIOPS { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of this change. This might end up in a surprise bill for the customer, so if IOPS is smaller than minimum we should just return and explicit error instead of increasing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider the alternative as described in PR description is forcing the user to inspect the storageclass then edit their PVC to be larger which also incurs a larger bill. Using example in PR description, two scenarios:
- We accept this PR and always increase iops to the floor: 5GiB at 100iops is 5*.125+100*.065 = $7.125/month.
- We force user to set the iops to the floor: 10GiB at 100 iops is 10*.125+100*.065 = $7.75/month (+ the cost of their time to visit aws docs and pull out their calculator).
This is just one example **, I don't really want to put the effort into drawing graphs. I think as long as we document this behavior it will be fine. Admins will appreciate they do not have to estimate their users' behavior (if they are on a tight budget they can just set iopspergb to 1 and call it a day) and users do not have to pull out their calculators.
So atm I am inclined to accept the PR.
edit: cost per gb is .065 not .0625
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my reasoning. If a user expects their volume to cost, say, $5 with the values they're setting, it becomes a problem at scale. The difference on paper is only $2, but what if you have 1000 of such volumes across 10 regions? Suddenly your bill is 2 x 10 x 1000 more than you expect. Am I missing something?
We can include the limits in the error message (or even a direct link to the docs) to make it a little bit easier on the user. I understand it's still not a great user experience, but I think it's preferable to having a surprise bill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be safe, my idea is we don't make decision for customers, customer can adjust the iopsPerGB or volume size per their requirement as long as we deliver the message clearly (Like Ayberk mentioned, attach the doc link in the error message, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user is mistaken in their $5 calculation if it's predicated on their volume having an impossible GB+IOPS configuration. The choice is not between $7 and $5, it's between $7 (increase both size and iops), $6 (increase only iops), and not running the Pod w/ EBS at all. So really we are debating whether we should let the user know about their mistake or fix it on their behalf. I prefer the latter as long as we document it, let me spell it out.
(Keep in mind on a multi-tenant cluster the person creating PVCs doesn't necessarily have permissions to create StorageClasses. Also the above numbers are imaginary but you get my point)
Say the driver provisioning for StorageClass A with iopspergb x is presented with an impossible GB+IOPS configuration. To remedy the situation, something/somebody must:
- Increase SIZE and IOPS of volume such that its IOPS is greater than the minimum.
- Increase IOPS of volume such that its IOPS is greater than the minimum.
- Decide that fixing the volume to be a valid configuration would be too expensive, so do nothing, and use EFS instead or just not run their workload.
Note option 1) is ALWAYS more expensive than option 2).
If the driver throws an error and refuses to create the volume (like today), option 2 is unavailable. Because for StorageClass A there is no way for them to control iops except by increasing size of volume. So if they want to run their stateful workload they are forced to go with the more expensive option 1
If the driver fixes the volume for the user (if we accept this PR as-is), option 3 is unavailable. I don't think this option 3 would be that popular. I think people would prefer their workload to run w/ EBS than not.
IF this is so contentious, we can make it a flag that defaults to current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open to remove increasing the IOPS, however, but it must come with a very clear error message. The current one is very confusing. With 4 GiB volume and 1 IOPS per GB, I get:
could not create volume in EC2: InvalidParameterValue: Volume iops of 4 is too low; minimum is 100.
But there is no iops
setting in the storage class. Only iopsPerGB
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
talked to @msau42 on slack
What about this:
- Let's cap max IOPS to get within the supported range. I don't see any controversy here.
- For CSI storage classes, do not increase iops to the supported range and return error (= as it is today, only with a nicer error message).
- However, increase the iops for storage classes migrated from in-tree, to preserve existing behavior.
How to distinguish these two in CreateVolume
?
- CSI translation lib can add
alpha.csi.storage.k8s.io/migrated
parameter to CSI StorageClass when it's migrated from in-tree. Note: this translation is in-memory, in the external-provisioner. It's not stored in API server. - The driver sees
alpha.csi.storage.k8s.io/migrated
parameter in CreateVolume. - When users create a CSI StorageClass with
alpha.csi.storage.k8s.io/migrated
parameter, it's their problem. We will not document the parameter and we reserve the right to remove it in the future. That's why thealpha
prefix, even though it will be there probably forever :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When users create a CSI StorageClass with alpha.csi.storage.k8s.io/migrated parameter, it's their problem. We will not document the parameter and we reserve the right to remove it in the future. That's why the alpha prefix, even though it will be there probably forever :-(
This also has a nice side-effect of enabling customers to bypass the error by adding this. Feels dirty, but I like it :P
I am all for this approach. It's the best of both worlds: we avoid the surprise bill issue and we don't change the behavior for transitioning volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make it an explicit parameter then like ignoreIopsTooLow
(terrible name but ya)? If you introduce this hack then someone will rely on it and complain, doesn't amtter if it says alpha. migration would silently add the parameter, but for new StorageClasses admin will have the choice whether to err on side of ensuring all PVCs get a PV even if overprovisioned v.s. ensuring all PVCs either get an exactly-provisoined PV or no PV at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added parameter allowAutoIOPSPerGBIncrease
(naming is hard!) + unit tests + documentation, PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this as long as we edit README too
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane, wongma7 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold |
/retest |
BTW, my motivations for this PR are:
(I don't like |
Updated README. |
pkg/cloud/cloud.go
Outdated
io1MaxTotalIOPS = 64000 | ||
io1MaxIOPSPerGB = 50 | ||
io2MinTotalIOPS = 100 | ||
io2MaxTotalIOPS = 256000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be 64k, similar to io1. EBS team told us 256k is for block express, which is currently opt-in, so for most customers the limit is still 64k.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lowered to 64k.
Consider AWS IOPS limits when calculating IOPS from iopsPerGB to always provision a supported volume.
/test pull-aws-ebs-csi-driver-unit |
@@ -154,6 +155,8 @@ func (d *controllerService) CreateVolume(ctx context.Context, req *csi.CreateVol | |||
if err != nil { | |||
return nil, status.Errorf(codes.InvalidArgument, "Could not parse invalid iopsPerGB: %v", err) | |||
} | |||
case AllowAutoIOPSPerGBIncreaseKey: | |||
allowIOPSPerGBIncrease = value == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this case insensitive? Since the keys are, it'd be more consistent.
createType = diskOptions.VolumeType | ||
iops = capacityGiB * int64(diskOptions.IOPSPerGB) | ||
iops, err = capIOPS(diskOptions.VolumeType, capacityGiB, int64(diskOptions.IOPSPerGB), io2MinTotalIOPS, io2MaxTotalIOPS, io2MaxIOPSPerGB, diskOptions.AllowIOPSPerGBIncrease) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for now, but it might make more sense to create a struct for volume types with each encapsulating their limits, eventually. cc @AndyXiangLi something to keep in mind if we add more types with similar logic.
/unhold |
Not sure if I missed any comment,IMO it's ready for review. @AndyXiangLi @wongma7 PTAL. |
This looks good to me. Thank you! |
With this merged, I should update CSI translation lib to add |
IMO it's fine, we only need to support the combo of external-provisioner+driver we release+test in our helm chart. I guess someone out there may try using new external-provisioner and old driver and run into this issue but we shouldnt have to support it : / |
Filed kubernetes/kubernetes#101082 to update the translation. |
Is this a bug fix or adding new feature?
bugfix
What is this PR about? / Why do we need it?
Consider AWS IOPS limits when calculating IOPS from
iopsPerGB
to always provision a supported volume. Users should be create a StorageClass "fast" that provides max. IOPS that AWS can support for any volume size they provide, without errors.Without the PR, users must calculate
iopsPerGB * PVC size
and choose the right PVC size where resulting IOPS does not violate AWS IOPS limits. For example, with iopsPerGB = 10 and PVC size 5 GB, they get errorVolume iops of 50 is too low; minimum is 100.
Please check the current AWS limits carefully. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html is not exactly clear what's max IOPS per GB for io2 volumes - on one place it's 500:1 and on another 1000:1, 500 seems to be the right ratio.
What testing is done?
Wrote few new unit tests, checking only IOPS in
CreateVolumeInput
.