GCE ephemeral storage on local SSDs #4318
Conversation
Welcome @adrienjt!
cc: @jayantjain93 I think you're the one with the most context to review
if disk.Type == "SCRATCH" && disk.InitializeParams.DiskType == "local-ssd" {
	// we could just multiply ssdCount by a constant 375GiB, because all GCE local SSDs are 375GiB
	// but this loop verifies that the instance template has at least the requested number of local SSDs
	// and is forward-compatible with a potentially different size
This doesn't make the change forward-compatible IMO.
This only works for the case when a new disk size is introduced and all the disks for the node are of the same size. In all other cases, this will introduce a subtle bug.
I would recommend that we use a constant 375GiB.
This only works for the case when a new disk size is introduced and all the disks for the node are of the same size.
That's what I had in mind, e.g., imagine that a new machine family is introduced with 500GiB local SSDs. It wouldn't make much sense to have local SSDs of different sizes on the same node IMO, because that's not RAID-friendly, but to avoid subtle bugs, I could continue the loop to check that all disks are the same size. Wdyt?
imagine that a new machine family is introduced with 500GiB local SSDs. It wouldn't make much sense to have local SSDs of different sizes on the same node IMO
I am trying to avoid speculating about what GCE might end up supporting.
I would just do away with all the size checking too. If we're not supporting any disk size other than 375GiB, there isn't really a point in running the loop, because running the loop assumes all disks are of size x, but we don't know how x will evolve or whether all disks will be the same size.
We should at least make sure that enough local SSDs are attached. A loop is needed to count them. So we might as well also check that all disks are the same size, and use that size.
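To make the discussion concrete, here is a minimal sketch of the loop in question (variable names follow the diff fragments quoted later in this thread; treat it as illustrative, not the committed code):

var count, diskSizeGiB int64
for _, disk := range instanceProperties.Disks {
	if disk == nil || disk.InitializeParams == nil {
		continue
	}
	if disk.Type == "SCRATCH" && disk.InitializeParams.DiskType == "local-ssd" {
		if count == 0 {
			// remember the first local SSD's size...
			diskSizeGiB = disk.InitializeParams.DiskSizeGb
		} else if diskSizeGiB != disk.InitializeParams.DiskSizeGb {
			// ...and reject templates that mix sizes
			return 0, fmt.Errorf("local SSDs of different sizes are not supported")
		}
		count++
	}
}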
	// we could just multiply ssdCount by a constant 375GiB, because all GCE local SSDs are 375GiB
	// but this loop verifies that the instance template has at least the requested number of local SSDs
	// and is forward-compatible with a potentially different size
	count++
That is correct.
@@ -228,18 +231,61 @@ func (t *GceTemplateBuilder) BuildNodeFromTemplate(mig Mig, template *gce.Instan
	return &node, nil
}

// isEphemeralStorageWithInstanceTemplateDisabled will allow bypassing Disk Size of Boot Disk from being
func ephemeralStorageLocalSSDCount(kubeEnvValue string) int64 {
	v, found, err := extractAutoscalerVarFromKubeEnv(kubeEnvValue, "EPH_STORAGE_LOCAL_SSD_COUNT")
I would change the variable name to ephemeral-storage-local-ssd-count
Works for me. With underscores though, right? Most variables in AUTOSCALER_ENV_VARS are lowercase with underscores. For this one, I was just trying to be consistent with BLOCK_EPH_STORAGE_BOOT_DISK... which isn't consistent with other variables, but is the "closest relative".
I thought you were trying to be consistent with BLOCK_EPH_STORAGE_BOOT_DISK, but that is a special-case flag used for blocking, whereas this is a capacity.
Underscores sound fine; I might have missed the hierarchy for AUTOSCALER_ENV_VARS.
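For illustration, a hedged sketch of how the renamed variable would be consumed, assuming the ephemeralStorageLocalSSDCount helper from the diff (the surrounding kube-env contents are placeholders):

// Illustrative kube-env value; the other variables are placeholders.
kubeEnv := "AUTOSCALER_ENV_VARS: os=linux;os_distribution=cos;ephemeral_storage_local_ssd_count=2\n"
count := ephemeralStorageLocalSSDCount(kubeEnv) // expected to return 2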
storage := int64(math.Ceil(0.015635*float64(diskSize))) + int64(math.Ceil(4.148*GiB)) // os partition estimation
storage += int64(math.Min(100*MiB, math.Ceil(0.001*float64(diskSize)))) // over-provisioning buffer
if !ephemeralStorageOnLocalSSDs {
	storage += int64(math.Ceil(4.148 * GiB)) // os partition estimation
Are these values/changes verified, and what process did you follow to test this?
The current values were obtained experimentally, and no testing was done to segregate the OS partition from the filesystem overhead, hence I'm not sure these values correctly estimate the ephemeral storage on local SSDs.
I've observed that the value given by df (the 1K-blocks column), which is what the kubelet uses as capacity if system reserved is 0, is 386913664 KiB (~369 GiB) on our system (not GKE, custom Ubuntu node images) for a single 375 GiB local SSD. The overhead is therefore 1.60%, close to your value for COS and lower than your value for Ubuntu, but that's okay: I'd rather reserve more than not enough.
Note: when I increase the number of disks to 4 or 8, RAID increases the overhead to 1.61%.
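As a quick sanity check on the 1.60% figure (plain arithmetic from the numbers quoted above):

package main

import "fmt"

func main() {
	const rawKiB = 375 * 1024 * 1024 // one 375 GiB local SSD, in KiB
	const dfKiB = 386913664          // 1K-blocks reported by df
	overhead := float64(rawKiB-dfKiB) / float64(rawKiB)
	fmt.Printf("filesystem overhead: %.2f%%\n", overhead*100) // prints 1.60%
}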
Do you think it is possible to test this on the standard node images for COS and Ubuntu (used in GKE)?
We want to make sure that it works well for most disk counts, and we would be okay with an additional offset for 4 to 8 disks if we can map this data more accurately.
We spent an extensive amount of time estimating the values used here, and deviations had caused major supportability issues for us, hence my concern about checking these values.
If you're okay sharing some data files, I wouldn't mind having a look.
https://docs.google.com/spreadsheets/d/1j0dWPXlr8J24H5af_piYGCb0zVg51P9HCLbz6IvOofo/edit?usp=sharing
See new code for how I use the results from the GKE experiment.
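For reference, a sketch of the shape of the resulting lookup table (the map name appears in the new code quoted below; the values here are placeholders, not the measured data from the spreadsheet):

var ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount = map[OperatingSystemDistribution]map[int64]int64{
	// keys are measured disk counts (1..8, 16, 24); values are overheads in KiB
	OperatingSystemDistributionCOS:    {1: 0 /* placeholder */, 8: 0 /* placeholder */, 16: 0 /* placeholder */, 24: 0 /* placeholder */},
	OperatingSystemDistributionUbuntu: {1: 0 /* placeholder */, 8: 0 /* placeholder */, 16: 0 /* placeholder */, 24: 0 /* placeholder */},
}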
storage := int64(math.Ceil(0.03083*float64(diskSize))) + int64(math.Ceil(0.171*GiB)) // os partition estimation
storage += int64(math.Min(100*MiB, math.Ceil(0.001*float64(diskSize)))) // over-provisioning buffer
if !ephemeralStorageOnLocalSSDs {
	storage += int64(math.Ceil(0.171 * GiB)) // os partition estimation
Same issue as above about testing.
if tc.kubeEnv != "" {
	template.Properties.Metadata.Items = []*gce.MetadataItems{{Key: "kube-env", Value: &tc.kubeEnv}}
}
node, err := tb.BuildNodeFromTemplate(mig, template, tc.physicalCpu, tc.physicalMemory, tc.pods)
node, err := tb.BuildNodeFromTemplate(mig, template, physicalCpu, physicalMemory, tc.pods)
Removing tc. uses a static constant and forgoes testing of the cases.
We need to use tc. so testing can be done on an individual level. This closes the door on every test using a different value for physicalCpu and physicalMemory other than those defined as a static value.
- The same values are used for all tests.
- These values aren't variables of the system under test. As explained in the comment that I added, the test itself computes expected CPU/memory capacity/allocatable, so it doesn't make sense to use different values for different test cases.
As I mentioned, all the values are the same (since we end up copy-pasting the scenarios with additional fields), but those values are still being individually tested.
With static values we will not be able to unit test different values for BuildNodeFromTemplate, which is compared in line:216.
I'm not seeing any gains from making this static. Maybe I'm missing context?
I was trying to wrap my head around what this test actually tests, since it's calling submethods BuildCapacity and CalculateAllocatable in the test itself. It turns out physicalCpu and physicalMemory are test invariants: changing their values doesn't affect the test results, because they are used to compute the expected results as well.
That being said, someone could add logic in BuildNodeFromTemplate, outside of BuildCapacity and CalculateAllocatable, that would use physicalCpu and physicalMemory as variables, in which case they wouldn't be invariants anymore, but that's not currently the case.
Anyway, I've dropped my changes in this regard because I don't want this PR to be blocked because of my attempt to clarify test boundaries. I'm happy to let someone else fix that later, if anyone cares.
Type: "SCRATCH", | ||
InitializeParams: &gce.AttachedDiskInitializeParams{ | ||
DiskType: "local-ssd", | ||
DiskSizeGb: localSSDSizeGiB, |
I would recommend not using a static value for DiskSizeGb; instead, define a variable in the test struct, making the testing scenarios extendable.
Correct. I'll adjust if we decide to keep it non-static.
Done.
			},
		},
	},
},
}
for i := int64(0); i < tc.attachedLocalSSDCount; i++ {
	template.Properties.Disks = append(template.Properties.Disks, &gce.AttachedDisk{
I would think defining the template may become obsolete if we use a static value for Disk size.
Correct. I'll adjust if we do.
not using a static value
var physicalCpu int64 = 8
var physicalMemory int64 = 200 * units.MiB
var bootDiskSizeGiB int64 = 300
I recommend not introducing these variables.
The values they refer to are test cases, and each should be considered individually (even if they share a common value).
see reply to other comment, esp. (2): #4318 (comment)
reverted, cf. #4318 (comment)
physicalEphemeralStorage = 0
}
capacity, err := tb.BuildCapacity(tc.physicalCpu, tc.physicalMemory, tc.accelerators, OperatingSystemLinux, OperatingSystemDistributionCOS, physicalEphemeralStorage*units.GiB, tc.pods)
capacity, err := tb.BuildCapacity(physicalCpu, physicalMemory, tc.accelerators, OperatingSystemLinux, OperatingSystemDistributionCOS, tc.physicalEphemeralStorageGiB*units.GiB, tc.ephemeralStorageOnLocalSSD, tc.pods)
This statement is by design: to test reading a value in the template with the block flag vs. the correct value determined by tc.isEphemeralStorageBlocked.
The test case loses its validity by changing the value in line:125.
The value in line 125 is the expected physicalEphemeralStorageGiB (before reservation). It includes the expectation that isEphemeralStorageBlocked. For tests where isEphemeralStorageBlocked was true, I expect physicalEphemeralStorageGiB to be zero.
That isn't correct. This flag is used for local SSD cases, where a boot disk is present but we elect not to acknowledge its size as boot disk. This is testing the flag's control over the expected value; the non-zero value is set in the instance template.
If any functional change happens, we expect to catch it here.
reverted, cf. #4318 (comment)
@MaciekPytel Thanks for the heads up. @adrienjt I have reviewed the changes and left some comments.
@jayantjain93 thank you for your review. I have replied to your comments, please let me know what you think and I'll update the code once we agree.
return storage
case OperatingSystemDistributionWindowsLTSC, OperatingSystemDistributionWindowsSAC:
storage := int64(math.Ceil(0.1133 * GiB)) // os partition estimation
storage += int64(math.Ceil(0.010 * GiB)) // over-provisioning buffer
storage += int64(math.Ceil(0.1133 * GiB)) // os partition estimation
These are experimental observations across multiple machine types and zones, so the formulas are the closest representation of what we observed.
I would confirm the values for Windows on SSDs as well.
Force-pushed from 3455568 to 3e19bac
@jayantjain93 I have addressed your comments and updated the PR. Please let me know what you think.
Thanks for the follow up @adrienjt. I did start having a look, but it may take me a couple days to get back to you.
Just a couple of comments.
Also, the PR workflow has run and is passing the unit tests.
Rest is LGTM.
storage += int64(math.Min(100*MiB, math.Ceil(0.001*float64(diskSize)))) // over-provisioning buffer
return storage
if ephemeralStorageLocalSSDCount > 0 {
	storage = 1024 * ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[OperatingSystemDistributionCOS][mapActualToMeasuredEphemeralStorageLocalSSDCount(ephemeralStorageLocalSSDCount)]
It seems cleaner to just return the actual overhead from the function rather than return a disk count that again indexes into some array. Maybe refactoring this to return the reserved storage for a given count/OS would be more readable, e.g.:

func reservedEphemeralStorageForLocalSSD(diskCount int64, osDistribution OperatingSystemDistribution) (reservedStorage int64) {
}

And let all the complexity be handled there, including the KiB-to-B conversion.
Also, it's not really OS-reserved, but really a filesystem overhead, so it would make sense to move this out of CalculateOSReservedEphemeralStorage.
	diskSizeGiB = disk.InitializeParams.DiskSizeGb
} else if diskSizeGiB != disk.InitializeParams.DiskSizeGb {
	return 0, fmt.Errorf("local SSDs of different sizes are not supported")
}
since #4318 (comment) is outdated, starting a new thread.
I like the idea of verifying the count and disk size, but I think the disk size check could be implemented differently.
I would think of something like this:

for _, disk := range instanceProperties.Disks {
	if disk != nil && disk.InitializeParams != nil {
		if disk.Type == "SCRATCH" && disk.InitializeParams.DiskType == "local-ssd" && disk.InitializeParams.DiskSizeGb == LOCAL_SSD_DISK_SIZE {
			count++
		}
	}
}
if count < ssdCount {
	return 0, fmt.Errorf("actual local SSD count is lower than ephemeral_storage_local_ssd_count")
}

This avoids tracking the disk size separately and keeps the check relevant.
Thoughts?
In your proposal, it's okay if the template has ssdCount disks of size LOCAL_SSD_DISK_SIZE plus N additional disks of any size. I guess that's all we need. The check I had ("local SSDs of different sizes are not supported") was indeed too strict about mixing, yet would have allowed any uniform disk size for which experimental filesystem overheads have not been measured. In practice, all disks are 375GB at the moment, so the checks are equivalent, but I'll switch to yours.
Force-pushed from d4b92c2 to 5efd026
func mapActualToMeasuredEphemeralStorageLocalSSDCount(count int64) int64 {
	if count <= 8 {
		return count
	}
	if count <= 16 {
		return 16
	}
	return 24 // max attachable
}

// EphemeralStorageOnLocalSSDFilesystemOverheadInBytes estimates the difference
// between the total physical capacity of the local SSDs and the ephemeral
// storage filesystem capacity. It uses experimental values measured in GKE,
// which are good-enough approximations for custom Kubernetes on GCE.
func EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
	c := mapActualToMeasuredEphemeralStorageLocalSSDCount(diskCount)
	o, ok := ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[osDistribution]
	if !ok {
		klog.Errorf("Ephemeral storage backed by local SSDs is not supported for image family %v", osDistribution)
		return 0
	}
	return 1024 * o[c]
}
I don't see any reason to keep these two functions separate.
Having a local variable measurableCount etc. in EphemeralStorageOnLocalSSDFilesystemOverheadInBytes would work.
Fixed.
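A sketch of what the merged function might look like after this fix (names taken from the quoted diff; illustrative rather than the exact committed code):

func EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
	// Clamp the actual disk count to the nearest measured data point,
	// inlining the former mapActualToMeasuredEphemeralStorageLocalSSDCount.
	measuredCount := diskCount
	if diskCount > 16 {
		measuredCount = 24 // max attachable
	} else if diskCount > 8 {
		measuredCount = 16
	}
	o, ok := ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[osDistribution]
	if !ok {
		klog.Errorf("Ephemeral storage backed by local SSDs is not supported for image family %v", osDistribution)
		return 0
	}
	return 1024 * o[measuredCount]
}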
// which are good-enough approximations for custom Kubernetes on GCE.
func EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
	c := mapActualToMeasuredEphemeralStorageLocalSSDCount(diskCount)
	o, ok := ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[osDistribution]
With this new refactor, the OperatingSystemDistributionCOSContainerd and OperatingSystemDistributionUbuntuContainerd distributions become unsupported.
I don't think this was intentional, since the overheads are the same for the containerd and docker runtimes.
Good catch, thank you!
Added a test to capture that.
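One plausible shape for the fix, assuming the containerd variants share overheads with their docker counterparts as noted above (a sketch, not necessarily the committed change):

// Map containerd distributions onto their docker counterparts before the lookup.
switch osDistribution {
case OperatingSystemDistributionCOSContainerd:
	osDistribution = OperatingSystemDistributionCOS
case OperatingSystemDistributionUbuntuContainerd:
	osDistribution = OperatingSystemDistributionUbuntu
}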
	return int64(n)
}

const localSSDDiskSizeInGiB = 375
Please move this to the top of the file with the other consts (line:45), and start the variable name with a capital: LocalSSDDiskSizeInGiB.
physicalMemory: 200 * units.MiB,
ephemeralStorageLocalSSDCount: 2,
attachedLocalSSDCount: 4,
expectedErr: false,
},
I think we might need a test case with ephemeral storage in kube_reserved and local SSDs.
Can you confirm that kube_reserved is using local SSDs as ephemeral storage, and not the boot disk, in the current case?
I confirm. Adding a test...
/ok-to-test
If ephemeral_storage_local_ssd_count=N is specified in kube-env (in AUTOSCALER_ENV_VARS), physical ephemeral storage is N*375GiB instead of the boot disk size, and capacity is measured from GKE experiments. The existing BLOCK_EPH_STORAGE_BOOT_DISK is ignored if ephemeral_storage_local_ssd_count>0.
Force-pushed from 41111b0 to d3dd856
/lgtm
@MaciekPytel, this needs to be approved for testing, and needs a final approval.
/assign @MaciekPytel
if count < ssdCount {
	return 0, fmt.Errorf("actual local SSD count is lower than ephemeral_storage_local_ssd_count")
}
Apologies for a dumb question, but why do we need both kube-env var and instance property? Wouldn't it be simpler to just rely on init params? Is there any use case for having an instance with more local SSDs than configured in kube-env?
No worries. We use a subset of local SSDs as a persistent volume and the rest for ephemeral storage.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: adrienjt, jayantjain93, MaciekPytel. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Dunno if anyone will see this, but is it possible that this PR works for SAS attached local-ssd but not NVMe attached? Seeing this behavior on GKE.
This PR helps for self-managed clusters in GCE, but doesn't work out of the box for GKE. We need some changes on the GKE side to set up env vars correctly first.
That certainly 'sounds' like a simple fix. Why is it taking a year to implement?
Yeah, it should be simple, it just didn't get prioritized (this PR was sent by an external contributor). Let me check if that's indeed the only change needed and get back to you later this week.
So I did a quick prototype and it passed an otherwise failing scale-from-0 scenario. I'll work on getting this change in.
This should work as soon as cluster versions higher than 1.24.3-gke.2200 become available.
Might want to update the limitations page with a note:
Yup, the docs were updated. The feature is available from 1.24.5-gke.600 onwards.
If EPH_STORAGE_LOCAL_SSD_COUNT=N is specified in kube-env (in AUTOSCALER_ENV_VARS), physical ephemeral storage is N*375GiB instead of the boot disk size, and only the variable part of OS reserved (filesystem overhead vs. fixed OS partition) is subtracted to compute capacity.
The existing BLOCK_EPH_STORAGE_BOOT_DISK is ignored if EPH_STORAGE_LOCAL_SSD_COUNT>0.
fixes #4281 and #1869
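To make the capacity computation concrete, a hedged example for N=2 local SSDs, reusing names from the diff (units.GiB as used elsewhere in the tests; the overhead value comes from the measured table):

physical := int64(2) * 375 * units.GiB // ephemeral_storage_local_ssd_count=2
overhead := EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(2, OperatingSystemDistributionCOS)
capacity := physical - overhead // reported ephemeral storage capacity, before kube-reserved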