
GCE ephemeral storage on local SSDs #4318

Merged

Conversation

adrienjt
Contributor

@adrienjt adrienjt commented Sep 8, 2021

If EPH_STORAGE_LOCAL_SSD_COUNT=N is specified in kube-env (in AUTOSCALER_ENV_VARS), physical ephemeral storage is N*375GiB instead of the boot disk size, and only the variable part of the OS reservation (the filesystem overhead, as opposed to the fixed OS partition) is subtracted to compute capacity.

The existing BLOCK_EPH_STORAGE_BOOT_DISK is ignored if EPH_STORAGE_LOCAL_SSD_COUNT>0.

fixes #4281 and #1869
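
In pseudocode (illustrative names, not the exact ones in this change):

if ssdCount > 0 { // ssdCount parsed from EPH_STORAGE_LOCAL_SSD_COUNT in kube-env
	// boot disk size and BLOCK_EPH_STORAGE_BOOT_DISK are ignored
	capacity = ssdCount*375*GiB - filesystemOverhead // the fixed OS partition lives on the boot disk
} else {
	capacity = bootDiskSize - osReserved // OS partition + filesystem overhead + buffer
}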

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 8, 2021
@k8s-ci-robot
Contributor

Welcome @adrienjt!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 8, 2021
@MaciekPytel
Contributor

cc: @jayantjain93 I think you're the one with the most context to review

if disk.Type == "SCRATCH" && disk.InitializeParams.DiskType == "local-ssd" {
// we could just multiply ssdCount by a constant 375GiB, because all GCE local SSDs are 375GiB
// but this loop verifies that the instance template has at least the requested number of local SSDs
// and is forward-compatible with a potentially different size
Contributor

This doesn't make the change forward-compatible imo.

This only works for the case when a new disk size is introduced and all the disks for the node are of the same size. In all other cases, this will introduce an unexpected bug.

I would recommend that we use a constant 375GiB.

Contributor Author

This only works for the case when a new disk size is introduced and all the disks for the node are of the same size.

That's what I had in mind, e.g., imagine that a new machine family is introduced with 500GiB local SSDs. It wouldn't make much sense to have local SSDs of different sizes on the same node IMO, because that's not RAID-friendly, but to avoid unsuspecting bugs, I could continue the loop to check that all disks are the same size. Wdyt?

Contributor

imagine that a new machine family is introduced with 500GiB local SSDs. It wouldn't make much sense to have local SSDs of different sizes on the same node IMO

I am trying to avoid these speculations of what GCE might end up supporting.

I would just do away with all the size checking as well. If we're not supporting any disk size other than 375GiB, there isn't really a point in running the loop, because running the loop assumes all disks are of size x, but we don't know how x will evolve, or whether all disks will be the same size.

Contributor Author

We should at least make sure that enough local SSDs are attached. A loop is needed to count them. So we might as well also check that all disks are the same size, and use that size.
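
A sketch of what I mean (illustrative; the error message matches the check discussed further down):

var count, diskSizeGiB int64
for _, disk := range instanceProperties.Disks {
	if disk == nil || disk.InitializeParams == nil {
		continue
	}
	if disk.Type == "SCRATCH" && disk.InitializeParams.DiskType == "local-ssd" {
		if count > 0 && disk.InitializeParams.DiskSizeGb != diskSizeGiB {
			return 0, fmt.Errorf("local SSDs of different sizes are not supported")
		}
		diskSizeGiB = disk.InitializeParams.DiskSizeGb
		count++
	}
}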

// we could just multiply ssdCount by a constant 375GiB, because all GCE local SSDs are 375GiB
// but this loop verifies that the instance template has at least the requested number of local SSDs
// and is forward-compatible with a potentially different size
count++

This comment was marked as resolved.

Contributor Author

That is correct.

@@ -228,18 +231,61 @@ func (t *GceTemplateBuilder) BuildNodeFromTemplate(mig Mig, template *gce.Instan
return &node, nil
}

// isEphemeralStorageWithInstanceTemplateDisabled will allow bypassing Disk Size of Boot Disk from being
func ephemeralStorageLocalSSDCount(kubeEnvValue string) int64 {
v, found, err := extractAutoscalerVarFromKubeEnv(kubeEnvValue, "EPH_STORAGE_LOCAL_SSD_COUNT")
Contributor

I would change the variable name to ephemeral-storage-local-ssd-count

Contributor Author

Works for me. With underscores though, right? Most variables in AUTOSCALER_ENV_VARS are lowercase with underscores. For this one, I was just trying to be consistent with BLOCK_EPH_STORAGE_BOOT_DISK... which isn't consistent with other variables, but is the "closest relative".

Contributor

I thought you were trying to be consistent with BLOCK_EPH_STORAGE_BOOT_DISK, but that is a special-case flag used for blocking, whereas this is a capacity.

Underscores sound fine; I might have missed the hierarchy for AUTOSCALER_ENV_VARS.
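
For context, AUTOSCALER_ENV_VARS is a single kube-env entry holding semicolon-separated key=value pairs, e.g. (illustrative values):

AUTOSCALER_ENV_VARS: os_distribution=cos;os=linux;ephemeral_storage_local_ssd_count=2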

storage := int64(math.Ceil(0.015635*float64(diskSize))) + int64(math.Ceil(4.148*GiB)) // os partition estimation
storage += int64(math.Min(100*MiB, math.Ceil(0.001*float64(diskSize)))) // over-provisioning buffer
if !ephemeralStorageOnLocalSSDs {
storage += int64(math.Ceil(4.148 * GiB)) // os partition estimation
Contributor

Are these values/changes verified, and what process did you follow to test this?

The current values were obtained experimentally, and no testing was done to segregate the OS partition from the filesystem overhead, so I'm not sure these values correctly estimate the ephemeral storage on local SSDs.

Contributor Author

I've observed the value given by df (the 1K-blocks column), which is what the kubelet uses as capacity if system reserved is 0. On our system (not GKE, custom Ubuntu node images), it is 386913664 KiB (~369 GiB) for a single 375 GiB local SSD, so the overhead is 1.60%: close to your value for COS, and lower than your value for Ubuntu. That's okay though, I'd rather reserve more than not enough.

Note: when I increase the number of disks to 4 or 8, RAID increases the overhead to 1.61%.
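
For reference, the arithmetic behind the 1.60% figure:

375 GiB = 375 * 1024 * 1024 KiB = 393216000 KiB
393216000 KiB - 386913664 KiB = 6302336 KiB (~6.01 GiB of overhead)
6302336 / 393216000 ~= 1.60%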

Contributor

Do you think it is possible to test this on the standard node images for COS and Ubuntu (used in GKE)?

We want to make sure that it works well for most disk counts, and we would be okay with adding an extra offset for 4 to 8 disks if we can map this data more accurately.

We had spent an extensive amount of time estimating the values used here, and deviations in them have caused major supportability issues for us. Hence my concern about checking these values.

If you're okay to share some data files, I wouldn't mind having a look.

Contributor Author

storage := int64(math.Ceil(0.03083*float64(diskSize))) + int64(math.Ceil(0.171*GiB)) // os partition estimation
storage += int64(math.Min(100*MiB, math.Ceil(0.001*float64(diskSize)))) // over-provisioning buffer
if !ephemeralStorageOnLocalSSDs {
storage += int64(math.Ceil(0.171 * GiB)) // os partition estimation
Contributor

Same issue as above about testing.

if tc.kubeEnv != "" {
template.Properties.Metadata.Items = []*gce.MetadataItems{{Key: "kube-env", Value: &tc.kubeEnv}}
}
node, err := tb.BuildNodeFromTemplate(mig, template, tc.physicalCpu, tc.physicalMemory, tc.pods)
node, err := tb.BuildNodeFromTemplate(mig, template, physicalCpu, physicalMemory, tc.pods)
Contributor

Removing tc. uses a static constant and forgoes testing of the individual cases.

We need to use tc. so testing can be done at the individual level. This closes the door on each test using a different value for physicalCpu and physicalMemory; only the static values can be used.

Contributor Author

@adrienjt adrienjt Sep 13, 2021

  1. The same values are used for all tests.
  2. These values aren't variables of the system under test. As explained in the comment that I added, the test itself computes expected CPU/memory capacity/allocatable, so it doesn't make sense to use different values for different test cases.

Contributor

As I mentioned, all the values are the same (since we end up copy-pasting the scenarios with additional fields), but those values are still being individually tested. With static values we will not be able to unit test different values for BuildNodeFromTemplate, which is compared on line 216.

I'm not seeing any gains from making this static. Maybe I'm missing context?

Contributor Author

I was trying to wrap my head around what this test actually tests, since it's calling submethods BuildCapacity and CalculateAllocatable in the test itself. It turns out physicalCpu and physicalMemory are test invariants: changing their values doesn't affect the test results, because they are used to compute the expected results as well.

That being said, someone could add logic in BuildNodeFromTemplate, outside of BuildCapacity and CalculateAllocatable, that would use physicalCpu and physicalMemory as variables, in which case they wouldn't be invariants anymore, but that's not currently the case.

Anyway, I've dropped my changes in this regard because I don't want this PR to be blocked because of my attempt to clarify test boundaries. I'm happy to let someone else fix that later, if anyone cares.

Type: "SCRATCH",
InitializeParams: &gce.AttachedDiskInitializeParams{
DiskType: "local-ssd",
DiskSizeGb: localSSDSizeGiB,
Contributor

I would recommend not using a static value for DiskSizeGb; instead, define a variable in the test struct, making the test scenarios extensible.

Contributor Author

Correct. I'll adjust if we decide to keep it non-static.

Contributor Author

Done.

},
},
},
},
}
for i := int64(0); i < tc.attachedLocalSSDCount; i++ {
template.Properties.Disks = append(template.Properties.Disks, &gce.AttachedDisk{
Contributor

I would think defining the template may become obsolete if we use a static value for the disk size.

Contributor Author

Correct. I'll adjust if we do.

Contributor Author

not using a static value

Comment on lines 46 to 48
var physicalCpu int64 = 8
var physicalMemory int64 = 200 * units.MiB
var bootDiskSizeGiB int64 = 300
Contributor

I recommend not introducing these variables.
The values these variables refer to are test cases, and each should be considered individually (even if they may share a common value).

Contributor Author

@adrienjt adrienjt Sep 13, 2021

see reply to other comment, esp. (2): #4318 (comment)

Contributor Author

reverted, cf. #4318 (comment)

physicalEphemeralStorage = 0
}
capacity, err := tb.BuildCapacity(tc.physicalCpu, tc.physicalMemory, tc.accelerators, OperatingSystemLinux, OperatingSystemDistributionCOS, physicalEphemeralStorage*units.GiB, tc.pods)
capacity, err := tb.BuildCapacity(physicalCpu, physicalMemory, tc.accelerators, OperatingSystemLinux, OperatingSystemDistributionCOS, tc.physicalEphemeralStorageGiB*units.GiB, tc.ephemeralStorageOnLocalSSD, tc.pods)
Contributor

This statement is by design: it tests reading a value from the template with the Block flag vs the correct value, as determined by tc.isEphemeralStorageBlocked.
The test case loses its validity by changing the value on line 125.

Contributor Author

The value on line 125 is the expected physicalEphemeralStorageGiB (before reservation). It already includes the expectation that ephemeral storage is blocked: for tests where isEphemeralStorageBlocked was true, I expect physicalEphemeralStorageGiB to be zero.

Contributor

That isn't correct. This flag is used for local SSD cases, where a boot disk is present but we elect not to acknowledge its size as ephemeral storage. This is testing the flag's control over the expected value; the non-zero value is set in the instance template.

If any functional change happens we expect to catch it here.

Contributor Author

reverted, cf. #4318 (comment)

@jayantjain93
Contributor

jayantjain93 commented Sep 13, 2021

@MaciekPytel Thanks for the heads up.

@adrienjt I have reviewed the changes and left some comments.
p.s. Thanks for taking the time to support this feature.

@adrienjt
Contributor Author

@jayantjain93 thank you for your review. I have replied to your comments, please let me know what you think and I'll update the code once we agree.

return storage
case OperatingSystemDistributionWindowsLTSC, OperatingSystemDistributionWindowsSAC:
storage := int64(math.Ceil(0.1133 * GiB)) // os partition estimation
storage += int64(math.Ceil(0.010 * GiB)) // over-provisioning buffer
storage += int64(math.Ceil(0.1133 * GiB)) // os partition estimation
Contributor

These are experimental observations across multiple machine types and zones, so the formulas are the closest representation of what we observed.

I would confirm the values for Windows on local SSDs as well.


@adrienjt adrienjt force-pushed the gce-ephemeral-storage-local-ssd branch from 3455568 to 3e19bac on September 20, 2021
@adrienjt
Contributor Author

@jayantjain93 I have addressed your comments and updated the PR. Please let me know what you think.

@jayantjain93
Contributor

jayantjain93 commented Sep 22, 2021

@jayantjain93 I have addressed your comments and updated the PR. Please let me know what you think.

Thanks for the follow up @adrienjt. I did start having a look, but it may take me a couple days to get back to you.

Contributor

@jayantjain93 jayantjain93 left a comment

Just a couple of comments.
Also, the PR workflow has run and the unit tests are passing.
Rest is LGTM.

storage += int64(math.Min(100*MiB, math.Ceil(0.001*float64(diskSize)))) // over-provisioning buffer
return storage
if ephemeralStorageLocalSSDCount > 0 {
storage = 1024 * ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[OperatingSystemDistributionCOS][mapActualToMeasuredEphemeralStorageLocalSSDCount(ephemeralStorageLocalSSDCount)]
Contributor

It seems cleaner to just return the actual overhead from the function rather than to return a disk count that is then used to index some array. Maybe refactoring this to return the reservation for a given count/OS would be more readable, e.g.:

func reservedEphemeralStorageForLocalSSD(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
}

And let all the complexity be handled there, including the KiB-to-bytes conversion.

Contributor Author

Also, it's not really OS-reserved but rather a filesystem overhead, so it would make sense to move this out of CalculateOSReservedEphemeralStorage.

diskSizeGiB = disk.InitializeParams.DiskSizeGb
} else if diskSizeGiB != disk.InitializeParams.DiskSizeGb {
return 0, fmt.Errorf("local SSDs of different sizes are not supported")
}
Contributor

since #4318 (comment) is outdated, starting a new thread.

I like the idea of verifying the count and disk size, but I think the disk size check could be implemented differently.

I would think of something like this:

for _, disk := range instanceProperties.Disks {
	if disk != nil && disk.InitializeParams != nil {
		if disk.Type == "SCRATCH" && disk.InitializeParams.DiskType == "local-ssd" && disk.InitializeParams.DiskSizeGb == LOCAL_SSD_DISK_SIZE {
			count++
		}
	}
}

if count < ssdCount {
	return 0, fmt.Errorf("actual local SSD count is lower than ephemeral_storage_local_ssd_count")
}

and avoid fetching the size of the disk, keeping the check focused.
Thoughts?

Contributor Author

@adrienjt adrienjt Sep 22, 2021

In your proposal, it's okay if the template has ssdCount disks of size LOCAL_SSD_DISK_SIZE plus N additional disks of any size. I guess that's all we need. The check I had, "local SSDs of different sizes are not supported", was both too strict and too loose: it rejected mixed sizes, yet would have allowed any (uniform) disk size for which experimental filesystem overheads have not been measured. In practice all disks are 375GB at the moment, so the checks are equivalent, but I'll switch to yours.

@adrienjt adrienjt force-pushed the gce-ephemeral-storage-local-ssd branch 2 times, most recently from d4b92c2 to 5efd026 on September 24, 2021
Comment on lines 238 to 269
func mapActualToMeasuredEphemeralStorageLocalSSDCount(count int64) int64 {
if count <= 8 {
return count
}
if count <= 16 {
return 16
}
return 24 // max attachable
}

// EphemeralStorageOnLocalSSDFilesystemOverheadInBytes estimates the difference
// between the total physical capacity of the local SSDs and the ephemeral
// storage filesystem capacity. It uses experimental values measured in GKE,
// which are good-enough approximations for custom Kubernetes on GCE.
func EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
c := mapActualToMeasuredEphemeralStorageLocalSSDCount(diskCount)
o, ok := ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[osDistribution]
if !ok {
klog.Errorf("Ephemeral storage backed by local SSDs is not supported for image family %v", osDistribution)
return 0
}
return 1024 * o[c]
}
Contributor

I don't see any reason to keep these two functions separate.

A local variable, measuredCount or similar, would work inside EphemeralStorageOnLocalSSDFilesystemOverheadInBytes.

Contributor Author

Fixed.
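
For reference, the merged function looks roughly like this (a sketch reconstructed from the two functions quoted above; the committed code may differ slightly):

func EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
	// Overheads were measured for 1-8, 16, and 24 (max attachable) local SSDs;
	// map the actual count to the nearest measured count.
	measuredCount := diskCount
	if diskCount > 16 {
		measuredCount = 24
	} else if diskCount > 8 {
		measuredCount = 16
	}
	o, ok := ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[osDistribution]
	if !ok {
		klog.Errorf("Ephemeral storage backed by local SSDs is not supported for image family %v", osDistribution)
		return 0
	}
	return 1024 * o[measuredCount]
}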

// which are good-enough approximations for custom Kubernetes on GCE.
func EphemeralStorageOnLocalSSDFilesystemOverheadInBytes(diskCount int64, osDistribution OperatingSystemDistribution) int64 {
c := mapActualToMeasuredEphemeralStorageLocalSSDCount(diskCount)
o, ok := ephemeralStorageOnLocalSSDFilesystemOverheadInKiBByOSAndDiskCount[osDistribution]
Contributor

With this new refactor, the OperatingSystemDistributionCOSContainerd and OperatingSystemDistributionUbuntuContainerd OS distributions become unsupported.
I don't think this was intentional, since the overheads are the same for the containerd and docker runtimes.

Contributor Author

Good catch, thank you!
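
One way to keep them supported (a sketch; the actual fix may differ) is to map the containerd variants to their base distribution before the table lookup, since the overheads are identical:

switch osDistribution {
case OperatingSystemDistributionCOSContainerd:
	osDistribution = OperatingSystemDistributionCOS
case OperatingSystemDistributionUbuntuContainerd:
	osDistribution = OperatingSystemDistributionUbuntu
}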

Contributor Author

Added a test to capture that.

return int64(n)
}

const localSSDDiskSizeInGiB = 375
Contributor

Please move this to the top of the file with the other consts (line 45), and start the variable name with a capital letter: LocalSSDDiskSizeInGiB.

physicalMemory: 200 * units.MiB,
ephemeralStorageLocalSSDCount: 2,
attachedLocalSSDCount: 4,
expectedErr: false,
},
Contributor

I think we might need a test case with ephemeral storage in kube_reserved and local SSDs.

Can you confirm that kube_reserved is using local SSDs as ephemeral storage, and not the boot disk, in the current case?

Contributor Author

I confirm. Adding a test...

@jayantjain93
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Sep 27, 2021
If ephemeral_storage_local_ssd_count=N is specified in kube-env (in AUTOSCALER_ENV_VARS), physical ephemeral storage is N*375GiB instead of the boot disk size, and capacity is measured from GKE experiments.

The existing BLOCK_EPH_STORAGE_BOOT_DISK is ignored if ephemeral_storage_local_ssd_count>0.
@adrienjt adrienjt force-pushed the gce-ephemeral-storage-local-ssd branch from 41111b0 to d3dd856 on September 27, 2021
@jayantjain93
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 28, 2021
@jayantjain93
Contributor

@MaciekPytel, this needs to be approved for testing, and needs a final approval.

@jayantjain93
Contributor

/assign @MaciekPytel


if count < ssdCount {
return 0, fmt.Errorf("actual local SSD count is lower than ephemeral_storage_local_ssd_count")
}
Member

Apologies for a dumb question, but why do we need both kube-env var and instance property? Wouldn't it be simpler to just rely on init params? Is there any use case for having an instance with more local SSDs than configured in kube-env?

Contributor Author

No worries. We use a subset of local SSDs as a persistent volume and the rest for ephemeral storage.

@x13n
Member

x13n commented Oct 15, 2021

/lgtm

@MaciekPytel
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrienjt, jayantjain93, MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2021
@k8s-ci-robot k8s-ci-robot merged commit 47bc0f2 into kubernetes:master Oct 18, 2021
@brokenjacobs

Dunno if anyone will see this, but is it possible that this PR works for SAS-attached local SSDs but not NVMe-attached? Seeing this behavior on GKE v1.22.8-gke.202 and not sure if it's a regression or just no support for NVMe disks.

@x13n
Member

x13n commented Aug 8, 2022

This PR helps for self-managed clusters in GCE, but doesn't work out of the box for GKE. We need some changes on GKE side to set up env vars correctly first.

@brokenjacobs

This PR helps for self-managed clusters in GCE, but doesn't work out of the box for GKE. We need some changes on GKE side to set up env vars correctly first.

That certainly 'sounds' like a simple fix. Why is it taking a year to implement?

@x13n
Member

x13n commented Aug 9, 2022

Yeah, it should be simple, just didn't get prioritized (this PR was sent by an external contributor). Let me check if that's indeed the only change needed and get back to you later this week.

@x13n
Member

x13n commented Aug 11, 2022

So I did a quick prototype and it passed an otherwise failing scale-from-0 scenario. I'll work on getting this change in.

@x13n
Member

x13n commented Aug 24, 2022

This should work as soon as cluster versions higher than 1.24.3-gke.2200 become available.

@brokenjacobs

Might want to update the limitations page with a note:
https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#limitations

When Pods request ephemeral storage, cluster autoscaler does not support scaling up a node pool with zero nodes that uses [Local SSDs as ephemeral storage](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd#creating_a_node_pool_using_ephemeral_storage_on_local_ssds).

@x13n
Member

x13n commented Oct 14, 2022

Yup, the docs were updated. The feature is available from 1.24.5-gke.600 onwards.
