Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support scale up of nodepool with zero nodes with ephemeral storage r… #6607

Merged

Conversation

vbhargav875
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

With OCI Nodepools current implementation, we do not support scaling up of a nodepool with zero nodes when the pod spec was ephemeral-storage requests since does not know what ephemeral-storage size should be set to the node. With this PR we allow users to set the ephemeral-storage of a nodepool using OCI freeform tags.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/cluster-autoscaler labels Mar 11, 2024
@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch 2 times, most recently from 1d86b7a to 25b927f Compare March 11, 2024 15:02
node.Status.Capacity[apiv1.ResourceCPU] = *resource.NewQuantity(int64(shape.CPU), resource.DecimalSI)
node.Status.Capacity[apiv1.ResourceMemory] = *resource.NewQuantity(int64(shape.MemoryInBytes), resource.DecimalSI)
node.Status.Capacity[ipconsts.ResourceGPU] = *resource.NewQuantity(int64(shape.GPU), resource.DecimalSI)

ephemeralStorage, err := getEphemeralResourceRequestsInBytes(freeformTags)
if err != nil {
klog.V(4).Info(err)
Copy link
Contributor

@trungng92 trungng92 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, curious, why do we log the error, but not return the error? And why is this Info level? Shouldn't it at least be a warning?

If we are doing this on purpose, can we add a comment explaining why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that, since it is not mandatory for users to add the required freeform tag, so we could just leave it as an info log with higher verbosity level so it does not spam the logs. This would only be needed if customers want CA to spin up nodes with a specific ephemeral-storage size (especially when they want to scale up the pool from 0).

Let me know if you would still like me to make modifications here.

Copy link
Contributor

@trungng92 trungng92 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if a user is specifying a value, but it is not getting parsed correctly, I personally think we should error out so that the user can quickly know that something is wrong and correct it.

Otherwise, they will think everything is successful, even when it's not.

However, if we don't do this, at the very least we should have it as a warn level. I could see arguments where it would be okay to warn the user. For example, the autoscaler is working, then the user mistypes a free form tag which will cause a failure, but doesn't see a failure until days later when a scale operation actually occurs. Or the fact that specifying one wrong value will cause scaling to not happen at all.

Overall, I'd be okay with erroring out or warning (but slightly prefer erroring).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thanks. I've changed the logic here.

}
unit := value[valueLength-3:]
if unit != "GiB" && unit != "MiB" {
return 0, fmt.Errorf("invalid value set for ephemeral-storage requests")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, we should make the error measurement easier for the user to know what to do.
"I put a wrong value, but the error doesn't indicate what the value should be"

This comment might be irrelevant if we use the kubernetes resource library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yes, I have made the changes to use the k8s library.

if err != nil {
return 0, err
}
if unit == "GiB" {
Copy link
Contributor

@trungng92 trungng92 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource

We should be using this resource library, so that it will match everything kubernetes does.

Test file has examples: https://github.com/kubernetes/apimachinery/blob/master/pkg/api/resource/quantity_test.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, updated this.

@@ -25,4 +25,7 @@ const (
// ToBeDeletedByClusterAutoscaler is the taint used to ensure that after a node has been called to be deleted
// no more pods will schedule onto it
ToBeDeletedByClusterAutoscaler = "ignore-taint.cluster-autoscaler.kubernetes.io/oke-impending-node-termination"

// EphemeralStorageSize is the freeform tag key that would be used to determine the ephemeral-storage size of the node
EphemeralStorageSize = "cluster-autoscaler/node-ephemeral-storage"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is an OKE specific implementation, I'm on the fence whether we should make the name indicate that it's custom/OKE specific. 🤔

@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch from 25b927f to 80cd80f Compare March 12, 2024 11:03
func getEphemeralResourceRequestsInBytes(tags map[string]string) (int64, error) {
for key, value := range tags {
if key == npconsts.EphemeralStorageSize {
klog.V(0).Infof("ephemeral-storage size set with value : %v", value)
Copy link
Contributor

@trungng92 trungng92 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps change this to klog 4 (and the log below)? Do we we need to see these logs every time (log level 0)?

}
}()

resourceSize := resource.MustParse(value)
Copy link
Contributor

@trungng92 trungng92 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#ParseQuantity
We should use ParseQuantity instead of MustParse so it doesn't panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, made the changes.

@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch 2 times, most recently from eba5612 to 6d1a03f Compare March 13, 2024 15:30
@trungng92
Copy link
Contributor

Changes look good to me.

@@ -25,4 +25,7 @@ const (
// ToBeDeletedByClusterAutoscaler is the taint used to ensure that after a node has been called to be deleted
// no more pods will schedule onto it
ToBeDeletedByClusterAutoscaler = "ignore-taint.cluster-autoscaler.kubernetes.io/oke-impending-node-termination"

// EphemeralStorageSize is the freeform tag key that would be used to determine the ephemeral-storage size of the node
EphemeralStorageSize = "cluster-autoscaler/node-ephemeral-storage"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using free-form tags to provide local disk information? Is there something special about the scale from 0 scenario that prevents the OKE and InstancePool implementations from using the LocalDisksTotalSizeInGBs and LocalDisks fields of core.shape when populating the node template in buildNodeFromTemplate?

Copy link
Contributor

@jlamillan jlamillan Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming ephemeral storage (like memory and CPU) can be determined from the shape, it is still useful to optionally support using free-form tags to override shape specific values like CPU, memory, ephemeral-storage, GPU, etc..

However, in my opinion, the values from core.shape should remain the default source resource information for a node when they are available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK... after looking into this more, it looks like most shapes use "block storage only". This means that most of the time LocalDisksTotalSizeInGBs and LocalDisks will not be set.

I still think it's a good idea for both the node-pool and instance-pool implementations to check for and use shape.LocalDisksTotalSizeInGBs from core.shape by default. However, given that it's going to be unset most of the time, it seems like a good idea to add the ability to receive these values via free-form tags.

@@ -542,12 +545,22 @@ func (m *ociManagerImpl) buildNodeFromTemplate(nodePool *oke.NodePool) (*apiv1.N
})
}

freeformTags, err := m.ociTagsGetter.GetFreeformTags(nodePool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the discussion in the other thread, I would expect our shape struct to have a new field, say, EphemeralStorageInGB, which may or may not be set after the call to m.ociShapeGetter.GetNodePoolShape(nodePool) based on disk info of the node shape.

It is at this point that buildNodeFromTemplate() would set the value of node.Status.Capacity[apiv1.ResourceEphemeralStorage] to either shape.EphemeralStorageInGB or the value from the free-form tag.

Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get the LocalDiskSize of a shape from the Core Service/Compute API - (LINK).
But this is not part of the NodeShapeConfig.
We would need to work on adding the changes to oci-go-sdk so that the NodeShapeConfig has the LocalDiskSize, but this would be a whole api level change and would involve a lot of time and effort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Since there are two implementations of this provider (and because the core API may evolve to offer ephemeral storage), I think it still makes sense to expose this attribute to both implementation using our common.Shape struct alongside the other capacity attributes (CPU, memory, GPU).

So, for now, you still fetch the value from the free-form tag using a common.getEphemeralResourceRequestsInBytes(freeformTags) function, but instead of setting a local / private variable, you'd set shape.EphemeralStorageGB.

cluster-autoscaler/cloudprovider/oci/common/oci_shape.go

type Shape struct {
	Name             string
	CPU              float32
	GPU              int
	MemoryInBytes    float32
++	EphemeralStorageGB float32
}

As a followup MR, we'd need to ensure the instancepool implementation is also using the common.getEphemeralResourceRequestsInBytes(freeformTags) function to check for ans set shape.EphemeralStorageGB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I have updated the PR with the above suggested changes.
Please take a look.

@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch from 6d1a03f to cc73806 Compare April 28, 2024 18:03
@k8s-ci-robot k8s-ci-robot added the area/provider/oci Issues or PRs related to oci provider label Apr 28, 2024
@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch 2 times, most recently from 1c02dfc to 4b9cba3 Compare April 29, 2024 05:48
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 29, 2024
@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch 3 times, most recently from 48ea7e7 to 5c7dc04 Compare April 29, 2024 07:04

// TagsGetter returns the oci tags for the pool.
type TagsGetter interface {
GetFreeformTags(*oke.NodePool) (map[string]string, error)
Copy link
Contributor

@jlamillan jlamillan May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there will be two separate implementations, please rename this function to GetNodePoolFreeformTags(*oke.NodePool) (map[string]string, error).

Later, we can add a second function to the interface called GetInstancePoolFreeformTags(*core.InstancePool) (map[string]string, error) and the associated implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, done.

Copy link
Contributor

@jlamillan jlamillan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the requested rename of GetFreeformTags to GetNodePoolFreeformTags, this change is looking good.

@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch from 5c7dc04 to cd077ce Compare May 6, 2024 05:24
@vbhargav875 vbhargav875 force-pushed the oci-node-ephemeral-storage branch from cd077ce to e6291c1 Compare May 6, 2024 05:35
Copy link
Contributor

@jlamillan jlamillan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vijay.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 6, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlamillan, vbhargav875

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit f783488 into kubernetes:master May 6, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/oci Issues or PRs related to oci provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants