
Scale node storage based on pod ephemeral-storage requests #2394

Open
dewjam opened this issue Aug 29, 2022 · 10 comments
Labels
feature New feature or request v1.x Issues prioritized for post-1.0

Comments

@dewjam
Contributor

dewjam commented Aug 29, 2022

Tell us about your request
What do you want us to build?
Enable Karpenter to dynamically scale the size of the block device attached to a node at provision time. The size of the block device would be based on the sum of the ephemeral-storage requests of the pods being bin-packed onto the node, plus some overhead.
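
For illustration, here's a hypothetical pod requesting ephemeral storage; under this proposal Karpenter would sum such requests across the pods it bin-packs onto a node when sizing the block device (pod and container names are made up):

    apiVersion: v1
    kind: Pod
    metadata:
      name: storage-hungry              # hypothetical name
    spec:
      containers:
        - name: app
          image: public.ecr.aws/docker/library/busybox:1.36
          command: ["sleep", "infinity"]
          resources:
            requests:
              ephemeral-storage: 20Gi   # counted toward the node's block device size
            limits:
              ephemeral-storage: 20Gi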

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently, a node's storage capacity can be defined in the Karpenter provisioner via Block Device Mappings. This works well, but it forces customers to define a static value for all instances launched through a given provisioner. Customers would like the ability to dynamically scale node storage based on the pod workload or the instance type.

Are you currently working around this issue?
This can be worked around by defining Block Device Mappings in the Karpenter Provisioner. These values are static for a given provisioner, however, and cannot be dynamically scaled up or down.
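
For reference, a minimal sketch of this static workaround using the blockDeviceMappings field (shown here on the newer EC2NodeClass API; older Provisioner/AWSNodeTemplate versions expose a similar field, and the manifest is abbreviated, omitting role, subnet, and security group selectors):

    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      blockDeviceMappings:
        - deviceName: /dev/xvda
          ebs:
            volumeSize: 100Gi           # static for every node launched from this class
            volumeType: gp3
            deleteOnTermination: true

Every node launched through a provisioner or NodePool that references this configuration gets the same 100Gi root volume, regardless of the pods' ephemeral-storage requests.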

Related issues:
#2512
#2298
#1995
#1467
#3077
#3111

Additional context
Anything else we should know?

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@cx-IliyaY

Is there any progress on this?
The only answer points to this link, but it's dead now (error 404): Block Device Mappings

@tzneal
Contributor

tzneal commented Apr 17, 2023

Those docs are now here, but there hasn't been any update.

@jonathan-innis
Contributor

is there some progress with it?

No current, active progress. But it's considered "v1" scope, which means we plan to work on this as part of the v1 release for Karpenter. It's definitely on our list of priorities, but the maintainer team has been somewhat time-constrained lately, working on other feature work and stability improvements.

@jagadeesh-kancherla-tfs

+1

@pragmaticivan

pragmaticivan commented Jun 17, 2024

This would reduce some alerting for NodePools sharing the same StorageClass with static storage values.

Instance sizes from 2xl to 16xl might need noticeably different amounts of storage due to multiple factors, including Docker image pulls and volumes.

Any chance this would get a bump in priority?

@Smana

Smana commented Jul 25, 2024

Are there any updates on this issue? We currently need to specify a constraint to use NVMe instance types and to prepare the RAID0 array:

        - key: karpenter.k8s.aws/instance-local-nvme
          operator: Gt
          values: ["100"]

Otherwise, some pods don't have enough ephemeral disk space, which leads to pods being evicted with a node DiskPressure error.
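
For anyone landing here, a rough sketch of that workaround, pairing the NodePool requirement with the EC2NodeClass instanceStorePolicy (this assumes the v1beta1+ APIs and a Karpenter version that supports RAID0 instance-store assembly; manifests are abbreviated):

    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: nvme
    spec:
      template:
        spec:
          requirements:
            - key: karpenter.k8s.aws/instance-local-nvme
              operator: Gt
              values: ["100"]           # only instance types with >100 GiB of local NVMe
          nodeClassRef:
            name: nvme
    ---
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: nvme
    spec:
      amiFamily: AL2
      instanceStorePolicy: RAID0        # assemble local NVMe disks into a RAID0 array for ephemeral storage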

@zakariais

is there some progress with it?

@MedAzizTousli

Any progress on this?

@pkit

pkit commented Dec 5, 2024

@Smana

        - key: karpenter.k8s.aws/instance-local-nvme
          operator: Gt
          values: ["100"]

This doesn't seem to work for me at all; all nodes end up with a hardcoded 20 GB of ephemeral storage no matter what.

@liorfr-monday

+1
