
Support Instance Volume Configuration in Provisioner #939

Closed
cwwarren opened this issue Dec 8, 2021 · 12 comments
Labels
feature New feature or request launch-templates Questions related to AWS launch templates scheduling Issues related to Kubernetes scheduling storage Issues related to storage resources and Karpenter

Comments

@cwwarren commented Dec 8, 2021

I would like the instance root volume to be configurable without a custom launch template. This is possible today with a custom launch template, but we do not want to maintain one just to increase storage.

User Story 1:

  • I self-host GitLab runners on Kubernetes, using Karpenter to rapidly scale CI capacity up and down to meet bursty workloads during the day.
  • GitLab runner pods occasionally run out of storage due to the small 20GB gp2 root volume.

User Story 2:

  • I have several large container images that bundle sizable binaries and/or datasets.
  • Some are third party and cannot be changed by us, so we are unable to use an alternative solution such as fetching the large assets from S3 into a Pod generic ephemeral volume on startup.
  • These pods cannot be scheduled on Karpenter nodes without a custom launch template.

User Story 3:

  • QA and release-verification environments are tightly bin-packed onto small instances, requiring a large number of unique container images to be pulled and extracted on each node.
  • These pods cannot be scheduled on Karpenter nodes without a custom launch template.

Proposed Solution:

  • Add a way to configure the amount of storage (in GiB) for the root volume on each provisioner.
  • Ideally, also add volume options such as IOPS and throughput for gp3, io1, or io2 volumes (see the sketch below).
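For concreteness, a minimal sketch of what this could look like in the provisioner spec; it mirrors the blockDeviceMappings shape discussed later in this thread, and the values are illustrative:

    spec:
      provider:
        blockDeviceMappings:
          - deviceName: /dev/xvda
            ebs:
              volumeSize: 60Gi
              volumeType: gp3
              iops: 3000
              throughput: 125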
@khacminh commented Dec 8, 2021

I cannot run my TensorFlow serving pods with only 20GB of SSD. Please provide this feature soon!

@akestner akestner added feature New feature or request scheduling Issues related to Kubernetes scheduling labels Dec 8, 2021
@ellistarn ellistarn added the storage Issues related to storage resources and Karpenter label Dec 8, 2021
@ellistarn (Contributor)

We've discussed summing resource.requests['ephemeral-storage'] from pod specs and using it to increase LT size.

For volumes, we need something like #622.
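For context, ephemeral-storage is already a standard resource in pod specs, so such summing would operate on requests like the one below (a plain Kubernetes example; the image and names are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: ci-job
    spec:
      containers:
        - name: build
          image: registry.example.com/ci:latest
          resources:
            requests:
              ephemeral-storage: 10Gi   # draws from the node's root volume
            limits:
              ephemeral-storage: 20Gi   # the pod is evicted if it exceeds this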

@cwwarren (Author) commented Dec 8, 2021

> We've discussed summing resource.requests['ephemeral-storage'] from pod specs and using it to increase LT size.

This would be an excellent feature that would solve the "80% case" in my opinion. 👍🏻

The issue of the number or size of containers is trickier to solve, since Karpenter wouldn't readily have that information available. This may be a specialty use case where we either accept maintaining our own launch templates or add ephemeral-storage requests to pods with known large containers to trigger the behavior above. Either way, this is a minor point that probably doesn't warrant a complex solution.

@savealive commented Jan 11, 2022

Is it possible to prioritize this issue? It's really a blocking one.

@slobo commented Jan 13, 2022

FWIW, in the interim we are running a patched version of Karpenter that hardcodes the root volume to 60GiB. See CourtAPI@8111c2b for the code.

To deploy it in our cluster, we:

  • Created two public ECR repositories: karpenter/controller and karpenter/webhook.
  • Published the modified images to the above repositories:

    env CLOUD_PROVIDER=aws RELEASE_REPO=public.ecr.aws/your-alias/karpenter make publish

  • Pointed the Helm chart at said image:

    controller:
      # ...
      image: "public.ecr.aws/..." # whatever image `make publish` created

@akestner akestner changed the title Feature Request: Support Instance Volume Configuration in Provisioner Support Instance Volume Configuration in Provisioner Jan 14, 2022
@suket22 suket22 added the launch-templates Questions related to AWS launch templates label Feb 2, 2022
@donalddewulf commented Feb 20, 2022

> env CLOUD_PROVIDER=aws RELEASE_REPO=public.ecr.aws/your-alias/karpenter make publish

This command no longer seems to work on 0.6.3.

I see that in the latest versions of the code, the disk is already defined in launchtemplate.go:

Ebs: &ec2.LaunchTemplateEbsBlockDeviceRequest{
	Encrypted:  aws.Bool(true),
	VolumeSize: aws.Int64(20),
}

I've been trying to figure out how to compile this project and push it to my ECR registry, but I've already wasted too much time on it; I'm not that familiar with Go and its toolchain.

If anyone has a set of commands/steps to run on how to get this project compiled and pushed to a repository, that would be an enormous help.

@ellistarn (Contributor) commented Feb 20, 2022

See https://karpenter.sh/v0.6.3/development-guide/. IIUC, @bwagner5 is already working on this.
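For anyone else stuck at the same point, a rough sketch assembled from the commands quoted earlier in this thread plus a standard ECR Public login; exact Makefile targets vary between Karpenter versions, so treat the development guide above as authoritative:

    # clone the source at the release tag you are running
    git clone https://github.com/aws/karpenter.git && cd karpenter
    git checkout v0.6.3

    # authenticate docker to ECR Public so the push can succeed
    aws ecr-public get-login-password --region us-east-1 \
      | docker login --username AWS --password-stdin public.ecr.aws

    # build and push, as in the earlier comment
    env CLOUD_PROVIDER=aws RELEASE_REPO=public.ecr.aws/your-alias/karpenter make publish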

@bwagner5 (Contributor)

Block Device Mappings configuration has been merged and should be released next week. You can check out the preview docs here: https://karpenter.sh/preview/aws/provisioning/#block-device-mappings

@dekelev commented Mar 15, 2022

The docs don't match the validation found here: https://github.com/aws/karpenter/blob/c1d08c6c91019443746ef310f2cd778335b4be65/pkg/cloudprovider/aws/apis/v1alpha1/provider.go#L129

This gives an error on an unknown deleteOnTermination field:

spec:
  provider:
    blockDeviceMappings:
      - volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        deleteOnTermination: true

Only this will work:

spec:
  provider:
    blockDeviceMappings:
      - deviceName: /dev/xvda
        ebs:
          volumeSize: 100Gi
          volumeType: gp3
          iops: 3000
          throughput: 125
          deleteOnTermination: true

@bwagner5 (Contributor)

Thank you for pointing this out! I just submitted a PR to correct the docs, apologies for the inconvenience! :)

@renxunsaky commented Aug 25, 2022

Hi,

I am using Karpenter for scheduling my Spark jobs on EKS.

My provisioner has some requirements on multiple instance families and sizes like below:

requirements:
  - key: "topology.kubernetes.io/zone"
    operator: In
    values: ["eu-west-1a"]
  - key: "karpenter.sh/capacity-type"
    operator: In
    values: ["spot"]
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values: [m5, r5, r6]
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values: [small, large, 2xlarge, 4xlarge]

With "blockDeviceMappings" we can define the EBS volume size attached to the instances. But As you can see, I have different instance sizes, so it is logical to adapt the EBS volume size according to the instance type taken by Karpenter. Do you know if there is a way to let Karpenter adapt this size "intelligently" ? Kind of with a ratio, for exemple, in the "blockDeviceMappings" we can take a base of 20GB for a "small" one, and it should multiply the "ratio" with 20GB.

Thanks a lot.

@kfirsch commented Jan 24, 2023

> Is there a way to let Karpenter adapt this size "intelligently", e.g. with a ratio? [quoting @renxunsaky above]

Facing the exact same issue here.

My workaround is to make a copy of each of my provisioners with the same settings but a unique providerRef and karpenter.k8s.aws/instance-size requirement, and to make a matching copy of the AWSNodeTemplate with higher disk specs, e.g.:

    blockDeviceMappings:
      - deviceName: /dev/xvda
        ebs:
          volumeSize: 100Gi
          volumeType: gp3
          iops: 3000
          throughput: 125
          deleteOnTermination: true

It would be very useful if Karpenter let you choose the instance disk size dynamically based on the instance size. Something like the example below in the Provisioner spec could be nice:

  ebsVolumeSizePerCore: 10Gi # could also be per unit of memory
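To make the duplication workaround above concrete, a minimal sketch of two paired Provisioner objects; the names, sizes, and referenced AWSNodeTemplates are illustrative, not working config:

    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: spark-small
    spec:
      providerRef:
        name: node-template-small   # AWSNodeTemplate with e.g. a 50Gi root volume
      requirements:
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: [small, large]
    ---
    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: spark-large
    spec:
      providerRef:
        name: node-template-large   # AWSNodeTemplate with e.g. a 200Gi root volume
      requirements:
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: [2xlarge, 4xlarge]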
