From c2f019dda3b09bf356af3bca6b147c526c2b9290 Mon Sep 17 00:00:00 2001 From: Bryant Biggs Date: Thu, 31 Oct 2024 22:49:21 +0000 Subject: [PATCH] feat: Get Neuron device and core count from EC2 API for all `trn*` and `inf*` instance types (#6510) Co-authored-by: Jason Deal --- designs/limits.md | 5 +- examples/workloads/neuron.yaml | 2 +- hack/code/instancetype_testdata_gen/main.go | 23 +- hack/codegen.sh | 2 +- hack/docs/instancetypes_gen/main.go | 2 +- pkg/apis/v1/labels.go | 1 + pkg/fake/ec2api.go | 4 +- .../zz_generated.describe_instance_types.go | 103 +-- pkg/providers/instance/instance.go | 1 + pkg/providers/instancetype/suite_test.go | 77 ++- pkg/providers/instancetype/types.go | 48 +- .../integration/extended_resources_test.go | 600 ++++++++++++++++++ test/suites/scheduling/suite_test.go | 2 +- .../content/en/preview/concepts/scheduling.md | 15 +- .../en/preview/reference/instance-types.md | 219 ++++++- .../en/preview/upgrading/upgrade-guide.md | 1 + 16 files changed, 993 insertions(+), 112 deletions(-) diff --git a/designs/limits.md b/designs/limits.md index d29cf9ef19f0..dbc2b4376a94 100644 --- a/designs/limits.md +++ b/designs/limits.md @@ -12,14 +12,12 @@ The next large problem is the inability to define a hard ceiling on cluster cost We need to provide similar functionality via Karpenter as well wherein there's a hard limit a customer can configure. - ## Current State To address the runaway-scaling problem the current fix in place is to detect if the kubelet for a worker node has never reported its status to the K8s control plane. If it's been longer than 15 minutes, Karpenter assumes that there's a hard failure mode due to which this worker node will never become healthy and terminates the worker node. If the condition map of the node object in the API Server says `NodeStatusNeverUpdated` then we use that as an indicator of the node having never come up. This fix ensures that if there are other scenarios where a worker node has become unhealthy due to a network partition or power outage in a availability zone, we don't terminate those worker nodes. It's important we don't make the static stability of a cluster worse during such an event. On the other hand, if there is an edge case where worker nodes come online and soon go offline, it will lead to runaway scaling again. This edge case should be unlikely to happen in the near term, so this document focuses on just the ability to limit costs within Karpenter. That way even if runaway scaling does occur there's a way to bound it. A longer-term solution to handle the runaway problem will be discussed separately. - ## Proposed Solution for Limits There are two broad forms of limiting we could apply. The first is that we could introduce a limit to the number of in-flight worker node being provisioned at a point in time. A worker node that's in the `NotReady` state could be considered to be in-flight. The second form is an absolute limit of the number of resources Karpenter can provision. @@ -37,6 +35,7 @@ In the above example - `20%` indicates that if at any point in time, more than 2 The good bit about this approach is that we don't constrain how many total worker nodes can be spun up by Karpenter, while also making sure that if we keep launching worker nodes that aren't healthy, we stop the scaling and save costs. The two main problems with this approach though are - + 1. 
This limit while meant to just constrain the number of unhealthy worker nodes in a cluster, will also inhibit the rate at which Karpenter can respond to pods that aren't schedulable. This somewhat goes against the goal of minimizing launch times of workers. 2. While this helps ensure that costs don't increase due to runaway scaling, it won't help those who want a stricter cap on the amount of resources that's being provisioned even when nodes are otherwise healthy. @@ -62,11 +61,13 @@ As a cost control mechanism, this requires a little more work from our users if [CPU limits](https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units), memory limits and GPU limits will be defined similar to resource requests and will not be required by default. Karpenter will also will not default to any limits itself. The list of supported resource types is - + - `cpu` - `memory` - `nvidia.com/gpu` - `amd.com/gpu` - `aws.amazon.com/neuron` +- `aws.amazon.com/neuroncore` - `habana.ai/gaudi` Limits will be defined at the per-provisioner level. We'll rely on the `karpenter.sh/provisioner-name` node label when calculating resource usage by a specific provisioner. This is useful when multiple teams share a single cluster and use separate provisioners since each team's resource consumption will be limited separately. diff --git a/examples/workloads/neuron.yaml b/examples/workloads/neuron.yaml index e7cf74e13230..9629eeaad0ba 100644 --- a/examples/workloads/neuron.yaml +++ b/examples/workloads/neuron.yaml @@ -26,4 +26,4 @@ spec: cpu: "1" memory: 256M securityContext: - allowPrivilegeEscalation: false \ No newline at end of file + allowPrivilegeEscalation: false diff --git a/hack/code/instancetype_testdata_gen/main.go b/hack/code/instancetype_testdata_gen/main.go index e0df1b16163d..0debaccd22c6 100644 --- a/hack/code/instancetype_testdata_gen/main.go +++ b/hack/code/instancetype_testdata_gen/main.go @@ -147,11 +147,11 @@ func getInstanceTypeInfo(info *ec2.InstanceTypeInfo) string { fmt.Fprintf(src, "NvmeSupport: aws.String(\"%s\"),\n", lo.FromPtr(info.EbsInfo.NvmeSupport)) fmt.Fprintf(src, "},\n") } - if info.InferenceAcceleratorInfo != nil { - fmt.Fprintf(src, "InferenceAcceleratorInfo: &ec2.InferenceAcceleratorInfo{\n") - fmt.Fprintf(src, "Accelerators: []*ec2.InferenceDeviceInfo{\n") - for _, elem := range info.InferenceAcceleratorInfo.Accelerators { - fmt.Fprintf(src, getInferenceAcceleratorDeviceInfo(elem)) + if info.NeuronInfo != nil { + fmt.Fprintf(src, "NeuronInfo: &ec2.NeuronInfo{\n") + fmt.Fprintf(src, "NeuronDevices: []*ec2.NeuronDeviceInfo{\n") + for _, elem := range info.NeuronInfo.NeuronDevices { + fmt.Fprintf(src, getNeuronDeviceInfo(elem)) } fmt.Fprintf(src, "},\n") fmt.Fprintf(src, "},\n") @@ -199,12 +199,19 @@ func getNetworkCardInfo(info *ec2.NetworkCardInfo) string { return src.String() } -func getInferenceAcceleratorDeviceInfo(info *ec2.InferenceDeviceInfo) string { +func getNeuronDeviceInfo(info *ec2.NeuronDeviceInfo) string { + src := &bytes.Buffer{} fmt.Fprintf(src, "{\n") - fmt.Fprintf(src, "Name: aws.String(\"%s\"),\n", lo.FromPtr(info.Name)) - fmt.Fprintf(src, "Manufacturer: aws.String(\"%s\"),\n", lo.FromPtr(info.Manufacturer)) fmt.Fprintf(src, "Count: aws.Int64(%d),\n", lo.FromPtr(info.Count)) + fmt.Fprintf(src, "Name: aws.String(\"%s\"),\n", lo.FromPtr(info.Name)) + fmt.Fprintf(src, "CoreInfo: &ec2.NeuronDeviceCoreInfo{\n") + fmt.Fprintf(src, "Count: aws.Int64(%d),\n", lo.FromPtr(info.CoreInfo.Count)) + fmt.Fprintf(src, "Version: aws.Int64(%d),\n", 
lo.FromPtr(info.CoreInfo.Version)) + fmt.Fprintf(src, "},\n") + fmt.Fprintf(src, "MemoryInfo: &ec2.NeuronDeviceMemoryInfo{\n") + fmt.Fprintf(src, "SizeInMiB: aws.Int64(%d),\n", lo.FromPtr(info.MemoryInfo.SizeInMiB)) + fmt.Fprintf(src, "},\n") fmt.Fprintf(src, "},\n") return src.String() } diff --git a/hack/codegen.sh b/hack/codegen.sh index f148e0ca1ca7..bf79b79db911 100755 --- a/hack/codegen.sh +++ b/hack/codegen.sh @@ -46,7 +46,7 @@ instanceTypeTestData() { GENERATED_FILE="pkg/fake/zz_generated.describe_instance_types.go" go run hack/code/instancetype_testdata_gen/main.go --out-file ${GENERATED_FILE} \ - --instance-types t3.large,m5.large,m5.xlarge,p3.8xlarge,g4dn.8xlarge,c6g.large,inf1.2xlarge,inf1.6xlarge,trn1.2xlarge,m5.metal,dl1.24xlarge,m6idn.32xlarge,t4g.small,t4g.xlarge,t4g.medium,g4ad.16xlarge + --instance-types t3.large,m5.large,m5.xlarge,p3.8xlarge,g4dn.8xlarge,c6g.large,inf2.xlarge,inf2.24xlarge,trn1.2xlarge,m5.metal,dl1.24xlarge,m6idn.32xlarge,t4g.small,t4g.xlarge,t4g.medium,g4ad.16xlarge checkForUpdates "${GENERATED_FILE}" } diff --git a/hack/docs/instancetypes_gen/main.go b/hack/docs/instancetypes_gen/main.go index 59b655c34771..64cea3dd335e 100644 --- a/hack/docs/instancetypes_gen/main.go +++ b/hack/docs/instancetypes_gen/main.go @@ -124,7 +124,7 @@ below are the resources available with some assumptions and after the instance o resourceNameMap := sets.New[string]() // Iterate through regions and take the union of instance types we discover across both - for _, region := range []string{"us-east-1", "us-west-2"} { + for _, region := range []string{"us-east-1", "us-east-2", "us-west-2"} { sess := session.Must(session.NewSession(&aws.Config{Region: lo.ToPtr(region)})) ec2api := ec2.New(sess) subnetProvider := subnet.NewDefaultProvider(ec2api, cache.New(awscache.DefaultTTL, awscache.DefaultCleanupInterval), cache.New(awscache.AvailableIPAddressTTL, awscache.DefaultCleanupInterval), cache.New(awscache.AssociatePublicIPAddressTTL, awscache.DefaultCleanupInterval)) diff --git a/pkg/apis/v1/labels.go b/pkg/apis/v1/labels.go index 561359e57f31..9bf39a97054e 100644 --- a/pkg/apis/v1/labels.go +++ b/pkg/apis/v1/labels.go @@ -90,6 +90,7 @@ var ( ResourceNVIDIAGPU corev1.ResourceName = "nvidia.com/gpu" ResourceAMDGPU corev1.ResourceName = "amd.com/gpu" ResourceAWSNeuron corev1.ResourceName = "aws.amazon.com/neuron" + ResourceAWSNeuronCore corev1.ResourceName = "aws.amazon.com/neuroncore" ResourceHabanaGaudi corev1.ResourceName = "habana.ai/gaudi" ResourceAWSPodENI corev1.ResourceName = "vpc.amazonaws.com/pod-eni" ResourcePrivateIPv4Address corev1.ResourceName = "vpc.amazonaws.com/PrivateIPv4Address" diff --git a/pkg/fake/ec2api.go b/pkg/fake/ec2api.go index 060e0fb67134..4412514d3c16 100644 --- a/pkg/fake/ec2api.go +++ b/pkg/fake/ec2api.go @@ -631,11 +631,11 @@ func (e *EC2API) DescribeInstanceTypeOfferingsWithContext(_ context.Context, _ * Location: aws.String("test-zone-1b"), }, { - InstanceType: aws.String("inf1.2xlarge"), + InstanceType: aws.String("inf2.xlarge"), Location: aws.String("test-zone-1a"), }, { - InstanceType: aws.String("inf1.6xlarge"), + InstanceType: aws.String("inf2.24xlarge"), Location: aws.String("test-zone-1a"), }, { diff --git a/pkg/fake/zz_generated.describe_instance_types.go b/pkg/fake/zz_generated.describe_instance_types.go index da2762eee5f6..d1a1dca2114a 100644 --- a/pkg/fake/zz_generated.describe_instance_types.go +++ b/pkg/fake/zz_generated.describe_instance_types.go @@ -267,107 +267,119 @@ var defaultDescribeInstanceTypesOutput = 
&ec2.DescribeInstanceTypesOutput{ }, }, { - InstanceType: aws.String("inf1.2xlarge"), + InstanceType: aws.String("inf2.24xlarge"), SupportedUsageClasses: aws.StringSlice([]string{"on-demand", "spot"}), SupportedVirtualizationTypes: aws.StringSlice([]string{"hvm"}), BurstablePerformanceSupported: aws.Bool(false), BareMetal: aws.Bool(false), Hypervisor: aws.String("nitro"), ProcessorInfo: &ec2.ProcessorInfo{ - Manufacturer: aws.String("Intel"), + Manufacturer: aws.String("AMD"), SupportedArchitectures: aws.StringSlice([]string{"x86_64"}), }, VCpuInfo: &ec2.VCpuInfo{ - DefaultCores: aws.Int64(4), - DefaultVCpus: aws.Int64(8), + DefaultCores: aws.Int64(48), + DefaultVCpus: aws.Int64(96), }, MemoryInfo: &ec2.MemoryInfo{ - SizeInMiB: aws.Int64(16384), + SizeInMiB: aws.Int64(393216), }, EbsInfo: &ec2.EbsInfo{ EbsOptimizedInfo: &ec2.EbsOptimizedInfo{ - BaselineBandwidthInMbps: aws.Int64(1190), - BaselineIops: aws.Int64(6000), - BaselineThroughputInMBps: aws.Float64(148.75), - MaximumBandwidthInMbps: aws.Int64(4750), - MaximumIops: aws.Int64(20000), - MaximumThroughputInMBps: aws.Float64(593.75), + BaselineBandwidthInMbps: aws.Int64(30000), + BaselineIops: aws.Int64(120000), + BaselineThroughputInMBps: aws.Float64(3750.00), + MaximumBandwidthInMbps: aws.Int64(30000), + MaximumIops: aws.Int64(120000), + MaximumThroughputInMBps: aws.Float64(3750.00), }, EbsOptimizedSupport: aws.String("default"), EncryptionSupport: aws.String("supported"), NvmeSupport: aws.String("required"), }, - InferenceAcceleratorInfo: &ec2.InferenceAcceleratorInfo{ - Accelerators: []*ec2.InferenceDeviceInfo{ + NeuronInfo: &ec2.NeuronInfo{ + NeuronDevices: []*ec2.NeuronDeviceInfo{ { - Name: aws.String("Inferentia"), - Manufacturer: aws.String("AWS"), - Count: aws.Int64(1), + Count: aws.Int64(6), + Name: aws.String("Inferentia2"), + CoreInfo: &ec2.NeuronDeviceCoreInfo{ + Count: aws.Int64(2), + Version: aws.Int64(2), + }, + MemoryInfo: &ec2.NeuronDeviceMemoryInfo{ + SizeInMiB: aws.Int64(32768), + }, }, }, }, NetworkInfo: &ec2.NetworkInfo{ - MaximumNetworkInterfaces: aws.Int64(4), - Ipv4AddressesPerInterface: aws.Int64(10), + MaximumNetworkInterfaces: aws.Int64(15), + Ipv4AddressesPerInterface: aws.Int64(50), EncryptionInTransitSupported: aws.Bool(true), DefaultNetworkCardIndex: aws.Int64(0), NetworkCards: []*ec2.NetworkCardInfo{ { NetworkCardIndex: aws.Int64(0), - MaximumNetworkInterfaces: aws.Int64(4), + MaximumNetworkInterfaces: aws.Int64(15), }, }, }, }, { - InstanceType: aws.String("inf1.6xlarge"), + InstanceType: aws.String("inf2.xlarge"), SupportedUsageClasses: aws.StringSlice([]string{"on-demand", "spot"}), SupportedVirtualizationTypes: aws.StringSlice([]string{"hvm"}), BurstablePerformanceSupported: aws.Bool(false), BareMetal: aws.Bool(false), Hypervisor: aws.String("nitro"), ProcessorInfo: &ec2.ProcessorInfo{ - Manufacturer: aws.String("Intel"), + Manufacturer: aws.String("AMD"), SupportedArchitectures: aws.StringSlice([]string{"x86_64"}), }, VCpuInfo: &ec2.VCpuInfo{ - DefaultCores: aws.Int64(12), - DefaultVCpus: aws.Int64(24), + DefaultCores: aws.Int64(2), + DefaultVCpus: aws.Int64(4), }, MemoryInfo: &ec2.MemoryInfo{ - SizeInMiB: aws.Int64(49152), + SizeInMiB: aws.Int64(16384), }, EbsInfo: &ec2.EbsInfo{ EbsOptimizedInfo: &ec2.EbsOptimizedInfo{ - BaselineBandwidthInMbps: aws.Int64(4750), - BaselineIops: aws.Int64(20000), - BaselineThroughputInMBps: aws.Float64(593.75), - MaximumBandwidthInMbps: aws.Int64(4750), - MaximumIops: aws.Int64(20000), - MaximumThroughputInMBps: aws.Float64(593.75), + BaselineBandwidthInMbps: 
aws.Int64(1250), + BaselineIops: aws.Int64(6000), + BaselineThroughputInMBps: aws.Float64(156.25), + MaximumBandwidthInMbps: aws.Int64(10000), + MaximumIops: aws.Int64(40000), + MaximumThroughputInMBps: aws.Float64(1250.00), }, EbsOptimizedSupport: aws.String("default"), EncryptionSupport: aws.String("supported"), NvmeSupport: aws.String("required"), }, - InferenceAcceleratorInfo: &ec2.InferenceAcceleratorInfo{ - Accelerators: []*ec2.InferenceDeviceInfo{ + NeuronInfo: &ec2.NeuronInfo{ + NeuronDevices: []*ec2.NeuronDeviceInfo{ { - Name: aws.String("Inferentia"), - Manufacturer: aws.String("AWS"), - Count: aws.Int64(4), + Count: aws.Int64(1), + Name: aws.String("Inferentia2"), + CoreInfo: &ec2.NeuronDeviceCoreInfo{ + Count: aws.Int64(2), + Version: aws.Int64(2), + }, + MemoryInfo: &ec2.NeuronDeviceMemoryInfo{ + SizeInMiB: aws.Int64(32768), + }, }, }, }, NetworkInfo: &ec2.NetworkInfo{ - MaximumNetworkInterfaces: aws.Int64(8), - Ipv4AddressesPerInterface: aws.Int64(30), + MaximumNetworkInterfaces: aws.Int64(4), + Ipv4AddressesPerInterface: aws.Int64(15), EncryptionInTransitSupported: aws.Bool(true), DefaultNetworkCardIndex: aws.Int64(0), NetworkCards: []*ec2.NetworkCardInfo{ { NetworkCardIndex: aws.Int64(0), - MaximumNetworkInterfaces: aws.Int64(8), + MaximumNetworkInterfaces: aws.Int64(4), }, }, }, @@ -821,6 +833,21 @@ var defaultDescribeInstanceTypesOutput = &ec2.DescribeInstanceTypesOutput{ EncryptionSupport: aws.String("supported"), NvmeSupport: aws.String("required"), }, + NeuronInfo: &ec2.NeuronInfo{ + NeuronDevices: []*ec2.NeuronDeviceInfo{ + { + Count: aws.Int64(1), + Name: aws.String("Trainium"), + CoreInfo: &ec2.NeuronDeviceCoreInfo{ + Count: aws.Int64(2), + Version: aws.Int64(2), + }, + MemoryInfo: &ec2.NeuronDeviceMemoryInfo{ + SizeInMiB: aws.Int64(32768), + }, + }, + }, + }, InstanceStorageInfo: &ec2.InstanceStorageInfo{NvmeSupport: aws.String("required"), TotalSizeInGB: aws.Int64(474), }, diff --git a/pkg/providers/instance/instance.go b/pkg/providers/instance/instance.go index ffdd907e96a7..0fa6e54eccfe 100644 --- a/pkg/providers/instance/instance.go +++ b/pkg/providers/instance/instance.go @@ -461,6 +461,7 @@ func filterExoticInstanceTypes(instanceTypes []*cloudprovider.InstanceType) []*c continue } if !resources.IsZero(it.Capacity[v1.ResourceAWSNeuron]) || + !resources.IsZero(it.Capacity[v1.ResourceAWSNeuronCore]) || !resources.IsZero(it.Capacity[v1.ResourceAMDGPU]) || !resources.IsZero(it.Capacity[v1.ResourceNVIDIAGPU]) || !resources.IsZero(it.Capacity[v1.ResourceHabanaGaudi]) { diff --git a/pkg/providers/instancetype/suite_test.go b/pkg/providers/instancetype/suite_test.go index 2638a2fd7452..d397a6794f37 100644 --- a/pkg/providers/instancetype/suite_test.go +++ b/pkg/providers/instancetype/suite_test.go @@ -243,10 +243,11 @@ var _ = Describe("InstanceTypeProvider", func() { v1.LabelInstanceGPUCount: "1", v1.LabelInstanceGPUMemory: "16384", v1.LabelInstanceLocalNVME: "900", - v1.LabelInstanceAcceleratorName: "inferentia", - v1.LabelInstanceAcceleratorManufacturer: "aws", - v1.LabelInstanceAcceleratorCount: "1", - v1.LabelTopologyZoneID: "tstz1-1a", + // TODO - NVIDIA/GPU instances should not have Neuron/accelerator labels + v1.LabelInstanceAcceleratorName: "inferentia2", + v1.LabelInstanceAcceleratorManufacturer: "aws", + v1.LabelInstanceAcceleratorCount: "1", + v1.LabelTopologyZoneID: "tstz1-1a", // Deprecated Labels corev1.LabelFailureDomainBetaRegion: fake.DefaultRegion, corev1.LabelFailureDomainBetaZone: "test-zone-1a", @@ -330,7 +331,7 @@ var _ = 
Describe("InstanceTypeProvider", func() { karpv1.NodePoolLabelKey: nodePool.Name, corev1.LabelTopologyRegion: fake.DefaultRegion, corev1.LabelTopologyZone: "test-zone-1a", - corev1.LabelInstanceTypeStable: "inf1.2xlarge", + corev1.LabelInstanceTypeStable: "inf2.xlarge", corev1.LabelOSStable: "linux", corev1.LabelArchStable: "amd64", karpv1.CapacityTypeLabelKey: "on-demand", @@ -338,15 +339,15 @@ var _ = Describe("InstanceTypeProvider", func() { v1.LabelInstanceHypervisor: "nitro", v1.LabelInstanceEncryptionInTransitSupported: "true", v1.LabelInstanceCategory: "inf", - v1.LabelInstanceGeneration: "1", - v1.LabelInstanceFamily: "inf1", - v1.LabelInstanceSize: "2xlarge", - v1.LabelInstanceCPU: "8", - v1.LabelInstanceCPUManufacturer: "intel", + v1.LabelInstanceGeneration: "2", + v1.LabelInstanceFamily: "inf2", + v1.LabelInstanceSize: "xlarge", + v1.LabelInstanceCPU: "4", + v1.LabelInstanceCPUManufacturer: "amd", v1.LabelInstanceMemory: "16384", - v1.LabelInstanceEBSBandwidth: "4750", - v1.LabelInstanceNetworkBandwidth: "5000", - v1.LabelInstanceAcceleratorName: "inferentia", + v1.LabelInstanceEBSBandwidth: "10000", + v1.LabelInstanceNetworkBandwidth: "2083", + v1.LabelInstanceAcceleratorName: "inferentia2", v1.LabelInstanceAcceleratorManufacturer: "aws", v1.LabelInstanceAcceleratorCount: "1", v1.LabelTopologyZoneID: "tstz1-1a", @@ -355,7 +356,7 @@ var _ = Describe("InstanceTypeProvider", func() { corev1.LabelFailureDomainBetaZone: "test-zone-1a", "beta.kubernetes.io/arch": "amd64", "beta.kubernetes.io/os": "linux", - corev1.LabelInstanceType: "inf1.2xlarge", + corev1.LabelInstanceType: "inf2.xlarge", "topology.ebs.csi.aws.com/zone": "test-zone-1a", } @@ -761,8 +762,8 @@ var _ = Describe("InstanceTypeProvider", func() { pods := []*corev1.Pod{ coretest.UnschedulablePod(coretest.PodOptions{ ResourceRequirements: corev1.ResourceRequirements{ - Requests: corev1.ResourceList{v1.ResourceAWSNeuron: resource.MustParse("1")}, - Limits: corev1.ResourceList{v1.ResourceAWSNeuron: resource.MustParse("1")}, + Requests: corev1.ResourceList{v1.ResourceAWSNeuron: resource.MustParse("2")}, + Limits: corev1.ResourceList{v1.ResourceAWSNeuron: resource.MustParse("2")}, }, }), // Should pack onto same instance @@ -783,7 +784,7 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...) 
for _, pod := range pods { node := ExpectScheduled(ctx, env.Client, pod) - Expect(node.Labels).To(HaveKeyWithValue(corev1.LabelInstanceTypeStable, "inf1.6xlarge")) + Expect(node.Labels).To(HaveKeyWithValue(corev1.LabelInstanceTypeStable, "inf2.24xlarge")) nodeNames.Insert(node.Name) } Expect(nodeNames.Len()).To(Equal(2)) @@ -816,6 +817,34 @@ var _ = Describe("InstanceTypeProvider", func() { } Expect(nodeNames.Len()).To(Equal(1)) }) + It("should launch inf2 instances for aws.amazon.com/neuroncore resource requests", func() { + nodeNames := sets.NewString() + nodePool.Spec.Template.Spec.Requirements = []karpv1.NodeSelectorRequirementWithMinValues{ + { + NodeSelectorRequirement: corev1.NodeSelectorRequirement{ + Key: corev1.LabelInstanceTypeStable, + Operator: corev1.NodeSelectorOpIn, + Values: []string{"inf2.xlarge"}, + }, + }, + } + ExpectApplied(ctx, env.Client, nodePool, nodeClass) + pods := []*corev1.Pod{ + coretest.UnschedulablePod(coretest.PodOptions{ + ResourceRequirements: corev1.ResourceRequirements{ + Requests: corev1.ResourceList{v1.ResourceAWSNeuronCore: resource.MustParse("2")}, + Limits: corev1.ResourceList{v1.ResourceAWSNeuronCore: resource.MustParse("2")}, + }, + }), + } + ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...) + for _, pod := range pods { + node := ExpectScheduled(ctx, env.Client, pod) + Expect(node.Labels).To(HaveKeyWithValue(corev1.LabelInstanceTypeStable, "inf2.xlarge")) + nodeNames.Insert(node.Name) + } + Expect(nodeNames.Len()).To(Equal(1)) + }) It("should launch instances for vpc.amazonaws.com/efa resource requests", func() { nodePool.Spec.Template.Spec.Requirements = []karpv1.NodeSelectorRequirementWithMinValues{ { @@ -1871,7 +1900,7 @@ var _ = Describe("InstanceTypeProvider", func() { }) Context("Insufficient Capacity Error Cache", func() { It("should launch instances of different type on second reconciliation attempt with Insufficient Capacity Error Cache fallback", func() { - awsEnv.EC2API.InsufficientCapacityPools.Set([]fake.CapacityPool{{CapacityType: karpv1.CapacityTypeOnDemand, InstanceType: "inf1.6xlarge", Zone: "test-zone-1a"}}) + awsEnv.EC2API.InsufficientCapacityPools.Set([]fake.CapacityPool{{CapacityType: karpv1.CapacityTypeOnDemand, InstanceType: "inf2.24xlarge", Zone: "test-zone-1a"}}) ExpectApplied(ctx, env.Client, nodePool, nodeClass) pods := []*corev1.Pod{ coretest.UnschedulablePod(coretest.PodOptions{ @@ -1890,7 +1919,7 @@ var _ = Describe("InstanceTypeProvider", func() { }), } ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...) - // it should've tried to pack them on a single inf1.6xlarge then hit an insufficient capacity error + // it should've tried to pack them on a single inf2.24xlarge then hit an insufficient capacity error for _, pod := range pods { ExpectNotScheduled(ctx, env.Client, pod) } @@ -1898,7 +1927,7 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...) 
for _, pod := range pods { node := ExpectScheduled(ctx, env.Client, pod) - Expect(node.Labels).To(HaveKeyWithValue(v1.LabelInstanceAcceleratorName, "inferentia")) + Expect(node.Labels).To(HaveKeyWithValue(v1.LabelInstanceAcceleratorName, "inferentia2")) nodeNames.Insert(node.Name) } Expect(nodeNames.Len()).To(Equal(2)) @@ -1965,10 +1994,10 @@ var _ = Describe("InstanceTypeProvider", func() { } }) It("should launch instances on later reconciliation attempt with Insufficient Capacity Error Cache expiry", func() { - awsEnv.EC2API.InsufficientCapacityPools.Set([]fake.CapacityPool{{CapacityType: karpv1.CapacityTypeOnDemand, InstanceType: "inf1.6xlarge", Zone: "test-zone-1a"}}) + awsEnv.EC2API.InsufficientCapacityPools.Set([]fake.CapacityPool{{CapacityType: karpv1.CapacityTypeOnDemand, InstanceType: "inf2.24xlarge", Zone: "test-zone-1a"}}) ExpectApplied(ctx, env.Client, nodePool, nodeClass) pod := coretest.UnschedulablePod(coretest.PodOptions{ - NodeSelector: map[string]string{corev1.LabelInstanceTypeStable: "inf1.6xlarge"}, + NodeSelector: map[string]string{corev1.LabelInstanceTypeStable: "inf2.24xlarge"}, ResourceRequirements: corev1.ResourceRequirements{ Requests: corev1.ResourceList{v1.ResourceAWSNeuron: resource.MustParse("2")}, Limits: corev1.ResourceList{v1.ResourceAWSNeuron: resource.MustParse("2")}, @@ -1978,10 +2007,10 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectNotScheduled(ctx, env.Client, pod) // capacity shortage is over - expire the item from the cache and try again awsEnv.EC2API.InsufficientCapacityPools.Set([]fake.CapacityPool{}) - awsEnv.UnavailableOfferingsCache.Delete("inf1.6xlarge", "test-zone-1a", karpv1.CapacityTypeOnDemand) + awsEnv.UnavailableOfferingsCache.Delete("inf2.24xlarge", "test-zone-1a", karpv1.CapacityTypeOnDemand) ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) node := ExpectScheduled(ctx, env.Client, pod) - Expect(node.Labels).To(HaveKeyWithValue(corev1.LabelInstanceTypeStable, "inf1.6xlarge")) + Expect(node.Labels).To(HaveKeyWithValue(corev1.LabelInstanceTypeStable, "inf2.24xlarge")) }) It("should launch instances in a different zone on second reconciliation attempt with Insufficient Capacity Error Cache fallback (Habana)", func() { awsEnv.EC2API.InsufficientCapacityPools.Set([]fake.CapacityPool{{CapacityType: karpv1.CapacityTypeOnDemand, InstanceType: "dl1.24xlarge", Zone: "test-zone-1a"}}) diff --git a/pkg/providers/instancetype/types.go b/pkg/providers/instancetype/types.go index 90cda92587ca..3d1496df8b6a 100644 --- a/pkg/providers/instancetype/types.go +++ b/pkg/providers/instancetype/types.go @@ -250,25 +250,24 @@ func computeRequirements(info *ec2.InstanceTypeInfo, offerings cloudprovider.Off requirements.Get(v1.LabelInstanceGPUCount).Insert(fmt.Sprint(aws.Int64Value(gpu.Count))) requirements.Get(v1.LabelInstanceGPUMemory).Insert(fmt.Sprint(aws.Int64Value(gpu.MemoryInfo.SizeInMiB))) } - // Accelerators - if info.InferenceAcceleratorInfo != nil && len(info.InferenceAcceleratorInfo.Accelerators) == 1 { + // Accelerators - excluding Neuron + if info.InferenceAcceleratorInfo != nil && len(info.InferenceAcceleratorInfo.Accelerators) == 1 && info.NeuronInfo == nil { accelerator := info.InferenceAcceleratorInfo.Accelerators[0] requirements.Get(v1.LabelInstanceAcceleratorName).Insert(lowerKabobCase(aws.StringValue(accelerator.Name))) requirements.Get(v1.LabelInstanceAcceleratorManufacturer).Insert(lowerKabobCase(aws.StringValue(accelerator.Manufacturer))) 
requirements.Get(v1.LabelInstanceAcceleratorCount).Insert(fmt.Sprint(aws.Int64Value(accelerator.Count))) } + // Neuron + if info.NeuronInfo != nil && len(info.NeuronInfo.NeuronDevices) == 1 { + device := info.NeuronInfo.NeuronDevices[0] + requirements.Get(v1.LabelInstanceAcceleratorName).Insert(lowerKabobCase(aws.StringValue(device.Name))) + requirements.Get(v1.LabelInstanceAcceleratorManufacturer).Insert(lowerKabobCase("aws")) + requirements.Get(v1.LabelInstanceAcceleratorCount).Insert(fmt.Sprint(aws.Int64Value(device.Count))) + } // Windows Build Version Labels if family, ok := amiFamily.(*amifamily.Windows); ok { requirements.Get(corev1.LabelWindowsBuild).Insert(family.Build) } - // Trn1 Accelerators - // TODO: remove function once DescribeInstanceTypes contains the accelerator data - // Values found from: https://aws.amazon.com/ec2/instance-types/trn1/ - if strings.HasPrefix(*info.InstanceType, "trn1") { - requirements.Get(v1.LabelInstanceAcceleratorName).Insert(lowerKabobCase("Inferentia")) - requirements.Get(v1.LabelInstanceAcceleratorManufacturer).Insert(lowerKabobCase("AWS")) - requirements.Get(v1.LabelInstanceAcceleratorCount).Insert(fmt.Sprint(awsNeurons(info))) - } // CPU Manufacturer, valid options: aws, intel, amd if info.ProcessorInfo != nil { requirements.Get(v1.LabelInstanceCPUManufacturer).Insert(lowerKabobCase(aws.StringValue(info.ProcessorInfo.Manufacturer))) @@ -311,7 +310,8 @@ func computeCapacity(ctx context.Context, info *ec2.InstanceTypeInfo, amiFamily v1.ResourceAWSPodENI: *awsPodENI(aws.StringValue(info.InstanceType)), v1.ResourceNVIDIAGPU: *nvidiaGPUs(info), v1.ResourceAMDGPU: *amdGPUs(info), - v1.ResourceAWSNeuron: *awsNeurons(info), + v1.ResourceAWSNeuron: *awsNeuronDevices(info), + v1.ResourceAWSNeuronCore: *awsNeuronCores(info), v1.ResourceHabanaGaudi: *habanaGaudis(info), v1.ResourceEFA: *efas(info), } @@ -406,19 +406,21 @@ func amdGPUs(info *ec2.InstanceTypeInfo) *resource.Quantity { return resources.Quantity(fmt.Sprint(count)) } -// TODO: remove trn1 hardcode values once DescribeInstanceTypes contains the accelerator data -// Values found from: https://aws.amazon.com/ec2/instance-types/trn1/ -func awsNeurons(info *ec2.InstanceTypeInfo) *resource.Quantity { +func awsNeuronCores(info *ec2.InstanceTypeInfo) *resource.Quantity { + count := int64(0) + if info.NeuronInfo != nil { + neuronDevice := info.NeuronInfo.NeuronDevices[0] + neuronCorePerDevice := neuronDevice.CoreInfo.Count + count = *neuronDevice.Count * *neuronCorePerDevice + } + return resources.Quantity(fmt.Sprint(count)) +} + +func awsNeuronDevices(info *ec2.InstanceTypeInfo) *resource.Quantity { count := int64(0) - if *info.InstanceType == "trn1.2xlarge" { - count = int64(1) - } else if *info.InstanceType == "trn1.32xlarge" { - count = int64(16) - } else if *info.InstanceType == "trn1n.32xlarge" { - count = int64(16) - } else if info.InferenceAcceleratorInfo != nil { - for _, accelerator := range info.InferenceAcceleratorInfo.Accelerators { - count += *accelerator.Count + if info.NeuronInfo != nil { + for _, device := range info.NeuronInfo.NeuronDevices { + count += *device.Count } } return resources.Quantity(fmt.Sprint(count)) diff --git a/test/suites/integration/extended_resources_test.go b/test/suites/integration/extended_resources_test.go index 4f47f1b3ec16..49c0803fdc64 100644 --- a/test/suites/integration/extended_resources_test.go +++ b/test/suites/integration/extended_resources_test.go @@ -22,9 +22,11 @@ import ( "github.com/samber/lo" appsv1 "k8s.io/api/apps/v1" corev1 
"k8s.io/api/core/v1" + rbacv1 "k8s.io/api/rbac/v1" "k8s.io/apimachinery/pkg/api/resource" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" "k8s.io/apimachinery/pkg/labels" + "k8s.io/apimachinery/pkg/util/intstr" "sigs.k8s.io/karpenter/pkg/test" @@ -104,6 +106,90 @@ var _ = Describe("Extended Resources", func() { env.ExpectCreatedNodeCount("==", 1) env.EventuallyExpectInitializedNodeCount("==", 1) }) + It("should provision nodes for a deployment that requests aws.amazon.com/neuron", func() { + ExpectNeuronDevicePluginCreated() + // TODO: jmdeal@ remove AL2 pin once AL2023 accelerated AMIs are available + nodeClass.Spec.AMISelectorTerms = []v1.AMISelectorTerm{{Alias: "al2@latest"}} + numPods := 1 + dep := test.Deployment(test.DeploymentOptions{ + Replicas: int32(numPods), + PodOptions: test.PodOptions{ + ObjectMeta: metav1.ObjectMeta{ + Labels: map[string]string{"app": "large-app"}, + }, + ResourceRequirements: corev1.ResourceRequirements{ + Requests: corev1.ResourceList{ + // Only 1 is requested to avoid the use of the Neuron scheduler + // TODO: bryantbiggs@ add the ability to specify the scheduler name to test.PodOptions in order to use the Neuron scheduler + "aws.amazon.com/neuron": resource.MustParse("1"), + }, + Limits: corev1.ResourceList{ + "aws.amazon.com/neuron": resource.MustParse("1"), + }, + }, + }, + }) + selector := labels.SelectorFromSet(dep.Spec.Selector.MatchLabels) + test.ReplaceRequirements(nodePool, karpv1.NodeSelectorRequirementWithMinValues{ + NodeSelectorRequirement: corev1.NodeSelectorRequirement{ + Key: v1.LabelInstanceCategory, + Operator: corev1.NodeSelectorOpExists, + }, + }) + test.ReplaceRequirements(nodePool, karpv1.NodeSelectorRequirementWithMinValues{ + NodeSelectorRequirement: corev1.NodeSelectorRequirement{ + Key: v1.LabelInstanceGeneration, + Operator: corev1.NodeSelectorOpIn, + Values: []string{"1", "2"}, + }, + }) + env.ExpectCreated(nodeClass, nodePool, dep) + env.EventuallyExpectHealthyPodCount(selector, numPods) + env.ExpectCreatedNodeCount("==", 1) + env.EventuallyExpectInitializedNodeCount("==", 1) + }) + It("should provision nodes for a deployment that requests aws.amazon.com/neuroncore", func() { + ExpectNeuronDevicePluginCreated() + // TODO: jmdeal@ remove AL2 pin once AL2023 accelerated AMIs are available + nodeClass.Spec.AMISelectorTerms = []v1.AMISelectorTerm{{Alias: "al2@latest"}} + numPods := 1 + dep := test.Deployment(test.DeploymentOptions{ + Replicas: int32(numPods), + PodOptions: test.PodOptions{ + ObjectMeta: metav1.ObjectMeta{ + Labels: map[string]string{"app": "large-app"}, + }, + ResourceRequirements: corev1.ResourceRequirements{ + Requests: corev1.ResourceList{ + // Only 1 is requested to avoid the use of the Neuron scheduler + // TODO: bryantbiggs@ add the ability to specify the scheduler name to test.PodOptions in order to use the Neuron scheduler + "aws.amazon.com/neuroncore": resource.MustParse("1"), + }, + Limits: corev1.ResourceList{ + "aws.amazon.com/neuroncore": resource.MustParse("1"), + }, + }, + }, + }) + selector := labels.SelectorFromSet(dep.Spec.Selector.MatchLabels) + test.ReplaceRequirements(nodePool, karpv1.NodeSelectorRequirementWithMinValues{ + NodeSelectorRequirement: corev1.NodeSelectorRequirement{ + Key: v1.LabelInstanceCategory, + Operator: corev1.NodeSelectorOpExists, + }, + }) + test.ReplaceRequirements(nodePool, karpv1.NodeSelectorRequirementWithMinValues{ + NodeSelectorRequirement: corev1.NodeSelectorRequirement{ + Key: v1.LabelInstanceGeneration, + Operator: corev1.NodeSelectorOpIn, + Values: []string{"1", 
"2"}, + }, + }) + env.ExpectCreated(nodeClass, nodePool, dep) + env.EventuallyExpectHealthyPodCount(selector, numPods) + env.ExpectCreatedNodeCount("==", 1) + env.EventuallyExpectInitializedNodeCount("==", 1) + }) It("should provision nodes for a deployment that requests vpc.amazonaws.com/pod-eni (security groups for pods)", func() { env.ExpectPodENIEnabled() DeferCleanup(func() { @@ -325,6 +411,520 @@ func ExpectNvidiaDevicePluginCreated() { }) } +// https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/k8/k8s-neuron-device-plugin.yml +func ExpectNeuronDevicePluginCreated() { + GinkgoHelper() + + // When selecting more than 1 neuron/neuroncore but less than ALL of the neuron/neuroncores on the instance, + // you must use the Neuron scheduler to schedule neuron/neuroncores in a contiguous manner. + // https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html#neuron-scheduler-extension + ExpectK8sNeuronSchedulerCreated() + ExpectNeuronSchedulerExtensionCreated() + + neuronDevicePlugin := "neuron-device-plugin" + + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRole{ + ObjectMeta: metav1.ObjectMeta{ + Name: neuronDevicePlugin, + }, + Rules: []rbacv1.PolicyRule{ + // Device plugin + { + APIGroups: []string{""}, + Resources: []string{"nodes"}, + Verbs: []string{"get", "list", "watch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"events"}, + Verbs: []string{"create", "patch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"pods"}, + Verbs: []string{"update", "patch", "get", "list", "watch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"nodes/status"}, + Verbs: []string{"update", "patch"}, + }, + // Scheduler + { + APIGroups: []string{""}, + Resources: []string{"configmaps"}, + Verbs: []string{"get", "list", "watch"}, + }, + { + APIGroups: []string{"coordination.k8s.io"}, + Resources: []string{"leases"}, + Verbs: []string{"create", "get", "list", "update"}, + }, + }, + }) + + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRoleBinding{ + ObjectMeta: metav1.ObjectMeta{ + Name: neuronDevicePlugin, + }, + RoleRef: rbacv1.RoleRef{ + APIGroup: rbacv1.GroupName, + Kind: "ClusterRole", + Name: neuronDevicePlugin, + }, + Subjects: []rbacv1.Subject{ + { + Kind: "ServiceAccount", + Name: neuronDevicePlugin, + Namespace: "kube-system", + }, + }, + }) + + env.ExpectCreatedOrUpdated(&corev1.ServiceAccount{ + ObjectMeta: metav1.ObjectMeta{ + Name: neuronDevicePlugin, + Namespace: "kube-system", + }, + }) + + env.ExpectCreated(&appsv1.DaemonSet{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Name: neuronDevicePlugin, + Namespace: "kube-system", + }), + Spec: appsv1.DaemonSetSpec{ + Selector: &metav1.LabelSelector{ + MatchLabels: map[string]string{ + "name": neuronDevicePlugin, + }, + }, + UpdateStrategy: appsv1.DaemonSetUpdateStrategy{ + Type: appsv1.RollingUpdateDaemonSetStrategyType, + }, + Template: corev1.PodTemplateSpec{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Labels: map[string]string{ + "name": neuronDevicePlugin, + }, + }), + Spec: corev1.PodSpec{ + ServiceAccountName: neuronDevicePlugin, + Tolerations: []corev1.Toleration{ + { + Key: "aws.amazon.com/neuron", + Operator: corev1.TolerationOpExists, + Effect: corev1.TaintEffectNoSchedule, + }, + }, + PriorityClassName: "system-node-critical", + Containers: []corev1.Container{ + { + Name: neuronDevicePlugin, + Image: "public.ecr.aws/neuron/neuron-device-plugin:2.22.4.0", + Env: []corev1.EnvVar{ + { + Name: "KUBECONFIG", + Value: 
"/etc/kubernetes/kubelet.conf", + }, + { + Name: "NODE_NAME", + ValueFrom: &corev1.EnvVarSource{ + FieldRef: &corev1.ObjectFieldSelector{ + FieldPath: "spec.nodeName", + }, + }, + }, + }, + SecurityContext: &corev1.SecurityContext{ + AllowPrivilegeEscalation: lo.ToPtr(false), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + }, + }, + VolumeMounts: []corev1.VolumeMount{ + { + Name: "device-plugin", + MountPath: "/var/lib/kubelet/device-plugins", + }, + { + Name: "infa-map", + MountPath: "/run", + }, + }, + }, + }, + Volumes: []corev1.Volume{ + { + Name: "device-plugin", + VolumeSource: corev1.VolumeSource{ + HostPath: &corev1.HostPathVolumeSource{ + Path: "/var/lib/kubelet/device-plugins", + }, + }, + }, + { + Name: "infa-map", + VolumeSource: corev1.VolumeSource{ + HostPath: &corev1.HostPathVolumeSource{ + Path: "/run", + }, + }, + }, + }, + }, + }, + }, + }) +} + +// https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/k8/k8s-neuron-scheduler-eks.yml +func ExpectK8sNeuronSchedulerCreated() { + GinkgoHelper() + + k8sNeuronScheduler := "k8s-neuron-scheduler" + + env.ExpectCreatedOrUpdated(&corev1.ServiceAccount{ + ObjectMeta: metav1.ObjectMeta{ + Name: k8sNeuronScheduler, + Namespace: "kube-system", + }, + }) + + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRole{ + ObjectMeta: metav1.ObjectMeta{ + Name: k8sNeuronScheduler, + }, + Rules: []rbacv1.PolicyRule{ + { + APIGroups: []string{""}, + Resources: []string{"nodes"}, + Verbs: []string{"get", "list", "watch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"node/status"}, + Verbs: []string{"update", "patch", "get", "list", "watch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"events"}, + Verbs: []string{"create", "patch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"pods"}, + Verbs: []string{"update", "patch", "get", "list", "watch"}, + }, + { + APIGroups: []string{""}, + Resources: []string{"bindings", "pods/bindings"}, + Verbs: []string{"create"}, + }, + }, + }) + + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRoleBinding{ + ObjectMeta: metav1.ObjectMeta{ + Name: k8sNeuronScheduler, + }, + RoleRef: rbacv1.RoleRef{ + APIGroup: rbacv1.GroupName, + Kind: "ClusterRole", + Name: k8sNeuronScheduler, + }, + Subjects: []rbacv1.Subject{ + { + Kind: "ServiceAccount", + Name: k8sNeuronScheduler, + Namespace: "kube-system", + }, + }, + }) + + env.ExpectCreatedOrUpdated(&corev1.Service{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Name: k8sNeuronScheduler, + Namespace: "kube-system", + }), + Spec: corev1.ServiceSpec{ + Selector: map[string]string{ + "app": k8sNeuronScheduler, + }, + Ports: []corev1.ServicePort{ + { + Name: "http", + Port: 12345, + TargetPort: intstr.FromInt(12345), + }, + }, + }, + }) + + replicas := int32(1) + + env.ExpectCreatedOrUpdated(&appsv1.Deployment{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Name: k8sNeuronScheduler, + Namespace: "kube-system", + }), + Spec: appsv1.DeploymentSpec{ + Replicas: &replicas, + Strategy: appsv1.DeploymentStrategy{ + Type: appsv1.RecreateDeploymentStrategyType, + }, + Selector: &metav1.LabelSelector{ + MatchLabels: map[string]string{ + "app": k8sNeuronScheduler, + }, + }, + Template: corev1.PodTemplateSpec{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Labels: map[string]string{ + "app": k8sNeuronScheduler, + }, + Annotations: map[string]string{ + "scheduler.alpha.kubernetes.io/critical-pod": "", + }, + }), + Spec: corev1.PodSpec{ + ServiceAccountName: k8sNeuronScheduler, + PriorityClassName: 
"system-node-critical", + SchedulerName: k8sNeuronScheduler, + Tolerations: []corev1.Toleration{ + { + Key: "CriticalAddonsOnly", + Operator: corev1.TolerationOpExists, + Effect: corev1.TaintEffectNoSchedule, + }, + }, + Containers: []corev1.Container{ + { + Name: k8sNeuronScheduler, + Image: "public.ecr.aws/neuron/neuron-scheduler:2.22.4.0", + Ports: []corev1.ContainerPort{ + { + Name: "http", + ContainerPort: 12345, + }, + }, + Env: []corev1.EnvVar{ + { + Name: "PORT", + Value: "12345", + }, + }, + }, + }, + }, + }, + }, + }) +} + +// https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/k8/my-scheduler.yml +func ExpectNeuronSchedulerExtensionCreated() { + GinkgoHelper() + + neuronSchedulerExtension := "neuron-scheduler-ext" + + env.ExpectCreatedOrUpdated(&corev1.ServiceAccount{ + ObjectMeta: metav1.ObjectMeta{ + Name: neuronSchedulerExtension, + Namespace: "kube-system", + }, + }) + + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRole{ + ObjectMeta: metav1.ObjectMeta{ + Name: neuronSchedulerExtension, + }, + Rules: []rbacv1.PolicyRule{ + { + APIGroups: []string{""}, + Resources: []string{"configmaps"}, + Verbs: []string{"get", "list", "watch"}, + }, + { + APIGroups: []string{"coordination.k8s.io"}, + Resources: []string{"leases"}, + Verbs: []string{"create", "get", "list", "update"}, + }, + }, + }) + + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRoleBinding{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf("%s-kube-scheduler", neuronSchedulerExtension), + }, + Subjects: []rbacv1.Subject{ + { + Kind: "ServiceAccount", + Name: neuronSchedulerExtension, + Namespace: "kube-system", + }, + }, + RoleRef: rbacv1.RoleRef{ + APIGroup: rbacv1.GroupName, + Kind: "ClusterRole", + Name: "system:kube-scheduler", + }, + }) + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRoleBinding{ + ObjectMeta: metav1.ObjectMeta{ + Name: fmt.Sprintf("%s-volume-scheduler", neuronSchedulerExtension), + }, + Subjects: []rbacv1.Subject{ + { + Kind: "ServiceAccount", + Name: neuronSchedulerExtension, + Namespace: "kube-system", + }, + }, + RoleRef: rbacv1.RoleRef{ + APIGroup: rbacv1.GroupName, + Kind: "ClusterRole", + Name: "system:volume-scheduler", + }, + }) + env.ExpectCreatedOrUpdated(&rbacv1.ClusterRoleBinding{ + ObjectMeta: metav1.ObjectMeta{ + Name: neuronSchedulerExtension, + }, + Subjects: []rbacv1.Subject{ + { + Kind: "ServiceAccount", + Name: neuronSchedulerExtension, + Namespace: "kube-system", + }, + }, + RoleRef: rbacv1.RoleRef{ + APIGroup: rbacv1.GroupName, + Kind: "ClusterRole", + Name: neuronSchedulerExtension, + }, + }) + + env.ExpectCreatedOrUpdated(&corev1.ConfigMap{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Name: fmt.Sprintf("%s-config", neuronSchedulerExtension), + Namespace: "kube-system", + }), + Data: map[string]string{ + fmt.Sprintf("%s-config.yaml", neuronSchedulerExtension): fmt.Sprintf(`apiVersion: kubescheduler.config.k8s.io/v1 +kind: KubeSchedulerConfiguration +profiles: + - schedulerName: %[1]v +extenders: + - urlPrefix: 'http://k8s-neuron-scheduler.kube-system.svc.cluster.local:12345' + filterVerb: filter + bindVerb: bind + enableHTTPS: false + nodeCacheCapable: true + managedResources: + - name: 'aws.amazon.com/neuron' + ignoredByScheduler: false + - name: 'aws.amazon.com/neuroncore' + ignoredByScheduler: false + - name: 'aws.amazon.com/neurondevice' + ignoredByScheduler: false + ignorable: false +leaderElection: + leaderElect: true + resourceNamespace: kube-system + resourceName: %[1]v`, neuronSchedulerExtension), + }, + }) + + replicas := int32(1) + + 
env.ExpectCreatedOrUpdated(&appsv1.Deployment{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Name: neuronSchedulerExtension, + Namespace: "kube-system", + Labels: map[string]string{ + "tier": "control-plane", + }, + }), + Spec: appsv1.DeploymentSpec{ + Replicas: &replicas, + Selector: &metav1.LabelSelector{ + MatchLabels: map[string]string{ + "tier": "control-plane", + }, + }, + Template: corev1.PodTemplateSpec{ + ObjectMeta: test.ObjectMeta(metav1.ObjectMeta{ + Labels: map[string]string{ + "tier": "control-plane", + }, + }), + Spec: corev1.PodSpec{ + ServiceAccountName: neuronSchedulerExtension, + Tolerations: []corev1.Toleration{ + { + Key: "CriticalAddonsOnly", + Operator: corev1.TolerationOpExists, + Effect: corev1.TaintEffectNoSchedule, + }, + }, + Containers: []corev1.Container{ + { + Name: neuronSchedulerExtension, + Args: []string{fmt.Sprintf("--config=/etc/kubernetes/%[1]v/%[1]v-config.yaml", neuronSchedulerExtension), "--leader-elect=true", "--v=2"}, + Command: []string{"/usr/local/bin/kube-scheduler"}, + Image: fmt.Sprintf("public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.%[1]v.0-eks-1-%[1]v-latest", env.K8sMinorVersion()), + LivenessProbe: &corev1.Probe{ + InitialDelaySeconds: 15, + ProbeHandler: corev1.ProbeHandler{ + HTTPGet: &corev1.HTTPGetAction{ + Path: "/healthz", + Port: intstr.FromInt(10259), + Scheme: corev1.URISchemeHTTPS, + }, + }, + }, + ReadinessProbe: &corev1.Probe{ + ProbeHandler: corev1.ProbeHandler{ + HTTPGet: &corev1.HTTPGetAction{ + Path: "/healthz", + Port: intstr.FromInt(10259), + Scheme: corev1.URISchemeHTTPS, + }, + }, + }, + SecurityContext: &corev1.SecurityContext{ + Privileged: lo.ToPtr(false), + }, + VolumeMounts: []corev1.VolumeMount{ + { + Name: "config-volume", + MountPath: fmt.Sprintf("/etc/kubernetes/%s", neuronSchedulerExtension), + ReadOnly: true, + }, + }, + }, + }, + HostNetwork: false, + HostPID: false, + Volumes: []corev1.Volume{ + { + Name: "config-volume", + VolumeSource: corev1.VolumeSource{ + ConfigMap: &corev1.ConfigMapVolumeSource{ + LocalObjectReference: corev1.LocalObjectReference{ + Name: fmt.Sprintf("%s-config", neuronSchedulerExtension), + }, + }, + }, + }, + }, + }, + }, + }, + }) +} + func ExpectAMDDevicePluginCreated() { GinkgoHelper() env.ExpectCreated(&appsv1.DaemonSet{ diff --git a/test/suites/scheduling/suite_test.go b/test/suites/scheduling/suite_test.go index e0fe0d09c0f7..73315a3f8c2b 100644 --- a/test/suites/scheduling/suite_test.go +++ b/test/suites/scheduling/suite_test.go @@ -259,7 +259,7 @@ var _ = Describe("Scheduling", Ordered, ContinueOnFailure, func() { env.EventuallyExpectHealthyPodCount(labels.SelectorFromSet(deployment.Spec.Selector.MatchLabels), int(*deployment.Spec.Replicas)) env.ExpectCreatedNodeCount("==", 1) }) - It("should support well-known labels for an accelerator (inferentia)", func() { + It("should support well-known labels for an accelerator (inferentia2)", func() { nodeSelector := map[string]string{ v1.LabelInstanceAcceleratorName: "inferentia", v1.LabelInstanceAcceleratorManufacturer: "aws", diff --git a/website/content/en/preview/concepts/scheduling.md b/website/content/en/preview/concepts/scheduling.md index 54abf7ed7488..e18d566407cb 100755 --- a/website/content/en/preview/concepts/scheduling.md +++ b/website/content/en/preview/concepts/scheduling.md @@ -70,6 +70,7 @@ Accelerator (e.g., GPU) values include - `nvidia.com/gpu` - `amd.com/gpu` - `aws.amazon.com/neuron` +- `aws.amazon.com/neuroncore` - `habana.ai/gaudi` Karpenter supports accelerators, such as GPUs. 
@@ -88,15 +89,23 @@ spec: nvidia.com/gpu: "1" ``` {{% alert title="Note" color="primary" %}} -If you are provisioning GPU nodes, you need to deploy an appropriate GPU device plugin daemonset for those nodes. -Without the daemonset running, Karpenter will not see those nodes as initialized. +If you are provisioning nodes that will utilize accelerators/GPUs, you need to deploy the appropriate device plugin daemonset. +Without the respective device plugin daemonset, Karpenter will not see those nodes as initialized. Refer to general [Kubernetes GPU](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-amd-gpu-device-plugin) docs and the following specific GPU docs: * `nvidia.com/gpu`: [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin) * `amd.com/gpu`: [AMD GPU device plugin for Kubernetes](https://github.com/RadeonOpenCompute/k8s-device-plugin) -* `aws.amazon.com/neuron`: [Kubernetes environment setup for Neuron](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/k8) +* `aws.amazon.com/neuron`/`aws.amazon.com/neuroncore`: [AWS Neuron device plugin for Kubernetes](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html#neuron-device-plugin) * `habana.ai/gaudi`: [Habana device plugin for Kubernetes](https://docs.habana.ai/en/latest/Orchestration/Gaudi_Kubernetes/Habana_Device_Plugin_for_Kubernetes.html) {{% /alert %}} +#### AWS Neuron Resources + +The [Neuron scheduler extension](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html#neuron-scheduler-extension) is required for pods that require more than one Neuron core (`aws.amazon.com/neuroncore`) or device (`aws.amazon.com/neuron`) resource, but less than all available Neuron cores or devices on a node. From the AWS Neuron documentation: + +> The Neuron scheduler extension finds sets of directly connected devices with minimal communication latency when scheduling containers. On Inf1 and Inf2 instance types where Neuron devices are connected through a ring topology, the scheduler finds sets of contiguous devices. For example, for a container requesting 3 Neuron devices the scheduler might assign Neuron devices 0,1,2 to the container if they are available but never devices 0,2,4 because those devices are not directly connected. On Trn1.32xlarge and Trn1n.32xlarge instance types where devices are connected through a 2D torus topology, the Neuron scheduler enforces additional constraints that containers request 1, 4, 8, or all 16 devices. If your container requires a different number of devices, such as 2 or 5, we recommend that you use an Inf2 instance instead of Trn1 to benefit from more advanced topology. + +However, Karpenter is not aware of the decisions made by the Neuron scheduler extension which precludes it from making any optimizations to consolidate and bin pack pods requiring Neuron resources. To ensure Karpenter's bin-packing is consistent with the decisions made by the scheduler extension, containers must have like-sized, power of 2 requests (e.g. 1, 2, 4, etc). Failing to do so may result in permanently pending pods. + ### Pod ENI Resources (Security Groups for Pods) [Pod ENI](https://github.com/aws/amazon-vpc-cni-k8s#enable_pod_eni-v170) is a feature of the AWS VPC CNI Plugin which allows an Elastic Network Interface (ENI) to be allocated directly to a Pod. When enabled, the `vpc.amazonaws.com/pod-eni` extended resource is added to supported nodes. 
The Pod ENI feature can be used independently, but is most often used in conjunction with Security Groups for Pods. Follow the below instructions to enable support for Pod ENI and/or Security Groups for Pods in Karpenter. diff --git a/website/content/en/preview/reference/instance-types.md b/website/content/en/preview/reference/instance-types.md index 40ac3f1708cf..aa7fd6676342 100644 --- a/website/content/en/preview/reference/instance-types.md +++ b/website/content/en/preview/reference/instance-types.md @@ -5511,7 +5511,6 @@ below are the resources available with some assumptions and after the instance o #### Resources | Resource | Quantity | |--|--| - |aws.amazon.com/neuron|8| |cpu|95690m| |ephemeral-storage|17Gi| |memory|718987Mi| @@ -7245,6 +7244,166 @@ below are the resources available with some assumptions and after the instance o |ephemeral-storage|17Gi| |memory|237794Mi| |pods|394| +## hpc6a Family +### `hpc6a.48xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|hpc| + |karpenter.k8s.aws/instance-cpu|96| + |karpenter.k8s.aws/instance-cpu-manufacturer|amd| + |karpenter.k8s.aws/instance-ebs-bandwidth|2085| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|hpc6a| + |karpenter.k8s.aws/instance-generation|6| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-memory|393216| + |karpenter.k8s.aws/instance-network-bandwidth|100000| + |karpenter.k8s.aws/instance-size|48xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|hpc6a.48xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|95690m| + |ephemeral-storage|17Gi| + |memory|362269Mi| + |pods|100| + |vpc.amazonaws.com/efa|1| +## hpc6id Family +### `hpc6id.32xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|hpc| + |karpenter.k8s.aws/instance-cpu|64| + |karpenter.k8s.aws/instance-cpu-manufacturer|intel| + |karpenter.k8s.aws/instance-ebs-bandwidth|2085| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|hpc6id| + |karpenter.k8s.aws/instance-generation|6| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-local-nvme|15200| + |karpenter.k8s.aws/instance-memory|1048576| + |karpenter.k8s.aws/instance-network-bandwidth|200000| + |karpenter.k8s.aws/instance-size|32xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|hpc6id.32xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|63770m| + |ephemeral-storage|17Gi| + |memory|969016Mi| + |pods|51| + |vpc.amazonaws.com/efa|2| +## hpc7a Family +### `hpc7a.12xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|hpc| + |karpenter.k8s.aws/instance-cpu|24| + |karpenter.k8s.aws/instance-cpu-manufacturer|amd| + |karpenter.k8s.aws/instance-ebs-bandwidth|2085| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|hpc7a| + |karpenter.k8s.aws/instance-generation|7| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-memory|786432| + |karpenter.k8s.aws/instance-network-bandwidth|300000| + |karpenter.k8s.aws/instance-size|12xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|hpc7a.12xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|23870m| + |ephemeral-storage|17Gi| + |memory|725994Mi| + 
|pods|100| + |vpc.amazonaws.com/efa|2| +### `hpc7a.24xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|hpc| + |karpenter.k8s.aws/instance-cpu|48| + |karpenter.k8s.aws/instance-cpu-manufacturer|amd| + |karpenter.k8s.aws/instance-ebs-bandwidth|2085| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|hpc7a| + |karpenter.k8s.aws/instance-generation|7| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-memory|786432| + |karpenter.k8s.aws/instance-network-bandwidth|300000| + |karpenter.k8s.aws/instance-size|24xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|hpc7a.24xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|47810m| + |ephemeral-storage|17Gi| + |memory|725994Mi| + |pods|100| + |vpc.amazonaws.com/efa|2| +### `hpc7a.48xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|hpc| + |karpenter.k8s.aws/instance-cpu|96| + |karpenter.k8s.aws/instance-cpu-manufacturer|amd| + |karpenter.k8s.aws/instance-ebs-bandwidth|2085| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|hpc7a| + |karpenter.k8s.aws/instance-generation|7| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-memory|786432| + |karpenter.k8s.aws/instance-network-bandwidth|300000| + |karpenter.k8s.aws/instance-size|48xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|hpc7a.48xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|95690m| + |ephemeral-storage|17Gi| + |memory|725994Mi| + |pods|100| + |vpc.amazonaws.com/efa|2| +### `hpc7a.96xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|hpc| + |karpenter.k8s.aws/instance-cpu|192| + |karpenter.k8s.aws/instance-cpu-manufacturer|amd| + |karpenter.k8s.aws/instance-ebs-bandwidth|2085| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|hpc7a| + |karpenter.k8s.aws/instance-generation|7| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-memory|786432| + |karpenter.k8s.aws/instance-network-bandwidth|300000| + |karpenter.k8s.aws/instance-size|96xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|hpc7a.96xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|191450m| + |ephemeral-storage|17Gi| + |memory|725994Mi| + |pods|100| + |vpc.amazonaws.com/efa|2| ## hpc7g Family ### `hpc7g.4xlarge` #### Labels @@ -8448,6 +8607,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|1| + |aws.amazon.com/neuroncore|4| |cpu|3920m| |ephemeral-storage|17Gi| |memory|6804Mi| @@ -8478,6 +8638,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|1| + |aws.amazon.com/neuroncore|4| |cpu|7910m| |ephemeral-storage|17Gi| |memory|14382Mi| @@ -8508,6 +8669,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|4| + |aws.amazon.com/neuroncore|16| |cpu|23870m| |ephemeral-storage|17Gi| |memory|42536Mi| @@ -8538,6 +8700,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| 
|aws.amazon.com/neuron|16| + |aws.amazon.com/neuroncore|64| |cpu|95690m| |ephemeral-storage|17Gi| |memory|177976Mi| @@ -8551,7 +8714,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|1| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|inferentia2| |karpenter.k8s.aws/instance-category|inf| |karpenter.k8s.aws/instance-cpu|4| |karpenter.k8s.aws/instance-cpu-manufacturer|amd| @@ -8570,6 +8733,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|1| + |aws.amazon.com/neuroncore|2| |cpu|3920m| |ephemeral-storage|17Gi| |memory|14162Mi| @@ -8581,7 +8745,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|1| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|inferentia2| |karpenter.k8s.aws/instance-category|inf| |karpenter.k8s.aws/instance-cpu|32| |karpenter.k8s.aws/instance-cpu-manufacturer|amd| @@ -8600,6 +8764,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|1| + |aws.amazon.com/neuroncore|2| |cpu|31850m| |ephemeral-storage|17Gi| |memory|118312Mi| @@ -8611,7 +8776,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|6| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|inferentia2| |karpenter.k8s.aws/instance-category|inf| |karpenter.k8s.aws/instance-cpu|96| |karpenter.k8s.aws/instance-cpu-manufacturer|amd| @@ -8630,6 +8795,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|6| + |aws.amazon.com/neuroncore|12| |cpu|95690m| |ephemeral-storage|17Gi| |memory|355262Mi| @@ -8641,7 +8807,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|12| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|inferentia2| |karpenter.k8s.aws/instance-category|inf| |karpenter.k8s.aws/instance-cpu|192| |karpenter.k8s.aws/instance-cpu-manufacturer|amd| @@ -8660,6 +8826,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|12| + |aws.amazon.com/neuroncore|24| |cpu|191450m| |ephemeral-storage|17Gi| |memory|718987Mi| @@ -14449,6 +14616,39 @@ below are the resources available with some assumptions and after the instance o |pods|100| |vpc.amazonaws.com/efa|32| |vpc.amazonaws.com/pod-eni|120| +## p5e Family +### `p5e.48xlarge` +#### Labels + | Label | Value | + |--|--| + |karpenter.k8s.aws/instance-category|p| + |karpenter.k8s.aws/instance-cpu|192| + |karpenter.k8s.aws/instance-cpu-manufacturer|amd| + |karpenter.k8s.aws/instance-ebs-bandwidth|80000| + |karpenter.k8s.aws/instance-encryption-in-transit-supported|true| + |karpenter.k8s.aws/instance-family|p5e| + 
|karpenter.k8s.aws/instance-generation|5| + |karpenter.k8s.aws/instance-gpu-count|8| + |karpenter.k8s.aws/instance-gpu-manufacturer|nvidia| + |karpenter.k8s.aws/instance-gpu-memory|144384| + |karpenter.k8s.aws/instance-gpu-name|h200| + |karpenter.k8s.aws/instance-hypervisor|nitro| + |karpenter.k8s.aws/instance-local-nvme|30400| + |karpenter.k8s.aws/instance-memory|2097152| + |karpenter.k8s.aws/instance-network-bandwidth|3200000| + |karpenter.k8s.aws/instance-size|48xlarge| + |kubernetes.io/arch|amd64| + |kubernetes.io/os|linux| + |node.kubernetes.io/instance-type|p5e.48xlarge| +#### Resources + | Resource | Quantity | + |--|--| + |cpu|191450m| + |ephemeral-storage|17Gi| + |memory|1938410Mi| + |nvidia.com/gpu|8| + |pods|100| + |vpc.amazonaws.com/efa|32| ## r3 Family ### `r3.large` #### Labels @@ -20568,7 +20768,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|1| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|trainium| |karpenter.k8s.aws/instance-category|trn| |karpenter.k8s.aws/instance-cpu|8| |karpenter.k8s.aws/instance-cpu-manufacturer|intel| @@ -20588,6 +20788,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|1| + |aws.amazon.com/neuroncore|2| |cpu|7910m| |ephemeral-storage|17Gi| |memory|29317Mi| @@ -20599,7 +20800,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|16| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|trainium| |karpenter.k8s.aws/instance-category|trn| |karpenter.k8s.aws/instance-cpu|128| |karpenter.k8s.aws/instance-cpu-manufacturer|intel| @@ -20619,6 +20820,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|16| + |aws.amazon.com/neuroncore|32| |cpu|127610m| |ephemeral-storage|17Gi| |memory|481894Mi| @@ -20632,7 +20834,7 @@ below are the resources available with some assumptions and after the instance o |--|--| |karpenter.k8s.aws/instance-accelerator-count|16| |karpenter.k8s.aws/instance-accelerator-manufacturer|aws| - |karpenter.k8s.aws/instance-accelerator-name|inferentia| + |karpenter.k8s.aws/instance-accelerator-name|trainium| |karpenter.k8s.aws/instance-category|trn| |karpenter.k8s.aws/instance-cpu|128| |karpenter.k8s.aws/instance-cpu-manufacturer|intel| @@ -20652,6 +20854,7 @@ below are the resources available with some assumptions and after the instance o | Resource | Quantity | |--|--| |aws.amazon.com/neuron|16| + |aws.amazon.com/neuroncore|32| |cpu|127610m| |ephemeral-storage|17Gi| |memory|481894Mi| diff --git a/website/content/en/preview/upgrading/upgrade-guide.md b/website/content/en/preview/upgrading/upgrade-guide.md index 4a5289d40768..f4f0bfaae3e3 100644 --- a/website/content/en/preview/upgrading/upgrade-guide.md +++ b/website/content/en/preview/upgrading/upgrade-guide.md @@ -37,6 +37,7 @@ WHEN CREATING A NEW SECTION OF THE UPGRADE GUIDANCE FOR NEWER VERSIONS, ENSURE T * Bottlerocket AMIFamily now supports `instanceStorePolicy: RAID0`. 
This means that Karpenter will auto-generate userData to RAID0 your instance store volumes (similar to AL2 and AL2023) when specifying this value.
  * Note: This userData configuration is _only_ valid on Bottlerocket v1.22.0+. If you are using an earlier version of a Bottlerocket image (< v1.22.0) with `amiFamily: Bottlerocket` and `instanceStorePolicy: RAID0`, nodes will fail to join the cluster.
+* The values of the AWS Neuron accelerator well-known label (`karpenter.k8s.aws/instance-accelerator-name`) now reflect the correct accelerator names: `trainium`, `inferentia`, and `inferentia2`. Previously, all Neuron accelerators were assigned the label value `inferentia`.

### Upgrading to `1.0.0`+
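To illustrate the accelerator-name correction and the new `aws.amazon.com/neuroncore` extended resource documented in the tables and upgrade note above, here is a minimal sketch (not part of this patch) of how a user might target Trainium capacity and request individual NeuronCores. It assumes the `karpenter.sh/v1` API; the NodePool and Pod names, the `default` EC2NodeClass, and the container image are hypothetical placeholders — only the label key, the `trainium` value, and the `aws.amazon.com/neuroncore` resource name come from this change.

```yaml
# Sketch only: selects Trainium instances via the corrected accelerator-name
# label value and requests NeuronCores rather than whole Neuron devices.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: neuron                     # hypothetical NodePool name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-accelerator-name
          operator: In
          values: ["trainium"]     # previously reported as "inferentia" for all Neuron devices
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # hypothetical EC2NodeClass name
---
apiVersion: v1
kind: Pod
metadata:
  name: neuroncore-example         # hypothetical Pod name
spec:
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:1.36   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        requests:
          aws.amazon.com/neuroncore: "2"   # request two NeuronCores instead of a full device
        limits:
          aws.amazon.com/neuroncore: "2"
```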