Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update spot pricing even in Isolated VPC #5704

Merged
merged 29 commits into from
Mar 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
ad1edc9
remove isolated vpc check for spot pricing
liafizan Feb 15, 2024
9207c8d
adding test for isolate vpc & spot pricing
liafizan Feb 19, 2024
d8f2f9e
Merge branch 'aws:main' into main
liafizan Feb 19, 2024
b11e586
remove debug stmt
liafizan Feb 19, 2024
c57e83d
remove test that did not serve purpose
liafizan Feb 21, 2024
7a5293e
update document
liafizan Feb 21, 2024
19405f6
Merge branch 'main' into main
liafizan Feb 21, 2024
2a80849
additional changes for spot pricing update only
liafizan Feb 21, 2024
0071e04
Merge remote-tracking branch 'origin/main'
liafizan Feb 21, 2024
0dc8e4e
update test case
liafizan Feb 22, 2024
dc43d7a
revert change as per pr feedback
liafizan Feb 22, 2024
daef3e4
Updated using docgen
liafizan Feb 22, 2024
d103e1a
Update pkg/operator/options/options.go
liafizan Feb 27, 2024
54ce1b6
Merge branch 'main' into main
liafizan Feb 27, 2024
6885d63
use PR feedback
liafizan Feb 27, 2024
e8197a1
Merge branch 'main' of github.com:liafizan/karpenter-provider-aws
liafizan Feb 27, 2024
10f51ee
use PR feedback and update test
liafizan Feb 27, 2024
2d12fdb
disable initial check for pricing controller
liafizan Feb 27, 2024
49192b2
update docs
liafizan Feb 27, 2024
1353f61
adding logging
liafizan Feb 27, 2024
711c8cf
revert website preview changes
liafizan Feb 27, 2024
cc80eb3
Merge pull request #1 from liafizan/patch-1
liafizan Feb 27, 2024
a39ce5d
pr feedback for logging statements
liafizan Feb 27, 2024
1539bbf
Merge remote-tracking branch 'origin/main'
liafizan Feb 27, 2024
8123c31
pr feedback for tests & logs
liafizan Feb 28, 2024
96c6308
Merge branch 'aws:main' into main
liafizan Feb 28, 2024
4875dcd
Merge branch 'aws:main' into main
liafizan Mar 1, 2024
c8edcdf
pr feedback changes
liafizan Mar 1, 2024
66e9cb7
pr feedback changes
liafizan Mar 1, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 1 addition & 6 deletions pkg/controllers/controllers.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ import (
servicesqs "github.com/aws/aws-sdk-go/service/sqs"
"github.com/samber/lo"
"k8s.io/utils/clock"
"knative.dev/pkg/logging"
"sigs.k8s.io/controller-runtime/pkg/client"

"sigs.k8s.io/karpenter/pkg/events"
Expand Down Expand Up @@ -55,14 +54,10 @@ func NewControllers(ctx context.Context, sess *session.Session, clk clock.Clock,
nodeclass.NewController(kubeClient, recorder, subnetProvider, securityGroupProvider, amiProvider, instanceProfileProvider, launchTemplateProvider),
nodeclaimgarbagecollection.NewController(kubeClient, cloudProvider),
nodeclaimtagging.NewController(kubeClient, instanceProvider),
pricing.NewController(pricingProvider),
}
if options.FromContext(ctx).InterruptionQueue != "" {
controllers = append(controllers, interruption.NewController(kubeClient, clk, recorder, lo.Must(sqs.NewProvider(ctx, servicesqs.New(sess), options.FromContext(ctx).InterruptionQueue)), unavailableOfferings))
}
if options.FromContext(ctx).IsolatedVPC {
logging.FromContext(ctx).Infof("assuming isolated VPC, pricing information will not be updated")
liafizan marked this conversation as resolved.
Show resolved Hide resolved
} else {
controllers = append(controllers, pricing.NewController(pricingProvider))
}
return controllers
}
2 changes: 1 addition & 1 deletion pkg/operator/options/options.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ func (o *Options) AddFlags(fs *coreoptions.FlagSet) {
fs.StringVar(&o.ClusterCABundle, "cluster-ca-bundle", env.WithDefaultString("CLUSTER_CA_BUNDLE", ""), "Cluster CA bundle for nodes to use for TLS connections with the API server. If not set, this is taken from the controller's TLS configuration.")
fs.StringVar(&o.ClusterName, "cluster-name", env.WithDefaultString("CLUSTER_NAME", ""), "[REQUIRED] The kubernetes cluster name for resource discovery.")
fs.StringVar(&o.ClusterEndpoint, "cluster-endpoint", env.WithDefaultString("CLUSTER_ENDPOINT", ""), "The external kubernetes cluster endpoint for new nodes to connect with. If not specified, will discover the cluster endpoint using DescribeCluster API.")
fs.BoolVarWithEnv(&o.IsolatedVPC, "isolated-vpc", "ISOLATED_VPC", false, "If true, then assume we can't reach AWS services which don't have a VPC endpoint. This also has the effect of disabling look-ups to the AWS pricing endpoint.")
fs.BoolVarWithEnv(&o.IsolatedVPC, "isolated-vpc", "ISOLATED_VPC", false, "If true, then assume we can't reach AWS services which don't have a VPC endpoint. This also has the effect of disabling look-ups to the AWS on-demand pricing endpoint.")
fs.Float64Var(&o.VMMemoryOverheadPercent, "vm-memory-overhead-percent", env.WithDefaultFloat64("VM_MEMORY_OVERHEAD_PERCENT", 0.075), "The VM memory overhead as a percent that will be subtracted from the total memory for all instance types.")
fs.StringVar(&o.InterruptionQueue, "interruption-queue", env.WithDefaultString("INTERRUPTION_QUEUE", ""), "Interruption queue is disabled if not specified. Enabling interruption handling may require additional permissions on the controller service account. Additional permissions are outlined in the docs.")
fs.IntVar(&o.ReservedENIs, "reserved-enis", env.WithDefaultInt("RESERVED_ENIS", 0), "Reserved ENIs are not included in the calculations for max-pods or kube-reserved. This is most often used in the VPC CNI custom networking setup https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html.")
Expand Down
3 changes: 2 additions & 1 deletion pkg/providers/pricing/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,9 +50,10 @@ func (c *Controller) Builder(_ context.Context, m manager.Manager) corecontrolle

func (c *Controller) updatePricing(ctx context.Context) error {
work := []func(ctx context.Context) error{
c.pricingProvider.UpdateOnDemandPricing,
c.pricingProvider.UpdateSpotPricing,
c.pricingProvider.UpdateOnDemandPricing,
}

errs := make([]error, len(work))
lop.ForEach(work, func(f func(ctx context.Context) error, i int) {
if err := f(ctx); err != nil {
Expand Down
9 changes: 9 additions & 0 deletions pkg/providers/pricing/pricing.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@ import (
"sync"
"time"

"github.com/aws/karpenter-provider-aws/pkg/operator/options"

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/ec2"
Expand Down Expand Up @@ -148,6 +150,13 @@ func (p *Provider) UpdateOnDemandPricing(ctx context.Context) error {
var onDemandPrices, onDemandMetalPrices map[string]float64
var onDemandErr, onDemandMetalErr error

// if we are in isolated vpc, skip updating on demand pricing
// as pricing api may not be available
if options.FromContext(ctx).IsolatedVPC {
logging.FromContext(ctx).Infof("assuming isolated VPC, on-demand pricing information will not be updated")
return nil
}

wg.Add(1)
go func() {
defer wg.Done()
Expand Down
40 changes: 40 additions & 0 deletions pkg/providers/pricing/suite_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,46 @@ var _ = Describe("Pricing", func() {
Expect(lo.Map(inp.ProductDescriptions, func(x *string, _ int) string { return *x })).
To(ContainElements("Linux/UNIX", "Linux/UNIX (Amazon VPC)"))
})
It("should return static on-demand data when in isolated-vpc", func() {
ctx = options.ToContext(ctx, test.Options(test.OptionsFields{
IsolatedVPC: lo.ToPtr(true),
}))
now := time.Now()
awsEnv.EC2API.DescribeSpotPriceHistoryOutput.Set(&ec2.DescribeSpotPriceHistoryOutput{
SpotPriceHistory: []*ec2.SpotPrice{
{
AvailabilityZone: aws.String("test-zone-1b"),
InstanceType: aws.String("c99.large"),
SpotPrice: aws.String("1.50"),
Timestamp: &now,
},
{
AvailabilityZone: aws.String("test-zone-1b"),
InstanceType: aws.String("c98.large"),
SpotPrice: aws.String("1.10"),
Timestamp: &now,
},
},
})

awsEnv.PricingAPI.GetProductsOutput.Set(&awspricing.GetProductsOutput{
// these are incorrect prices which are here to ensure that
// results from only static pricing are used
PriceList: []aws.JSONValue{
fake.NewOnDemandPrice("c3.2xlarge", 1.20),
fake.NewOnDemandPrice("c5.xlarge", 1.23),
},
})
ExpectReconcileSucceeded(ctx, controller, types.NamespacedName{})
price, ok := awsEnv.PricingProvider.OnDemandPrice("c3.2xlarge")
jonathan-innis marked this conversation as resolved.
Show resolved Hide resolved
Expect(ok).To(BeTrue())
Expect(price).To(BeNumerically("==", 0.420000))

price, ok = awsEnv.PricingProvider.SpotPrice("c98.large", "test-zone-1b")
Expect(ok).To(BeTrue())
Expect(price).To(BeNumerically("==", 1.10))
Expect(getPricingEstimateMetricValue("c98.large", ec2.UsageClassTypeSpot, "test-zone-1b")).To(BeNumerically("==", 1.10))
})
})

func getPricingEstimateMetricValue(instanceType string, capacityType string, zone string) float64 {
Expand Down
52 changes: 26 additions & 26 deletions website/content/en/docs/reference/settings.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,32 +10,32 @@ Karpenter surfaces environment variables and CLI parameters to allow you to conf

[comment]: <> (the content below is generated from hack/docs/configuration_gen_docs.go)

| Environment Variable | CLI Flag | Description |
|--|--|--|
| ASSUME_ROLE_ARN | \-\-assume-role-arn | Role to assume for calling AWS services.|
| ASSUME_ROLE_DURATION | \-\-assume-role-duration | Duration of assumed credentials in minutes. Default value is 15 minutes. Not used unless aws.assumeRole set. (default = 15m0s)|
| BATCH_IDLE_DURATION | \-\-batch-idle-duration | The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately. (default = 1s)|
| BATCH_MAX_DURATION | \-\-batch-max-duration | The maximum length of a batch window. The longer this is, the more pods we can consider for provisioning at one time which usually results in fewer but larger nodes. (default = 10s)|
| CLUSTER_CA_BUNDLE | \-\-cluster-ca-bundle | Cluster CA bundle for nodes to use for TLS connections with the API server. If not set, this is taken from the controller's TLS configuration.|
| CLUSTER_ENDPOINT | \-\-cluster-endpoint | The external kubernetes cluster endpoint for new nodes to connect with. If not specified, will discover the cluster endpoint using DescribeCluster API.|
| CLUSTER_NAME | \-\-cluster-name | [REQUIRED] The kubernetes cluster name for resource discovery.|
| DISABLE_WEBHOOK | \-\-disable-webhook | Disable the admission and validation webhooks|
| ENABLE_PROFILING | \-\-enable-profiling | Enable the profiling on the metric endpoint|
| FEATURE_GATES | \-\-feature-gates | Optional features can be enabled / disabled using feature gates. Current options are: Drift,SpotToSpotConsolidation (default = Drift=true,SpotToSpotConsolidation=false)|
| HEALTH_PROBE_PORT | \-\-health-probe-port | The port the health probe endpoint binds to for reporting controller health (default = 8081)|
| INTERRUPTION_QUEUE | \-\-interruption-queue | Interruption queue is disabled if not specified. Enabling interruption handling may require additional permissions on the controller service account. Additional permissions are outlined in the docs.|
| ISOLATED_VPC | \-\-isolated-vpc | If true, then assume we can't reach AWS services which don't have a VPC endpoint. This also has the effect of disabling look-ups to the AWS pricing endpoint.|
| KARPENTER_SERVICE | \-\-karpenter-service | The Karpenter Service name for the dynamic webhook certificate|
| KUBE_CLIENT_BURST | \-\-kube-client-burst | The maximum allowed burst of queries to the kube-apiserver (default = 300)|
| KUBE_CLIENT_QPS | \-\-kube-client-qps | The smoothed rate of qps to kube-apiserver (default = 200)|
| LEADER_ELECT | \-\-leader-elect | Start leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.|
| LOG_LEVEL | \-\-log-level | Log verbosity level. Can be one of 'debug', 'info', or 'error' (default = info)|
| MEMORY_LIMIT | \-\-memory-limit | Memory limit on the container running the controller. The GC soft memory limit is set to 90% of this value. (default = -1)|
| METRICS_PORT | \-\-metrics-port | The port the metric endpoint binds to for operating metrics about the controller itself (default = 8000)|
| RESERVED_ENIS | \-\-reserved-enis | Reserved ENIs are not included in the calculations for max-pods or kube-reserved. This is most often used in the VPC CNI custom networking setup https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html. (default = 0)|
| VM_MEMORY_OVERHEAD_PERCENT | \-\-vm-memory-overhead-percent | The VM memory overhead as a percent that will be subtracted from the total memory for all instance types. (default = 0.075)|
| WEBHOOK_METRICS_PORT | \-\-webhook-metrics-port | The port the webhook metric endpoing binds to for operating metrics about the webhook (default = 8001)|
| WEBHOOK_PORT | \-\-webhook-port | The port the webhook endpoint binds to for validation and mutation of resources (default = 8443)|
| Environment Variable | CLI Flag | Description |
liafizan marked this conversation as resolved.
Show resolved Hide resolved
liafizan marked this conversation as resolved.
Show resolved Hide resolved
|--|--|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ASSUME_ROLE_ARN | \-\-assume-role-arn | Role to assume for calling AWS services. |
| ASSUME_ROLE_DURATION | \-\-assume-role-duration | Duration of assumed credentials in minutes. Default value is 15 minutes. Not used unless aws.assumeRole set. (default = 15m0s) |
| BATCH_IDLE_DURATION | \-\-batch-idle-duration | The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately. (default = 1s) |
| BATCH_MAX_DURATION | \-\-batch-max-duration | The maximum length of a batch window. The longer this is, the more pods we can consider for provisioning at one time which usually results in fewer but larger nodes. (default = 10s) |
| CLUSTER_CA_BUNDLE | \-\-cluster-ca-bundle | Cluster CA bundle for nodes to use for TLS connections with the API server. If not set, this is taken from the controller's TLS configuration. |
| CLUSTER_ENDPOINT | \-\-cluster-endpoint | The external kubernetes cluster endpoint for new nodes to connect with. If not specified, will discover the cluster endpoint using DescribeCluster API. |
| CLUSTER_NAME | \-\-cluster-name | [REQUIRED] The kubernetes cluster name for resource discovery. |
| DISABLE_WEBHOOK | \-\-disable-webhook | Disable the admission and validation webhooks |
| ENABLE_PROFILING | \-\-enable-profiling | Enable the profiling on the metric endpoint |
| FEATURE_GATES | \-\-feature-gates | Optional features can be enabled / disabled using feature gates. Current options are: Drift,SpotToSpotConsolidation (default = Drift=true,SpotToSpotConsolidation=false) |
| HEALTH_PROBE_PORT | \-\-health-probe-port | The port the health probe endpoint binds to for reporting controller health (default = 8081) |
| INTERRUPTION_QUEUE | \-\-interruption-queue | Interruption queue is disabled if not specified. Enabling interruption handling may require additional permissions on the controller service account. Additional permissions are outlined in the docs. |
| ISOLATED_VPC | \-\-isolated-vpc | If true, then assume we can't reach AWS services which don't have a VPC endpoint. This also has the effect of disabling look-ups to the AWS pricing endpoint. |
| KARPENTER_SERVICE | \-\-karpenter-service | The Karpenter Service name for the dynamic webhook certificate |
| KUBE_CLIENT_BURST | \-\-kube-client-burst | The maximum allowed burst of queries to the kube-apiserver (default = 300) |
| KUBE_CLIENT_QPS | \-\-kube-client-qps | The smoothed rate of qps to kube-apiserver (default = 200) |
| LEADER_ELECT | \-\-leader-elect | Start leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability. |
| LOG_LEVEL | \-\-log-level | Log verbosity level. Can be one of 'debug', 'info', or 'error' (default = info) |
| MEMORY_LIMIT | \-\-memory-limit | Memory limit on the container running the controller. The GC soft memory limit is set to 90% of this value. (default = -1) |
| METRICS_PORT | \-\-metrics-port | The port the metric endpoint binds to for operating metrics about the controller itself (default = 8000) |
| RESERVED_ENIS | \-\-reserved-enis | Reserved ENIs are not included in the calculations for max-pods or kube-reserved. This is most often used in the VPC CNI custom networking setup https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html. (default = 0) |
| VM_MEMORY_OVERHEAD_PERCENT | \-\-vm-memory-overhead-percent | The VM memory overhead as a percent that will be subtracted from the total memory for all instance types. (default = 0.075) |
| WEBHOOK_METRICS_PORT | \-\-webhook-metrics-port | The port the webhook metric endpoing binds to for operating metrics about the webhook (default = 8001) |
| WEBHOOK_PORT | \-\-webhook-port | The port the webhook endpoint binds to for validation and mutation of resources (default = 8443) |

[comment]: <> (end docs generated content from hack/docs/configuration_gen_docs.go)

Expand Down
Loading
Loading