Helm upgrade for karpenter version 0.37.0 is failing with context deadline exceeded. #173

Open
IndhumithaR opened this issue May 30, 2024 · 4 comments


IndhumithaR commented May 30, 2024

Hi,
When I try to upgrade Karpenter from v0.29.1 to 0.37.0, the upgrade fails with a "context deadline exceeded" error:

Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: context deadline exceeded\n'

I also tried upgrading to 0.33.1 and hit the same issue.
We suspect the Helm chart upgrade is taking longer than expected, but we are not sure.
Is there any way to solve this issue? Can we increase the timeout for the helm upgrade?

Contributor

andskli commented May 30, 2024

It would help to know which resource is failing here, at least to confirm that it is the Helm chart. Do you have additional logs from the Lambda function that backs the custom resource?

@IndhumithaR
Author

Hi,

These are the Lambda function logs:

LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html

[ERROR] Exception: b'Error: UPGRADE FAILED: context deadline exceeded\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 17, in handler
    return helm_handler(event, context)
  File "/var/task/helm/__init__.py", line 93, in helm_handler
    helm('upgrade', release, chart, repository, values_file, namespace, version, wait, timeout, create_namespace)
  File "/var/task/helm/__init__.py", line 199, in helm
    raise Exception(output)

Thanks


theslyone commented Jul 8, 2024

I'm having a similar issue. I dug into the root cause, and it appears to be related to helm upgrade and Kubernetes API rate limiting. My understanding is that helm times out during the helm upgrade call on L#199.


theslyone commented Jul 8, 2024

TL;DR

What worked for me was to set up a managed node group in my EKS cluster before configuring Karpenter. You could also add a Fargate profile to your cluster first. Reference here.
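
A minimal sketch of that ordering in CDK terms, assuming the cluster is an aws-cdk-lib `eks.Cluster`. So that the snippet compiles on its own, `ClusterLike` and `Dependable` are hypothetical stand-ins for the small part of the CDK API used here (`addNodegroupCapacity`, `addHelmChart`, `node.addDependency`). The function name, construct ids, sizes, and the `karpenter` namespace are illustrative and not taken from this library.

```typescript
// Hypothetical stand-ins for the subset of aws-cdk-lib used below, so the
// sketch compiles without the CDK installed. With the real library the
// parameter would be an eks.Cluster.
interface Dependable {
  node: { addDependency(...deps: Dependable[]): void };
}

interface ClusterLike {
  addNodegroupCapacity(id: string, options: { minSize: number; maxSize: number }): Dependable;
  addHelmChart(
    id: string,
    options: { chart: string; release: string; namespace: string; wait: boolean },
  ): Dependable;
}

// Add baseline capacity first, then make the Karpenter chart depend on it, so
// the Karpenter pods have a node to schedule onto while helm is waiting.
function addKarpenterAfterCapacity(cluster: ClusterLike): Dependable {
  // Illustrative id and sizes (assumptions, not values from this thread).
  const baseline = cluster.addNodegroupCapacity('karpenter-baseline', { minSize: 1, maxSize: 2 });

  const chart = cluster.addHelmChart('karpenter', {
    chart: 'karpenter',
    release: 'karpenter',
    namespace: 'karpenter', // assumed namespace
    wait: true,
  });

  // Deploy the chart only after the node group exists.
  chart.node.addDependency(baseline);
  return chart;
}
```

A Fargate profile (`addFargateProfile` in aws-cdk-lib) could play the same role as the node group. On the original timeout question: aws-cdk-lib's `HelmChartOptions` accepts a `timeout` (a `Duration`, capped at 15 minutes), but this thread does not confirm whether this library exposes it, and a longer timeout would not help if the pods can never be scheduled.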

EXPLANATION

Running helm directly against my EKS cluster with the --debug flag reveals that Karpenter requires a node to be available, because it attempts to start up some pods (see screenshot below).

Screenshot 2024-07-08 at 2 07 26 AM

Because this library configures helm to wait for completion (wait: true, as seen below), the upgrade eventually hits the timeout when those pods cannot be scheduled.

    this.chart = this.cluster.addHelmChart('karpenter', {
      // This one is important, if we don't ask helm to wait for resources to become available, the
      // subsequent creation of karpenter resources will fail.
      wait: true,
      chart: 'karpenter',
      release: 'karpenter',
      repository: repoUrl,
      namespace: this.namespace,
      version: this.version,
      createNamespace: false,
      // We will merge our dynamic `helmExtraValues` with the fixed values. Where the fixed values
      // will override the dynamic values.
      values: { ...this.helmExtraValues, ...this.helmChartValues },
    });
    this.chart.node.addDependency(namespace);
  }
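
The `values` line in the snippet relies on object spread order: when both objects define a key, the later operand wins. A small sketch of that behavior, using hypothetical value keys (`replicas`, `logLevel`, `settings`) that are assumptions for illustration and not taken from the Karpenter chart:

```typescript
// Hypothetical values; the real keys depend on the Karpenter chart version.
const helmExtraValues = {
  replicas: 1,
  logLevel: 'debug',
  settings: { clusterName: 'dev', batchIdleDuration: '1s' },
};
const helmChartValues = {
  replicas: 2,
  settings: { clusterName: 'prod' },
};

// Later spread wins, so the fixed values override the dynamic ones.
const values = { ...helmExtraValues, ...helmChartValues };

// The merge is shallow: `replicas` becomes 2, `logLevel` stays 'debug', and
// the nested `settings` object is replaced as a whole, so the
// `batchIdleDuration` entry from helmExtraValues is dropped.
console.log(JSON.stringify(values));
```

If nested keys from `helmExtraValues` need to survive, a deep merge would be required instead of a plain spread. Whether that matters here depends on which values the library fixes.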
