Karpenter: panic #299

Closed · 1 task done
momelod opened this issue Nov 4, 2023 · 6 comments · Fixed by #315

Comments

momelod commented Nov 4, 2023

Description

v1.10.0 introduces a new bug.
Karpenter pods error with:
panic: validating settings, missing field(s): aws.clusterName, aws.clusterName is required

Reverting the module version to 1.9.0 corrects the error.
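
For reference, the relevant part of my configuration looks roughly like the snippet below. The module source and most of the inputs here are illustrative placeholders, not my exact code; only the version pin and enable_karpenter are the pieces that matter for this report.

  # Illustrative sketch only - the source and inputs below are placeholders,
  # not my exact configuration.
  module "addons" {
    source  = "aws-ia/eks-blueprints-addons/aws" # assumed/typical source, shown for context
    version = "1.9.0"                            # pinning back here avoids the panic; 1.10.0 reproduces it

    cluster_name      = module.eks.cluster_name
    cluster_endpoint  = module.eks.cluster_endpoint
    cluster_version   = module.eks.cluster_version
    oidc_provider_arn = module.eks.oidc_provider_arn

    enable_karpenter = true
  }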

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: v1.10.0

  • Terraform version:
    Terraform v1.4.6

  • Provider version(s):

  • provider registry.terraform.io/alekc/kubectl v2.0.3
  • provider registry.terraform.io/hashicorp/aws v5.21.0
  • provider registry.terraform.io/hashicorp/cloudinit v2.3.2
  • provider registry.terraform.io/hashicorp/helm v2.11.0
  • provider registry.terraform.io/hashicorp/kubernetes v2.23.0
  • provider registry.terraform.io/hashicorp/null v3.2.1
  • provider registry.terraform.io/hashicorp/time v0.9.1
  • provider registry.terraform.io/hashicorp/tls v4.0.4
  • provider registry.terraform.io/hashicorp/vault v3.20.1

Reproduction Code [Required]

Steps to reproduce the behavior:

1. Upgrade the module version string and run terraform plan.
2. Observe the output and note that the required values have been placed incorrectly in the YAML hierarchy:

      - set {
          - name  = "settings.aws.clusterName" -> null
          - value = "eks02-prod-rops-cac1" -> null
        }
      + set {
          + name  = "settings.clusterName"
          + value = "eks02-prod-rops-cac1"
        } 

Expected behaviour

Values should be in the correct place so Karpenter won't immediately panic, and pods should be in a running state after a fresh install.

Actual behaviour

Karpenter pods fail to start with the error: panic: validating settings, missing field(s): aws.clusterName, aws.clusterName is required

Additional context

I confirmed this problem on both existing and freshly installed clusters.

@bryantbiggs (Contributor)

Please try using v1.11.0, which was recently released - more details can be found in #298.
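
Roughly, that just means bumping the version pin and re-initializing so the new module version is pulled down - something like this (module source/name is illustrative):

  module "addons" {
    source  = "aws-ia/eks-blueprints-addons/aws" # illustrative source
    version = "1.11.0"                           # bump, then run `terraform init -upgrade` and plan/apply
    # ... existing inputs unchanged ...
  }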

momelod (Author) commented Nov 4, 2023

After upgrading to v1.11.0 I see a new error: panic: failed to setup nodeclaim provider id indexer: failed to get API group resources: unable to retrieve the complete list of server APIs: karpenter.sh/v1beta1: the server could not find the requested resource

So after that I re-installed Karpenter by setting enable_karpenter = false and running terraform apply. After the removal, I set enable_karpenter = true and ran terraform apply once more.

This recreated everything correctly and my Karpenter pods are now ready.
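
For anyone else hitting this, the toggle looked roughly like the following (everything other than enable_karpenter is abbreviated/illustrative):

  # Step 1: disable Karpenter and apply so the old release and its resources are removed.
  #           enable_karpenter = false   ->  terraform apply
  # Step 2: re-enable it and apply again so everything is recreated on v1.11.0.
  #           enable_karpenter = true    ->  terraform apply
  module "addons" {
    # ... other inputs unchanged, module source omitted ...
    enable_karpenter = true # temporarily set to false for one apply, then flipped back
  }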

Thank you very much

momelod (Author) commented Nov 4, 2023

Upon creating a new node, I am getting more errors, this time referencing the default instance profile, which the module no longer seems to define.

"error": "launching machine, creating instance, getting launch template configs, getting launch templates, neither spec.provider.instanceProfile nor --aws-default-instance-profile is specified"

I see this value was set prior to v1.10.0 but is not set in this version.

terraform plan:

- set {
  - name  = "settings.aws.defaultInstanceProfile" -> null
  - value = "karpenter-my-superawesomecluser-20231104170235461500000001" -> null
- }
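
As a stopgap, I am considering setting the instance profile directly on the node template instead of relying on the controller-level default, using the kubectl provider already in my stack. Something roughly like the sketch below - the selector tags are illustrative placeholders, and the profile name is just the value Terraform was previously passing:

  # Rough workaround sketch, not a confirmed fix - selectors below are placeholders.
  resource "kubectl_manifest" "karpenter_node_template" {
    yaml_body = <<-YAML
      apiVersion: karpenter.k8s.aws/v1alpha1
      kind: AWSNodeTemplate
      metadata:
        name: default
      spec:
        # Set the profile explicitly since the chart no longer receives
        # settings.aws.defaultInstanceProfile from the module.
        instanceProfile: karpenter-my-superawesomecluser-20231104170235461500000001
        subnetSelector:
          karpenter.sh/discovery: my-cluster
        securityGroupSelector:
          karpenter.sh/discovery: my-cluster
    YAML
  }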

@bryantbiggs (Contributor)

Have you seen the Karpenter upgrade guide (https://karpenter.sh/preview/upgrading/upgrade-guide/) - and specifically the section for v0.32.0?

momelod (Author) commented Nov 4, 2023

Sadly, no. I didn't see a breaking change in the release notes, so I was not aware I had to. I foolishly assumed it would work out of the box.

Now, after reviewing the upgrade guide, I can confirm that none of the breaking changes should affect defaultInstanceProfile. There is mention of changes needed to support hostNetwork, which I am not using.

cdenneen commented Nov 6, 2023

As per the migration docs, you should be able to run both API versions side by side to allow for the migration from Provisioner -> NodePool and AWSNodeTemplate -> EC2NodeClass:

Running v1alpha5 alongside v1beta1: Having different Kind names for v1alpha5 and v1beta1 allows them to coexist for the same Karpenter controller for v0.32.x. This gives you time to transition to the new v1beta1 APIs while existing Provisioners and other objects stay in place. Keep in mind that there is no guarantee that the two versions will be able to coexist in future Karpenter versions.

So if that's the case, then defaultInstanceProfile would still need to be available at runtime, or this module would need to allow the instance_profile to still be created (at least temporarily) and then be removed in, say, v1.12.0.
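
For the v1beta1 side of that coexistence, the replacement objects look roughly like the sketch below. Note that EC2NodeClass takes a node IAM role rather than an instance profile, which is presumably why the module dropped defaultInstanceProfile. The names, role, and selector tags here are illustrative placeholders:

  # Illustrative v1beta1 equivalents - role name and selector tags are placeholders.
  resource "kubectl_manifest" "ec2_node_class" {
    yaml_body = <<-YAML
      apiVersion: karpenter.k8s.aws/v1beta1
      kind: EC2NodeClass
      metadata:
        name: default
      spec:
        amiFamily: AL2
        role: my-karpenter-node-role # v1beta1 references an IAM role, not an instance profile
        subnetSelectorTerms:
          - tags:
              karpenter.sh/discovery: my-cluster
        securityGroupSelectorTerms:
          - tags:
              karpenter.sh/discovery: my-cluster
    YAML
  }

  resource "kubectl_manifest" "node_pool" {
    yaml_body = <<-YAML
      apiVersion: karpenter.sh/v1beta1
      kind: NodePool
      metadata:
        name: default
      spec:
        template:
          spec:
            nodeClassRef:
              name: default
            requirements:
              - key: kubernetes.io/arch
                operator: In
                values: ["amd64"]
    YAML
  }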
