
Could not launch node, launching instances, with fleet error(s), UnauthorizedOperation: You are not authorized to perform this operation. #1488

Closed
cradules opened this issue Mar 9, 2022 · 33 comments
cradules commented Mar 9, 2022

Is an existing page relevant?
https://karpenter.sh/v0.6.5/getting-started/getting-started-with-terraform/

What karpenter features are relevant?
aws_iam_instance_profile

How should the docs be improved?

resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${var.cluster_name}"
  role = module.eks.worker_iam_role_name
}

In the example above, the instance profile is assigned the worker node's IAM role name. The problem is that in the latest version of the EKS module the worker_iam_role_name output no longer exists.

I am not sure what value to use here, because I don't know which permissions Karpenter needs for the aws_iam_instance_profile.

Any suggestions?

ellistarn added the bug and burning labels on Mar 9, 2022

cradules commented Mar 9, 2022

For a better understanding, here is the context of my cluster:

EKS Cluster with IRSA integration

#Create EKS Cluster with IRSA integration
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = var.eks-cluster-name
  cluster_version = var.eks-cluster-version
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.public_subnets
  create_cloudwatch_log_group = false
  cluster_security_group_additional_rules = {
    ingress_nodes_karpenter_ports_tcp = {
      description                = "Karpenter readiness"
      protocol                   = "tcp"
      from_port                  = 8443
      to_port                    = 8443
      type                       = "ingress"
      source_node_security_group = true
    }
  }

  node_security_group_additional_rules = {
    aws_lb_controller_webhook = {
      description                   = "Cluster API to AWS LB Controller webhook"
      protocol                      = "all"
      from_port                     = 9443
      to_port                       = 9443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }

  eks_managed_node_group_defaults = {
    # We are using the IRSA created below for permissions
    # This is a better practice as well so that the nodes do not have the permission,
    # only the VPC CNI addon will have the permission
    iam_role_attach_cni_policy = true
  }

  eks_managed_node_groups = {
    default = {
      min_size     = 1
      max_size     = 10
      desired_size = 1

      capacity_type = "SPOT"
    }
  }

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }

    kube-proxy = {}

    vpc-cni = {
      resolve_conflicts        = "OVERWRITE"
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
    }
  }

  tags = {
    "Name"        = var.eks-cluster-name
    "Environment" = var.environment
    "Terraform"   = "true"
    "karpenter.sh/discovery" = var.eks-cluster-name
  }
}

Karpenter IRSA

module "karpenter_irsa" {
  source                             = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  role_name                          = "karpenter-controller-${var.eks-cluster-name}"
  attach_karpenter_controller_policy = true
  karpenter_controller_cluster_ids   = [module.eks.cluster_id]

  karpenter_controller_node_iam_role_arns = [
    module.eks.eks_managed_node_groups["default"].iam_role_arn

  ]
  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["karpenter:karpenter"]
    }
  }
  tags = {
    "Name"        = "karpenter-irsa-${var.eks-cluster-name}"
    "Environment" = var.environment
    "Terraform"   = "true"
  }
}

Your documentation says to use module.eks.worker_iam_role_name, but that output no longer exists.

The question would be:

resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${var.eks-cluster-name}"
  role = ?!
}


gopiio commented Mar 9, 2022

@cradules you can get the role name like: module.eks.eks_managed_node_groups["default"].iam_role_name

https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/modules/eks-managed-node-group

However, I am getting the same error.
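
Wired into the instance profile from the original question, that suggestion looks roughly like the sketch below (it assumes the managed node group is named "default", matching the cluster configuration above):

# Sketch: reuse the "default" managed node group's role for Karpenter-launched nodes
resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${var.eks-cluster-name}"
  role = module.eks.eks_managed_node_groups["default"].iam_role_name
}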


cradules commented Mar 9, 2022

Thank you @gopiio! I had already tried that before opening the issue. In my view, the right code would look like this:

resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${var.eks-cluster-name}"
  role = module.karpenter_irsa.iam_role_name
}

The policies I have attached are:

  attach_karpenter_controller_policy = true
  attach_cluster_autoscaler_policy = true

But I get the same error. I even temporarily edited the role by hand and attached the administrator policy, and I still get the same error.

Part of the decoded message is:

{
    "allowed": false,
    "explicitDeny": false,
    "matchedStatements": { "items": [] },
    "failures": { "items": [] },
    "context": {
        "principal": {
            "id": "<removed sensitive data>",
            "arn": "arn:aws:sts::<removed sensitive data>:assumed-role/karpenter-controller-eks-dev/<removed sensitive data>"
        },
        "action": "iam:PassRole",
        "resource": "arn:aws:iam::<removed sensitive data>:role/karpenter-controller-eks-dev",
        "conditions": {
            "items": [
                { "key": "aws:Region", "values": { "items": [{ "value": "us-east-2" }] } },
                { "key": "aws:Service", "values": { "items": [{ "value": "ec2" }] } },
                { "key": "aws:Resource", "values": { "items": [{ "value": "role/karpenter-controller-eks-dev" }] } },
                { "key": "iam:RoleName", "values": { "items": [{ "value": "karpenter-controller-eks-dev" }] } },
                { "key": "aws:Type", "values": { "items": [{ "value": "role" }] } },
                { "key": "aws:Account", "values": { "items": [{ "value": "<removed sensitive data>" }] } },
                { "key": "aws:ARN", "values": { "items": [{ "value": "arn:aws:iam::<removed sensitive data>:role/karpenter-controller-eks-dev" }] } }
            ]
        }
    }
}

From what I can see, Karpenter is not allowed to perform iam:PassRole.

The policy attached to the role is:

{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": ""
        },
        {
            "Action": [
                "ec2:TerminateInstances",
                "ec2:RunInstances",
                "ec2:DeleteLaunchTemplate"
            ],
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/karpenter.sh/discovery": "eks-dev"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": ""
        },
        {
            "Action": "ssm:GetParameter",
            "Effect": "Allow",
            "Resource": "arn:aws:ssm:*:*:parameter/aws/service/*",
            "Sid": ""
        },
        {
            "Action": "iam:PassRole",
            "Effect": "Allow",
            "Resource": "arn:aws:iam::"removed sensitive data":role/default-eks-node-group-20220309132314999100000002",
            "Sid": ""
        }
    ],
    "Version": "2012-10-17"
}


cradules commented Mar 9, 2022

Update:

By manually attaching the AutoScalingFullAccess policy to role/default-eks-node-group-20220309132314999100000002 I am able to bring nodes into the cluster, but they remain in a NotReady state:

kubectl  get nodes
NAME                                            STATUS     ROLES    AGE     VERSION
ip-192-168-143-76.us-east-2.compute.internal    NotReady   <none>   3m5s
ip-192-168-146-173.us-east-2.compute.internal   NotReady   <none>   9m13s
ip-192-168-56-216.us-east-2.compute.internal    Ready      <none>   5h5m    v1.21.5-eks-9017834

I attached the vpc_cni_policy to karpenter_irsa:

module "karpenter_irsa" {
  source                             = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  role_name                          = "karpenter-controller-${var.eks-cluster-name}"

  attach_karpenter_controller_policy = true
  attach_cluster_autoscaler_policy = true
  attach_ebs_csi_policy = true
  attach_node_termination_handler_policy = true
  attach_load_balancer_controller_policy = true
  attach_vpc_cni_policy = true
  attach_external_dns_policy = true
------


dewjam commented Mar 9, 2022

Hey @cradules ,
I'm working on reproducing this issue. In the meantime, you could take a look at CloudTrail events to see if there are any failures to assume roles. Also, it's common to see nodes stay in NotReady if the AWS VPC CNI plugin is not functioning as expected. This guide may help with troubleshooting: https://aws.amazon.com/premiumsupport/knowledge-center/eks-node-status-ready/


dewjam commented Mar 9, 2022

Hey @cradules, can you confirm which version of the EKS Terraform module you are using? Regarding the NotReady node, I'm seeing a very similar problem outlined in this issue (and fixed in 18.8.0):
terraform-aws-modules/terraform-aws-eks#1894

Edit:
Please disregard this note, I mistakenly thought your managed node was stuck in NotReady.

@shane-snyder

I'm seeing a similar issue on my end, using the eks module (latest version) and the iam-role-for-service-accounts-eks module. While troubleshooting I found I can get it working by removing the condition below from the controller policy, if that's any help.

        "Condition": {
            "StringEquals": {
                "ec2:ResourceTag/karpenter.sh/discovery": "cluster-name"
            }
        },

@vara-bonthu

You can try this Terraform EKS Accelerator example for Karpenter if it helps.


gopiio commented Mar 9, 2022

I'm seeing a similar issue on my end, using the eks module (latest version) and the iam-role-for-service-accounts-eks module. While troubleshooting I found I can get it working by removing the condition below from the controller policy, if that's any help.

        "Condition": {
            "StringEquals": {
                "ec2:ResourceTag/karpenter.sh/discovery": "cluster-name"
            }
        },

I managed to work around it by adding the IAM policy manually using aws_iam_role_policy.

With terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks, I have set
attach_karpenter_controller_policy = false
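
A minimal sketch of such a manual attachment (not gopiio's exact code; the policy body simply mirrors the controller permissions quoted earlier in this thread, with the ResourceTag condition dropped):

# Sketch: manually attach a Karpenter controller policy to the IRSA role,
# without the karpenter.sh/discovery ResourceTag condition.
resource "aws_iam_role_policy" "karpenter_controller" {
  name = "karpenter-controller-${var.eks-cluster-name}"
  role = module.karpenter_irsa.iam_role_name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ec2:DescribeSubnets",
          "ec2:DescribeSecurityGroups",
          "ec2:DescribeLaunchTemplates",
          "ec2:DescribeInstances",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeInstanceTypeOfferings",
          "ec2:DescribeAvailabilityZones",
          "ec2:CreateTags",
          "ec2:CreateLaunchTemplate",
          "ec2:CreateFleet",
          "ec2:RunInstances",
          "ec2:TerminateInstances",
          "ec2:DeleteLaunchTemplate"
        ]
        Resource = "*"
      },
      {
        Effect   = "Allow"
        Action   = "ssm:GetParameter"
        Resource = "arn:aws:ssm:*:*:parameter/aws/service/*"
      },
      {
        # PassRole scoped to the node role that the instance profile uses
        Effect   = "Allow"
        Action   = "iam:PassRole"
        Resource = module.eks.eks_managed_node_groups["default"].iam_role_arn
      }
    ]
  })
}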


dewjam commented Mar 11, 2022

I had some time to look into this further. We have two gaps which surfaced as part of this issue.

First, the Karpenter Getting Started docs are geared towards older versions of the EKS module (< 18). It would be worthwhile to update the docs to support version 18 and greater.

I believe the answer to the specific question about which role should be used for the Karpenter node instance profile is either to use module.eks.eks_managed_node_groups["default"].iam_role_name or a role with equivalent permissions for a node to join an EKS cluster (thanks to @gopiio for finding this):

resource "aws_iam_instance_profile" "karpenter" {
  name = "KarpenterNodeInstanceProfile-${var.eks-cluster-name}"
  role = module.eks.eks_managed_node_groups["default"].iam_role_name
}

Second, the Karpenter IRSA Terraform module seems to be incorrect. Unless I'm mistaken, I think we should look into having this condition removed from the Karpenter Controller IAM policy: https://github.com/terraform-aws-modules/terraform-aws-iam/blob/master/modules/iam-role-for-service-accounts-eks/policies.tf#L456-L460 (thanks to @shane-snyder for finding this).

Any thoughts on these two items?


shane-snyder commented Mar 11, 2022

@dewjam
I'd agree, that's likely the best course of action. I tested a bit more as well, and the CloudTrail logs indicate that it is unable to launch because the launch template automatically created by Karpenter does not have the karpenter.sh/discovery tag, which would make sense given the condition in that policy. The only other course of action would be to apply that tag to the launch template and the node being created.


dewjam commented Mar 11, 2022

That's a good callout, @shane-snyder. I'll look into it further.

I'm seeing something slightly different than you are, however. From what I can tell, the failure to launch a new instance is due to the condition applied to the ec2:RunInstances action. It seems the "karpenter.sh/discovery" tag isn't on the fleet we're using to launch the instance.

We also will not be able to terminate instances or delete launch templates unless the "karpenter.sh/discovery": "cluster-name" tag is present. As you mentioned, instances and launch templates currently don't have that tag.
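
For reference, the scoped statement the IRSA module generates amounts to roughly the following fragment (a sketch reconstructed from the policy quoted earlier in this thread, not the module's actual source):

# Sketch: only EC2 resources that already carry the discovery tag match this
# statement, so fleets and launch templates created without the tag are denied.
data "aws_iam_policy_document" "karpenter_scoped_actions" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:TerminateInstances",
      "ec2:DeleteLaunchTemplate"
    ]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "ec2:ResourceTag/karpenter.sh/discovery"
      values   = [var.eks-cluster-name]
    }
  }
}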

@cradules

Sorry, I have not had time to work on this issue over the last few days. I hope to be less busy next week and to have the chance to dig in some more, especially since I now have some starting points that emerged from @shane-snyder's input. I want to thank him for that!


dewjam commented Mar 14, 2022

FYI, it seems that Karpenter is not applying default tags to AWS resources per the documentation outlined here. This is relevant because the condition in the Terraform Karpenter IRSA module could instead target one of those default tags.

I'm working on a fix now.

@bryantbiggs

I've re-opened #1332, which is related. Any support from the maintainers on getting it across the line would be appreciated.


dewjam commented Mar 15, 2022

Thanks, @bryantbiggs. It seems I was missing some context when looking into this issue. As far as your PR goes, I think we were waiting for your changes to be tested. I've commented on the PR as well.

dewjam mentioned this issue Mar 21, 2022

dewjam commented Apr 8, 2022

#1332 was merged yesterday, which updates the Karpenter Terraform Getting Started guide so it uses v18+ of the Terraform EKS module. Also, the Karpenter IRSA module has been updated with some fixes. Thanks to @bryantbiggs for his hard work on this PR!

@cradules,
Any opposition to closing out this issue? Any other questions/problems you would like to bring up?

@midestefanis

@dewjam

I am having a similar error:

ERROR controller.provisioning Launching node, creating cloud provider machine, with fleet error(s), UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: ****

The decoded message is as follows:

{
    "allowed": false,
    "explicitDeny": false,
    "matchedStatements": {
        "items": []
    },
    "failures": {
        "items": []
    },
    "context": {
        "principal": {
            "id": "**:***",
            "arn": "arn:aws:sts::**:assumed-role/karpenter-controller-****/***"
        },
        "action": "ec2:RunInstances",
        "resource": "arn:aws:ec2:us-east-1:******:launch-template/lt-0638ed6a3b7addbf5",
        "conditions": {
            "items": [{
                    "key": "******:karpenter.sh/discovery/****",
                    "values": {
                        "items": [{
                            "value": "****"
                        }]
                    }
                },
                {
                    "key": "******:Name",
                    "values": {
                        "items": [{
                            "value": "karpenter.sh/provisioner-name/default"
                        }]
                    }
                },
                {
                    "key": "******:karpenter.sh/provisioner-name",
                    "values": {
                        "items": [{
                            "value": "default"
                        }]
                    }
                },
                {
                    "key": "ec2:ResourceTag/karpenter.sh/discovery/****",
                    "values": {
                        "items": [{
                            "value": "****"
                        }]
                    }
                },
                {
                    "key": "aws:Resource",
                    "values": {
                        "items": [{
                            "value": "launch-template/lt-0638ed6a3b7addbf5"
                        }]
                    }
                },
                {
                    "key": "aws:Account",
                    "values": {
                        "items": [{
                            "value": "******"
                        }]
                    }
                },
                {
                    "key": "ec2:ResourceTag/Name",
                    "values": {
                        "items": [{
                            "value": "karpenter.sh/provisioner-name/default"
                        }]
                    }
                },
                {
                    "key": "ec2:IsLaunchTemplateResource",
                    "values": {
                        "items": [{
                            "value": "true"
                        }]
                    }
                },
                {
                    "key": "aws:Region",
                    "values": {
                        "items": [{
                            "value": "us-east-1"
                        }]
                    }
                },
                {
                    "key": "aws:ID",
                    "values": {
                        "items": [{
                            "value": "lt-0638ed6a3b7addbf5"
                        }]
                    }
                },
                {
                    "key": "aws:Service",
                    "values": {
                        "items": [{
                            "value": "ec2"
                        }]
                    }
                },
                {
                    "key": "aws:Type",
                    "values": {
                        "items": [{
                            "value": "launch-template"
                        }]
                    }
                },
                {
                    "key": "ec2:Region",
                    "values": {
                        "items": [{
                            "value": "us-east-1"
                        }]
                    }
                },
                {
                    "key": "aws:ARN",
                    "values": {
                        "items": [{
                            "value": "arn:aws:ec2:us-east-1:******:launch-template/lt-0638ed6a3b7addbf5"
                        }]
                    }
                },
                {
                    "key": "ec2:LaunchTemplateVersion",
                    "values": {
                        "items": [{
                            "value": "1"
                        }]
                    }
                },
                {
                    "key": "ec2:LaunchTemplate",
                    "values": {
                        "items": [{
                            "value": "arn:aws:ec2:us-east-1:******:launch-template/lt-0638ed6a3b7addbf5"
                        }]
                    }
                },
                {
                    "key": "ec2:ResourceTag/karpenter.sh/provisioner-name",
                    "values": {
                        "items": [{
                            "value": "default"
                        }]
                    }
                }
            ]
        }
    }
}

@midestefanis

@dewjam I'm using the latest version of the documentation (preview) -> https://karpenter.sh/preview/getting-started/getting-started-with-terraform/


dewjam commented Apr 11, 2022

Hey @midestefanis ,
Thanks for reaching out! Would you mind providing your Provisioner spec? Also, did you provision the cluster using the Terraform Getting Started guide, or is this an existing cluster?

@midestefanis

@dewjam

Also, did you provision the cluster using the Terraform Getting Started guide? - Yes

Provisioner spec:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand","spot"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  limits:
    resources:
      cpu: 256
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-${var.cluster_name}
    subnetSelector:
      karpenter.sh/discovery/${var.cluster_name}: ${var.cluster_name}
    securityGroupSelector:
      karpenter.sh/discovery/${var.cluster_name}: ${var.cluster_name}
    tags:
      karpenter.sh/discovery/${var.cluster_name}: ${var.cluster_name}


dewjam commented Apr 12, 2022

Hey @midestefanis ,
I think I see the issue here. Notice that in the Karpenter IRSA Terraform module there is a condition expecting a tag on the resources used by the RunInstances action.

The "tags" key in your Provisioner spec must be the same, otherwise the condition will not match. If you modify your provisioner to instead use the below, it should work as expected:

    tags:
      karpenter.sh/discovery: ${var.cluster_name}

@bryantbiggs, would it make sense to make the tag key configurable in the Karpenter IRSA Terraform module instead of setting it as a static value?


bryantbiggs commented Apr 12, 2022

@dewjam I can definitely add that. However, that wouldn't have resolved the issue here, because the tag key was written incorrectly. What would it take to bake in a default tag that I can set in the IRSA module (users could still override it)? See https://karpenter.sh/v0.8.2/aws/provisioning/#tags. That way users don't have to remember to set the tag on the provisioner.

I'll open a PR to make the IAM statement condition for the tag configurable so users can opt out of it, but the default will still be scoped to the condition so that users start with a preferred security practice.

@midestefanis

@dewjam That is not an option, because the tags must be customizable. I use multiple clusters in the same account and VPC, so I cannot tag that way since the key has to be unique per cluster. It should be possible to do the last thing you mention.

@bryantbiggs

@dewjam That is not an option, because the tags must be customizable. I use multiple clusters in the same account and VPC, so I cannot tag that way since the key has to be unique per cluster. It should be possible to do the last thing you mention.

ah I see - that makes sense. one sec

@bryantbiggs

@midestefanis if you update your IRSA role to the following, this should start working for your use case now:

module "karpenter_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "4.19.0" # <= available starting in 4.19.0

  role_name                          = "karpenter-controller-${local.cluster_name}"
  attach_karpenter_controller_policy = true

  karpenter_tag_key               = "karpenter.sh/discovery/${module.eks.cluster_id}" # <= this
  karpenter_controller_cluster_id = module.eks.cluster_id
  karpenter_controller_node_iam_role_arns = [
    module.eks.eks_managed_node_groups["initial"].iam_role_arn
  ]

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["karpenter:karpenter"]
    }
  }
}

@midestefanis

@bryantbiggs Thank you!


dewjam commented Apr 26, 2022

It seems as if the original purpose of this issue has been resolved. I'm going to go ahead and close this out, but please feel free to reopen if you disagree :).

@dewjam dewjam closed this as completed Apr 26, 2022
@garyyang6

@dewjam and @bryantbiggs, I am running into the same/similar problem. See issue #2519. Can you please take a look?

@bryantbiggs

@garyyang6 please open a new issue and provide a reproduction plus the error you are seeing

@garyyang6

@bryantbiggs You may find all the details at #2519. Do I need to create another one?

@hahasheminejad

For us, the problem was a version mismatch between the CloudFormation stack and the Karpenter controller running in the cluster.

Make sure KARPENTER_VERSION matches in both the CloudFormation template and the Helm chart:

https://raw.githubusercontent.com/aws/karpenter/"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION}


dbaltor commented Sep 18, 2023

I was getting the same error whilst trying to install Karpenter 0.30.0 using the aws-ia/eks-blueprints-addons/aws Terraform module 1.7.2. The issue was the service account not being configured correctly. I had to use the helm resource instead.
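
For comparison, a minimal helm_release sketch that wires the Karpenter service account to an IRSA role is shown below; the chart value names are assumptions and vary between Karpenter chart versions, so check the chart for the release being installed:

# Sketch (value names vary by Karpenter chart version): install the chart with
# the service account annotated with the controller's IRSA role ARN.
resource "helm_release" "karpenter" {
  name             = "karpenter"
  namespace        = "karpenter"
  create_namespace = true
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  version          = "v0.30.0"

  set {
    # Assumed annotation key the chart uses for IRSA
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.karpenter_irsa.iam_role_arn
  }

  set {
    # Assumed value name for the cluster Karpenter should manage
    name  = "settings.aws.clusterName"
    value = var.cluster_name
  }
}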
