fix: Restrict IAM permissions to those related to Karpenter managed resources #1332

Merged (27 commits, Apr 7, 2022)

Commits:
- 6dc4ca6 fix: Restrict `ssm:GetParameter` IAM permissions to only the AWS serv… (bryantbiggs, Feb 13, 2022)
- 5680438 Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Feb 14, 2022)
- 91638f7 chore: update permissions and Terraform example (bryantbiggs, Feb 14, 2022)
- ec47682 chore: update doc wording (bryantbiggs, Feb 14, 2022)
- d2f8a08 chore: one last -var reference (bryantbiggs, Feb 14, 2022)
- 9ee6224 feat: restrict `iam:PassRole` to only the Karpenter node role (bryantbiggs, Feb 14, 2022)
- 4ebf661 fix: remove copy+paste cruft (bryantbiggs, Feb 14, 2022)
- 97fc9ba Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Feb 16, 2022)
- 684c747 chore: update to use new sub-module (bryantbiggs, Feb 16, 2022)
- 2574bc7 Merge branch 'main' of github.com:bryantbiggs/karpenter into fix/rest… (bryantbiggs, Feb 23, 2022)
- a3d6bf7 Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Mar 16, 2022)
- 36d1b41 chore: re-update and validate (bryantbiggs, Mar 18, 2022)
- f69fb32 chore: remove cloudformation/eksctl changes and v0.6.4 changes (bryantbiggs, Mar 18, 2022)
- e69ef7a chore: align cluster name with other examples (bryantbiggs, Mar 18, 2022)
- f4411ce Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Mar 23, 2022)
- c1500af chore: updates from testing (bryantbiggs, Mar 23, 2022)
- 5f1f22e chore: final update with latest module changes incorporated for Karpe… (bryantbiggs, Mar 24, 2022)
- be7250c Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Mar 24, 2022)
- 98b54c4 Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Mar 29, 2022)
- 3b53b83 chore: update terraform modules to current latest (bryantbiggs, Mar 29, 2022)
- 83398c6 Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Mar 30, 2022)
- b060ee5 feat: create Karpenter provisioner using Terraform+kubectl (bryantbiggs, Mar 30, 2022)
- a937b8e fix: add required provider versions for 3rd party source resolution (bryantbiggs, Mar 31, 2022)
- fd8df73 Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Mar 31, 2022)
- 67e2950 Update website/content/en/preview/getting-started/getting-started-wit… (bryantbiggs, Apr 5, 2022)
- 2653df1 Merge branch 'main' of github.com:aws/karpenter into fix/restrict-ssm… (bryantbiggs, Apr 7, 2022)
- 8576324 docs: add note to udate local kubeconfig before running kubectl commands (bryantbiggs, Apr 7, 2022)
Changes:
@@ -42,19 +42,25 @@ After setting up the tools, set the following environment variables to store
commonly used values.

```bash
-export CLUSTER_NAME="${USER}-karpenter-demo"
+export CLUSTER_NAME="karpenter-demo"
```

**Contributor:**

Is there a reason to change the cluster name? Is it just so that we don't have to supply the cluster_name var as an argument to the terraform apply command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. It's not generally a common way of interacting with Terraform in normal day-to-day use (i.e., in CI/CD processes), so I was trying to mirror that practice.

However, if we do want to keep it like this, I would suggest a small tweak where we rename the variable to `TF_VAR_cluster_name`. Terraform will recognize this and supply the value for `var.cluster_name` in lieu of passing `-var="cluster_name=$CLUSTER_NAME"` when planning and applying.
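
For illustration, a minimal sketch of that workflow (the exported value here is just this guide's example name):

```bash
# Terraform automatically maps TF_VAR_-prefixed environment variables
# onto root module input variables of the same name
export TF_VAR_cluster_name="karpenter-demo"

# No -var="cluster_name=..." argument needed
terraform plan
terraform apply
```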

**Contributor:**

Sure, if that is more in line with common Terraform practices, then I'm OK with that. It would also avoid the case where someone sets a custom value for the `CLUSTER_NAME` env var and then forgets to update the Terraform vars.

```bash
export AWS_DEFAULT_REGION="us-east-1"
```

-The first thing we need to do is create our `main.tf` file and place the
-following in it. This will let us pass in a cluster name that will be used
-throughout the remainder of our config.
+The first thing we need to do is create our `main.tf` file and place the following in it.

```hcl
variable "cluster_name" {
description = "The name of the cluster"
type = string
provider "aws" {
region = "us-east-1"
}

locals {
cluster_name = "karpenter-demo"

# Used to determine correct partition (i.e. - `aws`, `aws-gov`, `aws-cn`, etc.)
partition = data.aws_partition.current.partition
}

data "aws_partition" "current" {}
```

### Create a Cluster
@@ -63,13 +69,15 @@ We're going to use two different Terraform modules to create our cluster - one
to create the VPC and another for the cluster itself. The key part of this is
that we need to tag the VPC subnets that we want to use for the worker nodes.

-Place the following Terraform config into your `main.tf` file.
+Add the following to your `main.tf` to create a VPC and EKS cluster.

```hcl
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
# https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
source = "terraform-aws-modules/vpc/aws"
version = "3.12.0"

name = var.cluster_name
name = local.cluster_name
cidr = "10.0.0.0/16"

azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
@@ -81,31 +89,58 @@ module "vpc" {
one_nat_gateway_per_az = false

   private_subnet_tags = {
-    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
-    "karpenter.sh/discovery" = var.cluster_name
+    "kubernetes.io/cluster/${local.cluster_name}" = "owned"
+    # Tags subnets for Karpenter auto-discovery
+    "karpenter.sh/discovery" = local.cluster_name
   }
}

module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "<18"
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
source = "terraform-aws-modules/eks/aws"
version = "18.16.0"

cluster_name = local.cluster_name
cluster_version = "1.21"
cluster_name = var.cluster_name
vpc_id = module.vpc.vpc_id
subnets = module.vpc.private_subnets
enable_irsa = true

# Only need one node to get Karpenter up and running
worker_groups = [
{
instance_type = "t3a.medium"
asg_max_size = 1

vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets

# Required for Karpenter role below
enable_irsa = true

# We will rely only on the cluster security group created by the EKS service
# See note below for `tags`
create_cluster_security_group = false
create_node_security_group = false

# Only need one node to get Karpenter up and running.
# This ensures core services such as VPC CNI, CoreDNS, etc. are up and running
# so that Karpetner can be deployed and start managing compute capacity as required
eks_managed_node_groups = {
initial = {
instance_types = ["t3.medium"]
# We don't need the node security group since we are using the
# cluster created security group which Karpenter will also use
bryantbiggs marked this conversation as resolved.
Show resolved Hide resolved
create_security_group = false
attach_cluster_primary_security_group = true

min_size = 1
max_size = 1
desired_size = 1

iam_role_additional_policies = [
# Required by Karpenter
"arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore"
]
}
]
}

   tags = {
-    "karpenter.sh/discovery" = var.cluster_name
+    # Tag node group resources for Karpenter auto-discovery
+    # NOTE - if creating multiple security groups with this module, only tag the
+    # security group that Karpenter should utilize with the following tag
+    "karpenter.sh/discovery" = local.cluster_name
   }
}
```
@@ -115,23 +150,9 @@ EKS cluster. This may take some time.

```bash
terraform init
-terraform apply -var "cluster_name=${CLUSTER_NAME}"
+terraform apply
```

```diff
-There's a good chance it will fail when trying to configure the aws-auth
-ConfigMap. And that's because we need to use the kubeconfig file that was
-generated during the cluster install. To use it, run the following. This will
-configure both your local CLI and Terraform to use the file. Then try the apply
-again.
-
-    export KUBECONFIG="${PWD}/kubeconfig_${CLUSTER_NAME}"
-    export KUBE_CONFIG_PATH="${KUBECONFIG}"
-    terraform apply -var "cluster_name=${CLUSTER_NAME}"
-
-Everything should apply successfully now!
```

### Create the EC2 Spot Service Linked Role

This step is only necessary if this is the first time you're using EC2 Spot in this account. More details are available [here](https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html).
@@ -144,33 +165,23 @@ aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

### Configure the KarpenterNode IAM Role

-The EKS module creates an IAM role for worker nodes. We'll use that for
+The EKS module creates an IAM role for the EKS managed node group nodes. We'll use that for
 Karpenter (so we don't have to reconfigure the aws-auth ConfigMap), but we need
-to add one more policy and create an instance profile.
+to create an instance profile we can reference.

-Place the following into your `main.tf` to add the policy and create an
-instance profile.
+Add the following to your `main.tf` to create the instance profile.

```hcl
data "aws_iam_policy" "ssm_managed_instance" {
arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_role_policy_attachment" "karpenter_ssm_policy" {
role = module.eks.worker_iam_role_name
policy_arn = data.aws_iam_policy.ssm_managed_instance.arn
}

resource "aws_iam_instance_profile" "karpenter" {
name = "KarpenterNodeInstanceProfile-${var.cluster_name}"
role = module.eks.worker_iam_role_name
name = "KarpenterNodeInstanceProfile-${local.cluster_name}"
role = module.eks.eks_managed_node_groups["initial"].iam_role_name
}
```

Go ahead and apply the changes.

```bash
-terraform apply -var "cluster_name=${CLUSTER_NAME}"
+terraform apply
```
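
To sanity-check this step, you can confirm the instance profile exists; a minimal sketch, assuming the `KarpenterNodeInstanceProfile-${local.cluster_name}` naming from the config above with `cluster_name = "karpenter-demo"`:

```bash
# Optional check: the profile name follows the pattern defined in main.tf
aws iam get-instance-profile \
  --instance-profile-name "KarpenterNodeInstanceProfile-karpenter-demo"
```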

Now, Karpenter can use this instance profile to launch new EC2 instances and
@@ -185,55 +196,35 @@ using [IRSA](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/
We will create the ServiceAccount and connect it to this role during the Helm
chart install.

+Add the following to your `main.tf` to create the IAM role for the Karpenter service account.

```hcl
module "iam_assumable_role_karpenter" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
version = "4.7.0"
create_role = true
role_name = "karpenter-controller-${var.cluster_name}"
provider_url = module.eks.cluster_oidc_issuer_url
oidc_fully_qualified_subjects = ["system:serviceaccount:karpenter:karpenter"]
}
module "karpenter_irsa" {
```

**Contributor:**
We're still likely to have some problems with IAM permissions due to the condition in this Karpenter policy.

**Member Author:**

Could you elaborate on "likely to have some problems"? The wildcard permissions to run/terminate any EC2 instance without a scoping condition are going to be a tough hurdle for security-conscious/enterprise environments. Restricting these permissions to only those resources with certain attributes (i.e., tags/naming scheme) is the current standard practice in the community. You can see this in other policies, such as https://github.com/terraform-aws-modules/terraform-aws-iam/blob/master/modules/iam-role-for-service-accounts-eks/policies.tf#L699-L702

Is there a different tag or set of conditions we should be specifying?

**Contributor (@dewjam), Mar 22, 2022:**

Sure, let me clarify.

The condition is OK for the `ec2:TerminateInstances` and `ec2:DeleteLaunchTemplates` actions as long as we also provide the same tag to the Karpenter Provisioner spec. (By providing a tag in the Provisioner spec, you are telling Karpenter to apply said tag to the Launch Templates, Instances, and volumes it creates.)

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  provider:
    tags:
      karpenter.sh/discovery: karpenter-demo
```

The real problem I'm seeing is with the `ec2:RunInstances` action. The IAM condition applied to the `RunInstances` action expects a tag to be on all the resources the `RunInstances` action is attempting to use. But, if I understand correctly, this will never be able to match for some resources. I've done some testing and this IAM policy appears to work, though.
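
To make the distinction concrete, here is a hypothetical sketch (not the policy from this PR) of how the deletion actions can be tag-scoped while `ec2:RunInstances` cannot carry the same condition uniformly; the tag key and value are assumptions borrowed from this guide's `karpenter.sh/discovery` convention:

```hcl
# Hypothetical, illustrative statement shapes only
data "aws_iam_policy_document" "karpenter_scoped_example" {
  statement {
    sid       = "ScopedDeletion"
    actions   = ["ec2:TerminateInstances", "ec2:DeleteLaunchTemplate"]
    resources = ["*"]

    # Only resources carrying the Karpenter-applied tag may be deleted
    condition {
      test     = "StringEquals"
      variable = "ec2:ResourceTag/karpenter.sh/discovery"
      values   = ["karpenter-demo"]
    }
  }

  statement {
    sid     = "RunInstancesUnscoped"
    actions = ["ec2:RunInstances"]
    # RunInstances also reads AMIs, subnets, and security groups, which never
    # carry the Karpenter tag, so the same ResourceTag condition cannot be
    # applied across every resource type this action touches
    resources = ["*"]
  }
}
```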

**Member Author:**

Thanks for sharing @dewjam. I've opened a draft of a policy that I *hope* will work based on the information you shared: terraform-aws-modules/terraform-aws-iam#209

However, how do I go about testing this change, since `make test` and `make battletest` do not appear to fully exercise Karpenter? Do I just need to use https://karpenter.sh/v0.7.2/getting-started/getting-started-with-terraform/#first-use and verify Karpenter is able to provision and delete nodes?

**Contributor:**

> Do I just need to use https://karpenter.sh/v0.7.2/getting-started/getting-started-with-terraform/#first-use and verify Karpenter is able to provision and delete nodes?

Yep, exactly.

`make test` and `make battletest` spawn etcd and kube-apiserver as processes on your local machine. This works well for testing Karpenter behaviors, but obviously doesn't take into account underlying cloud infrastructure like security groups and/or IAM policies. We are working on a more robust e2e testing methodology, though.

**Member Author:**

We should have terraform-aws-modules/terraform-aws-iam#209 merged tomorrow, and then this should be ready to re-check. Testing locally, though, Karpenter was able to provision and remove nodes (scale up/down) per the getting started guide. Stay tuned.

```hcl
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "4.17.1"

role_name = "karpenter-controller-${local.cluster_name}"
attach_karpenter_controller_policy = true

resource "aws_iam_role_policy" "karpenter_controller" {
name = "karpenter-policy-${var.cluster_name}"
role = module.iam_assumable_role_karpenter.iam_role_name

policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"ec2:CreateLaunchTemplate",
"ec2:CreateFleet",
"ec2:RunInstances",
"ec2:CreateTags",
"iam:PassRole",
"ec2:TerminateInstances",
"ec2:DescribeLaunchTemplates",
"ec2:DeleteLaunchTemplate",
"ec2:DescribeInstances",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstanceTypeOfferings",
"ec2:DescribeAvailabilityZones",
"ssm:GetParameter"
]
Effect = "Allow"
Resource = "*"
},
]
})
+  karpenter_controller_cluster_id = module.eks.cluster_id
+  karpenter_controller_node_iam_role_arns = [
+    module.eks.eks_managed_node_groups["initial"].iam_role_arn
+  ]
+
+  oidc_providers = {
+    ex = {
+      provider_arn               = module.eks.oidc_provider_arn
+      namespace_service_accounts = ["karpenter:karpenter"]
+    }
+  }
+}
```

-Since we've added a new module, you'll need to run `terraform init` again.
-Then, apply the changes.
+Since we've added a new module, you'll need to run `terraform init` again before applying the changes.

```bash
terraform init
-terraform apply -var "cluster_name=${CLUSTER_NAME}"
+terraform apply
```

### Install Karpenter Helm Chart
@@ -242,9 +233,23 @@ Use helm to deploy Karpenter to the cluster. We are going to use the
`helm_release` Terraform resource to do the deploy and pass in the cluster
details and IAM role Karpenter needs to assume.

+Add the following to your `main.tf` to provision Karpenter via a Helm chart.

```hcl
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

exec {
api_version = "client.authentication.k8s.io/v1alpha1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", local.cluster_name]
}
}
}

resource "helm_release" "karpenter" {
depends_on = [module.eks.kubeconfig]
namespace = "karpenter"
create_namespace = true

@@ -255,12 +260,12 @@ resource "helm_release" "karpenter" {

set {
name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
-    value = module.iam_assumable_role_karpenter.iam_role_arn
+    value = module.karpenter_irsa.iam_role_arn
}

set {
name = "clusterName"
-    value = var.cluster_name
+    value = module.eks.cluster_id
}

set {
@@ -275,14 +280,14 @@ resource "helm_release" "karpenter" {
}
```

-Now, deploy Karpenter by applying the new Terraform config.
+Since we've added a new provider (helm), you'll need to run `terraform init` again
+before applying the changes to deploy Karpenter.

```bash
terraform init
-terraform apply -var "cluster_name=${CLUSTER_NAME}"
+terraform apply
```


### Enable Debug Logging (optional)

The global log level can be modified with the `logLevel` chart value (e.g. `--set logLevel=debug`) or the individual components can have their log level set with `controller.logLevel` or `webhook.logLevel` chart values.
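
For example, with the `helm_release` resource from this guide, debug logging could be enabled with one more `set` block (a small sketch; `logLevel` is the chart value named above):

```hcl
  # Added inside the helm_release "karpenter" resource (sketch)
  set {
    name  = "logLevel"
    value = "debug"
  }
```
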
@@ -325,6 +330,8 @@ spec:
       karpenter.sh/discovery: ${CLUSTER_NAME}
     securityGroupSelector:
       karpenter.sh/discovery: ${CLUSTER_NAME}
+    tags:
+      karpenter.sh/discovery: ${CLUSTER_NAME}
   ttlSecondsAfterEmpty: 30
EOF
```
@@ -399,8 +406,7 @@ created LaunchTemplates.
```bash
kubectl delete deployment inflate
kubectl delete node -l karpenter.sh/provisioner-name=default
helm uninstall karpenter --namespace karpenter
-terraform destroy -var "cluster_name=${CLUSTER_NAME}"
+terraform destroy
aws ec2 describe-launch-templates \
| jq -r ".LaunchTemplates[].LaunchTemplateName" \
| grep -i "Karpenter-${CLUSTER_NAME}" \