
feat: Allow communication between pods on workers and pods using the primary cluster security group #892

Merged: 6 commits, May 30, 2020

Conversation

@itssimon (Contributor) commented May 26, 2020

PR o'clock

Description

Adds two security group rules allowing pods running on worker nodes to communicate with pods using the primary cluster security group (such as pods running on Fargate).

See #866 (comment).
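
As a rough sketch, the pair of rules could look like the following. The resource name cluster_primary_ingress_workers and the port/protocol values match the plan output quoted later in this thread; the counterpart resource name and the two locals are assumptions about the module's internals, not necessarily the exact code in this PR.

resource "aws_security_group_rule" "cluster_primary_ingress_workers" {
  description              = "Allow pods running on workers to send communication to cluster primary security group (e.g. Fargate pods)."
  type                     = "ingress"
  protocol                 = "-1"
  from_port                = 0
  to_port                  = 65535
  security_group_id        = local.cluster_primary_security_group_id # primary cluster SG
  source_security_group_id = local.worker_security_group_id          # assumed local name
}

resource "aws_security_group_rule" "workers_ingress_cluster_primary" {
  description              = "Allow pods running on workers to receive communication from cluster primary security group (e.g. Fargate pods)."
  type                     = "ingress"
  protocol                 = "-1"
  from_port                = 0
  to_port                  = 65535
  security_group_id        = local.worker_security_group_id          # assumed local name
  source_security_group_id = local.cluster_primary_security_group_id # primary cluster SG
}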

Checklist

@itssimon changed the title from "Allow communication between pods on workers and pods using the primary cluster security group" to "feat: Allow communication between pods on workers and pods using the primary cluster security group" on May 26, 2020
@itssimon mentioned this pull request on May 26, 2020
@dpiddockcmp (Contributor)

This is a potentially breaking change as it may conflict with rules put in place by users already running mixed clusters. I think there should be a variable, default false, to enable creation of these rules. We can then remove it in a future major release.
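
A minimal sketch of that opt-in flag, assuming the variable name that shows up later in this thread; each of the two rules would then get count = var.worker_create_cluster_primary_security_group_rules ? 1 : 0 (whether create_eks is also folded into that expression is not shown here).

variable "worker_create_cluster_primary_security_group_rules" {
  description = "Whether to create security group rules to allow communication between pods on workers and pods using the primary cluster security group."
  type        = bool
  default     = false
}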

@itssimon (Contributor, Author)

Agreed @dpiddockcmp, and done.

(Resolved review comments on workers.tf and local.tf, now outdated)
@dpiddockcmp (Contributor) left a comment


Was playing around testing this PR.

Created a 1.13 cluster with worker_create_cluster_primary_security_group_rules = true. Everything worked great. I then tried to upgrade to 1.14, and after the eks_cluster upgrade finished, Terraform threw an error:

Error: Provider produced inconsistent final plan

When expanding the plan for
module.eks.aws_security_group_rule.cluster_primary_ingress_workers[0] to
include new values learned so far during apply, provider
"registry.terraform.io/-/aws" produced an invalid new value for
.security_group_id: was cty.StringVal(""), but now
cty.StringVal("sg-0892893c8410071f4").

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

Applying a second time was fine.

I don't think this is something to worry about long term. The variable that enables these security group rules defaults to false, so users upgrading 1.13 clusters won't naturally hit the bug. And 1.13 is being removed at the end of June, just a month away.

@barryib (Member) commented May 29, 2020

Error: Provider produced inconsistent final plan

When expanding the plan for
module.eks.aws_security_group_rule.cluster_primary_ingress_workers[0] to
include new values learned so far during apply, provider
"registry.terraform.io/-/aws" produced an invalid new value for
.security_group_id: was cty.StringVal(""), but now
cty.StringVal("sg-0892893c8410071f4").

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

This means that module.eks.aws_security_group_rule.cluster_primary_ingress_workers[0].security_group_id was "" during the plan phase and became sg-0892893c8410071f4 during the apply.

I think it's because of:

cluster_primary_security_group_id = var.cluster_version >= 1.14 ? element(concat(aws_eks_cluster.this[*].vpc_config[0].cluster_security_group_id, list("")), 0) : null

local.cluster_primary_security_group_id is empty during the plan phase for all clusters created before Kubernetes 1.14 / platform version eks.3. So during the apply phase, the cluster is updated and the primary security group is created by AWS. That is probably the inconsistency.

You can see in a terraform plan that there is no security_group_id for the cluster_primary_ingress_workers SG rule:

  # module.eks.aws_security_group_rule.cluster_primary_ingress_workers[0] will be created
  + resource "aws_security_group_rule" "cluster_primary_ingress_workers" {
      + description              = "Allow pods running on workers to send communication to cluster primary security group (e.g. Fargate pods)."
      + from_port                = 0
      + id                       = (known after apply)
      + protocol                 = "-1"
      + self                     = false
      + source_security_group_id = "sg-0d741dba799316523"
      + to_port                  = 65535
      + type                     = "ingress"
    }

So we have to ensure correct ordering during resource creation, so that Terraform waits for the SG to be created, or at least marks the security_group_id as unknown.

Here is how I made it work (but I don't know if it breaks something else):

locals {
  # On 1.14+ clusters, read the primary security group ID through the
  # null_data_source so Terraform treats it as unknown until the cluster exists.
  cluster_primary_security_group_id = var.cluster_version >= 1.14 ? coalescelist(data.null_data_source.cluster[*].outputs["cluster_primary_security_group_id"], [""])[0] : null
}

data "null_data_source" "cluster" {
  count = var.create_eks ? 1 : 0

  inputs = {
    # Forces a dependency on the EKS cluster's cluster_security_group_id attribute.
    cluster_primary_security_group_id = aws_eks_cluster.this[0].vpc_config[0].cluster_security_group_id
  }
}

With the null_data_source, Terraform waits for the cluster_security_group_id attribute to be available.

You can see the difference in the plan output (security_group_id = (known after apply)):

  # module.eks.aws_security_group_rule.cluster_primary_ingress_workers[0] will be created
  + resource "aws_security_group_rule" "cluster_primary_ingress_workers" {
      + description              = "Allow pods running on workers to send communication to cluster primary security group (e.g. Fargate pods)."
      + from_port                = 0
      + id                       = (known after apply)
      + protocol                 = "-1"
      + security_group_id        = (known after apply)
      + self                     = false
      + source_security_group_id = "sg-01ee228ae43275c29"
      + to_port                  = 65535
      + type                     = "ingress"
    }

This will also make the output cluster_primary_security_group_id wait for the SG (cluster_security_group_id) to be available.
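
For reference, a sketch of what that output looks like when it is fed from the local (the description text here is an assumption):

output "cluster_primary_security_group_id" {
  description = "The cluster primary security group ID created by the EKS cluster on 1.14 or later."
  value       = local.cluster_primary_security_group_id
}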

@dpiddockcmp @itssimon Does this work for you? Do you see any downsides or side effects of doing this?

@dpiddockcmp I assigned this PR to you, so you can squash and merge it when it's ready ^^

@dpiddockcmp (Contributor)

I don't think the added complexity of using a null_data_source here is necessary. I also tried removing the local so that the security group rules directly referenced aws_eks_cluster.this[0].vpc_config[0].cluster_security_group_id. Same result when updating a 1.13 cluster, but only for the aws_security_group_rule.cluster_primary_ingress_workers resource. The other rule applied correctly.

I guess don't enable this on a 1.13 cluster 😉
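
For comparison, a sketch of the direct reference described above, with the local removed (a simplification of the idea, not the exact diff; the source_security_group_id local is still an assumed name):

resource "aws_security_group_rule" "cluster_primary_ingress_workers" {
  count = var.worker_create_cluster_primary_security_group_rules ? 1 : 0

  description              = "Allow pods running on workers to send communication to cluster primary security group (e.g. Fargate pods)."
  type                     = "ingress"
  protocol                 = "-1"
  from_port                = 0
  to_port                  = 65535
  security_group_id        = aws_eks_cluster.this[0].vpc_config[0].cluster_security_group_id
  source_security_group_id = local.worker_security_group_id # assumed local name
}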

@barryib (Member) commented May 30, 2020

I don't think the added complexity of using a null_data_source here is necessary. I also tried removing the local so that the security group rules directly referenced aws_eks_cluster.this[0].vpc_config[0].cluster_security_group_id. Same result when updating a 1.13 cluster, but only for the aws_security_group_rule.cluster_primary_ingress_workers resource. The other rule applied correctly.

Ohh, that was the first thing I tried, but no luck. Maybe I didn't test it correctly ^^

@dpiddockcmp merged commit 3fefc2a into terraform-aws-modules:master on May 30, 2020
@github-actions (bot)

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Nov 18, 2022