AWS Load Balancer Controller v2.1.3 + EKSCTL (Multiple tagged SGs) #3459

Closed
marciogmorales opened this issue Mar 19, 2021 · 6 comments
@marciogmorales

Hello,
I'm using eksctl to launch an Amazon EKS cluster consisting of:

1 Managed Linux node
2 Windows Nodes

The Windows nodes by default have two security groups created and attached:

1 - ClusterSharedNodeSecurityGroup
2 - SG for communication between the control plane and worker nodes in group windows-ng-ltsc

However, when trying to create a Service of type LoadBalancer (NLB), I receive the following error on the Service:

Warning SyncLoadBalancerFailed 23s (x5 over 101s) service-controller Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-0f7XXXX; ensure only the k8s security group is tagged; the tagged groups were sg-0cf0274a15cb31e9a(eksctl-eks-windows-nodegroup-windows-ng-ltsc-SG-7ZXXXXXZ) sg-04be95eb6f37ad123(eks-cluster-sg-eks-windows-1XXXX7)

I found people complaining about the same thing some years ago, but it sounds like the issue still persists. Any clue on how to solve it?

@aclevername
Contributor

Thanks for opening the issue @marciogmorales !

https://github.com/kubernetes/kubernetes/blob/68b4e26caf6ede7af577db4af62fb405b4dd47e6/staging/src/k8s.io/legacy-cloud-providers/aws/aws.go#L4111-L4138 the error is coming from the AWS cloud provider. It looks for the kubernetes.io/cluster/<name> tag on the security groups attached to an EC2 instance, and it errors when more than one security group has it. We currently add this tag to the default nodegroup SG that allows communication between the nodegroup and the control plane: https://github.com/weaveworks/eksctl/blob/d08516d74e6f4c533406b3e001b951343e04225c/pkg/cfn/builder/vpc.go#L493-L501
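
Roughly, the check behaves like the sketch below (a simplified illustration of the behaviour described above, not the actual cloud-provider code; the names are made up):

package main

import "fmt"

// securityGroup is a simplified stand-in for the EC2 security group data
// (ID, name, tags) that the cloud provider inspects.
type securityGroup struct {
	id   string
	name string
	tags map[string]string
}

// findClusterSecurityGroup illustrates the behaviour: among the security
// groups attached to an instance, exactly one may carry the
// kubernetes.io/cluster/<name> ownership tag.
func findClusterSecurityGroup(clusterName string, groups []securityGroup) (string, error) {
	clusterTagKey := "kubernetes.io/cluster/" + clusterName
	var tagged []securityGroup
	for _, sg := range groups {
		if _, ok := sg.tags[clusterTagKey]; ok {
			tagged = append(tagged, sg)
		}
	}
	switch len(tagged) {
	case 0:
		return "", fmt.Errorf("no tagged security group found")
	case 1:
		return tagged[0].id, nil
	default:
		return "", fmt.Errorf("multiple tagged security groups found; ensure only the k8s security group is tagged")
	}
}

func main() {
	// Two attached SGs carrying the cluster tag reproduces the error condition.
	groups := []securityGroup{
		{id: "sg-aaa", name: "nodegroup-SG", tags: map[string]string{"kubernetes.io/cluster/eks-windows": "owned"}},
		{id: "sg-bbb", name: "eks-cluster-sg", tags: map[string]string{"kubernetes.io/cluster/eks-windows": "owned"}},
	}
	_, err := findClusterSecurityGroup("eks-windows", groups)
	fmt.Println(err)
}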

The error you're getting would indicate that the ClusterSharedNodeSecurityGroup also has this tag, but that does not happen AFAIK: https://github.com/weaveworks/eksctl/blob/d08516d74e6f4c533406b3e001b951343e04225c/pkg/cfn/builder/vpc.go#L397-L400

Which nodegroup type is instance i-0f7XXXX, the Linux or the Windows one? And can you provide all the tags attached to the security groups attached to that instance? Thanks
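
If it helps to gather that, here is one way to dump every tag on the two SGs from the error programmatically (a sketch using the aws-sdk-go v1 API; the group IDs are just the ones from your error message, and the equivalent aws ec2 describe-security-groups CLI call would give the same information):

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Uses the default credential chain and region configuration.
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)

	// The two security group IDs reported in the error.
	out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
		GroupIds: aws.StringSlice([]string{"sg-0cf0274a15cb31e9a", "sg-04be95eb6f37ad123"}),
	})
	if err != nil {
		panic(err)
	}

	// Print every tag on each group so the cluster-ownership tags are easy to spot.
	for _, sg := range out.SecurityGroups {
		fmt.Println(aws.StringValue(sg.GroupId), aws.StringValue(sg.GroupName))
		for _, tag := range sg.Tags {
			fmt.Printf("  %s=%s\n", aws.StringValue(tag.Key), aws.StringValue(tag.Value))
		}
	}
}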

@aclevername aclevername self-assigned this Mar 19, 2021
@marciogmorales
Author

Thanks for the quick feedback.

This is happening on my Windows node (instance i-0f7XXXX). The kubernetes.io/cluster/<name> tag has been created on both SGs.

Here are some screenshots of the Tags created:

ClusterSharedNodeSecurityGroup:
(screenshot of the ClusterSharedNodeSecurityGroup tags)

NodeSecurityGroup:
(screenshot of the NodeSecurityGroup tags)

@aclevername
Contributor

Thanks @marciogmorales. I don't see the kubernetes.io/cluster/<name> tag on the screenshot above for ClusterSharedNodeSecurityGroup. Are you sure you have the correct SGs? Are those two the same ones reported in the error? the tagged groups were sg-0cf0274a15cb31e9a(eksctl-eks-windows-nodegroup-windows-ng-ltsc-SG-7ZXXXXXZ) sg-04be95eb6f37ad123(eks-cluster-sg-eks-windows-1XXXX7)

@aclevername
Contributor

Closing due to inactivity.

@consideRatio
Contributor

consideRatio commented May 31, 2021

Hi @aclevername, I've run into this specific error as well and hope to revive this debugging effort (eksctl version: 0.52.0). I've tried to provide as much information as I can below.

UPDATE: RESOLVED

The issue was caused by having a tag called KubernetesCluster, which I had added to my eksctl config because I thought it was just a way to recognize the resources created by eksctl. It turned out to have another meaning that influenced things. Removing it did the trick.
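
For context on why that tag mattered: as far as I understand it, the legacy AWS cloud provider also recognizes the old KubernetesCluster tag as a cluster-ownership tag, so an extra security group carrying it counts as "tagged" just like one with kubernetes.io/cluster/<name>. A minimal sketch of that check (illustrative only, not the actual source; the real code may differ across versions):

package main

import "fmt"

// hasClusterTag sketches how a security group's tags are judged to belong to
// the cluster: either the kubernetes.io/cluster/<name> key or the legacy
// KubernetesCluster key (with the cluster name as value) counts.
func hasClusterTag(clusterName string, tags map[string]string) bool {
	for key, value := range tags {
		if key == "kubernetes.io/cluster/"+clusterName {
			return true
		}
		if key == "KubernetesCluster" && value == clusterName {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative values: a shared SG carrying only my KubernetesCluster tag
	// would still count as a second "cluster" security group, hence the error.
	fmt.Println(hasClusterTag("jmte", map[string]string{"KubernetesCluster": "jmte"})) // true
}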


Info: background

  • I have created the entire cluster recently
  • I have multiple node groups, but they are all Linux-based and just of different sizes.
  • I have only one running node from all these node groups, from the core-a node group.
  • I have not created any AWS security group manually or similar; to my knowledge they were all created via eksctl.

Info: kubectl apply -f my-svc.yaml

Note that I've deleted this k8s Service and created it again without any change in outcome.

apiVersion: v1
kind: Service
metadata:
  name: proxy-public
  namespace: prod
spec:
  ports:
  - name: https
    port: 443
    targetPort: https
  - name: http
    port: 80
    targetPort: http
  selector:
    component: autohttps
    release: prod
  type: LoadBalancer

Info: kubectl describe svc my-svc

Note the errors are about the security groups eksctl-jmte-cluster-ClusterSharedNodeSecurityGroup-2LRGJW3MQ4O8 and eksctl-jmte-nodegroup-core-a-SG-LGMTEQ7JLTNX.

Name:                     proxy-public
Namespace:                prod
Labels:                   app=jupyterhub
                          app.kubernetes.io/managed-by=Helm
                          chart=jupyterhub-1.0.0-beta.1
                          component=proxy-public
                          heritage=Helm
                          release=prod
Annotations:              meta.helm.sh/release-name: prod
                          meta.helm.sh/release-namespace: prod
Selector:                 component=autohttps,release=prod
Type:                     LoadBalancer
IP:                       10.100.101.229
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32624/TCP
Endpoints:                192.168.22.121:8443
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  32559/TCP
Endpoints:                192.168.22.121:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                  Age               From                Message
  ----     ------                  ----              ----                -------
  Normal   EnsuringLoadBalancer    5s (x2 over 12s)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  4s (x2 over 10s)  service-controller  Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-0a24c650cd9fbfc68; ensure only the k8s security group is tagged; the tagged groups were sg-0731df37a5a8e6844(eksctl-jmte-cluster-ClusterSharedNodeSecurityGroup-2LRGJW3MQ4O8) sg-00efe23d69c9c0c4a(eksctl-jmte-nodegroup-core-a-SG-LGMTEQ7JLTNX)

Info: security groups details

core-a SG (an unmanaged nodegroup, the first in my eksctl config list) and ClusterSharedNodeSecurityGroup:
(two screenshots of the security group details)

Info: all security groups

These are all the security groups related to this VPC; I omitted a few related to another VPC.

(screenshot of all security groups in the VPC)

Info: about associated instance

instance info: tags section

(screenshot of the instance's tags)

instance info: security section

(screenshot of the instance's security groups)

Info: eksctl-cluster-config.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: jmte
  region: us-west-2
  version: "1.19"
  tags:
    2i2c.org/project: jmte

availabilityZones: [us-west-2d, us-west-2b, us-west-2a]

iam:
  withOIDC: true

nodeGroups:
  - name: core-a
    availabilityZones: [us-west-2d]
    instanceType: m5.large
    minSize: 0
    maxSize: 2
    desiredCapacity: 1
    volumeSize: 80
    labels:
      hub.jupyter.org/node-purpose: core
    tags:
      k8s.io/cluster-autoscaler/node-template/label/hub.jupyter.org/node-purpose: core
    iam:
      withAddonPolicies:
        autoScaler: true
        efs: true

  - name: user-a
    availabilityZones: [us-west-2d]
    instanceType: m5.xlarge   # 57 pods, 4 cpu, 16 GB
    minSize: 0
    maxSize: 20
    desiredCapacity: 0
    volumeSize: 80
    labels:
      hub.jupyter.org/node-purpose: user
    tags:
      k8s.io/cluster-autoscaler/node-template/label/hub.jupyter.org/node-purpose: user
    iam:
      withAddonPolicies:
        autoScaler: true
        efs: true

  - name: worker-xlarge
    availabilityZones: &availabilityZones [us-west-2d, us-west-2b, us-west-2a]
    minSize: &minSize 0
    maxSize: &maxSize 8
    desiredCapacity: &desiredCapacity 0
    volumeSize: &volumeSize 80
    labels: &labels
      k8s.dask.org/node-purpose: worker
    taints: &taints
      k8s.dask.org_dedicated: worker:NoSchedule
    tags: &tags
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: worker
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org_dedicated: worker:NoSchedule
    iam: &iam
      withAddonPolicies:
        autoScaler: true
        efs: true
    instancesDistribution:
      instanceTypes:
        - m5a.xlarge      # 57 pods, 4 cpu, 16 GB (AMD,   10 GBits network,  100% cost)
        - m5.xlarge       # 57 pods, 4 cpu, 16 GB (Intel, 10 GBits network, ~112% cost)
        # - m5n.xlarge    # 57 pods, 4 cpu, 16 GB (Intel, 25 GBits network, ~139% cost)
      onDemandBaseCapacity: &onDemandBaseCapacity 0
      onDemandPercentageAboveBaseCapacity: &onDemandPercentageAboveBaseCapacity 0
      spotAllocationStrategy: &spotAllocationStrategy capacity-optimized

  - name: worker-2xlarge
    availabilityZones: *availabilityZones
    minSize: *minSize
    maxSize: *maxSize
    desiredCapacity: *desiredCapacity
    volumeSize: *volumeSize
    labels: *labels
    taints: *taints
    tags: *tags
    iam: *iam
    instancesDistribution:
      instanceTypes:
        - m5a.2xlarge     # 57 pods, 8 cpu, 32 GB (AMD,   10 GBits network,  100% cost)
        - m5.2xlarge      # 57 pods, 8 cpu, 32 GB (Intel, 10 GBits network, ~112% cost)
        # - m5n.2xlarge   # 57 pods, 8 cpu, 32 GB (Intel, 25 GBits network, ~139% cost)
      onDemandBaseCapacity: *onDemandBaseCapacity
      onDemandPercentageAboveBaseCapacity: *onDemandPercentageAboveBaseCapacity
      spotAllocationStrategy: *spotAllocationStrategy

  # ... repeating entries for worker-4xlarge, worker-8xlarge, and worker-16xlarge omitted

@consideRatio
Contributor

I updated my comment above with the resolution to the issue for me, see #3459 (comment).
