Cannot Run with IAM Service Account and no metadata service #474

Closed
geofffranks opened this issue Mar 25, 2020 · 49 comments · Fixed by #855
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@geofffranks

/kind bug

What happened?
The ebs-plugin container on the ebs-csi-controller crashes repeatedly when trying to reach the metadata service:

I0325 19:25:18.560010       1 driver.go:62] Driver: ebs.csi.aws.com Version: v0.6.0-dirty
panic: EC2 instance metadata is not available

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0x0, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:83 +0x196
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0xc00016ff70, 0x3, 0x3, 0xc0000a88a0, 0xdcc3a0, 0xc0001edb00)
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:87 +0x512
main.main()
	/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:31 +0x117

What you expected to happen?

When the AWS_REGION variable and an IAM role service account are specified, the ebs-csi-driver should not need to access the metadata service and should run on its own.

How to reproduce it (as minimally and precisely as possible)?

  1. Create an IAM role with permissions for the aws-ebs-csi-driver
  2. Create an EKS cluster with an OIDC identity provider trust relationship with IAM (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html), especially following the final step of blocking pods from accessing the metadata service to prevent them from assuming the worker instance profile.
  3. Deploy aws-ebs-csi-driver using the alpha kustomize overlays, adding the eks.amazonaws.com/role-arn to the service account
  4. The ebs-csi-controller pods will start and crash after about 20s

Anything else we need to know?:
As far as we can tell everything is set up correctly with the role + service account, but the code explicitly tries to instantiate the metadata service, which is firewalled off. Can this be made optional if the region is set and credentials are available via the service account?
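
For reference, a rough sketch of the IRSA wiring described above (the service account name and role ARN are placeholders, not our actual values):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa        # placeholder; whatever name the controller deployment references
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ebs-csi-driver   # placeholder ARN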

Environment
EKS v1.14
ebs-csi-driver v0.5.0, v0.6.0-dirty

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 25, 2020
@leakingtapan
Contributor

leakingtapan commented Mar 26, 2020

How is your AWS_REGION specified? The metadata service is used as a fallback when AWS_REGION is not set through an environment variable.

@franklevering

franklevering commented Mar 27, 2020

@leakingtapan
I am having the same issue with the ebs-plugin: it is trying to access the metadata service even though AWS_REGION is defined as an environment variable. I've specified AWS_REGION the following way:

containers:
  - name: ebs-plugin
    image: amazon/aws-ebs-csi-driver:latest
    args:
      # - {all,controller,node} # specify the driver mode
      - --endpoint=$(CSI_ENDPOINT)
      - --logtostderr
      - --v=5
    env:
      - name: AWS_REGION
        value: eu-central-1
...

If I follow the error log, I can see that it is trying to access the metadata service when creating a new nodeService; see here.

I assume that when the AWS_REGION variable is specified, the driver should respect it. If that's the case, I am willing to create a PR for it.

@geofffranks
Author

We've specified it as such:

env:
  - name: CSI_ENDPOINT
    value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
  # overwrite the AWS region instead of looking it up dynamically via the AWS EC2 metadata svc
  - name: AWS_REGION
    value: us-east-1

We've also tried AWS_DEFAULT_REGION, since the AWS CLI uses that variable name.

@prashantokochavara

I seem to be having the same issue when running it on my OpenShift cluster in AWS. My service account has full admin rights, but this still panics and fails.

@prashantokochavara

@leakingtapan - is this a bug? Or am I missing some env variables somewhere?
@geofffranks / @franklevering - were you able to resolve this?

@leakingtapan
Contributor

It sounds like a bug if AWS_REGION is specified but not honored. But I haven't had enough time to root-cause the issue.

@nitesh-sharma-trilio

I am also facing the issue.

@tanalam2411

Hello folks, did anyone find a way out of this? I am facing it while integrating my OCP cluster with EBS. Thanks in advance.

@prashantokochavara

@leakingtapan -- is it possible to get a rough estimate on when this fix would be available?

@mvaldesdeleon

Hi everyone. I've been looking into this issue a bit more closely, and can confirm that this is not a misconfiguration, and is also not related to whether the AWS_REGION environment variable is defined.

If you follow the stack trace [1], you end up realising that the driver relies heavily on the metadata service to retrieve the current instance ID, the availability zone for topology-aware dynamic provisioning, and information about the instance family, which is used to derive the maximum number of EBS volumes that can be attached.

The way I see it, keeping in mind I'm not a member of this project, this does not look like a bug that should be fixed, but rather a requirement of the driver that should be explicitly documented.

For the time being, I'm working around this issue by using a slightly more specific iptables rule leveraging the string extension [2] to filter only packets containing "iam/security-credentials" [3] within their first 100 bytes:

iptables --insert FORWARD 1 --in-interface eni+ --destination 169.254.169.254/32 -m string --algo bm --to 100 --string 'iam/security-credentials' --jump DROP

I would not bet on this to stop someone who REALLY wants to access this URL, but it should help in most cases. Eager to hear if anyone can think of a better solution.

[1] https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go
[2] http://ipset.netfilter.org/iptables-extensions.man.html#lbCE
[3] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

@prashantokochavara

@mvaldesdeleon - where are you running the iptables command exactly? Is this something you are setting manually on the nodes?

@maxbraun

maxbraun commented Apr 27, 2020

@prashantokochavara Martin is referring to the Worker Nodes, where Metadata Endpoint Access is restricted (https://docs.aws.amazon.com/de_de/eks/latest/userguide/restrict-ec2-credential-access.html)

@dmc5179

dmc5179 commented May 20, 2020

According to the AWS docs, the metadata endpoint is a link-local address which can only be reached from the host. Can it actually be reached from inside a container? When I try it on my cluster, I'm able to curl the metadata endpoint from the host itself, but I get a "connection refused" when trying the same command from inside the ebs-csi-controller.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
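
To sanity-check this outside the driver, here is a rough sketch (the IMDSv2 token flow and the debug image are generic examples, not taken from this issue):

# On the worker node itself (should print the instance ID):
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id

# From a throwaway pod (hangs or is refused when pods are blocked from the metadata service):
kubectl run imds-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s --max-time 5 http://169.254.169.254/latest/meta-data/instance-id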

@dmc5179

dmc5179 commented May 20, 2020

So I was able to work around this issue by disabling the liveness containers and probes for port 9808 and then enabling hostNetwork for the CSI controller pod.
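
Roughly, the change looks like this in the controller manifest (a sketch only, not the exact YAML from my cluster):

spec:
  template:
    spec:
      hostNetwork: true            # lets the pod use the node's network and reach 169.254.169.254
      containers:
        - name: ebs-plugin
          # livenessProbe on port 9808 commented out as part of this workaround
        # - name: liveness-probe   # the liveness sidecar container commented out entirely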

@prashantokochavara

@dmc5179 - can you show me your yaml file and how you are making these changes for the controller pod/deployment?

@dmc5179

dmc5179 commented May 21, 2020

@prashantokochavara Yes, I'll add them here.

I wanted to note that the reason hostNetwork is not needed in vanilla Kubernetes is that the OpenShift SDN won't route the link-local requests, while vanilla Kubernetes will. I've created some other feature requests to support this driver on OpenShift.

@prashantokochavara

Thanks @dmc5179
I haven't tried it on vanilla k8s yet, but that is what I was expecting.
My env is OpenShift 4.3/4.4 and I'm able to get through with static provisioning, but wherever dynamic provisioning is required (metadata), I run into those network issues.
Hopefully your workaround is all I need :)

@dmc5179

dmc5179 commented May 21, 2020

Here is a link to my fork of the driver where I modified the helm chart to support OpenShift 4.3. Note that I modified the 0.3.0 chart and ran that on my cluster. The git repo is version 0.4.0 of the helm chart. I don't see any reason why my modifications would not work. That being said, if you need to use the 0.3.0 version of the chart, take the changes that I made in my git repo and apply them to the deployment.yaml and daemonset.yaml files in the 0.3.0 version of the chart. Let me know if that makes any sense.

https://github.com/dmc5179/aws-ebs-csi-driver

Another member of our team tried this modification in AWS commercial and it worked.

Note that there is one additional modification in my version of the chart. Because I'm deploying in a private AWS region, I need to add certificates to support the custom API endpoint. I could not find any way to get them into the CSI driver containers (long story). What I ended up doing is a hostPath mount to /etc/pki, which works. If you do not want that host mount and/or don't need it, just comment it out in the files that I changed in my version of the driver.

@prashantokochavara

Thanks @dmc5179 Appreciate it!
I am going to try this on the 0.5.0 version of the helm chart (need snapshots/expansion).

@prashantokochavara

@dmc5179 - I was finally able to get past the metadata issue. Thanks!

In addition to the changes that you had in your fork, I also had to disable the liveness container and probes that you had mentioned earlier. I basically commented out those parts in the node and controller .yaml files.

Adding them here in case anyone else needs it.
aws-ebs-csi-driver.zip

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 31, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 30, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@groodt
Contributor

groodt commented Apr 6, 2021

We've run into this issue and can confirm that even with the AWS_REGION variable set, this still fails in an environment with IMDSv2 and a hop limit of 1 (an AWS EKS security best practice). This is due to: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go#L85

This is with v0.10.0

@vdhanan
Contributor

vdhanan commented Apr 7, 2021

/assign

@niemeyer

Same problem here. Have done the CSI and CNI installation by the book following the official guides as linked from the web console, and now observing the same panic reported in this issue. Modifying the base files to include the AWS_REGION environment variable doesn't seem to help either.

@pallabganai

Same problem here. Have done the CSI and CNI installation by the book following the official guides as linked from the web console, and now observing the same panic reported in this issue. Modifying the base files to include the AWS_REGION environment variable doesn't seem to help either.

[ec2-user@ip-10-0-4-68 ~]$ kubectl -n kube-system logs -f ebs-csi-controller-6d76fb9595-982fs -c ebs-plugin
I0831 08:05:50.554214       1 metadata.go:101] retrieving instance data from ec2 metadata
W0831 08:05:56.868620       1 metadata.go:104] ec2 metadata is not available
I0831 08:05:56.868707       1 metadata.go:112] retrieving instance data from kubernetes api
I0831 08:05:56.869895       1 metadata.go:117] kubernetes api is available
panic: did not find aws instance ID in node providerID string

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc000088f00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:88 +0x2c7
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver(0xc0004b5f40, 0x7, 0x7, 0x7f07585329f0, 0x100000000000000, 0x0)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:97 +0x352
main.main()
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x255

I am getting the above error when the Amazon EBS CSI driver is configured to set up a PersistentVolumeClaim on Fargate. I believe it is the same issue. Is there any workaround available?
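
For anyone debugging the same panic, the value the driver is complaining about can be inspected directly (a sketch; on Fargate the providerID contains no i-... EC2 instance ID for the driver to extract):

kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID'
# EC2-backed nodes look like:  aws:///eu-west-1a/i-0123456789abcdef0
# Fargate nodes use a different providerID format, so the instance ID lookup fails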

@groodt
Contributor

groodt commented Aug 31, 2021

I'm not sure if this is likely to work for Fargate, but you could try the recent manifest that removes hostNetwork access.

kubectl kustomize "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/ecr?ref=8d19fe36b6c2972124a769ae56fa4a3ba8aa1180" > aws-ebs-csi-driver.yaml

Actually, ignore me. Fargate does not support EBS: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html

However, Fargate does apparently support static EFS provisioning, so perhaps that will solve your problem. https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html

@gwvandesteeg

To help people find the root cause of this when upgrading from a previous version using the Helm chart: the problem arises because the location of the IRSA serviceAccount annotation changed from

serviceAccount:
  controller:
    annotations:
      eks.amazonaws.com/role-arn: "..."
  node:
    annotations:
      eks.amazonaws.com/role-arn: "..."

to

controller:
  serviceAccount: 
    annotations:
      eks.amazonaws.com/role-arn: "..."
node:
  serviceAccount: 
    annotations:
      eks.amazonaws.com/role-arn: "..."

Observable symptom is:
GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-xxxxxxxxxxx" from node "i-xxxxxxxxx": error listing AWS instances: "NoCredentialProviders: no valid providers in chain caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.\nSharedCredsLoad: failed to load profile, .EC2RoleRequestError: no EC2 instance role found\ncaused by: EC2MetadataError: failed to make EC2Metadata request\n\n\tstatus code: 401, request id:

@jhonis

jhonis commented Nov 24, 2022

Just an addition for those who landed here seeking a solution...

@wongma7 said in his #474 (comment) that if we were on a recent version of the SDK we would be safe.

However, if you are setting IMDSv2 as required, you may be facing the 401 issue reported by @gwvandesteeg in his #474 (comment) because of the hop limit, as the HTTP request is not being sent directly from the EC2 instance.
The solution is quite easy: just increase the hop limit and it will work 😉 (3 is working for me, but I haven't tested 2)
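
For anyone looking for the concrete command, the hop limit can be raised roughly like this (the instance ID is a placeholder; the same option can also be set in a launch template):

aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 2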

@groodt
Contributor

groodt commented Nov 24, 2022

I personally feel this issue can be closed. At $dayJob, we are running v1.3.0 and have fully removed hostNetwork from this workload and everything is working with IRSA and the default hop limit of 1.

@gwvandesteeg

gwvandesteeg commented Nov 24, 2022

Just an addition for those who landed here seeking a solution...

@wongma7 said in his #474 (comment) that if we were on a recent version of the SDK we would be safe.

However, if you are setting IMDSv2 as required, you may be facing the 401 issue reported by @gwvandesteeg in his #474 (comment) because of the hop limit, as the HTTP request is not being sent directly from the EC2 instance. The solution is quite easy: just increase the hop limit and it will work 😉 (3 is working for me, but I haven't tested 2)

From a least-privilege security standpoint, using the hop count with IMDSv2 would not be recommended. This approach gives all workloads on those worker nodes the same IAM permissions as the worker node itself, instead of granting each workload only the permissions it needs via IRSA and limiting the worker node to only the permissions it itself needs.

@joebowbeer

joebowbeer commented Jan 28, 2023

@groodt This was working for us in EKS 1.23 but is now failing in EKS 1.24

Failure in EKS 1.24:

The ebs-plugin containers in the ebs-csi-node pods fail to start because they cannot retrieve instance data.

ebs-csi-node/ebs-plugin reports that ec2 metadata is not available, and subsequently fails to retrieve it from the kubernetes api, which times out:

retrieving instance data from ec2 metadata
ec2 metadata is not available
retrieving instance data from kubernetes api
kubernetes api is available
...timeout...

Success in EKS 1.23:

ebs-csi-node/ebs-plugin successfully retrieves instance data from ec2 metadata

retrieving instance data from ec2 metadata
ec2 metadata is available

One difference introduced in 1.24 is that automountServiceAccountToken defaults to false, which might explain why the kubernetes api fallback is failing?

But why is the ec2 metadata not available?

I find it interesting that ebs-csi-node does not typically have an IAM role attached, and yet it is retrieving ec2 metadata.

Slack: https://kubernetes.slack.com/archives/C0LRMHZ1T/p1674883684976729


Failure happens here:

https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go#L82
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/cloud/metadata.go#L84
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/cloud/metadata_ec2.go#L23
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/cloud/metadata_k8s.go#L38

Explanation of how ebs-csi-node obtains instance data: #821 (comment)

@gwvandesteeg

There are many ways of controlling access to the EC2 metadata and AWS APIs, including but not limited to:

  • IAM Instance Profile instead of using IRSA
  • AWS creds injected into workload via env vars
  • AWS Organization SCP preventing access to the Metadata Endpoint
  • IMDSv2 in use with a hop limit preventing workloads from accessing it, and IMDSv1 disabled on the worker node

In addition, automountServiceAccountToken needs to be enabled for IRSA to work (IAM Roles for Service Accounts.. it's in the name, and the annotation is on the service account).


@joebowbeer

joebowbeer commented Jan 28, 2023

@gwvandesteeg Thanks for the references.

In addition, automountServiceAccountToken needs to be enabled for IRSA to work (IAM Roles for Service Accounts.. it's in the name, and the annotation is on the service account).

IRSA does not need automountServiceAccountToken to be enabled. If anything, IRSA enables it to be disabled 😄

IRSA leverages Service Account Token Volume Projection:

In Kubernetes 1.24, a service account token is dynamically generated when the pod runs and is only valid for an hour by default. A secret for the service account will not be created.

https://aws.github.io/aws-eks-best-practices/security/docs/iam/#iam-roles-for-service-accounts-irsa
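
For anyone wondering what that looks like in practice, this is roughly what the EKS pod identity webhook injects into the pod spec when the service account is annotated (standard defaults, shown purely as an illustration):

env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::111122223333:role/ebs-csi-driver     # example ARN
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumes:
  - name: aws-iam-token
    projected:
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com
            expirationSeconds: 86400
            path: token

So the credentials come from a projected, short-lived token rather than from a legacy service account token Secret.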

@joebowbeer

Close?

@cpaton

cpaton commented Jul 27, 2023

This still appears to be an issue for Windows nodes. In a mixed EKS cluster it works fine on Linux, but the Windows instances are in a CrashLoopBackOff, reporting the following in the ebs-plugin container:

I0727 13:09:59.984563    9844 driver.go:75] "Driver Information" Driver="ebs.csi.aws.com" Version="release-1.21"
I0727 13:09:59.984563    9844 node.go:85] "regionFromSession Node service" region=""
I0727 13:09:59.984563    9844 metadata.go:85] "retrieving instance data from ec2 metadata"
I0727 13:10:03.172037    9844 metadata.go:88] "ec2 metadata is not available"
I0727 13:10:03.172037    9844 metadata.go:96] "retrieving instance data from kubernetes api"
I0727 13:10:03.172037    9844 metadata.go:101] "kubernetes api is available"
panic: error getting Node ip-10-16-94-243.eu-west-1.compute.internal: Get "https://172.20.0.1:443/api/v1/nodes/ip-10-16-94-243.eu-west-1.compute.internal": dial tcp 172.20.0.1:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc00007fce0)
     /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:88 +0x3e5
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc00049ff28, 0x9, 0x4?})
     /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:97 +0x430
main.main()
     /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:55 +0x43f
Stream closed EOF for kube-system/ebs-csi-node-windows-fcntc (ebs-plugin)

This is in a setup with IMDSv2 required and the hop limit set to 1, i.e. the instance metadata service is not accessible from pods.

@RalphSleighK

I am seeing the same issue as @cpaton following an upgrade from EKS 1.23 to 1.27.

@BhautikChudasama

I have the same issue in k3s. Has anyone tried with k3s / rke2?

I1209 10:23:50.432565       1 driver.go:78] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.25.0"
I1209 10:23:50.438532       1 node.go:84] "regionFromSession Node service" region=""
I1209 10:23:50.438571       1 metadata.go:85] "retrieving instance data from ec2 metadata"
I1209 10:24:03.217578       1 metadata.go:88] "ec2 metadata is not available"
I1209 10:24:03.217674       1 metadata.go:96] "retrieving instance data from kubernetes api"
I1209 10:24:03.218472       1 metadata.go:101] "kubernetes api is available"
panic: did not find aws instance ID in node providerID string

goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0x400066aba0)
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:87 +0x330
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0x400069fec8, 0xb, 0x4?})
        /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:100 +0x354
main.main()

@bounteous17

@BhautikChudasama I'm having the exact same issue after trying to install the driver on k3s. I had four pods failing at the beginning, but after defining controller.region during the helm upgrade command I have two pods running now. The other two are still failing with the same error you described.

NAME                                  READY   STATUS             RESTARTS       AGE
ebs-csi-controller-6bdcb4d559-m67n9   5/5     Running            0              13m
ebs-csi-controller-6bdcb4d559-zmr8z   5/5     Running            0              13m
ebs-csi-node-v2vxr                    0/3     CrashLoopBackOff   19 (46s ago)   13m
ebs-csi-node-dbktf                    0/3     CrashLoopBackOff   19 (19s ago)   13m
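
For completeness, a sketch of how controller.region can be passed during the helm upgrade (the chart repo URL and region value here are assumptions):

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-central-1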

@NoamY8

NoamY8 commented Feb 26, 2024

As @gwvandesteeg mentioned, I've disabled IMDSv1 and restricted access via the hop limit, so only IMDSv2 is enabled.
The pods are up and running, but I still get the following warning logs:

I0222 10:47:16.201601       1 metadata.go:85] retrieving instance data from ec2 metadata
W0222 10:47:19.376606       1 metadata.go:88] ec2 metadata is not available
I0222 10:47:19.376623       1 metadata.go:96] retrieving instance data from kubernetes api
I0222 10:47:19.376968       1 metadata.go:101] kubernetes api is available

Can I disable that somehow?
Any help would be appreciated.

@ConnorJC3
Contributor

/close

The EBS CSI Driver supports running with either metadata from IMDS (either v1 or v2) or Kubernetes itself. If it cannot access IMDS, it will fall back to Kubernetes and use labels on the nodes added by the AWS CCM to determine the instance type, zone, etc.

This issue has become a dumping ground for all sorts of related issues. I am going to close it out as the primary request (running without IMDS) has been possible for a long time now. If you have a separate bug report, feature request, or support request, please open its own issue so it can be properly tracked and addressed.
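
As a rough sketch, the Kubernetes fallback can be sanity-checked by looking at the provider ID and topology labels on a node (the node name below is a placeholder):

kubectl get node ip-10-0-0-1.ec2.internal -o jsonpath='{.spec.providerID}{"\n"}'
kubectl get node ip-10-0-0-1.ec2.internal --show-labels | tr ',' '\n' | grep -E 'topology\.|instance-type'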

@k8s-ci-robot
Contributor

@ConnorJC3: Closing this issue.

In response to this:

/close

The EBS CSI Driver supports running with either metadata from IMDS (either v1 or v2) or Kubernetes itself. If it cannot access IMDS, it will fall back to Kubernetes and use labels on the nodes added by the AWS CCM to determine the instance type, zone, etc.

This issue has become a dumping ground for all sorts of related issues. I am going to close it out as the primary request (running without IMDS) has been possible for a long time now. If you have a separate bug report, feature request, or support request, please open its own issue so it can be properly tracked and addressed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
