Cannot Run with IAM Service Account and no metadata service #474
Comments
How is your container configured?
@leakingtapan

```yaml
containers:
  - name: ebs-plugin
    image: amazon/aws-ebs-csi-driver:latest
    args:
      # - {all,controller,node} # specify the driver mode
      - --endpoint=$(CSI_ENDPOINT)
      - --logtostderr
      - --v=5
    env:
      - name: AWS_REGION
        value: eu-central-1
```

... If I follow the error log, I see the driver trying to access the metadata service when creating a new volume. I assume that when specifying the AWS_REGION variable, the metadata service should not be needed.
We've specified it as such:
We've also tried AWS_DEFAULT_REGION, as the AWS CLI uses that variable name.
I seem to be having the same issue when running it on my OpenShift cluster in AWS. My service account has full admin rights, but still this panics and fails.
@leakingtapan - is this a bug? Or am I missing some env variables somewhere?
It sounds like a bug if AWS_REGION is specified but not honored, but I haven't had enough time to root-cause the issue.
I am also facing this issue.
Hello folks, did anyone find a way out of this? I'm facing it while integrating my OCP cluster with EBS. Thanks in advance.
@leakingtapan -- is it possible to get a rough estimate on when this fix would be available?
Hi everyone. I've been looking into this issue a bit closer, and can confirm that this is not a misconfiguration, and also not related to the AWS_REGION variable. If you follow the stack trace [1], you end up realising that the driver relies heavily on the metadata service: it retrieves the current instance ID, the availability zone for topology-aware dynamic provisioning, and information about the instance family, which is used to derive the maximum number of EBS volumes that can be attached.

The way I see it (keeping in mind I'm not a member of this project), this does not look like a bug that should be fixed, but rather like a requirement of the driver that should be explicitly documented.

For the time being, I'm working around this issue with a slightly more specific iptables rule that leverages the string extension [2] to filter only packets containing "iam/security-credentials" [3] within their first 100 bytes:

```
iptables --insert FORWARD 1 --in-interface eni+ --destination 169.254.169.254/32 -m string --algo bm --to 100 --string 'iam/security-credentials' --jump DROP
```

I would not bet on this to stop someone who REALLY wants to access this URL, but it should help in most cases. Eager to hear if anyone can think of a better solution.

[1] https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go
@mvaldesdeleon - where are you running the iptables command exactly? Is this something you are setting manually on the nodes?
@prashantokochavara Martin is referring to the worker nodes, where metadata endpoint access is restricted (https://docs.aws.amazon.com/de_de/eks/latest/userguide/restrict-ec2-credential-access.html).
According to the AWS docs, the metadata endpoint is a link-local address which can only be reached from the host. Can it actually be reached from inside a container? When I try it on my cluster, I'm able to curl the metadata endpoint from the host itself, but I get a "connection refused" when trying the same command from inside the ebs-csi-controller. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
So I was able to work around this issue by disabling the liveness containers and probes for port 9808, and then enabling hostNetwork for the CSI controller pod.
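A minimal sketch of what that workaround might look like in the controller Deployment, assuming the stock manifest layout (the names, args, and socket path here are illustrative, not copied from the chart):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ebs-csi-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ebs-csi-controller
  template:
    metadata:
      labels:
        app: ebs-csi-controller
    spec:
      hostNetwork: true  # route metadata traffic via the host network
      containers:
        - name: ebs-plugin
          image: amazon/aws-ebs-csi-driver:latest
          args:
            - --endpoint=$(CSI_ENDPOINT)
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
          # livenessProbe and the liveness-probe sidecar on port 9808 removed,
          # as described above
```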
@dmc5179 - can you show me your yaml file and how you are making these changes for the controller pod/deployment?
@prashantokochavara Yes, I'll add them here. I wanted to note that I found the reason hostNetwork is not needed in vanilla Kubernetes: the OpenShift SDN won't route the link-local requests, but vanilla Kubernetes will. I've created some other feature requests to support this driver on OpenShift.
Thanks @dmc5179
Here is a link to my fork of the driver, where I modified the Helm chart to support OpenShift 4.3. Note that I modified the 0.3.0 chart and ran that on my cluster; the git repo is version 0.4.0 of the Helm chart. I don't see any reason why my modifications would not work there. That being said, if you need to use the 0.3.0 version of the chart, take the changes that I made in my git repo and apply them to the deployment.yaml and daemonset.yaml files in the 0.3.0 version of the chart. Let me know if that makes sense. https://github.com/dmc5179/aws-ebs-csi-driver

Another member of our team tried this modification in AWS commercial and it worked.

Note that there is one additional modification in my version of the chart. Because I'm deploying in a private AWS region, I need to add certificates to support the custom API endpoint. I could not find any way to get them into the CSI driver containers (long story). What I ended up doing is a hostPath mount to /etc/pki, which works (see the sketch below). If you do not want that host mount and/or don't need it, just comment it out in the files that I changed in my version of the driver.
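As a rough illustration of that certificate workaround (the volume name and exact layout are assumptions, not copied from the fork), the hostPath mount would look something like this in the pod spec:

```yaml
# excerpt: mount the host's CA trust store into the driver container
containers:
  - name: ebs-plugin
    volumeMounts:
      - name: host-pki
        mountPath: /etc/pki   # driver sees the host's certificates here
        readOnly: true
volumes:
  - name: host-pki
    hostPath:
      path: /etc/pki
      type: Directory
```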
Thanks @dmc5179, appreciate it!
@dmc5179 - I was finally able to get past the metadata issue, thanks! In addition to the changes you had in your fork, I also had to disable the liveness container and probe that you mentioned earlier. I basically commented out those parts in the node and controller .yaml files. Adding them here in case anyone else needs it.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We've run into this issue and can confirm that even with the AWS_REGION environment variable specified, it still fails. This is with v0.10.0.
/assign |
Same problem here. I have done the CSI and CNI installation by the book, following the official guides linked from the web console, and am now observing the same panic reported in this issue. Modifying the base files to include the AWS_REGION environment variable doesn't seem to help either.
I am getting the above error after configuring the Amazon EBS CSI driver to set up a PersistentVolumeClaim on Fargate. I believe it is the same issue. Is there any workaround available?
Actually, ignore me. Fargate does not support EBS: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html

However, Fargate does apparently support static EFS provisioning, so perhaps that will solve your problem: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html
To help people find the root cause of this when upgrading from a previous version using the Helm chart: the problem arises from the location of the IRSA serviceAccount annotation in the chart values having changed from

```yaml
serviceAccount:
  controller:
    annotations:
      eks.amazonaws.com/role-arn: "..."
  node:
    annotations:
      eks.amazonaws.com/role-arn: "..."
```

to

```yaml
controller:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "..."
node:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "..."
```

The observable symptom is:
Just an addition for those who landed here seeking a solution... @wongma7 said in his #474 (comment) that if we were on a recent version of the SDK we would be safe. However, if you are setting IMDSv2 as required, you may be facing the 401 issue reported by @gwvandesteeg in his #474 (comment), because of the hop limit: the HTTP request is not being sent directly from the EC2 instance.
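If you do need pods on the pod network to reach IMDSv2, one option is to raise the hop limit on the instances. A sketch using CloudFormation launch template metadata options (the resource name is illustrative; see the next comment for the security trade-off):

```yaml
# CloudFormation excerpt: require IMDSv2 but allow one extra network hop,
# so PUT responses can reach callers behind the pod network
NodeLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateData:
      MetadataOptions:
        HttpTokens: required        # IMDSv2 only
        HttpPutResponseHopLimit: 2  # default of 1 blocks pod-network callers
```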
I personally feel this issue can be closed. At $dayJob we are running v1.3.0, have fully removed hostNetwork from this workload, and everything is working with IRSA and the default hop limit of 1.
From a least-privilege security standpoint, using the hop count with IMDSv2 would not be recommended. This approach means you've given all workloads on those worker nodes the same IAM permissions as the worker node itself, instead of only granting each workload the permissions it needs using IRSA, and limiting the worker node to only the permissions it itself needs.
@groodt This was working for us in EKS 1.23 but is now failing in EKS 1.24.

Failure in EKS 1.24: the ebs-csi-node/ebs-plugin reports that EC2 metadata is not available, and subsequently fails to retrieve it from the Kubernetes API, which times out.

Success in EKS 1.23: ebs-csi-node/ebs-plugin successfully retrieves instance data from EC2 metadata.

One difference introduced in 1.24 is that [...]. But why is the EC2 metadata not available? I find it interesting that ebs-csi-node does not typically have an IAM role attached, and yet it is retrieving EC2 metadata.

Slack: https://kubernetes.slack.com/archives/C0LRMHZ1T/p1674883684976729

Failure happens here: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/driver/node.go#L82

Explanation of how ebs-csi-node obtains instance data: #821 (comment)
There are many ways of controlling access to the EC2 metadata and AWS APIs, not limited to:
Some references:
@gwvandesteeg Thanks for the references.
IRSA does not need IMDS access; IRSA leverages Service Account Token Volume Projection:

https://aws.github.io/aws-eks-best-practices/security/docs/iam/#iam-roles-for-service-accounts-irsa
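For context, IRSA works by projecting a service account token into the pod, which the AWS SDK exchanges for role credentials via STS, with no call to IMDS. A rough sketch of what the EKS pod identity webhook injects (values are illustrative):

```yaml
# volume injected into annotated pods by the EKS pod identity webhook
volumes:
  - name: aws-iam-token
    projected:
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com  # token scoped to AWS STS
            expirationSeconds: 86400
            path: token
# the webhook also sets the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE
# environment variables so the SDK picks the token up automatically
```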
Close? |
This still appears to be an issue for Windows nodes. In a mixed EKS cluster it works fine on Linux, but the Windows instances are in a crash-loop back-off, reporting the following in the ebs-plugin container:

This is in a setup with IMDSv2 required and the hop limit set to 1 - i.e. the instance metadata service is not accessible from pods.
I am seeing the same issue as @cpaton following an upgrade from EKS 1.23 to 1.27. |
I have the same issue in [...]
@BhautikChudasama I'm having the exact same issue after trying to install the driver on [...]
As @gwvandesteeg mentioned, I've disabled IMDSv1 using the hop limit, so only IMDSv2 is enabled.

Can I disable that somehow?
/close

The EBS CSI Driver supports running with metadata from either IMDS (v1 or v2) or Kubernetes itself. If it cannot access IMDS, it will fall back to Kubernetes and use the labels added to the nodes by the AWS CCM to determine the instance type, zone, etc.

This issue has become a dumping ground for all sorts of related issues. I am going to close it out, as the primary request (running without IMDS) has been possible for a long time now. If you have a separate bug report, feature request, or support request, please open its own issue so it can be properly tracked and addressed.
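For reference, the Kubernetes fallback reads well-known node labels such as these (an illustrative Node excerpt; the values depend on your cluster):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.eu-central-1.compute.internal
  labels:
    # set by the AWS cloud controller manager; the driver can use these
    # instead of querying IMDS
    node.kubernetes.io/instance-type: m5.large
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1a
```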
@ConnorJC3: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/kind bug
What happened?
The ebs-plugin container on the ebs-csi-controller crashes repeatedly while talking to the metadata service:
What you expected to happen?
When specifying the AWS_REGION variable and an IAM role service account, the ebs-csi-driver should not need to access the metadata service, and should run on its own.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?:
As far as we can tell, everything is set up correctly with the role + service account, but the code explicitly tries to instantiate the metadata service, which is firewalled off. Can this be made optional if the region is set and credentials are available via the service account?
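For reference, the role + service account setup mentioned above looks roughly like this (the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa
  namespace: kube-system
  annotations:
    # IRSA: the pod gets credentials for this role via a projected web
    # identity token, without ever calling the EC2 metadata service
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/ebs-csi-driver-role
```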
Environment
EKS v1.14
ebs-csi-driver v0.5.0, v0.6.0-dirty