Example IAM policy is insufficient #935
Comments
Is the error message accurate, though? "InvalidVolume.NotFound: The volume 'vol-04da06270c9fd721e' does not exist." Why is EC2 returning this error? Does this volume actually exist or not? |
Somewhat related to this issue, I think we do need to provide more examples with more documentation. The current example https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/example-iam-policy.json is a bit too restrictive because it breaks in the migration scenario. It also doesn't match the policy we use for e2e testing, https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/hack/kops-patch.yaml#L6, which looks a lot like the 0.6.0 one. However, I am still a bit baffled how even the 0.6.0 policy was not sufficient in this specific case. |
I've tried again with the current policy; the results are the same. Sorry, I missed a crucial detail:
Despite this, VolumeSnapshot is in
The volume is missing from the volume list (I'm an AWS noob, but I think it should appear here, next to the volumes underlying the other PVs: https://eu-west-1.console.aws.amazon.com/ec2/v2/home?region=eu-west-1#Volumes:sort=desc:createTime). Is there another way to check which permissions are required for the task?
|
There is this new feature called IAM Access Analyzer. I have never tried it, but in theory, if you analyze the role that the policy is attached to, it will spit out exactly what you want, and we can do a diff between it and the example. As for the events, in my experience "the object has been modified; please apply your changes to the latest version and try again" is usually intermittent; it indicates that the snapshotter's internal cache is somehow out of date. But like you said, it doesn't really make sense given that changing IAM permissions fixes the issue, so I think it's a red herring. |
I've used it. For reference, the IAM policy generated by the tool:
|
Can you share logs or excerpts from the driver around the time of the error?
The command
Also, does the StorageClass have WaitForFirstConsumer like here?
One explanation for the original volume NotFound error is that the Pod trying to use the PVC got scheduled to a different zone than the one vol-04da06270c9fd721e was in. |
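For reference, a minimal StorageClass sketch with delayed binding, so the volume is only provisioned in the zone where the consuming Pod lands (the name and the type parameter here are illustrative, not taken from this issue):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc                             # illustrative name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer    # delay provisioning until a Pod is scheduled
parameters:
  type: gp3                                # assumed volume type, adjust as needed

With Immediate binding the volume can be created in a zone the Pod never gets scheduled to, which matches the cross-zone explanation above.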
Hi! Logs from
Excerpt from logs of
Yes, I've used it. I thought that the lack of logs from |
I had the same issue, and it turned out it was related to KMS permissions: the driver was using the default KMS key, but the role had no access to it, so I added the missing permissions to the example policy. Perhaps setting the KMS key explicitly in the StorageClass helps. Note: I'm using Terraform to replace the variables.

Helm values:

enableVolumeSnapshot: true
serviceAccount:
  controller:
    create: true
    name: "ebs-csi-controller-sa"
    annotations:
      eks.amazonaws.com/role-arn: "${serviceAccountRoleArn}"
storageClasses:
  - name: ebs-sc
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      encrypted: "true"
      kmsKeyId: "${kmsKeyId}"

Policy:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateSnapshot",
"ec2:AttachVolume",
"ec2:DetachVolume",
"ec2:ModifyVolume",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeSnapshots",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVolumesModifications"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags"
],
"Resource": [
"arn:aws:ec2:*:*:volume/*",
"arn:aws:ec2:*:*:snapshot/*"
],
"Condition": {
"StringEquals": {
"ec2:CreateAction": [
"CreateVolume",
"CreateSnapshot"
]
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteTags"
],
"Resource": [
"arn:aws:ec2:*:*:volume/*",
"arn:aws:ec2:*:*:snapshot/*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateVolume"
],
"Resource": "*",
"Condition": {
"StringLike": {
"aws:RequestTag/ebs.csi.aws.com/cluster": "true"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateVolume"
],
"Resource": "*",
"Condition": {
"StringLike": {
"aws:RequestTag/CSIVolumeName": "*"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteVolume"
],
"Resource": "*",
"Condition": {
"StringLike": {
"ec2:ResourceTag/CSIVolumeName": "*"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteVolume"
],
"Resource": "*",
"Condition": {
"StringLike": {
"ec2:ResourceTag/ebs.csi.aws.com/cluster": "true"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteSnapshot"
],
"Resource": "*",
"Condition": {
"StringLike": {
"ec2:ResourceTag/CSIVolumeSnapshotName": "*"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteSnapshot"
],
"Resource": "*",
"Condition": {
"StringLike": {
"ec2:ResourceTag/ebs.csi.aws.com/cluster": "true"
}
}
},
{
"Effect": "Allow",
"Action": [
"kms:CreateGrant",
"kms:ListGrants",
"kms:RevokeGrant"
],
"Resource": ["${kmsKeyId}"],
"Condition": {
"Bool": {
"kms:GrantIsForAWSResource": "true"
}
}
},
{
"Effect": "Allow",
"Action": [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
],
"Resource": ["${kmsKeyId}"]
}
]
}
|
Thanks a lot, adding the KMS permissions solved the issue for me! |
I analysed the events in CloudTrail and noticed it was in a loop of CreateVolume and DeleteVolume, and the CreateVolume events showed the volume being created with the default KMS key. I created another KMS key, gave access to the role assumed by the service account, and voilà, it worked. IMO this is still a bug; I just don't know whether it's in this project or in AWS, as I would expect CloudTrail to have logged the failure to use the KMS key. Meanwhile, we need to figure out the best way of improving the documentation. |
Same issue here, working out of the box with |
Managed to get it to work here with KMS key handling in Terraform: https://github.com/particuleio/terraform-kubernetes-addons/blob/main/modules/aws/aws-ebs-csi-driver.tf
Can anyone point to what key actually gets selected by the provisioner when only encryption is enabled and no explicit key is set? |
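As far as I know, if only encryption is requested the provisioner calls CreateVolume with encryption enabled and no key ID, and EC2 then falls back to the account's default EBS encryption key (the AWS-managed aws/ebs key, unless a different default has been configured). A minimal sketch of pinning the key explicitly in the StorageClass instead, with a placeholder key ARN:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-encrypted                   # illustrative name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  encrypted: "true"
  # Placeholder ARN; whichever key is referenced here must also be covered by the kms statements in the driver role's policy.
  kmsKeyId: "arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

Whatever key ends up being used, it has to be accessible to the IAM role the controller assumes, otherwise you get the create/delete loop described above.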
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/kind bug
What happened?
I was following this guide, and when my pod attempted to use a restored PVC (with a snapshot data source), I got the following error: "InvalidVolume.NotFound: The volume 'vol-04da06270c9fd721e' does not exist."
What you expected to happen?
I expected the PVC to be created and bound successfully.
How to reproduce it (as minimally and precisely as possible)?
Just follow the AWS guide: https://aws.amazon.com/blogs/containers/using-ebs-snapshots-for-persistent-storage-with-your-eks-cluster/
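The failing step is the restore: roughly, a PVC that uses an existing VolumeSnapshot as its data source, along the lines of the sketch below (names and size are illustrative, not the exact ones from the guide):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-restored-claim                # illustrative name
spec:
  storageClassName: ebs-sc
  dataSource:
    name: ebs-volume-snapshot             # an existing VolumeSnapshot in the same namespace
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi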
Anything else we need to know?:
I suspect this is an insufficient-permissions problem. I've used this IAM policy:
After I added the entire universe to my permission list (as below), I was able to create and restore snapshots successfully.
Environment
Kubernetes version (use kubectl version):
Driver version: 1.0.0 (Helm release)