Misleading error when attaching a persistent disk from a different project #1314
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
Hi there, just got out of one nice rabbit hole and thought I wouldn't keep the journey to myself :)
TL;DR
Now I know that attaching a disk from one project as a PersistentVolume in a GKE cluster in another project is not allowed, but the driver gave me a real hard time on my way to realizing that - which is what I want to improve for any other unfortunate dead-end journeymen.

The rabbit hole
Trust me, it was a well-meant design decision. We decided to keep some persistent disks in their own project (let's call it DISK_PROJECT), out of the scope of an ephemeral project hosting a GKE cluster (GKE_PROJECT). So far I hadn't tripped over any warning that it's not really supposed to work; everything out there only says that one can't attach disks from different zones, but that was no problem. We set it up per the instructions in the GKE docs. All seemed fine until we wanted to bind the PV/PVC to a workload, when things started sliding in the wrong direction without us knowing it yet (the logs are from the workload's namespace events). Treading that path for the first time, it wasn't particularly easy to figure out that the permissions should be added to the "hidden" engine robot account, but we cracked that one.
Curious and cautious about which permissions would all be required, we naturally hit a couple of new ones right after we added the previous ones, notably:
Something's already off in this one - it mentions the GKE_NODE in the DISK_PROJECT, while the node is actually in its own GKE_PROJECT. But we didn't notice, so out of a little desperation we added the DISK_PROJECT's compute.admin role to the GKE_PROJECT's robot account, at which point the "403 Forbidden" errors disappeared, but the strangest (though, in hindsight, more obvious) thing happened. The error became a "404 Not Found", because the GCP API couldn't find the node - it didn't exist in the DISK_PROJECT, only in the GKE_PROJECT:

That was a dead end, and at this moment I grew suspicious that the driver (v1.8.7 in our case, but master seems to have it too) "incorrectly" derives the node's project from the disk's. Then I actually found the code that extracts the project from the disk's volumeHandle and uses it in the call to AttachDisk, which eventually leads to the GCP API call that produces the error. Ha!

Before filing a bug for this "obviously" incorrect assumption, I tried to bypass the driver to see if I'd have more luck calling the API's attachDisk directly (redacted):

Finally it gave out the error I was missing the whole time:
Whoops, OK then!
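To illustrate the root cause: a PD CSI volumeHandle has the shape `projects/<project>/zones/<zone>/disks/<name>`, so any parse of the handle can only ever yield the *disk's* project - the node's project appears nowhere in it. A minimal sketch (the function name is mine, not the driver's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// projectFromVolumeHandle extracts the project from a volume handle of the
// form "projects/<project>/zones/<zone>/disks/<name>". Note that this is
// necessarily the DISK's project; if it is then reused to address the node,
// the node lookup happens in the wrong project.
func projectFromVolumeHandle(handle string) (string, error) {
	parts := strings.Split(handle, "/")
	if len(parts) != 6 || parts[0] != "projects" || parts[2] != "zones" || parts[4] != "disks" {
		return "", fmt.Errorf("unexpected volume handle: %q", handle)
	}
	return parts[1], nil
}

func main() {
	p, _ := projectFromVolumeHandle("projects/DISK_PROJECT/zones/us-central1-a/disks/my-disk")
	fmt.Println(p) // prints "DISK_PROJECT" - also used for the node, hence the 404
}
```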
Next steps
Depending on how strongly the driver wants to enforce that rule itself rather than delegate it to the API, there are likely two scenarios for a fix.
Given the driver already derives the project from the disk's handle, the assumption is already baked in. However, nothing warns anyone before the GCP API calls start (which leads one down the - hopefully very unnecessary - permission path with a dead end). An obvious and easy fix would be to add the check to validateControllerPublishVolumeRequest.
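As a rough sketch of what such an early check could look like (a standalone function with names of my own choosing, not the driver's actual validation code), assuming both projects are known at validation time:

```go
package main

import "fmt"

// validateCrossProject is a hypothetical fail-fast check: if the project
// parsed from the disk's volumeHandle differs from the project the node
// lives in, reject the request with a clear message instead of proceeding
// to API calls that are doomed to fail with misleading 403/404 errors.
func validateCrossProject(diskProject, nodeProject string) error {
	if diskProject != nodeProject {
		return fmt.Errorf(
			"disk is in project %q but node is in project %q: cross-project attach is not supported",
			diskProject, nodeProject)
	}
	return nil
}

func main() {
	fmt.Println(validateCrossProject("DISK_PROJECT", "GKE_PROJECT")) // clear error
	fmt.Println(validateCrossProject("GKE_PROJECT", "GKE_PROJECT")) // <nil>
}
```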
However nasty, cross-project attachment could still be treated as a legitimate way to let things flow. In that case, the node's project should enter AttachDisk as yet another argument (instanceProject or whatever) so it would get passed correctly to the API and the user would get the final, authoritative error. Either way, right now the driver hides the last, most important API error behind its silent assumption, which is hopefully worth tackling.
Thanks for reading this novel up to here 🎉 😄