Registering volume gives error "Unknown volume attachment mode" #10626
Seeing the same here. I'm pretty sure it was working in 1.1.0-rc1, since I had previously amended my volume spec to include the newly required
Hi @henriots! Can you grab the allocation logs for the plugin? That might help diagnose the problem.
@optiz0r can you provide the error message you're seeing (+ alloc logs for the plugin, if available)
Volume registration command failing:
CSI controller alloc logs (controller had just been restarted prior to running the above command):
No relevant logs in the CSI node allocs. Nomad leader output when the volume register command is run:
Volume spec:
CSI alloc logs are just those, so no help:
Nomad logs:
Same here; I think this happens in Nomad already, without talking to the CSI plugin. It can be reproduced with https://gitlab.com/rocketduck/csi-plugin-nfs/-/tree/main/nomad (the example.volume file probably needs adjusting to the 1.1 capability format).
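For anyone updating an older spec, the 1.1-style format expresses the modes as capability blocks instead of top-level fields. A minimal sketch (the id, name, and plugin_id values are placeholders, not taken from the plugin above):

```hcl
id        = "example-volume"
name      = "example-volume"
type      = "csi"
plugin_id = "nfs"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```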
I wonder if this is at fault: Lines 46 to 47 in 771aad2
The code still assumes that an attachment_mode exists, while it should already be a list of capabilities there, no?
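For readers following along, here's a rough Go sketch of the mismatch being described; the type and field names below are illustrative placeholders, not Nomad's actual structs:

```go
package sketch

import "fmt"

// Illustrative types only; not Nomad's real data model.
type Capability struct {
	AccessMode     string
	AttachmentMode string
}

type Volume struct {
	AttachmentMode        string        // legacy pre-1.1 single-value field
	RequestedCapabilities []*Capability // 1.1-style capability list
}

// validateLegacy mirrors the suspected bug: it only inspects the single
// legacy field, which stays empty when the spec uses capability blocks,
// so validation fails with an empty attachment mode.
func validateLegacy(vol *Volume) error {
	if vol.AttachmentMode == "" {
		return fmt.Errorf("unknown volume attachment mode: %q", vol.AttachmentMode)
	}
	return nil
}

// validateCapabilities is what the comment suggests instead: walk the
// capability list rather than assuming a single attachment mode exists.
func validateCapabilities(vol *Volume) error {
	if len(vol.RequestedCapabilities) == 0 {
		return fmt.Errorf("volume has no requested capabilities")
	}
	for _, c := range vol.RequestedCapabilities {
		if c.AccessMode == "" || c.AttachmentMode == "" {
			return fmt.Errorf("capability is missing access or attachment mode")
		}
	}
	return nil
}
```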
Seeing similar behaviour with the AWS EBS CSI plugin. The CSI plugin logs are clean, but the Nomad client that takes in the register request shows those 500s. On subsequent tries (read: keyboard arrow up and Enter), the error is slightly different in wording. Some more context on my case: https://discuss.hashicorp.com/t/unable-to-get-a-csi-volume-registered/18805/3
...which would explain why the plugin logs are clean...
Here's another twist in the plot. Both the controller and node plugins run on the same Nomad worker.
Controllers Healthy = 1
Controllers Healthy = 0
Container Storage Interface
Might be unrelated, but strange still...
@tgross I have prepared an easy reproducer for you. Deploy this job:
This job runs a monolith version of my plugin in test mode; it has no dependency on an external NFS server or anything. Then try registering this:
This is what is sent via the API to Nomad:
Funny story: after patching my Nomad binary locally like this:
I was able to properly register the volume. Running status yielded:
but when deploying a job against it, it still complained, and now all of a sudden Access Mode & Attachment Mode show nothing:
So there are at least two bugs: register seems to read in "old" pre-1.1 fields that are no longer populated by the CLI tooling, when it should read the capabilities. After that, something weird in Nomad manages to change & lose Access Mode & Attachment Mode again.
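For anyone checking the same thing, the fields can be inspected after registration with the status subcommand (the volume ID here is a placeholder):

```sh
# Shows the volume's access mode, attachment mode, and current allocations
nomad volume status example-volume
```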
@apollo13 Can you reboot your node and check if the plugin comes back healthy? Because that happens in my case with the AWS EBS plugin... Might be an (unrelated) third bug.
Ok, that one is wrong. This was caused by me not adding access_mode etc. to the
EDIT: The main problem here (first and foremost) seems to be that
I'm having the same exact issue after upgrading to v1.1.0.
FWIW, I have the following workaround: instead of registering the volume I just ran "nomad volume create" -- for any proper CSI driver this will work because the
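In practice, the workaround is just pointing the create subcommand at the same spec file that register would have taken (the file name is a placeholder):

```sh
# Workaround: let Nomad drive volume creation via the plugin instead of
# registering an externally created volume
nomad volume create example.volume.hcl
```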
Thanks for the repro plugin, @apollo13. I'm going to take this and (along with the patches merged this morning) see what I can come up with.
Hi @khaledabdelaziz, just FYI: Nomad cannot be downgraded.
Ok, so there's definitely a gap in the documentation around how this is supposed to work (and especially how it changed between Nomad 1.0 and Nomad 1.1.0). The original design for Nomad's CSI implementation, for better or worse, did not intend to implement the volume creation workflow. So when we decided otherwise, we ran into a contradiction between how the access/attach modes were being used and what the
So in Nomad 1.1.0 the access/attach mode is removed from the volume when the volume claim is released (ref).
Using the hostpath demo, we can see a volume created via
But if we try to register a volume we still get the "unknown attachment mode" error:
So I'm fairly certain that your patch is on the right track, @apollo13; there's just an unfortunately long chain of different RPCs that it needs to get threaded through. I'm getting towards the end of my day here, but I'll pick this back up tomorrow morning. Shouldn't be too terrible for me to fix.
@tgross Thanks for your input. I was able to work around the downgrade with the following steps:
That brought the cluster back with all ACLs and other settings.
And since it failed scheduling in the
@khaledabdelaziz said:
I should properly say it's unsupported to downgrade, from any version (or from ENT to OSS). We don't have any guarantee of forward compatibility in the state store, and it's entirely possible for that snapshot restore to fail as a result, leaving the server in a crash loop.
@apollo13 said:
Correct!
I got pulled off to deal with #10694 for the last week or so, but I'm looking at this one again. Running a Nomad built with #10651, I was able to reproduce the problem fairly easily. I spun up the hostpath plugin demo in https://github.com/hashicorp/nomad/tree/main/demo/csi/hostpath. This results in the expected volume claims.
Successful nomad volume create:
But now let's try to register a volume. First create it in the storage provider:

```sh
endpoint=/var/nomad/client/csi/monolith/hostpath-plugin0/csi.sock
uuid=$(sudo csc --endpoint "$endpoint" controller \
  create-volume 'test-volume[2]' --cap 1,2,ext4 \
  | grep -o '".*"' | tr -d '"')
```

New volume spec:

```hcl
id = "VOLUME_NAME"
name = "VOLUME_NAME"
type = "csi"
plugin_id = "hostpath-plugin0"
external_id = "VOLUME_UUID"
capacity_min = "1MB"
capacity_max = "1GB"

capability {
  access_mode = "single-node-reader-only"
  attachment_mode = "file-system"
}

capability {
  access_mode = "single-node-writer"
  attachment_mode = "file-system"
}

secrets {
  somesecret = "xyzzy"
}

mount_options {
  mount_flags = ["ro"]
}
```

And when we register that new volume (the register call is sketched just below), we get the error reported above:
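The registration step itself is just the standard CLI call against that spec file (the file name is a placeholder):

```sh
nomad volume register volume.hcl
```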
It looks like @apollo13's patch in #10626 (comment) will "fix" the problem, but it won't give semantically correct results. The
Should be a smallish fix, so I'll work on that next.
Yes, my patch was just a band-aid -- my volumes contained only a single capability and the old code only allows for one. I needed a quick way to get my volumes working again, preferably without patching the server :D Thanks for working on this again!
I've opened this PR #10703 and I imagine we'll be able to get that into the upcoming Nomad 1.1.1 patch. Thanks for your patience on this one, folks.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Hello!
After upgrading to v1.1.0, volume registration gives an error:
Error registering volume: Unexpected response code: 500 (rpc error: controller validate volume: Unknown volume attachment mode: )
It worked with v.1.4.0 with the following set in the volume configuration:
access_mode = "single-node-writer"
attachment_mode = "file-system"
Nomad version
Nomad v1.1.0 (2678c36)
Operating system and Environment details
Issue
Reproduction steps
Expected Result
Actual Result
Job file (if appropriate)