Poor handling of "nomad volume status -verbose" for CSI plugins without LIST_VOLUMES capability #15040
Labels
stage/accepted
Confirmed, and intend to work on. No timeline commitment though.
theme/bad-ux
theme/storage
Nomad version
Nomad v1.3.3 (428b2cd)
Tested with this version but v1.4.1 code has the same issues (and is linked below).
Operating system and Environment details
Red Hat Enterprise Linux release 8.4 (Ootpa)
Issue
The BeeGFS CSI Driver does not support the CSI LIST_VOLUMES capability. When
nomad volume status -verbose
is run, a raw HTTP error code and the message "unimplemented for this plugin" are shown.
Reproduction steps
nomad volume status -verbose
Expected Result
Similar output to when
nomad volume status
is run, perhaps with an additional message indicating specifically that LIST_VOLUMES is not implemented or that listing external volumes is not supported by this plugin.
Actual Result
Similar output to when
nomad volume status
is run, followed by a raw HTTP error code and the message "unimplemented for this plugin".
Nomad command output
Code deep dive
This only occurs because we execute "nomad volume status" with BOTH "-verbose" AND no specified ID.
No specified ID invokes "list mode":
https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L20-L23
No "-verbose" returns early:
https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L104-L106
Both conditions together ultimately result in the invocation of the ListVolumes RPC (and the documented failure message):
https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L128-L136
A Nomad server returns a 500 error code and the message "unimplemented for this plugin" when it recognizes that a driver doesn't advertise the LIST_VOLUMES capability.
https://github.com/hashicorp/nomad/blob/v1.4.1/nomad/csi_endpoint.go#L1189-L1191
We are supposed to see something like this for drivers that support the LIST_VOLUMES capability. Nomad already quietly moves on without printing the external list portion if it thinks nothing went wrong and doesn't have a list of volumes.
https://github.com/hashicorp/nomad/blob/v1.4.1/command/volume_status_csi.go#L137-L140
It seems like we just need a way for Nomad to understand to move on quietly in this circumstance as well. Maybe the Nomad server should just return an empty list instead of a 500 error when the LIST_VOLUMES capability isn't supported? Maybe the Nomad command should know to look for "unimplemented for this plugin" in the error output and move on quietly (or optionally print something more informative)?