csi: unpublish workflow ID mismatches #7626

tgross · 2020-04-04T15:07:27Z

Fixes for #7628

The CSI plugins use the external volume ID for all operations, but the Client CSI RPCs use the Nomad volume ID (human-friendly) for the mount paths. Pass the external ID as an arg in the RPC call so that the unpublish workflows have it without calling back to the server to look it up.
The controller CSI plugins need the CSI node ID (that is, the storage provider's view of the node ID, such as the EC2 instance ID), not the Nomad node ID, to determine how to detach the external volume.

tgross · 2020-04-04T19:19:29Z

After the release I want to refactor some of these names to disambiguate them and prevent this sort of error in the future, but for now this will do the job.

langmartin · 2020-04-04T20:45:54Z

nomad/core_sched.go

+		}
+		targetCSIInfo, ok := targetNode.CSINodePlugins[args.plug.ID]
+		if !ok {
+			return args.nodeClaims, fmt.Errorf("Failed to find NodeInfo for node: %s", targetNode.ID)


It seems like this failure could be temporary. Will we retry this if the node plugin is in the process of coming back up? Also, is the node CSI ID constant when the plugin comes back up, or is it generated by the node plugin every time it starts?

The GC job will get requeued (which works fine for us with #7632 where the Job.Deregister will use the same GC path).

And as far as I can tell, the CSI node ID is implementation-specific. But in practice it has to be a fixed identifier of the underlying host. So for example, in the EBS case it's the EC2 instance ID, because if it weren't, the controller wouldn't be able to attach the external volume to it.

tgross · 2020-04-05T19:29:01Z

#7632 includes all these changes, so we can either merge that in to pick these all up or merge this separately and rebase it from master once that's done.

tgross · 2020-04-06T13:43:11Z

Closing as these changes were rolled into #7632

github-actions · 2023-01-10T02:17:36Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

csi: use internal ID for node unpublish

6a9cc4f

tgross added the theme/storage label Apr 4, 2020

tgross added 2 commits April 4, 2020 11:58

ensure we pass controller CSI node ID, not Nomad node ID

8c2b93a

node unpublish workflows require both external and internal IDs

4484a5f

tgross changed the title ~~csi: use internal ID for node unpublish~~ csi: unpublish workflow ID mismatches Apr 4, 2020

tgross force-pushed the csi_use_internal_id_for_node_unpublish branch 3 times, most recently from e8a8fd1 to 4484a5f Compare April 4, 2020 18:49

tgross mentioned this pull request Apr 4, 2020

csi: controller plugin timeouts #7629

Closed

tgross marked this pull request as ready for review April 4, 2020 19:18

tgross requested a review from langmartin April 4, 2020 19:18

langmartin reviewed Apr 4, 2020


tgross closed this Apr 6, 2020

tgross deleted the csi_use_internal_id_for_node_unpublish branch April 7, 2020 12:21

github-actions bot locked as resolved and limited conversation to collaborators Jan 10, 2023
