csi: retry controller client RPCs on next controller #8561

tgross · 2020-07-29T20:17:28Z

Part of a fix for #8080, #8100, #8232 as summarized in #8232 (comment). Will help mitigate #8285, #8145, #8057. (1 of 4 PRs)

The documentation encourages operators to run multiple controller plugin
instances for HA, but the client RPCs don't take advantage of this: they don't
retry when an RPC fails because the plugin is unavailable (for example, because
the node has drained or the alloc has failed and we haven't received an updated
fingerprint yet).

This changeset tries all known controllers on ready nodes before giving up, and
adds tests that exercise the client RPC routing and retries.

langmartin

👍

langmartin · 2020-08-06T15:37:50Z

nomad/client_csi_endpoint.go

+		return nil, fmt.Errorf("failed to find clients running controller plugin %q", pluginID)
+	}
+
+	rand.Shuffle(len(clientIDs), func(i, j int) {


github-actions · 2022-12-27T02:14:11Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tgross force-pushed the b-csi-controller-retries branch from 9e06549 to 1c0e229 Compare July 30, 2020 12:41

tgross force-pushed the b-csi-controller-retries branch from 1c0e229 to d2b3ab3 Compare July 30, 2020 20:10

tgross marked this pull request as ready for review July 31, 2020 17:18

tgross requested a review from langmartin July 31, 2020 17:18

langmartin approved these changes Aug 6, 2020

tgross merged commit 07ff0b9 into master Aug 6, 2020

tgross deleted the b-csi-controller-retries branch August 6, 2020 17:24

tgross added this to the 0.12.2 milestone Aug 6, 2020

github-actions bot locked as resolved and limited conversation to collaborators Dec 27, 2022
