support multiple replicas for linstor-controller #73

Merged
merged 1 commit into piraeusdatastore:master from controller-replicas
Sep 30, 2020

Conversation

WanzenBug
Member

@WanzenBug WanzenBug commented Aug 4, 2020

Running multiple replicas requires special support from the linstor
controller. The controller container will start a leader election
process when it detects the presence of the K8S_AWAIT_ELECTION_*
variables.
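
As a rough sketch of what opting in to the election could look like on the container spec (the PR only names the K8S_AWAIT_ELECTION_* prefix, so the concrete variable names and the downward-API identity below are assumptions, not the operator's actual values):

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// electionEnv returns hypothetical K8S_AWAIT_ELECTION_* variables that would
// tell the controller container to run leader election before starting the
// linstor-controller process. Names and values are illustrative only.
func electionEnv(namespace string) []corev1.EnvVar {
	return []corev1.EnvVar{
		{Name: "K8S_AWAIT_ELECTION_ENABLED", Value: "1"},
		{Name: "K8S_AWAIT_ELECTION_NAME", Value: "linstor-controller"},
		{Name: "K8S_AWAIT_ELECTION_LOCK_NAME", Value: "linstor-controller"},
		{Name: "K8S_AWAIT_ELECTION_LOCK_NAMESPACE", Value: namespace},
		// Each replica identifies itself by its pod name via the downward API.
		{
			Name: "K8S_AWAIT_ELECTION_IDENTITY",
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.name"},
			},
		},
	}
}
```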

The election process determines which pod is allowed to start the
linstor-controller process. Only this pod will be marked as ready
and will receive traffic from the k8s service object.

Should the leader crash or the node it is running on go offline,
a new leader will be elected and allowed to start the controller process.
Note: in case the full node goes offline, the old pod will still be marked
as ready. By using ClusterIP: "" on our service, we ensure we create an actual
proxy (which automatically chooses a responding pod) instead of each client
having to deal with multiple DNS responses pointing at pods which may or may not respond.
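
A minimal sketch of the service shape described above, using the usual corev1 types; the service name, selector and port are illustrative, not necessarily what the operator creates:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// controllerService leaves ClusterIP empty ("" instead of "None"), so the
// service gets a virtual IP and kube-proxy forwards traffic to ready
// endpoints only -- i.e. the elected leader. A headless service ("None")
// would instead hand every pod IP to clients via DNS.
func controllerService(namespace string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "linstor-controller", Namespace: namespace},
		Spec: corev1.ServiceSpec{
			ClusterIP: "", // empty, not "None": a real ClusterIP is allocated
			Selector:  map[string]string{"app": "linstor-controller"},
			Ports: []corev1.ServicePort{{
				Name:       "linstor-api",
				Port:       3370,
				TargetPort: intstr.FromInt(3370),
			}},
		},
	}
}
```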

See also:

@WanzenBug WanzenBug force-pushed the controller-replicas branch 2 times, most recently from 2a9141e to 4c8a83a on August 4, 2020 09:59
@WanzenBug WanzenBug marked this pull request as ready for review August 4, 2020 11:14
@JoelColledge
Collaborator

We can get this merged now that we are no longer in an RC phase. Rebase needed, and the Helm keys need to be added to the new docs.

@WanzenBug WanzenBug force-pushed the controller-replicas branch from 4c8a83a to 627cb7c on August 7, 2020 10:01
@WanzenBug WanzenBug requested a review from JoelColledge August 7, 2020 10:03
@WanzenBug WanzenBug force-pushed the controller-replicas branch from 627cb7c to 707a117 on August 7, 2020 10:08
JoelColledge previously approved these changes Aug 7, 2020
Collaborator

@JoelColledge JoelColledge left a comment

I'm a little unsure about dumping a shell script in the middle of a long Go function, but separating it out would be messy because it is a format string...
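
For readers unfamiliar with the pattern being discussed, here is a minimal, made-up example of a shell script embedded as a Go format string; the actual script and the values interpolated into it are the operator's own, not what is shown here:

```go
package sketch

import "fmt"

// readinessScript is a shell script kept inline as a Go format string so the
// surrounding function can substitute values into it before handing it to a
// probe or command. The script body is purely illustrative.
const readinessScript = `
set -e
# %[1]s is replaced with the controller endpoint at template time.
curl --fail --silent --max-time 2 "%[1]s/v1/controller/version" > /dev/null
`

// readinessCommand renders the script for use as an exec command.
func readinessCommand(endpoint string) []string {
	return []string{"sh", "-c", fmt.Sprintf(readinessScript, endpoint)}
}
```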

@WanzenBug
Member Author

I need to check if environment variables are passed to the ExecProbe. Then we could maybe find a cleaner solution

@WanzenBug WanzenBug force-pushed the controller-replicas branch 2 times, most recently from 69802cb to 76b0b9f on August 7, 2020 12:30
@WanzenBug
Member Author

So environment variables are available in the ExecProbe.
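
That opens up a cleaner shape: the probe script can read its configuration from the container environment at probe time instead of having values baked in via a format string. A sketch, assuming client-go releases current at the time of this PR (embedded Handler field) and a hypothetical CONTROLLER_ENDPOINT variable, neither of which is taken from this PR:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// readinessProbe builds an exec probe whose script resolves
// $CONTROLLER_ENDPOINT (a hypothetical variable) from the container's
// environment when the kubelet runs it, so no fmt.Sprintf is needed.
func readinessProbe() *corev1.Probe {
	return &corev1.Probe{
		// In 2020-era client-go the handler is the embedded Handler field;
		// newer releases call it ProbeHandler.
		Handler: corev1.Handler{
			Exec: &corev1.ExecAction{
				Command: []string{"sh", "-c",
					`curl --fail --silent --max-time 2 "$CONTROLLER_ENDPOINT/v1/controller/version"`},
			},
		},
		PeriodSeconds:    10,
		FailureThreshold: 3,
	}
}
```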

During testing I noticed a few additional issues:

  • The controller could not handle multiple running controllers at once. I just removed that check, as in this new version multiple controllers are expected.
  • Rolling upgrades are not really working. Probably because the new container won't be Ready until it is elected, but the old leader is still running, so the upgrade process basically gets stuck.

The second issue can be traced to the way we (ab-)use the Ready mechanism. Not sure how we should deal with this yet (the deadlock is sketched below).
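
To make the rollout deadlock concrete, assuming the controller pods are managed by a Deployment (which this PR does not state explicitly), here is a sketch contrasting the default strategy with one possible way out; the Recreate alternative is only an illustration, not the solution adopted here:

```go
package sketch

import appsv1 "k8s.io/api/apps/v1"

// With RollingUpdate, the old (elected) pod is only terminated once a
// replacement reports Ready; but a replacement only becomes Ready after
// winning the election, which it cannot do while the old leader holds the
// lock -- so the rollout waits forever. Recreate stops the old pods first,
// freeing the lock, at the cost of a short outage.
func upgradeStrategies() (stuck, alternative appsv1.DeploymentStrategy) {
	stuck = appsv1.DeploymentStrategy{Type: appsv1.RollingUpdateDeploymentStrategyType}
	alternative = appsv1.DeploymentStrategy{Type: appsv1.RecreateDeploymentStrategyType}
	return stuck, alternative
}
```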

JoelColledge previously approved these changes Aug 20, 2020
Collaborator

@JoelColledge JoelColledge left a comment

Looks good. Just waiting for internal tests to succeed.

@WanzenBug WanzenBug marked this pull request as draft August 20, 2020 12:56
@WanzenBug
Member Author

I want to polish our documentation a bit

@WanzenBug WanzenBug force-pushed the controller-replicas branch from 066a083 to 092803b on August 20, 2020 12:58
@WanzenBug WanzenBug marked this pull request as ready for review August 20, 2020 13:16
@WanzenBug WanzenBug force-pushed the controller-replicas branch from 092803b to 2df767c on August 20, 2020 15:31
@WanzenBug WanzenBug force-pushed the controller-replicas branch from 2df767c to 67600ca on August 31, 2020 08:01
Running multiple replicas requires special support from the linstor
controller. The controller container will start a leader election
process when it detects the presence of the K8S_AWAIT_ELECTION_*
variables.

The election process determines which pod is allowed to start the
linstor-controller process. Only this pod will be added as endpoint for the
controller service.

Should the leader crash or the node it is running on go offline,
a new leader will be elected and allowed to start the controller process.
Note: in case the full node goes offline, the old pod will still be marked
as ready. By using ClusterIP: "" on our service, we ensure we create an actual
proxy (which automatically chooses the responding pod) instead of each client
having to try multiple DNS responses.
@WanzenBug
Member Author

@JoelColledge I think this is finally ready

Collaborator

@JoelColledge JoelColledge left a comment

Looks good to me. Just waiting for internal tests to pass before merging.

@JoelColledge JoelColledge merged commit 53688fe into piraeusdatastore:master Sep 30, 2020
@JoelColledge
Collaborator

🎉

Development

Successfully merging this pull request may close these issues.

Enable linstor-controller to run with multiple replicas