Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Galera service A/P #229

Merged

Conversation

dciabrin
Copy link
Contributor

@dciabrin dciabrin commented Jun 12, 2024

The service created for a Galera cluster always balances traffic to all available Galera replicas, which means the cluster is effectively configured as an A/A service.

In order to limit the burden on clients of supporting A/A semantics, change the way the service is configured to act as A/P instead.

A new callback script is introduced in pods. It is called by mysqld automatically every time the Galera library detects a change in the cluster (node join, crash, network partition...). This callback script reconfigures the service CR via label selectors to drive traffic to a single pod at a time.

Jira: OSPRH-7405

Copy link
Contributor

@zzzeek zzzeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were we going to do a CRD change to make this configurable? would be handy for us to be able to deploy A/A so that we can in fact have CI / tempest jobs that do this and we can slowly try to improve codebases

@dciabrin
Copy link
Contributor Author

@zzzeek my plan is to do that in a subsequent PR, to avoid any CRD change that would force a bump in openstack-operator.

@stuggi
Copy link
Contributor

stuggi commented Jun 12, 2024

@zzzeek my plan is to do that in a subsequent PR, to avoid any CRD change that would force a bump in openstack-operator.

Depending on the CRD change, I think we should do it now.

service.Spec = pkgsvc.Spec
if present {
service.Spec.Selector[ActivePodSelectorKey] = activePod
}
err := controllerutil.SetOwnerReference(instance, service, r.Client.Scheme())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the instance is not set as the Controller (same for L483)? currently the controller won't reconcile if the service gets deleted.

I guess we need it now for this PR as otherwise the controller will reconcile on the selector update and returns it to what it think it should be?

With this, there is a scenario where we'd loose the a-p selector. If someone deletes the service (for whatever reason), first of all the controller currently won't reconcile and re-create it. So a user would have to restart the mariadb operator, or do some other change to trigger a reconcile, but in this scenario, we don't have the active pod selector as it is a new service and there won't be a change to the galera cluster and the script won't run. With this we are in the multi master config.

Another approach could be (not sure yet which one is better) that the script adds an annotation to the statefulset on which one is the master (I guess on a pod restart it will always run at least once?), the controller would then reconcile and can update the service.
There is also a downside in this approach, the operator needs to run. If the mariadb-operator is down when there is a galera change, those would not be reflected properly. This could happen e.g. if the operator and the deployment pod which was the master are located on a worker which failed. Depending on how log it takes to re-schedule the operator the db would be not available ... so ... maybe it can be a mix, like checking if there is already the statefulset and take the info from there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the instance is not set as the Controller (same for L483)? currently the controller won't reconcile if the service gets deleted.

I'm not sure I got this one. Isn't it what lines 509 and 483 supposed to do? When I do e.g. oc describe service openstack, I can see a label owner=mariadb-operator.

I guess we need it now for this PR as otherwise the controller will reconcile on the selector update and returns it to what it think it should be?

Hmmm you're correct we'd lose this label in this case.

With this, there is a scenario where we'd loose the a-p selector. If someone deletes the service (for whatever reason), first of all the controller currently won't reconcile and re-create it. So a user would have to restart the mariadb operator, or do some other change to trigger a reconcile, but in this scenario, we don't have the active pod selector as it is a new service and there won't be a change to the galera cluster and the script won't run. With this we are in the multi master config.

You're correct, the label won't be recreated automatically. Restarting a pod would be sufficient for galera of a cluster change and to recreate the right label, although that would be a manual action.

Another approach could be (not sure yet which one is better) that the script adds an annotation to the statefulset on which one is the master (I guess on a pod restart it will always run at least once?), the controller would then reconcile and can update the service. There is also a downside in this approach, the operator needs to run. If the mariadb-operator is down when there is a galera change, those would not be reflected properly. This could happen e.g. if the operator and the deployment pod which was the master are located on a worker which failed. Depending on how log it takes to re-schedule the operator the db would be not available ... so ... maybe it can be a mix, like checking if there is already the statefulset and take the info from there

So I'm against trying to duplicate the state in the CR's status because I don't want to ensure that the states are in sync. However we could handle this deletion case by selecting any one of the healthy pod as the current active endpoint. This wouldn't conflict with the script as it only performs conditional writes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the instance is not set as the Controller (same for L483)? currently the controller won't reconcile if the service gets deleted.

I'm not sure I got this one. Isn't it what lines 509 and 483 supposed to do? When I do e.g. oc describe service openstack, I can see a label owner=mariadb-operator.

They are adding an owner, but not as the controller. Owns(&corev1.Service{}). in the NewControllerManagedBy() will only reconcile on the ones are also the controller, like in https://github.com/openstack-k8s-operators/mariadb-operator/blob/main/controllers/galera_controller.go#L447 .

I guess we need it now for this PR as otherwise the controller will reconcile on the selector update and returns it to what it think it should be?

Hmmm you're correct we'd lose this label in this case.

With this, there is a scenario where we'd loose the a-p selector. If someone deletes the service (for whatever reason), first of all the controller currently won't reconcile and re-create it. So a user would have to restart the mariadb operator, or do some other change to trigger a reconcile, but in this scenario, we don't have the active pod selector as it is a new service and there won't be a change to the galera cluster and the script won't run. With this we are in the multi master config.

You're correct, the label won't be recreated automatically. Restarting a pod would be sufficient for galera of a cluster change and to recreate the right label, although that would be a manual action.

right, restart one of the pods would also work.

Another approach could be (not sure yet which one is better) that the script adds an annotation to the statefulset on which one is the master (I guess on a pod restart it will always run at least once?), the controller would then reconcile and can update the service. There is also a downside in this approach, the operator needs to run. If the mariadb-operator is down when there is a galera change, those would not be reflected properly. This could happen e.g. if the operator and the deployment pod which was the master are located on a worker which failed. Depending on how log it takes to re-schedule the operator the db would be not available ... so ... maybe it can be a mix, like checking if there is already the statefulset and take the info from there

So I'm against trying to duplicate the state in the CR's status because I don't want to ensure that the states are in sync. However we could handle this deletion case by selecting any one of the healthy pod as the current active endpoint. This wouldn't conflict with the script as it only performs conditional writes.

No, I am not saying to store it in the CR status. Right now the script updates the service/endpoint. it could also add an annotation to the statefulset object with the information. And the controller here takes it from there instead of the service. since the statefulset remains in the mentioned scenario, the information would survive.

But we could handle this in a follow up. just wanted to bring up that there are situations where we may get back to the a-a config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the instance is not set as the Controller (same for L483)? currently the controller won't reconcile if the service gets deleted.

I'm not sure I got this one. Isn't it what lines 509 and 483 supposed to do? When I do e.g. oc describe service openstack, I can see a label owner=mariadb-operator.

They are adding an owner, but not as the controller. Owns(&corev1.Service{}). in the NewControllerManagedBy() will only reconcile on the ones are also the controller, like in https://github.com/openstack-k8s-operators/mariadb-operator/blob/main/controllers/galera_controller.go#L447 .

Oh I see now... Thanks for pointing that out. I don't see why we shouldn't own it, I am going to update
the PR.

No, I am not saying to store it in the CR status. Right now the script updates the service/endpoint. it could also add an annotation to the statefulset object with the information. And the controller here takes it from there instead of the service. since the statefulset remains in the mentioned scenario, the information would survive.

But we could handle this in a follow up. just wanted to bring up that there are situations where we may get back to the a-a config.

I think it's sufficient to always set the first pod as the current active endpoint, as 1) if that information becomes incorrect, it is guaranteed to get updated by galera as soon as it detects that this galera node is unresponsive. 2) if it is a healthy pod, it can be used as the active endpoint anyway, and the script will still be able to perform failover next time it is needed.

@@ -800,6 +814,7 @@ func (r *GaleraReconciler) SetupWithManager(mgr ctrl.Manager) error {
Owns(&corev1.Endpoints{}).
Owns(&corev1.ConfigMap{}).
Owns(&corev1.ServiceAccount{}).
Owns(&corev1.Service{}).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already in L813

@dciabrin
Copy link
Contributor Author

/retest

@dciabrin dciabrin force-pushed the galera-a-p branch 2 times, most recently from c7af585 to 5fb18af Compare June 13, 2024 09:59
@dciabrin
Copy link
Contributor Author

As discussed with @stuggi, we now do SetControllerReference on all Service CRs to get notified of deleting and react appropriately.

The service created for a Galera cluster always balances traffic
to all available Galera replicas, which means the cluster is
effectively configured as an A/A service.

In order to limit the burden on clients of supporting A/A semantics,
change the way the service is configured to act as A/P instead.

A new callback script is introduced in pods. It is called by mysqld
automatically every time the Galera library detects a change in the
cluster (node join, crash, network partition...). This callback
script reconfigures the service CR via label selectors to drive
traffic to a single pod at a time.

Jira: OSPRH-7405
Copy link
Contributor

@stuggi stuggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Jun 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dciabrin, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7d997dc into openstack-k8s-operators:main Jun 13, 2024
6 checks passed
gibizer added a commit to gibizer/nova-operator that referenced this pull request Jun 14, 2024
This reverts commit ab95f15.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
gibizer added a commit to gibizer/placement-operator that referenced this pull request Jun 14, 2024
This reverts commit a5b0bf4.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
openshift-merge-bot bot pushed a commit to openstack-k8s-operators/placement-operator that referenced this pull request Jun 14, 2024
This reverts commit a5b0bf4.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
openshift-merge-bot bot pushed a commit to openstack-k8s-operators/nova-operator that referenced this pull request Jun 17, 2024
This reverts commit ab95f15.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
stuggi added a commit to stuggi/glance-operator that referenced this pull request Jun 17, 2024
This reverts commit 980c318.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to deploy in Active/Passive mode
per default, instead of multimaster.
stuggi added a commit to stuggi/cinder-operator that referenced this pull request Jun 18, 2024
This reverts commit 67b2877.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to deploy in Active/Passive mode
per default, instead of multimaster.
Akrog added a commit to Akrog/openstack-k8s-operators-architecture that referenced this pull request Jun 18, 2024
The openstack-k8s-operators/mariadb-operator#229 switched the
mariadb-operator to deploy in Active/Passive mode per default, instead
of multimaster.

So we should no longer set `mysql_wsrep_sync_wait`.

This reverts commit cf4f3f6.
softwarefactory-project-zuul bot added a commit to openstack-k8s-operators/architecture that referenced this pull request Jun 18, 2024
Revert "Cinder wait for DB writes on reads"

The openstack-k8s-operators/mariadb-operator#229 switched the mariadb-operator to deploy in Active/Passive mode per default, instead of multimaster.
So we should no longer set mysql_wsrep_sync_wait.
Related PR: openstack-k8s-operators/cinder-operator#401
This reverts commit cf4f3f6.

Reviewed-by: John Fulton <[email protected]>
gthiemonge added a commit to gthiemonge/octavia-operator that referenced this pull request Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants