Bootstrapping is the process of (re)starting a Galera cluster.
Bootstrapping is only required when the cluster has lost quorum.
Quorum is lost when fewer than half of the nodes can communicate with each other (for longer than the configured grace period). In Galera terminology, if a node can communicate with the rest of the cluster and its database is in a good state, it reports itself as synced.
If quorum has not been lost, individual unhealthy nodes should automatically rejoin the cluster once repaired (error resolved, node restarted, or connectivity restored).
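To check whether an individual node currently sees itself as part of a healthy cluster, you can query its Galera status variables from a MySQL session on that node. wsrep_cluster_status and wsrep_local_state_comment are standard Galera status variables; on a healthy node they report Primary and Synced respectively:

  mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
  mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';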
Lost quorum has the following symptoms:

- All responsive nodes report the value of wsrep_cluster_status as non-Primary:

  mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
  +----------------------+-------------+
  | Variable_name        | Value       |
  +----------------------+-------------+
  | wsrep_cluster_status | non-Primary |
  +----------------------+-------------+
- All responsive nodes respond with ERROR 1047 when queried with most statement types:

  mysql> select * from mysql.user;
  ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
See Cluster Behavior for more details about determining cluster state.
As part of cf-mysql-release v25, we provide an auto-bootstrap feature which runs as a BOSH errand. The errand determines whether the cluster has lost quorum and, if so, bootstraps it. Before running the errand, ensure that any network partitions have been resolved; the errand can only repair the cluster once the nodes can reach each other again.
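One rough way to confirm this is to check that each node can reach the others on Galera's replication port. The sketch below is not part of the errand; it assumes the default Galera replication port (4567) and uses the example node IPs from the deployment shown later in this document, so substitute your own addresses:

  # run from each mysql node; IPs below are examples only
  for ip in 10.244.7.2 10.244.8.2 10.244.9.6; do
    nc -z -w 3 "$ip" 4567 && echo "$ip reachable" || echo "$ip UNREACHABLE"
  done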
Run bosh run errand bootstrap from the terminal. When the errand completes, it should have successfully bootstrapped the cluster, and all jobs should report as running. Note that:
- If the cluster was already healthy to begin with (i.e. quorum was never lost), the errand will error out, saying that bootstrap is not required.
- If one or more nodes are not reachable (i.e. the VM exists but is in an unknown state), the errand will error out with Error: could not reach node. In this situation, follow the steps below:
- Stop all of the nodes:

  $ bosh -n stop mysql_z1 && bosh -n stop mysql_z2 && bosh -n stop <arbitrator|mysql>_z3

- Run bosh edit deployment and set update.canaries to 0, update.max_in_flight to 3, and update.serial to false (see the example update block after this list), then run bosh deploy.

  Note: if you get a 503 error (like Sending stop request to monit: Request failed, response: Response{ StatusCode: 503, Status: '503 Service Unavailable' }), it means that monit is still trying to stop the VMs. Wait a few minutes and try this step again.

- Start all of the nodes:

  $ bosh -n start mysql_z1 ; bosh -n start mysql_z2 ; bosh -n start <arbitrator|mysql>_z3

  This will throw several errors, but it ensures that all the jobs are present on the VMs.

- Run bosh instances to verify that all jobs report as failing.

- Try running the errand again using bosh -n run errand bootstrap as above. Once the errand succeeds, the cluster is synced, although some jobs might still report as failing.

- Run bosh edit deployment again and set update.canaries back to 1, update.max_in_flight to 1, and update.serial to true, then run bosh deploy.

- Verify that the deployment succeeds and all jobs are healthy. A healthy deployment should look like this:
$ bosh vms cf-mysql
Acting as user 'admin' on deployment 'cf-mysql' on 'Bosh Lite Director'
| mysql_z1/0 | running | mysql_z1 | 10.244.7.2 |
| mysql_z2/0 | running | mysql_z2 | 10.244.8.2 |
| arbitrator_z3/0 | running | arbitrator_z3 | 10.244.9.6 |
...
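For reference, the settings changed via bosh edit deployment in the steps above live in the manifest's update block. A sketch of the relevant section is shown below; the surrounding manifest is deployment-specific, and only these three keys are touched:

  # values used temporarily while repairing the cluster
  update:
    canaries: 0
    max_in_flight: 3
    serial: false
  # restore canaries: 1, max_in_flight: 1, and serial: true before the final deploy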
If these steps did not work for you, please refer to the Manual Bootstrap Process below.
The bootstrap errand simply automates the steps in the manual bootstrapping process documented below. It finds the node with the highest transaction sequence number, and asks it to start up by itself (i.e. in bootstrap mode), then asks the remaining nodes to join the cluster.
The sequence number of a stopped node can be retrieved either by reading the node's state file under /var/vcap/store/mysql/grastate.dat, or by running a mysqld command with a WSREP flag, such as mysqld --wsrep-recover.
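As an illustration only (the errand's actual implementation may differ), the following shell sketch prints a node's last committed seqno, falling back to --wsrep-recover when the state file reports -1. It uses the same paths that appear in the manual steps below and should be run as root on a stopped node:

  #!/bin/bash
  # Print this node's last committed Galera seqno (illustrative sketch).
  STATE_FILE=/var/vcap/store/mysql/grastate.dat
  SEQNO=$(grep 'seqno:' "$STATE_FILE" | awk '{print $2}')

  if [ "$SEQNO" = "-1" ]; then
    # Node was not shut down cleanly; recover the position from the database,
    # then read the recovered seqno out of the error log.
    /var/vcap/packages/mariadb/bin/mysqld --wsrep-recover
    SEQNO=$(grep "Recovered position" /var/vcap/sys/log/mysql/mysql.err.log | tail -1 | awk -F: '{print $NF}')
  fi

  echo "seqno: $SEQNO"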
The following steps are prone to user-error and can result in lost data if followed incorrectly. Please follow the Auto-bootstrap instructions above first, and only resort to the manual process if the errand fails to repair the cluster.
- SSH to each node in the cluster and, as root, shut down the mariadb process.
$ monit stop mariadb_ctrl
Re-bootstrapping the cluster will not be successful unless all other nodes have been shut down.
- Choose a node to bootstrap. Find the node with the highest transaction sequence number (seqno):
  - If a node shut down gracefully, the seqno should be in the Galera state file:

    $ cat /var/vcap/store/mysql/grastate.dat | grep 'seqno:'
  - If the node crashed or was killed, the seqno in the Galera state file should be -1. In this case, the seqno may be recoverable from the database. The following command will cause the database to start up, log the recovered sequence number, and then exit:

    $ /var/vcap/packages/mariadb/bin/mysqld --wsrep-recover

    Scan the error log for the recovered sequence number (the last number after the group id (uuid) is the recovered seqno):

    $ grep "Recovered position" /var/vcap/sys/log/mysql/mysql.err.log | tail -1
    150225 18:09:42 mysqld_safe WSREP: Recovered position e93955c7-b797-11e4-9faa-9a6f0b73eb46:15

    Note: the Galera state file will still say seqno: -1 afterward.

  - If the node never connected to the cluster before crashing, it may not even have a group id (uuid in grastate.dat). In this case there is nothing to recover. Unless all nodes crashed this way, don't choose this node for bootstrapping.
  Use the node with the highest seqno value as the new bootstrap node. If all nodes have the same seqno, you can choose any node as the new bootstrap node.
- Important: Only perform these bootstrap commands on the node with the highest seqno. Otherwise the node with the highest seqno will be unable to join the new cluster (unless its data is abandoned); its mariadb process will exit with an error. See Cluster Behavior for more details on intentionally abandoning data.
- On the new bootstrap node, update the state file and restart the mariadb process:
$ echo -n "NEEDS_BOOTSTRAP" > /var/vcap/store/mysql/state.txt
$ monit start mariadb_ctrl
You can check that the mariadb process has started successfully by running:
$ watch monit summary
It can take up to 10 minutes for monit to start the mariadb process.
- Once the bootstrapped node is running, start the mariadb process on the remaining nodes via monit.
$ monit start mariadb_ctrl
- Verify that the new nodes have successfully joined the cluster. The following command should output the total number of nodes in the cluster:
mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
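For example, on a healthy three-node deployment (two mysql nodes plus an arbitrator, which also counts as a cluster member) the output would look something like this:

  mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
  +--------------------+-------+
  | Variable_name      | Value |
  +--------------------+-------+
  | wsrep_cluster_size | 3     |
  +--------------------+-------+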