- Install Docker and docker-compose ( https://docs.docker.com/get-docker/ + https://docs.docker.com/compose/install/ ).
- Tested with Docker 19 and docker-compose 1.18 ; some familiarity with both helps.
- Other versions may work.
- Clone this repository.
- Suggested while working, once the first container is built : keep the logs open.
docker-compose logs -f
Percona Server + XtraBackup + Galera libraries.
- What to expect : resiliency + read scalability ( to be managed ! ) + flexibility, and able to handle high Zabbix load. ( currently 35-40k nvps )
- What not to expect : write scalability. ( data is written everywhere, so a slower node can slow down the others )
- Usage of Docker here : lab purposes, so everyone can follow along during / after the workshop.
- First SQL node, to bootstrap the cluster
docker-compose config --services
docker-compose up -d --build db-sql-node-1
- Second SQL node, to join it
docker-compose up -d --build db-sql-node-2
- Third SQL node, to join it
docker-compose up -d --build db-sql-node-3
- Add the HAProxy.
docker-compose up -d --build proxy
Check http://ip:9000/haproxy_stats
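- Optional : check that all three nodes joined, through the proxy. ( root password 'murloc' and port 3306, as used in the desync section below )
mysql --user=root --password=murloc --host=127.0.0.1 -P 3306 zabbix -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"
Expect a value of 3.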
- Zabbix server 4.2 time
docker-compose up -d --build zabbix-server-42
- Frontend 4.2 now
docker-compose up -d --build zabbix-frontend-42
- Agent is optional here.
We should have the platform up ! Check that you have proper access to the HAProxy stats page http://ip:9000/haproxy_stats and the Zabbix frontend http://ip:8081/
- Stop one node and check the failover.
docker-compose stop db-sql-node-1
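- A quick way to see the effect, reusing the proxy connection from above : the cluster size should drop to 2 while the platform keeps working.
mysql --user=root --password=murloc --host=127.0.0.1 -P 3306 zabbix -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"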
- Restore the node. In docker-compose.yml, point CLUSTER_JOIN at the surviving nodes so it rejoins the existing cluster instead of bootstrapping :
CLUSTER_JOIN: 192.168.13.42,192.168.13.43
#CLUSTER_JOIN:
docker-compose up -d --build db-sql-node-1
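- Once it has rejoined ( IST / SST may take a moment ), the cluster size should be back to 3 :
mysql --user=root --password=murloc --host=127.0.0.1 -P 3306 zabbix -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"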
- Go back to the first node. ( restart the proxy so it picks the node up again )
docker-compose restart proxy
- Why ? For a slower node or dedicated SELECT workloads : desync avoids flow control, replication stays ON, and the node gets no more write requests.
mysql --user=root --password=murloc --host=127.0.0.1 -P 3306 zabbix -e "SHOW variables LIKE 'wsrep_provider_options';"
mysql --user=root --password=murloc --host=127.0.0.1 -P 3306 zabbix -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';"
- How ?
mysql --user=root --password=murloc --host=127.0.0.1 -P 3306 zabbix -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';"
mysql --user=root --password=murloc --host=127.0.0.1 -P 23306 zabbix -e "set global wsrep_desync=ON;"
mysql --user=root --password=murloc --host=127.0.0.1 -P 23306 zabbix -e "SHOW variables LIKE 'wsrep_des%';"
mysql --user=root --password=murloc --host=127.0.0.1 -P 23306 zabbix -e "set global wsrep_desync=OFF;"
- Careful not to end up with no synced nodes left.
- Why this process vs a direct server upgrade ?
Prepare the upgrade while keeping the service running : monitoring stays available. Avoid an upgrade failure by pre-testing the upgrade on an isolated node.
- Isolate one node
docker-compose stop db-sql-node-3
docker ps -a | grep node-3
docker inspect container-id | grep /var/lib/mysql -B 1
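- Alternative sketch to print the volume path in one go, using docker's standard Go-template inspect format ( same container-id as above ) :
docker inspect -f '{{ range .Mounts }}{{ if eq .Destination "/var/lib/mysql" }}{{ .Source }}{{ end }}{{ end }}' container-id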
vim /var/lib/docker/volumes/e6409284da5abe2436dd72d4fcc44d59d8deffdd6f0da0b6b535a1781f06ffa0/_data/grastate.dat
Set safe_to_bootstrap to 1.
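After the edit, grastate.dat should look roughly like this ( the uuid and seqno values here are illustrative, keep the ones from your file ) :
# GALERA saved state
version: 2.4
uuid: e6409284-0000-0000-0000-000000000000
seqno: -1
safe_to_bootstrap: 1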
Change the cluster name to CLUSTER_NAME: 'zabbix-db-cluster-52'
Empty the CLUSTER_JOIN value. ( the node will bootstrap a new cluster )
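The environment block in docker-compose.yml for node 3 would then contain ( values from the two steps above ) :
CLUSTER_NAME: 'zabbix-db-cluster-52'
CLUSTER_JOIN: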
- Bootstrap a new cluster
docker-compose up -d --build db-sql-node-3
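- Optional sanity check that node 3 bootstrapped alone. ( assuming it is the node published on port 23306 as in the desync example ; adjust to your real port mapping )
mysql --user=root --password=murloc --host=127.0.0.1 -P 23306 zabbix -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"
Expect a value of 1.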
- Start a new server
docker-compose up -d --build zabbix-server-52
- Upgrade the isolated node
Automated step at server start : the Zabbix server upgrades the database schema when it comes up.
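You can follow it in the logs ; the exact wording depends on the Zabbix version, but grepping for 'database upgrade' should catch it :
docker-compose logs zabbix-server-52 | grep -i "database upgrade"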
- Check that it's working fine
Edit DB host for frontend 52
docker-compose up -d --build zabbix-frontend-52
Check the frontend 52 and confirm your data is still there.
- What comes next is your choice :
Make another node join the new cluster ? ( consider clearing its data first for a fresh configuration ) Use another HAProxy and fail over to the second server, so your agents communicate with the new one ? In our usual practice, we use the second server just to upgrade the DB, then upgrade our regular server, point it at the freshly upgraded DB, and start making the other nodes join.
Also : consider keeping a node on the previous DB version for a while ; it would help with a rollback if needed.
- ProxySQL could offer nice features instead of HAProxy ( not tested for now ) : detecting node states, pointing writes to a synced or specific node, balancing read requests, managing failover better. Not tested in production in our case, but we would consider it.
- Buffer pool + InnoDB log file = eat as much RAM as possible, but for good reasons.
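A purely illustrative my.cnf fragment for a dedicated 256 GB host ( the sizes are assumptions to adapt, not values taken from this lab ) :
[mysqld]
# leave room for the OS, the Galera cache and connections
innodb_buffer_pool_size = 180G
# a large redo log helps absorb Zabbix write bursts
innodb_log_file_size = 8G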
- Possible upgrade for the future : TimescaleDB, when the multi-node version is out ?
- Look at your queues !
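One standard way ( a plain Zabbix internal item, nothing specific to this lab ) : an item with key zabbix[queue,10m] counts items delayed by more than 10 minutes ; alerting on it catches a lagging DB early.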
- How it is going in production so far : all fine at 35-40k nvps with 32 cores / 256 GB of RAM and io1 10k IOPS volumes, and it should stay that way for a while.