Changes current handling for replication lag in favor of setting lagging servers to SHUNNED state #3533
1. Introduced a new global variable, 'monitor_groupreplication_max_transaction_behind_for_read_only', that modifies the behavior of 'group_replication_lag'.
2. Improved logic by making use of 'MyHGC_find' instead of directly searching the 'MyHostGroups' structure.
3. Improved the 'group_replication_lag' documentation with the new implementation updates.
4. Introduced changes to 'update_group_replication_set_writer' that preserve writers placed in 'OFFLINE_SOFT' state.
Retest this please.
We need to merge this.
@JavierJF @renecannao, I think there are not many scenarios for MGR with multiple masters; single-master mode is used in far more scenarios. When a slave server's lag exceeds the threshold, ProxySQL immediately takes that slave server offline. I think this is inappropriate, because it interrupts the business and causes the application to report errors. I submitted a fix PR that sets the server to OFFLINE_SOFT instead, softly releasing the lagging slave server. Please review PR: #3473.
Hi @bskllzh. Thank you for your feedback.
In fact, PR #3473 would conflict with what was said previously: a server in
This is by design. Thinking about a possible solution, we could implement a mechanism in which a node is first configured as
@renecannao, in PR #3473 I added a mgr_replication_lag_status parameter (MGR replication lag flag: true means lagging, false means not lagging) to distinguish whether the server was manually set to OFFLINE_SOFT in the configuration, or whether its state changed to OFFLINE_SOFT because the MySQL slave was lagging.
When shunning a node due to replication lag in a group replication cluster, we first shun the node as MYSQL_SERVER_STATUS_SHUNNED, then we shun it as MYSQL_SERVER_STATUS_SHUNNED_REPLICATION_LAG. In this way we avoid, for a short time, killing connections on that backend. Backing off from that server can give it enough time to sync up. See the discussion in the comments of #3533.
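One way to watch this transition from ProxySQL Admin is to query the connection pool statistics. This is a sketch, assuming (as we understand it) that the `status` column of `stats_mysql_connection_pool` reports both `SHUNNED` and `SHUNNED_REPLICATION_LAG`; the query only reads statistics and changes nothing.

```sql
-- Per-backend connection pool state. A lagging Group Replication node is
-- expected to show up first as SHUNNED and then as SHUNNED_REPLICATION_LAG.
SELECT hostgroup, srv_host, srv_port, status, ConnUsed, ConnFree
FROM stats_mysql_connection_pool
ORDER BY hostgroup, srv_host;
```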
@bskllzh, thank you for pointing out the new flag. About your comment:
Please note that the enhancements in this PR are driven by the needs of a customer that requires multiple writers, the ability to disable a node regardless of whether it is a writer or a reader (this is why we added a new variable to control this behavior), the ability to prevent configured
@renecannao, regarding commit dd71fcd, I think
This pull request introduces several changes to how lag in 'Group Replication'
is handled.
Old behavior
Servers whose lag was above the threshold determined by
'mysql-groupreplication_max_transactions_behind_count' and that had read_only=1
were set to 'OFFLINE' until replication caught up.
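The values the Monitor compares against this threshold can be inspected from ProxySQL Admin. A sketch, assuming the `monitor.mysql_server_group_replication_log` table with the column names used below:

```sql
-- Most recent Group Replication checks performed by the ProxySQL Monitor.
-- 'transactions_behind' is the lag compared against the threshold;
-- 'read_only' is the flag that, under the old behavior, made a node eligible for OFFLINE.
SELECT hostname, port, viable_candidate, read_only, transactions_behind, error
FROM monitor.mysql_server_group_replication_log
ORDER BY time_start_us DESC
LIMIT 10;
```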
New behavior
Servers whose lag is above the threshold determined by
'mysql-groupreplication_max_transactions_behind_count' are set to 'SHUNNED', depending
on the value of the newly introduced variable:
'mysql-monitor_groupreplication_max_transaction_behind_for_read_only'.
This variable has three possible values:
In addition to these changes in the actions taken when a server exceeds
'groupreplication_max_transactions_behind_count', it is now also possible to set servers
configured as writers to the 'OFFLINE_SOFT' state while preserving them in the 'writer_hostgroup'.
To achieve this behavior, take a server configured as a 'writer' (i.e. its 'hostgroup_id'
is the 'writer_hostgroup'), set its status to 'OFFLINE_SOFT', and then issue
'LOAD MYSQL SERVERS TO RUNTIME'. The server should be preserved in the writer
hostgroup, but its status should change to 'OFFLINE_SOFT'.
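The procedure above can be sketched in ProxySQL Admin SQL. This is a minimal sketch: 'writer-host' and port 3306 are hypothetical placeholders, and the subquery on `mysql_group_replication_hostgroups` is only an added safety check that the server sits in a configured writer hostgroup.

```sql
-- 1. Mark a writer as OFFLINE_SOFT ('writer-host' and 3306 are hypothetical).
--    The subquery limits the change to rows in a configured writer hostgroup.
UPDATE mysql_servers
SET status = 'OFFLINE_SOFT'
WHERE hostname = 'writer-host'
  AND port = 3306
  AND hostgroup_id IN (SELECT writer_hostgroup FROM mysql_group_replication_hostgroups);

-- 2. Apply the configuration change.
LOAD MYSQL SERVERS TO RUNTIME;

-- 3. The server should remain in the writer hostgroup, now as OFFLINE_SOFT.
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
WHERE hostname = 'writer-host';
```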
Situation description
We have 3 servers, 2 writers and 1 reader, for a MySQL Group Replication
cluster of 3 nodes. The servers are configured in ProxySQL in the following way:
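A hypothetical configuration of this shape, as a sketch: the hostgroup ids (1 to 4), the addresses 127.2.1.1 and 127.2.1.3, port 3306 and the lag threshold of 100 are all assumptions; only 127.2.1.2 appears in the text. It also assumes 127.2.1.3 runs with read_only=1, since the Monitor places Group Replication nodes according to their read_only value.

```sql
-- Hypothetical hostgroups: 2 = writers, 4 = backup writers, 3 = readers, 1 = offline.
-- max_writers = 2 allows two writers; max_transactions_behind = 100 is arbitrary.
INSERT INTO mysql_group_replication_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
   active, max_writers, writer_is_also_reader, max_transactions_behind)
VALUES (2, 4, 3, 1, 1, 2, 0, 100);

-- Hypothetical nodes: two writers and one reader.
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (2, '127.2.1.1', 3306),
  (2, '127.2.1.2', 3306),
  (3, '127.2.1.3', 3306);

LOAD MYSQL SERVERS TO RUNTIME;
```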
This results in the following cluster state in the 'runtime_mysql_servers' table in
ProxySQL:
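The runtime state can be inspected from ProxySQL Admin with a query such as the following; the selected columns are a subset chosen for readability, not necessarily the ones shown in the original listing.

```sql
-- Inspect the server list ProxySQL is actually using at runtime.
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;
```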
Now we want to set the writer '127.2.1.2' to OFFLINE_SOFT, so we simply set it
via ProxySQL Admin:
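A minimal sketch of that statement, assuming the server is identified by hostname alone (add a `port` condition if several instances share the address):

```sql
-- Mark the writer 127.2.1.2 as OFFLINE_SOFT in the configuration table.
UPDATE mysql_servers SET status = 'OFFLINE_SOFT' WHERE hostname = '127.2.1.2';
```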
And we load mysql_servers to runtime:
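This is the `LOAD MYSQL SERVERS TO RUNTIME` command mentioned earlier; optionally, `SAVE MYSQL SERVERS TO DISK` persists the change across ProxySQL restarts.

```sql
-- Promote the edited mysql_servers configuration to the runtime.
LOAD MYSQL SERVERS TO RUNTIME;
```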
The 'runtime_mysql_servers' table should transition to the following state:
This change is performed without affecting any transactions currently being
executed on the server that has been placed in 'OFFLINE_SOFT'. To make the
server operational again, simply set it back to the 'ONLINE' state and load
mysql_servers to runtime again.
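A sketch of reverting the walkthrough, assuming, as above, that the server is matched by hostname alone:

```sql
-- Bring 127.2.1.2 back into service and apply the change.
UPDATE mysql_servers SET status = 'ONLINE' WHERE hostname = '127.2.1.2';
LOAD MYSQL SERVERS TO RUNTIME;
```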