Skip to content

Commit

Permalink
BUG#19579811: SET GTID_PURGED DOES NOT STOP WAIT_FOR_EXECUTED_GTID_SET
Browse files Browse the repository at this point in the history
Background:

WAIT_FOR_EXECUTED_GTID_SET waits until a specified set of GTIDs is
included in GTID_EXECUTED. SET GTID_PURGED adds GTIDs to
GTID_EXECUTED. RESET MASTER clears GTID_EXECUTED.

There were multiple issues:

 1. Problem:

    The change in GTID_EXECUTED implied by SET GTID_PURGED did
    not cause WAIT_FOR_EXECUTED_GTID_SET to stop waiting.

    Analysis:

    WAIT_FOR_EXECUTED_GTID_SET waits for a signal to be sent.
    But SET GTID_PURGED never sent the signal.

    Fix:

    Make GTID_PURGED send the signal.

    Changes:
    - sql/rpl_gtid_state.cc:Gtid_state::add_lost_gtids
    - sql/rpl_gtid_state.cc: removal of #ifdef HAVE_GTID_NEXT_LIST
    - sql/rpl_gtid.h: removal of #ifdef HAVE_GTID_NEXT_LIST

 2. Problem:

    There was a race condition where WAIT_FOR_EXECUTED_GTID_SET
    could miss the signal from a commit and go into an infinite
    wait even if GTID_EXECUTED contains all the waited-for GTIDs.

    Analysis:

    In the bug, WAIT_FOR_EXECUTED_GTID_SET took a lock while
    taking a copy of the global state. Then it released the lock,
    analyzed the copy of the global state, and decided whether it
    should wait.  But if the GTID to wait for was committed after
    the lock was released, WAIT_FOR_EXECUTED_GTID_SET would miss
    the signal and go to an infinite wait even if GTID_EXECUTED
    contains all the waited-for GTIDs.

    Fix:

    Refactor the code so that it holds the lock all the way from
    before it reads the global state until it goes to the wait.

    Changes:

    - sql/rpl_gtid_state.cc:Gtid_state::wait_for_gtid_set:
      Most of the changes in this function are to fix this bug.

    Note:

    When the bug existed, it was possible to create a test case
    for this by placing a debug sync point in the section where
    it does not hold the lock.  However, after the bug has been
    fixed this section does not exist, so there is no way to test
    it deterministically.  The bug would also cause the test to
    fail rarely, so a way to test this is to run the test case
    1000 times.

 3. Problem:

    The function would take global_sid_lock.wrlock every time it has
    to wait, and while holding it takes a copy of the entire
    gtid_executed (which implies allocating memory).  This is not very
    optimal: it may process the entire set each time it waits, and it
    may wait once for each member of the set, so in the worst case it
    is O(N^2) where N is the size of the set.

    Fix:

    This is fixed by the same refactoring that fixes problem #2.  In
    particular, it does not re-process the entire Gtid_set for each
    committed transaction. It only removes all intervals of
    gtid_executed for the current sidno from the remainder of the
    wait-for-set.

    Changes:
    - sql/rpl_gtid_set.cc: Add function remove_intervals_for_sidno.
    - sql/rpl_gtid_state.cc: Use remove_intervals_for_sidno and remove
      only intervals for the current sidno. Remove intervals
      incrementally in the innermost while loop, rather than recompute
      the entire set each iteration.

 4. Problem:

    If the client that executes WAIT_FOR_EXECUTED_GTID_SET owns a
    GTID that is included in the set, then there is no chance for
    another thread to commit it, so it will wait forever.  In
    effect, it deadlocks with itself.

    Fix:

    Detect the situation and generate an error.

    Changes:
    - sql/share/errmsg-utf8.txt: new error code
      ER_CANT_WAIT_FOR_EXECUTED_GTID_SET_WHILE_OWNING_A_GTID
    - sql/item_func.cc: check the condition and generate the new error

 5. Various simplfications.

    - sql/item_func.cc:Item_wait_for_executed_gtid_set::val_int:
      - Pointless to set null_value when generating an error.
      - add DBUG_ENTER
      - Improve the prototype for Gtid_state::wait_for_gtid_set so
        that it takes a Gtid_set instead of a string, and also so that
        it requires global_sid_lock.
    - sql/rpl_gtid.h:Mutex_cond_array
      - combine wait functions into one and make it return bool
      - improve some comments
    - sql/rpl_gtid_set.cc:Gtid_set::remove_gno_intervals:
      - Optimize so that it returns early if this set becomes empty

@mysql-test/extra/rpl_tests/rpl_wait_for_executed_gtid_set.inc
- Move all wait_for_executed_gtid_set tests into
  mysql-test/suite/rpl/t/rpl_wait_for_executed_gtid_set.test

@mysql-test/include/kill_wait_for_executed_gtid_set.inc
@mysql-test/include/wait_for_wait_for_executed_gtid_set.inc
- New auxiliary scripts.

@mysql-test/include/rpl_init.inc
- Document undocumented side effect.

@mysql-test/suite/rpl/r/rpl_wait_for_executed_gtid_set.result
- Update result file.

@mysql-test/suite/rpl/t/rpl_wait_for_executed_gtid_set.test
- Rewrote the test to improve coverage and cover all parts of this bug.

@sql/item_func.cc
- Add DBUG_ENTER
- No point in setting null_value when generating an error.
- Do the decoding from text to Gtid_set here rather than in Gtid_state.
- Check for the new error
  ER_CANT_WAIT_FOR_EXECUTED_GTID_SET_WHILE_OWNING_A_GTID

@sql/rpl_gtid.h
- Simplify the Mutex_cond_array::wait functions in the following ways:
  - Make them one function since they share most code. This also allows
    calling the three-argument function with NULL as the last
    parameter, which simplifies the caller.
  - Make it return bool rather than 0/ETIME/ETIMEOUT, to make it more
    easy to use.
- Make is_thd_killed private.
- Add prototype for new Gtid_set::remove_intervals_for_sidno.
- Add prototype for Gtid_state::wait_for_sidno.
- Un-ifdef-out lock_sidnos/unlock_sidnos/broadcast_sidnos since we now
  need them.
- Make wait_for_gtid_set return bool.

@sql/rpl_gtid_mutex_cond_array.cc
- Remove the now unused check_thd_killed.

@sql/rpl_gtid_set.cc
- Optimize Gtid_set::remove_gno_intervals, so that it returns early
  if the Interval list becomes empty.
- Add Gtid_set::remove_intervals_for_sidno. This is just a wrapper
  around the already existing private member function
  Gtid_set::remove_gno_intervals.

@sql/rpl_gtid_state.cc
- Rewrite wait_for_gtid_set to fix problems 2 and 3. See code
  comment for details.
- Factor out wait_for_sidno from wait_for_gtid.
- Enable broadcast_sidnos/lock_sidnos/unlock_sidnos, which were ifdef'ed out.
- Call broadcast_sidnos after updating the state, to fix issue #1.

@sql/share/errmsg-utf8.txt
- Add error message used to fix issue #4.
  • Loading branch information
Sven Sandberg committed Sep 7, 2015
1 parent d658107 commit 95a156f
Show file tree
Hide file tree
Showing 13 changed files with 1,635 additions and 287 deletions.
100 changes: 0 additions & 100 deletions mysql-test/extra/rpl_tests/rpl_wait_for_executed_gtid_set.inc

This file was deleted.

42 changes: 42 additions & 0 deletions mysql-test/include/kill_wait_for_executed_gtid_set.inc
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# ==== Purpose ====
#
# Kill an ongoing WAIT_FOR_EXECUTED_GTID_SET query on a specified connection.
#
# ==== Usage ====
#
# --let $thread_id= THREAD_ID
# --let $rpl_connection_name= CONNECTION
# --source include/kill_wait_for_executed_gtid_set.inc
#
# Parameters:
# $thread_id
# Thread_id of the ongoing WAIT_FOR_EXECUTED_GTID_SET
#
# $rpl_connection_name
# The connection where WAIT_FOR_EXECUTED_GTID_SET is executing.
# The script will execute 'reap' on this connection.

--let $include_filename= kill_wait_for_executed_gtid_set.inc
--source include/begin_include_file.inc

if ($rpl_connection_name == '') {
--die !!!ERROR IN TEST: Set $rpl_connection_name before you source kill_wait_for_executed_gtid_set.inc
}

# Kill
--disable_query_log
--disable_result_log
eval KILL QUERY $thread_id;

# Wait for query to stop executing query.
--let $wait_condition= SELECT COUNT(*) = 0 FROM performance_schema.threads WHERE PROCESSLIST_ID = $thread_id AND PROCESSLIST_INFO LIKE '%WAIT_FOR_EXECUTED_GTID_SET%'
--source include/wait_condition.inc

# Reap killed query
--source include/rpl_connection.inc
--error ER_QUERY_INTERRUPTED
--reap


--let $include_filename= kill_wait_for_executed_gtid_set.inc
--source include/end_include_file.inc
1 change: 1 addition & 0 deletions mysql-test/include/rpl_init.inc
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
# - Sets the following mtr variables for convenience:
# $gtid_mode: the value of @@global.gtid_mode on server 1
# $gtid_mode_on: 1 if gtid_mode=on, 0 if gtid_mode=off
# $binlog_format: The value of @@global.binlog_format.
# $server_1_datadir, ..., $server_10_datadir: the @@datadir on each server.
# $server_1_uuid, ..., $server_10_uuid: the @@server_uuid on each server.
#
Expand Down
19 changes: 19 additions & 0 deletions mysql-test/include/wait_for_wait_for_executed_gtid_set.inc
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# ==== Purpose ====
#
# Wait until WAIT_FOR_EXECUTED_GTID_SET goes to the wait state.
#
# ==== Usage ====
#
# --let $thread_id= THREAD_ID
# --source include/wait_for_wait_for_executed_gtid_set.inc

--let $include_filename= wait_for_wait_for_executed_gtid_set.inc
--source include/begin_include_file.inc


--let $wait_condition= SELECT COUNT(*) = 1 FROM performance_schema.threads WHERE PROCESSLIST_ID = $thread_id AND PROCESSLIST_STATE = 'Waiting for GTID to be committed'
--source include/wait_condition.inc


--let $include_filename= wait_for_wait_for_executed_gtid_set.inc
--source include/end_include_file.inc
Loading

0 comments on commit 95a156f

Please sign in to comment.