Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 5.6.22-72.0 #3

Merged
merged 2 commits into from
Feb 20, 2015
Merged

Conversation

tplavcic
Copy link
Member

Release branch for version 5.6.22-72.0
I have now created annotated tag for this release, please check if it will be merged since we had some problem with lightweight tag when trying for PXB.

laurynas-biveinis added a commit that referenced this pull request Feb 20, 2015
@laurynas-biveinis laurynas-biveinis merged commit a2213fb into percona:5.6 Feb 20, 2015
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Feb 24, 2015
Release 5.6.22-72.0
(cherry picked from commit a2213fb)

Conflicts:
	VERSION
@tplavcic tplavcic deleted the release-5.6.22-72.0 branch March 9, 2015 15:04
akopytov added a commit to akopytov/percona-server that referenced this pull request Jul 28, 2015
https://blueprints.launchpad.net/percona-server/+spec/backup-safe-binlog-info

One inefficiency of the backup locks feature is that even though LOCK
TABLES FOR BACKUP as a light-weight FTWRL alternative does not affect
DML statements updating InnoDB tables, LOCK BINLOG FOR BACKUP does
affect them by blocking commits.

XtraBackup uses LOCK BINLOG FOR BACKUP to:

1. retrieve consistent binary log coordinates with SHOW MASTER
   STATUS. More precisely, binary log coordinates must be consistent
   with the REDO log copy and non-transactional tables. Therefore, no
   updates can be done to non-transactional tables (this is achieved by
   an active LOCK TABLES FOR BACKUP lock), and no commits can be
   performed between SHOW MASTER STATUS and finalizing the redo log
   copy, which is achieved by LOCK BINLOG FOR BACKUP.

2. retrieve consistent master connection information for a replication
   slave. More precisely, the binary log coordinates on the master as
   reported by SHOW SLAVE STATUS must be consistent with the REDO log
   copy, so LOCK BINLOG FOR BACKUP also block the I/O replication thread.

3. For a GTID-enabled PXC node, the last binary log file must be
   included into an SST snapshot. Which is a rather artificial
   limitation on the WSREP side, but still XtraBackup obeys it by
   blocking commits with LOCK BINLOG FOR BACKUP to ensure the integrity
   of the binary log file copy.

Depending on the write rate on the server, finalizing the REDO log copy
may take a long time, so blocking commits for that duration may still
affect server availability considerably.

This task is to make the necessary server-side change to make it
possible for XtraBackup to avoid LOCK BINLOG FOR BINLOG in case percona#1, when
cases percona#2 and percona#3 do not apply, i.e. when no --slave-info is requested by
the XtraBackup options and the server is not a GTID-enabled PXC node.

Lifting limitations for cases percona#2 and percona#3 is also possible, but that is
outside the scope of this task.

The idea of the optimization is that even though InnoDB provides a
transactional storage for the binary log information (i.e. current file
name and offset), it cannot be fully trusted by XtraBackup, because that
information is only updated on an InnoDB commit operation. Which means
if the last operation before LOCK TABLES FOR BACKUP was an update to a
non-transactional storage engine, and no InnoDB commits occur before the
backup is finalized by XtraBackup, the InnoDB system header will contain
stale binary log coordinates.

One way to fix that would be to force binlog coordinates update in the
InnoDB system header on each update, regardless of the involved storage
engine(s). This is what a Galera node does to ensure XID consistency
which is stored in the same way as binary log coordinates: it forces XID
update in the InnoDB system header on each TOI operation, in particular
on each non-transactional update.

Another approach is less invasive: XtraBackup blocks all
non-transactional updates with LOCK TABLES FOR BACKUP anyway, so instead
of having all non-transactional updates flush binlog coordinates to
InnoDB unconditionally, LTFB can be modified to flush (and redo-log) the
current binlog coordinates to InnoDB. In which case binlog coordinates
provided by InnoDB will be consistent with REDO log under any
circumstances.

This patch implements the latter approach.
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Feb 26, 2016
Problem:

The binary log group commit sync is failing when committing a group of
transactions into a non-transactional storage engine while other thread
is rotating the binary log.

Analysis:

The binary log group commit procedure (ordered_commit) acquires LOCK_log
during the #1 stage (flush). As it holds the LOCK_log, a binary log
rotation will have to wait until this flush stage to finish before
actually rotating the binary log.

For the #2 stage (sync), the binary log group commit only holds the
LOCK_log if sync_binlog=1. In this case, the rotation has to wait also
for the sync stage to finish.

When sync_binlog>1, the sync stage releases the LOCK_log (to let other
groups to enter the flush stage), holding only the LOCK_sync. In this
case, the rotation can acquire the LOCK_log in parallel with the sync
stage.

For commits into transactional storage engine, the binary log rotation
checks a counter of "flushed but not yet committed" transactions,
waiting until this counter to be zeroed before closing the current
binary log file.  As the commit of the transactions happen in the #3
stage of the binary log group commit, the sync of the binary log in
stage #2 always succeed.

For commits into non-transactional storage engine, the binary log
rotation is checking the "flushed but not yet committed" transactions
counter, but it is zero because it only counts transactions that
contains XIDs. So, the rotation is allowed to take place in parallel
with the #2 stage of the binary log group commit. When the sync is
called at the same time that the rotation has closed the old binary log
file but didn't open the new file yet, the sync is failing with the
following error: 'Can't sync file 'UNOPENED' to disk (Errcode: 9 - Bad
file descriptor)'.

Fix:

For non-transactional only workload, binary log group commit will keep
the LOCK_log when entering #2 stage (sync) if the current group is
supposed to be synced to the binary log file.
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 9, 2016
laurynas-biveinis referenced this pull request in laurynas-biveinis/percona-server Jul 19, 2016
…hutdown)

On several testcases (i.e. rpl_gtid_mode), LeakSanitizer diagnoses
missed memory deallocation:

=================================================================
==16675==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 21 byte(s) in 1 object(s) allocated from:
    #0 0x7f17748fa54a in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x9854a)
    #1 0xff7f7f in my_malloc /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/mysys/my_malloc.c:38
    #2 0x1634b83 in add_pfs_instr_to_array(char const*, char const*) /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/storage/perfschema/pfs_server.cc:251
    #3 0x58cccf in mysqld_get_one_option /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/mysqld.cc:9198
    #4 0x10256c6 in my_handle_options /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/mysys_ssl/my_getopt.cc:817
    #5 0x1025c63 in handle_options /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/mysys_ssl/my_getopt.cc:308
    #6 0x5963e5 in handle_early_options() /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/mysqld.cc:7263
    #7 0x5a35a3 in mysqld_main(int, char**) /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/mysqld.cc:5613
    #8 0x586aae in main /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/main.cc:25
    #9 0x7f17726cc82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

This class of errors is already attempted to suppress in
valgrind.supp. But these suppressions have been added to work around a
bug of racy PFS shutdown, which is not required anymore as
pfs_instr_config_array is deallocated exactly once since [1]. Thus,
free the elements of this array and remove related suppressions
instead.
percona-ysorokin referenced this pull request in percona-ysorokin/percona-server Aug 22, 2017
In WL-included builds ASAN run witnessed missed ~Query_log_event invocation.
The destruct-or was not called due to the WL's changes in the error propagation
that specifically affect LC MTS.
The failure is exposed in particular by rpl_trigger as the following
stack:

  #0 0x9ecd98 in __interceptor_malloc (/export/home/pb2/test/sb_2-22611026-1489061390.32/mysql-commercial-8.0.1-dmr-linux-x86_64-asan/bin/mysqld+0x9ecd98)
  #1 0x2b1a245 in my_raw_malloc(unsigned long, int) obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:209:12
  #2 0x2b1a245 in my_malloc obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:72
  #3 0x2940590 in Query_log_event::Query_log_event(char const*, unsigned int, binary_log::Format_description_event const*, binary_log::Log_event_type) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:4343:46
  #4 0x293d235 in Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*, bool) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:1686:17
  #5 0x293b96f in Log_event::read_log_event()
  #6 0x2a2a1c9 in next_event(Relay_log_info*)

Previously before the WL
Mts_submode_logical_clock::wait_for_workers_to_finish() had not
returned any error even when Coordinator thread is killed.

The WL patch needed to refine such behavior, but at doing so
it also had to attend log_event.cc::schedule_next_event() to register
an error to follow an existing pattern.
While my_error() does not take place the killed Coordinator continued
scheduling, ineffectively though - no Worker gets engaged (legal case
of deferred scheduling), and without noticing its killed status up to
a point when it resets the event pointer in
apply_event_and_update_pos():

  *ptr_ev= NULL; // announcing the event is passed to w-worker

The reset was intended for an assigned Worker to perform the event
destruction or by Coordinator itself when the event is deferred.
As neither is the current case the event gets unattended for its termination.

In contrast in the pre-WL sources the killed Coordinator does find a Worker.
However such Worker could be already down (errored out and exited), in
which case apply_event_and_update_pos() reasonably returns an error and executes

  delete ev

in exec_relay_log_event() error branch.

**Fixed** with deploying my_error() call in log_event.cc::schedule_next_event()
error branch which fits to the existing pattern.
THD::is_error() has been always checked by Coordinator before any attempt to
reset *ptr_ev= NULL. In the errored case Coordinator does not reset and
destroys the event itself in the exec_relay_log_event() error branch pretty similarly to
how the pre-WL sources do.

Tested against rpl_trigger and rpl suites to pass.

Approved on rb#15667.
percona-ysorokin referenced this pull request in percona-ysorokin/percona-server Aug 22, 2017
Some character sets are designated as MY_CS_STRNXFRM, meaning that sorting
needs to go through my_strnxfrm() (implemented by the charset), and some are
not, meaning that a client can do the strnxfrm itself based on
cs->sort_order. However, most of the logic related to the latter has been
removed already (e.g. filesort always uses my_strnxfrm() since 2003), and now
it's mostly in the way. The three main uses left are:

 1. A microoptimization for constructing sort keys in filesort.
 2. A home-grown implementation of Boyer-Moore for accelerating certain
    LIKE patterns that should probably be handled through FTS.
 3. Some optimizations to MyISAM prefix keys.

Given that our default collation (utf8mb4_0900_ai_ci) now is a strnxfrm-based
collation, the benefits of keeping these around for a narrow range of
single-byte locales (like latin1_swedish_ci, cp850 and a bunch of more
obscure locales) seems dubious. We seemingly can't remove the flag entirely
due to #3 seemingly affecting the on-disk MyISAM structure, but we can remove
the code for #1 and #2.

Change-Id: If974e490d451b7278355e33ab1fca993f446b792
percona-ysorokin referenced this pull request in percona-ysorokin/percona-server Aug 22, 2017
BohuTANG referenced this pull request in xelabs/tokudb Dec 24, 2017
Summary:
In the xa transation 'XA END' phase(thd_sql_command is SQLCOM_END), TokuDB slave will create both transaction for trx->sp_level and trx->stmt, this will cause the toku_xids_can_create_child abort since the trx->sp_level->xids is 0x00.

How to reproduce:
With tokudb_debug=32, do the queries on master:
create table t1(a int)engine=tokudb;

xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';

xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';

Slave debug info:
xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6533 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7ff2d44a3000 67108864 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6426 ha_tokudb::create_txn created master 0x7ff2d44a3000
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn 0x7ff2d44a3000 0x7ff2d44a3100 1 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6468 ha_tokudb::create_txn created stmt 0x7ff2d44a3000 sp_level 0x7ff2d44a3100
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7ff2d44a3100
2123 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7ff2d44a3100 syncflag 512

xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6533 ha_tokudb::external_lock trx 0x7ff2d44a3000 (nil) 0x7ff2d44a3000 (nil) 0 0
2123 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn 0x7ff2d44a3000 0x7ff2d44a3000 1 r=0
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6468 ha_tokudb::create_txn created stmt 0x7ff2d44a3000 sp_level 0x7ff2d44a3000
2123 0x7ff2d44c5830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7ff2d44a3000
2017-12-24T08:36:45.347405Z 11 [ERROR] TokuDB: toku_db_put: Transaction cannot do work when child exists

2017-12-24T08:36:45.347444Z 11 [Warning] Slave: Got error 22 from storage engine Error_code: 1030
2017-12-24T08:36:45.347448Z 11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with 'SLAVE START'. We stopped at log 'mysql-bin.000001' position 1007
2123 /u01/tokudb/storage/tokudb/hatoku_hton.cc:972 tokudb_rollback rollback 0 txn 0x7ff2d44a3000
Segmentation fault (core dumped)

This crash caused by the parent->xid is 0x00.
The core statck info:
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x0000000000f6b647 in my_write_core (sig=sig@entry=11) at /u01/tokudb/mysys/stacktrace.c:249
#2  0x000000000086b945 in handle_fatal_signal (sig=11) at /u01/tokudb/sql/signal_handler.cc:223
#3  <signal handler called>
#4  toku_xids_can_create_child (xids=0x0) at /u01/tokudb/storage/tokudb/PerconaFT/ft/txn/xids.cc:93
#5  0x000000000080531f in toku_txn_begin_with_xid (parent=0x7f0bf501c280, txnp=0x7f0bf50a3490, logger=0x7f0c415e66c0, xid=..., snapshot_type=TXN_SNAPSHOT_CHILD, container_db_txn=0x7f0bf50a3400, for_recovery=false, read_only=false) at /u01/tokudb/storage/tokudb/PerconaFT/ft/txn/txn.cc:137
#6  0x00000000007aa6a2 in toku_txn_begin (env=0x7f0c819fde00, stxn=0x7f0bf50a3300, txn=0x7f0bf500dca8, flags=<optimized out>) at /u01/tokudb/storage/tokudb/PerconaFT/src/ydb_txn.cc:579
#7  0x0000000000f99323 in txn_begin (thd=0x7f0bf504bfc0, flags=1, txn=0x7f0bf500dca8, parent=0x7f0bf50a3300, env=<optimized out>) at /u01/tokudb/storage/tokudb/tokudb_txn.h:116
#8  ha_tokudb::create_txn (this=0x7f0bf50c8830, thd=0x7f0bf504bfc0, trx=0x7f0bf500dca0) at /u01/tokudb/storage/tokudb/ha_tokudb.cc:6458
#9  0x0000000000fa48f9 in ha_tokudb::external_lock (this=0x7f0bf50c8830, thd=0x7f0bf504bfc0, lock_type=1) at /u01/tokudb/storage/tokudb/ha_tokudb.cc:6544
#10 0x00000000008d46eb in handler::ha_external_lock (this=0x7f0bf50c8830, thd=thd@entry=0x7f0bf504bfc0, lock_type=lock_type@entry=1) at /u01/tokudb/sql/handler.cc:8352
#11 0x0000000000e4f3b4 in lock_external (count=1, tables=0x7f0bf5050688, thd=0x7f0bf504bfc0) at /u01/tokudb/sql/lock.cc:389
#12 mysql_lock_tables (thd=thd@entry=0x7f0bf504bfc0, tables=<optimized out>, count=<optimized out>, flags=0) at /u01/tokudb/sql/lock.cc:325
#13 0x0000000000cd0b6d in lock_tables (thd=thd@entry=0x7f0bf504bfc0, tables=0x7f0bf4d11020, count=<optimized out>, flags=flags@entry=0) at /u01/tokudb/sql/sql_base.cc:6705
#14 0x0000000000cd61f2 in open_and_lock_tables (thd=0x7f0bf504bfc0, tables=0x7f0bf4d11020, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f0c89629680) at /u01/tokudb/sql/sql_base.cc:6523
percona#15 0x0000000000ee09eb in open_and_lock_tables (flags=0, tables=<optimized out>, thd=<optimized out>) at /u01/tokudb/sql/sql_base.h:484
percona#16 Rows_log_event::do_apply_event (this=0x7f0bf50ab4a0, rli=0x7f0c87762800) at /u01/tokudb/sql/log_event.cc:10911
percona#17 0x0000000000ed71c0 in Log_event::apply_event (this=this@entry=0x7f0bf50ab4a0, rli=rli@entry=0x7f0c87762800) at /u01/tokudb/sql/log_event.cc:3329
percona#18 0x0000000000f1d233 in apply_event_and_update_pos (ptr_ev=ptr_ev@entry=0x7f0c89629940, thd=thd@entry=0x7f0bf504bfc0, rli=rli@entry=0x7f0c87762800) at /u01/tokudb/sql/rpl_slave.cc:4761
percona#19 0x0000000000f280a8 in exec_relay_log_event (rli=0x7f0c87762800, thd=0x7f0bf504bfc0) at /u01/tokudb/sql/rpl_slave.cc:5276
percona#20 handle_slave_sql (arg=<optimized out>) at /u01/tokudb/sql/rpl_slave.cc:7491
percona#21 0x00000000013c6184 in pfs_spawn_thread (arg=0x7f0bf5bea820) at /u01/tokudb/storage/perfschema/pfs.cc:2185
percona#22 0x00007f0c885126ba in start_thread (arg=0x7f0c8962a700) at pthread_create.c:333
percona#23 0x00007f0c87d293dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) f 10
#10 0x00000000008d46eb in handler::ha_external_lock (this=0x7f0bf50c8830, thd=thd@entry=0x7f0bf504bfc0, lock_type=lock_type@entry=1) at /u01/tokudb/sql/handler.cc:8352
8352    /u01/tokudb/sql/handler.cc: No such file or directory.
(gdb) p thd->lex->sql_command
 = SQLCOM_END

With the fixed patch, the debug info is:
xa start 'x1';
insert into t1 values(1);
xa end 'x1';
xa prepare 'x1';
xa commit 'x1';
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6534 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
24111 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7f4aba689000 67108864 r=0
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6469 ha_tokudb::create_txn created stmt (nil) sp_level 0x7f4aba689000
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7f4aba689000
24111 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7f4aba689000 syncflag 512

xa start 'x2';
insert into t1 values(2);
xa end 'x2';
xa prepare 'x2';
xa commit 'x2';
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6534 ha_tokudb::external_lock trx (nil) (nil) (nil) (nil) 0 0
24111 /u01/tokudb/storage/tokudb/tokudb_txn.h:127 txn_begin begin txn (nil) 0x7f4aba689000 67108864 r=0
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:6469 ha_tokudb::create_txn created stmt (nil) sp_level 0x7f4aba689000
24111 0x7f4aba6c4830 /u01/tokudb/storage/tokudb/ha_tokudb.cc:4120 ha_tokudb::write_row txn 0x7f4aba689000
24111 /u01/tokudb/storage/tokudb/hatoku_hton.cc:942 tokudb_commit commit trx 0 txn 0x7f4aba689000 syncflag 512

Test:
mtr --suite=tokudb xa

Reviewed by: Rik
prohaska7 referenced this pull request in xelabs/tokudb Jan 5, 2018
…ecuting global constructors (before main gets called). the assert catches global mutex initialization which seems to be problematic. since tokudb has a global mutex (open_tables_mutex) and tokudb is statically linked into mysqld (plugin type mandatory), the assert fires.

we could:
1. remove the assert, or
2. rewrite tokudb to remove the open_tables_mutex, or
3. compile tokudb into a shared library.

this commit compiles tokudb into a shared library for debug builds.

GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/mysqld...done.
[New LWP 24606]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `bin/mysqld --initialize'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f4498d8df5d in __GI_abort () at abort.c:90
#2  0x00007f4498d83f17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5581f567ed23 "safe_mutex_inited",
    file=file@entry=0x5581f567ecf0 "/home/rfp/projects/xelabs-server/mysys/thr_mutex.c", line=line@entry=39,
    function=function@entry=0x5581f567f040 <__PRETTY_FUNCTION__.6977> "safe_mutex_init") at assert.c:92
#3  0x00007f4498d83fc2 in __GI___assert_fail (assertion=0x5581f567ed23 "safe_mutex_inited", file=0x5581f567ecf0 "/home/rfp/projects/xelabs-server/mysys/thr_mutex.c", line=39,
    function=0x5581f567f040 <__PRETTY_FUNCTION__.6977> "safe_mutex_init") at assert.c:101
#4  0x00005581f4ab1a40 in safe_mutex_init (mp=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", line=207) at /home/rfp/projects/xelabs-server/mysys/thr_mutex.c:39
#5  0x00005581f500aa9f in my_mutex_init (mp=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", line=207) at /home/rfp/projects/xelabs-server/include/thr_mutex.h:167
#6  0x00005581f500ac25 in inline_mysql_mutex_init (key=4294967295, that=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    src_file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", src_line=207)
    at /home/rfp/projects/xelabs-server/include/mysql/psi/mysql_thread.h:668
#7  0x00005581f5043ee5 in tokudb::thread::mutex_t::mutex_t (this=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, key=4294967295)
    at /home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h:207
#8  0x00005581f5043e68 in tokudb::thread::mutex_t::mutex_t (this=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>)
    at /home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h:45
#9  0x00005581f50433a4 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/rfp/projects/xelabs-server/storage/tokudb/ha_tokudb.cc:38
#10 0x00005581f5043936 in _GLOBAL__sub_I__ZN6tokudb8metadata4readEP9__toku_dbP13__toku_db_txnyPvmPm () at /home/rfp/projects/xelabs-server/storage/tokudb/ha_tokudb.cc:9061
#11 0x00005581f529aefd in __libc_csu_init ()
#12 0x00007f4498d76150 in __libc_start_main (main=0x5581f4025aea <main(int, char**)>, argc=2, argv=0x7ffe9dcd7118, init=0x5581f529aeb0 <__libc_csu_init>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe9dcd7108) at ../csu/libc-start.c:264
#13 0x00005581f4025a0a in _start ()
(gdb) q
prohaska7 referenced this pull request in xelabs/tokudb Jan 8, 2018
…ecuting global constructors (before main gets called). the assert catches global mutex initialization which seems to be problematic. since tokudb has a global mutex (open_tables_mutex) and tokudb is statically linked into mysqld (plugin type mandatory), the assert fires.

we could:
1. remove the assert, or
2. rewrite tokudb to remove the open_tables_mutex, or
3. compile tokudb into a shared library.

this commit compiles tokudb into a shared library for debug builds.

GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/mysqld...done.
[New LWP 24606]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `bin/mysqld --initialize'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f4498d8df5d in __GI_abort () at abort.c:90
#2  0x00007f4498d83f17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5581f567ed23 "safe_mutex_inited",
    file=file@entry=0x5581f567ecf0 "/home/rfp/projects/xelabs-server/mysys/thr_mutex.c", line=line@entry=39,
    function=function@entry=0x5581f567f040 <__PRETTY_FUNCTION__.6977> "safe_mutex_init") at assert.c:92
#3  0x00007f4498d83fc2 in __GI___assert_fail (assertion=0x5581f567ed23 "safe_mutex_inited", file=0x5581f567ecf0 "/home/rfp/projects/xelabs-server/mysys/thr_mutex.c", line=39,
    function=0x5581f567f040 <__PRETTY_FUNCTION__.6977> "safe_mutex_init") at assert.c:101
#4  0x00005581f4ab1a40 in safe_mutex_init (mp=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", line=207) at /home/rfp/projects/xelabs-server/mysys/thr_mutex.c:39
#5  0x00005581f500aa9f in my_mutex_init (mp=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", line=207) at /home/rfp/projects/xelabs-server/include/thr_mutex.h:167
#6  0x00005581f500ac25 in inline_mysql_mutex_init (key=4294967295, that=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, attr=0x5581f6336550 <my_fast_mutexattr>,
    src_file=0x5581f57ac188 "/home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h", src_line=207)
    at /home/rfp/projects/xelabs-server/include/mysql/psi/mysql_thread.h:668
#7  0x00005581f5043ee5 in tokudb::thread::mutex_t::mutex_t (this=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>, key=4294967295)
    at /home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h:207
#8  0x00005581f5043e68 in tokudb::thread::mutex_t::mutex_t (this=0x5581f632b8e0 <TOKUDB_SHARE::_open_tables_mutex>)
    at /home/rfp/projects/xelabs-server/storage/tokudb/tokudb_thread.h:45
#9  0x00005581f50433a4 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/rfp/projects/xelabs-server/storage/tokudb/ha_tokudb.cc:38
#10 0x00005581f5043936 in _GLOBAL__sub_I__ZN6tokudb8metadata4readEP9__toku_dbP13__toku_db_txnyPvmPm () at /home/rfp/projects/xelabs-server/storage/tokudb/ha_tokudb.cc:9061
#11 0x00005581f529aefd in __libc_csu_init ()
#12 0x00007f4498d76150 in __libc_start_main (main=0x5581f4025aea <main(int, char**)>, argc=2, argv=0x7ffe9dcd7118, init=0x5581f529aeb0 <__libc_csu_init>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe9dcd7108) at ../csu/libc-start.c:264
#13 0x00005581f4025a0a in _start ()
(gdb) q
BohuTANG referenced this pull request in xelabs/tokudb Jan 18, 2018
TokuDB will be crashed during shutdown due to PFS key double-free.

(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x0000000000f6b3e7 in my_write_core (sig=sig@entry=11) at /u01/tokudb/mysys/stacktrace.c:249
#2  0x000000000086b6e5 in handle_fatal_signal (sig=11) at /u01/tokudb/sql/signal_handler.cc:223
#3  <signal handler called>
#4  destroy_mutex (pfs=0x7f8fb71c9900) at /u01/tokudb/storage/perfschema/pfs_instr.cc:327
#5  0x00000000013c574a in pfs_destroy_mutex_v1 (mutex=<optimized out>) at /u01/tokudb/storage/perfschema/pfs.cc:1833
#6  0x0000000000fc350a in inline_mysql_mutex_destroy (that=0x1f84ea0 <tokudb_map_mutex>) at /u01/tokudb/include/mysql/psi/mysql_thread.h:681
#7  tokudb::thread::mutex_t::~mutex_t (this=0x1f84ea0 <tokudb_map_mutex>, __in_chrg=<optimized out>) at /u01/tokudb/storage/tokudb/tokudb_thread.h:214
#8  0x00007f8fbb38cff8 in _run_exit_handlers (status=status@entry=0, listp=0x7f8fbb7175f8 <_exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#9  0x00007f8fbb38d045 in __GI_exit (status=status@entry=0) at exit.c:104
#10 0x000000000085b7b5 in mysqld_exit (exit_code=exit_code@entry=0) at /u01/tokudb/sql/mysqld.cc:1205
#11 0x0000000000865fa6 in mysqld_main (argc=46, argv=0x7f8fbaeb3088) at /u01/tokudb/sql/mysqld.cc:5430
#12 0x00007f8fbb373830 in __libc_start_main (main=0x78d700 <main(int, char**)>, argc=10, argv=0x7ffd36ab1a28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd36ab1a18) at ../csu/libc-start.c:291
#13 0x00000000007a7b79 in _start ()

And the AddressSanitizer errors:
==27219==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f009b12d118 at pc 0x00000265d80b bp 0x7ffd3bb7ffb0 sp 0x7ffd3bb7ffa0
READ of size 8 at 0x7f009b12d118 thread T0
#0 0x265d80a in destroy_mutex(PFS_mutex*) /u01/tokudb/storage/perfschema/pfs_instr.cc:323
#1 0x1cfaef3 in inline_mysql_mutex_destroy /u01/tokudb/include/mysql/psi/mysql_thread.h:681
#2 0x1cfaef3 in tokudb::thread::mutex_t::~mutex_t() /u01/tokudb/storage/tokudb/tokudb_thread.h:214
#3 0x7f009d512ff7 (/lib/x86_64-linux-gnu/libc.so.6+0x39ff7)
#4 0x7f009d513044 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x3a044)
#5 0x9a8239 in mysqld_exit /u01/tokudb/sql/mysqld.cc:1205
#6 0x9b210f in unireg_abort /u01/tokudb/sql/mysqld.cc:1175
#7 0x9b629e in init_server_components /u01/tokudb/sql/mysqld.cc:4509
#8 0x9b8aca in mysqld_main(int, char**) /u01/tokudb/sql/mysqld.cc:5001
#9 0x7f009d4f982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#10 0x7e8308 in _start (/home/ubuntu/mysql_20161216/bin/mysqld+0x7e8308)
vlad-lesin pushed a commit to vlad-lesin/percona-server that referenced this pull request May 10, 2018
…TABLE_UPGRADE_GUARD

To repeat: cmake -DWITH_ASAN=1 -DWITH_ASAN_SCOPE=1
./mtr --mem --sanitize main.dd_upgrade_error

A few dd tests fail with:
==26861==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7000063bf5e8 at pc 0x00010d4dbe8b bp 0x7000063bda40 sp 0x7000063bda38
READ of size 8 at 0x7000063bf5e8 thread T2
    #0 0x10d4dbe8a in Prealloced_array<st_plugin_int**, 16ul>::empty() const prealloced_array.h:186
    #1 0x10d406a8b in lex_end(LEX*) sql_lex.cc:560
    percona#2 0x10dae4b6d in dd::upgrade::Table_upgrade_guard::~Table_upgrade_guard() (mysqld:x86_64+0x100f87b6d)
    percona#3 0x10dadc557 in dd::upgrade::migrate_table_to_dd(THD*, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, bool) (mysqld:x86_64+0x100f7f557)
    percona#4 0x10dad7e85 in dd::upgrade::migrate_plugin_table_to_dd(THD*) (mysqld:x86_64+0x100f7ae85)
    percona#5 0x10daec6a1 in dd::upgrade::do_pre_checks_and_initialize_dd(THD*) upgrade.cc:1216
    percona#6 0x10cd0a5c0 in bootstrap::handle_bootstrap(void*) bootstrap.cc:336

Change-Id: I265ec6dd97ee8076aaf03763840c0cdf9e20325b
Fix: increase lifetime of 'LEX lex;' which is used by 'table_guard'
laurynas-biveinis added a commit that referenced this pull request Aug 27, 2018
A subset of binlog encryption tests was crashing with:

* thread #39, stop reason = signal SIGSTOP
    frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278
    frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254
    frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1
    frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93
    frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70
    frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180
    frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267
  * frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113
    frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128
    frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756
    frame #13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382
    frame #14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836
    frame #15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340
    frame #16: 0x00007fff5622b50d libsystem_pthread.dylib`_pthread_start + 377
    frame #17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13

This was caused by my_b_append_tell trying to lock a nullptr
IO_CACHE::append_buffer_lock. The lock was nullptr, because it's only
initialized for SEQ_READ_APPEND IO_CACHEs, whereas we have
WRITE_CACHE. This mismatch was introduced by WL#8599 [1] changing the
IO_CACHE type from the former to the latter.

Fix by using the correct API for the new IO_CACHE type: my_b_tell
instead of my_b_append_tell.

[1]:

commit dbd2ca2
Author: Joao Gramacho <[email protected]>
Date:   Tue Nov 1 06:45:39 2016 +0000

    WL#8599: Reduce contention in IO and SQL threads
    (...)
laurynas-biveinis added a commit that referenced this pull request Sep 6, 2018
create_table_info_t::create_table_def leaked memory in the case
enable_encryption(table) call failed:

worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
 binlog_encryption.binlog_encryption_without_keyring group_replication.gr_change_master_hidden group_replication.gr_server_uuid_matches_group_name group_replication.gr_perfschema_connect_status group_replication.gr_single_primary_and_leader_election_on_error group_replication.gr_without_perfschema rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1136 byte(s) in 1 object(s) allocated from:
    #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
    #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
    #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
    #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
    #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
    #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
    #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
    #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
    #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
    #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
    #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
    #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
    #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
    #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
    #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
    #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
    #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
    #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
    #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
    #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
    #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

Fix by adding the missing mem_heap_free(heap) call.
laurynas-biveinis added a commit that referenced this pull request Sep 7, 2018
Avoid undefined behavior in audit_log_update_thd_local by avoiding
passing NULL as source pointer to memcpy, even with zero length.

The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
inikep pushed a commit that referenced this pull request Oct 10, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
inikep pushed a commit that referenced this pull request Oct 10, 2024
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102

Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:

```
 // bitmap is cleared on index merge, but it still needs to decode columns
    bool field_requested =
        decode_all_fields || m_verify_row_debug_checksums ||
        bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.

Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).

We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.

In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.

Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.

Reference Patch: facebook/mysql-5.6@44d6a8d

---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.

Reviewed By: Pushapgl

Differential Revision: D26265470

fbshipit-source-id: f142be681ab
inikep added a commit that referenced this pull request Oct 10, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep pushed a commit that referenced this pull request Oct 10, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.
inikep pushed a commit that referenced this pull request Oct 10, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 percona#2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 percona#3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 percona#4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 percona#5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 percona#6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow percona#3: fetch ttl value from ttl column

The change is to merge #1 and percona#3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102

Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:

```
 // bitmap is cleared on index merge, but it still needs to decode columns
    bool field_requested =
        decode_all_fields || m_verify_row_debug_checksums ||
        bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.

Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).

We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.

In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.

Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.

Reference Patch: facebook/mysql-5.6@44d6a8d

---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.

Reviewed By: Pushapgl

Differential Revision: D26265470

fbshipit-source-id: f142be681ab
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    percona#2  ___pthread_mutex_lock
    percona#3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    percona#4  Commit_stage_manager::enroll_for
    percona#5  MYSQL_BIN_LOG::change_stage
    percona#6  MYSQL_BIN_LOG::ordered_commit
    percona#7  MYSQL_BIN_LOG::commit
    percona#8  ha_commit_trans
    percona#9  trans_commit_implicit
    percona#10 mysql_create_like_table
    percona#11 Sql_cmd_create_table::execute
    percona#12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    percona#2  native_mutex_lock
    percona#3  safe_mutex_lock
    percona#4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    percona#5  Gtid_state::update_commit_group
    percona#6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    percona#7  Commit_order_manager::finish
    percona#8  Commit_order_manager::wait_and_finish
    percona#9  ha_commit_low
    percona#10 trx_coordinator::commit_in_engines
    percona#11 MYSQL_BIN_LOG::commit
    percona#12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    percona#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    percona#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    percona#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    percona#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    percona#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    percona#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    percona#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    percona#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    percona#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    percona#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    percona#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    percona#2 in BIO_read ()
    percona#3 in ssl23_read_bytes ()
    percona#4 in ssl23_get_client_hello ()
    percona#5 in ssl23_accept ()
    percona#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 percona#2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 percona#3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 percona#4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 percona#5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 percona#6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow percona#3: fetch ttl value from ttl column

The change is to merge #1 and percona#3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102

Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:

```
 // bitmap is cleared on index merge, but it still needs to decode columns
    bool field_requested =
        decode_all_fields || m_verify_row_debug_checksums ||
        bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.

Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).

We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.

In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.

Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.

Reference Patch: facebook/mysql-5.6@44d6a8d

---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.

Reviewed By: Pushapgl

Differential Revision: D26265470

fbshipit-source-id: f142be681ab
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    percona#2  ___pthread_mutex_lock
    percona#3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    percona#4  Commit_stage_manager::enroll_for
    percona#5  MYSQL_BIN_LOG::change_stage
    percona#6  MYSQL_BIN_LOG::ordered_commit
    percona#7  MYSQL_BIN_LOG::commit
    percona#8  ha_commit_trans
    percona#9  trans_commit_implicit
    percona#10 mysql_create_like_table
    percona#11 Sql_cmd_create_table::execute
    percona#12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    percona#2  native_mutex_lock
    percona#3  safe_mutex_lock
    percona#4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    percona#5  Gtid_state::update_commit_group
    percona#6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    percona#7  Commit_order_manager::finish
    percona#8  Commit_order_manager::wait_and_finish
    percona#9  ha_commit_low
    percona#10 trx_coordinator::commit_in_engines
    percona#11 MYSQL_BIN_LOG::commit
    percona#12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    percona#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    percona#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    percona#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    percona#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    percona#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    percona#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    percona#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    percona#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    percona#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    percona#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    percona#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 17, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    percona#2 in BIO_read ()
    percona#3 in ssl23_read_bytes ()
    percona#4 in ssl23_get_client_hello ()
    percona#5 in ssl23_accept ()
    percona#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 22, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 percona#2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 percona#3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 percona#4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 percona#5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 percona#6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 22, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow percona#3: fetch ttl value from ttl column

The change is to merge #1 and percona#3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 22, 2024
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102

Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:

```
 // bitmap is cleared on index merge, but it still needs to decode columns
    bool field_requested =
        decode_all_fields || m_verify_row_debug_checksums ||
        bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.

Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).

We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.

In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of percona#3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.

Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.

Reference Patch: facebook/mysql-5.6@44d6a8d

---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.

Reviewed By: Pushapgl

Differential Revision: D26265470

fbshipit-source-id: f142be681ab
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 22, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    percona#2  ___pthread_mutex_lock
    percona#3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    percona#4  Commit_stage_manager::enroll_for
    percona#5  MYSQL_BIN_LOG::change_stage
    percona#6  MYSQL_BIN_LOG::ordered_commit
    percona#7  MYSQL_BIN_LOG::commit
    percona#8  ha_commit_trans
    percona#9  trans_commit_implicit
    percona#10 mysql_create_like_table
    percona#11 Sql_cmd_create_table::execute
    percona#12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    percona#2  native_mutex_lock
    percona#3  safe_mutex_lock
    percona#4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    percona#5  Gtid_state::update_commit_group
    percona#6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    percona#7  Commit_order_manager::finish
    percona#8  Commit_order_manager::wait_and_finish
    percona#9  ha_commit_low
    percona#10 trx_coordinator::commit_in_engines
    percona#11 MYSQL_BIN_LOG::commit
    percona#12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 22, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    percona#2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    percona#3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    percona#4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    percona#5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    percona#6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    percona#7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    percona#8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    percona#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    percona#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    percona#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    percona#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
dlenev pushed a commit to dlenev/percona-server that referenced this pull request Oct 22, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    percona#2 in BIO_read ()
    percona#3 in ssl23_read_bytes ()
    percona#4 in ssl23_get_client_hello ()
    percona#5 in ssl23_accept ()
    percona#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.

----------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix terminology in replication tests)

https://perconadev.atlassian.net/browse/PS-9218

mysql/mysql-server@44a77b5
inikep pushed a commit that referenced this pull request Oct 30, 2024
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
inikep pushed a commit that referenced this pull request Oct 30, 2024
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6
PS-6865 : Merge fb-prod201903

Summary:
Currently during primary key's value encode, its ttl value can be from either
one of these 3 cases
1. ttl column in primary key
2. non-ttl column
   a. old record(update case)
   b. current timestamp
3. ttl column in non-key field

Workflow #1: first in Rdb_key_def::pack_record() find and
store pk_offset, then in value encode try to parse key slice to fetch ttl
value by using pk_offset.

Workflow #3: fetch ttl value from ttl column

The change is to merge #1 and #3 by always fetching TTL value from ttl column,
not matter whether the ttl column is in primary key or not. Of course, remove
pk_offset, since it isn't used.

BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is
stored by primary value encoding.

Reviewed By: yizhang82

Differential Revision: D14662716

fbshipit-source-id: 6b4e5f044fd
inikep pushed a commit that referenced this pull request Oct 30, 2024
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f
PS-7731 : Merge percona-202102

Summary:
Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty:

```
 // bitmap is cleared on index merge, but it still needs to decode columns
    bool field_requested =
        decode_all_fields || m_verify_row_debug_checksums ||
        bitmap_is_set(field_map, m_table->field[i]->field_index);
```
As a result MyRocks is significantly slower than InnoDB in this particular scenario.

Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*).

We have a few options to address this:
1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge.
2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false).
3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly.

In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init.

Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly.

Reference Patch: facebook/mysql-5.6@44d6a8d

---------
Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra.

Reviewed By: Pushapgl

Differential Revision: D26265470

fbshipit-source-id: f142be681ab
inikep added a commit that referenced this pull request Oct 30, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9369: Fix currently processed query comparison in audit_log

https://perconadev.atlassian.net/browse/PS-9369

The audit_log uses stack to keep track of table access operations being
performed in scope of one query. It compares last known table access query
string stored on top of this stack with actual query in audit event being
processed at the moment to decide if new record should be pushed to stack
or it is time to clean records from the stack.

Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result plugin looses
control of the stack size and it starts growing with the time consuming
memory. This issue is not noticable on short term server connections
as memory is freed once connection is closed. At the same time this
leads to extra memory consumption for long running server connections.

The following is done to fix the issue:
- Query is sent along with audit event as MYSQL_LEX_CSTRING structure.
  It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as
  sometimes MYSQL_LEX_CSTRING.str pointer may be not iniialised
  properly. Added string length check to make sure structure contains
  any valid string.
- Used strncmp to compare actual strings instead of comparing char*
  variables.
inikep pushed a commit that referenced this pull request Oct 30, 2024
…n read() syscall over network

https://jira.percona.com/browse/PS-8592

Description
-----------
GR suffered from problems caused by the security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but poses a serious threat when another member tries to join
the cluster by initialting a connection to the member which is affected by
external processes using the port dedicated for group communication for longer
durations.

On such activites by external processes, the SSL enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled in the above path forever, it prohibited other members
to join the cluster resulting in the following messages on the joiner server's
logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configred with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept()  with timeout defined by
   the group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configred with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.

-------------------------------------------------------------------------------

PS-8844: Fix the failing main.mysqldump_gtid_purged

https://jira.percona.com/browse/PS-8844

This patch fixes the test failure of main.mysqldump_gtid_purged that
failed due to the uninitialized variable $redirect_stderr in the
start_proc_in_background.inc.
inikep pushed a commit that referenced this pull request Oct 30, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep pushed a commit that referenced this pull request Nov 18, 2024
This is a combination of 5 commits.

This is the 1st commit message:

WL#15746: TLS Enhancements for HeatWave-AutoML & Dask Comm. Upgrade

Problem:
--------
- HeatWave-AutoML communication was unauthenticated, unauthorized,
  and unencrypted.
- Dask communication utilized TCP, not aligning with FedRamp
  guidelines.

Solution:
---------
- Introduced TLS and mTLS in HeatWave-AutoML's plugin and driver for
  authentication, authorization, and encryption.
- Applied TLS to Dask to ensure authentication, encryption, and
  authorization.

Dask Authorization (OCID-based):
--------------------------------
1. For each DBsystem:
    - MySQL node sends OCIDs of authorized nodes to the head driver
      via:
        a. rapid_net_nodes
        b. rapid_net_allowed_ocids (older API, mainly for MTR tests)
    - Scenarios:
        a. All OCIDs provided: Dask authorizes.
        b. Any OCID absent: ML call fails with message.
2. During Dask worker registration to the Dask scheduler, a script is
    dispatched to the Dask worker for execution, retrieving the worker's
    OCID for authorization purposes.
    - If the OCID isn't approved, the connection is denied, terminating
      the worker and causing the ML query to fail.
3. For every Dask worker (both as listener and connector), an OCID-
    based authorization is performed post SSL/TLS connection handshake.
    The process compares the OCID from the peer's certificate against
    the allowed_ocids received from the HeatWave-AutoML MySQL plugin.

HWAML Plugin Changes:
---------------------
- Sourced certificate data and SSL setup from disk, incorporating
  SSL/TLS for HWAML.
- Reused "keystore" variable to specify disk location for
  certificate retrieval.
- Certificates and keys expected in PKCS12 format.
- Introduced "have_ml_encryption" variable (default=0).
    > Acts as a switch to explicitly deactivate HWAML network
      encryption, akin to "disable_net_encryption" affecting
      network encryption for HeatWave. Set to 1 to enable.
- Introduced a customized verifier function for verify_callback to
  be set in SSL_CTX_set_verify and used in the handshake process
  of SSL/TLS. The customized verifier function will perform
  instance id (OCID) based authorization on the plugin side during
  standard SSL/TLS handshake process.
- CRL (Certificate Revocation List) checks are also conducted if CRL
  Distribution Points are present and accessible in the provided
  certificate.

HWAML Driver Changes & OCID-based Authorization:
------------------------------------------------
- Introduced "enable_encryption" (default=0).
    > Set to 1 to enable encryption.
- When receiving a new connection request and encryption is on, the
  driver performs OCID-based self-checking, comparing OCID retrieved
  from its own instance principal with the OCID in the
  provided certificate on disk.
- The driver compares OCID from "mysql_compute_id" and extracted OCID
  from mTLS certificate during connection.
- Introduced "cert_dir" argument for certificate directory
  specification.
- Expected files: cert_chain.pem, certificate.pem, private_key.pem.
    > OCID should be in the userID (UID) or CN field of the
      certificate.pem subject.
- CRL (Certificate Revocation List) checks are also conducted post
  handshake, if CRL Distribution Points are present and accessible in
  the provided certificate, alongside OCID authorization.

Encryption Behavior:
--------------------
- If encryption is deactivated on both plugin and driver side, HWAML
  will work without encryption as it was before this commit.

Enabling Encryption:
--------------------
- By default, "have_ml_encryption" and "enable_encryption" are set to 0
    > Encryption is disabled by default.
- For the HWAML plugin:
    > "have_ml_encryption" set to 1 (default is 0).
    > Specify the .pfx file's path using the "keystore".
- For the HWAML Driver:
    > "enable_encryption" set to 1 (default is 0)
    > Specify "mysql_instance_id" and "cert_dir".

Testing:
--------
- MTR has been modified for the encryption setup.
    > Runs with encryption if "OCI_INSTANCE_ID" is set to a valid
      value.
- On OCI (when "OLRAPID_KEYSTORE" is not set):
    > Certificates and keys are generated; PEMs for driver and PKCS12
      for plugin.
- On AWS (when "OLRAPID_KEYSTORE" is set as the path to PKCS12
  keystore files):
    > PEM files are extracted from the provided PKCS12 and used for
      the driver. The plugin uses the provided PKCS12 keystore file.

Change-Id: I553ca135241e03484db6debbe186e6d34d582bf4

This is the commit message #2:

WL#15746 - Adding ML encryption support to BM

Enabling ML encryption on Baumeister:
- Certificates are generated on MySQLd during initialization
- Needed certicates for workers are packaged and sent to worker nodes
- Workers use packaged files to generate their certificates
- Arguments are added to driver.py invoke
- Keystore path is added to mysql config

Change-Id: I11a5cc5926488ff4fbf91bb6c10a091358db7dc9

This is the commit message #3:

WL#15746: Enhanced CRL Daemon Checker

Issue
=====
The previous design assumed a plain HTTPS link for the CRL distribution
point, accessible to all. This assumption no longer holds, as public
accessibility for CRL distribution contradicts OCI guidelines. Now, the
CRL distribution point in certificates provided by the control plane is
expected to be protected by OCI Instance Principal Authentication.
However, using this authentication method introduces a delay of several
seconds, which is impractical for HeatWave-AutoML.

Solution
========
The CRL fetching code now uses OCI Instance Principal Authentication.
To mitigate performance issues, the CRL checking process has been
redesigned. Instead of enforcing CRL checks per connection in MySQL
Plugin and HeatWave-AutoML Driver communications, a daemon thread in
HeatWave-AutoML Driver, Dask scheduler, and Dask Worker now periodically
fetches and verifies the CRL against all active connections. This
separation minimizes performance impacts. Consequently, MySQL Plugin's
CRL checks have been removed, as checks in the Driver, Scheduler, and
Worker sufficiently cover all cluster nodes.

Changes
=======
- Implemented CRL checker as a daemon thread in Driver, Scheduler, and
  Worker.
- Each connection/socket has an associated CRL checker.
- CRL checks occur periodically at set intervals.
- Skips CRL check if the CRL is temporarily unavailable.
- Failing a CRL check results in the associated connection/socket being
  closed. On the Driver, a stop event is triggered (akin to CTRL-C).

Change-Id: Id998cfe9e15d9236291b0ae420d65c2197837966

This is the commit message #4:

WL#15746: Fix Dask workers being shutdown without releasing address

Issue
=====
Dask workers getting shutting but not releasing the address used
properly sometimes.

Solution
========
Reverted some changes in heatwave_cluster.py in dask worker shutdown
function. Hopefully this will fix the address issue

Change-Id: I5a6749b5a25b0ccb73ba7369e545bc010da1b84f

This is the commit message #5:

WL#15746: Implement Dask Worker Join Timeout for Head Node

Issue:
======
In the cluster_shutdown method, the join operation on the head node's
worker process lacked a timeout. This led to potential indefinite
waiting and eventual hanging of the head node.

Solution:
=========
A timeout has been introduced for the worker process join on the head
node. Unlike non-head nodes, which rely on worker join to complete Dask
tasks and cannot have a timeout, the head node can safely implement
this. Now, if the worker process on the head node fails to shut down
post-join, indicating a timeout, it will be manually terminated. This
ensures proper release of associated resources and prevents hanging of
the head node.

Additional Change:
==================
Added Cert Rotation Guard for DASK clusters. This feature initiates on
the first plugin-driver connection when the DASK cluster is off,
recording the certificate's expiry date. During driver idle times,
it checks the current cert's expiry against this date. If it detects a
change, indicating a certificate rotation, it shuts down the DASK
cluster. The cluster restarts on the next connection request, ensuring
the use of the latest certificate.

Change-Id: Ie63a2e2b7664e05e1622d8bd6503663e13fa73cb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants