FB8-267: rocksdb.delete_ignore, #5

kamil-holubicki · 2021-11-05T10:06:44Z

rocksdb.update_ignore fail in 8.0.23 after porting
from 8.0.20

https://jira.percona.com/browse/FB8-268

This is related to the upstream bug:
https://bugs.mysql.com/bug.php?id=100352

Problem:
There is a wrong assumption in the test that
WHERE field != (SELECT <multirow_result>)
is the same as
WHERE field NOT IN (SELECT <multirow_result>)

Fix:
Replace != with NOT IN and re-record the test result.

Summary: wl5222_debug_zip usually times out; making it a big test should help. Reviewed By: Pushapgl Differential Revision: D26379752 fbshipit-source-id: 4e3ea153dbd

Summary: Fix to avoid sign extension when the value_length is a large number (such that the MSB is set). This happens in a config change event when the length of the config change string is large (causing the MSB in the 16-bit unsigned integer to be set). While reading this back we have to ensure the conversion is done without sign extension. For example, a config change string of length 45460 bytes (0xb194) gets sign extended to 65428 bytes (0xff94). Reviewed By: anirbanr-fb Differential Revision: D26496395 fbshipit-source-id: 18be3f468d2
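The arithmetic in the example above can be reproduced with a small sketch. It assumes the bug comes from reading the two length bytes through a signed char type before combining them; the commit message does not show the actual code, and the function names `read_u16_buggy` and `read_u16_fixed` are hypothetical.

```python
def read_u16_buggy(buf: bytes) -> int:
    """Combine two little-endian bytes, treating each as a signed char.

    Assumption: this mirrors the faulty read path. A low byte with its
    top bit set becomes negative and smears 1-bits over the high byte.
    """
    lo = int.from_bytes(buf[0:1], "little", signed=True)
    hi = int.from_bytes(buf[1:2], "little", signed=True)
    return (lo | (hi << 8)) & 0xFFFF


def read_u16_fixed(buf: bytes) -> int:
    """Combine two little-endian bytes as unsigned values."""
    return buf[0] | (buf[1] << 8)


# 45460 == 0xb194, stored little-endian as the bytes 0x94 0xb1.
encoded = (45460).to_bytes(2, "little")
buggy = read_u16_buggy(encoded)
fixed = read_u16_fixed(encoded)
```

Here `buggy` evaluates to 65428 (0xff94) and `fixed` to 45460 (0xb194), matching the numbers quoted in the summary.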

fbshipit-source-id: 9acffe3e091

Summary: The change removes interlocked locking of LOCK_log, LOCK_sync and LOCK_commit that was added due to the START TRANSACTION WITH CONSISTENT SNAPSHOT implementation. START TRANSACTION WITH CONSISTENT SNAPSHOT requires an accurate binlog position wrt committed transactions. Hence it drained all the transactions by taking LOCK_log, LOCK_sync and LOCK_commit. This is not strictly needed. It has caused deadlock issues for us several times in the past, and although we've been able to break the deadlock chain in other places, it seems like a better solution is just to remove this interlocked locking, which potentially decreases throughput anyway. The fix takes LOCK_log, which prevents new transactions from sneaking into the binlog, and then waits for all committing transactions to complete. Binlog rotation already has this mechanism and the fix leverages the same code used by binlog rotation. Differential Revision: D21149796 fbshipit-source-id: 6ebc6ad0e38

Summary: 8.0.20 has support for zstd on the replication channel, but FB's 8.0.17 and 5.6.35 use the compression_lib connection attribute to indicate which compression type to use. Add backwards compatibility so that if the attribute is present, then it will be used to determine compression type. Once everything is upgraded to 8.0.20, then this patch can be dropped. 8.0.20 also supports specifying the zstd compression level from the client. However, the 8.0.17 implementation used the compression level on the server, so client requested compression level is ignored if the compression_lib attribute is specified (i.e. this patch will enforce using all compression level settings from the server). While 8.0.20 allows selecting individual compression types through the client flags and enabling/disabling server support for some compression types, for backwards compatibility, this has been reduced to either supporting all types or supporting none of the compression types. Certain tests needed to be rewritten and re-recorded for this change. The main changes are as follows: 1. Modify server handshake to include CLIENT_COMPRESS if any compression type is supported. 2. Modify client handshake reply to include the compression_lib attribute to indicate the type of compression supported if compression is requested. The CLIENT_COMPRESS bit is only set if the server does not advertise any of the new compression settings. 3. Modify the server parsing of the handshake to use the compression_lib attribute to designate the compression type. If the compression_lib attribute is not present, then fall back to normal compression negotiation. Reviewed By: lth Differential Revision: D26472664 fbshipit-source-id: 53d33318f77
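The server-side decision described in step 3 can be sketched roughly as follows. This is only an illustrative model: the function `resolve_compression` and its parameters are hypothetical, and only the `compression_lib` attribute name comes from the commit message.

```python
from typing import Optional, Tuple


def resolve_compression(
    conn_attrs: dict,
    negotiated_algo: Optional[str],
    client_level: Optional[int],
    server_level: int,
) -> Tuple[Optional[str], Optional[int]]:
    """Pick the compression type and level for a connection (sketch).

    If the legacy compression_lib attribute is present it decides the
    type, and the server-configured level is enforced. Otherwise the
    result of the normal negotiation is used unchanged.
    """
    legacy = conn_attrs.get("compression_lib")
    if legacy is not None:
        # Legacy path: the client-requested level is ignored.
        return legacy, server_level
    # No attribute: fall back to normal compression negotiation.
    return negotiated_algo, client_level


legacy_choice = resolve_compression({"compression_lib": "zstd"}, "zlib", 9, 3)
modern_choice = resolve_compression({}, "zlib", 9, 3)
```

With these inputs the legacy client ends up with zstd at the server's level 3, while the client without the attribute keeps its negotiated zlib at level 9.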

Summary: Adding log pos and source (i.e. relay log before image) and local (i.e. the image from the storage engine) column values in rbr_bi_inconsistencies. Only the mismatching columns are recorded in a col=val format (comma separated). Reviewed By: hermanlee Differential Revision: D26518188 fbshipit-source-id: 9a38635ee64
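As an illustration of the col=val format, the sketch below compares a source image with a local image and reports only the columns that differ. The function name `describe_mismatch` and the dict-based row representation are assumptions; the exact quoting and escaping used by the server is not stated in the commit message.

```python
def describe_mismatch(source_row: dict, local_row: dict):
    """Return (source_values, local_values) for mismatching columns only.

    Each string is a comma-separated list of col=val pairs, in the
    column order of source_row.
    """
    cols = [c for c in source_row if source_row[c] != local_row.get(c)]
    source_values = ",".join(f"{c}={source_row[c]}" for c in cols)
    local_values = ",".join(f"{c}={local_row.get(c)}" for c in cols)
    return source_values, local_values


src, loc = describe_mismatch(
    {"id": 1, "a": 10, "b": "x"},  # before image from the relay log
    {"id": 1, "a": 11, "b": "y"},  # image read from the storage engine
)
```

Here `src` is `a=10,b=x` and `loc` is `a=11,b=y`; the matching column `id` is omitted from both.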

…e PK to begin with. Summary: Related diff - D25966887 (facebook@1df79a8) Notes- sql_require_primary_key blocks alter table operations if the existing table has no primary key. Due to this we have received sevs in production where secondaries broke. Sequence of steps- AOSC schema update to add a primary key to an existing table. This change is applied on 1 secondary and the primary. Before all secondaries finished the AOSC schema update, we received a DDL command (drop another index apart from the PK) on the primary. The DDL command finished on the primary and failed to execute on the secondaries which don't have the AOSC update. The reason for the failure is that we block alter if sql_require_primary_key is true even when there is no existing PK. Reviewed By: pradeep1288 Differential Revision: D26435571 fbshipit-source-id: 7f9c12773cf

Summary: When running 'show processlist' on linux OS, additionally show system thread id. It is helpful to have the system thread pid of a connection to trace individual threads or change process priority. Previously, we had to look through quickstack output to find the thread id based on a specific stack. Also, there is a race in show processlist, which shows the actual show processlist run as either "init" or "cleaning up". To avoid the test inconsistency, I replace both with "STATE" in many tests. Reference Patch: facebook@d6278638 ---- Porting Notes: * Updated to use _WIN32 instead of TARGET_OS_LINUX * Always call capture_system_thread_id (even for Windows which just set to 0) * I didn't port the test changes as-is and instead just update the failing tests to mask TID Reviewed By: Pushapgl Differential Revision: D26295836 fbshipit-source-id: 9b3208a2919

Summary: For the range query case with multiple ORs, the optimizer relies on `tree_or` and `key_or` to 'merge' the OR sub-trees together. In the case of multiple ranges that are the same, `key_or` would see that both ranges key1 and key2 are exactly the same, release key2->next_key_part, and then OR key1->next_key_part and key2->next_key_part = NULL (since it is released) with `key_or`. However, because NULL sub-trees are treated as TRUE (same as is_always), the entirety of key2->next_key_part gets dropped, so you end up with just the first key. It makes sense for a NULL sub-tree to be treated as TRUE in `key_or`, because for example (A > 1 AND NULL) OR (A > 1 AND B > 1 AND NULL) should be merged as (A > 1 AND NULL), so `key_or(NULL, key)` should come back as NULL. So the right way to fix this is to stop handling the current range and simply move to the next range (without falling through to the remaining logic). Because of this bug the range query plan ends up using far fewer keys and being more expensive, so usually a ref plan with fewer keys gets picked, and the query becomes much more expensive. Upstream bug: https://bugs.mysql.com/bug.php?id=102634 Reviewed By: lth, Pushapgl Differential Revision: D26477017 fbshipit-source-id: 11df8d0c68c

Summary: Port D24628821 mysqld removes partial trxs in the tail of trx log (named binary-logs on primaries and apply-logs on secondaries) during startup. However, relay logs were not of much importance since it was anyways discarded and a new one would be created. However, with raft, this is not ideal. Relay logs are raft logs on secondaries and have to be kept around (and kept sane and consistent). This diff adds the ability to remove partial trxs from raft/relay logs. Much of the code to open the last relay log (based on relay log index) and identify partial trxs is borrowed from existing logic in MYSQL_BIN_LOG::open_binlog() and binlog_recover() Reviewed By: Pushapgl Differential Revision: D26447448 fbshipit-source-id: 046d49bbb4a

Summary: Port D25584004 A master.info and relay.info file can be present but needs to be properly inited for use. We were bypassing the inited check which could lead to issues in Raft. In case there is an error in global_init_info, Raft will do a raft_reset_slave and make another attempt at it. If both recourses fail, the init of the plugin would fail. Reviewed By: Pushapgl Differential Revision: D26447457 fbshipit-source-id: 205b497ccf8

Summary: [Porting Notes] We want to dump raft logs to vanilla async replicas regardless of whether it's the relay log or binlog. Effectively after this change we'll dump relay logs on the followers and binlogs on the leader. When the raft role changes, the logs to be dumped are also changed. The Dump_log class is introduced as a thin wrapper/container around mysql_bin_log or rli->relay_log and is inited with mysql_bin_log to emulate vanilla mysql behavior. Dump threads use the global dump_log object instead of mysql_bin_log directly. We switch the log in dump log only when the raft role changes (in binlog_change_to_binlog() and binlog_change_to_apply_log()). During a raft role change we take all log related locks (LOCK_log, LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with other log operations like dumping logs. Related doc - https://fb.quip.com/oTVAAdgEi4zY This diff contains below 7 patches: D23013977 D24766787 D24716539 D24900223 D24955284 D25174166 D25775525 Reviewed By: luqun Differential Revision: D26141496 fbshipit-source-id: c29c7bd73d5

Summary: This diff adds server CPU time to query response attributes. The key of the attribute is "server_cpu" and the value is the stringified version of the cpu time. This diff also adds warnings information to query response attributes. The key of the attribute is "warnings" and the value is the string that serializes the warnings information. Since there could be more than one warning for a statement, the value is a list of pairs, each enclosed in parentheses, separated by commas. The first value of the pair is the error number (code) and the second value of the pair is the message text, and these are separated by a comma. The following example shows the query response attribute for 'warnings' where two warnings are raised for the statement: **(1265,Data truncated for column 'c1' at row 1),(1264,Out of range value for column 'c2' at row 1)** The variable 'response_attrs_contain_warnings' controls this feature and is disabled by default. If the query raises any errors then the response OK packet is not sent to the client, so in this case putting warnings into response attributes is skipped. Reviewed By: george-reynya Differential Revision: D26598990 fbshipit-source-id: e3691a0c366
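The serialized form of the 'warnings' attribute can be illustrated with a short sketch. The helper name `serialize_warnings` is hypothetical, and the sketch assumes no escaping of commas or parentheses inside message text, since the commit message does not describe any.

```python
def serialize_warnings(warnings):
    """Serialize (code, message) pairs as "(code,message),(code,message)"."""
    return ",".join(f"({code},{message})" for code, message in warnings)


value = serialize_warnings([
    (1265, "Data truncated for column 'c1' at row 1"),
    (1264, "Out of range value for column 'c2' at row 1"),
])
```

The resulting `value` is the two-warning string shown in the summary, with each pair enclosed in parentheses.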

Summary: First clean run of entire raft test suite :) **Changes** * Reset apply_logs on raft secondaries before the start of every test * Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit. Reviewed By: luqun Differential Revision: D26651257 fbshipit-source-id: 78a29246156

Summary: Port D23065441 (facebook@b9067f7) The new macro is used to call into raft plugin. If plugin gets unloaded accidentally when enable_raft_plugin is ON, then this STRICT version returns failure. This is to be called only by raft plugin currently Reviewed By: Pushapgl Differential Revision: D26447523 fbshipit-source-id: 3ba4a907a54

Summary: Early returning in the FD:print() function makes mysqlbinlog not be able to parse Raft logs on secondaries. The original commit which added this is d048c0f (P173872135) To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'. Reviewed By: anirbanr-fb Differential Revision: D26359417 fbshipit-source-id: 9ef927b6940

Summary: During idempotent recovery, every trx recovered is printed in the mysqld error logs. This can cause the mysqld error logs to balloon in size and grow as large as 100MB. Reviewed By: hermanlee Differential Revision: D26773913 fbshipit-source-id: 627328485fe

Summary: In MySQL 8: - When calling `set global GTID_PURGED="UUID:ID1-ID2"`, it will store "UUID:ID1-ID2" into the lost_gtids, executed_gtids and gtids_only_in_table variables and also store "UUID:ID1-ID2" into the mysql.executed_gtid table. - When calling `flush logs`, it will store (executed_gtids - gtids_only_in_table) as Previous_gtid_set. For `set global GTID_PURGED="UUID:ID1-ID2";flush logs`, it will store <empty> as Previous_gtid_set in the new binlog files, which is different from MySQL 5.6 behavior. In MySQL 5.6, it will store "UUID:ID1-ID2" as Previous_gtid_set in the new binlog file for `set global GTID_PURGED="UUID:ID1-ID2"`. The change is to always store executed_gtids instead of (executed_gtids - gtids_only_in_table) as Previous_gtid_set during flush logs.
- `set global GTID_PURGED="UUID:ID1-ID2";flush logs` executed_gtids == UUID:ID1-ID2 (executed_gtids - gtids_only_in_table) == <empty>
- `set global GTID_PURGED="UUID:ID1-ID2"; sync another trans;flush logs` executed_gtids == UUID:ID1-ID3 (executed_gtids - gtids_only_in_table) == UUID:ID3
- `set global GTID_PURGED="UUID:ID1-ID2"; restart slave;flush logs` executed_gtids == UUID:ID1-ID2 (executed_gtids - gtids_only_in_table) == <empty>
In MySQL 8, the variable gtids_only_in_table is initialized with the GTIDs that differ between GLOBAL.GTID_EXECUTED and GTID_EXECUTED_BINLOG during server restart. Reviewed By: bhatvinay Differential Revision: D17873651 (facebook@40c0156) fbshipit-source-id: 99e83d91fd1
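The difference between the old and new Previous_gtid_set computation can be modelled with plain sets. This is a simplified sketch: transaction numbers for a single UUID are represented as Python integers, and the function names are hypothetical rather than taken from the server code.

```python
def previous_gtid_set_old(executed: set, only_in_table: set) -> set:
    """Behaviour before the change: executed_gtids - gtids_only_in_table."""
    return executed - only_in_table


def previous_gtid_set_new(executed: set, only_in_table: set) -> set:
    """Behaviour after the change: always the full executed_gtids."""
    return set(executed)


# Scenario 1: GTID_PURGED = UUID:1-2, then flush logs.
purged_only = ({1, 2}, {1, 2})
# Scenario 2: GTID_PURGED = UUID:1-2, one more transaction, then flush logs.
purged_plus_one = ({1, 2, 3}, {1, 2})

old_1 = previous_gtid_set_old(*purged_only)
new_1 = previous_gtid_set_new(*purged_only)
old_2 = previous_gtid_set_old(*purged_plus_one)
new_2 = previous_gtid_set_new(*purged_plus_one)
```

In scenario 1 the old computation yields an empty set while the new one yields {1, 2}, which is the 5.6-compatible result described above. In scenario 2 the old computation yields only {3}.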

Summary: Port D23366447 (facebook@b9067f7) Extending the variable slave_preserve_commit_order to allow three values: NONE, DB and GLOBAL. When the variable is set to GLOBAL we order commits even across databases. Reviewed By: abhinav04sharma Differential Revision: D24213844 fbshipit-source-id: 0d330f38b0a

Summary: The MySQL plugin for privacy (D26704751) is built using the AUDIT plugin. The plugin requires access to the FROM tables of the query. This is extracted from the LEX tree produced by the parser after the parse step is complete. This diff adds a function to sql/sql_thd_internal_api.h/.cc: - thd_get_query_tables() - Returns the list of tables extracted from the LEX tree produced by the parser. Reviewed By: lth, george-reynya Differential Revision: D26704727 fbshipit-source-id: e0776000860

Summary: When there are too many test threads running on the same host, mtr will sometimes skip a bunch of tests with the error message: ``` worker[25] mysql-test-run: *** ERROR: Could not get a unique build thread id ``` Since we have excluded ports 13000-13600, and each thread takes at least 20 ports, this means we only have room for around 20 threads per host (and we currently run with --parallel=16). The fix is to bump the limit so that 200 ids (or 2000 ports) are available. This means we might start using ports in the 14000 range. Reviewed By: yizhang82 Differential Revision: D26821198 fbshipit-source-id: 8fa5f37e315

…tener thread Summary: Port D25572614 The timestamp of a binlog event is picked up from the when field in the event. In most cases of rotation, the when is left unpopulated during rotation for the top 3 events (fd, pgtid, metadata). However in such a situation, a normal rotate (flush binary logs) still manages to get a valid timestamp, since the thread in which the flush binary logs happens has a valid start time. Now enter Raft relay log rotations. In those cases and in the case of config change rotate, the rotations are happening in the context of a raft listener queue thread. In that context, the when and start time of the thread are both 0. The diff handles this case by populating the when field appropriately. Reviewed By: bhatvinay Differential Revision: D26194612 fbshipit-source-id: 11804b2f357

Summary: binlog file should get trimmed for abrupt stepdown Reviewed By: Pushapgl, bhatvinay Differential Revision: D26169975 fbshipit-source-id: 4171b654aab

Summary: **Notes** * New functions to block and unblock dump threads for plugin to use during raft log truncation. Below check is already done in raft plugin as part of raft plugin in D26866429. * Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of just seeking to the beginning to handle the case when raft log is truncated before starting the applier Reviewed By: luqun Differential Revision: D26759813 fbshipit-source-id: 22dffff3f37

Summary: During online alter table, we occasionally call transaction commit in order to refresh a snapshot. However, by calling commit, we're accidentally losing the name of the transaction. When we lose the name of the transaction, this error occurs during transaction prepare: ``` [ERROR] [MY-000000] [Server] RocksDB: Failed to read/write in RocksDB, Status Code: 4, Status: Invalid argument: Cannot prepare a transaction that has not been named. ``` We don't actually need to call commit in this case, as the intention was just to refresh the snapshot so that the newly bulk loaded keys become visible after `finalize_bulk_load`. Reviewed By: yizhang82 Differential Revision: D26799937 fbshipit-source-id: 0422d4b244f

Summary: In the current bypass implementation, the range query implementation is not correct. For A >= begin and A <= end, it'll only pick begin *or* end as the start of the range, but it will only use the prefix as the end of the range, which means it can potentially scan a lot more rows than needed. It relies on the condition A >= begin / A <= end to filter out the unnecessary rows, so the end result is correct (that's why this issue was not discovered in testing). The correct way to implement this is to set up the (begin, end) range slice correctly, and NOT rely on any conditional evaluation to filter rows. The tricky part is to determine the correct begin/end based on a few factors: * forward / reverse column family * ascending / descending orders * inclusive vs non-inclusive ranges (>=, <= vs <, >) * prefix query vs range query For better testing, I've done the following: * verify_bypass_query.inc that will run the query in bypass and non-bypass, and verify the row reads are the same as or fewer than in the non-bypass case to ensure such an issue won't happen again. Most bypass query validation is moved to use verify_bypass_query.inc. We can also add more validation in the future in verify_bypass_query.inc as needed. As a bonus, it'll make verifying that the results are the same in the bypass/non-bypass case easy as well. * move range key validation from bypass_select_scenarios into bypass_select_range_pk and bypass_select_range_sk * added more test cases to bypass_select_range_pk / bypass_select_range_sk, especially for the PK case where some scenarios are missing * dump query results to file and verify query results didn't change w/ and w/o bypass. I had to back port `output` mysqltest support to make this work properly (for some reason --exec $MYSQL doesn't like multiline queries). For review, there is no need to go through the results file as there is validation in place to make sure queries return the same results w/ and w/o bypass.
Reference Patch: facebook@ef9a677 ------------ Porting Notes: * Disabled a scenario with a TEXT BLOB prefix key that would require one of the later commits to fix * ubsan reported a "bug" with memcmp(NULL, NULL, 0) in the Rdb_string_writer comparison. The fix has been back ported to 5.6. * Tests caught a bug where THD::inc_sent_row_count now increments a status var, resulting in double counting Reviewed By: Pushapgl Differential Revision: D26564619 fbshipit-source-id: 8fd763c7cbd
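The difference between the two scan strategies can be sketched over a sorted list of composite keys. This is a simplified model covering only the forward column family, ascending order and inclusive bounds; the function names are hypothetical and `bisect` stands in for an iterator seek.

```python
import bisect


def scan_prefix_then_filter(keys, a, lo, hi):
    """Seek to (a, lo), scan to the end of prefix a, filter rows afterwards."""
    start = bisect.bisect_left(keys, (a, lo))
    rows_read, result = 0, []
    for key in keys[start:]:
        if key[0] != a:  # left the prefix: stop scanning
            break
        rows_read += 1
        if lo <= key[1] <= hi:  # the condition acts as a filter
            result.append(key)
    return result, rows_read


def scan_range_slice(keys, a, lo, hi):
    """Seek to (a, lo) and stop right after (a, hi); no filtering needed."""
    start = bisect.bisect_left(keys, (a, lo))
    end = bisect.bisect_right(keys, (a, hi))
    rows = keys[start:end]
    return rows, len(rows)


# Index on (A, B) with A in {1, 2} and B in 0..9.
keys = sorted((a, b) for a in (1, 2) for b in range(10))
filtered = scan_prefix_then_filter(keys, 1, 3, 5)
sliced = scan_range_slice(keys, 1, 3, 5)
```

Both strategies return the same three rows for A = 1 AND B BETWEEN 3 AND 5, but the prefix-bounded scan reads 7 rows while the range slice reads 3. That is the kind of row-read comparison the summary describes for verify_bypass_query.inc.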

Summary: We can handle bloom filter case for both forward / reverse order in the same way by finding the longest prefix of eq and initial_pos_slice - this should handle some edge cases better. Also setup iterator bounds using MyRocks helpers when bloom filter can't be used, for faster access. Reference Patch: facebook@3bfd0f9 Reviewed By: luqun Differential Revision: D26564674 fbshipit-source-id: d5997d94a52

Summary: In most cases the bypass parser `select_parser` should identify unsupported cases and fall back to a regular query. However, there are a few checks that are expensive and are done as part of execution, and we should properly fall back in those cases as well. In a future version maybe we should refactor it so that those are another phase of parsing as well, which would also help bypass RPC. Reference Patch: facebook@5452df8 Reviewed By: luqun Differential Revision: D26564781 fbshipit-source-id: 07297325810

Summary: For scenarios like A >= 1 and B >= 2, doing it correctly requires skip scan, which isn't supported yet. Today we would just use A >= 1 as the starting point and use B >= 2 as a filter to discard unwanted rows, which isn't efficient. There are other cases too, such as a range query using force index A but using a key from index B. Our production workloads for bypass don't have these scenarios, but it would be good to add a switch to disallow such non-optimal cases and simply fall back to a regular range query for such corner cases. This diff adds the `rocksdb-select-bypass-allow-filters` switch to allow/block filters. It is enabled by default to make tests happy but will be disabled in prod. Reference Patch: facebook@7cdb33a Reviewed By: Pushapgl Differential Revision: D26566608 fbshipit-source-id: e29f1cdbbcd

Summary: For scenarios like `SELECT A FROM table WHERE B > 1`, we only consider the fields in the SELECT item list but not the WHERE condition, so we end up returning garbage. This change uses the read set and walks through all fields, just like what we do in setup_field_decoders. I have another pending change to make the cover check take column keys into account. Fortunately we don't have those in prod for bypass queries yet. Reference Patch: facebook@569f281 Reviewed By: Pushapgl Differential Revision: D26566928 fbshipit-source-id: bbbe69713b0
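The covering check described above can be sketched with sets of column names. Both function names are hypothetical, and the real check works on the table's read set rather than on explicit lists of columns.

```python
def covers_select_list_only(index_cols, select_cols, where_cols):
    """Old check (sketch): only the SELECT item list is considered."""
    return set(select_cols) <= set(index_cols)


def covers_read_set(index_cols, select_cols, where_cols):
    """Fixed check (sketch): every field the query reads must be in the index."""
    return (set(select_cols) | set(where_cols)) <= set(index_cols)


# SELECT A FROM table WHERE B > 1, using an index that contains only A.
old_verdict = covers_select_list_only(["A"], ["A"], ["B"])
new_verdict = covers_read_set(["A"], ["A"], ["B"])
```

The old check reports the index as covering (`old_verdict` is True) even though B cannot be read from it, while the read-set based check correctly reports False.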

Summary: [Porting Notes] We want to dump raft logs to vanilla async replicas regardless of whether it's the relay log or binlog. Effectively after this change we'll dump relay logs on the followers and binlogs on the leader. When the raft role changes, the logs to be dumped are also changed. The Dump_log class is introduced as a thin wrapper/container around mysql_bin_log or rli->relay_log and is inited with mysql_bin_log to emulate vanilla mysql behavior. Dump threads use the global dump_log object instead of mysql_bin_log directly. We switch the log in dump log only when the raft role changes (in binlog_change_to_binlog() and binlog_change_to_apply_log()). During a raft role change we take all log related locks (LOCK_log, LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with other log operations like dumping logs. Related doc - https://fb.quip.com/oTVAAdgEi4zY This diff contains below 7 patches: D23013977 D24766787 D24716539 D24900223 D24955284 D25174166 D25775525 Reviewed By: luqun Differential Revision: D26141496 ------------------------------------------------------------------------------- Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat() Summary: When enable_raft_plugin is OFF, Dump_log::lock() is a no-op, which means that there can be a race between log switching and dump threads. This could lead to a scenario where the raw_log that wait_next_event() is working on might be different from the one wait_with_heartbeat()/wait_without_heartbeat() is working on. This can cause deadlocks because wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would unlock and then lock a different log's LOCK_binlog_end_pos mutex, which would then never be unlocked by wait_next_event().
Reviewed By: anirbanr-fb Differential Revision: D32152658 fbshipit-source-id: d96ebcef966 ----------------------------------------------------------------------------------------- Fix rpl_raft_dump_raft_logs Summary: This test completes but fails because the following warning exists: ``` 2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114 ``` Since the MTR result file is valid, we can suppress this error. Reviewed By: yichenshen Differential Revision: D39141846 fbshipit-source-id: 8e7fdb8 ------------------------------------------------------------------------------- Fix heap overflow in group_relay_log_name handling Summary: We were accessing group_relay_log_name in Query_log_event::do_apply_event_worker() but it's assigned only after the coordinator thread encounters an end event (i.e. xid event or a query event with "COMMIT" or "ROLLBACK" query). This was causing a race between accessing group_relay_log_name in the worker thread and writing it on the coordinator thread. We don't need to set the transaction position in events other than the end event, so now we set the transaction position in a query event only if it's an end event. The race is eliminated because group_relay_log_name is set before enqueuing the event to the worker thread (in both dep repl and vanilla mts).
Reviewed By: lth Differential Revision: D28767430 ------------------------------------------------------------------------------- Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog Summary: asandebug complains about memory leaks during MYSQL_BIN_LOG open: Direct leak of 50 byte(s) in 1 object(s) allocated from: #0 0x67460ef in malloc #1 0x93f0777 in my_raw_malloc(unsigned long, int) #2 0x93f064a in my_malloc(unsigned int, unsigned long, int) #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int) #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int) #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool) #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*) #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*) #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*) #9 0x8c7696a in process_raft_queue #10 0xa0fa1fd in pfs_spawn_thread(void*) #11 0x7f8c9a12b20b in start_thread The fix releases this memory before assigning new values. Reviewed By: Pushapgl Differential Revision: D28819752

…e to make all keys available to al… (#555) (facebook#555) Summary: …l of MyRocks code. As port to 5.7 and 8.0 will add memory keys and having global access eliminates need to pass keys around across function/member barriers. Closes facebook#555 Differential Revision: D4647988 Pulled By: jkedgar

…acebook#871) Summary: Original report: https://jira.mariadb.org/browse/MDEV-15816 To reproduce this bug, follow the steps below. client 1: USE test; CREATE TABLE t1 (i INT) ENGINE=MyISAM; HANDLER t1 OPEN h; CREATE TABLE t2 (i INT) ENGINE=RocksDB; LOCK TABLES t2 WRITE; client 2: FLUSH TABLES WITH READ LOCK; client 1: INSERT INTO t2 VALUES (1); So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below ``` #0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE) #1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2) #2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0) #3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true) #4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8) #5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2) #6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0) ``` Finally, client 1's "INSERT INTO..." hits the assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row(). Fix this bug by not setting m_lock_rows if lock_type == TL_IGNORE. Closes facebook#838 Pull Request resolved: facebook#871 Differential Revision: D9417382 Pulled By: lth
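The state change behind this assertion can be modelled in a few lines. This is a toy model, not the MyRocks code: the class `HandlerModel`, the method names, and the mapping from lock types to row lock modes are assumptions. Only `TL_IGNORE`, `RDB_LOCK_NONE`, `RDB_LOCK_WRITE` and `m_lock_rows` appear in the commit message.

```python
from enum import Enum


class ThrLock(Enum):
    TL_IGNORE = 0
    TL_READ = 1   # assumed for illustration
    TL_WRITE = 2  # assumed for illustration


class RowLock(Enum):
    RDB_LOCK_NONE = 0
    RDB_LOCK_READ = 1  # assumed for illustration
    RDB_LOCK_WRITE = 2


class HandlerModel:
    """Toy stand-in for the handler's row lock state."""

    def __init__(self):
        self.m_lock_rows = RowLock.RDB_LOCK_NONE

    def store_lock_buggy(self, lock_type: ThrLock) -> None:
        # Any lock type, including TL_IGNORE, overwrites the state.
        if lock_type is ThrLock.TL_WRITE:
            self.m_lock_rows = RowLock.RDB_LOCK_WRITE
        elif lock_type is ThrLock.TL_READ:
            self.m_lock_rows = RowLock.RDB_LOCK_READ
        else:
            self.m_lock_rows = RowLock.RDB_LOCK_NONE

    def store_lock_fixed(self, lock_type: ThrLock) -> None:
        # TL_IGNORE means "leave the lock as it is": do not touch the state.
        if lock_type is ThrLock.TL_IGNORE:
            return
        self.store_lock_buggy(lock_type)


buggy, fixed = HandlerModel(), HandlerModel()
for handler, store in ((buggy, buggy.store_lock_buggy), (fixed, fixed.store_lock_fixed)):
    store(ThrLock.TL_WRITE)   # client 1: LOCK TABLES t2 WRITE
    store(ThrLock.TL_IGNORE)  # client 2: FLUSH TABLES WITH READ LOCK
```

After the two calls, `buggy.m_lock_rows` has been reset to `RDB_LOCK_NONE`, which is the state that trips the assertion in write_row(), while `fixed.m_lock_rows` is still `RDB_LOCK_WRITE`.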

Summary: 1. Account for explain format changes 2. Binary text isn't supported as MySQL has changed its internal make_sort_key implementation and we haven't yet accounted for that. Disabling the scenario with disable_testcase BUG#888003 for now. There is already a task for it. 3. Change regex for MySQL log format changes in rocksdb_checksums.test 4. Fix a bug where index stats calculation background threads are not started properly as the macro TARGET_OS_LINUX isn't defined. Reviewed By: lloyd Differential Revision: D17622870

Summary: In MySQL 8.0.17, the sending data stage is gone. As a result, in testcase #5/#6 the test is waiting on the sending data stage and times out, and at that point the lock is already taken, so trying to take the same lock on the same row on another connection simply times out, instead of getting a deadlock/snapshot conflict. For now I'm using a slightly earlier stage, "executing". This aligns with what Percona has done and we can see if this works reasonably well. If not we can see if we can introduce the old stage back. Note: The current implementation of the test can be flaky. It depends on the SELECT having already started the scan over some of the rows and taken a snapshot, but not yet having taken the lock, so that you can get a snapshot conflict in another connection doing a delete over the same row (instead of a timeout with lock contention). Given that the test is intended to test taking a snapshot before doing any get, this is fortunately the best the test can do at this point. Reviewed By: lloyd Differential Revision: D18716622

Summary: [Porting Notes] We want to dump raft logs to vanilla async replicas regardless of whether it's the relay log or binlog. Effectively, after this change we'll dump relay logs on the followers and binlogs on the leader. When the raft role changes, the log to be dumped also changes.

The Dump_log class is introduced as a thin wrapper/container around mysql_bin_log or rli->relay_log and is initialized with mysql_bin_log to emulate vanilla MySQL behavior. Dump threads use the global dump_log object instead of mysql_bin_log directly. We switch the log in dump_log only when the raft role changes (in binlog_change_to_binlog() and binlog_change_to_apply_log()). During a raft role change we take all log-related locks (LOCK_log, LOCK_index, LOCK_binlog_end_pos, and the dump log lock) to serialize it with other log operations such as dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches: D23013977 D24766787 D24716539 D24900223 D24955284 D25174166 D25775525

Reviewed By: luqun
Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary: When enable_raft_plugin is OFF, Dump_log::lock() is a no-op, which means there can be a race between log switching and dump threads. This could lead to a scenario where the raw_log that wait_next_event() is working on differs from the one that wait_with_heartbeat()/wait_without_heartbeat() is working on. This can cause deadlocks, because the mysql_cond_wait in wait_with_heartbeat()/wait_without_heartbeat() would unlock and then lock a different log's LOCK_binlog_end_pos mutex, which would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb
Differential Revision: D32152658
fbshipit-source-id: d96ebcef966

-------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary: This test completes but fails because the following error is logged:

```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```

Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen
Differential Revision: D39141846
fbshipit-source-id: 8e7fdb8

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary: We were accessing group_relay_log_name in Query_log_event::do_apply_event_worker(), but it is assigned only after the coordinator thread encounters an end event (i.e. an xid event or a query event with a "COMMIT" or "ROLLBACK" query). This caused a race between reading group_relay_log_name in the worker thread and writing it on the coordinator thread. We don't need to set the transaction position in events other than the end event, so now we set the transaction position in a query event only if it's an end event. The race is eliminated because group_relay_log_name is set before the event is enqueued to the worker thread (in both dep repl and vanilla mts).

Reviewed By: lth
Differential Revision: D28767430

-------------------------------------------------------------------------------

Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary: asandebug complains that there are memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

The fix releases this memory before assigning new values.

Reviewed By: Pushapgl
Differential Revision: D28819752
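The leak pattern in the last summary above (a strdup'd name being overwritten each time the log is reopened) can be illustrated with a minimal sketch. Everything here is hypothetical: `LogHandle`, `open()`, and `name_` are stand-ins invented for illustration, not the actual MYSQL_BIN_LOG members or the code changed in D28819752. The only point is the ordering: free the previous copy before storing the new one.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a log object that owns a heap-allocated name.
// This is NOT MYSQL_BIN_LOG; it only illustrates the ownership pattern.
class LogHandle {
 public:
  LogHandle() = default;
  LogHandle(const LogHandle &) = delete;
  LogHandle &operator=(const LogHandle &) = delete;
  ~LogHandle() { std::free(name_); }

  // Called on every (re)open, e.g. once per rotation.
  // Returns false if the copy of the name could not be allocated.
  bool open(const char *new_name) {
    assert(new_name != nullptr);
    char *copy = duplicate(new_name);
    if (copy == nullptr) return false;  // keep the old name on failure
    std::free(name_);  // release the previous copy before overwriting it
    name_ = copy;
    ++open_count_;
    return true;
  }

  const char *name() const { return name_; }
  int open_count() const { return open_count_; }

 private:
  // Heap-allocates a copy of s (plays the role of a strdup-style helper).
  static char *duplicate(const char *s) {
    const std::size_t len = std::strlen(s) + 1;
    char *p = static_cast<char *>(std::malloc(len));
    if (p != nullptr) std::memcpy(p, s, len);
    return p;
  }

  char *name_ = nullptr;
  int open_count_ = 0;
};
```

Without the `std::free(name_)` call in `open()`, a second call would orphan the first allocation, which is the kind of "direct leak" a leak checker reports for an object that is reopened on rotation.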

yizhang82 and others added 30 commits November 5, 2021 10:33

Make wl5522_debug_zip a big test

af68ecb

Summary: wl5522_debug_zip usually times out, so making it a big test should help. Reviewed By: Pushapgl Differential Revision: D26379752 fbshipit-source-id: 4e3ea153dbd

PS-6790 : Introduce reduced doublewrite buffer mode

3e8375e

fbshipit-source-id: 9acffe3e091

Raft abrupt stepdown and trim binlog file / gtid test

8b30ec7

Summary: binlog file should get trimmed for abrupt stepdown Reviewed By: Pushapgl, bhatvinay Differential Revision: D26169975 fbshipit-source-id: 4171b654aab

inikep pushed a commit that referenced this pull request Jun 14, 2023

Fix Issue #5: Transaction rollback doesn't undo all changes.

1331798
