key_info[secondary_key].actual_key_parts does not include primary key on partitioned tables #105
Comments
I saved a core file. Stack trace from the failing assert:
Code around the assertion failure:
So this might be a MyRocks issue after all.
@spetrunia Could you paste the duplicate SingleDelete keys and the associated schema definition? Duplicate SingleDelete keys are definitely a bug.
I could reproduce the assertion failure with log-bin enabled and binlog_format=statement, but I haven't been able to reproduce it with binlog_format=row.
When I reproduced it a couple of times, the index id was always 538, and the index was test.table100_key_pk_parts_2_int_autoinc#P#p1 col_set_utf8_not_null_key.
I added debugging logic (hexdumping Put/SingleDelete keys) and noticed that for keys 0000021A0000000080000008 and 0000021A0000002080000040, SingleDelete was called twice but Put was called only once. SingleDelete was called from ha_rocksdb::delete_row(). The statements were as follows.
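For reference, the hexdumped keys above can be decoded by hand, assuming MyRocks's usual key layout (a 4-byte big-endian index number followed by the mem-comparable key columns, with signed INTs stored big-endian with the sign bit flipped): 0x0000021A is index id 538, and the trailing 80000008 / 80000040 correspond to the integer values 8 and 64. A minimal sketch of that decoding (the layout assumptions are mine, not taken from this thread):

```cpp
// Sketch: decode the index id and trailing 4-byte INT from a hexdumped key
// such as 0000021A0000000080000008. Assumes the layout
// [4-byte index id][secondary key column bytes][4-byte INT], with INTs stored
// big-endian and the sign bit flipped (mem-comparable form).
#include <cstdint>
#include <cstdio>

static uint32_t read_be32(const unsigned char *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

int main() {
  const unsigned char key[] = {0x00, 0x00, 0x02, 0x1A,   // index id
                               0x00, 0x00, 0x00, 0x00,   // secondary key column
                               0x80, 0x00, 0x00, 0x08};  // INT value, sign bit flipped
  uint32_t index_id = read_be32(key);                           // 538
  int32_t int_val = int32_t(read_be32(key + 8) ^ 0x80000000u);  // 8
  printf("index_id=%u int_val=%d\n", index_id, int_val);
  return 0;
}
```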
I tested with the following assertion code and verified that it was hit -- so somehow MyRocks called SingleDelete() on a secondary key that did not exist.
The next step will be identifying how this happened.
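The assertion code itself is not quoted above. As a rough reconstruction of the idea (the same check referred to later as "aborting if Get() returned no result before calling SingleDelete()"), here is a hedged sketch against the plain RocksDB API; MyRocks itself goes through its own batch/transaction wrappers, and the function name here is made up:

```cpp
// Hypothetical debug helper (not the actual patch): before issuing a
// SingleDelete for a secondary key, verify with Get() that the key exists,
// and abort if it does not.
#include <cstdlib>
#include <string>
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

void single_delete_with_check(rocksdb::DB *db, rocksdb::WriteBatch *batch,
                              rocksdb::ColumnFamilyHandle *cf,
                              const rocksdb::Slice &key) {
  std::string old_value;
  rocksdb::Status s = db->Get(rocksdb::ReadOptions(), cf, key, &old_value);
  if (s.IsNotFound())
    abort();  // SingleDelete on a key that was never Put: the bug discussed here
  batch->SingleDelete(cf, key);
}
```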
I managed to reproduce the assertion error with the following simple SQL statements. It was not a concurrency issue. I could not reproduce it without partitioning; I assume this is because multiple index ids are assigned for a partitioned primary key.
Then consecutive SingleDelete calls for id=8 happened. It is easier to debug when applying the patch from the comment above -- aborting if Get() returns no result before calling SingleDelete().
@spetrunia I think we should discuss how index ids should be allocated on partitioned tables. Currently index_ids differ per partition, but I think index_id should be the same across partitions, at least for unique/primary keys.
I found that with partitioning, key_info[secondary_index_id].actual_key_parts did not include the primary key.
This affects the logic "if (old_data && !updated_indexes.is_set(i))" at https://github.com/facebook/mysql-5.6/blob/webscalesql-5.6.24.97/storage/rocksdb/ha_rocksdb.cc#L4890-L4891 -- with partitioning, the if clause returned true, so the subsequent SingleDelete and Put were skipped. In other words, with partitioning, if the primary key changed but the secondary key columns did not, MyRocks may decide that the whole secondary key is unchanged, even though its primary key part actually changed. Another side effect is that, with partitioning, the covering index didn't work in the following case:
In InnoDB, the covering index was used.
key_info[secondary_key].actual_key_parts does not include the primary key on partitioned tables -- this happens with both InnoDB and MyRocks. I am still not sure whether it is correct to use actual_key_parts to check whether keys are modified/covered.
@yoshinorim, I'm sorry for the delay. Most of the code (e.g. encoding/decoding index tuples) is actually aware of the fact that a secondary index is an "extended key" with the PK columns at the end. The only part that is not aware is ha_rocksdb::calc_updated_indexes(). Here is a draft patch that makes calc_updated_indexes() take extended keys into account: https://gist.github.com/spetrunia/5c138da0dbcebf0f070c. I would like to take another look at it tomorrow before committing it.
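To make the problem and the proposed fix concrete, here is a small runnable toy model (my own illustration, neither the ha_rocksdb code nor the gist patch): an index is treated as "updated" only if one of the key parts it lists was changed, so a key-part list that omits the hidden PK suffix lets a PK-only UPDATE skip the SingleDelete + Put for the secondary index, even though its stored key (which ends with the PK) did change.

```cpp
// Toy model of the "is this secondary index updated?" decision.
// Field 0 is the PK, field 1 is the indexed column. Only the PK changed.
#include <cstdio>
#include <set>
#include <vector>

int main() {
  std::set<int> changed_fields = {0};          // PK-only UPDATE
  std::vector<int> parts_without_pk = {1};     // what actual_key_parts gave with partitions
  std::vector<int> parts_extended   = {1, 0};  // extended key: indexed column + PK

  auto is_updated = [&](const std::vector<int> &parts) {
    for (int f : parts)
      if (changed_fields.count(f))
        return true;
    return false;
  };

  printf("without PK suffix: updated=%d  -> old secondary key wrongly kept\n",
         is_updated(parts_without_pk));
  printf("with PK suffix:    updated=%d  -> secondary key rewritten\n",
         is_updated(parts_extended));
  return 0;
}
```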
Found another example where MyRocks fails to take into account the hidden
```sql
create table t1 (
  pk int primary key,
  col1 int,
  col2 int,
  key (col1) comment 'rev:cf1'
) engine=rocksdb partition by hash(pk) partitions 2;
insert into t1 values (1,10,10);
insert into t1 values (2,10,10);
insert into t1 values (11,20,20);
insert into t1 values (12,20,20);
```
The error happens in ha_rocksdb::read_key_exact(), which is invoked with
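One side note on the repro above (this observation is mine, not part of the original comment): with PARTITION BY HASH(pk) PARTITIONS 2, MySQL places each row in partition pk MOD 2, so every col1 value ends up with one matching row in p0 and one in p1, and a secondary-key lookup has to consult both partitions. A small illustration:

```cpp
// Where PARTITION BY HASH(pk) PARTITIONS 2 places the four rows inserted above
// (MySQL HASH partitioning of an integer expression is value MOD partition count).
#include <cstdio>

int main() {
  struct Row { int pk, col1, col2; };
  const Row rows[] = {{1, 10, 10}, {2, 10, 10}, {11, 20, 20}, {12, 20, 20}};
  for (const Row &r : rows)
    printf("pk=%2d col1=%d -> partition p%d\n", r.pk, r.col1, r.pk % 2);
  return 0;
}
```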
Ok, I figured out why index-only doesn't work for the partitioned case. ha_rocksdb has ha_rocksdb::pk_can_be_decoded, and ha_rocksdb::table_flags() returns the HA_PRIMARY_KEY_IN_READ_INDEX bit only when pk_can_be_decoded==true. The problem was that the SQL layer (open_binary_frm() in particular) makes calls to
This was resolved in LevelDB-SE as follows:
For non-partitioned tables, it worked: handler_file was a ha_rocksdb object,
With partitioned tables, it doesn't work. ha_rocksdb::index_flags() gets
Also, ha_partition only calls index_flags() for the first partition. (Maybe it
Will try to find a solution for partitioned tables.
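For context on the index_flags() point above, a runnable toy model (my illustration; not ha_partition code and not the elided LevelDB-SE resolution): if the partitioning layer asks only its first underlying partition for index_flags(), that single answer stands in for the whole table.

```cpp
// Toy model: the table-level capability check consults only the first partition.
#include <cstdio>
#include <vector>

struct PartitionHandler {
  bool pk_can_be_decoded;
  // Stand-in for handler::index_flags() reporting index-only capability.
  bool index_only_supported() const { return pk_can_be_decoded; }
};

struct PartitionedTable {
  std::vector<PartitionHandler> parts;
  bool index_only_supported() const {
    // Mirrors "ha_partition only calls index_flags() for the first partition".
    return parts.front().index_only_supported();
  }
};

int main() {
  PartitionedTable t;
  t.parts = {{true}, {false}};  // hypothetical: partitions answer differently
  printf("table-level answer (taken from the first partition only): %d\n",
         t.index_only_supported());
  return 0;
}
```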
The bug was found by Daniel Lee (@dleeyh).
Start with a clean data directory.
Start the server with the following my.cnf settings:
Use MariaDB's version of RQG from http://bazaar.launchpad.net/~elenst/randgen/mariadb-patches/changes (I used revision 1050).
Run this command (adjust host/port/socket as appropriate):
It doesn't happen on every run, but on my machine there is about a 30% chance that the server will crash like this:
RQG output will look like this: