Crash tests "no hole" verification gap for multiple CFs with disableWAL=1 and atomic_flush=false #11841

Open
dnanuti opened this issue Sep 15, 2023 · 2 comments


dnanuti commented Sep 15, 2023

This issue was confirmed privately with hx235; the explanations of the behavior and the reproduction steps below were provided by her. Opening this issue for traceability.

Assuming disable_wal = true, atomic_flush = false in the example below:
[Diagram: RocksDB-CrashScenario]

Expected behavior

Assuming no disk corruption, manual removal of SST files, etc., the recovered database is expected to contain CF2 with mutations 8 and 11.
There is no hole in CF2 because no operation was skipped (e.g. if 8 were missing while 11 was present, the recovery would be invalid).
This is a valid recovery even though the other CFs lost data. Such data loss is acceptable because atomic flush is false and the CFs are independent of each other.
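For concreteness, a minimal C++ sketch (not from the crash test itself) of the setup this scenario describes: per-CF writes with the WAL disabled, atomic_flush left off, and an explicit flush of only one column family. The path, keys, and values are made up for illustration; the seqno comments just mirror the example above.

    #include <cassert>
    #include <vector>

    #include "rocksdb/db.h"
    #include "rocksdb/options.h"

    int main() {
      rocksdb::Options options;
      options.create_if_missing = true;
      options.create_missing_column_families = true;
      options.atomic_flush = false;  // flushes are per-CF, not atomic across CFs

      std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs = {
          {rocksdb::kDefaultColumnFamilyName, rocksdb::ColumnFamilyOptions()},
          {"cf1", rocksdb::ColumnFamilyOptions()},
          {"cf2", rocksdb::ColumnFamilyOptions()},
          {"cf3", rocksdb::ColumnFamilyOptions()}};
      std::vector<rocksdb::ColumnFamilyHandle*> handles;
      rocksdb::DB* db = nullptr;
      rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/no_wal_demo",
                                            cf_descs, &handles, &db);
      assert(s.ok());

      rocksdb::WriteOptions wo;
      wo.disableWAL = true;  // writes live only in the memtable until flushed

      // Interleaved writes to all CFs would happen here; only CF2 is shown.
      db->Put(wo, handles[2], "key8", "value8");    // seqno 8 in the example
      db->Put(wo, handles[2], "key11", "value11");  // seqno 11 in the example

      // Explicit flush of CF2 only. CF1/CF3 memtables stay unflushed, so a
      // crash right after this point loses their data but keeps CF2's.
      rocksdb::FlushOptions fo;
      s = db->Flush(fo, handles[2]);
      assert(s.ok());

      for (auto* h : handles) {
        db->DestroyColumnFamilyHandle(h);
      }
      delete db;
      return 0;
    }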

Actual behavior

The current crash-recovery verification in the crash tests does not handle this correctly and treats it as an incorrect recovery.
The reason is that the verification assumes everything up to db->GetLatestSequenceNumber() has been recovered, but this example shows that is not the case for CFs 1 and 3. Here, the ExpectedState was replayed up to 11 and will contain the data below:
CF1 - 1, 2, 3, 4, 5, 6, 10
CF2 - 8, 11, plus an explicit flush call just for CF2 after the mutation with sequence number 11
CF3 - 7, 9
This leads to an error in VerifyOrSyncValue() and signals that the recovery of the database failed, even though the database recovered all the data it could.
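A minimal, self-contained sketch (not the actual db_stress verification code) that models why a global "everything up to the latest recovered seqno" check misfires on this example, while a per-CF "no hole" check (recovered data must be a prefix of that CF's own write history) accepts it. The per-CF histories and recovered contents are copied from the example above.

    #include <cassert>
    #include <cstddef>
    #include <map>
    #include <set>
    #include <vector>

    int main() {
      // Per-CF write history from the example, in sequence-number order.
      std::map<int, std::vector<int>> history = {
          {1, {1, 2, 3, 4, 5, 6, 10}},  // CF1 (unflushed -> lost)
          {2, {8, 11}},                 // CF2 (flushed after seqno 11)
          {3, {7, 9}}};                 // CF3 (unflushed -> lost)
      std::map<int, std::set<int>> recovered = {{1, {}}, {2, {8, 11}}, {3, {}}};

      // Global check used by the current verification: every write up to the
      // DB's latest recovered seqno (11) must be present -> fails for CF1/CF3.
      bool global_ok = true;
      for (auto& [cf, seqs] : history)
        for (int s : seqs)
          if (s <= 11 && !recovered[cf].count(s)) global_ok = false;

      // Per-CF "no hole" check: whatever a CF recovered must be a prefix of
      // that CF's own write history (possibly empty) -> passes for all CFs.
      bool no_hole_ok = true;
      for (auto& [cf, seqs] : history) {
        std::size_t n = recovered[cf].size();
        for (std::size_t i = 0; i < n; ++i)
          if (!recovered[cf].count(seqs[i])) no_hole_ok = false;
      }

      assert(!global_ok);  // the current verification reports failure
      assert(no_hole_ok);  // yet the recovery has no hole in any CF
      return 0;
    }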

Steps to reproduce the behavior

The above failure can be reproduced by running db_stress manually:

(1) Run db_stress under a debugger with --column_families=3 --disable_wal=1 --atomic_flush=0 --max_key=15 --flush_one_in=6 (and all the other xx_one_in=0 for simplicity), --writepercent=95 (with the other xxpercent values summing to 5%), --threads=1 --sync_fault_injection=1.

(2) Kill or exit the debugger when it finishes the flush call on CF2 covering seqno 11, while all the other CFs are unflushed and contain seqnos between 8 and 11.

(3) Re-run db_stress with the same settings but with --destroy_db_initially=0.

(4) The verification will fail for CF1 or CF3 on a seqno before 11, saying something like “value_from_expected:.... Value not found: NotFound: Crash-recovery verification failed :(”


hx235 commented Oct 20, 2023

FYI - the reason we haven't run into this issue is that our db_crashtest.py always turns on atomic flush when disable_wal=1:

if dest_params.get("disable_wal", 0) == 1:
    dest_params["atomic_flush"] = 1


ajkr commented Oct 20, 2023

I wonder if there is more detail on the use case or possible bugs. Typically atomic_flush would be enabled when WAL is disabled and crash-recovery consistency is needed. Otherwise inconsistency across column families will happen by design. I suppose a user could attempt to reconcile the inconsistency by inspecting the SST file seqnos. Or they might not care about consistency. But these users are hypothetical AFAIK, unless I'm told otherwise.

For possible bugs, disabling WAL seems to reduce the number of places a hole could occur, compared to having WAL enabled without atomic_flush. Maybe I'm wrong.
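As a rough illustration of the reconciliation idea mentioned above (not an endorsed recovery procedure), a sketch that lists each live SST file's column family and seqno range after reopening, using DB::GetLiveFilesMetaData(). It assumes an already-open rocksdb::DB*; from the output a user could see how far each CF's flushed data reaches.

    #include <iostream>
    #include <vector>

    #include "rocksdb/db.h"
    #include "rocksdb/metadata.h"

    // Print each live SST file's column family and seqno range so a user can
    // inspect how far each CF's persisted data reaches after non-atomic flushes.
    void DumpPerCfSeqnoRanges(rocksdb::DB* db) {
      std::vector<rocksdb::LiveFileMetaData> files;
      db->GetLiveFilesMetaData(&files);
      for (const auto& f : files) {
        std::cout << f.column_family_name << " " << f.name << " seqnos ["
                  << f.smallest_seqno << ", " << f.largest_seqno << "]\n";
      }
    }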
