Detect missing records in WAL recovery #12488

ajkr · 2024-03-29T18:01:46Z

Current methods can only detect missing records in certain scenarios.

One scenario not covered is clean truncation, which is when the WAL is truncated to an exact record boundary. The simplest case is truncating a WAL to be empty and losing all its records.

Another scenario not covered is corruption with zero bytes. Zero bytes at the WAL tail naturally occur when pre-allocating WAL space, so LevelDB chose to treat such records as valid. But this causes us to miss cases where the WAL tail is unintentionally corrupted with zero bytes.

The solution I was thinking of was: "We could start requiring consecutive WALs to have consecutive sequence numbers. To prevent disableWAL=true writes from getting in the way, we could begin every WAL with a dummy entry with sequence number consecutive to that of the final record in the previous WAL." (#12467 (comment)).

Other ideas are welcome.

The text was updated successfully, but these errors were encountered:

ajkr · 2024-03-29T21:05:43Z

Another way could be to append a marker record to the active WAL before switching it. Recovery would only continue to recover a newer WAL after seeing such a marker record. Unlike the existing code for writing dummy entries, which is constrained to writing to the new WAL as it is part of DB open, the new code should be able to mutate a WAL right before switching over to a new one.

cbi42 · 2024-04-01T16:53:31Z

The second approach sounds cleaner since it does not require consecutive sequence numbers in consecutive WALs. We may still need to put some special value in the marker record to tell it apart from an old marker record in a reused WAL.

ajkr · 2024-04-01T17:47:36Z

I guess we can use the recyclable record format for this marker record - https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log-File-Format#the-recyclable-record-format. It has a log number that we can match against the log number in the filename.

ajkr · 2024-04-01T20:21:24Z

The second approach sounds cleaner since it does not require consecutive sequence numbers in consecutive WALs.

For the first approach, a dummy record with a new type at the start of the WAL could require recovery to have reached a specific sequence number before continuing. The feature would be opt-in to not break downgrade compatibility.

For the second approach, how would we know that an existing WAL should end with an ending marker, as the WAL format is unversioned? It feels like we would need a new record type at the start anyways to indicate that. It may be worthwhile if we make this new record type carry a version number.

cbi42 · 2024-04-01T22:24:06Z

It feels like we would need a new record type at the start anyways to indicate that.

Makes sense.

For the dummy record, if it contains the size of the previous WAL file, it seems this feature provides a better protection compared to track_and_verify_wals_in_manifest for non-active WALs.

nkg- · 2024-04-04T23:19:23Z

@ajkr : Another case, where some write is not fsynced to file, and missing completely from the file. That should cause missing records from WAL. Would that be caught today. Atleast I have seen (based on corruption logs), we check for crc checksum per record, and truncated records (which are a result of partial writes).

ywave620 · 2024-04-28T07:32:02Z

where some write is not fsynced to file, and missing completely from the file

@nkg- I think this only results from write with option.sycn=false, so it is expected to both DB user and DB developer

ywave620 · 2024-12-09T03:22:55Z

@ajkr Could you tell me is there any difference in purpose between the second approach and the current track_and_verify_wals_in_manifest ? And also, I think we can not efficiently detect missing record in the latest WAL, because it would require somehow track its synced size or checksum after every write with option.sync=true. @cbi42

hx235 · 2024-12-09T05:56:45Z

Could you tell me is there any difference in purpose between the second approach and the current track_and_verify_wals_in_manifest

track_and_verify_wals_in_manifest required WriteOptions::sync = true to sync hence record the closed wal. The proposed solution doesn't.

And also, I think we can not efficiently detect missing record in the latest WAL, because it would require somehow track its synced size or checksum after every write with option.sync=true. @cbi42

Yes I think so too.

ywave620 · 2024-12-11T05:18:08Z

Thanks. @hx235 Could I deem this issue as an unhandled edge case of track_and_verify_wals_in_manifest, where the active WAL(I think there can be one active WAL at any given time) has not ever been synced and the last inactive WAL is (fully synced but) cleanly truncated later. @ajkr

Recall that before an WAL becomes inactive, its synced size will not be tracked in manifest.

hx235 · 2024-12-12T09:09:34Z

where the active WAL(I think there can be one active WAL at any given time) has not ever been synced and the last inactive WAL is (fully synced but) cleanly truncated later.

Catching missing data in active WAL won't be handled by track_and_verify_wals_in_manifest=true nor this proposal. This proposal cares more about the hole so old data is missing from inactive WAL while active WAL has its data.

Catching missing data in inactive WAL can be handled by track_and_verify_wals_in_manifest=true because the last inactive WAL will have its size recorded in manifest to compare against the truncated size an fail the db open.

ywave620 · 2024-12-13T03:27:19Z

because the last inactive WAL will have its size recorded in manifest

I think this only happens when the active WAL has ever been synced? Right?

hx235 · 2024-12-13T21:36:52Z

I think this only happens when the active WAL has ever been synced? Right?

Oh sorry I misread and thought as part of the assumption that the last inactive WAL was synced. Then yes this proposal can write protection info about the last inactive WAL even when the active WAL or the last inactive WAL have not been synced

Summary: **Context/Summary:** This PR provides a new Options `track_and_verify_wals` to detect and handle WAL hole where new WAL data presents while some old WAL data is missing as well as db opened with no WAL. It's for #12488. It's intended to be a future replacement to `track_and_verify_wals_in_manifest` for its simplicity, better handling of WAL hole in `WALRecoveryMode::kPointInTimeRecovery` and potentials to cover more scenarios for `WALRecoveryMode::kTolerateCorruptedTailRecords/kAbsoluteConsistency`(in future PRs). The verification is done in `LogReader::MaybeVerifyPredecessorWALInfo()` and tracking is done in `log::Writer::MaybeAddPredecessorWALInfo()`. This PR also groups common utilities in `log::Writer` into functions `MaybeHandleSeenFileWriterError()`, `MaybeSwitchToNewBlock()` to avoid adding redundant code Pull Request resolved: #13226 Test Plan: - New UT - Integrate into existing UT - Intense rehearsal stress/crash test - db bench - The only potential performance implication it has is to the write path since now we keep track of the last seqno recorded in the WAL in `log::Writer`. Below benchmark show no regression. ``` ./db_bench --benchmarks=fillrandom[-X3] --num=2500000 --db=/dev/shm/db_bench_new --disable_auto_compactions=1 --threads=1 --enable_pipelined_write=0 --disable_wal=0 --track_and_verify_wals=1 Pre fillrandom [AVG 3 runs] : 310517 (± 5641) ops/sec; 34.4 (± 0.6) MB/sec fillrandom [MEDIAN 3 runs] : 308848 ops/sec; 34.2 MB/sec Post fillrandom [AVG 3 runs] : 311469 (± 4096) ops/sec; 34.5 (± 0.5) MB/sec fillrandom [MEDIAN 3 runs] : 311961 ops/sec; 34.5 MB/sec ``` Reviewed By: pdillinger Differential Revision: D67550260 Pulled By: hx235 fbshipit-source-id: 623e29bbe293ef03a45c20c348f84c8cb5bdaf91

ajkr added the design discussion label Mar 29, 2024

cbi42 assigned cbi42 and hx235 and unassigned cbi42 Nov 12, 2024

hx235 closed this as completed Dec 13, 2024

hx235 reopened this Dec 13, 2024

hx235 mentioned this issue Dec 25, 2024

Detect WAL hole #13226

Closed

hx235 closed this as completed Jan 3, 2025

evanj mentioned this issue Jan 16, 2025

track_and_verify_wals_in_manifest with sync=true creates hundreds of duplicate WalAddition entries #13192

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect missing records in WAL recovery #12488

Detect missing records in WAL recovery #12488

ajkr commented Mar 29, 2024

ajkr commented Mar 29, 2024 •

edited

Loading

cbi42 commented Apr 1, 2024

ajkr commented Apr 1, 2024

ajkr commented Apr 1, 2024

cbi42 commented Apr 1, 2024

nkg- commented Apr 4, 2024

ywave620 commented Apr 28, 2024

ywave620 commented Dec 9, 2024 •

edited

Loading

hx235 commented Dec 9, 2024

ywave620 commented Dec 11, 2024

hx235 commented Dec 12, 2024 •

edited

Loading

ywave620 commented Dec 13, 2024

hx235 commented Dec 13, 2024

Detect missing records in WAL recovery #12488

Detect missing records in WAL recovery #12488

Comments

ajkr commented Mar 29, 2024

ajkr commented Mar 29, 2024 • edited Loading

cbi42 commented Apr 1, 2024

ajkr commented Apr 1, 2024

ajkr commented Apr 1, 2024

cbi42 commented Apr 1, 2024

nkg- commented Apr 4, 2024

ywave620 commented Apr 28, 2024

ywave620 commented Dec 9, 2024 • edited Loading

hx235 commented Dec 9, 2024

ywave620 commented Dec 11, 2024

hx235 commented Dec 12, 2024 • edited Loading

ywave620 commented Dec 13, 2024

hx235 commented Dec 13, 2024

ajkr commented Mar 29, 2024 •

edited

Loading

ywave620 commented Dec 9, 2024 •

edited

Loading

hx235 commented Dec 12, 2024 •

edited

Loading