-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: close previous WAL before linking next WAL #886
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petermattis I wonder the existing code created the new WAL first because the previous WAL's flush loop will asynchronously flush pending writes while we perform the filesystem operations to set up the new WAL. That little bit of parallelism might help reduce the size of the critical section?
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @itsbilal and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petermattis I wonder the existing code created the new WAL first because the previous WAL's flush loop will asynchronously flush pending writes while we perform the filesystem operations to set up the new WAL. That little bit of parallelism might help reduce the size of the critical section?
That's a good point, though I don't think it was the reason for the existing code structure as the go-leveldb log writer didn't have a separate flush loop. I don't think there is anything we can do here to fix this. If the parallelism exists then so does the possibility that the EOF trailer is not written before the new log file is created. I suspect that this isn't something worth optimizing.
Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
db.go, line 1334 at r1 (raw file):
// being written for the first time. Note this is true even if file // preallocation is performed (e.g. fallocate). recycleLogNum := d.logRecycler.peek()
If err != nil
, should we be peeking at recycleLogNum
? Seems like doing so will result in us deterministically popping that log num down below even though we never attempted to use the recycled log number.
db.go, line 1359 at r1 (raw file):
} if recycleLogNum > 0 {
If err != nil
, I wonder if we should be placing recycleLogNum
on some sort of obsoleteWALs
list. Otherwise it seems like we'd leak the WAL file until the DB restarts. Definitely not something to fix in this PR. Perhaps add a TODO and file an issue if what I'm saying makes sense.
error_test.go, line 334 at r1 (raw file):
} func TestDBWALRotationCrash(t *testing.T) {
Nice test!
In cockroachdb#871, Open started interpreting any WAL older than the most recent one more strictly, erroring if it encountered any invalid chunks while reading the log. This was enabled by a new EOF trailer chunk that's written when a WAL is closed, ensuring that previous WALs end cleanly and unambiguously. However, during a WAL rotation, a crash before the previous WAL was cleanly closed could leave the previous WAL with recycled garbage at its tail. This change updates WAL rotation to close the previous WAL first, then create/rename the next WAL.
8d02ce7
to
2497546
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @petermattis)
db.go, line 1334 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
If
err != nil
, should we be peeking atrecycleLogNum
? Seems like doing so will result in us deterministically popping that log num down below even though we never attempted to use the recycled log number.
Done
db.go, line 1359 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
If
err != nil
, I wonder if we should be placingrecycleLogNum
on some sort ofobsoleteWALs
list. Otherwise it seems like we'd leak the WAL file until the DB restarts. Definitely not something to fix in this PR. Perhaps add a TODO and file an issue if what I'm saying makes sense.
Currently we panic a little down below if err != nil
, so I think we'll leak the WAL file regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @jbowens)
db.go, line 1359 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Currently we panic a little down below if
err != nil
, so I think we'll leak the WAL file regardless.
Ah, I didn't look far enough. The WAL will be collected at startup via scanObsoleteFiles
, so no permanent leak will have occurred.
TFTR! |
In #871, Open started interpreting any WAL older than the most recent
one more strictly, erroring if it encountered any invalid chunks while
reading the log. This was enabled by a new EOF trailer chunk that's
written when a WAL is closed, ensuring that previous WALs end cleanly
and unambiguously.
However, during a WAL rotation, a crash before the previous WAL was
cleanly closed could leave the previous WAL with recycled garbage at its
tail.
This change updates WAL rotation to close the previous WAL first,
then create/rename the next WAL.