Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/record: write an EOF trailer on LogWriter Close #871

Merged
merged 1 commit into from
Sep 3, 2020

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Sep 1, 2020

While closing a LogWriter, write a special EOF record signifying the end
of the log. This EOF record will overwrite trailing garbage from the
recycled log file, allowing us to detect corruption errors in any WALs
earlier than the most recent one.

In Open return an error if any WAL other than the most recent one
encounters an invalid chunk.

Fix #864.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach. Two small simplification suggestions below.

Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @jbowens)


open.go, line 271 at r1 (raw file):

		// Invalid records are expected from recycled WALs, but only in the
		// most recent WAL.
		if err != nil && (!record.IsInvalidRecord(err) || i < len(logFiles)-1) {

I wonder if it would be cleaner to pass in lastWAL = (i == len(logFiles)-1)) as a parameter to replayWAL. That way we'd localize the error handling logic rather than having it split between replayWAL and Open.


internal/record/record.go, line 248 at r1 (raw file):

				// These chunks prevent us from misinterpreting garbage at the
				// tail of a WAL from recycling.
				if chunkType == recyclableEOFChunkType {

Rather than adding a new chunk type, I wonder if we can write a chunk that we know will be treated as io.EOF. It looks like if we write a chunk with logNum != r.logNum then we'd get this behavior. The small benefit to that is that we wouldn't be adding a new chunk type.

@jbowens jbowens force-pushed the jackson/eof-trailer branch from eb829fd to b7e7207 Compare September 1, 2020 19:55
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @petermattis)


open.go, line 271 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if it would be cleaner to pass in lastWAL = (i == len(logFiles)-1)) as a parameter to replayWAL. That way we'd localize the error handling logic rather than having it split between replayWAL and Open.

Done.


internal/record/record.go, line 248 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than adding a new chunk type, I wonder if we can write a chunk that we know will be treated as io.EOF. It looks like if we write a chunk with logNum != r.logNum then we'd get this behavior. The small benefit to that is that we wouldn't be adding a new chunk type.

I gave it a try, using zero for the log number since we start file numbers at 1. Many of tests used a zero log number, so I updated them to use a non-zero one. This approach does feels a little less obvious, but it is nice that it doesn't require a change to the log reader.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @jbowens)


internal/record/log_writer.go, line 630 at r2 (raw file):

	binary.LittleEndian.PutUint16(b.buf[i+4:i+6], 0) // Size
	b.buf[i+6] = recyclableFullChunkType
	binary.LittleEndian.PutUint32(b.buf[i+7:i+11], 0) // Log number

I think using zero here might be problematic. The issue is that the log number field is 32-bits. The 64-bit log number is truncated to 32-bits in NewLogWriter:

		// NB: we truncate the 64-bit log number to 32-bits. This is ok because a)
		// we are very unlikely to reach a file number of 4 billion and b) the log
		// number is used as a validation check and using only the low 32-bits is
		// sufficient for that purpose.

While we are very unlikely to ever reach a log number of 4 billion, it isn't impossible if there is some sort of tight loop of filenum allocations due to some error condition. Rather than using zero, can we use w.logNum+1 or ^w.logNum. I think any value that isn't w.logNum works.


internal/record/record.go, line 248 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I gave it a try, using zero for the log number since we start file numbers at 1. Many of tests used a zero log number, so I updated them to use a non-zero one. This approach does feels a little less obvious, but it is nice that it doesn't require a change to the log reader.

I think using zero might be problematic. See my other comment for why and an alternative. The alternative might also remove the need to change the tests.

I think we might want to add a test for using the log num 1 << 32 so we explicitly test the case I'm worried about.

@jbowens jbowens force-pushed the jackson/eof-trailer branch from b7e7207 to c748598 Compare September 2, 2020 15:42
While closing a LogWriter, write a special EOF record with an
incremented log number signifying the end of the log. This EOF record
will overwrite trailing garbage from the recycled log file, allowing us
to detect corruption errors in any logs earlier than the most recent
one.

In Open return an error if any WAL other than the most recent one
encounters an invalid chunk.
@jbowens jbowens force-pushed the jackson/eof-trailer branch from c748598 to 59020cb Compare September 2, 2020 15:44
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on @petermattis)


internal/record/log_writer.go, line 630 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think using zero here might be problematic. The issue is that the log number field is 32-bits. The 64-bit log number is truncated to 32-bits in NewLogWriter:

		// NB: we truncate the 64-bit log number to 32-bits. This is ok because a)
		// we are very unlikely to reach a file number of 4 billion and b) the log
		// number is used as a validation check and using only the low 32-bits is
		// sufficient for that purpose.

While we are very unlikely to ever reach a log number of 4 billion, it isn't impossible if there is some sort of tight loop of filenum allocations due to some error condition. Rather than using zero, can we use w.logNum+1 or ^w.logNum. I think any value that isn't w.logNum works.

Thanks for catching that! Fixed.


internal/record/record.go, line 248 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think using zero might be problematic. See my other comment for why and an alternative. The alternative might also remove the need to change the tests.

I think we might want to add a test for using the log num 1 << 32 so we explicitly test the case I'm worried about.

Done.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 4 files reviewed, all discussions resolved

@jbowens
Copy link
Collaborator Author

jbowens commented Sep 3, 2020

TFTR!

@jbowens jbowens merged commit c29cc16 into cockroachdb:master Sep 3, 2020
@jbowens jbowens deleted the jackson/eof-trailer branch September 3, 2020 18:38
jbowens added a commit to jbowens/pebble that referenced this pull request Sep 9, 2020
In cockroachdb#871, Open started interpreting any WAL older than the most recent
one more strictly, erroring if it encountered any invalid chunks while
reading the log. This was enabled by a new EOF trailer chunk that's
written when a WAL is closed, ensuring that previous WALs end cleanly
and unambiguously.

However, during a WAL rotation, a crash before the previous WAL was
cleanly closed could leave the previous WAL with recycled garbage at its
tail.

This change updates WAL rotation to close the previous WAL first,
then create/rename the next WAL.
jbowens added a commit to jbowens/pebble that referenced this pull request Sep 10, 2020
In cockroachdb#871, Open started interpreting any WAL older than the most recent
one more strictly, erroring if it encountered any invalid chunks while
reading the log. This was enabled by a new EOF trailer chunk that's
written when a WAL is closed, ensuring that previous WALs end cleanly
and unambiguously.

However, during a WAL rotation, a crash before the previous WAL was
cleanly closed could leave the previous WAL with recycled garbage at its
tail.

This change updates WAL rotation to close the previous WAL first,
then create/rename the next WAL.
jbowens added a commit that referenced this pull request Sep 10, 2020
In #871, Open started interpreting any WAL older than the most recent
one more strictly, erroring if it encountered any invalid chunks while
reading the log. This was enabled by a new EOF trailer chunk that's
written when a WAL is closed, ensuring that previous WALs end cleanly
and unambiguously.

However, during a WAL rotation, a crash before the previous WAL was
cleanly closed could leave the previous WAL with recycled garbage at its
tail.

This change updates WAL rotation to close the previous WAL first,
then create/rename the next WAL.
jbowens added a commit to jbowens/pebble that referenced this pull request Sep 10, 2020
In cockroachdb#871, Open started interpreting any WAL older than the most recent
one more strictly, erroring if it encountered any invalid chunks while
reading the log. This was enabled by a new EOF trailer chunk that's
written when a WAL is closed, ensuring that previous WALs end cleanly
and unambiguously.

However, during a WAL rotation, a crash before the previous WAL was
cleanly closed could leave the previous WAL with recycled garbage at its
tail.

This change updates WAL rotation to close the previous WAL first,
then create/rename the next WAL.
jbowens added a commit that referenced this pull request Sep 10, 2020
In #871, Open started interpreting any WAL older than the most recent
one more strictly, erroring if it encountered any invalid chunks while
reading the log. This was enabled by a new EOF trailer chunk that's
written when a WAL is closed, ensuring that previous WALs end cleanly
and unambiguously.

However, during a WAL rotation, a crash before the previous WAL was
cleanly closed could leave the previous WAL with recycled garbage at its
tail.

This change updates WAL rotation to close the previous WAL first,
then create/rename the next WAL.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

db: problematic point-in-time handling of WAL checksum failures
3 participants