Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix snapshot compaction bug #339

Merged
merged 1 commit into from
Apr 12, 2019
Merged

Conversation

richcole-at-amazon
Copy link
Contributor

Closes #320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.

This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

Testing Done:

A test program (issue320_test) was constructed that performs mutations
while snapshots are active. issue320_test fails without this bug fix
after 64k writes. It passes with this bug fix. It was run with 200M
writes and passed.

Unit tests were written for the new function that was added to the
code. Make check was run and seen to pass.

Signed-off-by: Richard Cole [email protected]

@x2c3z4
Copy link

x2c3z4 commented Mar 12, 2019

Why does this patch not to be merged to the master?
This is a very serious defect.
@richcole-at-amazon @jeffreyadean

Is there any unknown background behind this?

@x2c3z4
Copy link

x2c3z4 commented Mar 14, 2019

@ghemawat
@costan

@kylezh
Copy link
Contributor

kylezh commented Mar 14, 2019

@cmumford @pwnall @ghemawat

Please take some time to a look at this patch. LevelDB is a widely used library and this is really a very critical bugfix.

I can reproduce this bug on master branch(9ce3051).

And I think this is the same problem of #375

Thanks!

@cmumford
Copy link
Contributor

Just sent an email to @richcole-at-amazon to see if he is able to sign the Google CLA.

cmumford added a commit to cmumford/leveldb that referenced this pull request Mar 18, 2019
Makefile is now deprecated so added reference to issue320_test.cc
in CMakeLists.txt.
Copy link
Contributor

@cmumford cmumford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request. I've made a few minor comments. After those few changes I'll ask Sanjay to take a look at your changes to version_set.cc as he knows this code much better than I.

Makefile Outdated Show resolved Hide resolved
db/version_set_test.cc Outdated Show resolved Hide resolved
issues/issue320_test.cc Outdated Show resolved Hide resolved
@richcole-at-amazon
Copy link
Contributor Author

Thanks Chris. I'll look at making these changes and updating the pull request. I'm going to try to carve out some time on Thursday.

richcole-at-amazon added a commit to richcole-at-amazon/leveldb that referenced this pull request Mar 22, 2019
Closes google#320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.

This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

The original change posted in google#339
has been modified to also include changed made by Chris Mumford<[email protected]>
in cmumford@4b72cb1

  1. Releasing snapshots during test cleanup to avoid
     memory leak warnings.
  2. Refactored test to use testutil.h to be in line
     with other issue tests and to create the test
     database in the correct temporary location.
  3. Added copyright banner.

  Otherwise, just minor formatting and limiting character
  width to 80 characters.

Additionally the change was rebased on top of current master and
changes previously made to the Makefile were ported to the
CMakeLists.txt.

Testing Done:

  A test program (issue320_test) was constructed that performs mutations
  while snapshots are active. issue320_test fails without this bug fix
  after 64k writes. It passes with this bug fix. It was run with 200M
  writes and passed.

  Unit tests were written for the new function that was added to the
  code. Make test was run and seen to pass.

Signed-off-by: Richard Cole <[email protected]>
Copy link
Contributor

@cmumford cmumford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment.

class Issue320 { };

TEST(Issue320, Test) {
srandom(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richcole-at-amazon This is failing on Windows with the following message:

error C3861: 'srandom': identifier not found 

Can you switch to using std::srand() and std::rand()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure should be straight forward. Will update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one more optional thing. I failed to point out, in your PR, code that didn't fully conform to the Google C++ Style Guide. I won't require it for this PR and will fix up the formatting after merging this change. However, if you want Git blame to more accurately identify you as the author of certain lines you're welcome to pull over my formatting fixes from cmumford@783fcff. If not then I'll just land them in a post merge cleanup.

cmumford pushed a commit to cmumford/leveldb that referenced this pull request Mar 27, 2019
Closes google#320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.

This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

The original change posted in google#339
has been modified to also include changed made by Chris Mumford<[email protected]>
in 4b72cb1

  1. Releasing snapshots during test cleanup to avoid
     memory leak warnings.
  2. Refactored test to use testutil.h to be in line
     with other issue tests and to create the test
     database in the correct temporary location.
  3. Added copyright banner.

  Otherwise, just minor formatting and limiting character
  width to 80 characters.

Additionally the change was rebased on top of current master and
changes previously made to the Makefile were ported to the
CMakeLists.txt.

Testing Done:

  A test program (issue320_test) was constructed that performs mutations
  while snapshots are active. issue320_test fails without this bug fix
  after 64k writes. It passes with this bug fix. It was run with 200M
  writes and passed.

  Unit tests were written for the new function that was added to the
  code. Make test was run and seen to pass.

Signed-off-by: Richard Cole <[email protected]>
@ruish
Copy link

ruish commented Apr 1, 2019

hi Chris @cmumford,

I was trying to patch your change cmumford@48a9a9a to my local repo.

I first build and test the issue320_test.cc without patching other part of the fix. To my surprise, the test was passed. I expected it to fail without the fix.

Could you please verify that? Should the issue320_test pass without other part of the fix?

Thanks,

Rui

@ruish
Copy link

ruish commented Apr 1, 2019

BTW, my local repo is one based on this commit: 7b945f2

richcole-at-amazon added a commit to richcole-at-amazon/leveldb that referenced this pull request Apr 1, 2019
Closes google#320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.

This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

The original change posted in google#339
has been modified to also include changed made by Chris Mumford<[email protected]>
in cmumford@4b72cb1

  1. Releasing snapshots during test cleanup to avoid
     memory leak warnings.
  2. Refactored test to use testutil.h to be in line
     with other issue tests and to create the test
     database in the correct temporary location.
  3. Added copyright banner.

  Otherwise, just minor formatting and limiting character
  width to 80 characters.

Additionally the change was rebased on top of current master and
changes previously made to the Makefile were ported to the
CMakeLists.txt.

Testing Done:

  A test program (issue320_test) was constructed that performs mutations
  while snapshots are active. issue320_test fails without this bug fix
  after 64k writes. It passes with this bug fix. It was run with 200M
  writes and passed.

  Unit tests were written for the new function that was added to the
  code. Make test was run and seen to pass.

Signed-off-by: Richard Cole <[email protected]>
@cmumford
Copy link
Contributor

cmumford commented Apr 1, 2019

Thanks @ruish - I introduced an error in the test when refactoring. I've uploaded a fix to cmumford@9c8d57c.

Closes google#320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.

This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

The original change posted in google#339
has been modified to also include changed made by Chris Mumford<[email protected]>
in cmumford@4b72cb1

  1. Releasing snapshots during test cleanup to avoid
     memory leak warnings.
  2. Refactored test to use testutil.h to be in line
     with other issue tests and to create the test
     database in the correct temporary location.
  3. Added copyright banner.

  Otherwise, just minor formatting and limiting character
  width to 80 characters.

Additionally the change was rebased on top of current master and
changes previously made to the Makefile were ported to the
CMakeLists.txt.

Testing Done:

  A test program (issue320_test) was constructed that performs mutations
  while snapshots are active. issue320_test fails without this bug fix
  after 64k writes. It passes with this bug fix. It was run with 200M
  writes and passed.

  Unit tests were written for the new function that was added to the
  code. Make test was run and seen to pass.

Signed-off-by: Richard Cole <[email protected]>
Copy link
Contributor

@cmumford cmumford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. I'll fixup the formatting later.

@cmumford
Copy link
Contributor

cmumford commented Apr 1, 2019

Note. I'm trying to configure Copybara to accept this PR. Please ignore any of those errors.

@ruish
Copy link

ruish commented Apr 2, 2019

Hi Chris @cmumford ,

LGTM!

I patched your fix in my local repo and verified that the test is passed/failed with/without the original fix, which meets the expectation.

Thanks!

Rui

@jsolman
Copy link

jsolman commented Apr 7, 2019

Using LevelDB built from the master branch of this repository, I was hitting this issue fairly consistently when syncing the Neo (https://neo.org) MainNet Blockchain from the latest chain file available at https://sync.ngd.network. In certain instances where compaction would occur, keys were becoming missing. I have not encountered the issue since using a build that incorporates the changes from this PR.

@cmumford cmumford merged commit 20fb601 into google:master Apr 12, 2019
cmumford added a commit that referenced this pull request Apr 12, 2019
dwing4g pushed a commit to dwing4g/leveldb that referenced this pull request Apr 17, 2019
Closes google#320

During compaction it was possible that records from a block b1=(l1,u1)
would be pushed down from level i to level i+1. If there is a block
b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then
a subsequent search for k1 will yield the record l2 which has a smaller
sequence number than u1 because the sort order for records sorts
increasing by user key but decreaing by sequence number.

This change add a call to a new function AddBoundaryInputs to
SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the
criteria above and adds it to the set of files to be compacted. Whenever
AddBoundaryInputs is called it is important that the compaction fileset
in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each
call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction
passes. It is called for both level zero and for levels greater than 0.

The original change posted in google#339
has been modified to also include changed made by Chris Mumford<[email protected]>
in cmumford@4b72cb1

  1. Releasing snapshots during test cleanup to avoid
     memory leak warnings.
  2. Refactored test to use testutil.h to be in line
     with other issue tests and to create the test
     database in the correct temporary location.
  3. Added copyright banner.

  Otherwise, just minor formatting and limiting character
  width to 80 characters.

Additionally the change was rebased on top of current master and
changes previously made to the Makefile were ported to the
CMakeLists.txt.

Testing Done:

  A test program (issue320_test) was constructed that performs mutations
  while snapshots are active. issue320_test fails without this bug fix
  after 64k writes. It passes with this bug fix. It was run with 200M
  writes and passed.

  Unit tests were written for the new function that was added to the
  code. Make test was run and seen to pass.

Signed-off-by: Richard Cole <[email protected]>

# Conflicts:
#	CMakeLists.txt
@kylezh kylezh mentioned this pull request Sep 11, 2019
pwnall pushed a commit that referenced this pull request May 20, 2021
key across multiple files.

As reported in Github issue #339, it is incorrect to split the
same user key across multiple compacted files since it causes
tombstones/newer-versions to be dropped, thereby exposing obsolete
data. There was a fix for #339, but it ended up not fully fixing
the problem. (It checked for boundary problems in the first level
being compacted, but not the second). This problem was revealed
by Github issue 887.

We now adjust boundaries to avoid splitting user keys in both the
first level and the second level.

PiperOrigin-RevId: 374921082
fanquake pushed a commit to fanquake/leveldb-subtree that referenced this pull request Jun 11, 2024
key across multiple files.

As reported in Github issue google#339, it is incorrect to split the
same user key across multiple compacted files since it causes
tombstones/newer-versions to be dropped, thereby exposing obsolete
data. There was a fix for google#339, but it ended up not fully fixing
the problem. (It checked for boundary problems in the first level
being compacted, but not the second). This problem was revealed
by Github issue 887.

We now adjust boundaries to avoid splitting user keys in both the
first level and the second level.

PiperOrigin-RevId: 374921082
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compaction causes data inconsistency when using snapshots
7 participants