Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle merge of already serialized value-keys #95

Merged
merged 2 commits into from
Oct 5, 2022
Merged

Conversation

masih
Copy link
Member

@masih masih commented Oct 4, 2022

Pebble documentation suggests that Merger is used by .Merge calls on a given key value. However, Pebble uses merger on iteration over values too. Internally, Get operations use iterators for example. This means values that are already serialized may be passed to the merger implementation during in-memory traversal.

The previous implementation of value-keys merger did not take this into account and as a result it meant that Get failed to find value-keys associated to a multihash since they ended up being double wrapped by the merger value-keys marshaller.

The changes here fixes this issue by checking explicitly that a give value-key has the expected key prefix before proceeding with the remaining merge logic. As a precaution, the merger is also modified to fall back on the default Pebble merger when the key is not of kind multihash-key. Further, the benchmarks are modified to explicitly assert that the inserted values are indeed found for each entry.

The Get implementation in Pebble store is also modified to copy the content returned by the backing store to avoid slice invalidation and runtime invalid reference errors captured in #80.

Unrelated to the lookup issue, the changes here also fix typos in benchmarks where workload does not match the benchmark name. Further, the common tests are updated to do a flush as suggested in #93.

Fixes #94, #80.

@codecov-commenter
Copy link

codecov-commenter commented Oct 4, 2022

Codecov Report

Merging #95 (521d8bc) into main (3f6b061) will decrease coverage by 0.10%.
The diff coverage is 66.21%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #95      +/-   ##
==========================================
- Coverage   63.57%   63.47%   -0.11%     
==========================================
  Files          15       15              
  Lines        2740     2779      +39     
==========================================
+ Hits         1742     1764      +22     
- Misses        742      753      +11     
- Partials      256      262       +6     
Impacted Files Coverage Δ
store/test/tests.go 39.04% <0.00%> (-0.99%) ⬇️
store/pebble/pebble.go 59.89% <57.14%> (-1.19%) ⬇️
store/pebble/vk_merger.go 95.52% <90.00%> (-4.48%) ⬇️
store/pebble/key.go 85.26% <100.00%> (+0.81%) ⬆️
value.go 82.08% <100.00%> (+0.55%) ⬆️
store/memory/memory.go 88.34% <0.00%> (+0.44%) ⬆️

Pebble documentation suggests that `Merger` is used by `.Merge` calls on
a given key value. However, Pebble uses merger on iteration over values
too. Internally, `Get` operations use iterators for example. This means
values that are already serialized may be passed to the merger
implementation during in-memory traversal.

The previous implementation of value-keys merger did not take this into
account and as a result it meant that `Get` failed to find value-keys
associated to a multihash since they ended up being double wrapped by
the merger value-keys marshaller.

The changes here fixes this issue by checking explicitly that a give
value-key has the expected key prefix before proceeding with the
remaining merge logic. As a precaution, the merger is also modified to
fall back on the default Pebble merger when the key is not of kind
multihash-key. Further, the benchmarks are modified to explicitly assert
that the inserted values are indeed found for each entry.

The `Get` implementation in Pebble store is also modified to copy the
content returned by the backing store to avoid slice invalidation and
runtime invalid reference errors captured in #80.

Unrelated to the lookup issue, the changes here also fix typos in
benchmarks where workload does not match the benchmark name. Further,
the common tests are updated to do a flush as suggested in #93.

Fixes #94, #80.
@masih masih force-pushed the masih/pebble_merger_fix branch from 2083f1b to d5b60db Compare October 4, 2022 16:19
@masih
Copy link
Member Author

masih commented Oct 4, 2022

For reference here is a local run of Put and Get benchmarks for Pebble and STH stores for workloads 0 and 1:

$ go test ./bench -bench='(Pebble|Storethehash)(Put|Get)_W(0|1)'  -count=3 -timeout=300m

cpu: Intel(R) Core(TM) i7 CPU @ 2.30GHz

PASS
ok      github.com/filecoin-project/go-indexer-core/bench       1416.652s
name                        time/op
Store_PebblePut_W0-8           1.78s ±12%
Store_PebbleGet_W0-8           9.03s ± 1%
Store_PebblePut_W1-8           21.4s ± 6%
Store_PebbleGet_W1-8            135s ± 3%
Store_StorethehashPut_W0-8     3.38s ± 4%
Store_StorethehashGet_W0-8     9.82s ± 4%
Store_StorethehashPut_W1-8     55.9s ± 1%
Store_StorethehashGet_W1-8      141s ± 9%

name                        speed
Store_PebblePut_W0-8        19.3MB/s ±11%
Store_PebbleGet_W0-8        3.77MB/s ± 1%
Store_PebblePut_W1-8        15.9MB/s ± 6%
Store_PebbleGet_W1-8        2.53MB/s ± 3%
Store_StorethehashPut_W0-8  10.1MB/s ± 4%
Store_StorethehashGet_W0-8  3.47MB/s ± 4%
Store_StorethehashPut_W1-8  6.10MB/s ± 1%
Store_StorethehashGet_W1-8  2.43MB/s ± 9%

name                        size
Store_PebblePut_W0-8           48.4M ± 0%
Store_PebbleGet_W0-8           45.5M ± 0%
Store_PebblePut_W1-8            501M ± 0%
Store_PebbleGet_W1-8            487M ± 0%
Store_StorethehashPut_W0-8     87.4M ± 0%
Store_StorethehashGet_W0-8     87.4M ± 0%
Store_StorethehashPut_W1-8      912M ± 0%
Store_StorethehashGet_W1-8      912M ± 0%

name                        alloc/op
Store_PebblePut_W0-8           155MB ± 1%
Store_PebbleGet_W0-8          19.6GB ± 0%
Store_PebblePut_W1-8          4.51GB ± 2%
Store_PebbleGet_W1-8           198GB ± 0%
Store_StorethehashPut_W0-8     967MB ± 0%
Store_StorethehashGet_W0-8    1.61GB ± 5%
Store_StorethehashPut_W1-8    11.1GB ± 1%
Store_StorethehashGet_W1-8    17.8GB ± 0%

name                        allocs/op
Store_PebblePut_W0-8           2.55M ± 2%
Store_PebbleGet_W0-8           29.7M ± 0%
Store_PebblePut_W1-8           97.9M ± 2%
Store_PebbleGet_W1-8            351M ± 1%
Store_StorethehashPut_W0-8     11.9M ± 1%
Store_StorethehashGet_W0-8     28.1M ± 5%
Store_StorethehashPut_W1-8      142M ± 0%
Store_StorethehashGet_W1-8      309M ± 0%

Copy link
Collaborator

@gammazero gammazero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise LGTM.

I have made the changes suggested in the comment here for your review:
521d8bc

Copy link
Contributor

@ischasny ischasny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Make a copy of given byte slices to assure they remain valid and are
unaffected by slice invalidation.
@masih masih force-pushed the masih/pebble_merger_fix branch from 521d8bc to 2e1c674 Compare October 5, 2022 08:37
@masih masih merged commit 3f4a7b4 into main Oct 5, 2022
@masih masih deleted the masih/pebble_merger_fix branch October 5, 2022 08:37
masih added a commit that referenced this pull request Oct 6, 2022
The previous behaviour of value-key merger in certain scenarios may have
been resulted in value-keys being marshalled repeatedly. This was the
underlying cause of the issue captured in #94 and corrected in #95.

The work here adds to the corrections by gracefully fixing such records
by unmarshalling them recursively until we reach values with the
expected key prefix, and de-duplicating them as necessary. The fix is
done opportunistically, in that whenever such values are detected during
reads and writes they are fixed.

Additional tests are added to simulate nested marshalling of value-keys
and assert that the resulting merge is as expected for both merge-older
and merge-newer cases.
masih added a commit that referenced this pull request Oct 6, 2022
The previous behaviour of value-key merger in certain scenarios may have
been resulted in value-keys being marshalled repeatedly. This was the
underlying cause of the issue captured in #94 and corrected in #95.

The work here adds to the corrections by gracefully fixing such records
by unmarshalling them recursively until we reach values with the
expected key prefix, and de-duplicating them as necessary. The fix is
done opportunistically, in that whenever such values are detected during
reads and writes they are fixed.

Additional tests are added to simulate nested marshalling of value-keys
and assert that the resulting merge is as expected for both merge-older
and merge-newer cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not all processed records stored in Pebble are found
5 participants