Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipeline: hasher bug #1267

Merged
merged 23 commits into from
Feb 20, 2021
Merged

pipeline: hasher bug #1267

merged 23 commits into from
Feb 20, 2021

Conversation

acud
Copy link
Member

@acud acud commented Feb 16, 2021

There was a bug in our hashing pipeline that was discovered in #1175.
The problem was that we assumed that the trie depth was known in advanced, however, when wrapping deeper levels under certain circumstances the upper levels would also wrap, resulting in a trie that is deeper than originally assumed. The result was that the trie was being traversed only until the preassumed depth, leaving us with an "empty" level once it has wrapped into the next.

This has been demonstrated with an explicit regression test that is now in the hashtrie package. Furthermore, unit tests have now been added to the hashtrie package. It covers various scenarios and edge cases.

The Sum method logic has now been inlined into the method instead of the helper method, and the logic has been significantly simplified, with extensive comments and explanations on how things work, with the hope that this improves developer onboarding and open-source contributions, which have revealed this bug.

The logic of generating the root hash has changed and indeed simplified. In the past, we would wrongly assume which level we're aiming at and try to read the resulting data out of that level. However as mentioned, this did not take into account the case where several subsequent levels were wrapped as a result of one level wrap (causing the trie to balance, basically).
After the simplification we just iterate on all levels and constantly carry over hashes or wrap levels as needed, carrying over the result to higher level. In the end of the operation, we just take the resulting hash from the top level of the trie.

I have used TestRegression to verify that this problem exists on current master and is solved with this change, as well as manual tests. In the unit tests added to the hashtrie package, I have chosen an approach where one predefined hash is constantly being written to the trie, with constant spans of value 1. This allows the eventual root chunk span to be compared against the linear number of writes that were done in the test. Also, custom branching and chunk size values were used in tests in order to decrease the number of writes needed in order to trigger a level wrap of the trie.

@acud acud self-assigned this Feb 16, 2021
@acud acud added the bug Something isn't working label Feb 16, 2021
@acud acud added the ready for review The PR is ready to be reviewed label Feb 17, 2021
@acud acud requested review from janos and zelig February 18, 2021 15:19
Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

pkg/storage/mock/storer.go Show resolved Hide resolved
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. please change trie to tree everywhere. this is not a trie

errTrieFull = errors.New("trie full")
)

const maxLevel = 8

type hashTrieWriter struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this just writer

@@ -12,15 +12,21 @@ import (
"github.com/ethersphere/bee/pkg/swarm"
)

var errInconsistentRefs = errors.New("inconsistent reference lengths in level")
var (
errInconsistentRefs = errors.New("inconsistent references")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what for?

var errInconsistentRefs = errors.New("inconsistent reference lengths in level")
var (
errInconsistentRefs = errors.New("inconsistent references")
errTrieFull = errors.New("trie full")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought we agreed this should be handled but its ok

@Eknir
Copy link
Contributor

Eknir commented Feb 20, 2021

This PR is ready for review, but not connected to an issue. The issue which belongs to the PR is not in the review workspace. Please properly manage your issues.

@acud
Copy link
Member Author

acud commented Feb 20, 2021

change trie to tree everywhere

Let's throttle renaming. I'll do this in a separate PR

@acud acud merged commit 00f4c06 into master Feb 20, 2021
@acud acud deleted the hasher-bug branch February 20, 2021 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pull-request ready for review The PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants