-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pipeline: hasher bug #1267
pipeline: hasher bug #1267
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. please change trie to tree everywhere. this is not a trie
errTrieFull = errors.New("trie full") | ||
) | ||
|
||
const maxLevel = 8 | ||
|
||
type hashTrieWriter struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this just writer
@@ -12,15 +12,21 @@ import ( | |||
"github.com/ethersphere/bee/pkg/swarm" | |||
) | |||
|
|||
var errInconsistentRefs = errors.New("inconsistent reference lengths in level") | |||
var ( | |||
errInconsistentRefs = errors.New("inconsistent references") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what for?
var errInconsistentRefs = errors.New("inconsistent reference lengths in level") | ||
var ( | ||
errInconsistentRefs = errors.New("inconsistent references") | ||
errTrieFull = errors.New("trie full") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought we agreed this should be handled but its ok
This PR is ready for review, but not connected to an issue. The issue which belongs to the PR is not in the |
Let's throttle renaming. I'll do this in a separate PR |
There was a bug in our hashing pipeline that was discovered in #1175.
The problem was that we assumed that the trie depth was known in advanced, however, when wrapping deeper levels under certain circumstances the upper levels would also wrap, resulting in a trie that is deeper than originally assumed. The result was that the trie was being traversed only until the preassumed depth, leaving us with an "empty" level once it has wrapped into the next.
This has been demonstrated with an explicit regression test that is now in the
hashtrie
package. Furthermore, unit tests have now been added to thehashtrie
package. It covers various scenarios and edge cases.The
Sum
method logic has now been inlined into the method instead of the helper method, and the logic has been significantly simplified, with extensive comments and explanations on how things work, with the hope that this improves developer onboarding and open-source contributions, which have revealed this bug.The logic of generating the root hash has changed and indeed simplified. In the past, we would wrongly assume which level we're aiming at and try to read the resulting data out of that level. However as mentioned, this did not take into account the case where several subsequent levels were wrapped as a result of one level wrap (causing the trie to balance, basically).
After the simplification we just iterate on all levels and constantly carry over hashes or wrap levels as needed, carrying over the result to higher level. In the end of the operation, we just take the resulting hash from the top level of the trie.
I have used
TestRegression
to verify that this problem exists on currentmaster
and is solved with this change, as well as manual tests. In the unit tests added to thehashtrie
package, I have chosen an approach where one predefined hash is constantly being written to the trie, with constant spans of value1
. This allows the eventual root chunk span to be compared against the linear number of writes that were done in the test. Also, custom branching and chunk size values were used in tests in order to decrease the number of writes needed in order to trigger a level wrap of the trie.