-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-4284] Implement bloom lookup tree as red-black tree #5978
base: master
Are you sure you want to change the base?
Conversation
53e67f6
to
a8f041f
Compare
@codope @nsivabalan Please take a look~ |
@vinothchandar Can you help review this PR, thanks. |
@yabola Thanks for putting up this PR. Sorry for the delay as I was occupied with Hudi release work. I am going to review it towards the end of week. Expect feedback by Monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yabola Can you please rebase? Would be great if we can compare the performance against shuffle cost.
// Note that the interval tree implementation doesn't have auto-balancing to ensure logN search time. | ||
// So, we are shuffling the input here hoping the tree will not have any skewness. If not, the tree could be skewed | ||
// which could result in N search time instead of NlogN. | ||
Collections.shuffle(allIndexFiles); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yabola do you have some micro-benchmark as to how much improvement this change brings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, actually I don't have benchmark on it. I think red-black tree is a general optimization strategy like hashmap
* @param key the numeric value of the key | ||
* @return result of aligned numbers. For example, `1` -> `00001`. | ||
*/ | ||
private String alignedNumber(long key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be static?
* 4. Every path from a node (including root) to any of its descendants NULL nodes has the same number of black nodes. | ||
*/ | ||
@SuppressWarnings({"unchecked", "rawtypes"}) | ||
public class RedBlackTree<T extends RedBlackTreeNode<K>, K extends Comparable<K>> implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered thread safety of operations on the tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make this tree thread safety later if need
What is the purpose of the pull request
The existing KeyRangeLookupTree implementation is Binary Sorting Tree.
Although it is shuffled before insertion, it may still cause uneven distribution. This PR implement it as a Red Black Tree.
Brief change log
Added abstract implementation of red-black tree
RedBlackTree
and implement theKeyRangeLookupTree
as red-black tree.Verify this pull request
Added new unit test
org.apache.hudi.common.util.rbtree.TestRedBlackTree
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.