[HUDI-4284] Implement bloom lookup tree as red-black tree #5978

Open
wants to merge 4 commits into
base: master
from

@yabola yabola commented Jun 26, 2022

What is the purpose of the pull request

The existing KeyRangeLookupTree is implemented as a plain (unbalanced) binary search tree.
Although the input is shuffled before insertion, the resulting tree can still be unevenly distributed. This PR implements it as a red-black tree.
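
To illustrate the skew described above, here is a minimal sketch that is not taken from this PR. It assumes a hypothetical `BstHeightDemo` class with a bare unbalanced binary search tree, and compares the tree height after inserting sorted keys with the height after inserting the same keys in shuffled order. Shuffling usually keeps the height small, but unlike a self-balancing tree it gives no worst-case guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical demo class; not part of the Hudi code base.
class BstHeightDemo {

  // Minimal node of an unbalanced binary search tree.
  static final class Node {
    final long key;
    Node left;
    Node right;

    Node(long key) {
      this.key = key;
    }
  }

  // Inserts the keys in list order (iteratively, so a skewed tree cannot
  // overflow the call stack) and returns the tree height counted in nodes.
  static int heightAfterInserting(List<Long> keys) {
    Node root = null;
    int height = 0;
    for (long key : keys) {
      if (root == null) {
        root = new Node(key);
        height = 1;
        continue;
      }
      Node current = root;
      int depth = 1; // depth of `current`
      while (true) {
        depth++; // depth of the child slot we are about to inspect
        if (key < current.key) {
          if (current.left == null) {
            current.left = new Node(key);
            break;
          }
          current = current.left;
        } else {
          if (current.right == null) {
            current.right = new Node(key);
            break;
          }
          current = current.right;
        }
      }
      height = Math.max(height, depth);
    }
    return height;
  }

  public static void main(String[] args) {
    List<Long> keys = new ArrayList<>();
    for (long i = 0; i < 1000; i++) {
      keys.add(i);
    }
    // Sorted input degenerates into a linked list: height equals the key count.
    System.out.println("sorted height:   " + heightAfterInserting(keys));

    // Shuffled input is usually far shallower, but only probabilistically.
    Collections.shuffle(keys, new Random(42));
    System.out.println("shuffled height: " + heightAfterInserting(keys));
  }
}
```

A red-black tree bounds the height at roughly 2·log2(N + 1) regardless of insertion order, which is the guarantee the shuffle cannot provide.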

Brief change log

Added an abstract red-black tree implementation, RedBlackTree, and reimplemented KeyRangeLookupTree on top of it.

Verify this pull request

Added new unit test org.apache.hudi.common.util.rbtree.TestRedBlackTree

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@yabola yabola changed the title HUDI-4284 Implement bloom lookup tree as red-black tree [HUDI-4284] Implement bloom lookup tree as red-black tree Jun 26, 2022
@yabola yabola changed the title [HUDI-4284] Implement bloom lookup tree as red-black tree [HUDI-4284] Implement index lookup tree as red-black tree Jun 26, 2022
@yabola yabola force-pushed the HUDI-4284 branch 2 times, most recently from 53e67f6 to a8f041f Compare June 26, 2022 15:58
@yabola yabola changed the title [HUDI-4284] Implement index lookup tree as red-black tree [HUDI-4284] Implement bloom lookup tree as red-black tree Jun 26, 2022
@XuQianJin-Stars XuQianJin-Stars requested review from nsivabalan, xushiyan, danny0405 and codope and removed request for nsivabalan June 28, 2022 06:11
@yabola
Author

yabola commented Jul 1, 2022

@codope @nsivabalan Please take a look~

@vinothchandar vinothchandar self-assigned this Jul 13, 2022
@yabola
Author

yabola commented Aug 3, 2022

@vinothchandar Could you help review this PR? Thanks.

@codope
Member

codope commented Aug 10, 2022

@yabola Thanks for putting up this PR. Sorry for the delay; I was occupied with Hudi release work. I am going to review it toward the end of the week. Expect feedback by Monday.

@yihua yihua added the priority:major (degraded perf; unable to move forward; potential bugs), writer-core (issues relating to core transactions/write actions), and index labels Sep 13, 2022
@yabola
Author

yabola commented Oct 25, 2022

@yihua @codope Hi, if you have time, could you help review my PR? Thanks~

Member

@codope codope left a comment

@yabola Can you please rebase? It would be great if we could compare the performance against the cost of the shuffle.

// Note that the interval tree implementation doesn't have auto-balancing to ensure logN search time.
// So, we are shuffling the input here hoping the tree will not have any skewness. If not, the tree could be skewed
// which could result in N search time instead of logN.
Collections.shuffle(allIndexFiles);
Member

@yabola Do you have a micro-benchmark showing how much improvement this change brings?

Author

Sorry, I don't have a benchmark for it. I think using a red-black tree is a general optimization strategy, much like using a hashmap.

* @param key the numeric value of the key
* @return result of aligned numbers. For example, `1` -> `00001`.
*/
private String alignedNumber(long key) {
Member

Can this be static?
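
For reference, a static variant might look like the sketch below. The method body is not visible in this thread, so the zero-padding via `String.format` and the width of 5 (inferred from the `1` -> `00001` example in the Javadoc) are assumptions, and `AlignedNumberDemo` is a hypothetical holder class.

```java
// Hypothetical holder class; the real method lives in the PR's own class.
class AlignedNumberDemo {

  // Assumed width, inferred only from the Javadoc example `1` -> `00001`.
  private static final int ASSUMED_WIDTH = 5;

  // The helper uses no instance state, so it can be declared static.
  static String alignedNumber(long key) {
    // Left-pads the decimal form with zeros up to ASSUMED_WIDTH digits;
    // longer numbers are returned unchanged.
    return String.format("%0" + ASSUMED_WIDTH + "d", key);
  }

  public static void main(String[] args) {
    System.out.println(alignedNumber(1)); // prints 00001
  }
}
```

Zero-padded keys sort the same way lexicographically as their numeric values do, as long as every key fits within the chosen width.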

* 4. Every path from a node (including root) to any of its descendant NULL nodes has the same number of black nodes.
*/
@SuppressWarnings({"unchecked", "rawtypes"})
public class RedBlackTree<T extends RedBlackTreeNode<K>, K extends Comparable<K>> implements Serializable {
Member

Have you considered thread safety of operations on the tree?

Author

@yabola yabola Oct 26, 2022

I can make this tree thread-safe later if needed.
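
One possible way to add thread safety later is sketched below. It does not use the PR's `RedBlackTree` class: the hypothetical `GuardedTree` wraps the JDK's `java.util.TreeMap` (itself a red-black tree) as a stand-in and guards it with a `ReentrantReadWriteLock`, so lookups can run concurrently while inserts are exclusive. The method names are illustrative only.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper; TreeMap stands in for the PR's RedBlackTree.
class GuardedTree<K extends Comparable<K>, V> {
  private final TreeMap<K, V> tree = new TreeMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Writers take the exclusive lock because an insert may rotate and recolor nodes.
  void insert(K key, V value) {
    lock.writeLock().lock();
    try {
      tree.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers share the lock; returns the value of the greatest key <= `key`, or null.
  V floorValue(K key) {
    lock.readLock().lock();
    try {
      Map.Entry<K, V> entry = tree.floorEntry(key);
      return entry == null ? null : entry.getValue();
    } finally {
      lock.readLock().unlock();
    }
  }

  int size() {
    lock.readLock().lock();
    try {
      return tree.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  private static void insertRange(GuardedTree<Long, String> tree, long from, long to) {
    for (long i = from; i < to; i++) {
      tree.insert(i, "file-" + i);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    GuardedTree<Long, String> tree = new GuardedTree<>();
    // Two writers insert disjoint key ranges concurrently.
    Thread first = new Thread(() -> insertRange(tree, 0, 1000));
    Thread second = new Thread(() -> insertRange(tree, 1000, 2000));
    first.start();
    second.start();
    first.join();
    second.join();
    System.out.println(tree.size()); // prints 2000
  }
}
```

If the tree is built once on a single thread and only read afterwards, safe publication of the finished tree may be sufficient and the lock could be dropped.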

@hudi-bot

CI report:

@nsivabalan nsivabalan self-assigned this May 2, 2023
@github-actions github-actions bot added the size:XL (PR with lines of changes > 1000) label Feb 26, 2024
Labels
index; priority:major (degraded perf; unable to move forward; potential bugs); size:XL (PR with lines of changes > 1000); writer-core (issues relating to core transactions/write actions)
Projects
Status: 🔖 Ready for review
6 participants