FEAT-#7254: Support right merge/join #42
base: main
Signed-off-by: Anatoly Myachev <[email protected]>
Clone of the PR modin-project/modin#7226
I have reviewed your code and found 2 potential issues.
if how == "inner" and left._modin_frame._partitions.shape[0] == 1:
    left, right = right, left
    reverted = True
The new condition to swap operands in the merge operation is a good optimization for performance. However, we need to ensure that this swap doesn't affect the order of columns in the final result. Consider adding a step after the merge to reorder the columns if the operands were swapped. This will maintain consistency with the expected output when the operands are not swapped.
if not reverted:
    new_columns, new_dtypes = cls._compute_result_metadata(
        left,
        right,
        on,
        left_on,
        right_on,
        kwargs.get("suffixes", ("_x", "_y")),
    )
else:
    new_columns, new_dtypes = cls._compute_result_metadata(
        right,
        left,
        on,
        right_on,
        left_on,
        kwargs.get("suffixes", ("_x", "_y")),
    )
When the operands have been swapped, the _compute_result_metadata call now passes them back in their original order (right, left together with right_on, left_on). However, it's important to ensure that this doesn't introduce any inconsistencies in the computed metadata. Consider adding unit tests specifically for the swapped-operands case to verify that the computed metadata (columns and dtypes) is correct in both the swapped and non-swapped scenarios.
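A sketch of the kind of unit test suggested above, again in plain pandas. It assumes (the snippet above does not show this) that a right merge is executed as a left merge of the swapped operands with reversed suffixes, and checks that the result's columns, dtypes and data all match pandas' own right merge. String keys are used so that the key column's dtype is not affected by unmatched rows. The test name is hypothetical.

```python
import pandas as pd


def test_right_merge_matches_swapped_left_merge():
    left = pd.DataFrame({"key": ["k1", "k2", "k3"], "a": [10, 20, 30], "val": [1.0, 2.0, 3.0]})
    right = pd.DataFrame({"key": ["k3", "k1", "k4"], "b": ["x", "y", "z"], "val": [7.0, 8.0, 9.0]})

    # Reference result: pandas' right merge on the unswapped operands.
    expected = pd.merge(left, right, how="right", on="key")

    # Swapped evaluation: a left merge of (right, left) with reversed suffixes,
    # so the overlapping "val" columns get the same labels as in `expected`.
    swapped = pd.merge(right, left, how="left", on="key", suffixes=("_y", "_x"))

    # Same column labels, only in a different order ...
    assert sorted(swapped.columns) == sorted(expected.columns)
    result = swapped[expected.columns]

    # ... the same dtypes (the metadata) once reordered ...
    assert list(result.dtypes) == list(expected.dtypes)

    # ... and the same data, including the order of the right keys.
    pd.testing.assert_frame_equal(result, expected)
```

With pytest, the function is collected automatically; it can also be called directly.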
/review
PR Reviewer Guide 🔍 (review updated until commit 616b51f)
Persistent review updated to latest commit 616b51f
@coderabbitai full review
Actions performed: Full review triggered.
PR Code Suggestions ✨
No comment from coderabbit
User description
What do these changes do?
Blocked on modin-project/modin#7251
Performance: 1.7 sec (this PR) vs. 3.1 sec (on main) on Ray (8 cores).
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date
Description by Korbit AI
What change is being made?
Add support for right merge/join in the row_axis_merge function within modin/core/storage_formats/pandas/merge.py.
Why are these changes being made?
This change addresses the need for right merge/join functionality, which was previously unsupported. The implementation ensures that the merge operation correctly handles the right join by swapping the left and right dataframes when necessary and adjusting the metadata computation accordingly.
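The swap bookkeeping described above can be summarized as a small decision function. This is a hypothetical sketch, not Modin's code: the inner-merge condition mirrors the snippet quoted in the review, while executing a right merge as a left merge on the swapped operands is an assumption about what "swapping when necessary" means here.

```python
from typing import Tuple


def plan_row_axis_merge(how: str, left_row_partitions: int) -> Tuple[str, bool]:
    """Hypothetical helper: decide whether to swap the merge operands.

    Returns the join type to execute on the (possibly swapped) operands and
    a ``reverted`` flag recording whether a swap happened.
    """
    if how == "right":
        # Assumption: a right merge of (left, right) is run as a left merge
        # of (right, left).
        return "left", True
    if how == "inner" and left_row_partitions == 1:
        # An inner merge is symmetric up to row/column order and suffixes,
        # so swapping is allowed; the caller must restore the unswapped
        # column order and compute metadata in the original orientation.
        return "inner", True
    return how, False
```

The caller would then use the returned flag the same way reverted is used above: to decide which operand order to pass to the metadata computation.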
PR Type
enhancement
Description
reverted: a new flag that tracks whether the operands were swapped, to ensure correct merge operations.
Changes walkthrough 📝
merge.py
Implement right merge/join support with operand swapping
modin/core/storage_formats/pandas/merge.py