-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#7254: Support right merge/join #42
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,6 +129,11 @@ def row_axis_merge( | |
if how in ["left", "inner"] and left_index is False and right_index is False: | ||
kwargs["sort"] = False | ||
|
||
reverted = False | ||
if how == "inner" and left._modin_frame._partitions.shape[0] == 1: | ||
left, right = right, left | ||
reverted = True | ||
|
||
def should_keep_index(left, right): | ||
keep_index = False | ||
if left_on is not None and right_on is not None: | ||
|
@@ -147,7 +152,10 @@ def should_keep_index(left, right): | |
def map_func( | ||
left, right, *axis_lengths, kwargs=kwargs, **service_kwargs | ||
): # pragma: no cover | ||
df = pandas.merge(left, right, **kwargs) | ||
if reverted: | ||
df = pandas.merge(right, left, **kwargs) | ||
else: | ||
df = pandas.merge(left, right, **kwargs) | ||
|
||
if kwargs["how"] == "left": | ||
partition_idx = service_kwargs["partition_idx"] | ||
|
@@ -170,9 +178,24 @@ def map_func( | |
on = list(on) if is_list_like(on) else [on] | ||
|
||
right_to_broadcast = right._modin_frame.combine() | ||
new_columns, new_dtypes = cls._compute_result_metadata( | ||
left, right, on, left_on, right_on, kwargs.get("suffixes", ("_x", "_y")) | ||
) | ||
if not reverted: | ||
new_columns, new_dtypes = cls._compute_result_metadata( | ||
left, | ||
right, | ||
on, | ||
left_on, | ||
right_on, | ||
kwargs.get("suffixes", ("_x", "_y")), | ||
) | ||
else: | ||
new_columns, new_dtypes = cls._compute_result_metadata( | ||
right, | ||
left, | ||
on, | ||
right_on, | ||
left_on, | ||
kwargs.get("suffixes", ("_x", "_y")), | ||
) | ||
Comment on lines
+181
to
+198
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The _compute_result_metadata method has been updated to handle the case when operands are swapped. However, it's important to ensure that this change doesn't introduce any inconsistencies in the computed metadata. Consider adding unit tests specifically for the swapped operands case to verify that the computed metadata (columns and dtypes) is correct in both swapped and non-swapped scenarios.
|
||
|
||
# We rebalance when the ratio of the number of existing partitions to | ||
# the ideal number of partitions is smaller than this threshold. The | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new condition to swap operands in the merge operation is a good optimization for performance. However, we need to ensure that this swap doesn't affect the order of columns in the final result. Consider adding a step after the merge to reorder the columns if the operands were swapped. This will maintain consistency with the expected output when the operands are not swapped.