As a user, I expect the dataframe returned by a left join to preserve the row ordering of my left dataframe. With dask_cudf the rows come back reordered instead. This happens both when the left and right dataframes have all keys in common and when the right dataframe has only a subset of the left's keys.
Example
import cudf
import numpy as np
import pandas as pd
import dask_cudf
import dask.dataframe as dd
## Subset of keys
df_a = cudf.DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['vals_a'] = [float(i + 10) for i in range(5)]
df_b = cudf.DataFrame()
df_b['key'] = [1, 2, 4]
df_b['vals_b'] = [float(i+100) for i in range(3)]
ddf_a = dask_cudf.from_cudf(df_a, npartitions=2)
ddf_b = dask_cudf.from_cudf(df_b, npartitions=2)
merged = ddf_a.merge(ddf_b, on=['key'], how='left').compute()
print(merged)
   key  vals_a  vals_b
0    1    11.0   100.0
1    2    12.0   101.0
2    4    14.0   102.0
3    0    10.0
0    3    13.0
## All keys in common
df_a = cudf.DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['vals_a'] = [float(i + 10) for i in range(5)]
df_b = cudf.DataFrame()
df_b['key'] = [0, 1, 2, 3, 4]
df_b['vals_b'] = [float(i+100) for i in range(5)]
ddf_a = dask_cudf.from_cudf(df_a, npartitions=2)
ddf_b = dask_cudf.from_cudf(df_b, npartitions=2)
merged = ddf_a.merge(ddf_b, on=['key'], how='left').compute()
print(merged)
   key  vals_a  vals_b
0    0    10.0   100.0
1    1    11.0   101.0
2    2    12.0   102.0
3    4    14.0   104.0
0    3    13.0   103.0
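For comparison, here is a minimal sketch of the ordering I would expect, using plain pandas on the same "subset of keys" data (pandas documents that `how='left'` preserves the left frame's key order). It also sketches one possible workaround: tag the left rows with their position before merging and sort on that tag afterwards. The helper `merge_keep_left_order` and the `_order` column are hypothetical names used only for illustration; they are not part of cudf or dask_cudf, and I have not verified the same approach on dask_cudf.

```python
import pandas as pd

# Same data as the "subset of keys" case, built with pandas.
pdf_a = pd.DataFrame({"key": [0, 1, 2, 3, 4],
                      "vals_a": [float(i + 10) for i in range(5)]})
pdf_b = pd.DataFrame({"key": [1, 2, 4],
                      "vals_b": [float(i + 100) for i in range(3)]})

# pandas keeps the left frame's key order for how="left".
expected = pdf_a.merge(pdf_b, on=["key"], how="left")
print(expected["key"].tolist())  # → [0, 1, 2, 3, 4]


def merge_keep_left_order(left, right, on, order_col="_order"):
    """Left-merge, then restore the left frame's row order.

    `order_col` is a hypothetical temporary column name; it must not
    already exist in either frame.
    """
    # Record each left row's original position.
    tagged = left.assign(**{order_col: list(range(len(left)))})
    merged = tagged.merge(right, on=on, how="left")
    # A stable sort keeps the relative order of rows that share a
    # position, which happens when a left key matches several right rows.
    return (merged.sort_values(order_col, kind="mergesort")
                  .drop(columns=[order_col])
                  .reset_index(drop=True))


restored = merge_keep_left_order(pdf_a, pdf_b, on=["key"])
print(restored)
```

With dask_cudf, the analogous idea would presumably be to add the position column to the cudf dataframe before calling `dask_cudf.from_cudf`, then sort on it after `.compute()`. That costs an extra sort, so it is only a stopgap and not a substitute for the merge preserving order.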