Handle renamed indexes
shwina committed Mar 26, 2021
1 parent f7e94fb commit 1cb9448
Showing 2 changed files with 25 additions and 0 deletions.
12 changes: 12 additions & 0 deletions python/cudf/cudf/core/multiindex.py
@@ -190,6 +190,18 @@ def names(self):
    def names(self, value):
        value = [None] * self.nlevels if value is None else value
        assert len(value) == self.nlevels

        if len(value) == len(set(value)):

kkraus14 Mar 26, 2021

Collaborator

I wonder if we should track whether the names are unique / match the underlying keys instead of always calculating the set(value) here.

shwina Mar 26, 2021

Author Contributor

Value here can be any arbitrary object though?

kkraus14 Mar 26, 2021

Collaborator

Sorry, I'm not asking for a code change here. I mean something along the lines of a self._are_names_unique flag: set it to True when this check passes and to False otherwise. We could then use that flag to decide whether it's safe to reach directly into _data or whether we need to check self._names instead.

kkraus14 Mar 26, 2021

Collaborator

Either way, this is out of scope for this PR.
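The flag suggested in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `NamedLevels` and its `level` method are hypothetical stand-ins, not cudf's `MultiIndex`, and a plain `dict` replaces the real column container. The point is that uniqueness is computed once when names are assigned, and readers consult the cached flag.

```python
class NamedLevels:
    """Hypothetical toy index with named levels (not cudf's MultiIndex)."""

    def __init__(self, columns):
        # columns: mapping of level key -> list of values for that level
        self._data = dict(columns)
        self._names = list(self._data)
        self._are_names_unique = True  # dict keys are unique by construction

    @property
    def names(self):
        return tuple(self._names)

    @names.setter
    def names(self, value):
        value = [None] * len(self._data) if value is None else list(value)
        if len(value) != len(self._data):
            raise ValueError("need exactly one name per level")
        # Compute uniqueness once per assignment and cache the result.
        self._are_names_unique = len(value) == len(set(value))
        if self._are_names_unique:
            # Safe to rebuild the mapping with the new names as keys.
            self._data = dict(zip(value, self._data.values()))
        self._names = value

    def level(self, name):
        if self._are_names_unique:
            # Keys of _data match the names, so look up directly.
            return self._data[name]
        # Keys and names have diverged: fall back to the first
        # position whose name matches.
        position = self._names.index(name)
        return list(self._data.values())[position]


idx = NamedLevels({0: [1, 2], 1: [3, 4]})
idx.names = ["a", "b"]   # unique: _data is rekeyed
by_key = idx.level("b")
idx.names = ["x", "x"]   # not unique: _data keeps its old keys
by_position = idx.level("x")
```

With duplicate names the lookup is ambiguous, so this sketch simply returns the first matching level; a real implementation would need to decide what that case should mean.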

            # IMPORTANT: if the provided names are unique,
            # we reconstruct self._data with the names as keys.
            # If they are not unique, the keys of self._data
            # and self._names will be different, which can lead
            # to unexpected behaviour in some cases. This is
            # definitely buggy, but we can't disallow non-unique
            # names either...
            self._data = self._data._create_unsafe(
                dict(zip(value, self._data.values()))
            )
        self._names = pd.core.indexes.frozen.FrozenList(value)
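To see why the rekeying is guarded by the uniqueness check, here is a small sketch using plain Python dicts rather than cudf's internal column container. The helper name `rekey` is hypothetical. It shows that rebuilding a mapping with `dict(zip(...))` keeps every column when the new names are unique, but silently drops columns when two names collide.

```python
def rekey(data, names):
    """Return a new dict whose keys are replaced positionally by `names`."""
    return dict(zip(names, data.values()))


data = {0: [1, 2, 3], 1: [4, 5, 6]}

# Unique names: every column survives under its new key.
unique = rekey(data, ["a", "b"])

# Duplicate names: the second pair overwrites the first,
# so one column is lost.
duplicated = rekey(data, ["a", "a"])

print(unique)      # {'a': [1, 2, 3], 'b': [4, 5, 6]}
print(duplicated)  # {'a': [4, 5, 6]}
```

This is why the setter leaves the existing keys alone when names repeat, at the cost of the keys and the names no longer matching.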

    def rename(self, names, inplace=False):
13 changes: 13 additions & 0 deletions python/cudf/cudf/tests/test_joining.py
@@ -1725,3 +1725,16 @@ def test_merge_with_lists(how):
    got = gd_left.merge(gd_right, on="a")

    assert_join_results_equal(expect, got, how=how)


def test_join_renamed_index():
    df = cudf.DataFrame(
        {0: [1, 2, 3, 4, 5], 1: [1, 2, 3, 4, 5], "c": [1, 2, 3, 4, 5]}
    ).set_index([0, 1])
    df.index.names = ["a", "b"]  # doesn't actually change df._index._data

    expect = df.to_pandas().merge(
        df.to_pandas(), left_index=True, right_index=True
    )
    got = df.merge(df, left_index=True, right_index=True, how="inner")
    assert_join_results_equal(expect, got, how="inner")
