Fixes #1607: Allow MultiIndex to have mixed dtypes #1608

Merged

stress-tess
Member

This PR (Fixes #1607):

NOTE:
The `dtype` lines shown below come from `Series` and are not accurate, since a `Series` can hold a mix of dtypes. This should be investigated further, but I wanted to get this PR up to address the important bug.

Output on reproducer for #1607:

>>> df = ak.DataFrame({'a': ak.arange(10, dtype=ak.int64), 'b': ak.arange(10, dtype=ak.uint64)})
>>> df.groupby(['a', 'b']).count()
0  0    1
1  1    1
2  2    1
3  3    1
4  4    1
5  5    1
6  6    1
7  7    1
8  8    1
9  9    1
dtype: int64

Output from reproducer for #1512 (to ensure it's still working):

>>> a = ak.array([1,2,3,2,3,4,5,6,5,4,3,2])
>>> b = ak.array([2,3,2,3,4,5,6,5,4,3,2,1])
>>> df = ak.DataFrame({'a':a, 'b':b})
>>> c = df.groupby(['a','b']).count()

# These all threw type or attribute errors before this PR
>>> c.index
MultiIndex((array([1 2 2 3 3 4 4 5 5 6]), array([2 1 3 2 4 3 5 4 6 5])))

>>> c.sort_values()
1  2    1
2  1    1
3  4    1
4  3    1
   5    1
5  4    1
   6    1
6  5    1
2  3    2
3  2    2
dtype: int64

>>> c.sort_index()
1  2    1
2  1    1
   3    2
3  2    2
   4    1
4  3    1
   5    1
5  4    1
   6    1
6  5    1
dtype: int64

>>> c.locate((ak.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 6]), ak.array([2, 1, 3, 2, 4, 3, 5, 4, 6, 5])))
1  2    1
2  1    1
   3    2
3  2    2
   4    1
4  3    1
   5    1
5  4    1
   6    1
6  5    1
dtype: int64

This PR (Fixes Bears-R-Us#1607):
- Removes `dtype` parameter and associated validation that was introduced in PR Bears-R-Us#1516. Overrides the `__repr__` of `Index` instead
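
To illustrate the idea of that change, here is a minimal sketch in plain Python. It is not arkouda's code: the toy `Index` and `MultiIndex` classes, the `dtypes` property, and the use of the standard-library `array` module (typecodes `'q'` and `'Q'` standing in for int64 and uint64) are all assumptions made for the example. The point is only that the multi-level class takes no `dtype` argument, does no same-dtype validation, and overrides `__repr__` so it never has to report a single shared dtype.

```python
from array import array


class Index:
    """Toy single-level index wrapping one typed array (hypothetical)."""

    def __init__(self, values):
        self.values = values

    def __repr__(self):
        # A single-level index can report the one dtype of its array.
        return f"Index({list(self.values)}, dtype={self.values.typecode})"


class MultiIndex(Index):
    """Toy multi-level index: one array per level, dtypes may differ."""

    def __init__(self, levels):
        levels = list(levels)
        if len({len(level) for level in levels}) > 1:
            raise ValueError("all levels must have the same length")
        # No dtype parameter and no check that the levels share a dtype.
        self.values = levels

    @property
    def dtypes(self):
        # Per-level dtypes (hypothetical helper, not part of the PR).
        return [level.typecode for level in self.values]

    def __repr__(self):
        # Overridden because the base repr assumes a single array with one dtype.
        return f"MultiIndex({[list(level) for level in self.values]})"


# Mixed signed/unsigned levels, mirroring the int64/uint64 reproducer.
mi = MultiIndex([array("q", range(3)), array("Q", range(3))])
print(mi)         # MultiIndex([[0, 1, 2], [0, 1, 2]])
print(mi.dtypes)  # ['q', 'Q']
```

In this sketch the only constraint left on the levels is that they have equal length, which is why a mixed `'q'`/`'Q'` pair is accepted.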
Contributor

@Ethan-DeBandi99 Ethan-DeBandi99 left a comment

Looks good to me. One comment, but not positive how important it is.

arkouda/index.py
Collaborator

@reuster986 reuster986 left a comment

looks good, thanks!

@Ethan-DeBandi99 Ethan-DeBandi99 merged commit 08bcec0 into Bears-R-Us:master Jul 26, 2022
@stress-tess stress-tess deleted the 1607_multiIndex_mixed_dtypes branch July 26, 2022 16:44
jeichert60 pushed a commit to jeichert60/arkouda that referenced this pull request Jul 28, 2022
…-R-Us#1608)

This PR (Fixes Bears-R-Us#1607):
- Removes `dtype` parameter and associated validation that was introduced in PR Bears-R-Us#1516. Overrides the `__repr__` of `Index` instead

Co-authored-by: Pierce Hayes <[email protected]>