API: expected result of concat of SparseArray with Categorical? #34459

Open
jorisvandenbossche opened this issue May 29, 2020 · 0 comments
Labels
API - Consistency (Internal Consistency of API/Behavior); Categorical (Categorical Data Type); Enhancement; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Sparse (Sparse Data Type)

Comments

@jorisvandenbossche
Member

xref #34338

The current behaviour when Sparse and Categorical are concatenated is that the "dense" values are used for both: the densified values for sparse, and the non-categorical version of the values for categorical.

For example, Sparse[int] with Categorical[int] results in a plain int series:

In [1]: pd.concat([ 
   ...:     pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)), 
   ...:     pd.Series([3, 4, 5], dtype="category") 
   ...: ]) 
Out[1]: 
0    1
1    0
2    2
0    3
1    4
2    5
dtype: int64

An alternative would be to preserve the sparseness in the result, which is what already happens when concatenating with a plain (non-categorical) int series:

In [2]: pd.concat([ 
   ...:     pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)), 
   ...:     pd.Series([3, 4, 5], dtype="int64") 
   ...: ]) 
Out[2]: 
0    1
1    0
2    2
0    3
1    4
2    5
dtype: Sparse[int64, 0]
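
For anyone who needs a sparse result today, the two behaviours above can be compared side by side. The following is only a sketch of a possible workaround, not a proposed fix: it assumes the categories are integers and converts the categorical to the dtype of its categories before concatenating. The variable names are illustrative, and the dtypes mentioned in the comments are the ones reported in this issue, so they may differ in other pandas versions.

```python
import pandas as pd

sparse = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0))
categorical = pd.Series([3, 4, 5], dtype="category")

# Behaviour described in this issue: both inputs are densified
# (reported result dtype: int64).
dense_result = pd.concat([sparse, categorical])
print(dense_result.dtype)

# Workaround sketch (an assumption, not an official recommendation):
# drop the categorical dtype first, so that concat sees Sparse + plain int.
plain = categorical.astype(categorical.cat.categories.dtype)
sparse_result = pd.concat([sparse, plain])

# Reported result dtype for Sparse + plain int64: Sparse[int64, 0]
print(sparse_result.dtype)
```

In both cases the values are the same; only the dtype of the result differs.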

(Alternatively, one could argue that concatenating with a categorical should never give a numerical result, but rather object dtype or similar.)

@jorisvandenbossche jorisvandenbossche added Docs Needs Triage Issue that has not been reviewed by a pandas team member API - Consistency Internal Consistency of API/Behavior Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type and removed Docs Needs Triage Issue that has not been reviewed by a pandas team member labels May 29, 2020
@mroeschke mroeschke added Categorical Categorical Data Type Enhancement labels Aug 7, 2021