Concatenating rows with Int64 datatype coerces to object #24768
Comments
Thanks for the report and makes sense. Investigation and PRs are always welcome! |
I like the title change that you proposed, as this is actually another issue that I wanted to open. The problem that I tried to point out in this issue however is the fact that there are now 6 rows with a 1 in it (column B), whereas we would only expect 3 rows with a 1 in it. |
In short, I would like to propose keeping the original title of this issue, as all the code and expected output point to the 'unexpected behavior when concatenating rows with int datatype'. I will open a second issue with proper expected output on 'Concatenating rows with Int64 datatype coerces to object' (there are some more things that go wrong there that are not outlined in this issue). |
@janvanrijn can you check the example in your original post? When you assign import pandas as pd
df_a = pd.DataFrame({'a': [-1, -1, -1]})
df_b = pd.DataFrame({'b': [1, 1, 1]})
total = pd.concat([df_a, df_b], ignore_index=True) as the columns are different. Perhaps you mean |
Ouch, sloppy. My sincere apologies. Then only the 'Concatenating rows with Int64 datatype coerces to object' remains. |
Can you ensure that the original post has correct values for the example
and expected output?
|
Updated. |
Thanks. So the issue is that concatenating integer and empty coerces to object:
In [33]: pd.concat([df_a, pd.DataFrame(index=[0, 1])], ignore_index=True, sort=True).dtypes
Out[33]:
a object
dtype: object
The logic in |
this is correct |
Sorry, that was for int64. Concat with empty is not tested at all for EA types. |
Yeah, just to make this clear, the following are two different cases
```python
In [10]: pd.concat([pd.DataFrame({"A": [1, 2]}), pd.DataFrame()]).dtypes
Out[10]:
A int64
dtype: object
In [11]: pd.concat([pd.DataFrame({"A": [1, 2]}),
pd.DataFrame(columns=['A'])]).dtypes
Out[11]:
A object
dtype: object
```
Concatenating with a newly-created column should preserve the dtype (if
possible).
|
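For anyone landing here with the same problem, a minimal workaround sketch, assuming every value in the column is still an integer after concatenation: cast the affected column back to the nullable dtype once `pd.concat` has run. The column name and data below are made up for illustration. Whether the concat itself yields `object` depends on the pandas version, so the snippet prints the dtype rather than assuming it.

```python
import pandas as pd

# Hypothetical data: a nullable-integer ("Int64") column and an empty frame
# that shares the same column name.
df_values = pd.DataFrame({"A": pd.array([1, 2], dtype="Int64")})
df_empty = pd.DataFrame(columns=["A"])

combined = pd.concat([df_values, df_empty], ignore_index=True)
# The dtype at this point depends on the pandas version
# (object, according to this issue's title).
print(combined["A"].dtype)

# Cast back explicitly so later code does not depend on what concat returned.
restored = combined.astype({"A": "Int64"})
print(restored["A"].dtype)
```

The explicit cast is a stopgap, not a fix for the behaviour discussed in this thread.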
Also, when making an |
@janvanrijn that's a separate issue. #22861 / #23223 |
is not implemented |
In such a case, concatenating float64 data doesn't preserve the dtype though. Is it considered a bug?
pd.concat([pd.DataFrame({"A": [1., 2.]}), pd.DataFrame(columns=['A'])]).dtypes
A float64
dtype: object |
Not sure about bug, or just not implemented fully yet.
The general rule for concatenating a mixture of EAs and non-EAs is to
coerce everything to object.
We have an open issue about a type-based multiple dispatch for concat.
|
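To see the coercion rule from the comment above on a given installation, a small sketch like the following may help. The series names and values are hypothetical. The first case mixes an extension array with a plain `object` series; the second mixes `Int64` with NumPy `int64`, where the result is only printed because it may differ between pandas versions.

```python
import pandas as pd

# Extension-array (EA) backed series with the nullable integer dtype.
ea_series = pd.Series(pd.array([1, 2], dtype="Int64"))
# Plain NumPy-backed series: one with object dtype, one with int64 dtype.
object_series = pd.Series(["x", "y"], dtype=object)
int_series = pd.Series([3, 4], dtype="int64")

# EA + object: the only dtype that can hold both inputs is object.
mixed_object = pd.concat([ea_series, object_series], ignore_index=True)
print(ea_series.dtype, "+", object_series.dtype, "->", mixed_object.dtype)

# EA + NumPy int64: printed only, since the outcome is version dependent.
mixed_int = pd.concat([ea_series, int_series], ignore_index=True)
print(ea_series.dtype, "+", int_series.dtype, "->", mixed_int.dtype)
```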
Code Sample, a copy-pastable example if possible
output:
Problem description
When running the exact same code with floats, the output is similar to the expected output. It also happens with the append function
Expected Output
Output of pd.show_versions()