Putting an Arrow table into a Pandas series takes a long time #25389

Closed
mrocklin opened this issue Feb 20, 2019 · 4 comments

Comments

@mrocklin
Contributor

It takes a surprisingly long time to put a list containing a PyArrow table object into an object-dtype pandas Series:

In [1]: import numpy as np, pandas as pd, pyarrow as pa

In [2]: df = pd.DataFrame({'x': np.arange(1000000)})

In [3]: %time t = pa.Table.from_pandas(df)
CPU times: user 5.36 ms, sys: 3.22 ms, total: 8.58 ms
Wall time: 7.47 ms

In [4]: %time s = pd.Series([t], dtype=object)
CPU times: user 2.7 s, sys: 114 ms, total: 2.82 s
Wall time: 2.81 s
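
The session above passes the table inside a list, so the constructor has to work out what the list's element is. A minimal sketch of one possible way to sidestep that, assuming the cost comes from the element being inspected as a sequence (the thread does not confirm this): preallocate an object array and assign the element by index. `wrap_in_object_series` is a hypothetical helper name, not a pandas or PyArrow API, and whether it is actually faster on a given pandas version should be checked by timing it.

```python
import numpy as np
import pandas as pd


def wrap_in_object_series(obj):
    """Hypothetical helper: hold `obj` as the single element of an object Series."""
    # Preallocate a length-1 object array and assign by index, so NumPy
    # stores a reference to `obj` rather than trying to unpack it.
    arr = np.empty(1, dtype=object)
    arr[0] = obj
    # Passing dtype=object explicitly asks pandas to keep the array as-is.
    return pd.Series(arr, dtype=object)
```

With the table `t` from the session, this would be called as `s = wrap_in_object_series(t)` and timed with `%time` for comparison.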

Originally reported downstream in dask/distributed#2521 by @randerzander

@jreback
Contributor

jreback commented Feb 20, 2019

this is essentially a duplicate of #25364

@mrocklin
Contributor Author

mrocklin commented Feb 20, 2019 via email

@jreback
Contributor

jreback commented Feb 20, 2019

I had a typo in the issue number; check again.

@mrocklin
Contributor Author

Thanks. Posted there. Closing.
