check dtypes in h5ad #15

Open
mccalluc opened this issue Feb 27, 2020 · 2 comments

Comments

@mccalluc

Trevor notes:

@mccalluc: it would be good to check the dtypes in umap = ann_data.obsm['X_umap']. I would be surprised if the HDF5 data used unnecessarily large dtypes, but pandas defaults to float64 for CSV numerics. This was a headache I ran into with Arrow early on.
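A minimal sketch of such a check, assuming `ann_data.obsm` behaves like a mapping from names to NumPy arrays. The `report_dtypes` helper and the synthetic `obsm` dict are hypothetical stand-ins so the example runs without an h5ad file; with real data you would pass `ann_data.obsm` instead. It also shows the pandas float64 default for CSV numerics mentioned above.

```python
import io

import numpy as np
import pandas as pd


def report_dtypes(arrays):
    """Return {name: (dtype name, size in bytes)} for a mapping of arrays.

    Hypothetical helper: `arrays` is anything dict-like whose values
    can be viewed as NumPy arrays (assumed to hold for ann_data.obsm).
    """
    report = {}
    for name, values in arrays.items():
        arr = np.asarray(values)
        report[name] = (str(arr.dtype), arr.nbytes)
    return report


# Synthetic stand-in for ann_data.obsm; NumPy produces float64 by default.
rng = np.random.default_rng(0)
obsm = {"X_umap": rng.normal(size=(1000, 2))}
print(report_dtypes(obsm))

# Downcasting to float32 halves the storage for the embedding.
umap32 = obsm["X_umap"].astype(np.float32)
print(umap32.dtype, umap32.nbytes)

# pandas parses CSV numerics as float64 unless a dtype is requested.
csv_text = "x,y\n0.5,1.25\n-2.0,3.75\n"
df_default = pd.read_csv(io.StringIO(csv_text))
df_small = pd.read_csv(io.StringIO(csv_text), dtype=np.float32)
print(df_default.dtypes.to_dict(), df_small.dtypes.to_dict())
```

Whether float32 is precise enough depends on the data; for 2D UMAP coordinates used for plotting it is likely fine, but that is a judgment call for the maintainers.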

@manzt

manzt commented Feb 27, 2020

I would be surprised if numeric dtypes were huge (but good to check!). However, in my experience people forget that casting a pandas column with many repeated entries (e.g. cell type) to categorical can lead to a much lower memory footprint. When saving to Arrow, I found that converting categorical columns noticeably reduced the resulting Arrow size. I found it easiest to convert these types on the pandas.DataFrame and then let pyarrow take care of mapping them to Arrow-specific dtypes.
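A minimal sketch of the categorical conversion, using a made-up `cell_type` column (the column name and labels are illustrative assumptions, not taken from the project's data). It compares the in-memory size of the column before and after casting.

```python
import pandas as pd

# Hypothetical column: a few distinct labels repeated many times.
n_rows = 100_000
labels = ["T cell", "B cell", "monocyte", "NK cell"]
df = pd.DataFrame({"cell_type": labels * (n_rows // len(labels))})

# deep=True counts the string payload, not just the pointers.
bytes_before = df["cell_type"].memory_usage(deep=True)

# Categorical stores each distinct label once plus a small integer code per row.
df["cell_type"] = df["cell_type"].astype("category")
bytes_after = df["cell_type"].memory_usage(deep=True)

print(f"before: {bytes_before} bytes, after: {bytes_after} bytes")
print("code dtype:", df["cell_type"].cat.codes.dtype)
```

If the frame is then passed to `pyarrow.Table.from_pandas`, categorical columns are mapped to Arrow's dictionary-encoded type, which is the "let pyarrow take care of it" step described above; that call is not exercised in this sketch.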
