@mccalluc: it would be good to check the dtypes in umap = ann_data.obsm['X_umap']. I would be surprised if the HDF5 data used unnecessarily large dtypes, but pandas defaults to float64 for CSV numerics. This was a headache I ran into with Arrow early on.
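A quick way to run that check might look like the sketch below. The random array is only a hypothetical stand-in for ann_data.obsm['X_umap'], and the tiny inline CSV is made up for illustration; the point is just that pandas parses CSV numerics as float64 unless told otherwise, and that downcasting is a one-liner.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for ann_data.obsm['X_umap']: an (n_cells, 2) float32 array.
umap = np.random.default_rng(0).normal(size=(1000, 2)).astype(np.float32)
print(umap.dtype, umap.nbytes)  # dtype and total bytes of the embedding

# pandas parses numeric CSV columns as float64 by default.
csv_text = "x,y\n0.1,0.2\n0.3,0.4\n"  # made-up data for illustration
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# Downcast after loading if float32 precision is enough for the use case...
df32 = df.astype(np.float32)

# ...or request the narrower dtype at parse time.
df32_at_parse = pd.read_csv(io.StringIO(csv_text), dtype=np.float32)
```

Whether float32 is precise enough depends on the data; for 2D embedding coordinates it usually is, but that is an assumption to verify rather than a given.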
I would be surprised if numeric dtypes were huge (but good to check!). However, in my experience people forget that casting a pandas column with many repeated entries (e.g. cell type) to categorical can lead to a much lower memory footprint. When saving to Arrow, I found that converting such columns to categorical also noticeably reduced the size of the resulting Arrow data. I found it easiest to convert these types on the pandas.DataFrame and then let pyarrow take care of mapping them to Arrow-specific dtypes.
Trevor notes: