[Bug]: Reading units table is very slow when expanding electrodes table
#237
Labels
category: bug
errors in the code or code behavior
priority: medium
non-critical problem and/or affecting only a small set of users
Milestone
What happened?
We're mainly using NWB-Zarr and found that reading the units table and rendering it as a dataframe is prohibitively slow. This is mainly due to the `electrodes` column, which copies the entire dynamic table region into the dataframe.
To give some numbers, reading a dataset with ~758 units takes over 10 minutes. With `index=True`, reading time goes down to ~6s.
To investigate this performance issue, we also ran the same tests with the same file saved as HDF5, and here are the results (see steps to reproduce).
In general, Zarr is slower, but this could be because everything is compressed by default in Zarr, whereas no compression is applied in HDF5.
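A minimal sketch of how the two read modes can be compared. It assumes `table` is the units table (e.g. `nwbfile.units`) and relies on the `index` argument of `to_dataframe()`; the helper name `time_to_dataframe` and the `repeats` parameter are ours, for illustration only.

```python
import time


def time_to_dataframe(table, repeats=1):
    """Time table.to_dataframe() with index=False and index=True.

    `table` is any object exposing to_dataframe(index=...); for an NWB
    file this would be the units table (assumption: nwbfile.units).
    Returns a dict mapping the index flag to the best time in seconds.
    """
    timings = {}
    for index in (False, True):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # index=False expands table regions such as `electrodes`;
            # index=True keeps only the row indices.
            table.to_dataframe(index=index)
            best = min(best, time.perf_counter() - start)
        timings[index] = best
    return timings
```

Called as `time_to_dataframe(nwbfile.units)` on an open file, the two entries of the returned dict correspond to the slow and the fast case described above.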
This barplot shows the reading time for each column in the units table, obtained with:
load_times_hdf5_zarr.pdf
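For reference, a minimal sketch of how such per-column timings can be collected; this is not necessarily the exact code behind the plot. It assumes each column can be fetched by name with `table[name]` and fully read with `[:]`, and that the column names come from something like `table.colnames`. The helper name `time_column_reads` is hypothetical.

```python
import time


def time_column_reads(table, colnames):
    """Return {column name: seconds needed to read the full column}.

    Assumption: table[name][:] forces a complete read of the column,
    as it would for a lazily loaded HDF5 or Zarr dataset.
    """
    timings = {}
    for name in colnames:
        start = time.perf_counter()
        _ = table[name][:]  # read the whole column into memory
        timings[name] = time.perf_counter() - start
    return timings
```

Running this once on the Zarr file and once on the HDF5 file, e.g. `time_column_reads(nwbfile.units, nwbfile.units.colnames)`, gives two dicts that can be plotted side by side.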
Steps to Reproduce
Traceback
Operating System
Linux
Python Executable
Conda
Python Version
3.9
Package Versions
pynwb 2.8.2
hdmf 3.14.5
hdmf_zarr 0.9.0
Code of Conduct