[Bug]: Reading units table is very slow when expanding electrodes table #237

Open
3 tasks done
alejoe91 opened this issue Nov 20, 2024 · 1 comment
Labels
category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of users
Milestone

Comments

@alejoe91
Collaborator

alejoe91 commented Nov 20, 2024

What happened?

We're mainly using NWB-Zarr and found that reading the units table and rendering it as a dataframe is prohibitively slow. This is mainly due to the electrodes column: it is a dynamic table region, and with index=False the referenced rows of the electrodes table are copied into the dataframe.
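To make the cost of expanding a region column concrete, here is a stdlib-only toy model. The class and function names are hypothetical and this is not hdmf's implementation. With index=True each unit's cell keeps only its integer electrode indices; with index=False every referenced electrodes row is copied into the cell, so the work grows with units × electrodes per unit × electrodes columns. The sizes (384 electrodes with 4 fields, 8 electrodes per unit) are made up for illustration; only the 758 units come from this report. The model says nothing about how the data are read from disk.

```python
from dataclasses import dataclass


@dataclass
class ToyRegionColumn:
    """Hypothetical stand-in for a region column: per-row indices into another table."""

    indices: list[list[int]]  # electrode row indices referenced by each unit
    target: list[dict]        # rows of the referenced (electrodes) table

    def to_cells(self, index: bool) -> list:
        if index:
            # like index=True: each cell keeps only its integer references
            return [list(idx) for idx in self.indices]
        # like index=False: each cell receives a copy of every referenced row
        return [[dict(self.target[i]) for i in idx] for idx in self.indices]


def count_copied_values(cells: list) -> int:
    """Count the scalar values materialized across all cells."""
    total = 0
    for cell in cells:
        for item in cell:
            total += len(item) if isinstance(item, dict) else 1
    return total


# Hypothetical sizes: 384 electrodes with 4 fields each, 758 units with 8 electrodes each
electrodes = [{"x": float(i), "y": 0.0, "z": 0.0, "location": "CA1"} for i in range(384)]
region = ToyRegionColumn(
    indices=[[(u + k) % 384 for k in range(8)] for u in range(758)],
    target=electrodes,
)

print(count_copied_values(region.to_cells(index=True)))   # 6064
print(count_copied_values(region.to_cells(index=False)))  # 24256
```

Even in memory, the expanded form materializes four times as many values here, and each copied row is a new object rather than a shared reference.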

To give some numbers, reading a dataset with ~758 units takes on the order of 10 minutes. With index=True, the reading time drops to ~6s.
To investigate this performance issue, we also ran the same tests on the same file saved as HDF5; the results are below (see Steps to Reproduce).

In general, Zarr is slower, but this could be because everything is compressed by default in Zarr, while no compression is applied in HDF5.
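The compression hypothesis can be illustrated with a stdlib-only sketch. Here zlib stands in for whatever codec the Zarr store actually uses, and the class name, chunk size and data are hypothetical. In a chunked, compressed array, fetching a single value means decompressing its whole chunk. If a column ends up being read in many small pieces, that work is repeated for every piece, while one bulk read decompresses each chunk once. Whether this is what dominates in the case above is exactly the open question.

```python
import struct
import zlib

CHUNK_LEN = 1000  # values per chunk (hypothetical)


class ToyCompressedArray:
    """Hypothetical chunked array of int32 values, each chunk zlib-compressed."""

    def __init__(self, values: list[int]):
        self.n = len(values)
        self.chunks = []
        for start in range(0, self.n, CHUNK_LEN):
            block = values[start:start + CHUNK_LEN]
            raw = struct.pack(f"<{len(block)}i", *block)
            self.chunks.append(zlib.compress(raw))
        self.decompress_calls = 0

    def _chunk(self, c: int) -> tuple:
        # every access to chunk c pays for a full decompression
        self.decompress_calls += 1
        raw = zlib.decompress(self.chunks[c])
        return struct.unpack(f"<{len(raw) // 4}i", raw)

    def get(self, i: int) -> int:
        # one element: the whole chunk containing it must be decompressed
        return self._chunk(i // CHUNK_LEN)[i % CHUNK_LEN]

    def read_all(self) -> list[int]:
        # bulk read: each chunk is decompressed exactly once
        out = []
        for c in range(len(self.chunks)):
            out.extend(self._chunk(c))
        return out


arr = ToyCompressedArray(list(range(5000)))

elementwise = [arr.get(i) for i in range(arr.n)]
print(arr.decompress_calls)  # 5000

arr.decompress_calls = 0
bulk = arr.read_all()
print(arr.decompress_calls)  # 5
```

Both paths return the same values; the element-wise path simply decompresses 1000 times more often.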

This barplot shows the reading time for each column in the units table, obtained with:

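import time

import numpy as np

colnames = nwbfile_zarr.units.colnames  # all columns of the units table
load_times_zarr, load_times_hdf5 = {}, {}
for col in colnames:
    # read the full column from the Zarr-backed file
    t_start = time.perf_counter()
    nwbfile_zarr.units[col][:]
    t_stop = time.perf_counter()
    load_times_zarr[col] = np.round(t_stop - t_start, 2)

    # read the same column from the HDF5-backed file
    t_start = time.perf_counter()
    nwbfile_hdf5.units[col][:]
    t_stop = time.perf_counter()
    load_times_hdf5[col] = np.round(t_stop - t_start, 2)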

Attached figure: load_times_hdf5_zarr.pdf (reading time per units-table column, Zarr vs HDF5)
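The same time.perf_counter() pattern is repeated throughout this issue. A small context manager keeps all measurements in one dict and rounds every entry consistently. The helper name timer is hypothetical and not part of pynwb or hdmf; time.sleep stands in for a real read.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label: str, results: dict, ndigits: int = 2):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    t_start = time.perf_counter()
    try:
        yield
    finally:
        # runs even if the block raises, so the timing is still recorded
        results[label] = round(time.perf_counter() - t_start, ndigits)


timings = {}
with timer("sleep", timings):
    time.sleep(0.05)  # stand-in for a real read such as nwbfile_zarr.units[col][:]
print(timings)
```

Inside the per-column loop, each measurement would then become a single with timer(f"zarr/{col}", timings) block around the read, and the resulting dict can be passed directly to the plotting code.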

Steps to Reproduce

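import time

import numpy as np
from hdmf_zarr.nwb import NWBZarrIO
from pynwb import NWBHDF5IO

# nwb_path_zarr, nwb_paths_hdf5 and file_index point to the dataset (not shown)

### index = False
# zarr
t_start = time.perf_counter()
read_io_zarr = NWBZarrIO(nwb_path_zarr, mode='r')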
nwbfile_zarr = read_io_zarr.read()
nwbfile_zarr.units.to_dataframe(index=False)
t_stop = time.perf_counter()
elapsed = np.round(t_stop - t_start, 2)
print(f"Time reading Zarr units with index=False: {elapsed}s")

# hdf5
t_start = time.perf_counter()
read_io_hdf5 = NWBHDF5IO(nwb_paths_hdf5[file_index], mode='r')
nwbfile_hdf5 = read_io_hdf5.read()
nwbfile_hdf5.units.to_dataframe(index=False)
t_stop = time.perf_counter()
elapsed = np.round(t_stop - t_start, 2)
print(f"Time reading HDF5 units with index=False: {elapsed}s")

### index = True
# zarr
t_start = time.perf_counter()
read_io_zarr = NWBZarrIO(nwb_path_zarr, mode='r')
nwbfile_zarr = read_io_zarr.read()
nwbfile_zarr.units.to_dataframe(index=True)
t_stop = time.perf_counter()
elapsed = np.round(t_stop - t_start, 2)
print(f"Time reading Zarr units with index=True: {elapsed}s")

# hdf5
t_start = time.perf_counter()
read_io_hdf5 = NWBHDF5IO(nwb_paths_hdf5[file_index], mode='r')
nwbfile_hdf5 = read_io_hdf5.read()
nwbfile_hdf5.units.to_dataframe(index=True)
t_stop = time.perf_counter()
elapsed = np.round(t_stop - t_start, 2)
print(f"Time reading HDF5 units with index=True: {elapsed}s")

Traceback

>>> Time reading Zarr units with index=False: 509.52s

>>> Time reading HDF5 units with index=False: 11.59s

>>> Time reading Zarr units with index=True: 5.48s

>>> Time reading HDF5 units with index=True: 0.45s
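For a quick read of the numbers above, the ratios can be computed directly from the reported timings (values copied from the output; nothing re-measured):

```python
# Timings reported above, in seconds, keyed by (format, index)
timings = {
    ("zarr", False): 509.52,
    ("hdf5", False): 11.59,
    ("zarr", True): 5.48,
    ("hdf5", True): 0.45,
}

# Slowdown caused by expanding the electrodes column (index=False vs index=True)
index_cost = {
    fmt: round(timings[(fmt, False)] / timings[(fmt, True)], 1) for fmt in ("zarr", "hdf5")
}

# Slowdown of Zarr relative to HDF5 for each index setting
zarr_vs_hdf5 = {
    idx: round(timings[("zarr", idx)] / timings[("hdf5", idx)], 1) for idx in (False, True)
}

print(index_cost)    # {'zarr': 93.0, 'hdf5': 25.8}
print(zarr_vs_hdf5)  # {False: 44.0, True: 12.2}
```

Expanding the electrodes column costs roughly 93× on Zarr versus roughly 26× on HDF5, and the Zarr/HDF5 gap widens from about 12× to about 44× when index=False.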

Operating System

Linux

Python Executable

Conda

Python Version

3.9

Package Versions

pynwb 2.8.2
hdmf 3.14.5
hdmf_zarr 0.9.0


@alejoe91
Collaborator Author

@rly here's the issue we discussed!

@rly rly added category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of users labels Nov 27, 2024
@rly rly added this to the 1.1.0 milestone Nov 27, 2024