-
Thanks for the data example! A couple of things I noticed right away:
Those dimensions are incorrect for NWB; they should be time-first, i.e. (frames, rows, cols, planes). Running the NWB Inspector (used for DANDI validation as well) should help catch such things.
What version of NeuroConv are you using to run your ImagingExtractorInterface? I ask mainly because those chunks are smaller than the modern recommendation, and recent versions of NeuroConv should automatically choose better ones for you (namely, chunking by entire frames, i.e. all rows/cols, then including as many frames as possible up to the max chunk size; how frames are distributed over time or z-planes is a separate question that depends on the use case).
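For intuition, a rough sketch of that frame-wise heuristic (this is not NeuroConv's actual implementation, and the 10 MB chunk-size cap and function name are assumptions for illustration):

```python
# Rough sketch of frame-wise chunking: keep all rows/cols in each chunk,
# then add as many frames as fit under an assumed chunk-size cap.
import math
import numpy as np

def frame_chunk_shape(full_shape, dtype, chunk_mb=10.0):
    """Illustrative only; not NeuroConv's code. chunk_mb is an assumed cap."""
    n_frames, n_rows, n_cols = full_shape[:3]
    extra = full_shape[3:]                               # e.g. a trailing z-plane axis
    frame_bytes = n_rows * n_cols * math.prod(extra) * np.dtype(dtype).itemsize
    frames_per_chunk = max(1, int(chunk_mb * 1e6 // frame_bytes))
    return (min(frames_per_chunk, n_frames), n_rows, n_cols, *extra)

# With the dataset discussed here, (1200, 1188, 1213, 3) float32, a single frame
# is already ~17 MB, so the heuristic falls back to one frame per chunk:
print(frame_chunk_shape((1200, 1188, 1213, 3), "float32"))  # (1, 1188, 1213, 3)
```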
-
Hi Cody! Hmm, thanks for catching that dimensions oddity; I had some issues with transposed matrices from our .mat files which I thought I had sorted out, but apparently I didn't. I'll have to revisit that. I installed NeuroConv and roiextractors by pip-installing from clones of the GitHub repos, so they are at the respective repo heads. The ImagingExtractor I wrote is strongly based on Hdf5ImagingInterface and Hdf5ImagingExtractor, with a few tweaks to accommodate those transposed arrays and some custom metadata. I ended up not using your lazy_ops module, just plain h5py for data access in the .mat files. So I'm not sure why the chunking is off. Would you perhaps have a clue as to where this is determined in the code? I can debug some more. Thanks!
-
Sorry for the inaccuracy; I last pulled about two weeks ago, and I'm at this commit, so it's pretty recent. I must have gotten something wrong in my interface implementation. You can find my module with both the ImagingExtractor and ImagingExtractorInterface implementations here. The test was run as in the last two cells of this notebook; please ignore the rest of the notebook, that's just random test snippets. I didn't get any warnings, but I had verbose=False. I'll test some more tomorrow. Thank you!
-
Hi! I fixed my issue with the wrong dimension order (I got confused by the fact that the chunk iterator does its own transpose). I uploaded a new NWB example data file with the correct dimensions (frames, rows, cols, planes) = (1200, 1188, 1213, 3); the datatype is float32. The chunk size is still relatively small, (22, 22, 22, 3). I think this was chosen as an indirect consequence of the buffer size limit, which is 1 GB: the buffers used during data conversion were (44, 1188, 1213, 3), which is 760 MB, so probably as large as possible, and the chunk size was then chosen to efficiently tile the buffer while staying below 1 MB. In my case, this resulted in a relatively small chunk (127 kB). Would you recommend increasing the buffer and/or chunk size? What chunk size should I aim for? As for compression, is there anything you would recommend I try to get better compression efficiency, or do you think my current ratio (119.73%) is acceptable?
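For reference, my rough understanding of how chunk shape and compression could be pinned by hand when building the series with PyNWB's H5DataIO; this is a sketch with a small stand-in array and placeholder values, not my actual conversion code:

```python
# Sketch only: explicitly setting chunks and compression via H5DataIO.
# The array is a small stand-in; for the real data the chunks would be
# something like (1, 1188, 1213, 1).
import numpy as np
from pynwb import H5DataIO

data = np.zeros((50, 128, 128, 3), dtype="float32")   # stand-in for the real volume

wrapped = H5DataIO(
    data=data,
    chunks=(1, 128, 128, 1),   # one frame of one plane per chunk
    compression="gzip",        # deflate, as in my current files
    compression_opts=4,        # gzip level; higher is slower with little gain on noisy data
    shuffle=True,              # byte-shuffle can help float data compress slightly better
)
# `wrapped` would then be passed as the `data` argument of e.g. a TwoPhotonSeries.
```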
-
Hi Cody! Sure, here's my environment:
-
Alright, after upgrading to the latest main branches of neuroconv, roiextractors, and hdmf, the chunk size is now (1, 1188, 1213, 1) = 5,764,176 bytes. Compression utilization increased very slightly, to 122%. I don't have the time to try different compressors, so if you don't see a red flag then it is what it is. This is rather noisy data, so without doing something much smarter, I don't think the other generic compressors would do dramatically better. One last question: compression is single-threaded and thus quite a bottleneck. Would it be acceptable if, instead of packing our entire 4D dataset into a single NWB file, we generated a separate NWB file for each of our 30 planes? Is there anything we should watch out for in that case regarding metadata and file structure? Any example use cases out there that we should learn from? Thank you!
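If a quick sanity check on other generic codecs is ever worth doing without rerunning a conversion, something like the following should give a rough idea by compressing a single frame in memory; the file and dataset paths are placeholders:

```python
# Rough sketch: compress one frame of one plane from the existing HDF5/NWB file
# with a few generic codecs and compare ratios. Paths below are placeholders.
import bz2
import lzma
import zlib

import h5py
import numpy as np

with h5py.File("example.nwb", "r") as f:
    ds = f["acquisition/TwoPhotonSeries/data"]             # placeholder dataset path
    raw = np.ascontiguousarray(ds[0, :, :, 0]).tobytes()   # one frame of one plane

for name, compress in [
    ("zlib level 4", lambda b: zlib.compress(b, 4)),
    ("zlib level 9", lambda b: zlib.compress(b, 9)),
    ("bz2 level 9", lambda b: bz2.compress(b, 9)),
    ("lzma (xz)", lzma.compress),
]:
    print(f"{name}: {len(raw) / len(compress(raw)):.2f}x")
```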
-
Thank you, Cody! The way we analyze things will remain in flux for a good while. Currently, we first run source extraction plane-by-plane and then merge across planes in a separate step to create the final list of 3D neuron locations and activity time series. Meanwhile, we are also working on an inherently volumetric source extraction pipeline. So the most efficient way to organize the data during source extraction may change, and I think it is a somewhat separate problem from how we archive it. Thanks for the hint on the NWB Zarr backend; I'll look into it and decide based on that.

In general, I would expect that what 99% of data users will be interested in is not the monstrous 4D voxel dataset, but the output of the source extraction (locations, activities). And I'm very much inclined to keep that in a separate file anyway, to save people the trouble of dealing with a 150+ GB file of voxel data if all they want to access is a few MB of neuron activity time series, plus some metadata. So we will probably end up with more than one NWB file per dataset anyway, and I might just as well keep the planes separate too, to keep things a bit more lightweight at the file level.
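As far as I can tell, the Zarr route would go through hdmf-zarr's NWBZarrIO; a minimal sketch, assuming an already-built nwbfile object and that hdmf-zarr has been installed separately:

```python
# Minimal sketch of writing an NWBFile with the Zarr backend via hdmf-zarr.
# Assumes `nwbfile` was already built with PyNWB and `pip install hdmf-zarr`
# was run; the output path is a placeholder.
from hdmf_zarr.nwb import NWBZarrIO

with NWBZarrIO("example.nwb.zarr", mode="w") as io:
    io.write(nwbfile)
```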
-
Absolutely, thank you very much for your kind support.
-
Hi! We had a brief exchange a while back on whether Light Beads Microscopy data is suitable for DANDI. You offered to take a look at tuning compression parameters for our data back then.
Now I've finally gotten around to writing a preliminary implementation of an ImagingExtractorInterface that reads our (slightly preprocessed) 4D imaging data from .mat files and writes them into NWB. I've uploaded a 16 GB example dataset as an NWB file. This test dataset only contains three out of thirty z-planes, so it is about one tenth of a typical full dataset. Dimensions: (rows, cols, timesteps, z-planes) = (1213, 1188, 1200, 3).
Here's a snip from the output of h5ls showing dataset dimensions and compression ratio:

So, deflate is squeezing out about 20%. This data will always be pretty noisy, so I don't think that this is a particularly bad compression ratio for our data. But perhaps you'd have some time to look into this some more?
Let me know how I can help.
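For reference, roughly the same numbers can also be pulled out with h5py instead of h5ls; the file and dataset paths below are placeholders:

```python
# Rough sketch: report dataset shape, chunk shape, codec, and on-disk
# compression ratio with h5py, similar to the h5ls snip above.
import h5py
import numpy as np

with h5py.File("lbm_example.nwb", "r") as f:
    ds = f["acquisition/TwoPhotonSeries/data"]   # placeholder dataset path
    logical = int(np.prod(ds.shape)) * ds.dtype.itemsize
    on_disk = ds.id.get_storage_size()
    print(ds.shape, ds.dtype, ds.chunks, ds.compression)
    print(f"stored size is {on_disk / logical:.1%} of raw ({logical / on_disk:.2f}x ratio)")
```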