Error : Compressing Maxwell hdf5 dataformat to Zarr using SpikeInterface #1702

Closed
mandarmp opened this issue Jun 6, 2023 · 8 comments · Fixed by #1707
Labels
compression Related to data compression question General question regarding SI

Comments

@mandarmp

mandarmp commented Jun 6, 2023

I am trying to compress a Maxwell HDF5 file to Zarr format, as discussed in the paper "Compression strategies for large-scale electrophysiology data".

I tried the following snippet:

`
import spikeinterface.extractors as se
from flac_numcodecs import Flac

# FLAC codec at the highest compression level
compressor = Flac(level=8)

local_path = '/mnt/disk15tb/mmpatil/Spikesorting/Data/May16_analysison1024Amp/16848/Network/000041/data.raw.h5'
recording1 = se.read_maxwell(local_path)

recording_zarr = recording1.save(
    format="zarr",
    folder="/mnt/disk15tb/mmpatil/Spikesorting/Data/May16_analysison1024Amp/16848/Network/000041/compressed.zarr",
    compressor=compressor,
    channel_chunk_size=2,
    n_jobs=64,
    chunk_duration="1s",
)
`

But I get the following AssertionError:

_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
return [fn(*args) for args in chunk]
File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
return [fn(*args) for args in chunk]
File "/home/mmp/.local/lib/python3.10/site-packages/spikeinterface/core/job_tools.py", line 399, in function_wrapper
return _func(segment_index, start_frame, end_frame, _worker_ctx)
File "/home/mmp/.local/lib/python3.10/site-packages/spikeinterface/core/core_tools.py", line 702, in _write_zarr_chunk
zarr_dataset[start_frame:end_frame, :] = traces
File "/home/mmp/.local/lib/python3.10/site-packages/zarr/core.py", line 1391, in __setitem__
self.set_basic_selection(pure_selection, value, fields=fields)
File "/home/mmp/.local/lib/python3.10/site-packages/zarr/core.py", line 1486, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/home/mmp/.local/lib/python3.10/site-packages/zarr/core.py", line 1790, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File "/home/mmp/.local/lib/python3.10/site-packages/zarr/core.py", line 1842, in _set_selection
self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
File "/home/mmp/.local/lib/python3.10/site-packages/zarr/core.py", line 2137, in _chunk_setitem
self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
...
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None

AssertionError: Data type not supported. Only int16 is supported.

Any guidance or support would be greatly appreciated.

@samuelgarcia
Member

Hi @mandarmp. Thank you for the feedback.

@alejoe91, you have a first client for your compression story.

@alejoe91
Member

alejoe91 commented Jun 7, 2023

Hi @mandarmp

As the error suggests, the problem is that Maxwell data is unsigned integer (uint) while FLAC requires int16.

You can convert the recording to int16, but you need some extra steps.
uint16 ranges from 0 to 2^16 - 1, while int16 ranges from -2^15 to 2^15 - 1.

We don't have an automatic function to do this at the recording level, so this is how you should do it:

import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre

rec_original = se.read_maxwell(...)

# upcast to int32
rec_int32 = spre.scale(rec_original, dtype="int32")
# remove the 2^15 offset
rec_rm_offset = spre.scale(rec_int32, offset=-2 ** 15)
# now we can safely cast to int16
rec_int16 = spre.scale(rec_rm_offset, dtype="int16")

The rec_int16 should now be compatible with FLAC ;)
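
To see why the upcast and offset matter, here is a minimal sketch of the same three steps on a plain NumPy array. It does not use SpikeInterface; the sample values and variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Three unsigned samples: the minimum, the midpoint, and the maximum of uint16.
traces_uint16 = np.array([0, 2**15, 2**16 - 1], dtype="uint16")

# A direct cast keeps the bit pattern, so values at or above 2**15 wrap to negatives.
wrapped = traces_uint16.astype("int16")

# Step 1: upcast so that subtracting the offset cannot overflow.
traces_int32 = traces_uint16.astype("int32")
# Step 2: remove the 2**15 offset to center the signal on zero.
centered = traces_int32 - 2**15
# Step 3: every value now lies in [-2**15, 2**15 - 1], so the downcast loses nothing.
traces_int16 = centered.astype("int16")

print(wrapped.tolist())       # [0, -32768, -1]
print(traces_int16.tolist())  # [-32768, 0, 32767]
```

The direct cast scrambles the ordering of the samples (the largest value becomes -1), while the offset version preserves it and only shifts the baseline.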

I'll make a PR with a helper function to do this in a single line!

@samuelgarcia
Member

@alejoe91 we now have rec_int16 = rec.astype('int16') instead of rec_int16 = scale(rec, dtype='int16')

@alejoe91
Member

alejoe91 commented Jun 7, 2023

What do you mean @samuelgarcia ? To make an astype() function in the base recording? That's tricky because it depends on preprocessing. I'd recommend: spre.astype(recording, "int16")

@alejoe91 alejoe91 added question General question regarding SI compression Related to data compression labels Jun 7, 2023
@samuelgarcia
Member

The rec.astype() method was already added by Charlie 2 weeks ago, and it makes a local import.

@alejoe91
Member

alejoe91 commented Jun 7, 2023

Does it handle unsigned conversion?

@alejoe91
Member

alejoe91 commented Jun 7, 2023

I'll modify that ;)

@mandarmp
Author

mandarmp commented Jun 7, 2023

Thank you so much @alejoe91 and @samuelgarcia for your assistance. I will use the suggested solution for now. Thanks again.
