Proof of concept: add support for zipfile.Path when loading raw data #11924
Interesting! Thanks for looking into this, @dmalt. Can you test IO speed vs. non-zip? Basically, how much slower is it due to compression, and how much is this affected by the choice of compression in the ZipFile object?
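A comparison like the one requested here could be sketched with the standard library alone. The snippet below is only an illustration under stated assumptions: the payload is synthetic (not a FIF recording), the member name `raw.bin` and the helper `time_zip_read` are made up for this example, and a single timed read is far too noisy to draw conclusions from.

```python
import io
import os
import time
import zipfile


def time_zip_read(payload: bytes, compression: int) -> float:
    """Write payload to an in-memory zip and time reading it back in full."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=compression) as zf:
        zf.writestr("raw.bin", payload)  # hypothetical member name
    buf.seek(0)
    start = time.perf_counter()
    with zipfile.ZipFile(buf) as zf:
        data = zf.read("raw.bin")
    elapsed = time.perf_counter() - start
    assert data == payload  # the round trip must be lossless
    return elapsed


# Synthetic stand-in for a recording: half random bytes, half zeros.
# Real data will compress (and therefore time) differently.
payload = os.urandom(1_000_000) + bytes(1_000_000)
timings = {
    "ZIP_STORED": time_zip_read(payload, zipfile.ZIP_STORED),
    "ZIP_DEFLATED": time_zip_read(payload, zipfile.ZIP_DEFLATED),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

For a meaningful benchmark the read would need to be repeated many times on real files, for example with `timeit`.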
Sure! Here are the times for reading 2 splits for a small and a large data file in different compression variants. I also added times for loading '.fif.gz' for comparison.
I used two mne datasets for this: sample (small) and one file from brainstorm.bst_resting (large). base: simply loading with If you need more detail, I can post the scripts I used. I also looked into what causes the slowdown when reading from archives even with |
Hm, so if we use zip_stored then the difference is not huge, assuming we use preload=True.
With preload=False I think it requires uncompressing the entire file to read the header.
It's not, but it still grows with the file size. In any case, 800 ms for loading 1.6 with |
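That reading a header does not require decompressing the whole member can be illustrated with the standard library. `ZipFile.open` returns a file-like object that decompresses on demand, so asking for a few leading bytes stops early. This is a sketch on synthetic data with a made-up member name; it does not model how MNE's reader seeks through a file, which is where the size-dependent cost discussed here comes from.

```python
import io
import zipfile

# Build a compressed archive in memory: 8 "header" bytes, then ~5 MB of zeros.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("raw.bin", b"HEADER--" + bytes(5_000_000))  # hypothetical member

with zipfile.ZipFile(buf) as zf:
    with zf.open("raw.bin") as member:
        header = member.read(8)   # only the leading bytes are requested
        position = member.tell()  # position within the uncompressed stream

print(header, position)
```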
Yes, but it's only in part zip's fault. The slowdown comes more or less from a single line in The problem is |
I'm also wondering, is there a way to load a header without increasing the number of bytes we have to look through? Or is it a quirk of |
makes sense
where do we go from here?
For now, since adding support for Then we can optionally add handling of

    with ZipFile(raw_zip_fname) as zipper:
        raw = mne.io.read_raw_fif(zipfile.Path(zipper, fname))

If you ask me, I'd rather go for the second option. As far as the header reading performance goes, I'm not sure. I don't understand the '.fiff' file structure well enough to see how this can be fixed. But it feels like it should be fixable.
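What `zipfile.Path` offers to a reader that needs to find sibling split files can be sketched without MNE. The member names below are hypothetical stand-ins for split files, and the example assumes Python 3.9 or later, where `zipfile.Path.open` accepts binary modes such as `"rb"`.

```python
import io
import zipfile

# Build a small archive in memory with two hypothetical split files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sub/raw.fif", b"first split")
    zf.writestr("sub/raw-1.fif", b"second split")

with zipfile.ZipFile(buf) as zipper:
    member = zipfile.Path(zipper, "sub/raw.fif")
    # Navigate to a sibling the same way one would with pathlib.Path.
    sibling = member.parent / "raw-1.fif"
    found = sibling.exists()
    with sibling.open("rb") as fid:
        second = fid.read()
    names = (member.name, sibling.name)

print(names, found, second)
```

Because navigation uses `parent`, `/`, `name`, `exists` and `open`, the same code path works for a regular `pathlib.Path` on disk.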
Don't go the fiff understanding route. It can quickly become a rabbit hole.
If you think you can do it with documentation, then I think it's the safest option.
Yeah, I got that impression too.
OK, let's do it like that then.
Following the discussion with @agramfort in #11870, I've looked into options to store splits in an archive while keeping preload=False possible. Apparently, it is indeed possible with zip archives. Moreover, thanks to zipfile.Path implementing the pathlib.Path interface, adding support for reading from zip archives is mostly just a matter of not using str for paths and relying on Path instead. With this PR we can now read directly from zip archives. That is, this PR allows using zip archives to avoid problems with splits, at least in client code.
@agramfort, @larsoner, @drammock what do you think?
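The idea of relying on the Path interface instead of str can be illustrated with a small helper that never converts its argument to a string. This is a sketch, not code from the PR: `read_magic` and the file names are invented for the example, and it assumes Python 3.9 or later for `zipfile.Path.open("rb")`.

```python
import pathlib
import tempfile
import zipfile


def read_magic(path, n_bytes=4):
    """Read the first n_bytes from any object with a pathlib-style open()."""
    # No str(path) or os.path calls: only the shared Path interface is used.
    with path.open("rb") as fid:
        return fid.read(n_bytes)


payload = b"\x00\x00\x00\x64rest of file"  # arbitrary bytes, not a real FIF file
with tempfile.TemporaryDirectory() as tmp:
    on_disk = pathlib.Path(tmp) / "raw.fif"
    on_disk.write_bytes(payload)

    archive = pathlib.Path(tmp) / "raw.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.write(on_disk, arcname="raw.fif")

    from_disk = read_magic(on_disk)
    with zipfile.ZipFile(archive) as zf:
        from_zip = read_magic(zipfile.Path(zf, "raw.fif"))

print(from_disk == from_zip)
```

The helper works unchanged for both inputs because it only touches methods that `pathlib.Path` and `zipfile.Path` have in common.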