Refactor writing raw file #11953

dmalt · 2023-09-04T18:13:31Z

Reference issue

Necessary for PR #11924

What does this implement/fix?

Purge recursion from _write_raw() and _write_raw_fid() interaction
Improve general code readability

Additional information

In #11924 I'm adding support for reading and writing from zip archives via using Path interface in the reading and writing functions. At the moment such change is not possible because zipfile.Paths don't support writing to several fids at once. It's a problem because currently MNE-Python writes splits recursively, such that it keeps the fid for the first file open until we're done writing all the remaining splits.

This PR attempts at fixing this recursion problem by using a simple while loop. As a bonus, no-recursion implementation is easier to understand and maintain (hopefully).

_write_raw_fid and _start_writing_raw are now methods of _RawFidWriter

call _start_writing_raw() and _write_raw_fid() from a single method

fix a newly introduced bug when we moved fpath before closing its fid

larsoner · 2023-09-06T01:00:04Z

If you still think it's too much, please check the first commit in this PR: 5dbd876.

Hmm... I do find that somewhat more readable than the class + dataclasses version that is in the current diff. What's the advantage of the class+dataclass-based approach?

dmalt · 2023-09-06T09:32:09Z

I did classes because I wanted to organise parameters to the _write_raw() call . Most of the arguments to _write_raw() are just passed through to _write_raw_fid() and since there's so many of them it's hard to see, which variables are modified and which are just passed. Same logic goes behind hiding start as a class state variable. _write_raw() doesn't really need to be exposed to these. Although, arguably now it's hard to see from a first glance, that start is a mutable state of _FidRawWriter, while other variables are read-only.

Also It feels like _start_writing_raw() and _write_raw_fid() conceptually are really part of the same function. E.g. there's opening fiff tag in the former and the matching closing tag in the latter.

At this point, I've looked at this piece of code for too long and I'm not sure about anything any longer:)
What do you think I should do?

larsoner · 2023-09-06T12:25:33Z

Although, arguably now it's hard to see from a first glance, that start is a mutable state of _FidRawWriter, while other variables are read-only.

Can you make a comment saying this? I think it's an important point

Also It feels like _start_writing_raw() and _write_raw_fid() conceptually are really part of the same function

Agreed, the names are not descriptive. Like if the former writes metadata (info, annotation, etc) and the latter writes actual data we could rename them accordingly...

there's opening fiff tag in the former and the matching closing tag in the latter.

... And/or in theory we could split off some stuff in a _finish_writing_raw if it helps close the loop.

But these are half baked ideas I came up with without looking at the code in depth so they could be way off. If you want I can dig deeper in an hour or two and see if these make any sense and possibly push changes if they do (or if something else occurs to me when looking deeper).

wmvanvliet · 2023-09-06T13:09:38Z

great improvement!

dmalt · 2023-09-06T13:28:44Z

Can you make a comment saying this? I think it's an important point

Sure.

... And/or in theory we could split off some stuff in a _finish_writing_raw if it helps close the loop.

I'd rather move the outermost tags to _RawFidWriter.write() like this:

def write(...):
    begin_block(fid,  FIFF.FIFFB_MEAS)
    _start_writing_raw(...)
    is_new_split = _write_raw_fid(...)
    end_block(fid, FIFF.FIFFB_MEAS)
    return is_new_split

And rename _start_writing_raw to something like _write_raw_fid_metadata. So these begin_block and end_block outline the structure. I attempted to do it like this in the initial _FidRawWriter version.

The same way _write_raw_fid would begin with

    if info.get("maxshield", False):
        start_block(fid, FIFF.FIFFB_IAS_RAW_DATA)
    else:
        start_block(fid, FIFF.FIFFB_RAW_DATA)

(Which is now part of _start_writing_raw() for some reason)
And end with

    if info.get("maxshield", False):
        end_block(fid, FIFF.FIFFB_IAS_RAW_DATA)
    else:
        end_block(fid, FIFF.FIFFB_RAW_DATA)

Also this part of _start_writing_raw() really belongs in _write_raw_fid():

    cals = []
    for k in range(info["nchan"]):
        #
        #   Scan numbers may have been messed up
        #
        info["chs"][k]["scanno"] = k + 1  # scanno starts at 1 in FIF format
        if reset_range is True:
            info["chs"][k]["range"] = 1.0
        cals.append(info["chs"][k]["cal"] * info["chs"][k]["range"])

All it does is calculating calibrations just to pass them to _write_raw_fid().

Maybe just merging _start_writing_raw() and _write_raw_fid() together would be the simplest solution to everything.

larsoner · 2023-09-06T13:34:49Z

I'd rather move the outermost tags to _RawFidWriter.write() like this: ... _write_raw_fid_metadata

There are exactly part of the solution I came up with! Pushing now...

Maybe just merging _start_writing_raw() and _write_raw_fid() together would be the simplest solution to everything.

Yes but I don't mind keeping them a bit separate since one really is about metadata and the other about data. That plus a smaller diff is nice I think.

dmalt · 2023-09-06T13:36:46Z

There are exactly part of the solution I came up with! Pushing now...

Ok, cool!

larsoner · 2023-09-06T13:38:00Z

@dmalt let me know if you're happy with this version and/or push any final tweaks and I'll mark for merge when green!

dmalt · 2023-09-06T13:38:20Z

Yes but I don't mind keeping them a bit separate since one really is about metadata and the other about data. That plus a smaller diff is nice I think.

If you move out tags and calibrations, it becomes just a couple of lines of code.

dmalt · 2023-09-06T13:53:57Z

@dmalt let me know if you're happy with this version and/or push any final tweaks and I'll mark for merge when green!

Looks great!

Just made that calibrations move. Feel free to merge when green!

UPD.
I'm an idiot. Calibrations snippet mutates info before writing it, so it can't be moved. I reverted this change.
UPD2.
Added back a modified version. I hope this time it's correct.

mne/io/base.py

larsoner · 2023-09-06T17:33:16Z

mne/io/base.py

+    dir_path = fpath.parent
+    # We have to create one extra filename here to make the for loop below happy,
+    # but it will raise an error if it actually gets used
+    split_fnames = _make_split_fnames(
+        fpath.name, n_splits=MAX_N_SPLITS + 1, split_naming=split_naming
+    )
+    is_next_split, prev_fname = True, None
+    for part_idx in range(0, MAX_N_SPLITS):
+        if not is_next_split:
+            break
+        bids_special_behavior = part_idx == 0 and split_naming == "bids"
+        if bids_special_behavior:
+            reserved_fname = dir_path / split_fnames[0]
+            logger.info(f"Reserving possible split file {reserved_fname.name}")
+            _check_fname(reserved_fname, overwrite)
+            reserved_ctx = _ReservedFilename(reserved_fname)


@dmalt in addition to changing from while to for, I also DRY'ed the code a bit. The previous version had a first if clause that had a lot of the same code in it that the while loop had. By embedding the logic for bids_special_naming inside the loop, we can remove that redundancy. Now I think it's clearer when something special has to happen. WDYT?

Honestly, I'm not the biggest fan of the DRY principle. I feel like it definitely must be applied when there are >= 3 repetitions, but when something is repeated 2 times, it's sometimes better to resist the urge and just repeat yourself. For me here DRY doesn't improve readability because the control flow gets more complex. Here I guess it's a matter of taste, so I'll be happy with whichever option you choose. Just saying it for the sake of discussion.

DRY doesn't improve readability because the control flow gets more complex.

It is true that there are now two if statements instead of one. But in the old code the conditional variable values that end up needing to be accounted for (in the mental model and executed in code) were more plentiful: part_idx, prev_fname, and is_next_split could all be modified inside that first conditional, so all of their values were essentially conditional on (what is now known as) bids_special_behavior, even though setting those values was accomplished with a single if statement.

In the new version of the code that uses a unified loop, this isn't the case: part_idx always starts at zero and increases in one place (range); prev_fname starts as None and gets set to the current_fname or whatever at the end of each iteration of the loop; is_next_split only gets overwritten one place (inside the loop) instead of two. The only special/conditional thing that happens is that for BIDS there might be a renaming, which leads to two "if" statements, but to me the variable behavior and otherwise equivalence of the behaviors under the two naming schemes is more readily apparent. But I guess this could be based on a different emphasis / mental model for how similar these two naming schemes are similar rather than how they differ...

I was thinking of prev_fname more as of explanatory variable. It could be plugged directly into _write_raw_data(), e.g. as keyword arguments to be more explicit. I see the point with regards to part_idx and is_next_split but I feel like with duplication you just read the code from top to bottom and in dried version there's more mental context to keep. Especially with the with start..., reserved_ctx: line for which you need to go back and to reassure yourself, that it's triggered only for the 0-th iteration and you're not actually reserving the filename every time. For me the file reserving behaviour is the hardest part to understand and an extra conditional on it doesn't help.

But I guess this could be based on a different emphasis / mental model for how similar these two naming schemes are similar rather than how they differ...

I agree. And readability has a lot of subjectiveness to it in general. I've just started exploring this 'DRY being not always good' idea and I was curious where you stand on it.

'DRY being not always good' idea and I was curious where you stand on it.

Agreed it's not always good, just one of the many things to weigh when deciding how to implement something!

BTW, both cyclomatic and cognitive complexity give a little lower scores for the duplicated version: 7 vs 8 and 9 vs 11 correspondingly.

larsoner · 2023-09-06T18:45:16Z

Pushed a tiny commit to fix the Windows error (I was creating an absurdly large test file!) and marking for merge when green, thanks @dmalt !

Co-authored-by: Eric Larson <[email protected]>

dmalt force-pushed the refactor_write_raw branch from af1551c to 5dbd876 Compare September 4, 2023 18:15

purge recursion from _write_raw() - _write_raw_fid()

5dbd876

dmalt marked this pull request as draft September 4, 2023 18:28

dmalt added 3 commits September 4, 2023 20:57

remove use_fname from args to _write_raw_fid

8dbcc3e

remove next_idx arg from _write_raw_fid

a280976

pack config for _write_raw_fid

5c1aa1a

dmalt force-pushed the refactor_write_raw branch 2 times, most recently from b1acbb1 to c586bf6 Compare September 4, 2023 20:28

add reset_range and data_type to config dataclass

8505cf0

dmalt force-pushed the refactor_write_raw branch from c586bf6 to f9d85a5 Compare September 4, 2023 21:14

dmalt added 4 commits September 4, 2023 23:15

introduce _RawFidWriter

03dcc88

convert fid writing funcitons to methods

745bc6d

_write_raw_fid and _start_writing_raw are now methods of _RawFidWriter

make start and stop part of _RawFidWriter

e7c06ee

make cfg part of _RawFidWriter

efa8163

dmalt force-pushed the refactor_write_raw branch from 49fbaec to 1c448cd Compare September 4, 2023 21:16

rename the cfg class

1c448cd

dmalt force-pushed the refactor_write_raw branch from f87fc1d to 146a34d Compare September 4, 2023 22:43

dmalt added 12 commits September 5, 2023 00:44

rename the cfg class

2d3c0ff

simplify reserved context indentation

a85297f

add write function

b04b503

call _start_writing_raw() and _write_raw_fid() from a single method

more logical block start/end placement for raw

b32dcf3

more logical outermost block start/end placement

093bd0d

apply pick to info in __init__

62d9f5e

factor out calibrations getter

9782dac

factor out prev-next fname writing

9efce9b

factor out splitting criteria

c8c8cac

clarify _should_split via variable renaming

1ce3b8a

factor out too large buffer check

146a34d

add _write_raw_neuromag

56fa7ef

dmalt force-pushed the refactor_write_raw branch from cdac5c7 to 0a5b437 Compare September 5, 2023 11:29

fix reserved fid bug

9447fb2

fix a newly introduced bug when we moved fpath before closing its fid

dmalt marked this pull request as ready for review September 5, 2023 20:07

Merge branch 'main' into refactor_write_raw

3a4fbc3

cbrnr changed the title ~~Refactror writing raw file~~ Refactor writing raw file Sep 6, 2023

FIX: Block open and close in same scope

51e7e86

dmalt force-pushed the refactor_write_raw branch from d3b1123 to df1b6c0 Compare September 6, 2023 13:53

dmalt force-pushed the refactor_write_raw branch from df1b6c0 to fb2dd5f Compare September 6, 2023 14:15

dmalt added 3 commits September 6, 2023 16:18

fix docstring

fb2dd5f

return only new_start from _write_raw_data

af49527

move calibrations correctly

e917a3e

larsoner reviewed Sep 6, 2023

View reviewed changes

mne/io/base.py Outdated Show resolved Hide resolved

larsoner added 3 commits September 6, 2023 13:12

MAINT: while to for

93e85b0

MAINT: DRY

fb11971

FIX: Debugging cruft

7304cf6

larsoner reviewed Sep 6, 2023

View reviewed changes

FIX: Smaller

c3584fb

larsoner enabled auto-merge (squash) September 6, 2023 18:45

larsoner merged commit 3dd8899 into mne-tools:main Sep 6, 2023
22 checks passed

snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024

Refactor writing raw file (mne-tools#11953)

1fcad7e

Co-authored-by: Eric Larson <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor writing raw file #11953

Refactor writing raw file #11953

dmalt commented Sep 4, 2023 •

edited

Loading

larsoner commented Sep 6, 2023

dmalt commented Sep 6, 2023 •

edited

Loading

larsoner commented Sep 6, 2023

wmvanvliet commented Sep 6, 2023

dmalt commented Sep 6, 2023 •

edited

Loading

larsoner commented Sep 6, 2023

dmalt commented Sep 6, 2023

larsoner commented Sep 6, 2023

dmalt commented Sep 6, 2023

dmalt commented Sep 6, 2023 •

edited

Loading

larsoner Sep 6, 2023

dmalt Sep 6, 2023 •

edited

Loading

larsoner Sep 6, 2023 •

edited

Loading

dmalt Sep 6, 2023

larsoner Sep 6, 2023

dmalt Sep 6, 2023

larsoner commented Sep 6, 2023

Refactor writing raw file #11953

Refactor writing raw file #11953

Conversation

dmalt commented Sep 4, 2023 • edited Loading

Reference issue

What does this implement/fix?

Additional information

larsoner commented Sep 6, 2023

dmalt commented Sep 6, 2023 • edited Loading

larsoner commented Sep 6, 2023

wmvanvliet commented Sep 6, 2023

dmalt commented Sep 6, 2023 • edited Loading

larsoner commented Sep 6, 2023

dmalt commented Sep 6, 2023

larsoner commented Sep 6, 2023

dmalt commented Sep 6, 2023

dmalt commented Sep 6, 2023 • edited Loading

larsoner Sep 6, 2023

Choose a reason for hiding this comment

dmalt Sep 6, 2023 • edited Loading

Choose a reason for hiding this comment

larsoner Sep 6, 2023 • edited Loading

Choose a reason for hiding this comment

dmalt Sep 6, 2023

Choose a reason for hiding this comment

larsoner Sep 6, 2023

Choose a reason for hiding this comment

dmalt Sep 6, 2023

Choose a reason for hiding this comment

larsoner commented Sep 6, 2023

dmalt commented Sep 4, 2023 •

edited

Loading

dmalt commented Sep 6, 2023 •

edited

Loading

dmalt commented Sep 6, 2023 •

edited

Loading

dmalt commented Sep 6, 2023 •

edited

Loading

dmalt Sep 6, 2023 •

edited

Loading

larsoner Sep 6, 2023 •

edited

Loading