Refactor writing raw file #11953
_write_raw_fid and _start_writing_raw are now methods of _RawFidWriter
call _start_writing_raw() and _write_raw_fid() from a single method
fix a newly introduced bug when we moved fpath before closing its fid
Hmm... I do find that somewhat more readable than the class +
I did classes because I wanted to organise parameters to the Also It feels like At this point, I've looked at this piece of code for too long and I'm not sure about anything any longer:)
Can you make a comment saying this? I think it's an important point
Agreed, the names are not descriptive. Like if the former writes metadata (info, annotation, etc) and the latter writes actual data we could rename them accordingly...
... And/or in theory we could split off some stuff in a But these are half baked ideas I came up with without looking at the code in depth so they could be way off. If you want I can dig deeper in an hour or two and see if these make any sense and possibly push changes if they do (or if something else occurs to me when looking deeper).
great improvement!
Sure.
I'd rather move the outermost tags to

    def write(...):
        begin_block(fid, FIFF.FIFFB_MEAS)
        _start_writing_raw(...)
        is_new_split = _write_raw_fid(...)
        end_block(fid, FIFF.FIFFB_MEAS)
        return is_new_split

And rename The same way

    if info.get("maxshield", False):
        start_block(fid, FIFF.FIFFB_IAS_RAW_DATA)
    else:
        start_block(fid, FIFF.FIFFB_RAW_DATA)

(Which is now part of

    if info.get("maxshield", False):
        end_block(fid, FIFF.FIFFB_IAS_RAW_DATA)
    else:
        end_block(fid, FIFF.FIFFB_RAW_DATA)

Also this part of

    cals = []
    for k in range(info["nchan"]):
        #
        # Scan numbers may have been messed up
        #
        info["chs"][k]["scanno"] = k + 1  # scanno starts at 1 in FIF format
        if reset_range is True:
            info["chs"][k]["range"] = 1.0
        cals.append(info["chs"][k]["cal"] * info["chs"][k]["range"])

All it does is calculating calibrations just to pass them to Maybe just merging
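For illustration, the calibration loop quoted above could be pulled out into a small helper along these lines. This is only a minimal sketch, not the code that landed in the PR: `compute_cals` is a hypothetical name, and `chs` stands in for `info["chs"]` as a plain list of dicts.

```python
def compute_cals(chs, reset_range=True):
    """Renumber channels and return one calibration factor per channel.

    Hypothetical helper: `chs` is assumed to be a list of dicts with
    "cal", "range" and "scanno" keys, mirroring info["chs"] in the snippet.
    Like the original loop, it modifies the channel dicts in place.
    """
    cals = []
    for k, ch in enumerate(chs):
        ch["scanno"] = k + 1  # scan numbers start at 1
        if reset_range:
            ch["range"] = 1.0
        cals.append(ch["cal"] * ch["range"])
    return cals


if __name__ == "__main__":
    channels = [
        {"cal": 2.0, "range": 3.0, "scanno": 7},
        {"cal": 0.5, "range": 4.0, "scanno": 9},
    ]
    print(compute_cals(channels, reset_range=False))
```

With the calibrations computed up front, the writing function would only need to receive the resulting list.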
These are exactly part of the solution I came up with! Pushing now...
Yes but I don't mind keeping them a bit separate since one really is about metadata and the other about data. That plus a smaller diff is nice I think.
Ok, cool!
@dmalt let me know if you're happy with this version and/or push any final tweaks and I'll mark for merge when green!
If you move out tags and calibrations, it becomes just a couple of lines of code.
Looks great! Just made that calibrations move. Feel free to merge when green! UPD.
    dir_path = fpath.parent
    # We have to create one extra filename here to make the for loop below happy,
    # but it will raise an error if it actually gets used
    split_fnames = _make_split_fnames(
        fpath.name, n_splits=MAX_N_SPLITS + 1, split_naming=split_naming
    )
    is_next_split, prev_fname = True, None
    for part_idx in range(0, MAX_N_SPLITS):
        if not is_next_split:
            break
        bids_special_behavior = part_idx == 0 and split_naming == "bids"
        if bids_special_behavior:
            reserved_fname = dir_path / split_fnames[0]
            logger.info(f"Reserving possible split file {reserved_fname.name}")
            _check_fname(reserved_fname, overwrite)
            reserved_ctx = _ReservedFilename(reserved_fname)
@dmalt in addition to changing from while to for, I also DRY'ed the code a bit. The previous version had a first if clause that had a lot of the same code in it that the while loop had. By embedding the logic for bids_special_naming inside the loop, we can remove that redundancy. Now I think it's clearer when something special has to happen. WDYT?
Honestly, I'm not the biggest fan of the DRY principle. I feel like it definitely must be applied when there are >= 3 repetitions, but when something is repeated 2 times, it's sometimes better to resist the urge and just repeat yourself. For me here DRY doesn't improve readability because the control flow gets more complex. Here I guess it's a matter of taste, so I'll be happy with whichever option you choose. Just saying it for the sake of discussion.
DRY doesn't improve readability because the control flow gets more complex.
It is true that there are now two if statements instead of one. But in the old code the conditional variable values that end up needing to be accounted for (in the mental model and executed in code) were more plentiful: part_idx, prev_fname, and is_next_split could all be modified inside that first conditional, so all of their values were essentially conditional on (what is now known as) bids_special_behavior, even though setting those values was accomplished with a single if statement.
In the new version of the code that uses a unified loop, this isn't the case: part_idx always starts at zero and increases in one place (range); prev_fname starts as None and gets set to the current_fname or whatever at the end of each iteration of the loop; is_next_split only gets overwritten one place (inside the loop) instead of two. The only special/conditional thing that happens is that for BIDS there might be a renaming, which leads to two "if" statements, but to me the variable behavior and otherwise equivalence of the behaviors under the two naming schemes is more readily apparent. But I guess this could be based on a different emphasis / mental model for how similar these two naming schemes are similar rather than how they differ...
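The unified-loop control flow described in this comment can be sketched roughly as below. This is a toy model under stated assumptions, not the MNE-Python implementation: `write_parts`, the `part-N` names and the logged strings are all made up for illustration, and "writing" is replaced by appending to a log.

```python
MAX_N_SPLITS = 5  # illustrative bound, not the value used by MNE-Python


def write_parts(chunks, special_first):
    """Toy model of the unified loop; returns a log of what would happen."""
    log = []
    # One extra name so that indexing never fails inside the loop.
    names = [f"part-{i}" for i in range(MAX_N_SPLITS + 1)]
    is_next_split, prev_name = True, None
    for part_idx in range(MAX_N_SPLITS):  # part_idx only changes here
        if not is_next_split:
            break
        special = part_idx == 0 and special_first
        if special:
            # Only the first iteration can trigger the special behavior.
            log.append(f"reserve {names[0]}")
        log.append(f"write {names[part_idx]} (prev={prev_name})")
        # is_next_split is overwritten in exactly one place.
        is_next_split = part_idx + 1 < len(chunks)
        prev_name = names[part_idx]  # set once, at the end of the iteration
    if is_next_split:
        raise RuntimeError("more data than MAX_N_SPLITS parts can hold")
    return log


if __name__ == "__main__":
    for line in write_parts(["a", "b"], special_first=True):
        print(line)
```

Each loop variable has a single place where it changes, which is the property the comment argues for; the cost is the extra `special` check on every iteration.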
I was thinking of prev_fname more as an explanatory variable. It could be plugged directly into _write_raw_data(), e.g. as keyword arguments to be more explicit. I see the point with regards to part_idx and is_next_split, but I feel like with duplication you just read the code from top to bottom, and in the DRY'ed version there's more mental context to keep. Especially with the with start..., reserved_ctx: line, for which you need to go back and reassure yourself that it's triggered only for the 0-th iteration and you're not actually reserving the filename every time. For me the file reserving behaviour is the hardest part to understand and an extra conditional on it doesn't help.
But I guess this could be based on a different emphasis / mental model for how similar these two naming schemes are similar rather than how they differ...
I agree. And readability has a lot of subjectiveness to it in general. I've just started exploring this 'DRY being not always good' idea and I was curious where you stand on it.
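One common way to make a "reserve only on the first iteration" context explicit is to fall back to `contextlib.nullcontext()` on every other iteration. The sketch below assumes that pattern; `ReservedName` and `reservation_for` are hypothetical stand-ins and do not reproduce how `_ReservedFilename` actually behaves in MNE-Python.

```python
import contextlib
from pathlib import Path


class ReservedName:
    """Hypothetical stand-in: hold a placeholder file while the block runs."""

    def __init__(self, path):
        self.path = Path(path)

    def __enter__(self):
        # Fail loudly if the name is already taken.
        self.path.touch(exist_ok=False)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Drop the placeholder once the block is done.
        self.path.unlink(missing_ok=True)
        return False


def reservation_for(part_idx, path, special_naming):
    """Return a real reservation for the first part only, else a no-op."""
    if part_idx == 0 and special_naming:
        return ReservedName(path)
    return contextlib.nullcontext()


if __name__ == "__main__":
    import tempfile

    target = Path(tempfile.mkdtemp()) / "reserved.bin"
    for idx in range(2):
        with reservation_for(idx, target, special_naming=True):
            print(idx, target.exists())
```

Because the choice of context manager is made in one named function, the reader does not have to trace back through the loop to check when the reservation is active.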
'DRY being not always good' idea and I was curious where you stand on it.
Agreed it's not always good, just one of the many things to weigh when deciding how to implement something!
BTW, both cyclomatic and cognitive complexity give slightly lower scores for the duplicated version: 7 vs 8 and 9 vs 11, respectively.
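As a reminder of what the first metric measures: cyclomatic complexity is roughly one plus the number of decision points in the code. The sketch below approximates it with the standard-library `ast` module. It is not the tool behind the numbers above (the comment does not name one), and dedicated tools may count some constructs differently.

```python
import ast

# Node types treated as a single decision point in this approximation.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source):
    """Return 1 + the number of decision points found in `source`."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


if __name__ == "__main__":
    snippet = (
        "for i in range(3):\n"
        "    if i == 0 and flag:\n"
        "        pass\n"
        "    if done:\n"
        "        break\n"
    )
    print(approx_cyclomatic_complexity(snippet))
```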
Pushed a tiny commit to fix the Windows error (I was creating an absurdly large test file!) and marking for merge when green, thanks @dmalt !
Co-authored-by: Eric Larson <[email protected]>
Reference issue
Necessary for PR #11924
What does this implement/fix?
_write_raw() and _write_raw_fid() interaction
Additional information
In #11924 I'm adding support for reading and writing from zip archives by using the Path interface in the reading and writing functions. At the moment such a change is not possible because zipfile.Paths don't support writing to several fids at once. It's a problem because currently MNE-Python writes splits recursively, such that it keeps the fid for the first file open until we're done writing all the remaining splits.
This PR attempts to fix this recursion problem by using a simple while loop. As a bonus, the no-recursion implementation is easier to understand and maintain (hopefully).
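To make the idea concrete, here is a minimal sketch of writing splits with a loop so that only one file handle is open at any time. It is an illustration under simplifying assumptions, not the PR's code: `write_in_splits`, the `-N` naming and the byte-based split size are all hypothetical, and plain files are used instead of FIF or zip archives.

```python
from pathlib import Path


def write_in_splits(data, base, split_size, max_splits=100):
    """Write `data` (bytes) into numbered files, one open handle at a time."""
    base = Path(base)
    written = []
    offset, part_idx = 0, 0
    is_next_split = True
    while is_next_split:
        if part_idx >= max_splits:
            raise RuntimeError("exceeded the maximum number of splits")
        fname = base.with_name(f"{base.stem}-{part_idx}{base.suffix}")
        # The handle is closed before the next split is opened, unlike a
        # recursive version that keeps the first file open until the end.
        with open(fname, "wb") as fid:
            fid.write(data[offset:offset + split_size])
        written.append(fname)
        offset += split_size
        part_idx += 1
        is_next_split = offset < len(data)
    return written


if __name__ == "__main__":
    import tempfile

    out_dir = Path(tempfile.mkdtemp())
    for path in write_in_splits(b"abcdefghij", out_dir / "raw.bin", 4):
        print(path.name, path.read_bytes())
```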