Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix epoch splits naming #11876

Merged
merged 12 commits into from
Aug 17, 2023
Merged

Conversation

dmalt
Copy link
Contributor

@dmalt dmalt commented Aug 12, 2023

Fixes #11870

Fixes overwriting bug, when epoch.save('test-epo.fif', overwrite=False, split_naming="neuromag") would overwrite
"test-epo-1.fif".
Disallows filenames like "test-epo.fif" with split_naming="bids" to avoid surprises with split names. File names like
"a_b-epo.fif", i.e. having several bids clauses divided by "_" are still allowed. Also we don't do this check if no splits will be created.

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know when it's good to go from your end. Don't forget to update latest.inc file in the doc. thx

mne/tests/test_epochs.py Outdated Show resolved Hide resolved
mne/tests/test_epochs.py Outdated Show resolved Hide resolved
@dmalt
Copy link
Contributor Author

dmalt commented Aug 13, 2023

While working on the PR, found another problem: if the ending is not BIDS, i.e. -epo.fif and split_naming="bids", epochs.save() produces files like _split-01_test-epo.fif without complains. Should I prohibit -epo.fif with split_naming="bids"? I think raising ValueError in this case would be reasonable.

Also, should epochs.save("test-epo.fif", split_naming="bids") fail if no splits will be produced?

@dmalt dmalt marked this pull request as draft August 13, 2023 08:46
with pytest.raises(FileExistsError, match="Destination file"):
epochs.save(split_fname, split_naming=split_naming, verbose=True)
os.remove(split_fname)
# we don't test for reserved files as it's not implemented here
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what reserved files means here. I noticed, Raw.save() mentions something reserving the main file when producing splits. Is it something important I should be aware of? For now I just removed this comment.

events = mne.make_fixed_length_events(raw, 1)
epochs = mne.Epochs(raw, events)
if split_size == "2MB" and (metadata or concat):
n_files += 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this was doing. Removed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a special case inside the test. Basically it was necessary because in the case where your split_size is small and you have metadata (or concat, for whatever reason), you will end up with another file. It doesn't seem like your PR should change the file sizes produced but rather just the filenames, so it seems like a bug (either on this PR or in main currently) that you were able to remove this and have things still pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... looks like it is a bug on main because we never use split_size="2MB", so removing this seems okay

@@ -1511,50 +1511,102 @@ def test_split_saving(tmp_path, split_size, n_epochs, n_files, size, metadata, c
}
)
epochs.metadata = metadata
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see, how the metadata are used in the tests. Do metadata affect epochs.drop_bad()? I don't think they do. Same thing with concat. Maybe they should be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were probably added to make sure that we actually split the metadata properly (or copy it? not sure which we do currently) when splitting the epoch data across multiple files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked into commit history and it seems they are added to fix #7897. My current understanding is that there was a bug with saving splits when the size just marginally exceeded the splitting threshold, so metadata and concat manipulations help catch this corner case. Does this make sense?

Also there's #5102 related to problem with events when loading back the splits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was a bug with saving splits when the size just marginally exceeded the splitting threshold

ah yes I remember that one. Nice git archeology!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx:)

@larsoner
Copy link
Member

if the ending is not BIDS, i.e. -epo.fif and split_naming="bids", epochs.save() produces files like _split-01_test-epo.fif without complains. Should I prohibit -epo.fif with split_naming="bids"? I think raising ValueError in this case would be reasonable.

@sappelhoff WDYT the BIDS-sensible thing is to do in this case?

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmalt thanks for working on this!

As it stands, this PR is a little bit tough to review because you're mixing adding functionality with refactoring non-trivial tests that cover weird corner cases. It is hard to see what has been changed to make tests cleaner/better while keeping all existing functionalityl/checks (this is difficult to see) vs what was changed to accommodate the new naming scheme. It makes me worry that we might silently have break/omit some test that we put in place previously to cover some corner case. So even though the tests look cleaner, in some sense they are a bit less safe now.

I would suggest reverting the test changes, then start by adding just a new parametrization like:

@pytest.mark.parametrize("split_naming", ("neuromag", "bids"))

on a test or a few tests. Then you'll need to make a few small naming updates in the tests. This would be digestible from a review standpoint.

Alternatively you could open a fresh PR that only does the test refactoring without adding anything new. This could include stuff like the removal of the if split_size == "2MB" that probably doesn't need to be there. Then we get that green and merged, then rebase this PR on that... and then this PR once again hopefully just adds a parametrize + naming updates as above.

Does that make sense?

@dmalt
Copy link
Contributor Author

dmalt commented Aug 14, 2023

Yes, you're right, the PR spiralled out of control real quick:) Let me try from scratch. I feel like refactoring tests first would be a better option. The library code changes are easy but testing them without bloating an existing test even more -- not as much.

@sappelhoff
Copy link
Member

@sappelhoff WDYT the BIDS-sensible thing is to do in this case?

IMHO mne-bids is the BIDS-aware software and the feature that is provided via MNE-Python here is more a convenience for mne-bids. So I don't have a strong opinion as we correctly use this in mne-bids.

I wouldn't get into too much of a logic branching here and simply assume that users will either:

  1. need to know what they are doing or
  2. use mne-bids

@dmalt dmalt force-pushed the fix-epoch-splits-naming branch from 997074b to 651517c Compare August 16, 2023 22:54
@dmalt
Copy link
Contributor Author

dmalt commented Aug 17, 2023

@sappelhoff WDYT the BIDS-sensible thing is to do in this case?

IMHO mne-bids is the BIDS-aware software and the feature that is provided via MNE-Python here is more a convenience for mne-bids. So I don't have a strong opinion as we correctly use this in mne-bids.

I wouldn't get into too much of a logic branching here and simply assume that users will either:

  1. need to know what they are doing or
  2. use mne-bids

I was thinking adding a simple check fname.endswith("_epo.fif") when using bids just to avoid accidents. Does that sound reasonable?

@dmalt
Copy link
Contributor Author

dmalt commented Aug 17, 2023

I was thinking adding a simple check fname.endswith("_epo.fif") when using bids just to avoid accidents. Does that sound reasonable?

As a matter of fact, this check is already implemented. It was just turned off. The filenames for splits are constructed via mne.io.utils._construct_bids_filename called validate=False. validate=True would do what we need here (almost).

mne/epochs.py Outdated
fname = f"{base}-{part_idx:d}{ext}"
elif split_naming == "bids" and n_parts > 1:
fname = _construct_bids_filename(base, ext, part_idx + 1)
_check_fname(fname, overwrite=overwrite)
Copy link
Contributor Author

@dmalt dmalt Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call _check_fname() for both 'neuromag' and 'bids' to fix overwriting bug when
epochs.save('test-epo.fif', overwrite=False) would overwrite existing 'test-epo-1.fif'

mne/epochs.py Outdated
if part_idx > 0 and split_naming == "neuromag":
fname = f"{base}-{part_idx:d}{ext}"
elif split_naming == "bids" and n_parts > 1:
fname = _construct_bids_filename(base, ext, part_idx + 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove validate=False to prohibit using '-epo.fif' with split_naming='bids'

@dmalt dmalt marked this pull request as ready for review August 17, 2023 15:56
@dmalt
Copy link
Contributor Author

dmalt commented Aug 17, 2023

This is ready from my side.
I'm not sure if my last commit belongs here, or if it should go to a separate PR. I felt like it would be easier to convince yourself that the implementation is correct on the refactored version. If you disagree, feel free to revert the last commit or ask me and I'll do it.

@dmalt dmalt changed the title WIP: Fix epoch splits naming BUG: Fix epoch splits naming Aug 17, 2023
@dmalt dmalt force-pushed the fix-epoch-splits-naming branch 2 times, most recently from 0097cfb to 91b4a04 Compare August 17, 2023 16:42
dmalt and others added 6 commits August 17, 2023 18:44
Use already present validation mechanism. For now validation
only works when splits will be created, i.e. test-epo.fif with
split_naming="bids" is still allowed. Record this behaviour in
tests.
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much easier

mne/io/utils.py Outdated Show resolved Hide resolved
mne/tests/test_epochs.py Outdated Show resolved Hide resolved
mne/tests/test_epochs.py Outdated Show resolved Hide resolved
@dmalt dmalt force-pushed the fix-epoch-splits-naming branch from a5fb3e2 to 5a9ddeb Compare August 17, 2023 17:56
dmalt and others added 2 commits August 17, 2023 19:58
- add xfail reasons
- add match regex to pytest.raises
- remove elif, add _check_option
mne/tests/test_epochs.py Outdated Show resolved Hide resolved
@drammock drammock enabled auto-merge (squash) August 17, 2023 19:40
doc/changes/devel.rst Outdated Show resolved Hide resolved
@drammock drammock disabled auto-merge August 17, 2023 19:42
@drammock
Copy link
Member

@larsoner our "check rendered docs here" CI isn't working? takes me to the same link as the CircleCI Details link, and there is no artifact to view

@larsoner
Copy link
Member

It also doesn't run pytest-macos-arm64 -- @dmalt when I asked to sign up for CircleCI did you do something like enable it for your repo? It should be using MNE-Python's runs for CircleCI but it's using your user's. It's messing up the linking and preventing pytest-macos-arm64 from running, neither of which are total blockers but would be good to know how this happened...

@larsoner
Copy link
Member

(By "sign up" I just meant to "log in with GitHub" on the UI when you click "Details", so if you did more than that it would be good to know!)

@drammock
Copy link
Member

OK well I guess no harm in re-running them with the changelog commit then. I'll commit that and then mark for merge when green

@larsoner larsoner merged commit 9e85b2e into mne-tools:main Aug 17, 2023
@drammock
Copy link
Member

ha, @larsoner beats me to it again :)

larsoner added a commit to larsoner/mne-python that referenced this pull request Aug 17, 2023
* upstream/main:
  BUG: Fix epoch splits naming (mne-tools#11876)
@dmalt
Copy link
Contributor Author

dmalt commented Aug 17, 2023

It also doesn't run pytest-macos-arm64 -- @dmalt when I asked to sign up for CircleCI did you do something like enable it for your repo? It should be using MNE-Python's runs for CircleCI but it's using your user's. It's messing up the linking and preventing pytest-macos-arm64 from running, neither of which are total blockers but would be good to know how this happened...

Honestly, no idea. Sorry :( All I did is logging in with GitHub. It's my first experience with Circle CI, so I might have clicked something by accident.

@larsoner
Copy link
Member

Honestly, no idea. Sorry :( All I did is logging in with GitHub.

Sounds like what you did should have worked, weird!

@dmalt
Copy link
Contributor Author

dmalt commented Aug 17, 2023

Honestly, no idea. Sorry :( All I did is logging in with GitHub.

Sounds like what you did should have worked, weird!

And I still have this problem, right? For my next PR I mean.

@larsoner
Copy link
Member

Yes probably... maybe there is some way you can disable CircleCI building on your fork, you can try messing with settings in https://app.circleci.com/pipelines/github/dmalt/mne-python

@dmalt
Copy link
Contributor Author

dmalt commented Aug 17, 2023

Yes probably... maybe there is some way you can disable CircleCI building on your fork, you can try messing with settings in https://app.circleci.com/pipelines/github/dmalt/mne-python

Ok, I think the problem was that I clicked "Set Up Project" next to MNE-Python right after logging in. To be fair, the button was right in the middle of a screen, so it was calling for action:) I deleted the project and created a test PR #11897 and now it seems to be working fine.

@dmalt dmalt deleted the fix-epoch-splits-naming branch August 17, 2023 20:47
larsoner added a commit to drammock/mne-python that referenced this pull request Aug 22, 2023
* upstream/main:
  [pre-commit.ci] pre-commit autoupdate (mne-tools#11911)
  [BUG, MRG] Remove check on `mne.viz.Brain.add_volume_labels` (mne-tools#11889)
  Small splits fix (mne-tools#11905)
  adds niseq package to "Related software" (mne-tools#11909)
  Minor fixes for ERDS maps example (mne-tools#11904)
  FIX: Fix pyvista rendering (mne-tools#11896)
  BUG: Fix epoch splits naming (mne-tools#11876)
  ENH: Use section-title for HTML anchors in Report (mne-tools#11890)
  CI: Deploy [circle deploy]
  MAINT: Clean up whats_new and doc versions (mne-tools#11888)
  Refactor test_epochs.py::test_split_saving (2 out of 2) (mne-tools#11884)
  Cross-figure event passing system (mne-tools#11685)
  MAINT: Post-release deprecations, updates [circle deploy] (mne-tools#11887)
  MAINT: Release 1.5.0 (mne-tools#11886)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#11883)
  Refactor test_epochs.py::test_split_saving (1 out of 2) (mne-tools#11880)
  FIX: Missing Saccade information in Eyelink File (mne-tools#11877)
  Improve drawing of annotations with matplotlib (mne-tools#11855)
  MAINT: Work around NumPy deprecation (mne-tools#11878)
snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
Co-authored-by: Eric Larson <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel McCloy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent splits naming between Raw and Epochs when using BIDS schema
5 participants