Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SCHEMA] Use macro for filename templates in file collections appendix #787

Merged
merged 1 commit into from
Jul 15, 2021

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Apr 26, 2021

Closes None.

Changes proposed:

  • Auto-generate the filename templates for anat and fmap file collections in Appendix X: File collections.

@tsalo
Copy link
Member Author

tsalo commented Apr 26, 2021

@agahkarakuzu would you be willing to take a look at the rendered filename templates? Here is the rendered pattern for anat data and here's fmap data.

@tsalo tsalo requested review from effigies and sappelhoff May 3, 2021 18:49
@tsalo tsalo added the schema Issues related to the YAML schema representation of the specification. Patch version release. label May 4, 2021
@agahkarakuzu
Copy link
Contributor

Hey @tsalo sure, I'll look into it, thank you!

@agahkarakuzu
Copy link
Contributor

agahkarakuzu commented May 14, 2021

Most of the issues I raise below are probably because of the way we defined requirements in the first place, but now it has become much easier to see what is reasonable and what is less. Thanks for the effort!

MP2RAGE

[_flip-<index>]_inv-<index>[_part-<label>]_MP2RAGE.json

I think it is safe to drop OPTIONAL from the flip entity and make part required if possible. Optional echo is the only alternative (file-collection entity relevant) use case.

MPM

[_echo-<index>]_flip-<index>_mt-<label>[_part-<label>]_MPM.json

Echo should be required for MPM (at least that's what I infer from the article), @ChristophePhillips @lazaral can confirm if that's the case.

B1maps

It all looks good to me, I think people will check the table below and not rely solely on the template, which should make it clear how to use acq for some fmap file collections.

@sappelhoff
Copy link
Member

I think it is safe to drop OPTIONAL from the flip entity and make part required if possible. Optional echo is the only alternative (file-collection entity relevant) use case.

Echo should be required for MPM (at least that's what I infer from the article), @ChristophePhillips @lazaral can confirm if that's the case.

@agahkarakuzu et al. those sound like quite important changes: Changing requirement levels can have a large impact on downstream software. We should (1) be sure that this is needed, (2) if it is needed, act quickly on it, before more and more people start to adopt the current (then to be outdated) way of specifying.

@effigies
Copy link
Collaborator

effigies commented Jun 6, 2021

We're a couple releases in. I think we need to make them RECOMMENDED and make sure there's a warning in the validator.

@tsalo
Copy link
Member Author

tsalo commented Jun 16, 2021

@agahkarakuzu can you open an issue about the requirement changes in the bids 2.0 repository? That way, at least it'll be on the to-do list for when we're able to make backwards-incompatible changes.

Regarding this PR, is there anything I need to change or is it good as-is?

@agahkarakuzu
Copy link
Contributor

@tsalo the PR LGTM. I will discuss the requirement changes I mentioned in this comment with the team and open an issue about it.

If possible, as @effigies suggested, switching to RECOMMENDED is a good solution short term. (part for MP2RAGE and echo for MPM).

@tsalo
Copy link
Member Author

tsalo commented Jun 16, 2021

@tsalo the PR LGTM.

Thanks!

If possible, as @effigies suggested, switching to RECOMMENDED is a good solution short term.

Definitely worth discussing, though we haven't used "RECOMMENDED" with entities in the spec. Just REQUIRED and OPTIONAL. Would/will be a new feature.

@sappelhoff
Copy link
Member

sappelhoff commented Jul 1, 2021

@tsalo for the sake of the original intent of this PR (rendering text in the spec from the schema), can that be done without changes to what's in the text?

Then we could merge this, and discuss potential changes to qMRI requirement levels in a separate, dedicated issue.

EDIT: Whether to add a notion of "recommended" to filename templates can be a separate issue (or discussed in the yet-to-be-opened qMRI issue).

@tsalo
Copy link
Member Author

tsalo commented Jul 2, 2021

@tsalo for the sake of the original intent of this PR (rendering text in the spec from the schema), can that be done without changes to what's in the text?

Sorry, could you clarify what you mean by "can that be done"? I think the PR, as-is, could be merged, with the caveat (as you mentioned) that two issues should be opened for the RECOMMENDED option and for updating qMRI entity requirements, respectively. Although I think the latter should probably go in bids-2-devel.

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now - thanks. So this PR just made obvious what was already there: Namely that it wouldn't hurt to refine a few requirement level settings for qMRI.

But as that's not fully possible in a backward compatible way, let's 1) push it back to BIDS 2.0, and 2) discuss in a new issue whether/how we can RECOMMEND in file path templates (backward compatible).

+1 for merging then, as the issues can be raised by the respective stakeholders later.

@tsalo
Copy link
Member Author

tsalo commented Jul 4, 2021

Awesome! We have two approving reviews, so I'll wait until Thursday and merge then, unless someone raises a concern.

@tsalo tsalo merged commit 87ca2c1 into bids-standard:master Jul 15, 2021
@tsalo tsalo deleted the file-collections-templates branch July 15, 2021 16:04
yarikoptic added a commit to yarikoptic/bids-specification that referenced this pull request Dec 18, 2021
Text is using both 'file name' and 'filename' pretty much to the equal amount
ATM (see git grep outputs below).  Code uses 'filename', and wikipedia has
https://en.wikipedia.org/wiki/Filename and prefers to use 'filename' in it. So
I decided to harmonize  into  'filename'.

	$> git grep  'file name' | grep '\.md' | grep -v MACRO | nl
		 1	src/02-common-principles.md:1.  **File extension** - a portion of the file name after the left-most
		 2	src/02-common-principles.md:are compulsory. For example a particular file name format is required when
		 3	src/02-common-principles.md:saved under a particular file name specified in the standard. This standard
		 4	src/02-common-principles.md:A file name consists of a chain of *entities*, or key-value pairs, a *suffix* and an
		 5	src/02-common-principles.md:`subject`, the file name MUST begin with the string `sub-<label>_ses-<label>`.
		 6	src/02-common-principles.md:If the `session` level is omitted in the folder structure, the file name MUST begin
		 7	src/02-common-principles.md:key/value pair MUST also be included as part of the file names themselves.
		 8	src/02-common-principles.md:produces a human readable file name, such as `sub-01_task-rest_eeg.edf`.
		 9	src/02-common-principles.md:It is evident from the file name alone that the file contains resting state
		10	src/02-common-principles.md:Entities within a file name MUST be unique.
		11	src/02-common-principles.md:For example, the following file name is not valid because it uses the `acq`
		12	src/02-common-principles.md:label, but must be included in file names (similarly to other key names).
		13	src/02-common-principles.md:meaning of file names and setting requirements on their contents or metadata.
		14	src/02-common-principles.md:to suppress warnings or provide interpretations of your file names.
		15	src/03-modality-agnostic-files.md:This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
		16	src/04-modality-specific-files/05-task-events.md:Where `<matches>` corresponds to task file name. For example:
		17	src/04-modality-specific-files/06-physiological-and-other-continuous-recordings.md:In the template file names, the `<matches>` part corresponds to task file name
		18	src/05-derivatives/02-common-data-types.md:is used to prevent clashing with the original file name.
		19	src/06-longitudinal-and-multi-site-studies.md:and [file names](02-common-principles.md#file-name-structure)
		20	src/99-appendices/03-hed.md:screen or the file name of the stimulus image.
		21	src/99-appendices/03-hed.md:       "LongName": "Stimulus file name",
		22	src/99-appendices/04-entity-table.md:[file name structure](../02-common-principles.md#file-name-structure),
		23	src/99-appendices/06-meg-file-formats.md:that not only the file names, but also the internal file pointers will be
		24	src/99-appendices/09-entities.md:[file name structure](../02-common-principles.md#file-name-structure).
		25	src/CHANGES.md:-   \[FIX] Clarify use of session entity in file names [bids-standard#532](bids-standard#532) ([Moo-Marc](https://github.com/Moo-Marc))
		26	src/CHANGES.md:-   \[FIX] Specify marker file names for KIT data (MEG) [bids-standard#62](bids-standard#62) ([monkeyman192](https://github.com/monkeyman192))
		27	src/CHANGES.md:-   Added missing `sub-<participant_label>` in behavioral data file names.
		28	src/pregh-changes.md:-   Added missing `sub-<participant_label>` in behavioral data file names.

	$> git grep 'filename' | grep '\.md' | grep -v MACRO | nl
		 1	CONTRIBUTING.md:Make sure that all filename format templates, entity tables, and entity definitions are correct
		 2	src/02-common-principles.md:(with the same filename as the `.nii[.gz]` file, but with a `.json` extension).
		 3	src/03-modality-agnostic-files.md:      "filename": ("REQUIRED", "There MUST be exactly one row for each file."),
		 4	src/03-modality-agnostic-files.md:filename	acq_time
		 5	src/04-modality-specific-files/02-magnetoencephalography.md:which saves the MEG sensor coil positions in a separate file with two possible filename extensions  (`.sqd`, `.mrk`).
		 6	src/05-derivatives/01-introduction.md:    status. Any modification of raw files must use a modified filename that does
		 7	src/05-derivatives/01-introduction.md:    not conflict with the raw filename. Further, any files created as part of a
		 8	src/05-derivatives/01-introduction.md:    derivative dataset must not match a permissible filename of a valid raw
		 9	src/05-derivatives/01-introduction.md:    dataset. Stated equivalently, if any filename in a derivative dataset has a
		10	src/05-derivatives/01-introduction.md:-   Each Derivatives filename MUST be of the form:
		11	src/05-derivatives/01-introduction.md:    `source_entities` MUST be the entire source filename, with the omission of
		12	src/05-derivatives/01-introduction.md:    the source suffix and extension. One exception to this rule is filename
		13	src/05-derivatives/01-introduction.md:-   There is no prohibition against identical filenames in different derived
		14	src/05-derivatives/03-imaging.md:filename.
		15	src/99-appendices/04-entity-table.md:specification, and establishes a common order within a filename.
		16	src/99-appendices/08-coordinate-systems.md:The `scanner` coordinate system is implicit and assumed by default if the derivative filename does not define **any** `space-<label>`.
		17	src/99-appendices/11-qmri.md:filenames will remain the same; however, the optional metadata (third column) may
		18	src/CHANGES.md:-   \[SCHEMA] Use macro for filename templates in file collections appendix [bids-standard#787](bids-standard#787) ([tsalo](https://github.com/tsalo))
		19	src/CHANGES.md:-   \[FIX] Accidentally swapped Neuromag/Elekta/MEGIN cross-talk & fine-calibration filename extensions [bids-standard#621](bids-standard#621) ([hoechenberger](https://github.com/hoechenberger))
		20	src/CHANGES.md:-   \[INFRA] SCHEMA: Declare entities by concept names, add entity field for filename components [bids-standard#616](bids-standard#616) ([effigies](https://github.com/effigies))
		21	src/CHANGES.md:-   \[FIX] Common principles: Fix filename in inheritance principle [bids-standard#261](bids-standard#261) ([Lestropie](https://github.com/Lestropie))
		22	src/CHANGES.md:-   \[FIX] Example for IntendedFor was missing session indicator in the filename [bids-standard#129](bids-standard#129) ([yarikoptic](https://github.com/yarikoptic))
		23	src/schema/README.md:the entity tables, entity definitions, filename templates, and metadata tables.
		24	src/schema/README.md:-   `entities.yaml`: Entities (key/value pairs in folder and filenames).
		25	src/schema/README.md:This file contains a dictionary in which each entity (key/value pair in filenames) is defined.
		26	src/schema/README.md:they appear in filenames _and_ their full names.
		27	src/schema/README.md:For example, the key for the "Contrast Enhancing Agent" entity, which appears in filenames as `ce-<label>`,
		28	src/schema/README.md:since many entities (such as `ce`) have very short filename elements.
		29	src/schema/README.md:The `entity` field is the entity as it appears in filenames. For example, the `entity` for `ceagent` is `ce`.
		30	src/schema/README.md:Given that all entities appear in filenames, they should all be strings and the `type` field should always be `string`.
		31	src/schema/README.md:For example, `run` should have an index, so a valid key-value pair in a filename would be `run-01`.
		32	src/schema/README.md:Keys are the filenames (without file extensions),
		33	src/schema/README.md:-   `datatypes/*.yaml`: Files in the `datatypes` folder contain information about valid filenames within a given datatype.
		34	src/schema/README.md:    Each dictionary contains a list of suffixes, entities, and file extensions which may constitute a valid BIDS filename.
		35	src/schema/README.md:-   `entities.yaml`: This file simply defines the order in which entities, when present, MUST appear in filenames.
		36	src/schema/README.md:Each dictionary corresponds to a group of suffixes that have the same rules regarding filenames.
		37	src/schema/README.md:**NOTE**: The order in which entities appear in these dictionaries does not reflect how they should appear in filenames.
		38	src/schema/README.md:This file contains a list of entities in the order in which they must appear in filenames.
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "bash -c 'git grep -l '\"'\"'file name'\"'\"' | grep '\"'\"'\\.md'\"'\"' | grep -v MACRO | xargs sed -i -e '\"'\"'s,file name,filename,g'\"'\"''",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
sappelhoff pushed a commit that referenced this pull request Dec 21, 2021
Text is using both 'file name' and 'filename' pretty much to the equal amount
ATM (see git grep outputs below).  Code uses 'filename', and wikipedia has
https://en.wikipedia.org/wiki/Filename and prefers to use 'filename' in it. So
I decided to harmonize  into  'filename'.

	$> git grep  'file name' | grep '\.md' | grep -v MACRO | nl
		 1	src/02-common-principles.md:1.  **File extension** - a portion of the file name after the left-most
		 2	src/02-common-principles.md:are compulsory. For example a particular file name format is required when
		 3	src/02-common-principles.md:saved under a particular file name specified in the standard. This standard
		 4	src/02-common-principles.md:A file name consists of a chain of *entities*, or key-value pairs, a *suffix* and an
		 5	src/02-common-principles.md:`subject`, the file name MUST begin with the string `sub-<label>_ses-<label>`.
		 6	src/02-common-principles.md:If the `session` level is omitted in the folder structure, the file name MUST begin
		 7	src/02-common-principles.md:key/value pair MUST also be included as part of the file names themselves.
		 8	src/02-common-principles.md:produces a human readable file name, such as `sub-01_task-rest_eeg.edf`.
		 9	src/02-common-principles.md:It is evident from the file name alone that the file contains resting state
		10	src/02-common-principles.md:Entities within a file name MUST be unique.
		11	src/02-common-principles.md:For example, the following file name is not valid because it uses the `acq`
		12	src/02-common-principles.md:label, but must be included in file names (similarly to other key names).
		13	src/02-common-principles.md:meaning of file names and setting requirements on their contents or metadata.
		14	src/02-common-principles.md:to suppress warnings or provide interpretations of your file names.
		15	src/03-modality-agnostic-files.md:This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
		16	src/04-modality-specific-files/05-task-events.md:Where `<matches>` corresponds to task file name. For example:
		17	src/04-modality-specific-files/06-physiological-and-other-continuous-recordings.md:In the template file names, the `<matches>` part corresponds to task file name
		18	src/05-derivatives/02-common-data-types.md:is used to prevent clashing with the original file name.
		19	src/06-longitudinal-and-multi-site-studies.md:and [file names](02-common-principles.md#file-name-structure)
		20	src/99-appendices/03-hed.md:screen or the file name of the stimulus image.
		21	src/99-appendices/03-hed.md:       "LongName": "Stimulus file name",
		22	src/99-appendices/04-entity-table.md:[file name structure](../02-common-principles.md#file-name-structure),
		23	src/99-appendices/06-meg-file-formats.md:that not only the file names, but also the internal file pointers will be
		24	src/99-appendices/09-entities.md:[file name structure](../02-common-principles.md#file-name-structure).
		25	src/CHANGES.md:-   \[FIX] Clarify use of session entity in file names [#532](#532) ([Moo-Marc](https://github.com/Moo-Marc))
		26	src/CHANGES.md:-   \[FIX] Specify marker file names for KIT data (MEG) [#62](#62) ([monkeyman192](https://github.com/monkeyman192))
		27	src/CHANGES.md:-   Added missing `sub-<participant_label>` in behavioral data file names.
		28	src/pregh-changes.md:-   Added missing `sub-<participant_label>` in behavioral data file names.

	$> git grep 'filename' | grep '\.md' | grep -v MACRO | nl
		 1	CONTRIBUTING.md:Make sure that all filename format templates, entity tables, and entity definitions are correct
		 2	src/02-common-principles.md:(with the same filename as the `.nii[.gz]` file, but with a `.json` extension).
		 3	src/03-modality-agnostic-files.md:      "filename": ("REQUIRED", "There MUST be exactly one row for each file."),
		 4	src/03-modality-agnostic-files.md:filename	acq_time
		 5	src/04-modality-specific-files/02-magnetoencephalography.md:which saves the MEG sensor coil positions in a separate file with two possible filename extensions  (`.sqd`, `.mrk`).
		 6	src/05-derivatives/01-introduction.md:    status. Any modification of raw files must use a modified filename that does
		 7	src/05-derivatives/01-introduction.md:    not conflict with the raw filename. Further, any files created as part of a
		 8	src/05-derivatives/01-introduction.md:    derivative dataset must not match a permissible filename of a valid raw
		 9	src/05-derivatives/01-introduction.md:    dataset. Stated equivalently, if any filename in a derivative dataset has a
		10	src/05-derivatives/01-introduction.md:-   Each Derivatives filename MUST be of the form:
		11	src/05-derivatives/01-introduction.md:    `source_entities` MUST be the entire source filename, with the omission of
		12	src/05-derivatives/01-introduction.md:    the source suffix and extension. One exception to this rule is filename
		13	src/05-derivatives/01-introduction.md:-   There is no prohibition against identical filenames in different derived
		14	src/05-derivatives/03-imaging.md:filename.
		15	src/99-appendices/04-entity-table.md:specification, and establishes a common order within a filename.
		16	src/99-appendices/08-coordinate-systems.md:The `scanner` coordinate system is implicit and assumed by default if the derivative filename does not define **any** `space-<label>`.
		17	src/99-appendices/11-qmri.md:filenames will remain the same; however, the optional metadata (third column) may
		18	src/CHANGES.md:-   \[SCHEMA] Use macro for filename templates in file collections appendix [#787](#787) ([tsalo](https://github.com/tsalo))
		19	src/CHANGES.md:-   \[FIX] Accidentally swapped Neuromag/Elekta/MEGIN cross-talk & fine-calibration filename extensions [#621](#621) ([hoechenberger](https://github.com/hoechenberger))
		20	src/CHANGES.md:-   \[INFRA] SCHEMA: Declare entities by concept names, add entity field for filename components [#616](#616) ([effigies](https://github.com/effigies))
		21	src/CHANGES.md:-   \[FIX] Common principles: Fix filename in inheritance principle [#261](#261) ([Lestropie](https://github.com/Lestropie))
		22	src/CHANGES.md:-   \[FIX] Example for IntendedFor was missing session indicator in the filename [#129](#129) ([yarikoptic](https://github.com/yarikoptic))
		23	src/schema/README.md:the entity tables, entity definitions, filename templates, and metadata tables.
		24	src/schema/README.md:-   `entities.yaml`: Entities (key/value pairs in folder and filenames).
		25	src/schema/README.md:This file contains a dictionary in which each entity (key/value pair in filenames) is defined.
		26	src/schema/README.md:they appear in filenames _and_ their full names.
		27	src/schema/README.md:For example, the key for the "Contrast Enhancing Agent" entity, which appears in filenames as `ce-<label>`,
		28	src/schema/README.md:since many entities (such as `ce`) have very short filename elements.
		29	src/schema/README.md:The `entity` field is the entity as it appears in filenames. For example, the `entity` for `ceagent` is `ce`.
		30	src/schema/README.md:Given that all entities appear in filenames, they should all be strings and the `type` field should always be `string`.
		31	src/schema/README.md:For example, `run` should have an index, so a valid key-value pair in a filename would be `run-01`.
		32	src/schema/README.md:Keys are the filenames (without file extensions),
		33	src/schema/README.md:-   `datatypes/*.yaml`: Files in the `datatypes` folder contain information about valid filenames within a given datatype.
		34	src/schema/README.md:    Each dictionary contains a list of suffixes, entities, and file extensions which may constitute a valid BIDS filename.
		35	src/schema/README.md:-   `entities.yaml`: This file simply defines the order in which entities, when present, MUST appear in filenames.
		36	src/schema/README.md:Each dictionary corresponds to a group of suffixes that have the same rules regarding filenames.
		37	src/schema/README.md:**NOTE**: The order in which entities appear in these dictionaries does not reflect how they should appear in filenames.
		38	src/schema/README.md:This file contains a list of entities in the order in which they must appear in filenames.
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "bash -c 'git grep -l '\"'\"'file name'\"'\"' | grep '\"'\"'\\.md'\"'\"' | grep -v MACRO | xargs sed -i -e '\"'\"'s,file name,filename,g'\"'\"''",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
@tsalo tsalo added schema-code Updates or changes to the code used to parse, filter, and render the schema. and removed schema-code Updates or changes to the code used to parse, filter, and render the schema. labels Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema Issues related to the YAML schema representation of the specification. Patch version release.
Projects
Development

Successfully merging this pull request may close these issues.

4 participants