[SCHEMA] Reorganize schema #609
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,138 +1,16 @@ | ||
--- | ||
sub: | ||
name: Subject | ||
description: | | ||
A person or animal participating in the study. | ||
format: label | ||
ses: | ||
name: Session | ||
description: | | ||
A logical grouping of neuroimaging and behavioral data consistent across | ||
subjects. | ||
Session can (but doesn't have to) be synonymous to a visit in a | ||
longitudinal study. | ||
In general, subjects will stay in the scanner during one session. | ||
However, for example, if a subject has to leave the scanner room and then | ||
be re-positioned on the scanner bed, the set of MRI acquisitions will still | ||
be considered as a session and match sessions acquired in other subjects. | ||
Similarly, in situations where different data types are obtained over | ||
several visits (for example fMRI on one day followed by DWI the day after) | ||
those can be grouped in one session. | ||
Defining multiple sessions is appropriate when several identical or similar | ||
data acquisitions are planned and performed on all -or most- subjects, | ||
often in the case of some intervention between sessions (e.g., training). | ||
format: label | ||
task: | ||
name: Task | ||
format: label | ||
description: | | ||
Each task has a unique label that MUST only consist of letters and/or | ||
numbers (other characters, including spaces and underscores, are not | ||
allowed). | ||
Those labels MUST be consistent across subjects and sessions. | ||
acq: | ||
name: Acquisition | ||
description: | | ||
The `acq-<label>` key/value pair corresponds to a custom label the | ||
user MAY use to distinguish a different set of parameters used for | ||
acquiring the same modality. | ||
For example this should be used when a study includes two T1w images - one | ||
full brain low resolution and one restricted field of view but high | ||
resolution. | ||
In such case two files could have the following names: | ||
`sub-01_acq-highres_T1w.nii.gz` and `sub-01_acq-lowres_T1w.nii.gz`, however | ||
the user is free to choose any other label than highres and lowres as long | ||
as they are consistent across subjects and sessions. | ||
In case different sequences are used to record the same modality (e.g. RARE | ||
and FLASH for T1w) this field can also be used to make that distinction. | ||
At what level of detail to make the distinction (e.g. just between RARE and | ||
FLASH, or between RARE, FLASH, and FLASHsubsampled) remains at the | ||
discretion of the researcher. | ||
format: label | ||
ce: | ||
name: Contrast Enhancing Agent | ||
description: | | ||
The `ce-<label>` key/value can be used to distinguish | ||
sequences using different contrast enhanced images. | ||
The label is the name of the contrast agent. | ||
The key `ContrastBolusIngredient` MAY also be added in the JSON file, | ||
with the same label. | ||
format: label | ||
rec: | ||
name: Reconstruction | ||
description: | | ||
The `rec-<label>` key/value can be used to distinguish | ||
different reconstruction algorithms (for example ones using motion | ||
correction). | ||
format: label | ||
dir: | ||
name: Phase-Encoding Direction | ||
description: | | ||
The `dir-<label>` key/value can be used to distinguish | ||
different phase-encoding directions. | ||
format: label | ||
run: | ||
name: Run | ||
description: | | ||
If several scans of the same modality are acquired they MUST be indexed | ||
with a key-value pair: `_run-1`, `_run-2`, ..., `_run-<index>` | ||
(only nonnegative integers are allowed for the `<index>`). | ||
When there is only one scan of a given type the run key MAY be omitted. | ||
format: index | ||
mod: | ||
name: Corresponding Modality | ||
description: | | ||
The `mod-<label>` key/value pair corresponds to modality label for defacing | ||
masks, e.g., T1w, inplaneT1, referenced by a defacemask image. | ||
E.g., `sub-01_mod-T1w_defacemask.nii.gz`. | ||
format: label | ||
echo: | ||
name: Echo | ||
description: | | ||
Multi-echo data MUST be split into one file per echo. | ||
Each file shares the same name with the exception of the `_echo-<index>` | ||
key/value. | ||
Please note that the `<index>` denotes the number/index (in the form of a | ||
nonnegative integer) of the echo not the echo time value which needs to be | ||
stored in the field `EchoTime` of the separate JSON file. | ||
format: index | ||
recording: | ||
name: Recording | ||
description: | | ||
More than one continuous recording file can be included (with different | ||
sampling frequencies). | ||
In such case use different labels. | ||
For example: `_recording-contrast`, `_recording-saturation`. | ||
format: label | ||
proc: | ||
name: Processed (on device) | ||
description: | | ||
The proc label is analogous to rec for MR and denotes a variant of a file | ||
that was a result of particular processing performed on the device. | ||
This is useful for files produced in particular by Elekta’s MaxFilter | ||
(e.g. sss, tsss, trans, quat, mc, etc.), which some installations impose to | ||
be run on raw data because of active shielding software corrections before | ||
the MEG data can actually be exploited. | ||
format: label | ||
space: | ||
name: Space | ||
description: | | ||
The space label (`*[_space-<label>]_electrodes.tsv`) can be used | ||
to indicate the way in which electrode positions are interpreted. | ||
The space label needs to be taken from the list in Appendix VIII. | ||
format: label | ||
split: | ||
name: Split | ||
description: | | ||
In the case of long data recordings that exceed a file size of 2Gb, the | ||
.fif files are conventionally split into multiple parts. | ||
Each of these files has an internal pointer to the next file. | ||
This is important when renaming these split recordings to the BIDS | ||
convention. | ||
|
||
Instead of a simple renaming, files should be read in and saved under their | ||
new names with dedicated tools like MNE, which will ensure that not only | ||
the file names, but also the internal file pointers will be updated. | ||
It is RECOMMENDED that .fif files with multiple parts use the | ||
`split-<index>` entity to indicate each part. | ||
format: index | ||
# This file determines the order of entities in filenames. | ||
- sub | ||
- ses | ||
- task | ||
- acq | ||
- ce | ||
- rec | ||
- dir | ||
- run | ||
- mod | ||
- echo | ||
- recording | ||
- proc | ||
- space | ||
- split |
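As a rough illustration of how an ordered entity list like the one above might be consumed, the sketch below assembles a filename from entity values. `ENTITY_ORDER` mirrors the list in the file; `build_filename` is a hypothetical helper written for this example, not part of any BIDS tooling.

```python
# Entity order copied from the list above.
ENTITY_ORDER = [
    "sub", "ses", "task", "acq", "ce", "rec", "dir",
    "run", "mod", "echo", "recording", "proc", "space", "split",
]


def build_filename(entities, suffix, extension):
    """Join key-value pairs in schema order, then append suffix and extension.

    Hypothetical helper for illustration only.
    """
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"unknown entities: {sorted(unknown)}")
    # Iterate over the schema order, not the caller's dict order.
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix]) + extension


print(build_filename({"acq": "highres", "sub": "01"}, "T1w", ".nii.gz"))
# sub-01_acq-highres_T1w.nii.gz
```

Because the order lives in one list, reordering entities would only require editing that list.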
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
--- | ||
name: T1-weighted | ||
class: suffix | ||
description: | | ||
T1-weighted MRI scan. | ||
# Suffix-specific fields | ||
extensions: | ||
- .nii.gz | ||
- .nii | ||
- .json | ||
entities: | ||
sub: required | ||
ses: optional | ||
run: optional | ||
acq: optional | ||
ce: optional | ||
rec: optional |
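The required/optional entity rules in the suffix file above could be checked along these lines. This is a minimal sketch: the `T1W_RULES` dictionary copies the values from the file, while `check_entities` and its message wording are assumptions made for the example.

```python
# Rules copied from the T1-weighted suffix file above.
T1W_RULES = {
    "extensions": [".nii.gz", ".nii", ".json"],
    "entities": {
        "sub": "required", "ses": "optional", "run": "optional",
        "acq": "optional", "ce": "optional", "rec": "optional",
    },
}


def check_entities(present, rules):
    """Return a list of problems for the entity keys found in a filename.

    Hypothetical helper; an empty list means the keys satisfy the rules.
    """
    allowed = rules["entities"]
    problems = [
        f"missing required entity: {key}"
        for key, level in allowed.items()
        if level == "required" and key not in present
    ]
    problems += [f"entity not allowed: {key}" for key in present if key not in allowed]
    return problems


print(check_entities(["sub", "task"], T1W_RULES))
# ['entity not allowed: task']
```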
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
--- | ||
name: Acquisition | ||
am I catching it right that filename pretty much specifies the
If @effigies' idea is accepted then yes I would rename these files to use the altnames instead. |
||
class: entity | ||
description: | | ||
The `acq-<label>` key/value pair corresponds to a custom label the | ||
user MAY use to distinguish a different set of parameters used for | ||
acquiring the same modality. | ||
For example this should be used when a study includes two T1w images - one | ||
full brain low resolution and one restricted field of view but high | ||
resolution. | ||
In such case two files could have the following names: | ||
`sub-01_acq-highres_T1w.nii.gz` and `sub-01_acq-lowres_T1w.nii.gz`, however | ||
the user is free to choose any other label than highres and lowres as long | ||
as they are consistent across subjects and sessions. | ||
In case different sequences are used to record the same modality (e.g. RARE | ||
and FLASH for T1w) this field can also be used to make that distinction. | ||
At what level of detail to make the distinction (e.g. just between RARE and | ||
FLASH, or between RARE, FLASH, and FLASHsubsampled) remains at the | ||
discretion of the researcher. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
name: Contrast Enhancing Agent | ||
class: entity | ||
description: | | ||
The `ce-<label>` key/value can be used to distinguish | ||
sequences using different contrast enhanced images. | ||
The label is the name of the contrast agent. | ||
The key `ContrastBolusIngredient` MAY also be added in the JSON file, | ||
with the same label. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
name: Phase-Encoding Direction | ||
class: entity | ||
description: | | ||
The `dir-<label>` key/value can be used to distinguish | ||
different phase-encoding directions. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
--- | ||
name: Echo | ||
class: entity | ||
description: | | ||
Multi-echo data MUST be split into one file per echo. | ||
oh, that is a good example where we have not just a description of a term (EchoTime which could be relevant for both entity AND sidecar file), but the description incorporating BIDS specific that it is REQUIRED (as an entity) in cases of multi-echo recording. So now I wonder if indeed descriptions in terms should have some reflection of being BIDS-Terms ;) or we just provide descriptions like this at the level of entity etc definitions (which would also describe
yeap yeap -- the latter: term description should be generic.
oh wonderful - and validators then could validate that various instances of the term (e.g. in sidecar and filename entity) correspond since they do come from the same term, thus not hardcoding any kind of association!
The one I'm stuck on right now is "modality". Under Common Principles, modality has a very nice new definition:
On the other hand, the modality entity ( I don't think there's a good way of making one definition for both. WDYT?
Why do you think it doesn't work for the entity, because of Note: The only use of
The full definition of "modality" just doesn't seem relevant to the corresponding entity. At least it provides too much information that might confuse people. I don't want to cut it down though. I think the level of detail is necessary to understand the main concept defined in Common principles.
I'm just going to drop Common principles from the schema for now. Since the focus on this PR is separating context from description and adding the BIDS terms, I think we can just circle around to it later. |
||
Each file shares the same name with the exception of the `_echo-<index>` | ||
key/value. | ||
Please note that the `<index>` denotes the number/index (in the form of a | ||
nonnegative integer) of the echo not the echo time value which needs to be | ||
stored in the field `EchoTime` of the separate JSON file. | ||
# Entity-specific fields | ||
format: index |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
name: Corresponding Modality | ||
class: entity | ||
description: | | ||
The `mod-<label>` key/value pair corresponds to modality label for defacing | ||
masks, e.g., T1w, inplaneT1, referenced by a defacemask image. | ||
E.g., `sub-01_mod-T1w_defacemask.nii.gz`. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
--- | ||
name: Processed (on device) | ||
class: entity | ||
description: | | ||
The proc label is analogous to rec for MR and denotes a variant of a file | ||
that was a result of particular processing performed on the device. | ||
This is useful for files produced in particular by Elekta’s MaxFilter | ||
(e.g. sss, tsss, trans, quat, mc, etc.), which some installations impose to | ||
be run on raw data because of active shielding software corrections before | ||
the MEG data can actually be exploited. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
name: Reconstruction | ||
class: entity | ||
description: | | ||
The `rec-<label>` key/value can be used to distinguish | ||
different reconstruction algorithms (for example ones using motion | ||
correction). | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
name: Recording | ||
class: entity | ||
description: | | ||
More than one continuous recording file can be included (with different | ||
sampling frequencies). | ||
In such case use different labels. | ||
For example: `_recording-contrast`, `_recording-saturation`. | ||
unrelated to this PR, I wonder if we should just reserve
I'd like to avoid incorporating examples into the schema files since a lot of the examples used throughout the specification for different entities are section-specific. For example, I think you're right, though, that the examples should be removed from the entity definitions. We should clean up those definitions.
I wrote a response nodding in agreement but then realized -- we are talking about the super dooper machine readable filenaming standard! If we have ANY examples we would anyways need to parse and validate them to be legit or specify them as records ( but anyways -- it is indeed a separate topic for some future work and indeed they just need to be removed from descriptions.
The filename format rendering will be generated in the specification text in #610. The examples I'm referring to are more text-based, like "acq may refer to something like different resolutions of T1w scans" in the Anatomical MRI section, while in the MEG |
||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
name: Run | ||
class: entity | ||
description: | | ||
If several scans of the same modality are acquired they MUST be indexed | ||
with a key-value pair: `_run-1`, `_run-2`, ..., `_run-<index>` | ||
(only nonnegative integers are allowed for the `<index>`). | ||
When there is only one scan of a given type the run key MAY be omitted. | ||
# Entity-specific fields | ||
format: index |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
--- | ||
name: Session | ||
class: entity | ||
description: | | ||
A logical grouping of neuroimaging and behavioral data consistent across | ||
subjects. | ||
Session can (but doesn't have to) be synonymous to a visit in a | ||
longitudinal study. | ||
In general, subjects will stay in the scanner during one session. | ||
However, for example, if a subject has to leave the scanner room and then | ||
be re-positioned on the scanner bed, the set of MRI acquisitions will still | ||
be considered as a session and match sessions acquired in other subjects. | ||
Similarly, in situations where different data types are obtained over | ||
several visits (for example fMRI on one day followed by DWI the day after) | ||
those can be grouped in one session. | ||
Defining multiple sessions is appropriate when several identical or similar | ||
data acquisitions are planned and performed on all -or most- subjects, | ||
often in the case of some intervention between sessions (e.g., training). | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
name: Space | ||
class: entity | ||
description: | | ||
The space label (`*[_space-<label>]_electrodes.tsv`) can be used | ||
to indicate the way in which electrode positions are interpreted. | ||
The space label needs to be taken from the list in Appendix VIII. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
--- | ||
name: Split | ||
class: entity | ||
description: | | ||
In the case of long data recordings that exceed a file size of 2Gb, the | ||
.fif files are conventionally split into multiple parts. | ||
Each of these files has an internal pointer to the next file. | ||
This is important when renaming these split recordings to the BIDS | ||
convention. | ||
|
||
Instead of a simple renaming, files should be read in and saved under their | ||
new names with dedicated tools like MNE, which will ensure that not only | ||
the file names, but also the internal file pointers will be updated. | ||
It is RECOMMENDED that .fif files with multiple parts use the | ||
`split-<index>` entity to indicate each part. | ||
# Entity-specific fields | ||
format: index |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
--- | ||
name: Subject | ||
class: entity | ||
description: | | ||
A person or animal participating in the study. | ||
# Entity-specific fields | ||
format: label |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
name: Task | ||
class: entity | ||
description: | | ||
Each task has a unique label that MUST only consist of letters and/or | ||
numbers (other characters, including spaces and underscores, are not | ||
allowed). | ||
Those labels MUST be consistent across subjects and sessions. | ||
Similarly to |
||
# Entity-specific fields | ||
format: label |
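The two `format` values used across the entity files, `label` and `index`, suggest a simple value check. The sketch below assumes that every `label` follows the letters-and-numbers rule stated in the Task description, which the source only states for tasks, and that an `index` is a nonnegative integer as stated in the Run and Echo descriptions. `is_valid_value` is a hypothetical helper.

```python
import re

# Assumption: the alphanumeric-only rule from the Task description
# applies to every entity whose format is "label".
LABEL_RE = re.compile(r"[A-Za-z0-9]+")
# Nonnegative integers, as described for the run and echo entities.
INDEX_RE = re.compile(r"[0-9]+")

PATTERNS = {"label": LABEL_RE, "index": INDEX_RE}


def is_valid_value(value, fmt):
    """Return True if `value` fully matches the pattern for `fmt`."""
    return PATTERNS[fmt].fullmatch(value) is not None


print(is_valid_value("rest", "label"))        # True
print(is_valid_value("rest_state", "label"))  # False
print(is_valid_value("-1", "index"))          # False
```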
I think this is the main thing that @yarikoptic will object to with this structure. If we have a single file for each suffix, for example, we will have a very large number of files to update any time we change the specification's supported entities.
This is especially true for anatomical MRI scans, in which many suffixes have the same rules.
On the other hand, I think there's a lot of benefit in having suffix-specific definitions at least. Perhaps we could start with the unfortunate duplication, and then try an inheritance-like procedure as discussed in #588. For example, there could be "parent" YAML files with entities and extensions specified that the "child" YAMLs (like `T1w.yaml`) could inherit from. These parent files would correspond to the subgroups we currently have in our datatype YAML files. Does that make sense?
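One way the parent/child idea could work, sketched under stated assumptions: the `inherits` key, the `anat_parent` name, and the `resolve` function are all hypothetical and are not taken from this PR or from #588. The parent's entity and extension values are copied from the T1-weighted file in the diff.

```python
# Hypothetical definitions, as they might look after loading the YAML files.
DEFINITIONS = {
    "anat_parent": {  # hypothetical "parent" shared by several suffixes
        "extensions": [".nii.gz", ".nii", ".json"],
        "entities": {
            "sub": "required", "ses": "optional", "run": "optional",
            "acq": "optional", "ce": "optional", "rec": "optional",
        },
    },
    "T1w": {"inherits": "anat_parent", "name": "T1-weighted"},
}


def resolve(name, definitions):
    """Merge a definition with its chain of parents; child values win."""
    node = definitions[name]
    parent = node.get("inherits")
    base = resolve(parent, definitions) if parent else {}
    own = {key: value for key, value in node.items() if key != "inherits"}
    merged = {**base, **own}
    # Entity rules are merged key by key rather than replaced wholesale.
    merged["entities"] = {**base.get("entities", {}), **node.get("entities", {})}
    return merged


t1w = resolve("T1w", DEFINITIONS)
print(t1w["name"], t1w["entities"]["sub"])
# T1-weighted required
```

With this shape, a change to the supported entities would touch only the parent definition.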
But why to bring in `entities` and `extensions` in here?
`suffix` and a `datatype` (ref: Separate schema item definitions from layout rules #603 (comment))
`items/` should have no "class-specific" attributes and be of a uniform schema (which attributes they could have etc). I think if those are introduced, establishing later a schema (and a validator) for our schema would be trickier
`datatypes/anat.yaml` for T1w). Then files in `items/` could just provide a file for every "name" (well -- "altname") mentioned in the `datatypes/*yaml` file in any section, thus just centralizing their description etc:
`.nii.gz.yaml`) which would provide description etc
was your comment a reply to mine? GitHub seems to have placed it above mine -- just want to make sure
#609 (comment) was before #609 (comment).
I am inclined to think that the entities and extensions are properties of the items. It's just that we need to treat instances where one term refers to two different classes (e.g., `meg` as a `datatype` and `suffix`) as separate items. There should be a `datatype-meg` and a `suffix-meg`, for example. While that is my preference, though, I'm willing to defer to folks with more experience in this area.
I don't think that's entirely possible. For example, entities are going to have features that are distinct from other classes, such as whether they take an `index` or a `label`, as well as their "entity-key". Metadata fields, at minimum, need "units". I think those are features of the actual items, rather than how they interact with other classes.
Well, on the bright side we can always change things later, so I'm willing to give it a try. I can start with the filename, the description, and a "display-name" (per @effigies' suggestion) for each of the suffixes. If we want to ultimately add the entities, extensions, and possibly metadata fields later on, that should be easy enough.
eh the call ended while I was staring at this PR and thought to maybe chat about it further (sorry for being slow with replies on GitHub): I think if we make this `items/` folder into `terms/` it could be pretty much what the BIDS_Terms folder is (just nicer YAMLs, from which JSON-LDs would be generated), just with descriptions tuned up to not provide any "BIDS formatting" specifics, and rather describing the actual generic concept (subject, session, MR enhancement contrast agent, etc). Then it would be for the schema "upstairs" (in `entities/` etc) to use the terms provided under `terms/`, so any "word" used in the schema has a clear mapping to a term.
It would be ok for the same term to be used in different contexts (`datatype/` folder vs a filename `_suffix`). If there is a need to provide a "concept" of a composite term + context where it is used, I think we (or whoever needs them) could automatically "mint" them, at the level of composition they need. So there could be a concept of `sub-*/[ses-*/]<datatype-term>/` corresponding to a "BIDS folder with data for a single acquisition session of a single subject", thus reflecting the layout limitation of BIDS (i.e. we cannot have a folder with eeg data for all subjects).
I think it also boils down to "contexts" to provide mapping if there is no 1-to-1 mapping. E.g. (hypothetical since no concrete case comes to mind ATM; and maybe not good here since IMHO the example term is a composition of modality + concept): If there was an `_nchan-<number>` entity to signal the number of channels in the file, then it would correspond to either ECGChannelCount or EEGChannelCount etc depending on the context -- the datatype for which it is listed. So we would then need to be able somehow to say that the entity points to different terms based on some other level (datatype ATM). For that we could provision that any entity listing does not only list its "use" (optional vs required; maybe there is a better name) but also a "term". E.g. `datatypes/eeg.yaml` could have
and allow similar gimmick for any place where term could be used and we need to remap depending on the context.
To me, it is not yet totally clear on where we would need such "composite" terms. Since even in the case of EEGChannelCount, it is sameAs a generic "http://purl.org/nidash/nidm#NumberOfChannels", so I think overall it might be a utopia to fight combinatorics at the level of terms ;), and we should get away from it.
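The context-dependent mapping discussed in the comment above could look roughly like the following. This is only one possible reading of the idea: the nested layout, the `ecg` context, and the `term_for` function are assumptions made for illustration, and `nchan` is the commenter's own hypothetical entity rather than an entity in the schema.

```python
# Hypothetical per-datatype entity listings. Each entry carries a "use"
# level and the "term" that the entity points to in that context.
DATATYPE_ENTITIES = {
    "eeg": {"nchan": {"use": "optional", "term": "EEGChannelCount"}},
    "ecg": {"nchan": {"use": "optional", "term": "ECGChannelCount"}},  # hypothetical context
}


def term_for(entity, datatype, listings):
    """Look up which term an entity maps to within a given datatype."""
    try:
        return listings[datatype][entity]["term"]
    except KeyError:
        raise LookupError(
            f"{entity!r} is not listed for datatype {datatype!r}"
        ) from None


print(term_for("nchan", "eeg", DATATYPE_ENTITIES))
# EEGChannelCount
```

The same entity key resolves to a different term per datatype, so no composite term needs to be defined up front.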