[SCHEMA] Reorganize schema #609

Closed
wants to merge 18 commits into from
Changes from 7 commits
13 changes: 13 additions & 0 deletions src/schema/auxmodalities/headshape.yaml
@@ -0,0 +1,13 @@
---
- datatypes:
    - meg
  suffixes:
    - headshape
  extensions:
    - .pos
    - .txt
  entities:
    sub: required
    ses: optional
    acq: optional
    space: optional
13 changes: 13 additions & 0 deletions src/schema/auxmodalities/markers.yaml
@@ -0,0 +1,13 @@
---
datatypes:
  - meg
suffixes:
  - markers
extensions:
  - .json
entities:
  sub: required
  ses: optional
  task: optional
  acq: optional
  space: optional
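
As a rough illustration of how a rule file like the one above might be consumed, here is a minimal sketch that checks a filename against the markers rule. The rule is transcribed by hand into a Python dict so that no YAML parser is needed; the helper name `matches_rule` and the checking logic are assumptions made for illustration, not part of this PR or of any existing validator.

```python
import re

# Rule transcribed from the markers.yaml diff above.
MARKERS_RULE = {
    "datatypes": ["meg"],
    "suffixes": ["markers"],
    "extensions": [".json"],
    "entities": {
        "sub": "required",
        "ses": "optional",
        "task": "optional",
        "acq": "optional",
        "space": "optional",
    },
}


def matches_rule(filename, rule):
    """Return True if `filename` satisfies `rule` (hypothetical helper)."""
    # Try the longest allowed extension first so ".nii.gz" would win over ".gz".
    for ext in sorted(rule["extensions"], key=len, reverse=True):
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        return False  # no allowed extension matched

    # Everything before the final underscore is key-value pairs.
    *pairs, suffix = stem.split("_")
    if suffix not in rule["suffixes"]:
        return False

    found = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        # Values must be alphanumeric; duplicated keys are rejected.
        if not sep or not re.fullmatch(r"[0-9a-zA-Z]+", value) or key in found:
            return False
        found[key] = value

    allowed = rule["entities"]
    if any(key not in allowed for key in found):
        return False
    # Every entity marked "required" must be present.
    return all(key in found for key, use in allowed.items() if use == "required")
```

This sketch does not check entity order or the datatype directory; it only shows how the `required`/`optional` flags could be interpreted.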
File renamed without changes.
103 changes: 103 additions & 0 deletions src/schema/common_principles.yaml
@@ -0,0 +1,103 @@
---
Dataset:
description: |
A set of neuroimaging and behavioral data acquired for the purpose of a
particular study.
A dataset consists of data acquired from one or more subjects, possibly
from multiple sessions.
Subject:
description: |
A person or animal participating in the study.
Used interchangeably with the term Participant.
Session:
description: |
A logical grouping of neuroimaging and behavioral data consistent across
subjects.
Session can (but doesn't have to) be synonymous with a visit in a
longitudinal study.
In general, subjects will stay in the scanner during one session.
However, for example, if a subject has to leave the scanner room and then
be re-positioned on the scanner bed, the set of MRI acquisitions will still
be considered as a session and match sessions acquired in other subjects.
Similarly, in situations where different data types are obtained over
several visits (for example fMRI on one day followed by DWI the day after)
those can be grouped in one session.
Defining multiple sessions is appropriate when several identical or similar
data acquisitions are planned and performed on all -or most- subjects,
often in the case of some intervention between sessions (e.g., training).
Data acquisition:
description: |
A continuous uninterrupted block of time during which a brain scanning
instrument was acquiring data according to particular scanning
sequence/protocol.
Data type:
description: |
A functional group of different types of data.
BIDS defines eight data types:
func (task based and resting state functional MRI),
dwi (diffusion weighted imaging),
fmap (field inhomogeneity mapping data such as field maps),
anat (structural imaging such as T1, T2, etc.),
meg (magnetoencephalography),
eeg (electroencephalography),
ieeg (intracranial electroencephalography),
beh (behavioral).
Data files are contained in a directory named for the data type.
In raw datasets, the data type directory is nested inside subject and
(optionally) session directories.
Task:
description: |
A set of structured activities performed by the participant.
Tasks are usually accompanied by stimuli and responses, and can greatly
vary in complexity.
For the purpose of this specification we consider the so-called
"resting state" a task.
In the context of brain scanning, a task is always tied to one data
acquisition.
Therefore, even if during one acquisition the subject performed multiple
conceptually different behaviors (with different sets of instructions) they
will be considered one (combined) task.
Event:
description: |
A stimulus or subject response recorded during a task.
Each event has an onset time and duration.
Note that not all tasks will have recorded events (e.g., “resting state”).
Run:
description: |
An uninterrupted repetition of data acquisition that has the same
acquisition parameters and task (however events can change from run to run
due to different subject response or randomized nature of the stimuli).
Run is a synonym of a data acquisition.
Modality:
description: |
The category of brain data recorded by a file.
For MRI data, different pulse sequences are considered distinct modalities,
such as T1w, bold or dwi.
For passive recording techniques, such as EEG, MEG or iEEG, the technique
is sufficiently uniform to define the modalities eeg, meg and ieeg.
When applicable, the modality is indicated in the suffix.
The modality may overlap with, but should not be confused with the data
type.
index:
description: |
A nonnegative integer, possibly prefixed with an arbitrary number of 0s for
consistent indentation, e.g., it is 01 in run-01 following the run-<index>
specification.
label:
description: |
An alphanumeric value, possibly prefixed with an arbitrary number of 0s for
consistent indentation, e.g., it is rest in task-rest following the
task-<label> specification.
suffix:
description: |
An alphanumeric value, located after the key-value_ pairs
(thus after the final _), right before the File extension,
e.g., it is eeg in sub-05_task-matchingpennies_eeg.vhdr.
File extension:
description: |
A portion of the file name after the left-most period (.) preceded by
any other alphanumeric.
For example, .gitignore does not have a file extension, but the file
extension of test.nii.gz is .nii.gz.
Note that the left-most period is included in the file extension.
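
The index, label, suffix and file extension definitions above can be read as a small parsing recipe. The following is a minimal sketch of that reading, assuming ASCII alphanumerics only; the function names `split_extension`, `parse_name`, `is_index` and `is_label` are hypothetical and do not come from the PR.

```python
import re


def split_extension(filename):
    """Split at the left-most period that is preceded by an alphanumeric."""
    match = re.search(r"(?<=[0-9a-zA-Z])\.", filename)
    if match is None:
        return filename, ""  # e.g. ".gitignore" has no file extension
    return filename[: match.start()], filename[match.start():]


def parse_name(filename):
    """Return (entities, suffix, extension) for a BIDS-style file name.

    Raises ValueError if a key-value pair lacks a hyphen.
    """
    stem, extension = split_extension(filename)
    # The suffix sits after the final underscore, right before the extension.
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, extension


def is_index(value):
    """A nonnegative integer, possibly zero-padded."""
    return re.fullmatch(r"[0-9]+", value) is not None


def is_label(value):
    """An alphanumeric value."""
    return re.fullmatch(r"[0-9a-zA-Z]+", value) is not None
```

With these helpers, `split_extension("test.nii.gz")` separates `test` from `.nii.gz`, while `.gitignore` is returned whole with an empty extension, matching the two examples in the description.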
21 changes: 0 additions & 21 deletions src/schema/datatypes/meg.yaml
@@ -22,24 +22,3 @@
run: optional
proc: optional
split: optional
# Second group
- suffixes:
- headshape
extensions:
- .pos
- .txt
entities:
sub: required
ses: optional
acq: optional
space: optional
- suffixes:
- markers
extensions:
- .json
entities:
sub: required
ses: optional
task: optional
acq: optional
space: optional
152 changes: 15 additions & 137 deletions src/schema/entities.yaml
@@ -1,138 +1,16 @@
---
sub:
name: Subject
description: |
A person or animal participating in the study.
format: label
ses:
name: Session
description: |
A logical grouping of neuroimaging and behavioral data consistent across
subjects.
Session can (but doesn't have to) be synonymous to a visit in a
longitudinal study.
In general, subjects will stay in the scanner during one session.
However, for example, if a subject has to leave the scanner room and then
be re-positioned on the scanner bed, the set of MRI acquisitions will still
be considered as a session and match sessions acquired in other subjects.
Similarly, in situations where different data types are obtained over
several visits (for example fMRI on one day followed by DWI the day after)
those can be grouped in one session.
Defining multiple sessions is appropriate when several identical or similar
data acquisitions are planned and performed on all -or most- subjects,
often in the case of some intervention between sessions (e.g., training).
format: label
task:
name: Task
format: label
description: |
Each task has a unique label that MUST only consist of letters and/or
numbers (other characters, including spaces and underscores, are not
allowed).
Those labels MUST be consistent across subjects and sessions.
acq:
name: Acquisition
description: |
The `acq-<label>` key/value pair corresponds to a custom label the
user MAY use to distinguish a different set of parameters used for
acquiring the same modality.
For example, this should be used when a study includes two T1w images - one
full brain low resolution and one restricted field of view but high
resolution.
In such case two files could have the following names:
`sub-01_acq-highres_T1w.nii.gz` and `sub-01_acq-lowres_T1w.nii.gz`, however
the user is free to choose any other label than highres and lowres as long
as they are consistent across subjects and sessions.
In case different sequences are used to record the same modality (e.g. RARE
and FLASH for T1w) this field can also be used to make that distinction.
At what level of detail to make the distinction (e.g. just between RARE and
FLASH, or between RARE, FLASH, and FLASHsubsampled) remains at the
discretion of the researcher.
format: label
ce:
name: Contrast Enhancing Agent
description: |
The `ce-<label>` key/value can be used to distinguish
sequences using different contrast enhanced images.
The label is the name of the contrast agent.
The key `ContrastBolusIngredient` MAY also be added in the JSON file,
with the same label.
format: label
rec:
name: Reconstruction
description: |
The `rec-<label>` key/value can be used to distinguish
different reconstruction algorithms (for example ones using motion
correction).
format: label
dir:
name: Phase-Encoding Direction
description: |
The `dir-<label>` key/value can be used to distinguish
different phase-encoding directions.
format: label
run:
name: Run
description: |
If several scans of the same modality are acquired they MUST be indexed
with a key-value pair: `_run-1`, `_run-2`, ..., `_run-<index>`
(only nonnegative integers are allowed for the `<index>`).
When there is only one scan of a given type the run key MAY be omitted.
format: index
mod:
name: Corresponding Modality
description: |
The `mod-<label>` key/value pair corresponds to modality label for defacing
masks, e.g., T1w, inplaneT1, referenced by a defacemask image.
E.g., `sub-01_mod-T1w_defacemask.nii.gz`.
format: label
echo:
name: Echo
description: |
Multi-echo data MUST be split into one file per echo.
Each file shares the same name with the exception of the `_echo-<index>`
key/value.
Please note that the `<index>` denotes the number/index (in the form of a
nonnegative integer) of the echo not the echo time value which needs to be
stored in the field `EchoTime` of the separate JSON file.
format: index
recording:
name: Recording
description: |
More than one continuous recording file can be included (with different
sampling frequencies).
In such case use different labels.
For example: `_recording-contrast`, `_recording-saturation`.
format: label
proc:
name: Processed (on device)
description: |
The proc label is analogous to rec for MR and denotes a variant of a file
that was a result of particular processing performed on the device.
This is useful for files produced in particular by Elekta’s MaxFilter
(e.g. sss, tsss, trans, quat, mc, etc.), which some installations impose to
be run on raw data because of active shielding software corrections before
the MEG data can actually be exploited.
format: label
space:
name: Space
description: |
The space label (`*[_space-<label>]_electrodes.tsv`) can be used
to indicate the way in which electrode positions are interpreted.
The space label needs to be taken from the list in Appendix VIII.
format: label
split:
name: Split
description: |
In the case of long data recordings that exceed a file size of 2Gb, the
.fif files are conventionally split into multiple parts.
Each of these files has an internal pointer to the next file.
This is important when renaming these split recordings to the BIDS
convention.

Instead of a simple renaming, files should be read in and saved under their
new names with dedicated tools like MNE, which will ensure that not only
the file names, but also the internal file pointers will be updated.
It is RECOMMENDED that .fif files with multiple parts use the
`split-<index>` entity to indicate each part.
format: index
# This file determines the order of entities in filenames.
- sub
- ses
- task
- acq
- ce
- rec
- dir
- run
- mod
- echo
- recording
- proc
- space
- split
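
Since the new entities.yaml is reduced to an ordered list, one plausible use is to assemble filenames with entities in the prescribed order. Below is a minimal sketch under that assumption; `build_filename` is a hypothetical helper, and the list is copied by hand from the diff above rather than loaded from the YAML file.

```python
# Entity order copied from the new entities.yaml above.
ENTITY_ORDER = [
    "sub", "ses", "task", "acq", "ce", "rec", "dir",
    "run", "mod", "echo", "recording", "proc", "space", "split",
]


def build_filename(entities, suffix, extension):
    """Join key-value pairs in schema order, then append suffix and extension."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"unknown entities: {sorted(unknown)}")
    # Iterate over the schema order, not the caller's dict order.
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix]) + extension
```

For example, passing `{"run": "01", "task": "rest", "sub": "01"}` with suffix `bold` and extension `.nii.gz` yields `sub-01_task-rest_run-01_bold.nii.gz`, regardless of the order in which the caller listed the entities.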
7 changes: 2 additions & 5 deletions src/schema/modalities.yaml → src/schema/instruments.yaml
@@ -6,13 +6,10 @@ mri:
- dwi
- fmap
- func
eeg:
name: Electroencephalography
bioamp:
name: Biopotential Amplification
datatypes:
- eeg
ieeg:
name: Intracranial Electroencephalography
datatypes:
- ieeg
meg:
name: Magnetoencephalography
5 changes: 5 additions & 0 deletions src/schema/items/.nii.gz.yaml
@@ -0,0 +1,5 @@
---
name: Gzipped Nifti
class: extension
description: |
  A gzipped nifti file.
6 changes: 6 additions & 0 deletions src/schema/items/Manufacturer.yaml
@@ -0,0 +1,6 @@
---
name: Manufacturer
class: metadata
description: |
  Manufacturer of the equipment that produced the composite instances.
  Corresponds to DICOM Tag 0008, 0070 `Manufacturer`.
17 changes: 17 additions & 0 deletions src/schema/items/T1w.yaml
@@ -0,0 +1,17 @@
---
name: T1-weighted
class: suffix
description: |
  T1-weighted MRI scan.
# Suffix-specific fields
extensions:
  - .nii.gz
  - .nii
  - .json
entities:
Member Author

I think this is the main thing that @yarikoptic will object to with this structure. If we have a single file for each suffix, for example, we will have a very large number of files to update any time we change the specification's supported entities.

Member Author

This is especially true for anatomical MRI scans, in which many suffixes have the same rules.

Member Author

On the other hand, I think there's a lot of benefit in having suffix-specific definitions at least. Perhaps we could start with the unfortunate duplication, and then try an inheritance-like procedure as discussed in #588. For example, there could be "parent" YAML files with entities and extensions specified that the "child" YAMLs (like T1w.yaml) could inherit from. These parent files would correspond to the subgroups we currently have in our datatype YAML files. Does that make sense?
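
The inheritance-like procedure described in the comment above could look roughly like the following sketch. The `inherits` key, the `anat_parent` item and the `resolve` helper are all assumptions made for illustration (the T2w entry is likewise only an example); nothing here is defined by this PR or by #588.

```python
# Items as they might look after loading the YAML files (plain dicts here).
ITEMS = {
    # Hypothetical "parent" holding the rules shared by a datatype subgroup.
    "anat_parent": {
        "extensions": [".nii.gz", ".nii", ".json"],
        "entities": {
            "sub": "required",
            "ses": "optional",
            "run": "optional",
            "acq": "optional",
            "ce": "optional",
            "rec": "optional",
        },
    },
    # "Child" items only state what is specific to them.
    "T1w": {
        "inherits": "anat_parent",  # hypothetical key
        "name": "T1-weighted",
        "class": "suffix",
        "description": "T1-weighted MRI scan.",
    },
    "T2w": {  # illustrative second child
        "inherits": "anat_parent",
        "name": "T2-weighted",
        "class": "suffix",
    },
}


def resolve(name, items):
    """Return the item with its parent's fields filled in (child wins)."""
    item = dict(items[name])
    parent_name = item.pop("inherits", None)
    if parent_name is None:
        return item
    merged = resolve(parent_name, items)  # parents may themselves inherit
    merged.update(item)  # shallow merge: child keys override parent keys
    return merged
```

With this layout, adding a supported entity to every anatomical suffix would mean editing only the parent. The merge is shallow and does not detect inheritance cycles, both of which a real implementation would need to decide on.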

Collaborator

But why bring entities and extensions in here?

  • They are not, per se, properties of an "item", in particular when a term might be both a suffix and a datatype (ref: Separate schema item definitions from layout rules #603 (comment))
  • I think records in items/ should have no "class-specific" attributes and be of a uniform schema (which attributes they could have etc). I think if those are introduced, establishing later a schema (and a validator) for our schema would be trickier
  • What if we keep them ("extensions", "entities") where/as they are (i.e. datatypes/anat.yaml for T1w). Then files in items/ could just provide a file for every "name" (well -- "altname") mentioned in the datatypes/*yaml file in any section, thus just centralizing their description etc:
    • Even extensions could get their class (i.e. "extension") and we could have a bunch of (hidden ;)) files (like .nii.gz.yaml) which would provide description etc

Collaborator

Was your comment a reply to mine? GitHub seems to have placed it above mine -- I just want to make sure.

Member Author

I am inclined to think that the entities and extensions are properties of the items. It's just that we need to treat instances where one term refers to two different classes (e.g., meg as a datatype and suffix) as separate items. There should be a datatype-meg and a suffix-meg, for example. That is my preference, though I'm willing to defer to folks with more experience in this area.

I think records in items/ should have no "class-specific" attributes and be of a uniform schema (which attributes they could have etc). I think if those are introduced, establishing later a schema (and a validator) for our schema would be trickier

I don't think that's entirely possible. For example, entities are going to have features that are distinct from other classes, such as whether they take an index or a label, as well as their "entity-key". Metadata fields, at minimum, need "units". I think those are features of the actual items, rather than how they interact with other classes.

What if we keep them ("extensions", "entities") where/as they are (i.e. datatypes/anat.yaml for T1w). Then files in items/ could just provide a file for every "name" (well -- "altname") mentioned in the datatypes/*yaml file in any section, thus just centralizing their description etc:

Well, on the bright side we can always change things later, so I'm willing to give it a try. I can start with the filename, the description, and a "display-name" (per @effigies suggestion) for each of the suffixes. If we want to ultimately add the entities, extensions, and possibly metadata fields later on, that should be easy enough.

Collaborator

Eh, the call ended while I was staring at this PR and thinking of maybe chatting about it further (sorry for being slow with replies on GitHub). I think if we make this items/ folder into terms/, it could be pretty much what the BIDS_Terms folder is (just nicer YAMLs, from which JSON-LDs would be generated), just with descriptions tuned up to not provide any "BIDS formatting" specifics, and rather describing the actual generic concept (subject, session, MR enhancement contrast agent, etc). Then it would be for the schema "upstairs" (in entities/ etc) to use the terms provided under terms/, so any "word" used in the schema has a clear mapping to a term.
It would be OK for the same term to be used in different contexts (datatype/ folder vs a filename _suffix). If there is a need to provide a "concept" of a composite term + the context where it is used, I think we (or whoever needs them) could automatically "mint" them, at the level of composition they need. So there could be a concept of sub-*/[ses-*/]<datatype-term>/ corresponding to a "BIDS folder with data for a single acquisition session of a single subject", thus reflecting a layout limitation of BIDS (i.e. we cannot have a folder with eeg data for all subjects).
I think it also boils down to "contexts" to provide a mapping if there is no 1-to-1 mapping. E.g. (hypothetical, since no concrete case comes to mind ATM, and maybe not a good one, since IMHO the example term is a composition of modality + concept): if there were an _nchan-<number> entity to signal the number of channels in the file, then it would correspond to either ECGChannelCount or EEGChannelCount etc, depending on the context -- the datatype for which it is listed. So we would then need to be able to say somehow that an entity points to different terms based on some other level (datatype ATM). For that we could provision that any entity listing does not only list its "use" (optional vs required; maybe there is a better name) but also a "term". E.g. datatypes/eeg.yaml could have

  ...
  entities:
    sub: required
    ses: optional
    task: required
    acq: optional
    run: optional
    nchan:
      use: optional
      term: EEGChannelCount

and allow a similar gimmick for any place where a term could be used and we need to remap depending on the context.

To me, it is not yet totally clear where we would need such "composite" terms. Even in the case of EEGChannelCount, it is sameAs a generic "http://purl.org/nidash/nidm#NumberOfChannels", so I think overall it might be a utopia to fight combinatorics at the level of terms ;), and we should get away from it.
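
To make the proposed mixed form concrete (an entity given either as a bare `required`/`optional` string or as a mapping with `use` and `term`), here is a minimal sketch of how a reader of the schema might normalize both shapes. The `normalize_entities` helper and the choice to default `term` to the entity key are assumptions for illustration; `nchan` and `EEGChannelCount` are the hypothetical names used in the comment above.

```python
# The datatypes/eeg.yaml fragment from the comment above, as a plain dict.
EEG_ENTITIES = {
    "sub": "required",
    "ses": "optional",
    "task": "required",
    "acq": "optional",
    "run": "optional",
    "nchan": {"use": "optional", "term": "EEGChannelCount"},
}


def normalize_entities(entities):
    """Expand every entry to the long form {"use": ..., "term": ...}."""
    normalized = {}
    for key, spec in entities.items():
        if isinstance(spec, str):
            # Short form: the string is the "use"; assume the term is the key.
            normalized[key] = {"use": spec, "term": key}
        else:
            # Long form: "use" is mandatory, "term" falls back to the key.
            normalized[key] = {"use": spec["use"], "term": spec.get("term", key)}
    return normalized
```

After normalization, downstream code can treat every entity uniformly and look up its term without caring which form the YAML author used.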

  sub: required
  ses: optional
  run: optional
  acq: optional
  ce: optional
  rec: optional
21 changes: 21 additions & 0 deletions src/schema/items/acq.yaml
@@ -0,0 +1,21 @@
---
name: Acquisition
Collaborator

Am I understanding correctly that the filename pretty much specifies the altname (#588)?

Member Author (@tsalo, Sep 16, 2020)

If @effigies' idea is accepted, then yes, I would rename these files to use the altnames instead.

class: entity
description: |
  The `acq-<label>` key/value pair corresponds to a custom label the
  user MAY use to distinguish a different set of parameters used for
  acquiring the same modality.
  For example, this should be used when a study includes two T1w images - one
  full brain low resolution and one restricted field of view but high
  resolution.
  In such case two files could have the following names:
  `sub-01_acq-highres_T1w.nii.gz` and `sub-01_acq-lowres_T1w.nii.gz`, however
  the user is free to choose any other label than highres and lowres as long
  as they are consistent across subjects and sessions.
  In case different sequences are used to record the same modality (e.g. RARE
  and FLASH for T1w) this field can also be used to make that distinction.
  At what level of detail to make the distinction (e.g. just between RARE and
  FLASH, or between RARE, FLASH, and FLASHsubsampled) remains at the
  discretion of the researcher.
# Entity-specific fields
format: label
5 changes: 5 additions & 0 deletions src/schema/items/anat.yaml
@@ -0,0 +1,5 @@
---
name: Anatomical MRI
class: datatype
description: |
  Anatomical magnetic resonance imaging data.
11 changes: 11 additions & 0 deletions src/schema/items/ce.yaml
@@ -0,0 +1,11 @@
---
name: Contrast Enhancing Agent
class: entity
description: |
  The `ce-<label>` key/value can be used to distinguish
  sequences using different contrast enhanced images.
  The label is the name of the contrast agent.
  The key `ContrastBolusIngredient` MAY also be added in the JSON file,
  with the same label.
# Entity-specific fields
format: label