---
# This file describes files which may appear at the top level of a dataset.
# This does not include information about whether these files are required or optional.
# For that information, see `rules/top_level_files.yaml`.
CHANGES:
name: CHANGES
description: |
Version history of the dataset (describing changes, updates and corrections) MAY be provided in
the form of a `CHANGES` text file.
This file MUST follow the
[CPAN Changelog convention](https://metacpan.org/pod/release/HAARG/CPAN-Changes-0.400002/lib/\
CPAN/Changes/Spec.pod).
The `CHANGES` file MUST be either in ASCII or UTF-8 encoding.
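# Illustrative sketch only (not part of the schema): a `CHANGES` file laid out
# in the CPAN Changelog style referenced above, with a version and date line
# followed by indented entries. The version numbers, dates, and entries here
# are hypothetical.
#
#   1.0.1 2015-08-27
#     - Fixed slice timing information.
#
#   1.0.0 2015-08-17
#     - Initial release.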
LICENSE:
name: LICENSE
description: |
A `LICENSE` file MAY be provided in addition to the short license specification
given in the `"License"` field of `dataset_description.json`.
The `"License"` field and `LICENSE` file MUST correspond.
The `LICENSE` file MUST be either in ASCII or UTF-8 encoding.
README:
name: README
description: |
A free-form text file (`README`) describing the dataset in more detail SHOULD be
provided.
The `README` file MUST be either in ASCII or UTF-8 encoding.
dataset_description:
name: Dataset Description
description: |
The file `dataset_description.json` is a JSON file describing the dataset.
genetic_info:
name: Genetic Information
description: |
The `genetic_info.json` file describes the genetic information available in the
`participants.tsv` file and/or the genetic database described in
`dataset_description.json`.
Datasets containing the `Genetics` field in `dataset_description.json` or the
`genetic_id` column in `participants.tsv` MUST include this file.
participants:
name: Participant Information
description: |
The purpose of this RECOMMENDED file is to describe properties of participants
such as age, sex, handedness.
If this file exists, it MUST contain the column `participant_id`,
which MUST consist of `sub-<label>` values identifying one row for each participant,
followed by a list of optional columns describing participants.
Each participant MUST be described by one and only one row.
Commonly used *optional* columns in `participants.tsv` files are `age`, `sex`,
and `handedness`. We RECOMMEND using these columns and,
if you do use them, we RECOMMEND using the following values
for them:
- `age`: numeric value in years (float or integer value)
- `sex`: string value indicating phenotypical sex, one of "male", "female",
"other"
- for "male", use one of these values: `male`, `m`, `M`, `MALE`, `Male`
- for "female", use one of these values: `female`, `f`, `F`, `FEMALE`,
`Female`
- for "other", use one of these values: `other`, `o`, `O`, `OTHER`,
`Other`
- `handedness`: string value indicating one of "left", "right",
"ambidextrous"
- for "left", use one of these values: `left`, `l`, `L`, `LEFT`, `Left`
- for "right", use one of these values: `right`, `r`, `R`, `RIGHT`,
`Right`
- for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`,
`AMBIDEXTROUS`, `Ambidextrous`
Throughout BIDS you can indicate missing values with `n/a` (for "not
available").
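# Illustrative sketch only (not part of the schema): a minimal `participants.tsv`
# consistent with the description above. The participant labels and values are
# hypothetical. Columns are shown space-aligned for readability; in an actual
# file they would be tab-separated.
#
#   participant_id   age    sex     handedness
#   sub-01           34     F       right
#   sub-02           12.5   M       left
#   sub-03           n/a    other   ambidextrous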
samples:
name: Sample Information
description: |
The purpose of this file is to describe properties of samples, indicated by the `sample` entity.
This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
If this file exists, it MUST contain the three following columns:
- `sample_id`: MUST consist of `sample-<label>` values identifying one row
for each sample
- `participant_id`: MUST consist of `sub-<label>`
- `sample_type`: MUST consist of sample type values, one of `cell line`, `in vitro differentiated cells`,
`primary cell`, `cell-free sample`, `cloning host`, `tissue`, `whole organisms`, `organoid` or
`technical sample` from [ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type)
Other optional columns MAY be used to describe the samples.
Each sample MUST be described by one and only one row.
Commonly used *optional* columns in `samples.tsv` files are `pathology` and
`derived_from`. We RECOMMEND using these columns and, if you do use them,
we RECOMMEND using the following values for them:
- `pathology`: string value describing the pathology of the sample or type of control.
When different from `healthy`, pathology SHOULD be specified in `samples.tsv`.
The pathology MAY instead be specified in
[Sessions files](06-longitudinal-and-multi-site-studies.md#sessions-file) in case it changes over time.
- `derived_from`: `sample-<label>` key/value pair from which a sample is derived,
for example a slice of tissue (`sample-02`) derived from a block of tissue (`sample-01`)
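# Illustrative sketch only (not part of the schema): a minimal `samples.tsv`
# with the three required columns and both optional columns described above.
# The labels and values are hypothetical. Columns are shown space-aligned for
# readability; in an actual file they would be tab-separated.
#
#   sample_id   participant_id   sample_type   pathology   derived_from
#   sample-01   sub-01           tissue        healthy     n/a
#   sample-02   sub-01           tissue        healthy     sample-01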