For detail on ISA framework terminology, please read the ISA Abstract Model specification.
This document describes the ISA Abstract Model reference implementation specified in the ISA-XLSX format. The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet document. Conceptually, using the terminology of the SpreadsheetML specification ISO/IEC 29500-1, the document comprises one or more worksheets in a workbook.
Table of contents
- Investigation File
- Study File
- Assay File
- Datamap File
- Top-level metadata sheets
- Annotation Table sheets
- Datamap Table sheets
Below we provide the schemas and the content rules for valid ISA-XLSX documents.
ISA-XLSX uses three types of files to capture the experimental metadata:
- Investigation file
- Study file
- Assay file
The Investigation file contains all the information needed to understand the overall goals and means used in an experiment; experimental steps (or sequences of events) are described in the Study and in the Assay file(s). For each Investigation file there may be one or more Studies defined with a corresponding Study file; for each Study there may be one or more Assays defined with corresponding Assay files; one assay file may be registered in different studies.
In order to facilitate identification of ISA-XLSX component files, specific naming patterns MUST be followed:
- isa.investigation.xlsx for identifying the Investigation file
- isa.study.xlsx for identifying Study file(s)
- isa.assay.xlsx for identifying Assay file(s)
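As a minimal sketch, the naming patterns above could be checked as follows. The function name `classify_isa_file` and the returned kind strings are illustrative assumptions and not part of the specification; the exact, case-sensitive comparison is also an assumption of this sketch.

```python
from pathlib import PurePosixPath
from typing import Optional

# File names mandated by the naming patterns, mapped to a hypothetical kind label.
ISA_FILE_KINDS = {
    "isa.investigation.xlsx": "investigation",
    "isa.study.xlsx": "study",
    "isa.assay.xlsx": "assay",
}


def classify_isa_file(path: str) -> Optional[str]:
    """Return the ISA file kind for a path, or None if the name matches no pattern."""
    # Only the final path component is compared, and the comparison is exact.
    return ISA_FILE_KINDS.get(PurePosixPath(path).name)


print(classify_isa_file("studies/HeatstressExperiment/isa.study.xlsx"))
```

Files whose names match none of the patterns are simply reported as `None`, leaving it to the caller to decide whether they are additional payload.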
Sheets described in this specification MUST follow one of the two given formats:
- Top-level metadata sheets for listing top-level metadata
- Annotation Table sheets for describing experimental workflows
Sheets which do not follow either of these two formats are considered additional payload and are ignored in this specification.
All labels are case-sensitive.
Dates SHOULD be supplied in the ISO8601 format.
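A date cell could be checked along the following lines. This is a sketch under stated assumptions: it accepts only the calendar form YYYY-MM-DD (the form named for the date fields later in this document), it treats an empty cell as "no date", and the function name `parse_iso_date` is hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Assumption: only the calendar form YYYY-MM-DD is accepted, not other ISO 8601 variants.
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_iso_date(cell: str) -> Optional[date]:
    """Parse a YYYY-MM-DD cell; return None for an empty cell, raise ValueError otherwise."""
    text = cell.strip()
    if not text:
        return None
    match = _ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD date: {cell!r}")
    year, month, day = (int(group) for group in match.groups())
    # date() itself rejects impossible dates such as 2022-02-30.
    return date(year, month, day)


print(parse_iso_date("2022-05-13"))
```

Because the rule is a SHOULD, a validator built on this would more plausibly emit a warning than reject the file.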
For maximal portability, file names SHOULD contain only ASCII characters not excluded already (that is A-Za-z0-9._!#$%&+,;=@^(){}'[] - we exclude space as many utilities do not accept spaces in file paths): non-English alphabetic characters cannot be guaranteed to be supported in all locales. It is recommended to avoid the shell metacharacters (){}'[]$.".
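The recommended character set can be expressed as a regular expression. The sketch below takes the set literally as written above (so a hyphen or a space is not accepted) and does not check the additionally discouraged shell metacharacters; the function name `is_portable_file_name` is an assumption.

```python
import re

# Character class copied from the recommendation: A-Za-z0-9._!#$%&+,;=@^(){}'[]
_PORTABLE_NAME = re.compile(r"[A-Za-z0-9._!#$%&+,;=@^(){}'\[\]]+")


def is_portable_file_name(name: str) -> bool:
    """Return True if every character of the file name is in the recommended set."""
    return _PORTABLE_NAME.fullmatch(name) is not None


print(is_portable_file_name("isa.assay.xlsx"))
```

The check applies to a single file name, not to a whole path, since path separators are not part of the set.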
The Investigation file fulfils four needs:
- to declare key entities, such as factors and protocols, which may be referenced in the other files
- to track provenance of the used terminologies (controlled vocabularies or ontologies), where applicable
- to relate Assay files to Studies
- to select those Studies that are considered part of the investigation.
The Investigation File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_investigation and MUST contain the following sections:
Additionally, it MAY contain the following sections:
- STUDY
- STUDY DESIGN DESCRIPTORS
- STUDY PUBLICATIONS
- STUDY FACTORS
- STUDY ASSAYS
- STUDY PROTOCOLS
- STUDY CONTACTS
The Investigation File implements the Investigation graph from the ISA Abstract Model.
The Study represents a set of logically connected experiments. A Study File contains contextualising information for one or more Assays, metadata about the study design, study factors used, and study protocols, as well as information similar to that of the Investigation, including the title and description of the study and related people and scholarly publications. It also details the sample collection process needed to perform the connected Assays.
The Study File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_study and MUST contain the following sections:
Additionally, it MAY contain the following sections:
Additionally, the Study File SHOULD contain one or more Annotation Table sheet(s), which MAY record provenance of biological samples, from source material through a collection process to sample material.
Therefore, the main entities of the Study File should be Sources and Samples.
Any Study MAY contain datamap references as described in the Datamap Sheet section.
The Study File implements the Study graph from the ISA Abstract Model.
The Assay represents one experimental measurement. An Assay File contains metadata about the assay design, information about the people performing the experiment, and most importantly, details about the preparation and/or execution of the experimental measurement.
The Assay File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_assay and MUST contain the following sections:
Additionally, the Assay File SHOULD contain one or more Annotation Table sheet(s), which MAY record preparation of biological samples, measurement of these samples and basic computations performed on the resulting data.
Therefore, the main entities of the Assay File should be Samples and Data.
Any Assay MAY contain datamap references as described in the Datamap Sheet section.
The Assay File implements the Assay graph from the ISA Abstract Model.
The Datamap represents a set of explanations about the data entities defined in assays and studies.
The Datamap File MUST contain one Datamap table sheet. This sheet MUST be named isa_datamap.
Therefore, the main entities of the Datamap File should be Data.
The Datamap File acts as an extension of the data nodes defined in the Study and Assay graphs section from the ISA Abstract Model.
The purpose of top-level metadata sheets is aggregating and listing top-level metadata. Each sheet consists of sections consisting of a section header and key-value fields. Section headers MUST be completely written in upper case (e.g. STUDY), field headers MUST have the first letter of each word in upper case (e.g. Study Identifier); with the exception of the referencing label (REF).
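The casing rules above can be sketched as a small classifier. The function name `classify_label` and the three returned strings are illustrative assumptions; the sketch relies on the fact that REF is fully upper case, so it still satisfies the "first letter upper case" test for field headers.

```python
def classify_label(label: str) -> str:
    """Classify a first-column label as 'section', 'field' or 'invalid' by its casing."""
    words = label.split()
    if not words:
        return "invalid"
    # Section headers are written completely in upper case, e.g. STUDY ASSAYS.
    if label.isupper():
        return "section"
    # Field headers capitalise the first letter of each word, e.g. Study Identifier.
    if all(word[0].isupper() for word in words):
        return "field"
    return "invalid"


print(classify_label("Study Assay Technology Type Term Source REF"))
```

Since labels are case-sensitive, a parser would still need to compare each label against the exact names listed in the section tables; the casing check only distinguishes the two kinds of row.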
In the following sections, examples of each section block are given beside the specification of each section.
Rows in which the first character in the first column is Unicode U+0023 (the # character) MUST be interpreted as comments; reference implementation parsers SHOULD ignore those lines entirely.
Rows with the label Comment[<comment name>] can also appear within any of the section blocks. Where these appear, the comment name must be unique within the context of a single block (e.g. you cannot have multiple occurrences of Comment[external DB REF] within STUDY ASSAYS). Also, the value cells MUST match the number of values indicated by the rest of the section in context.
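The two comment rules could be handled as sketched below. Rows are assumed to be lists of cell values with the label in the first cell, as a spreadsheet reader might return them; both function names are hypothetical.

```python
import re
from collections import Counter

_COMMENT_LABEL = re.compile(r"Comment\[(.+)\]")


def drop_hash_comment_rows(rows):
    """Remove rows whose first cell starts with '#' (U+0023)."""
    kept = []
    for row in rows:
        first = row[0] if row else None
        if isinstance(first, str) and first.startswith("#"):
            continue
        kept.append(row)
    return kept


def duplicate_comment_names(block_rows):
    """Return the comment names that occur more than once within one section block."""
    names = []
    for row in block_rows:
        first = row[0] if row else None
        match = _COMMENT_LABEL.fullmatch(first) if isinstance(first, str) else None
        if match:
            names.append(match.group(1))
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


block = [
    ["# internal note", ""],
    ["Study Assay File Name", "assays/Proteomics/isa.assay.xlsx"],
    ["Comment[external DB REF]", "x"],
    ["Comment[external DB REF]", "y"],
]
print(duplicate_comment_names(drop_hash_comment_rows(block)))
```

The uniqueness check is meant to be run per section block, since the same comment name may legitimately appear in different blocks.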
The Ontology Source section of the Investigation file is used to declare Ontology Sources used elsewhere in the ISA-XLSX files within the context of an Investigation.
Where a row label in a Top-level metadata sheet is suffixed with Term Source REF, the value of the cell SHOULD match one of the Term Source Name values declared in this section.
Where a column in an Annotation table sheet is labelled with Term Source REF, the value of the cell SHOULD match one of the Term Source Name values declared in this section.
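A check of Term Source REF cells against the declared Term Source Name values might look like the sketch below. The function name `undeclared_term_sources` is an assumption, as is the choice to split each cell on semicolons, which follows the semicolon-delimited lists used for multi-valued fields elsewhere in this document.

```python
def undeclared_term_sources(declared_names, ref_cells):
    """Return the Term Source REF values that match no declared Term Source Name."""
    declared = set(declared_names)
    missing = set()
    for cell in ref_cells:
        # A cell may hold several semicolon-separated references; empty entries are skipped.
        for ref in (cell or "").split(";"):
            ref = ref.strip()
            if ref and ref not in declared:
                missing.add(ref)
    return sorted(missing)


declared = ["CHEBI", "EFO", "OBI", "NCBITAXON", "PATO"]
print(undeclared_term_sources(declared, ["PATO;PATO", "NCIT;AFO", ";;MS", ""]))
```

Because the rule is a SHOULD, the returned names are better reported as warnings than treated as errors.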
This section implements a list of Ontology Source from the ISA Abstract Model.
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Term Source Name | String | The name of the source of a term; i.e. the source controlled vocabulary or ontology. These names will be used in all corresponding Term Source REF fields that occur elsewhere. |
Term Source File | String (file name or URI) | A file name or a URI of an official resource. |
Term Source Version | String | The version number of the Term Source to support terms tracking. |
Term Source Description | String | Use for disambiguating resources when homologous prefixes have been used. |
Example
For example, the ONTOLOGY SOURCE REFERENCE section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
ONTOLOGY SOURCE REFERENCE | | | | | |
---|---|---|---|---|---|
Term Source Name | CHEBI | EFO | OBI | NCBITAXON | PATO |
Term Source File | http://data.bioontology.org/ontologies/CHEBI | http://data.bioontology.org/ontologies/EFO | http://data.bioontology.org/ontologies/OBI | http://data.bioontology.org/ontologies/NCBITAXON | http://data.bioontology.org/ontologies/PATO |
Term Source Version | 78 | 111 | 21 | 2 | 160 |
Term Source Description | Chemical Entities of Biological Interest Ontology | Experimental Factor Ontology | Ontology for Biomedical Investigations | National Center for Biotechnology Information (NCBI) Organismal Classification | Phenotypic Quality Ontology |
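In such a section block each row is a field and each value column is one entry, so reading it amounts to transposing the rows. The following is a minimal sketch with a hypothetical function name, `section_to_records`; it assumes the rows have already been read as lists of strings with the label in the first cell, and it pads missing trailing cells with empty strings.

```python
def section_to_records(field_rows):
    """Turn label-first rows into one dict per value column."""
    # The widest row determines how many entries the section holds.
    width = max((len(row) - 1 for row in field_rows), default=0)
    records = [{} for _ in range(width)]
    for row in field_rows:
        label, values = row[0], row[1:]
        for index, record in enumerate(records):
            record[label] = values[index] if index < len(values) else ""
    return records


rows = [
    ["Term Source Name", "CHEBI", "EFO"],
    ["Term Source File", "http://data.bioontology.org/ontologies/CHEBI",
     "http://data.bioontology.org/ontologies/EFO"],
    ["Term Source Version", "78", "111"],
]
for record in section_to_records(rows):
    print(record["Term Source Name"], record["Term Source Version"])
```

The same transposition applies to the other top-level metadata sections, since they share the key-value layout.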
This section is organized in several subsections, described in detail below.
This section implements an Investigation from the ISA Abstract Model.
This section MUST contain zero or one values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Investigation Identifier | String | An identifier or an accession number provided by a repository. This SHOULD be locally unique. |
Investigation Title | String | A concise name given to the investigation. |
Investigation Description | String | A textual description of the investigation. |
Investigation Submission Date | String formatted as ISO8601 date YYYY-MM-DD | The date on which the investigation was reported to the repository. |
Investigation Public Release Date | String formatted as ISO8601 date YYYY-MM-DD | The date on which the investigation was released publicly. |
Example
For example, the INVESTIGATION section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
INVESTIGATION | |
---|---|
Investigation Identifier | ChlamyHeatstress |
Investigation Title | Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii. |
Investigation Description | Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken. |
Investigation Submission Date | 2022-05-13 |
Investigation Public Release Date | |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Investigation Publication PubMed ID | String formatted as valid PubMed ID | The PubMed IDs of the described publication(s) associated with this investigation. |
Investigation Publication DOI | String formatted as valid DOI | A Digital Object Identifier (DOI) for that publication (where available). |
Investigation Publication Author List | String | The list of authors associated with that publication. |
Investigation Publication Title | String | The title of publication associated with the investigation. |
Investigation Publication Status | String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF | A term describing the status of that publication (i.e. submitted, in preparation, published). |
Investigation Publication Status Term Accession Number | String or URI | The accession number from the Term Source associated with the selected term. |
Investigation Publication Status Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the INVESTIGATION PUBLICATIONS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
INVESTIGATION PUBLICATIONS | |
---|---|
Investigation Publication PubMed ID | PMC9106746 |
Investigation Publication DOI | 10.1038/s42003-022-03359-z |
Investigation Publication Author List | Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang |
Investigation Publication Title | Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii |
Investigation Publication Status | published |
Investigation Publication Status Term Accession Number | http://purl.org/spar/pso/published |
Investigation Publication Status Term Source REF | PSO |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Investigation Person Last Name | String | The last name of a person associated with the investigation. |
Investigation Person First Name | String | The first name of a person associated with the investigation. |
Investigation Person Mid Initials | String | The middle initials of a person associated with the investigation. |
Investigation Person Email | String formatted as email | The email address of a person associated with the investigation. |
Investigation Person Phone | String | The telephone number of a person associated with the investigation. |
Investigation Person Fax | String | The fax number of a person associated with the investigation. |
Investigation Person Address | String | The address of a person associated with the investigation. |
Investigation Person Affiliation | String | The organization affiliation for a person associated with the investigation. |
Investigation Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the investigation, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Investigation Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Investigation Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the INVESTIGATION CONTACTS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
INVESTIGATION CONTACTS | | | |
---|---|---|---|
Investigation Person Last Name | Venn | Zimmer | Mühlhaus |
Investigation Person First Name | Benedikt | David | Timo |
Investigation Person Mid Initials | | | |
Investigation Person Email | [email protected] | [email protected] | [email protected] |
Investigation Person Phone | | | |
Investigation Person Fax | | | |
Investigation Person Address | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany |
Investigation Person Affiliation | Computational Systems Biology | Computational Systems Biology | Computational Systems Biology |
Investigation Person Roles | author | author | corresponding author |
Investigation Person Roles Term Accession Number | | | |
Investigation Person Roles Term Source REF | | | |
This section is organized in several subsections, described in detail below. This section also represents a repeatable block, which is replicated according to the number of Studies to report (e.g. for two Studies, two Study blocks are represented in the Investigation file). The subsections in the block are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within this repeatable block, although their order MAY vary; the fields MUST remain within their subsection.
These sections implement the metadata for a Study from the ISA Abstract Model and a list of Assay (i.e. Study and Assay without graphs; graphs are implemented in ISA-XLSX as Annotation Table sheets).
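One way to recover the repeatable Study blocks from the rows of the isa_investigation sheet is sketched below. It rests on a simplifying assumption that the specification does not guarantee: each Study block is taken to begin with its STUDY header, although the order of subsections MAY vary. The function names `split_sections` and `group_study_blocks` are hypothetical.

```python
def split_sections(rows):
    """Group rows into (section header, field rows) pairs in sheet order."""
    sections = []
    for row in rows:
        label = row[0] if row else ""
        if not label:
            continue
        # Section headers are written completely in upper case.
        if label.isupper():
            sections.append((label, []))
        elif sections:
            sections[-1][1].append(row)
    return sections


def group_study_blocks(sections):
    """Collect STUDY* sections into one dict per Study block.

    Assumption: a new block starts at every STUDY header.
    """
    blocks = []
    for header, field_rows in sections:
        if header == "STUDY":
            blocks.append({})
        if header.startswith("STUDY") and blocks:
            blocks[-1][header] = field_rows
    return blocks


rows = [
    ["INVESTIGATION"], ["Investigation Identifier", "ChlamyHeatstress"],
    ["STUDY"], ["Study Identifier", "HeatstressExperiment"],
    ["STUDY ASSAYS"], ["Study Assay File Name", "assays/Proteomics/isa.assay.xlsx"],
]
print(len(group_study_blocks(split_sections(rows))))
```

A parser that must tolerate arbitrary subsection order would need a different block boundary, for example starting a new block whenever a subsection header repeats.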
This section MUST contain zero or one values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study Identifier | String | A unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification. |
Study Title | String | A concise phrase used to encapsulate the purpose and goal of the study. |
Study Description | String | A textual description of the study, with components such as objective or goals. |
Study Submission Date | String formatted as ISO8601 date | The date on which the study is submitted to an archive. |
Study Public Release Date | String formatted as ISO8601 date | The date on which the study SHOULD be released publicly. |
Study File Name | String formatted as file name or URI | A field to specify the name of the Study Table file corresponding to the definition of that Study. There can be only one file per cell. |
Example
For example, the STUDY section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
STUDY | |
---|---|
Study Identifier | HeatstressExperiment |
Study Title | Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii. |
Study Description | Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken. |
Study Submission Date | 2022-05-13 |
Study Public Release Date | |
Study File Name | studies/HeatstressExperiment/isa.study.xlsx |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study Design Type | String | A term allowing the classification of the study based on the overall experimental design, e.g. cross-over design or parallel group design. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Design Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Design Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Study Design Term Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the STUDY DESIGN DESCRIPTORS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
STUDY DESIGN DESCRIPTORS | | |
---|---|---|
Study Design Type | time series design | heat exposure |
Study Design Type Term Accession Number | http://purl.obolibrary.org/obo/OBI_0500020 | http://purl.obolibrary.org/obo/XCO_0000308 |
Study Design Type Term Source REF | OBI |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study PubMed ID | String formatted as valid PubMed ID | The PubMed IDs of the described publication(s) associated with this study. |
Study Publication DOI | String formatted as valid DOI | A Digital Object Identifier (DOI) for that publication (where available). |
Study Publication Author List | String | The list of authors associated with that publication. |
Study Publication Title | String | The title of publication associated with the study. |
Study Publication Status | String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF | A term describing the status of that publication (i.e. submitted, in preparation, published). |
Study Publication Status Term Accession Number | String or URI | The accession number from the Term Source associated with the selected term. |
Study Publication Status Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the STUDY PUBLICATIONS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
STUDY PUBLICATIONS | |
---|---|
Study PubMed ID | PMC9106746 |
Study Publication DOI | 10.1038/s42003-022-03359-z |
Study Publication Author List | Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang |
Study Publication Title | Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii |
Study Publication Status | published |
Study Publication Status Term Accession Number | http://purl.org/spar/pso/published |
Study Publication Status Term Source REF | PSO |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study Factor Name | String | The name of one factor used in the Study and/or Assay files. A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly. If both Study and Assay have a Factor Value, these must be different. |
Study Factor Type | String | A term allowing the classification of this factor into categories. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Factor Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Factor Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the STUDY FACTORS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
STUDY FACTORS | | |
---|---|---|
Study Factor Name | temperature | collection time |
Study Factor Type | temperature | time |
Study Factor Type Term Accession Number | http://purl.obolibrary.org/obo/PATO_0000146 | http://purl.obolibrary.org/obo/PATO_0000165 |
Study Factor Type Term Source REF | PATO | PATO |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study Assay Measurement Type | String | A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Assay Measurement Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Assay Measurement Type Term Source REF | String | The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Study Assay Technology Type | String | Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Assay Technology Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Assay Technology Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Study Assay Technology Platform | String | Manufacturer and platform name, e.g. Bruker AVANCE |
Study Assay File Name | String | A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell. |
Example
For example, the STUDY ASSAYS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
STUDY ASSAYS | | |
---|---|---|
Study Assay File Name | assays/Proteomics/isa.assay.xlsx | assays/Transcriptomics/isa.assay.xlsx |
Study Assay Measurement Type | Proteomics | transcription profiling |
Study Assay Measurement Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C20085 | http://purl.obolibrary.org/obo/OBI_0000424 |
Study Assay Measurement Type Term Source REF | NCIT | OBI |
Study Assay Technology Type | Mass Spectrometry | nucleotide sequencing |
Study Assay Technology Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C17156 | http://purl.obolibrary.org/obo/OBI_0000626 |
Study Assay Technology Type Term Source REF | NCIT | OBI |
Study Assay Technology Platform | Orbitrap Fusion Lumos | Illumina HiSeq 2000 Rapid Run |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study Protocol Name | String | The name of the protocols used within the ISA-XLSX document. The names are used as identifiers within the ISA-XLSX document and will be referenced in the Study and Assay files in the Protocol REF columns. Names can be either local identifiers, unique within the ISA Archive which contains them, or fully qualified external accession numbers. |
Study Protocol Type | String | Term to classify the protocol. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Protocol Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Protocol Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Study Protocol Description | String | A free-text description of the protocol. |
Study Protocol URI | String | Pointer to protocol resources external to the ISA-XLSX document that can be accessed by their Uniform Resource Identifier (URI). |
Study Protocol Version | String | An identifier for the version to ensure protocol tracking. |
Study Protocol Parameters Name | String | A semicolon-delimited (“;”) list of parameter names, used as an identifier within the ISA-XLSX document. These names are used in the Study and Assay files (in the “Parameter Value []” column heading) to list the values used for each protocol parameter. Refer to section Multiple values fields in the Investigation File on how to encode multiple values in one field and match term sources |
Study Protocol Parameters Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Protocol Parameters Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Study Protocol Components Name | String | A semicolon-delimited (“;”) list of a protocol’s components; e.g. instrument names, software names, and reagents names. Refer to section Multiple values fields in the Investigation File on how to encode multiple components in one field and match term sources. |
Study Protocol Components Type | String | Term to classify the protocol components listed for example, instrument, software, detector or reagent. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Protocol Components Type Term Accession Number | String | The accession number from the Source associated to the selected terms. |
Study Protocol Components Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match a Term Source Name previously declared in the Ontology Source Reference section. |
Example
For example, the STUDY PROTOCOLS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:
STUDY PROTOCOLS | | | |
---|---|---|---|
Study Protocol Name | Harvesting | Protein extraction | Measurement |
Study Protocol Type | Biospecimen Collection | nucleic acid extraction | nucleic acid extraction |
Study Protocol Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C70945 | | |
Study Protocol Type Term Source REF | NCIT | | |
Study Protocol Description | Extraction and storage of algae cells from photo-bio reactor. Extracted and centrifuged cell pellets were frozen in liquid nitrogen. | Proteins were extracted from cells using a combination of chemical (lysis buffer) and physical (sonicator) methods. Digested peptides were purified and resuspended in LC loading buffer. | Peptides were separated by a nanoHPLC (C18 column) and detected using an Orbitrap mass spectrometry device. |
Study Protocol URI | | | |
Study Protocol Version | | | |
Study Protocol Parameters Name | Centrifugation Time;sample volume setting | frequency; duration | duration;flow rate |
Study Protocol Parameters Name Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C178881;http://purl.allotrope.org/ontologies/result#AFR_0002492 | http://purl.obolibrary.org/obo/PATO_0000044;http://purl.obolibrary.org/obo/PATO_0001309 | http://purl.obolibrary.org/obo/PATO_0001309;http://purl.obolibrary.org/obo/PATO_0001574 |
Study Protocol Parameters Name Term Source REF | NCIT;AFO | PATO;PATO | PATO;PATO |
Study Protocol Components Name | liquid nitrogen | Sonicator; Extraction Kit | HPLC; Column; MS |
Study Protocol Components Type | Liquid Nitrogen | VWR Aquasonic 250D; IST sample preparation kit (PreOmics GmbH, Germany) | U3000 RSLCnano HPLC; C18 column (Fritted Glass Column, 25 cm × 75 μm); Orbitrap Fusion Lumos |
Study Protocol Components Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C68796 | ;;http://purl.obolibrary.org/obo/MS_1002732 | |
Study Protocol Components Type Term Source REF | NCIT | ;;MS |
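The semicolon-delimited cells in this example pair up by position: the n-th name goes with the n-th accession number and the n-th Term Source REF, with empty entries (as in ;;MS) standing for "no annotation". The sketch below illustrates that reading; positional matching is inferred from the example rather than quoted from a rule, and the function names `split_multi` and `align_annotations` are hypothetical.

```python
def split_multi(cell):
    """Split a semicolon-delimited cell into trimmed entries, keeping empty ones."""
    return [part.strip() for part in cell.split(";")]


def align_annotations(names_cell, accession_cell, source_ref_cell):
    """Pair each name with its accession number and Term Source REF by position."""
    names = split_multi(names_cell)
    columns = []
    for cell in (accession_cell, source_ref_cell):
        # A completely empty cell means that no entry carries an annotation.
        parts = split_multi(cell) if cell.strip() else [""] * len(names)
        if len(parts) != len(names):
            raise ValueError(
                f"expected {len(names)} semicolon-separated entries, got {len(parts)}"
            )
        columns.append(parts)
    return list(zip(names, columns[0], columns[1]))


for entry in align_annotations(
    "HPLC; Column; MS",
    ";;http://purl.obolibrary.org/obo/MS_1002732",
    ";;MS",
):
    print(entry)
```

Raising an error on mismatched lengths is a design choice of this sketch; a more lenient reader could pad the shorter list instead.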
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Study Person Last Name | String | The last name of a person associated with the study. |
Study Person First Name | String | The first name of a person associated with the study. |
Study Person Mid Initials | String | The middle initials of a person associated with the study. |
Study Person Email | String formatted as email | The email address of a person associated with the study. |
Study Person Phone | String | The telephone number of a person associated with the study. |
Study Person Fax | String | The fax number of a person associated with the study. |
Study Person Address | String | The address of a person associated with the study. |
Study Person Affiliation | String | The organization affiliation for a person associated with the study. |
Study Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the study, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Study Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Study Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the STUDY CONTACTS
section of an ISA-XLSX isa.investigation.xlsx
file may look as follows:
STUDY CONTACTS | |||
Study Person Last Name | Venn | Zimmer | Mühlhaus |
Study Person First Name | Benedikt | David | Timo |
Study Person Mid Initials | |||
Study Person Email | [email protected] | [email protected] | [email protected] |
Study Person Phone | |||
Study Person Fax | |||
Study Person Address | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany |
Study Person Affiliation | Computational Systems Biology | Computational Systems Biology | Computational Systems Biology |
Study Person Roles | author | author | corresponding author |
Study Person Roles Term Accession Number | |||
Study Person Roles Term Source REF |
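The semicolon-separated role cells described above can be split into per-role triples. The following is a minimal sketch, not part of the specification: the function name `split_annotations` is hypothetical, and it assumes that the entries of the three cells are aligned by position, with empty entries allowed (as in a cell such as `;;MS`).

```python
from itertools import zip_longest


def split_annotations(names, accessions="", sources=""):
    """Split semicolon-separated cells into (name, accession, source) triples.

    Assumption: entries in the three cells line up by position, and a
    missing entry is represented by an empty string.
    """
    def parts(cell):
        # An empty cell carries no entries at all.
        return [p.strip() for p in cell.split(";")] if cell else []

    return list(
        zip_longest(parts(names), parts(accessions), parts(sources), fillvalue="")
    )


roles = split_annotations("submitter;funder;sponsor")
```

A parser built this way keeps each role next to its own Term Accession Number and Term Source REF, even when only some of the roles are ontology terms.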
This section is organized in several subsections, described in detail below. The subsections in the block are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within this block; the fields MUST remain within their subsection.
These sections implement the metadata for an Assay
from the ISA Abstract Model.
This section MUST contain zero or one values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Assay Measurement Type | String | A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Assay Measurement Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Assay Measurement Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Assay Technology Type | String | Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Assay Technology Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Assay Technology Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Assay Technology Platform | String | Manufacturer and platform name, e.g. Bruker AVANCE |
Assay File Name | String | A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell. |
Example
For example, the ASSAY
section of an ISA-XLSX isa.assay.xlsx
file may look as follows:
ASSAY | |
Assay File Name | assays/Proteomics/isa.assay.xlsx |
Assay Measurement Type | Proteomics |
Assay Measurement Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C20085 |
Assay Measurement Type Term Source REF | NCIT |
Assay Technology Type | Mass Spectrometry |
Assay Technology Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C17156 |
Assay Technology Type Term Source REF | NCIT |
Assay Technology Platform | Orbitrap Fusion Lumos |
This section MUST contain zero or more values.
This section MUST contain the following labels, with the specified datatypes for values supported:
Label | Datatype | Description |
---|---|---|
Assay Person Last Name | String | The last name of a person associated with the Assay. |
Assay Person First Name | String | The first name of a person associated with the Assay. |
Assay Person Mid Initials | String | The middle initials of a person associated with the Assay. |
Assay Person Email | String formatted as email | The email address of a person associated with the Assay. |
Assay Person Phone | String | The telephone number of a person associated with the Assay. |
Assay Person Fax | String | The fax number of a person associated with the assay. |
Assay Person Address | String | The address of a person associated with the assay. |
Assay Person Affiliation | String | The organization affiliation for a person associated with the assay. |
Assay Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the assay, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. |
Assay Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
Assay Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
Example
For example, the ASSAY PERFORMERS
section of an ISA-XLSX isa.assay.xlsx
file may look as follows:
ASSAY PERFORMERS | |||
Assay Person Last Name | Zhang | Tzeng | Evans |
Assay Person First Name | Ningning | Shin-Cheng | Bradley |
Assay Person Mid Initials | |||
Assay Person Email | |||
Assay Person Phone | |||
Assay Person Fax | |||
Assay Person Address | St. Louis, Missouri 63132, USA | St. Louis, Missouri 63132, USA | St. Louis, Missouri 63132, USA |
Assay Person Affiliation | Donald Danforth Plant Science Center | Donald Danforth Plant Science Center | Donald Danforth Plant Science Center |
Assay Person Roles | Investigator | Laboratory Technologist | Laboratory Technologist |
Assay Person Roles Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C25936 | http://purl.obolibrary.org/obo/NCIT_C51830 | http://purl.obolibrary.org/obo/NCIT_C51830 |
Assay Person Roles Term Source REF | NCIT | NCIT | NCIT |
Annotation Table sheets
are used to describe the experimental flow in a detailed, machine-readable way. In each sheet, there is a mapping from input entities to output entities, placed in the Input
and Output
columns, accordingly. The other columns then are used to either describe those entities or the processes that led to this mapping.
In the Annotation Table sheets
, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).
The content of the annotation table MUST be placed in an xlsx table
whose name starts with annotationTable
. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.
Annotation Table sheets
are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a Process
node.
Each annotation table sheet MUST contain at most one Input
and at most one Output
column, which denote the Input and Output node of the Process
node respectively. They MUST be formatted in the pattern Input [<NodeType>]
and Output [<NodeType>]
.
NodeTypes
MUST be one of the following:
- A Source MUST be indicated with the node type Source Name. Sources MUST NOT be used as Output nodes.
- A Sample MUST be indicated with the node type Sample Name.
- An Extract Material MUST be indicated with the node type Material Name.
- A Data object MUST be indicated with the node type Data.
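The header pattern and node type rules above can be checked mechanically. The sketch below is illustrative only: the names `NODE_TYPES`, `IO_HEADER` and `parse_io_header` are hypothetical, and it assumes exactly one space between the column type and the opening bracket.

```python
import re

# Node types listed in this specification.
NODE_TYPES = {"Source Name", "Sample Name", "Material Name", "Data"}

# Matches "Input [<NodeType>]" and "Output [<NodeType>]".
IO_HEADER = re.compile(r"^(Input|Output) \[(.+)\]$")


def parse_io_header(header):
    """Return (direction, node_type), or None if this is not an Input/Output header."""
    match = IO_HEADER.match(header)
    if match is None:
        return None
    direction, node_type = match.groups()
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type!r}")
    if direction == "Output" and node_type == "Source Name":
        raise ValueError("a Source must not be used as an Output node")
    return direction, node_type
```

For instance, `parse_io_header("Input [Sample Name]")` yields the direction and node type, while a header such as `Protocol REF` is passed over by returning `None`.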
Source Names
, Sample Names
, Material Names
MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.
The Data
node type MUST correspond to a relevant data resource location, following the Data Path Annotation patterns. If the annotation of the Data
node refers not to the complete resource, but a part of it, a Selector
MAY be added. This Selector MUST be separated from the resource location using a #
— with no whitespace between: location#selector
. If appropriate, the Selector SHOULD be formatted according to IRI fragment selectors specified by the W3C.
The format of the data resource MAY be further qualified using a Data Format
column. The Data Format
SHOULD be expressed using a MIME format, most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: type/subtype
. If appropriate, a format from the list composed by IANA
SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.
The format and usage info about the Selector MAY be further qualified using a Data Selector Format
column. The Data Selector Format
SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.
In this example, there is a measurement of two Samples
, namely input1
and input2
. The values measured are both written into the same data resource in the location result.csv
, whose formatting is tabular, according to the Data Format
being text/csv
. To distinguish between the measurement values stemming from the different inputs, selectors were added to the resource location (separated by a #
), namely col=1
and col=2
. The specification about the formatting of these selectors can be found in the provided link, namely https://datatracker.ietf.org/
.
Input [Sample Name] | Output [Data] | Data Format | Data Selector Format |
---|---|---|---|
input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
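Reading such a Data cell means separating the resource location from the optional Selector. A minimal sketch follows, with the hypothetical function name `split_data_reference`; it assumes that the first `#` in the cell is the separator and that the location itself contains no `#`.

```python
def split_data_reference(cell):
    """Split a Data cell into (location, selector); selector is None if absent.

    Assumption: the first '#' separates the resource location from the Selector.
    """
    location, separator, selector = cell.partition("#")
    return location, (selector if separator else None)


location, selector = split_data_reference("result.csv#col=1")
```

Interpreting the selector itself (for example `col=1`) depends on the scheme named in the Data Selector Format column and is left out here.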
Protocol REF
columns MAY be used to specify the name of the Protocol
node implemented by the Process
node. Per Annotation Table sheet there MUST be at most one Protocol REF
column. The value MUST be free text.
Protocol Version
columns MAY be used to specify the version of the Protocol
node implemented by the Process
node. Per Annotation Table sheet there MUST be at most one Protocol Version
column. The value MUST be free text.
Protocol Description
columns MAY be used to specify the description of the Protocol
node implemented by the Process
node. Per Annotation Table sheet there MUST be at most one Protocol Description
column. The value MUST be free text.
Protocol Uri
columns MAY be used to specify the uri of the Protocol
node implemented by the Process
node. Per Annotation Table sheet there MUST be at most one Protocol Uri
column. The value MUST be either a URI or a file path corresponding to a relevant protocol file location.
Protocol Type
columns MAY be used to specify the type of the Protocol
node implemented by the Process
node. Per Annotation Table sheet there MUST be at most one Protocol Type
column. The value MUST be free text, or an Ontology Annotation
.
Where a value is an Ontology Annotation
in an annotation table, Term Accession Number
and Term Source REF
columns MUST follow the main column.
An Ontology Annotation
MAY be applied to any appropriate Characteristic
, Parameter
, Factor
, Component
or Protocol Type
.
This implements Ontology Annotation
from the ISA Abstract Model.
The header of the main column MUST contain the structural column type followed by the name
of the ontology term in []
brackets.
There SHOULD be a space
between the column type and the [
bracket.
The headers of the two annotation columns SHOULD contain further ontological information about the ontology term of the main header.
In this case, following the static header string, separated by a single space, there MUST be a short ontology term identifier formatted as CURIEs (prefixed identifiers) of the form <IDSPACE>:<LOCALID>
(specified here) inside ()
brackets.
In the other case, i.e. when the annotation columns do not contain further ontological information, the static header strings MUST be either followed by a single space and empty ()
brackets or nothing.
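The header rules above can be sketched as two small parsers. This is an illustration under assumptions, not a reference implementation: the names `parse_main_header` and `parse_annotation_header` are hypothetical, only the four bracketed column types are handled, and the space before `[` is treated as optional because it is a SHOULD rather than a MUST.

```python
import re

MAIN_HEADER = re.compile(r"^(Characteristic|Parameter|Factor|Component) ?\[(.*)\]$")
ANNOTATION_HEADER = re.compile(
    r"^(Term Source REF|Term Accession Number)(?: \((.*)\))?$"
)


def parse_main_header(header):
    """Return (column_type, term_name), or None if the header does not match."""
    match = MAIN_HEADER.match(header)
    return match.groups() if match else None


def parse_annotation_header(header):
    """Return (static_label, curie_or_None), or None if the header does not match."""
    match = ANNOTATION_HEADER.match(header)
    if match is None:
        return None
    label, curie = match.groups()
    # Empty brackets and missing brackets both mean "no further information".
    return label, (curie or None)
```

With these, `Characteristic [organism]` splits into its column type and term name, and `Term Source REF (OBI:0100026)` splits into the static label and the CURIE.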
The value in the main column MUST contain the name of the ontology term.
The value in the Term Source REF
column MUST either contain a short identifier for the IDSPACE
, which identifies the ontology containing the term, or be left empty.
The value in the Term Accession Number
column MUST either contain a value formatted in one of the following formats, or be left empty:
- LOCALID of the ontology, which is only applicable if the matching IDSPACE is given in the Term Source REF column
- short ontology term identifier formatted as a CURIE (prefixed identifier) of the form <IDSPACE>:<LOCALID> (specified here)
- URL pointing to the ontology term
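A reader of the Term Accession Number column has to tell these formats apart. The sketch below shows one possible heuristic, with the hypothetical name `classify_accession`; it assumes that URLs start with `http://` or `https://` and that any other value containing a prefix and a colon is a CURIE.

```python
import re

# Heuristic CURIE shape: an alphabetic prefix, a colon, then a local identifier.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:\S+$")


def classify_accession(value):
    """Classify a Term Accession Number cell as 'empty', 'url', 'curie' or 'localid'."""
    if not value:
        return "empty"
    # URLs are tested first because they would also match the CURIE shape.
    if value.startswith(("http://", "https://")):
        return "url"
    if CURIE.match(value):
        return "curie"
    return "localid"
```

A value classified as `localid` is only meaningful together with the IDSPACE given in the Term Source REF column of the same row.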
For example, a characteristic type organism
with a value of Homo sapiens
can be qualified with an Ontology Annotation
of a term from NCBI Taxonomy as follows:
Characteristic [organism] | Term Source REF (OBI:0100026) | Term Accession Number (OBI:0100026) |
---|---|---|
Homo sapiens | NCBITaxon | http://…/NCBITAXON_9606 |
Note
In this example, the value in the Term Accession Number
column is formatted as a URL
, but shortened for the purpose of markdown-formatting.
Where a value is numeric, a Unit
MAY be used to qualify the quantity.
In this case, the main column must be followed by a Unit
column, which in turn SHOULD be further annotated as an Ontology Annotation
, being followed by Term Accession Number
and Term Source REF
columns.
- The headers of the annotation columns then refer to the header of the main column.
- The values of the annotation columns then refer to the unit, and not to the numeric value of the main column.
For example, in the following, the header ontology term temperature
is further qualified with the CURIE PATO:0000146
.
The value 300
is qualified with a Unit
Kelvin
, which is further qualified as an Ontology Annotation
from the Units Ontology declared in the Ontology Sources with UO
:
Parameter [temperature] | Unit | Term Source REF (PATO:0000146) | Term Accession Number (PATO:0000146) |
---|---|---|---|
300 | Kelvin | UO | http://…/obo/UO_0000012 |
Note
In this example, the value in the Term Accession Number
column is formatted as a URL
, but shortened for the purpose of markdown-formatting.
A Characteristic
is used as an attribute column following Sources
and Samples
. This column contains terms describing each material according to the characteristics category indicated in the column header in the pattern Characteristic [<category term>]
.
For example, a column header Characteristic [organ part]
would contain terms describing an organ part. Characteristic
SHOULD be used as an attribute column following Input [Source Name]
, or Input [Sample Name]
. The value MUST be free text, numeric, or an Ontology Annotation
.
For example, a characteristic type organ part with a value of Liver can be qualified with an Ontology Annotation
of a term from MeSH as follows:
Characteristic [organ part] | Term Source REF (UBERON:0000064) | Term Accession Number (UBERON:0000064) |
---|---|---|
Liver | MeSH | D008099 |
Note
In this example, the value in the Term Accession Number
column is formatted as a LOCALID
. The associated IDSPACE
to identify the ontology term is given in the Term Source REF
column.
A Factor
is an independent variable manipulated by an experimentalist with the intention to affect biological systems in a way that can be measured by an assay. This field holds the actual data for the Factor
named between the square brackets (as declared in the Study Factors
section of a top-level metadata sheet) so MUST match, for example, Factor [compound]
. The value MUST be free text, numeric, or an Ontology Annotation
.
Factor [Gender] | Term Source REF (NCIT:C17357) | Term Accession Number (NCIT:C17357) |
---|---|---|
Male | MeSH | D008297 |
Note
In this example, the value in the Term Accession Number
column is formatted as a LOCALID
. The associated IDSPACE
to identify the ontology term is given in the Term Source REF
column.
A Component
is a consumable or reusable physical entity used in the experimental workflow. It is formatted in the pattern Component [<category term>]
. The value MUST be free text, numeric, or an Ontology Annotation
.
Component [Measurement Device] | Term Source REF (NCIT:C81182) | Term Accession Number (NCIT:C81182) |
---|---|---|
Illumina MiniSeq | OBI | http://…/obo/OBI_0003114 |
Note
In this example, the value in the Term Accession Number
column is formatted as a URL
, but shortened for the purpose of markdown-formatting.
A Parameter
can be used to specify any additional information about the experimental setup that does not fall under the three aforementioned categories. It is formatted in the pattern Parameter [<category term>]
. The value MUST be free text, numeric, or an Ontology Annotation
.
Parameter [temperature] | Unit | Term Source REF (NCRO:0000029) | Term Accession Number (NCRO:0000029) |
---|---|---|---|
300 | Kelvin | UO | http://…/obo/UO_0000012 |
Note
In this example, the value in the Term Accession Number
column is formatted as a URL
, but shortened for the purpose of markdown-formatting.
A Comment
can be used to provide some additional information. Columns headed with Comment [<comment name>]
MAY appear anywhere in the Annotation Table. The comment always refers to the Annotation Table. The value MUST be free text.
Comment [Answer to everything] |
---|
forty-two |
Columns whose headers do not follow any of the formats described above are considered additional payload and are out of the scope of this specification.
For example, a simple source to sample may be represented as:
Input [Source Name] | Protocol REF | Output [Sample Name] |
---|---|---|
source1 | sample collection | sample1 |
Where a graph splits or pools, we repeat the same node name in the Input or Output column across several rows.
For example, if we split a source into two samples, we might represent this as:
Input [Source Name] | Protocol REF | Output [Sample Name] |
---|---|---|
source1 | sample collection | sample1 |
source1 | sample collection | sample2 |
If we pool two sources into a single sample, we might represent this as:
Input [Source Name] | Protocol REF | Output [Sample Name] |
---|---|---|
source1 | sample collection | sample1 |
source2 | sample collection | sample1 |
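Because nodes with the same name are the same entity, splits and pools can be recovered by grouping rows on the repeated name. The following is a minimal sketch; the function name `find_splits_and_pools` is hypothetical, and each row is assumed to be already reduced to an (input name, output name) pair.

```python
from collections import defaultdict


def find_splits_and_pools(rows):
    """Given (input, output) pairs, return (splits, pools) as dicts of sorted names."""
    outputs_of = defaultdict(set)
    inputs_of = defaultdict(set)
    for input_name, output_name in rows:
        outputs_of[input_name].add(output_name)
        inputs_of[output_name].add(input_name)
    # A split is one input with several outputs; a pool is one output with several inputs.
    splits = {n: sorted(outs) for n, outs in outputs_of.items() if len(outs) > 1}
    pools = {n: sorted(ins) for n, ins in inputs_of.items() if len(ins) > 1}
    return splits, pools


split_rows = [("source1", "sample1"), ("source1", "sample2")]
pool_rows = [("source1", "sample1"), ("source2", "sample1")]
```

Applied to the two tables above, the first list reports `source1` as a split and the second reports `sample1` as a pool.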
Datamap Table sheets
are used to describe the contents of data files.
In the Datamap Table sheets
, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).
The content of the datamap table MUST be placed in an xlsx table
whose name equals datamapTable
. Each sheet MUST contain at most one such datamap table. Only cells inside this table are considered as part of the formatted metadata.
Datamap Table sheets
are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a data
node.
Every Datamap Table sheet
MUST contain a Data
column. Every object in this column MUST correspond to a relevant data resource location, following the Data Path Annotation patterns. If the annotation of the Data
node refers not to the complete resource, but a part of it, a Selector
MAY be added. This Selector MUST be separated from the resource location using a #
— with no whitespace between: location#selector
. If appropriate, the Selector SHOULD be formatted according to IRI fragment selectors specified by the W3C.
The format of the data resource MAY be further qualified using a Data Format
column. The Data Format
SHOULD be expressed using a MIME format, most commonly consisting of two parts: a type and a subtype, separated by a slash (/) — with no whitespace between: type/subtype
. If appropriate, a format from the list composed by IANA
SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.
The format and usage info about the Selector MAY be further qualified using a Data Selector Format
column. The Data Selector Format
SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.
Every Datamap Table sheet
SHOULD contain an Explication
column. The Explication
adds explicit meaning to the data node. The value MUST be free text, or an Ontology Annotation
.
Explication | Term Source REF | Term Accession Number |
---|---|---|
average value | OBI | http://…/obo/OBI_0000679 |
Every Datamap Table sheet
SHOULD contain a Unit
column. The Unit
adds a unit of measurement to the data node. The value MUST be free text, or an Ontology Annotation
.
Unit | Term Source REF | Term Accession Number |
---|---|---|
milligram per milliliter | UO | http://…/obo/UO_0000176 |
Note
In this example, the value in the Term Accession Number
column is formatted as a URL
, but shortened for the purpose of markdown-formatting.
Every Datamap Table sheet
SHOULD contain an Object Type
column. The Object Type
defines the shape or format in which the data node is represented. The value MUST be free text, or an Ontology Annotation
.
Object Type | Term Source REF | Term Accession Number |
---|---|---|
Float | NCIT | http://…/obo/NCIT_C48150 |
Note
In this example, the value in the Term Accession Number
column is formatted as a URL
, but shortened for the purpose of markdown-formatting.
Every Datamap Table sheet
SHOULD contain a Description
column. The Description
gives additional, human-readable context about the data node. The value MUST be free text.
Description |
---|
The average protein concentration for the given gene |
Every Datamap Table sheet
SHOULD contain a Generated By
column. The Generated By
names the tool which led to the creation of the data node. The value MUST be free text.
If possible, the value in this column MUST correspond to a relevant data resource location, following the Data Path Annotation patterns.
Generated By |
---|
GeneStatisticsTool.exe |
A Comment
can be used to provide some additional information. Columns headed with Comment [<comment name>]
MAY appear anywhere in the Datamap Table. The comment always refers to the Datamap Table. The value MUST be free text.
Comment [Answer to everything] |
---|
forty-two |
For example, a simple datamap
table representing a tabular datafile might look as follows:
Data | Explication | Term Source REF | Term Accession Number | Unit | Term Source REF | Term Accession Number | Object Type | Term Source REF | Term Accession Number | Description | Generated By |
---|---|---|---|---|---|---|---|---|---|---|---|
MyData.csv#col=1 | Gene Identifier | NCIT | http://…/obo/NCIT_C48664 | | | | String | NCIT | http://…/obo/NCIT_C45253 | Short hand identifier of the gene coding for the protein. | GeneStatisticsTool.exe |
MyData.csv#col=2 | average value | OBI | http://…/obo/OBI_0000679 | milligram per milliliter | UO | http://…/obo/UO_0000176 | Float | NCIT | http://…/obo/NCIT_C48150 | The average protein concentration for the given gene | GeneStatisticsTool.exe |
MyData.csv#col=3 | p-value | OBI | http://…/obo/OBI_0000175 | | | | Float | NCIT | http://…/obo/NCIT_C48150 | p-value of t-test against control. | GeneStatisticsTool.exe |
In this example, the datamap
table describes a single data file named MyData.csv
. This file contains three columns. The first column contains gene identifiers; the other two contain results of a statistical analysis performed by the tool GeneStatisticsTool.exe.
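Since the Term Source REF and Term Accession Number headers repeat in such a table, a reader has to attach each annotation column to the main column that precedes it. The sketch below illustrates this under assumptions: the name `group_columns` is hypothetical, and annotation columns are recognised by their static label, with or without a bracketed suffix.

```python
ANNOTATION_LABELS = ("Term Source REF", "Term Accession Number")


def group_columns(headers):
    """Group each annotation column index under the main column that precedes it."""
    groups = []
    for index, header in enumerate(headers):
        label = next((l for l in ANNOTATION_LABELS if header.startswith(l)), None)
        if label is None:
            # Any other header starts a new main column.
            groups.append({"main": header, "index": index, "annotations": {}})
        elif not groups:
            raise ValueError(f"{header!r} has no preceding main column")
        else:
            groups[-1]["annotations"][label] = index
    return groups


headers = [
    "Data", "Explication", "Term Source REF", "Term Accession Number",
    "Unit", "Term Source REF", "Term Accession Number",
    "Object Type", "Term Source REF", "Term Accession Number",
    "Description", "Generated By",
]
groups = group_columns(headers)
```

The resulting groups give, for each of Explication, Unit and Object Type, the column positions from which its Ontology Annotation can be read in every row.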