
ISA-XLSX format

For details on ISA framework terminology, please read the ISA Abstract Model specification.

This document describes the reference implementation of the ISA Abstract Model in the ISA-XLSX format. The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet document. Conceptually, using the terminology of the SpreadsheetML specification ISO/IEC 29500-1, a document comprises one or more worksheets in a workbook.


Below we provide the schemas and the content rules for valid ISA-XLSX documents.

ISA-XLSX uses three types of files to capture the experimental metadata:

  • Investigation file
  • Study file
  • Assay file

The Investigation file contains all the information needed to understand the overall goals and means used in an experiment; experimental steps (or sequences of events) are described in the Study and Assay file(s). For each Investigation file there may be one or more Studies defined with a corresponding Study file; for each Study there may be one or more Assays defined with corresponding Assay files; a single Assay file may be registered in several Studies.

In order to facilitate identification of ISA-XLSX component files, specific naming patterns MUST be followed:

Sheets described in this specification MUST follow one of the two given formats:

Sheets which do not follow any of these two formats are considered additional payload and are ignored in this specification.

All labels are case-sensitive:

Dates SHOULD be supplied in the ISO8601 format.

For maximal portability, file names SHOULD contain only ASCII characters that are not otherwise excluded (that is, A-Za-z0-9._!#$%&+,;=@^(){}'[]; space is excluded because many utilities do not accept spaces in file paths), because non-English alphabetic characters cannot be guaranteed to be supported in all locales. It is further recommended to avoid the shell metacharacters (){}'[]$.
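The character rules above can be checked mechanically. The following Python sketch is illustrative only. `check_file_name`, `PORTABLE_CHARS` and `SHELL_METACHARS` are hypothetical names. The portable set is copied from the list above, which does not include the hyphen. The function checks a single file name, not a path containing directory separators.

```python
import string

# Portable characters as listed in the specification text (hyphen not included).
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._!#$%&+,;=@^(){}'[]")

# Shell metacharacters that the text recommends avoiding.
SHELL_METACHARS = frozenset("(){}'[]$")


def check_file_name(name: str) -> list[str]:
    """Return a list of portability problems for one file name (no directories)."""
    problems = []
    non_portable = sorted(set(name) - PORTABLE_CHARS)
    if non_portable:
        problems.append(f"non-portable characters: {''.join(non_portable)!r}")
    metachars = sorted(set(name) & SHELL_METACHARS)
    if metachars:
        problems.append(f"shell metacharacters (avoid): {''.join(metachars)!r}")
    return problems


print(check_file_name("isa.study.xlsx"))  # []
print(check_file_name("my study.xlsx"))   # ["non-portable characters: ' '"]
```

Returning a list of messages instead of a boolean lets a validator report the MUST-level and the recommendation-level findings separately.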

Investigation File

The Investigation file fulfils four needs:

  1. to declare key entities, such as factors and protocols, which may be referenced in the other files
  2. to track the provenance of the terminologies used (controlled vocabularies or ontologies), where applicable
  3. to relate Assay files to Studies
  4. to select those Studies that are considered part of the investigation.

The Investigation File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_investigation and MUST contain the following sections:

Additionally, it MAY contain the following sections:

The Investigation File implements the Investigation graph from the ISA Abstract Model.

Study File

The Study represents a set of logically connected experiments. A Study File contains contextualising information for one or more Assays, such as metadata about the study design, the study factors used, and the study protocols. Like the Investigation, it also includes the title and description of the study and the related people and scholarly publications. In addition, it details the sample collection process needed to perform the connected Assays.

The Study File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_study and MUST contain the following sections:

Additionally, it MAY contain the following sections:

Additionally, the Study File SHOULD contain one or more Annotation Table sheet(s), which MAY record provenance of biological samples, from source material through a collection process to sample material.

Therefore, the main entities of the Study File should be Sources and Samples.

Any Study MAY contain datamap references as described in the Datamap Sheet section.

The Study File implements the Study graph from the ISA Abstract Model.

Assay File

The Assay represents one experimental measurement. An Assay File contains metadata about the assay design, information about the people performing the experiment, and, most importantly, details about the preparation and/or execution of the experimental measurement.

The Assay File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_assay and MUST contain the following sections:

Additionally, the Assay File SHOULD contain one or more Annotation Table sheet(s), which MAY record preparation of biological samples, measurement of these samples and basic computations performed on the resulting data.

Therefore, the main entities of the Assay File should be Samples and Data.

Any Assay MAY contain datamap references as described in the Datamap Sheet section.

The Assay File implements the Assay graph from the ISA Abstract Model.

Datamap File

The Datamap represents a set of explanations about the data entities defined in assays and studies.

The Datamap File MUST contain one Datamap table sheet. This sheet MUST be named isa_datamap.

Therefore, the main entities of the Datamap File should be Data.

The Datamap File acts as an extension of the data nodes defined in the Study and Assay graphs of the ISA Abstract Model.

Top-level metadata sheets

Top-level metadata sheets aggregate and list top-level metadata. Each sheet consists of sections, each made up of a section header followed by key-value fields. Section headers MUST be written entirely in upper case (e.g. STUDY). Field headers MUST have the first letter of each word in upper case (e.g. Study Identifier), except for the referencing label REF.
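As a rough illustration of this layout, the sketch below splits a top-level metadata sheet into sections. It makes the following assumptions:

- The sheet has already been read into a list of rows.
- Each row is a list of cell values, with the label in the first cell.

`parse_top_level_sheet` is a hypothetical helper, not part of any published ISA library. A section header is recognised as an all-upper-case label with no values. Rows starting with `#` are skipped as comments. Sections are returned as an ordered list rather than a dictionary, so that repeated blocks such as STUDY are preserved.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[str]]
Fields = dict[str, list[str]]


def _is_section_header(label: str, values: list[str]) -> bool:
    # Section headers are written fully in upper case (e.g. "STUDY") and carry no values.
    return label.isupper() and not any(values)


def parse_top_level_sheet(rows: Sequence[Row]) -> list[tuple[str, Fields]]:
    """Group key-value rows under their section headers, keeping sheet order."""
    sections: list[tuple[str, Fields]] = []
    for row in rows:
        if not row or row[0] is None or not str(row[0]).strip():
            continue  # blank row
        if str(row[0]).startswith("#"):
            continue  # comment row (first character of the first cell is '#')
        label = str(row[0]).strip()
        values = ["" if v is None else str(v) for v in row[1:]]
        while values and values[-1] == "":
            values.pop()  # drop trailing empty cells
        if _is_section_header(label, values):
            sections.append((label, {}))
        elif not sections:
            raise ValueError(f"field {label!r} appears before any section header")
        else:
            sections[-1][1][label] = values
    return sections
```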

In the following sections, an example of each section block is given alongside its specification.

ATTENTION

Rows in which the first character in the first column is Unicode U+0023 (the # character) MUST be interpreted as comments, and reference implementation parsers SHOULD ignore those rows entirely.

Rows labelled Comment[<comment name>] can also appear within any of the section blocks. Where they appear, the comment name must be unique within the context of a single block (e.g. there cannot be multiple occurrences of Comment[external DB REF] within STUDY ASSAYS). Also, the number of value cells MUST match the number of values indicated by the rest of the section in context.
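The two rules about Comment rows can be checked per section block. The sketch below is a minimal Python illustration. `check_comment_rows` and its input shape (one list per row: the label followed by the values) are assumptions made for this example.

Spreadsheet tools often omit trailing empty cells. For that reason, this sketch accepts a Comment row with fewer values than the block. It reports only Comment rows that have more values than the widest non-comment row. A stricter reading of "MUST match" would require equal counts.

```python
import re
from collections import Counter

COMMENT_LABEL = re.compile(r"Comment\[(.+)\]")


def check_comment_rows(block: list[list[str]]) -> list[str]:
    """Report duplicate Comment names and over-long Comment rows in one block."""
    problems = []
    comment_names = []
    widths = []
    for label, *values in block:
        match = COMMENT_LABEL.fullmatch(label)
        if match:
            comment_names.append(match.group(1))
        else:
            widths.append(len(values))
    for name, count in Counter(comment_names).items():
        if count > 1:
            problems.append(f"Comment[{name}] occurs {count} times")
    expected = max(widths, default=0)
    for label, *values in block:
        if COMMENT_LABEL.fullmatch(label) and len(values) > expected:
            problems.append(f"{label} has {len(values)} values, expected at most {expected}")
    return problems
```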

Ontology Source Reference section

The Ontology Source section of the Investigation file is used to declare Ontology Sources used elsewhere in the ISA-XLSX files within the context of an Investigation.

Where a row label in a Top-level metadata sheet ends with Term Source REF, the value of each cell SHOULD match one of the Term Source Name values declared in this section.

Where a column in an Annotation Table sheet is labelled Term Source REF, the value of each cell SHOULD match one of the Term Source Name values declared in this section.

This section implements a list of Ontology Sources from the ISA Abstract Model.

This section MUST contain zero or more values.
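The Term Source REF rule above can be checked along the following lines. This Python sketch makes two assumptions:

- The fields of a section are available as a mapping from row label to its list of values. This shape is specific to the example.
- A cell may hold several semicolon-separated references, as in the protocol parameter examples further below.

Since the rule is a SHOULD, the function reports undeclared references instead of rejecting the file. `undeclared_term_sources` is a hypothetical name.

```python
def undeclared_term_sources(
    declared: set[str], fields: dict[str, list[str]]
) -> dict[str, set[str]]:
    """Map each '... Term Source REF' label to the references that are not
    declared in the ONTOLOGY SOURCE REFERENCE section."""
    report: dict[str, set[str]] = {}
    for label, values in fields.items():
        if not label.endswith("Term Source REF"):
            continue
        refs = {part.strip() for value in values for part in value.split(";")}
        missing = refs - declared - {""}  # empty cells are not references
        if missing:
            report[label] = missing
    return report
```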

ONTOLOGY SOURCE REFERENCE

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Term Source Name String The name of the source of a term; i.e. the source controlled vocabulary or ontology. These names will be used in all corresponding Term Source REF fields that occur elsewhere.
Term Source File String (file name or URI) A file name or a URI of an official resource.
Term Source Version String The version number of the Term Source to support terms tracking.
Term Source Description String Use for disambiguating resources when homologous prefixes have been used.

Example

For example, the ONTOLOGY SOURCE REFERENCE section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

ONTOLOGY SOURCE REFERENCE
Term Source Name CHEBI EFO OBI NCBITAXON PATO
Term Source File http://data.bioontology.org/ontologies/CHEBI http://data.bioontology.org/ontologies/EFO http://data.bioontology.org/ontologies/OBI http://data.bioontology.org/ontologies/NCBITAXON http://data.bioontology.org/ontologies/PATO
Term Source Version 78 111 21 2 160
Term Source Description Chemical Entities of Biological Interest Ontology Experimental Factor Ontology Ontology for Biomedical Investigations National Center for Biotechnology Information (NCBI) Organismal Classification Phenotypic Quality Ontology
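In this example, each declared source occupies one column and each label occupies one row. The Python sketch below writes the section in that orientation from a list of records. `OntologySource` and `ontology_source_rows` are hypothetical names chosen for this illustration. Writing the resulting rows into an actual XLSX sheet is left to whichever spreadsheet library is in use.

```python
from dataclasses import dataclass


@dataclass
class OntologySource:
    name: str
    file: str = ""
    version: str = ""
    description: str = ""


def ontology_source_rows(sources: list[OntologySource]) -> list[list[str]]:
    """Lay out ontology sources column-wise: one row per label, one column per source."""
    return [
        ["ONTOLOGY SOURCE REFERENCE"],
        ["Term Source Name", *(s.name for s in sources)],
        ["Term Source File", *(s.file for s in sources)],
        ["Term Source Version", *(s.version for s in sources)],
        ["Term Source Description", *(s.description for s in sources)],
    ]


rows = ontology_source_rows([
    OntologySource("CHEBI", "http://data.bioontology.org/ontologies/CHEBI", "78",
                   "Chemical Entities of Biological Interest Ontology"),
    OntologySource("PATO", "http://data.bioontology.org/ontologies/PATO", "160",
                   "Phenotypic Quality Ontology"),
])
```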

INVESTIGATION section

This section is organized in several subsections, described in detail below.

This section implements an Investigation from the ISA Abstract Model.

INVESTIGATION

This section MUST contain zero or one values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Investigation Identifier String An identifier or an accession number provided by a repository. This SHOULD be locally unique.
Investigation Title String A concise name given to the investigation.
Investigation Description String A textual description of the investigation.
Investigation Submission Date String formatted as ISO8601 date YYYY-MM-DD The date on which the investigation was reported to the repository.
Investigation Public Release Date String formatted as ISO8601 date YYYY-MM-DD The date on which the investigation was released publicly.

Example

For example, the INVESTIGATION section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

INVESTIGATION
Investigation Identifier ChlamyHeatstress
Investigation Title Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii.
Investigation Description Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken.
Investigation Submission Date 2022-05-13
Investigation Public Release Date
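Both date fields use the YYYY-MM-DD form and, as the example shows, may be left empty. A minimal Python check is sketched below; `parse_isa_date` is a hypothetical helper. It works in two steps:

1. It enforces the exact YYYY-MM-DD shape with a regular expression. This is needed because `date.fromisoformat` in Python 3.11 and later also accepts other ISO 8601 forms, such as `20220513`.
2. It then lets `date.fromisoformat` reject impossible calendar dates.

```python
import re
from datetime import date
from typing import Optional

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_isa_date(value: str) -> Optional[date]:
    """Parse a YYYY-MM-DD cell; return None for an empty cell."""
    value = value.strip()
    if not value:
        return None
    if not ISO_DATE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # also rejects e.g. 2022-02-30


print(parse_isa_date("2022-05-13"))  # 2022-05-13
```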

INVESTIGATION PUBLICATIONS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Investigation Publication PubMed ID String formatted as valid PubMed ID The PubMed IDs of the described publication(s) associated with this investigation.
Investigation Publication DOI String formatted as valid DOI A Digital Object Identifier (DOI) for that publication (where available).
Investigation Publication Author List String The list of authors associated with that publication.
Investigation Publication Title String The title of the publication associated with the investigation.
Investigation Publication Status String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF A term describing the status of that publication (e.g. submitted, in preparation, published).
Investigation Publication Status Term Accession Number String or URI The accession number from the Term Source associated with the selected term.
Investigation Publication Status Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the INVESTIGATION PUBLICATIONS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

INVESTIGATION PUBLICATIONS
Investigation Publication PubMed ID PMC9106746
Investigation Publication DOI 10.1038/s42003-022-03359-z
Investigation Publication Author List Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang
Investigation Publication Title Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii
Investigation Publication Status published
Investigation Publication Status Term Accession Number http://purl.org/spar/pso/published
Investigation Publication Status Term Source REF PSO

INVESTIGATION CONTACTS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Investigation Person Last Name String The last name of a person associated with the investigation.
Investigation Person First Name String The first name of a person associated with the investigation.
Investigation Person Mid Initials String The middle initials of a person associated with the investigation.
Investigation Person Email String formatted as email The email address of a person associated with the investigation.
Investigation Person Phone String The telephone number of a person associated with the investigation.
Investigation Person Fax String The fax number of a person associated with the investigation.
Investigation Person Address String The address of a person associated with the investigation.
Investigation Person Affiliation String The organization affiliation for a person associated with the investigation.
Investigation Person Roles String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs Term to classify the role(s) performed by this person in the context of the investigation, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g.: submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Investigation Person Roles Term Accession Number String The accession number from the Term Source associated with the selected term.
Investigation Person Roles Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
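When a cell holds several semicolon-separated terms, the accompanying Term Accession Number and Term Source REF cells can be matched to them by position. The Python sketch below illustrates that positional pairing; `split_annotations` is a hypothetical name. For free-text terms, missing accessions or sources are padded with empty strings. This padding is an assumption of the example, not a rule stated here.

```python
from itertools import zip_longest


def _split(cell: str) -> list[str]:
    # An empty cell yields no items rather than a single empty string.
    return [part.strip() for part in cell.split(";")] if cell.strip() else []


def split_annotations(
    terms: str, accessions: str = "", sources: str = ""
) -> list[tuple[str, str, str]]:
    """Pair each ';'-separated term with its accession and source by position."""
    names, accs, srcs = _split(terms), _split(accessions), _split(sources)
    if len(accs) > len(names) or len(srcs) > len(names):
        raise ValueError("more accessions or sources than terms")
    return list(zip_longest(names, accs, srcs, fillvalue=""))


print(split_annotations("submitter;funder;sponsor"))
# [('submitter', '', ''), ('funder', '', ''), ('sponsor', '', '')]
```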

Example

For example, the INVESTIGATION CONTACTS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

INVESTIGATION CONTACTS
Investigation Person Last Name Venn Zimmer Mühlhaus
Investigation Person First Name Benedikt David Timo
Investigation Person Mid Initials
Investigation Person Email [email protected] [email protected] [email protected]
Investigation Person Phone
Investigation Person Fax
Investigation Person Address TU Kaiserslautern, Kaiserslautern, 67663, Germany TU Kaiserslautern, Kaiserslautern, 67663, Germany TU Kaiserslautern, Kaiserslautern, 67663, Germany
Investigation Person Affiliation Computational Systems Biology Computational Systems Biology Computational Systems Biology
Investigation Person Roles author author corresponding author
Investigation Person Roles Term Accession Number
Investigation Person Roles Term Source REF

STUDY section

This section is organized in several subsections, described in detail below. This section also represents a repeatable block, which is replicated according to the number of Studies to report (e.g. for two Studies, two Study blocks are represented in the Investigation file). The subsections in the block are arranged vertically, with the intent of enhancing readability and presentation, and possibly of simplifying parsing. These subsections MUST remain within this repeatable block, although their order MAY vary; the fields MUST remain within their subsection.

These sections implement the metadata for a Study from the ISA Abstract Model and a list of Assays (i.e. Study and Assay without graphs; graphs are implemented in ISA-XLSX as Annotation Table sheets).

STUDY

This section MUST contain zero or one values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study Identifier String A unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification.
Study Title String A concise phrase used to encapsulate the purpose and goal of the study.
Study Description String A textual description of the study, with components such as objective or goals.
Study Submission Date String formatted as ISO8601 date The date on which the study is submitted to an archive.
Study Public Release Date String formatted as ISO8601 date The date on which the study SHOULD be released publicly.
Study File Name String formatted as file name or URI A field to specify the name of the Study Table file corresponding to the definition of that Study. There can be only one file per cell.

Example

For example, the STUDY section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

STUDY
Study Identifier HeatstressExperiment
Study Title Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii.
Study Description Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken.
Study Submission Date 2022-05-13
Study Public Release Date
Study File Name studies/HeatstressExperiment/isa.study.xlsx

STUDY DESIGN DESCRIPTORS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study Design Type String A term allowing the classification of the study based on the overall experimental design, e.g. cross-over design or parallel group design. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Design Type Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Design Type Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Study Design Term Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY DESIGN DESCRIPTORS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

STUDY DESIGN DESCRIPTORS
Study Design Type time series design heat exposure
Study Design Type Term Accession Number http://purl.obolibrary.org/obo/OBI_0500020 http://purl.obolibrary.org/obo/XCO_0000308
Study Design Type Term Source REF OBI

STUDY PUBLICATIONS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study PubMed ID String formatted as valid PubMed ID The PubMed IDs of the described publication(s) associated with this study.
Study Publication DOI String formatted as valid DOI A Digital Object Identifier (DOI) for that publication (where available).
Study Publication Author List String The list of authors associated with that publication.
Study Publication Title String The title of the publication associated with the study.
Study Publication Status String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF A term describing the status of that publication (e.g. submitted, in preparation, published).
Study Publication Status Term Accession Number String or URI The accession number from the Term Source associated with the selected term.
Study Publication Status Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY PUBLICATIONS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

STUDY PUBLICATIONS
Study Publication PubMed ID PMC9106746
Study Publication DOI 10.1038/s42003-022-03359-z
Study Publication Author List Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang
Study Publication Title Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii
Study Publication Status published
Study Publication Status Term Accession Number http://purl.org/spar/pso/published
Study Publication Status Term Source REF PSO

STUDY FACTORS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study Factor Name String The name of one factor used in the Study and/or Assay files. A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly. If both Study and Assay have a Factor Value, these must be different.
Study Factor Type String A term allowing the classification of this factor into categories. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Factor Type Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Factor Type Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY FACTORS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

STUDY FACTORS
Study Factor Name temperature collection time
Study Factor Type temperature time
Study Factor Type Term Accession Number http://purl.obolibrary.org/obo/PATO_0000146 http://purl.obolibrary.org/obo/PATO_0000165
Study Factor Type Term Source REF PATO PATO

STUDY ASSAYS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study Assay Measurement Type String A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Assay Measurement Type Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Assay Measurement Type Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Assay Technology Type String Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Assay Technology Type Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Assay Technology Type Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Assay Technology Platform String Manufacturer and platform name, e.g. Bruker AVANCE
Study Assay File Name String A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell.

Example

For example, the STUDY ASSAYS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

STUDY ASSAYS
Study Assay File Name assays/Proteomics/isa.assay.xlsx assays/Transcriptomics/isa.assay.xlsx
Study Assay Measurement Type Proteomics transcription profiling
Study Assay Measurement Type Term Accession Number http://purl.obolibrary.org/obo/NCIT_C20085 http://purl.obolibrary.org/obo/OBI_0000424
Study Assay Measurement Type Term Source REF NCIT OBI
Study Assay Technology Type Mass Spectrometry nucleotide sequencing
Study Assay Technology Type Term Accession Number http://purl.obolibrary.org/obo/NCIT_C17156 http://purl.obolibrary.org/obo/OBI_0000626
Study Assay Technology Type Term Source REF NCIT OBI
Study Assay Technology Platform Orbitrap Fusion Lumos Illumina HiSeq 2000 Rapid Run
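Each assay occupies one column across all of the rows above, so reading the section amounts to transposing it. The Python sketch below assumes the section is available as a mapping from row label to its list of values; `columns_to_records` is a hypothetical helper. Rows shorter than the widest row are padded with empty strings, on the assumption that their trailing empty cells were simply not stored.

```python
def columns_to_records(fields: dict[str, list[str]]) -> list[dict[str, str]]:
    """Turn a column-per-entity section into one dict per entity."""
    width = max((len(values) for values in fields.values()), default=0)
    return [
        {label: values[i] if i < len(values) else "" for label, values in fields.items()}
        for i in range(width)
    ]


assays = columns_to_records({
    "Study Assay File Name": ["assays/Proteomics/isa.assay.xlsx",
                              "assays/Transcriptomics/isa.assay.xlsx"],
    "Study Assay Technology Platform": ["Orbitrap Fusion Lumos",
                                        "Illumina HiSeq 2000 Rapid Run"],
    "Study Assay Measurement Type Term Source REF": ["NCIT"],
})
```

Keeping one dictionary per assay makes it straightforward to resolve each Study Assay File Name together with its measurement and technology annotations.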

STUDY PROTOCOLS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study Protocol Name String The name of the protocols used within the ISA-XLSX document. The names are used as identifiers within the ISA-XLSX document and will be referenced in the Study and Assay files in the Protocol REF columns. Names can be either local identifiers, unique within the ISA Archive which contains them, or fully qualified external accession numbers.
Study Protocol Type String Term to classify the protocol. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Protocol Type Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Protocol Type Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Protocol Description String A free-text description of the protocol.
Study Protocol URI String Pointer to protocol resources external to the ISA-XLSX document that can be accessed by their Uniform Resource Identifier (URI).
Study Protocol Version String An identifier for the version to ensure protocol tracking.
Study Protocol Parameters Name String A semicolon-delimited (“;”) list of parameter names, used as identifiers within the ISA-XLSX document. These names are used in the Study and Assay files (in the “Parameter Value []” column heading) to list the values used for each protocol parameter. Refer to section Multiple values fields in the Investigation File for how to encode multiple values in one field and how to match term sources.
Study Protocol Parameters Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Protocol Parameters Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Protocol Components Name String A semicolon-delimited (“;”) list of a protocol’s components; e.g. instrument names, software names, and reagent names. Refer to section Multiple values fields in the Investigation File for how to encode multiple components in one field and how to match term sources.
Study Protocol Components Type String Term to classify the protocol components listed, for example instrument, software, detector or reagent. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Protocol Components Type Term Accession Number String The accession number from the Term Source associated with the selected terms.
Study Protocol Components Type Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
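The parameter names declared here are referenced from "Parameter Value []" column headings in the Study and Assay files, so a consistency check can compare the two. The Python sketch below rests on these assumptions:

- Headings have the form `Parameter Value [<name>]`, with the parameter name inside the brackets. This exact spelling is an assumption.
- The helper name `undeclared_parameters` is hypothetical.
- Headings are checked against all declared protocols at once, rather than per protocol.

```python
import re

PARAMETER_HEADING = re.compile(r"Parameter Value \[(.+)\]")


def undeclared_parameters(parameter_cells: list[str], headings: list[str]) -> list[str]:
    """Return parameter names used in column headings but not declared in any
    'Study Protocol Parameters Name' cell."""
    declared = {
        name.strip()
        for cell in parameter_cells
        for name in cell.split(";")
        if name.strip()
    }
    used = [m.group(1) for h in headings if (m := PARAMETER_HEADING.fullmatch(h))]
    return [name for name in used if name not in declared]
```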

Example

For example, the STUDY PROTOCOLS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

STUDY PROTOCOLS
Study Protocol Name Harvesting Protein extraction Measurement
Study Protocol Type Biospecimen Collection nucleic acid extraction nucleic acid extraction
Study Protocol Type Term Accession Number http://purl.obolibrary.org/obo/NCIT_C70945
Study Protocol Type Term Source REF NCIT
Study Protocol Description Extraction and storage of algae cells from photo-bio reactor. Extracted and centrifuged cell pellets were frozen in liquid nitrogen. Proteins were extracted from cells using a combination of chemical (lysis buffer) and physical (sonicator) methods. Digested peptides were purified and resuspended in LC loading buffer. Peptides were separated by a nanoHPLC (C18 column) and detected using an Orbitrap mass spectrometry device.
Study Protocol URI
Study Protocol Version
Study Protocol Parameters Name Centrifugation Time;sample volume setting frequency; duration duration;flow rate
Study Protocol Parameters Name Term Accession Number http://purl.obolibrary.org/obo/NCIT_C178881;http://purl.allotrope.org/ontologies/result#AFR_0002492 http://purl.obolibrary.org/obo/PATO_0000044;http://purl.obolibrary.org/obo/PATO_0001309 http://purl.obolibrary.org/obo/PATO_0001309;http://purl.obolibrary.org/obo/PATO_0001574
Study Protocol Parameters Name Term Source REF NCIT;AFO PATO;PATO PATO;PATO
Study Protocol Components Name liquid nitrogen Sonicator; Extraction Kit HPLC; Column; MS
Study Protocol Components Type Liquid Nitrogen VWR Aquasonic 250D; IST sample preparation kit (PreOmics GmbH, Germany) U3000 RSLCnano HPLC; C18 column (Fritted Glass Column, 25 cm × 75 μm); Orbitrap Fusion Lumos
Study Protocol Components Type Term Accession Number http://purl.obolibrary.org/obo/NCIT_C68796 ;;http://purl.obolibrary.org/obo/MS_1002732
Study Protocol Components Type Term Source REF NCIT ;;MS

STUDY CONTACTS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label Datatype Description
Study Person Last Name String The last name of a person associated with the study.
Study Person First Name String The first name of a person associated with the study.
Study Person Mid Initials String The middle initials of a person associated with the study.
Study Person Email String formatted as email The email address of a person associated with the study.
Study Person Phone String The telephone number of a person associated with the study.
Study Person Fax String The fax number of a person associated with the study.
Study Person Address String The address of a person associated with the study.
Study Person Affiliation String The organization affiliation for a person associated with the study.
Study Person Roles String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs Term to classify the role(s) performed by this person in the context of the study, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g.: submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Person Roles Term Accession Number String The accession number from the Term Source associated with the selected term.
Study Person Roles Term Source REF String Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY CONTACTS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:

| STUDY CONTACTS | | | |
|---|---|---|---|
| Study Person Last Name | Venn | Zimmer | Mühlhaus |
| Study Person First Name | Benedikt | David | Timo |
| Study Person Mid Initials | | | |
| Study Person Email | [email protected] | [email protected] | [email protected] |
| Study Person Phone | | | |
| Study Person Fax | | | |
| Study Person Address | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany | TU Kaiserslautern, Kaiserslautern, 67663, Germany |
| Study Person Affiliation | Computational Systems Biology | Computational Systems Biology | Computational Systems Biology |
| Study Person Roles | author | author | corresponding author |
| Study Person Roles Term Accession Number | | | |
| Study Person Roles Term Source REF | | | |
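In the sheet, each label sits in the first column and each contact occupies one further column. The sketch below assumes the section's rows have already been read from the worksheet as lists of cell strings (reading is not shown), and `section_to_records` is a hypothetical helper. It turns such a block into one record per person:

```python
from typing import Dict, List


def section_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn a vertically arranged section into one dict per value column.

    Each row is [label, value1, value2, ...]. Rows may be ragged because
    trailing empty cells are often not stored in the sheet.
    """
    n = max((len(r) - 1 for r in rows), default=0)
    records: List[Dict[str, str]] = [{} for _ in range(n)]
    for label, *values in rows:
        for i in range(n):
            # Pad missing trailing cells with an empty string.
            records[i][label] = values[i] if i < len(values) else ""
    return records
```

Rows without values, such as the empty Study Person Mid Initials row, simply produce empty strings in every record.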

ASSAY section

This section is organized in several subsections, described in detail below. The subsections in the block are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within this block; the fields MUST remain within their subsection.

These sections implement the metadata for an Assay from the ISA Abstract Model.

ASSAY

This section MUST contain zero or one value.

This section MUST contain the following labels, with the specified datatypes for values supported:

| Label | Datatype | Description |
|---|---|---|
| Assay Measurement Type | String | A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used, the Term Accession Number and Term Source REF fields below are required. |
| Assay Measurement Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
| Assay Measurement Type Term Source REF | String | The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
| Assay Technology Type | String | Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used, the Term Accession Number and Term Source REF fields below are required. |
| Assay Technology Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
| Assay Technology Type Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |
| Assay Technology Platform | String | Manufacturer and platform name, e.g. Bruker AVANCE. |
| Assay File Name | String | A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell. |

Example

For example, the ASSAY section of an ISA-XLSX isa.assay.xlsx file may look as follows:

| ASSAY | |
|---|---|
| Assay File Name | assays/Proteomics/isa.assay.xlsx |
| Assay Measurement Type | Proteomics |
| Assay Measurement Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C20085 |
| Assay Measurement Type Term Source REF | NCIT |
| Assay Technology Type | Mass Spectrometry |
| Assay Technology Type Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C17156 |
| Assay Technology Type Term Source REF | NCIT |
| Assay Technology Platform | Orbitrap Fusion Lumos |
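The Term Source REF values in this block have to match Term Source Names declared in the Ontology Source Reference section. The check below is a minimal sketch. It assumes the single-column section has already been read into a label-to-value dictionary and that the declared names are available as a set. `check_term_sources` is a hypothetical helper, and it also treats an accession without a source, or a source without an accession, as incomplete.

```python
from typing import Dict, List, Set

TSR = " Term Source REF"
TAN = " Term Accession Number"


def check_term_sources(section: Dict[str, str], declared: Set[str]) -> List[str]:
    """Report undeclared or incomplete ontology annotations in a label/value section."""
    problems: List[str] = []
    for label, value in section.items():
        if not label.endswith(TSR):
            continue
        base = label[: -len(TSR)]  # e.g. 'Assay Technology Type'
        source = value.strip()
        accession = section.get(base + TAN, "").strip()
        if source and source not in declared:
            problems.append(f"{label}: '{source}' is not a declared Term Source Name")
        # Accession and source are expected to be given together.
        if bool(source) != bool(accession):
            problems.append(f"{base}: Term Accession Number and Term Source REF must both be set")
    return problems
```

Only labels ending in Term Source REF are inspected, so an accession whose Term Source REF row is missing entirely is not reported by this sketch.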

ASSAY PERFORMERS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

| Label | Datatype | Description |
|---|---|---|
| Assay Person Last Name | String | The last name of a person associated with the assay. |
| Assay Person First Name | String | The first name of a person associated with the assay. |
| Assay Person Mid Initials | String | The middle initials of a person associated with the assay. |
| Assay Person Email | String formatted as email | The email address of a person associated with the assay. |
| Assay Person Phone | String | The telephone number of a person associated with the assay. |
| Assay Person Fax | String | The fax number of a person associated with the assay. |
| Assay Person Address | String | The address of a person associated with the assay. |
| Assay Person Affiliation | String | The organization affiliation for a person associated with the assay. |
| Assay Person Roles | String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs | Term to classify the role(s) performed by this person in the context of the assay, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used, the Term Accession Number and Term Source REF fields below are required. |
| Assay Person Roles Term Accession Number | String | The accession number from the Term Source associated with the selected term. |
| Assay Person Roles Term Source REF | String | Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section. |

Example

For example, the ASSAY PERFORMERS section of an ISA-XLSX isa.assay.xlsx file may look as follows:

| ASSAY PERFORMERS | | | |
|---|---|---|---|
| Assay Person Last Name | Zhang | Tzeng | Evans |
| Assay Person First Name | Ningning | Shin-Cheng | Bradley |
| Assay Person Mid Initials | | | |
| Assay Person Email | | | |
| Assay Person Phone | | | |
| Assay Person Fax | | | |
| Assay Person Address | St. Louis, Missouri 63132, USA | St. Louis, Missouri 63132, USA | St. Louis, Missouri 63132, USA |
| Assay Person Affiliation | Donald Danforth Plant Science Center | Donald Danforth Plant Science Center | Donald Danforth Plant Science Center |
| Assay Person Roles | Investigator | Laboratory Technologist | Laboratory Technologist |
| Assay Person Roles Term Accession Number | http://purl.obolibrary.org/obo/NCIT_C25936 | http://purl.obolibrary.org/obo/NCIT_C51830 | http://purl.obolibrary.org/obo/NCIT_C51830 |
| Assay Person Roles Term Source REF | NCIT | NCIT | NCIT |

Annotation Table sheets

Annotation Table sheets are used to describe the experimental flow in a detailed, machine-readable way. Each sheet maps input entities to output entities, which are placed in the Input and Output columns respectively. The remaining columns describe either those entities or the processes that led to this mapping.

In the Annotation Table sheets, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).
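A rough check of this capitalization rule could look as follows. It is a sketch that rests on one stated assumption: text inside [...] and (...) is a user-supplied term or CURIE (for example organ part in Characteristic [organ part]) and is therefore exempt. It also requires REF to stay all upper case rather than Ref. `header_is_capitalized` is a hypothetical name.

```python
import re


def header_is_capitalized(header: str) -> bool:
    """Check 'first letter of each word upper case' for an annotation table header.

    Assumption: bracketed terms and CURIEs are exempt; 'Ref' is rejected
    because the referencing label is spelled 'REF'.
    """
    # Drop [...] and (...) parts before looking at the static words.
    static = re.sub(r"\[[^\]]*\]|\([^)]*\)", " ", header)
    words = static.split()
    return bool(words) and all(w[0].isupper() for w in words) and "Ref" not in words
```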

The content of the annotation table MUST be placed in an xlsx table whose name starts with annotationTable. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.
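Table names are case-sensitive, like all labels in this specification. The following is a minimal sketch of picking the annotation table from the names of the tables on one sheet. The names would come from whatever XLSX library is used, and `find_annotation_table` is a hypothetical helper.

```python
from typing import Iterable, Optional


def find_annotation_table(table_names: Iterable[str]) -> Optional[str]:
    """Return the single annotation table on a sheet, or None if there is none."""
    # Prefix match is case-sensitive: 'AnnotationTable' would not qualify.
    matches = [n for n in table_names if n.startswith("annotationTable")]
    if len(matches) > 1:
        raise ValueError(f"more than one annotation table on sheet: {matches}")
    return matches[0] if matches else None
```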

Annotation Table sheets are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a Process node.

Inputs and Outputs

Each annotation table sheet MUST contain at most one Input and at most one Output column, which denote the Input and Output node of the Process node respectively. They MUST be formatted in the pattern Input [<NodeType>] and Output [<NodeType>].

NodeTypes MUST be one of the following:

  • A Source MUST be indicated with the node type Source Name. Sources MUST NOT be used as Output nodes.

  • A Sample MUST be indicated with the node type Sample Name.

  • An Extract Material MUST be indicated with the node type Material Name.

  • A Data object MUST be indicated with the node type Data.
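The header pattern and the node-type rules above can be checked with a regular expression. This sketch assumes exactly one space between Input/Output and the bracket, which the pattern Input [<NodeType>] suggests but does not spell out. `parse_io_header` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

NODE_TYPES = {"Source Name", "Sample Name", "Material Name", "Data"}
IO_HEADER = re.compile(r"^(Input|Output) \[(?P<node>[^\]]+)\]$")


def parse_io_header(header: str) -> Optional[Tuple[str, str]]:
    """Return (direction, node type) for an Input/Output header, else None.

    Raises ValueError for an unknown node type or a Source used as Output.
    """
    m = IO_HEADER.match(header)
    if m is None:
        return None  # not an Input/Output column
    direction, node = m.group(1), m.group("node")
    if node not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node!r}")
    if direction == "Output" and node == "Source Name":
        raise ValueError("a Source must not be used as an Output node")
    return direction, node
```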

Source Names, Sample Names, Material Names MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.

The value of a Data node MUST correspond to a relevant data resource location, following the Data Path Annotation patterns. If the annotation of the Data node refers not to the complete resource but to a part of it, a Selector MAY be added. This Selector MUST be separated from the resource location by a #, with no whitespace in between: location#selector. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the W3C.

The format of the data resource MAY be further qualified using a Data Format column. The Data Format SHOULD be expressed as a MIME type, most commonly consisting of two parts, a type and a subtype, separated by a slash (/) with no whitespace in between: type/subtype. If appropriate, a format from the list maintained by IANA SHOULD be picked. Unregistered or niche encodings and file formats MAY instead be indicated by the most appropriate URL.

The format and usage of the Selector MAY be further qualified using a Data Selector Format column. The Data Selector Format SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.
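The following is a possible way to split a Data value into location and Selector, together with a rough shape check for a Data Format value. It is a sketch only. It splits at the first #, does not handle MIME parameters such as ;charset=utf-8, and does not accept the URL form allowed for unregistered formats. Both function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# type/subtype with no whitespace; parameters (";charset=...") are not handled.
MIME_SHAPE = re.compile(r"^[A-Za-z0-9][\w!#$&^.+-]*/[A-Za-z0-9][\w!#$&^.+-]*$")


def split_data_value(value: str) -> Tuple[str, Optional[str]]:
    """Split 'location#selector' at the first '#' into (location, selector or None)."""
    location, sep, selector = value.partition("#")
    return location, (selector if sep else None)


def looks_like_mime(fmt: str) -> bool:
    """Rough check that a Data Format value has the 'type/subtype' shape."""
    return MIME_SHAPE.match(fmt) is not None
```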

Examples

Data Location and Selector

In this example, two Samples, input1 and input2, are measured. The measured values are both written to the same data resource at the location result.csv, whose content is tabular, as indicated by the Data Format text/csv. To distinguish between the values stemming from the different inputs, selectors are appended to the resource location (separated by a #), namely col=1 and col=2. The specification of the selector format can be found at the link given in the Data Selector Format column, https://datatracker.ietf.org/doc/html/rfc7111.

| Input [Sample Name] | Output [Data] | Data Format | Data Selector Format |
|---|---|---|---|
| input1 | result.csv#col=1 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
| input2 | result.csv#col=2 | text/csv | https://datatracker.ietf.org/doc/html/rfc7111 |
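Given the selector part after the #, a reader could resolve it against the CSV content. This minimal sketch supports only the single-column col=N form from RFC 7111, which counts columns from 1. It returns every row's cell, including any header line, and uses the hypothetical function name `select_column`. Range, row, and cell selectors are left out.

```python
import csv
import io
from typing import List


def select_column(csv_text: str, selector: str) -> List[str]:
    """Return the cells of one column picked by an RFC 7111 'col=N' selector.

    Only the single-column form is handled; 'col=2-4', 'row=' and 'cell='
    selectors raise ValueError in this sketch.
    """
    key, _, num = selector.partition("=")
    if key != "col" or not num.isdigit() or int(num) < 1:
        raise ValueError(f"unsupported selector: {selector!r}")
    index = int(num) - 1  # RFC 7111 counts columns from 1
    rows = csv.reader(io.StringIO(csv_text))
    # Skip rows (e.g. blank lines) that are too short to have this column.
    return [row[index] for row in rows if len(row) > index]
```

With hypothetical content such as "1.0,2.0\n1.5,2.5\n", the selector col=2 would pick the values measured for input2.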

Protocol Columns

Protocol REF columns MAY be used to specify the name of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol REF column. The value MUST be free text.

Protocol Version columns MAY be used to specify the version of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Version column. The value MUST be free text.

Protocol Description columns MAY be used to specify the description of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Description column. The value MUST be free text.

Protocol Uri columns MAY be used to specify the URI of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Uri column. The value MUST be either a URI or a file path corresponding to a relevant protocol file location.

Protocol Type columns MAY be used to specify the type of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Type column. The value MUST be free text, or an Ontology Annotation.
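Together with the Input and Output rule, these constraints mean that several column kinds may appear at most once per sheet. The following is a minimal sketch. It assumes the kind of a column is the header text before any " [", so Input [Source Name] counts as Input. `duplicated_singletons` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List

# Column kinds that may appear at most once per Annotation Table sheet.
SINGLETONS = {
    "Input", "Output", "Protocol REF", "Protocol Version",
    "Protocol Description", "Protocol Uri", "Protocol Type",
}


def duplicated_singletons(headers: Iterable[str]) -> List[str]:
    """Return the at-most-once column kinds that occur more than once."""
    # The kind is the header text before any ' [...]' part, e.g. 'Input'.
    counts = Counter(h.split(" [", 1)[0].strip() for h in headers)
    return sorted(k for k, n in counts.items() if k in SINGLETONS and n > 1)
```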

Ontology Annotations

Where a value is an Ontology Annotation in an annotation table, Term Accession Number and Term Source REF columns MUST follow the main column.

An Ontology Annotation MAY be applied to any appropriate Characteristic, Parameter, Factor, Component or Protocol Type.

This implements Ontology Annotation from the ISA Abstract Model.

Ontology Annotation Headers

The header of the main column MUST contain the structural column type followed by the name of the ontology term in [] brackets. There SHOULD be a space between the column type and the [ bracket.

The headers of the two annotation columns SHOULD contain further ontological information about the ontology term of the main header. In this case, the static header string MUST be followed, after a single space, by a short ontology term identifier inside () brackets, formatted as a CURIE (prefixed identifier) of the form <IDSPACE>:<LOCALID> (specified here).

In the other case, i.e. when the annotation columns do not contain further ontological information, the static header strings MUST be either followed by a single space and empty () brackets or nothing.
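Both header forms can be recognized with regular expressions. This sketch assumes that a CURIE prefix starts with a letter and that both parts consist of letters, digits, dots, hyphens, or underscores. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Main column: '<Type> [<term>]', the space before '[' being optional (SHOULD).
MAIN = re.compile(r"^(?P<type>[A-Za-z][A-Za-z ]*?) ?\[(?P<term>[^\]]+)\]$")
# Annotation column: static string, optionally followed by ' (CURIE)' or ' ()'.
ANNOT = re.compile(
    r"^(?P<kind>Term Source REF|Term Accession Number)"
    r"(?: \((?P<curie>[A-Za-z][\w.-]*:[\w.-]+)?\))?$"
)


def parse_main_header(header: str) -> Optional[Tuple[str, str]]:
    """Split 'Characteristic [organism]' into ('Characteristic', 'organism')."""
    m = MAIN.match(header)
    return (m.group("type"), m.group("term")) if m else None


def parse_annotation_header(header: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (kind, CURIE or None) for a Term Source REF / Term Accession Number header."""
    m = ANNOT.match(header)
    return (m.group("kind"), m.group("curie")) if m else None
```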

Ontology Annotation Values

The value in the main column MUST contain the name of the ontology term.

The value in the Term Source REF column MUST either contain a short identifier for the IDSPACE, which identifies the ontology containing the term, or be left empty.

The value in the Term Accession Number column MUST either contain a value formatted in one of the following formats, or be left empty:

  • LOCALID of the ontology, which is only applicable if the matching IDSPACE is given in the Term Source REF column
  • short ontology term identifier formatted as a CURIE (prefixed identifier) of the form <IDSPACE>:<LOCALID> (specified here)
  • URL pointing to the ontology term
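A reader that wants a uniform identifier could normalize the three accepted forms to a CURIE. This is a sketch. Only OBO PURLs of the form http://purl.obolibrary.org/obo/PREFIX_LOCALID are converted, following the OBO convention that PREFIX_LOCALID in a PURL corresponds to the CURIE PREFIX:LOCALID. Other URLs are returned unchanged, and `to_curie` is a hypothetical name.

```python
import re
from typing import Optional

OBO_PURL = re.compile(
    r"^https?://purl\.obolibrary\.org/obo/(?P<prefix>[A-Za-z][A-Za-z0-9]*)_(?P<local>[^/#\s]+)$"
)


def to_curie(accession: str, source: str = "") -> Optional[str]:
    """Normalize a Term Accession Number cell (LOCALID, CURIE or URL) to a CURIE if possible."""
    value = accession.strip()
    if not value:
        return None  # the cell may be left empty
    if value.startswith(("http://", "https://")):
        m = OBO_PURL.match(value)
        return f"{m.group('prefix')}:{m.group('local')}" if m else value
    if ":" in value:
        return value  # already a CURIE
    # A bare LOCALID needs the IDSPACE from the Term Source REF column.
    if not source.strip():
        raise ValueError(f"LOCALID {value!r} needs an IDSPACE in Term Source REF")
    return f"{source.strip()}:{value}"
```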

Ontology Annotation Example

For example, a characteristic type organism with a value of Homo sapiens can be qualified with an Ontology Annotation of a term from NCBI Taxonomy as follows:

| Characteristic [organism] | Term Source REF (OBI:0100026) | Term Accession Number (OBI:0100026) |
|---|---|---|
| Homo sapiens | NCBITaxon | http://…/NCBITAXON_9606 |

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Unit

Where a value is numeric, a Unit MAY be used to qualify the quantity. In this case, the main column must be followed by a Unit column, which in turn SHOULD be further annotated as an Ontology Annotation, being followed by Term Accession Number and Term Source REF columns.

  • The headers of the annotation columns then refer to the header of the main column.
  • The values of the annotation columns then refer to the unit, and not to the numeric value of the main column.
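The following is a minimal sketch of reading such a unit-qualified value from one row. It assumes, as in the examples, that the Term Source REF column directly follows the Unit column and that the Term Accession Number column follows that one. It also assumes the row has one cell per header. `Quantity` and `read_quantity` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quantity:
    category: str                  # main column header, e.g. 'Parameter [temperature]'
    value: float
    unit: str
    unit_source: Optional[str]     # Term Source REF of the unit
    unit_accession: Optional[str]  # Term Accession Number of the unit


def read_quantity(headers: List[str], row: List[str], main: int) -> Quantity:
    """Read a numeric main column at index `main` plus its Unit annotation."""
    if main + 1 >= len(headers) or headers[main + 1] != "Unit":
        raise ValueError(f"column after {headers[main]!r} is not a Unit column")

    def optional(offset: int, prefix: str) -> Optional[str]:
        # The annotation columns describe the unit, not the number.
        i = main + offset
        if i < len(headers) and headers[i].startswith(prefix) and row[i].strip():
            return row[i].strip()
        return None

    return Quantity(
        category=headers[main],
        value=float(row[main]),
        unit=row[main + 1].strip(),
        unit_source=optional(2, "Term Source REF"),
        unit_accession=optional(3, "Term Accession Number"),
    )
```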

For example, in the following, the header term temperature is further qualified with the CURIE PATO:0000146. The value 300 is qualified with the Unit Kelvin, which is further qualified as an Ontology Annotation from the Units Ontology, declared in the Ontology Sources as UO:

| Parameter [temperature] | Unit | Term Source REF (PATO:0000146) | Term Accession Number (PATO:0000146) |
|---|---|---|---|
| 300 | Kelvin | UO | http://…/obo/UO_0000012 |

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Characteristics

A Characteristic is used as an attribute column following Sources and Samples. This column contains terms describing each material according to the characteristics category indicated in the column header in the pattern Characteristic [<category term>]. For example, a column header Characteristic [organ part] would contain terms describing an organ part. Characteristic SHOULD be used as an attribute column following Input [Source Name], or Input [Sample Name]. The value MUST be free text, numeric, or an Ontology Annotation.

For example, a characteristic type organ part with a value of Liver can be qualified with an Ontology Annotation of a term from MeSH as follows:

| Characteristic [organ part] | Term Source REF (UBERON:0000064) | Term Accession Number (UBERON:0000064) |
|---|---|---|
| Liver | MeSH | D008099 |

Note

In this example, the value in the Term Accession Number column is formatted as a LOCALID. The associated IDSPACE to identify the ontology term is given in the Term Source REF column.

Factors

A Factor is an independent variable manipulated by an experimentalist with the intention to affect biological systems in a way that can be measured by an assay. This column holds the actual data for the Factor named between the square brackets, for example Factor [compound]; that name MUST match a factor declared in the Study Factors section of a top-level metadata sheet. The value MUST be free text, numeric, or an Ontology Annotation.
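The matching requirement can be checked by comparing the bracketed names against the declared factor names. This sketch assumes the declared names have already been read from the Study Factors section into a list. `undeclared_factors` is a hypothetical helper, and its matching is exact and case-sensitive.

```python
import re
from typing import Iterable, List

FACTOR = re.compile(r"^Factor ?\[(?P<name>[^\]]+)\]$")


def undeclared_factors(headers: Iterable[str], study_factor_names: Iterable[str]) -> List[str]:
    """Return Factor [<name>] names used in a table but not declared as Study Factors."""
    declared = set(study_factor_names)
    used = [m.group("name") for h in headers if (m := FACTOR.match(h))]
    return [name for name in used if name not in declared]
```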

| Factor [Gender] | Term Source REF (NCIT:C17357) | Term Accession Number (NCIT:C17357) |
|---|---|---|
| Male | MeSH | D008297 |

Note

In this example, the value in the Term Accession Number column is formatted as a LOCALID. The associated IDSPACE to identify the ontology term is given in the Term Source REF column.

Components

A Component is a consumable or reusable physical entity used in the experimental workflow. It is formatted in the pattern Component [<category term>]. The value MUST be free text, numeric, or an Ontology Annotation.

| Component [Measurement Device] | Term Source REF (NCIT:C81182) | Term Accession Number (NCIT:C81182) |
|---|---|---|
| Illumina MiniSeq | OBI | http://…/obo/OBI_0003114 |

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Parameters

A Parameter can be used to specify any additional information about the experimental setup that does not fall under the three categories above (Characteristics, Factors, and Components). It is formatted in the pattern Parameter [<category term>]. The value MUST be free text, numeric, or an Ontology Annotation.

| Parameter [temperature] | Unit | Term Source REF (NCRO:0000029) | Term Accession Number (NCRO:0000029) |
|---|---|---|---|
| 300 | Kelvin | UO | http://…/obo/UO_0000032 |

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Comments

A Comment can be used to provide some additional information. Columns headed with Comment[<comment name>] MAY appear anywhere in the Annotation Table. The comment always refers to the Annotation Table. The value MUST be free text.

| Comment [Answer to everything] |
|---|
| forty-two |

Others

Columns whose headers do not follow any of the formats described above are considered additional payload and are out of the scope of this specification.
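A reader could sort headers into the column kinds described above and set everything else aside as payload. The sketch below is a simplification under stated assumptions. It recognizes only the static headers named in this section and treats any header starting with Term Source REF or Term Accession Number as an annotation column. It uses the hypothetical name `column_kind`.

```python
import re

_BRACKETED = re.compile(
    r"^(?P<kind>Input|Output|Characteristic|Factor|Component|Parameter|Comment)"
    r" ?\[(?P<term>[^\]]*)\]$"
)
_STATIC = (
    "Protocol REF", "Protocol Version", "Protocol Description", "Protocol Uri",
    "Protocol Type", "Data Format", "Data Selector Format", "Unit",
)
_ANNOTATION = ("Term Source REF", "Term Accession Number")


def column_kind(header: str) -> str:
    """Classify an annotation table header; unrecognized headers are 'payload'."""
    m = _BRACKETED.match(header)
    if m:
        return m.group("kind")
    if header in _STATIC:
        return header
    if header.startswith(_ANNOTATION):
        return "annotation"
    return "payload"  # out of scope of the specification
```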

Examples

For example, a simple source-to-sample process may be represented as:

| Input [Source Name] | Protocol REF | Output [Sample Name] |
|---|---|---|
| source1 | sample collection | sample1 |

Where the graph splits or pools, the same node name is repeated across several rows of the Input or Output column.

For example, if we split a source into two samples, we might represent this as:

| Input [Source Name] | Protocol REF | Output [Sample Name] |
|---|---|---|
| source1 | sample collection | sample1 |
| source1 | sample collection | sample2 |

If we pool two sources into a single sample, we might represent this as:

| Input [Source Name] | Protocol REF | Output [Sample Name] |
|---|---|---|
| source1 | sample collection | sample1 |
| source2 | sample collection | sample1 |
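Because equal names denote the same entity, splits and pools can be detected by grouping rows on the Input and Output names. The following is a minimal sketch over (input, output) pairs read from the table. `split_and_pool` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_and_pool(rows: List[Tuple[str, str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """From (input, output) name pairs, return (splits, pools).

    splits maps an input to its outputs when it feeds more than one output;
    pools maps an output to its inputs when it receives more than one input.
    """
    outputs: Dict[str, Set[str]] = defaultdict(set)
    inputs: Dict[str, Set[str]] = defaultdict(set)
    for inp, out in rows:
        outputs[inp].add(out)
        inputs[out].add(inp)
    splits = {i: o for i, o in outputs.items() if len(o) > 1}
    pools = {o: i for o, i in inputs.items() if len(i) > 1}
    return splits, pools
```

Applied to the split example, source1 maps to {sample1, sample2}; applied to the pool example, sample1 maps to {source1, source2}.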

Datamap table sheets

Datamap Table sheets are used to describe the contents of data files.

In the Datamap Table sheets, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).

The content of the datamap table MUST be placed in an xlsx table whose name equals datamapTable. Each sheet MUST contain at most one such datamap table. Only cells inside this table are considered as part of the formatted metadata.

Datamap Table sheets are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a data node.

Data column

Every Datamap Table sheet MUST contain a Data column. Every object in this column MUST correspond to a relevant data resource location, following the Data Path Annotation patterns. If the annotation of the Data node refers not to the complete resource but to a part of it, a Selector MAY be added. This Selector MUST be separated from the resource location by a #, with no whitespace in between: location#selector. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the W3C.

The format of the data resource MAY be further qualified using a Data Format column. The Data Format SHOULD be expressed as a MIME type, most commonly consisting of two parts, a type and a subtype, separated by a slash (/) with no whitespace in between: type/subtype. If appropriate, a format from the list maintained by IANA SHOULD be picked. Unregistered or niche encodings and file formats MAY instead be indicated by the most appropriate URL.

The format and usage of the Selector MAY be further qualified using a Data Selector Format column. The Data Selector Format SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.

Explication column

Every Datamap Table sheet SHOULD contain an Explication column. The Explication adds explicit meaning to the data node. The value MUST be free text, or an Ontology Annotation.

| Explication | Term Source REF | Term Accession Number |
|---|---|---|
| average value | OBI | http://…/obo/OBI_0000679 |

Unit column

Every Datamap Table sheet SHOULD contain a Unit column. The Unit adds a unit of measurement to the data node. The value MUST be free text, or an Ontology Annotation.

| Unit | Term Source REF | Term Accession Number |
|---|---|---|
| milligram per milliliter | UO | http://…/obo/UO_0000176 |

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Object Type column

Every Datamap Table sheet SHOULD contain an Object Type column. The Object Type defines the shape or format in which the data node is represented. The value MUST be free text, or an Ontology Annotation.

| Object Type | Term Source REF | Term Accession Number |
|---|---|---|
| Float | NCIT | http://…/obo/NCIT_C48150 |

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Description column

Every Datamap Table sheet SHOULD contain a Description column. The Description gives additional, human-readable context about the data node. The value MUST be free text.

| Description |
|---|
| The average protein concentration for the given gene |

Generated By column

Every Datamap Table sheet SHOULD contain a Generated By column. The Generated By column names the tool that created the data node. The value MUST be free text.

If possible, the value in this column MUST correspond to a relevant data resource location, following the Data Path Annotation patterns.

| Generated By |
|---|
| GeneStatisticsTool.exe |
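The MUST and SHOULD requirements for Datamap Table columns can be checked from the header row alone. The following is a minimal sketch. It assumes headers appear exactly as named in this section, without bracketed terms or CURIE suffixes. `check_datamap_headers` is a hypothetical helper that reports a missing Data column as an error and other missing columns as warnings.

```python
from typing import Iterable, List, Tuple

REQUIRED = ("Data",)  # MUST
RECOMMENDED = ("Explication", "Unit", "Object Type", "Description", "Generated By")  # SHOULD


def check_datamap_headers(headers: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for missing MUST and SHOULD columns."""
    present = set(headers)
    errors = [f"missing required column {h!r}" for h in REQUIRED if h not in present]
    warnings = [f"missing recommended column {h!r}" for h in RECOMMENDED if h not in present]
    return errors, warnings
```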

Comments

A Comment can be used to provide some additional information. Columns headed with Comment[<comment name>] MAY appear anywhere in the Datamap Table. The comment always refers to the Datamap Table. The value MUST be free text.

| Comment [Answer to everything] |
|---|
| forty-two |

Examples

For example, a simple datamap table representing a tabular datafile might look as follows:

| Data | Explication | Term Source REF | Term Accession Number | Unit | Term Source REF | Term Accession Number | Object Type | Term Source REF | Term Accession Number | Description | Generated By |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MyData.csv#col=1 | Gene Identifier | NCIT | http://…/obo/NCIT_C48664 | | | | String | NCIT | http://…/obo/NCIT_C45253 | Shorthand identifier of the gene coding for the protein. | GeneStatisticsTool.exe |
| MyData.csv#col=2 | average value | OBI | http://…/obo/OBI_0000679 | milligram per milliliter | UO | http://…/obo/UO_0000176 | Float | NCIT | http://…/obo/NCIT_C48150 | The average protein concentration for the given gene | GeneStatisticsTool.exe |
| MyData.csv#col=3 | p-value | OBI | http://…/obo/OBI_0000175 | | | | Float | NCIT | http://…/obo/NCIT_C48150 | p-value of t-test against control. | GeneStatisticsTool.exe |

In this example, the datamap table describes a single data file named MyData.csv. This file contains three columns. The first column contains gene identifiers; the other two contain results of a statistical analysis performed by the tool GeneStatisticsTool.exe.