ISA-XLSX format

For details on ISA framework terminology, please read the ISA Abstract Model specification.

This document describes the ISA Abstract Model reference implementation specified in the ISA-XLSX format. The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet document. Conceptually, using the terminology of the SpreadsheetML specification ISO/IEC 29500-1, the document comprises one or more worksheets in a workbook.

Table of contents

Investigation File
Study File
Assay File
Datamap File
Top-level metadata sheets
Annotation Table sheets
Datamap Table sheets
- Data
- Explication
- Unit
- Object Type
- Description
- Generated By
- Comment
- Examples

Below we provide the schemas and the content rules for valid ISA-XLSX documents.

ISA-XLSX uses three types of files to capture the experimental metadata:

Investigation file
Study file
Assay file

The Investigation file contains all the information needed to understand the overall goals and means used in an experiment; experimental steps (or sequences of events) are described in the Study and in the Assay file(s). For each Investigation file there may be one or more Studies defined with a corresponding Study file; for each Study there may be one or more Assays defined with corresponding Assay files; one assay file may be registered in different studies.

In order to facilitate identification of ISA-XLSX component files, specific naming patterns MUST be followed:

isa.investigation.xlsx for identifying the Investigation file
isa.study.xlsx for identifying Study file(s)
isa.assay.xlsx for identifying Assay file(s)
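
The naming patterns above can be checked mechanically. The following is a minimal Python sketch, not part of the specification; the function name `classify_isa_file` and the returned type strings are illustrative assumptions, and exact (case-sensitive) matching of the file name is also an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# File names mandated by the naming patterns, mapped to an ISA-XLSX file type.
# The type strings on the right are hypothetical labels chosen for this sketch.
ISA_FILE_NAMES = {
    "isa.investigation.xlsx": "investigation",
    "isa.study.xlsx": "study",
    "isa.assay.xlsx": "assay",
}


def classify_isa_file(path: str) -> Optional[str]:
    """Return the ISA-XLSX file type for a path, or None if the name matches no pattern."""
    # Only the final path component is compared; matching is exact (assumption).
    return ISA_FILE_NAMES.get(PurePosixPath(path).name)
```

For instance, `classify_isa_file("studies/HeatstressExperiment/isa.study.xlsx")` yields `"study"`, while a file with any other name yields `None` and would be treated as additional payload by this sketch.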

Sheets described in this specification MUST follow one of the two given formats:

Top-level metadata sheets for listing top-level metadata
Annotation Table sheets for describing experimental workflows

Sheets that follow neither of these two formats are considered additional payload and are ignored in this specification.

All labels are case-sensitive.

Dates SHOULD be supplied in the ISO8601 format.

For maximal portability, file names SHOULD contain only ASCII characters not excluded already (that is A-Za-z0-9._!#$%&+,;=@^(){}'[]; space is excluded because many utilities do not accept spaces in file paths). Non-English alphabetic characters cannot be guaranteed to be supported in all locales. It is recommended to avoid the shell metacharacters (){}'[]$.
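
As an illustration of the file-name recommendation, the sketch below tests a single file name against the character set quoted above and reports any of the discouraged shell metacharacters. It is a non-normative example: the function name `check_file_name` is hypothetical, and the character set is taken literally from the text, so the hyphen is treated as not listed.

```python
import re
from typing import List, Tuple

# Character set quoted in the text, taken literally (assumption: hyphen not included).
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._!#$%&+,;=@^(){}'\[\]]+")
# Shell metacharacters that are permitted but recommended against.
DISCOURAGED = set("(){}'[]$")


def check_file_name(name: str) -> Tuple[bool, List[str]]:
    """Return (is_portable, sorted list of discouraged characters found) for one file name."""
    portable = PORTABLE_NAME.fullmatch(name) is not None
    found = sorted(set(name) & DISCOURAGED)
    return portable, found
```

A name such as `my file.xlsx` is reported as not portable because of the space, whereas `run(1).xlsx` is portable but flagged for its parentheses.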

Investigation File

The Investigation file fulfils four needs:

to declare key entities, such as factors and protocols, which may be referenced in the other files
to track provenance of the used terminologies (controlled vocabularies or ontologies), where applicable
to relate Assay files to Studies
to select those Studies that are considered part of the investigation.

The Investigation File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_investigation and MUST contain the following sections:

ONTOLOGY SOURCE REFERENCE
INVESTIGATION
INVESTIGATION PUBLICATIONS
INVESTIGATION CONTACTS

Additionally, it MAY contain the following sections:

STUDY
STUDY DESIGN DESCRIPTORS
STUDY PUBLICATIONS
STUDY FACTORS
STUDY ASSAYS
STUDY PROTOCOLS
STUDY CONTACTS

The Investigation File implements the Investigation graph from the ISA Abstract Model.
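
A check for the mandatory sections of the isa_investigation sheet could look like the following sketch. It assumes the sheet has already been read into a list of rows (each row a list of cell values) by some spreadsheet library; reading the XLSX file itself is out of scope here, and the name `missing_sections` is hypothetical.

```python
from typing import List, Optional, Sequence

# Sections the isa_investigation sheet MUST contain, as listed in this section.
REQUIRED_INVESTIGATION_SECTIONS = (
    "ONTOLOGY SOURCE REFERENCE",
    "INVESTIGATION",
    "INVESTIGATION PUBLICATIONS",
    "INVESTIGATION CONTACTS",
)


def missing_sections(
    rows: Sequence[Sequence[Optional[str]]],
    required: Sequence[str] = REQUIRED_INVESTIGATION_SECTIONS,
) -> List[str]:
    """Return the required section headers that do not occur in the first column."""
    present = {row[0].strip() for row in rows if row and isinstance(row[0], str)}
    return [name for name in required if name not in present]
```

The same function can be reused for the Study and Assay sheets by passing their respective lists of mandatory section headers as `required`.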

Study File

The Study represents a set of logically connected experiments. A Study File contains contextualising information for one or more Assays: metadata about the study design, the study factors used, and the study protocols. Similarly to the Investigation, it also contains the title and description of the study and the related people and scholarly publications. In addition, it details the sample collection process needed to perform the connected Assays.

The Study File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_study and MUST contain the following sections:

STUDY
STUDY DESIGN DESCRIPTORS
STUDY PUBLICATIONS
STUDY CONTACTS

Additionally, it MAY contain the following sections:

STUDY FACTORS
STUDY ASSAYS
STUDY PROTOCOLS

Additionally, the Study File SHOULD contain one or more Annotation Table sheet(s), which MAY record provenance of biological samples, from source material through a collection process to sample material.

Therefore, the main entities of the Study File should be Sources and Samples.

Any Study MAY contain datamap references as described in the Datamap Sheet section.

The Study File implements the Study graph from the ISA Abstract Model.

Assay File

The Assay represents one experimental measurement. An Assay File contains metadata about the assay design, information about the people performing the experiment, and most importantly, details about the preparation and/or execution of the experimental measurement.

The Assay File MUST contain one Top-Level Metadata sheet. This sheet MUST be named isa_assay and MUST contain the following sections:

ASSAY
ASSAY PERFORMERS

Additionally, the Assay File SHOULD contain one or more Annotation Table sheet(s), which MAY record preparation of biological samples, measurement of these samples and basic computations performed on the resulting data.

Therefore, the main entities of the Assay File should be Samples and Data.

Any Assay MAY contain datamap references as described in the Datamap Sheet section.

The Assay File implements the Assay graph from the ISA Abstract Model.

Datamap File

The Datamap represents a set of explanations about the data entities defined in assays and studies.

The Datamap File MUST contain one Datamap table sheet. This sheet MUST be named isa_datamap.

Therefore, the main entities of the Datamap File should be Data.

The Datamap File acts as an extension of the data nodes defined in the Study and Assay graphs section from the ISA Abstract Model.

Top-level metadata sheets

The purpose of top-level metadata sheets is to aggregate and list top-level metadata. Each sheet is divided into sections, each made up of a section header and key-value fields. Section headers MUST be written entirely in upper case (e.g. STUDY). Field headers MUST have the first letter of each word in upper case (e.g. Study Identifier), with the exception of the referencing label (REF).
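
The section layout described above lends itself to a simple row-wise parser. The sketch below is one possible approach, not the reference implementation: it assumes rows are lists of strings, treats any first-column label written entirely in upper case as a section header, and assumes each header occurs only once (repeated blocks would be merged). The names `is_section_header` and `split_sections` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def is_section_header(label: str) -> bool:
    """A section header is written entirely in upper case, e.g. STUDY."""
    return any(ch.isalpha() for ch in label) and label == label.upper()


def split_sections(rows: Sequence[Sequence[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group key-value rows under the section header that precedes them."""
    sections: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[Dict[str, List[str]]] = None
    for row in rows:
        if not row or not row[0] or row[0].startswith("#"):
            continue  # skip blank rows and rows starting with '#'
        label, values = row[0].strip(), list(row[1:])
        if is_section_header(label):
            current = sections.setdefault(label, {})
        elif current is not None:
            current[label] = values  # field label -> value cells of that row
    return sections
```

Field labels such as Term Source REF are not mistaken for section headers, because they contain lower-case letters.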

In the following sections, examples of each section block are given beside the specification of each section.

ATTENTION

Rows in which the first character in the first column is Unicode U+0023 (the # character) MUST be interpreted as comments, where reference implementation parsers SHOULD ignore those lines entirely.

Rows with the label Comment[<comment name>] can appear within any of the section blocks. Where these appear, the comment name must be unique within the context of a single block (e.g. you cannot have multiple occurrences of Comment[external DB REF] within STUDY ASSAYS). Also, the value cells MUST match the number of values indicated by the rest of the section in context.
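
The two comment rules can be sketched in Python as follows. This is an illustrative example under the assumption that rows are lists of cell values and that one section block is passed to the duplicate check at a time; both function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Sequence

# Matches labels of the form Comment[<comment name>] and captures the name.
COMMENT_LABEL = re.compile(r"Comment\[(.+)\]")


def drop_comment_rows(rows: Sequence[Sequence[str]]) -> List[Sequence[str]]:
    """Remove rows whose first column starts with '#' (U+0023)."""
    return [
        r for r in rows
        if not (r and isinstance(r[0], str) and r[0].startswith("#"))
    ]


def duplicate_comment_names(block_rows: Sequence[Sequence[str]]) -> List[str]:
    """Return Comment[...] names that occur more than once within one section block."""
    names = []
    for row in block_rows:
        if row and isinstance(row[0], str):
            match = COMMENT_LABEL.fullmatch(row[0].strip())
            if match:
                names.append(match.group(1))
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

A non-empty result from `duplicate_comment_names` indicates a block that violates the uniqueness rule.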

Ontology Source Reference section

The Ontology Source section of the Investigation file is used to declare Ontology Sources used elsewhere in the ISA-XLSX files within the context of an Investigation.

Where a row label in a Top-level metadata sheet is suffixed with Term Source REF, the value of the cell SHOULD match one of the Term Source Name values declared in this section.

Where a column in an Annotation Table sheet is labelled with Term Source REF, the value of the cell SHOULD match one of the Term Source Name values declared in this section.

This section implements a list of Ontology Source from the ISA Abstract Model.

This section MUST contain zero or more values.

ONTOLOGY SOURCE REFERENCE

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Term Source Name	String	The name of the source of a term; i.e. the source controlled vocabulary or ontology. These names will be used in all corresponding Term Source REF fields that occur elsewhere.
Term Source File	String (file name or URI)	A file name or a URI of an official resource.
Term Source Version	String	The version number of the Term Source to support terms tracking.
Term Source Description	String	Use for disambiguating resources when homologous prefixes have been used.

Example

For example, the ONTOLOGY SOURCE REFERENCE section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


ONTOLOGY SOURCE REFERENCE
Term Source Name	CHEBI	EFO	OBI	NCBITAXON	PATO
Term Source File	http://data.bioontology.org/ontologies/CHEBI	http://data.bioontology.org/ontologies/EFO	http://data.bioontology.org/ontologies/OBI	http://data.bioontology.org/ontologies/NCBITAXON	http://data.bioontology.org/ontologies/PATO
Term Source Version	78	111	21	2	160
Term Source Description	Chemical Entities of Biological Interest Ontology	Experimental Factor Ontology	Ontology for Biomedical Investigations	National Center for Biotechnology Information (NCBI) Organismal Classification	Phenotypic Quality Ontology
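
Because each value column of this section describes one Ontology Source, a parser typically transposes the rows into one record per column. The sketch below shows this under the assumption that the section has already been read into a mapping from label to value cells; `to_records` and `undeclared_refs` are hypothetical names. Since the matching rule is a SHOULD, undeclared references are better reported as warnings than as errors.

```python
from typing import Dict, Iterable, List


def to_records(fields: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Turn label -> value cells into one dict per value column, padding short rows with ''."""
    width = max((len(values) for values in fields.values()), default=0)
    return [
        {label: (values[i] if i < len(values) else "") for label, values in fields.items()}
        for i in range(width)
    ]


def undeclared_refs(refs: Iterable[str], sources: List[Dict[str, str]]) -> List[str]:
    """Return Term Source REF values that match no declared Term Source Name."""
    declared = {source["Term Source Name"] for source in sources}
    return sorted({ref for ref in refs if ref and ref not in declared})
```

Empty Term Source REF cells are skipped, since free-text terms carry no reference.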

INVESTIGATION section

This section is organized in several subsections, described in detail below.

This section implements an Investigation from the ISA Abstract Model.

INVESTIGATION

This section MUST contain zero or one values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Investigation Identifier	String	An identifier or an accession number provided by a repository. This SHOULD be locally unique.
Investigation Title	String	A concise name given to the investigation.
Investigation Description	String	A textual description of the investigation.
Investigation Submission Date	String formatted as ISO8601 date YYYY-MM-DD	The date on which the investigation was reported to the repository.
Investigation Public Release Date	String formatted as ISO8601 date YYYY-MM-DD	The date on which the investigation was released publicly.

Example

For example, the INVESTIGATION section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


INVESTIGATION
Investigation Identifier	ChlamyHeatstress
Investigation Title	Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii.
Investigation Description	Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken.
Investigation Submission Date	2022-05-13
Investigation Public Release Date
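
Date cells such as Investigation Submission Date can be validated with the Python standard library. The following sketch assumes that an empty cell (as for Investigation Public Release Date in the example) means "not provided" and is therefore accepted; the function name `parse_iso_date` is hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Strict YYYY-MM-DD shape; other ISO 8601 variants are deliberately rejected here.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD cell; empty cells yield None, malformed values raise ValueError."""
    if value is None or not value.strip():
        return None
    text = value.strip()
    if not ISO_DATE.fullmatch(text):
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    return date.fromisoformat(text)  # also rejects impossible dates such as 2022-13-40
```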

INVESTIGATION PUBLICATIONS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Investigation Publication PubMed ID	String formatted as valid PubMed ID	The PubMed IDs of the described publication(s) associated with this investigation.
Investigation Publication DOI	String formatted as valid DOI	A Digital Object Identifier (DOI) for that publication (where available).
Investigation Publication Author List	String	The list of authors associated with that publication.
Investigation Publication Title	String	The title of publication associated with the investigation.
Investigation Publication Status	String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF	A term describing the status of that publication (e.g. submitted, in preparation, published).
Investigation Publication Status Term Accession Number	String or URI	The accession number from the Term Source associated with the selected term.
Investigation Publication Status Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the INVESTIGATION PUBLICATIONS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


INVESTIGATION PUBLICATIONS
Investigation Publication PubMed ID	PMC9106746
Investigation Publication DOI	10.1038/s42003-022-03359-z
Investigation Publication Author List	Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang
Investigation Publication Title	Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii
Investigation Publication Status	published
Investigation Publication Status Term Accession Number	http://purl.org/spar/pso/published
Investigation Publication Status Term Source REF	PSO

INVESTIGATION CONTACTS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Investigation Person Last Name	String	The last name of a person associated with the investigation.
Investigation Person First Name	String	The first name of a person associated with the investigation.
Investigation Person Mid Initials	String	The middle initials of a person associated with the investigation.
Investigation Person Email	String formatted as email	The email address of a person associated with the investigation.
Investigation Person Phone	String	The telephone number of a person associated with the investigation.
Investigation Person Fax	String	The fax number of a person associated with the investigation.
Investigation Person Address	String	The address of a person associated with the investigation.
Investigation Person Affiliation	String	The organization affiliation for a person associated with the investigation.
Investigation Person Roles	String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs	Term to classify the role(s) performed by this person in the context of the investigation, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used, the Term Accession Number and Term Source REF fields below are required.
Investigation Person Roles Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Investigation Person Roles Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the INVESTIGATION CONTACTS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


INVESTIGATION CONTACTS
Investigation Person Last Name	Venn	Zimmer	Mühlhaus
Investigation Person First Name	Benedikt	David	Timo
Investigation Person Mid Initials
Investigation Person Email	venn@rptu.de	d_zimmer@rptu.de	timo.muehlhaus@rptu.de
Investigation Person Phone
Investigation Person Fax
Investigation Person Address	TU Kaiserslautern, Kaiserslautern, 67663, Germany	TU Kaiserslautern, Kaiserslautern, 67663, Germany	TU Kaiserslautern, Kaiserslautern, 67663, Germany
Investigation Person Affiliation	Computational Systems Biology	Computational Systems Biology	Computational Systems Biology
Investigation Person Roles	author	author	corresponding author
Investigation Person Roles Term Accession Number
Investigation Person Roles Term Source REF
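
Semicolon-separated cells such as Investigation Person Roles can be split into individual annotations as sketched below. The positional pairing of each role with its Term Accession Number and Term Source REF is an assumption of this example, as are the names `OntologyAnnotation` and `split_annotations`.

```python
from typing import List, NamedTuple


class OntologyAnnotation(NamedTuple):
    name: str
    accession: str
    source_ref: str


def split_annotations(
    names: str, accessions: str = "", source_refs: str = ""
) -> List[OntologyAnnotation]:
    """Split ';'-separated cells and pair entries by position (assumed alignment)."""
    parts = [p.strip() for p in names.split(";")] if names.strip() else []

    def aligned(cell: str) -> List[str]:
        # Pad missing trailing entries with '' so every name has a counterpart.
        items = [p.strip() for p in cell.split(";")] if cell.strip() else []
        return items + [""] * (len(parts) - len(items))

    acc, refs = aligned(accessions), aligned(source_refs)
    return [OntologyAnnotation(n, acc[i], refs[i]) for i, n in enumerate(parts)]
```

Calling `split_annotations("submitter;funder;sponsor")` gives three annotations with empty accession and source fields, which corresponds to free-text roles.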

STUDY section

This section is organized in several subsections, described in detail below. This section also represents a repeatable block, which is replicated according to the number of Studies to report (e.g. if there are two Studies, two Study blocks are present in the Investigation file). The subsections in the block are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within this repeatable block, although their order MAY vary; the fields MUST remain within their subsection.

These sections implement the metadata for a Study from the ISA Abstract Model and a list of Assay (i.e. Study and Assay without graphs; graphs are implemented in ISA-XLSX as Annotation Table sheets).
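
Since the STUDY block is repeatable, a parser has to separate one Study block from the next. The sketch below starts a new block at every row whose first cell is exactly STUDY. It assumes that rows are lists of strings, that all rows following a STUDY header up to the next one belong to that Study, and that rows before the first STUDY header are not part of any Study block; `split_study_blocks` is a hypothetical name.

```python
from typing import List, Sequence


def split_study_blocks(rows: Sequence[Sequence[str]]) -> List[List[List[str]]]:
    """Return one list of rows per Study block, in order of appearance."""
    blocks: List[List[List[str]]] = []
    for row in rows:
        label = row[0].strip() if row and isinstance(row[0], str) else ""
        if label == "STUDY":
            blocks.append([])  # a new repeatable block begins here
        if blocks and label:
            blocks[-1].append(list(row))
    return blocks
```

Each returned block can then be parsed on its own, so the subsections of different Studies are never mixed.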

STUDY

This section MUST contain zero or one values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Identifier	String	A unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification.
Study Title	String	A concise phrase used to encapsulate the purpose and goal of the study.
Study Description	String	A textual description of the study, with components such as objective or goals.
Study Submission Date	String formatted as ISO8601 date	The date on which the study is submitted to an archive.
Study Public Release Date	String formatted as ISO8601 date	The date on which the study SHOULD be released publicly.
Study File Name	String formatted as file name or URI	A field to specify the name of the Study Table file corresponding to the definition of that Study. There can be only one file per cell.

Example

For example, the STUDY section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY
Study Identifier	HeatstressExperiment
Study Title	Systems-wide investigation of responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii.
Study Description	Algae cultures were grown mixotrophically (TAP). After 24h of 35°C/40°C the cells were shifted back to room temperature for 48h. 'omics samples were taken.
Study Submission Date	2022-05-13
Study Public Release Date
Study File Name	studies/HeatstressExperiment/isa.study.xlsx

STUDY DESIGN DESCRIPTORS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Design Type	String	A term allowing the classification of the study based on the overall experimental design, e.g. cross-over design or parallel group design. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Design Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Design Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Study Design Term Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY DESIGN DESCRIPTORS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY DESIGN DESCRIPTORS
Study Design Type	time series design	heat exposure
Study Design Type Term Accession Number	http://purl.obolibrary.org/obo/OBI_0500020	http://purl.obolibrary.org/obo/XCO_0000308
Study Design Type Term Source REF	OBI

STUDY PUBLICATIONS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Publication PubMed ID	String formatted as valid PubMed ID	The PubMed IDs of the described publication(s) associated with this study.
Study Publication DOI	String formatted as valid DOI	A Digital Object Identifier (DOI) for that publication (where available).
Study Publication Author List	String	The list of authors associated with that publication.
Study Publication Title	String	The title of the publication associated with the study.
Study Publication Status	String, or Ontology Annotation by providing accompanying Term Accession Number and Term Source REF	A term describing the status of that publication (e.g. submitted, in preparation, published).
Study Publication Status Term Accession Number	String or URI	The accession number from the Term Source associated with the selected term.
Study Publication Status Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY PUBLICATIONS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY PUBLICATIONS
Study Publication PubMed ID	PMC9106746
Study Publication DOI	10.1038/s42003-022-03359-z
Study Publication Author List	Ningning Zhang, Erin M. Mattoon, Will McHargue, Benedikt Venn, David Zimmer, Kresti Pecani, Jooyeon Jeong, Cheyenne M. Anderson, Chen Chen, Jeffrey C. Berry, Ming Xia, Shin-Cheng Tzeng, Eric Becker, Leila Pazouki, Bradley Evans, Fred Cross, Jianlin Cheng, Kirk J. Czymmek, Michael Schroda, Timo Mühlhaus & Ru Zhang
Study Publication Title	Systems-wide analysis revealed shared and unique responses to moderate and acute high temperatures in the green alga Chlamydomonas reinhardtii
Study Publication Status	published
Study Publication Status Term Accession Number	http://purl.org/spar/pso/published
Study Publication Status Term Source REF	PSO

STUDY FACTORS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Factor Name	String	The name of one factor used in the Study and/or Assay files. A factor corresponds to an independent variable manipulated by the experimentalist with the intention to affect biological systems in a way that can be measured by an assay. The value of a factor is given in the Study or Assay file, accordingly. If both Study and Assay have a Factor Value, these must be different.
Study Factor Type	String	A term allowing the classification of this factor into categories. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Factor Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Factor Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY FACTORS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY FACTORS
Study Factor Name	temperature	collection time
Study Factor Type	temperature	time
Study Factor Type Term Accession Number	http://purl.obolibrary.org/obo/PATO_0000146	http://purl.obolibrary.org/obo/PATO_0000165
Study Factor Type Term Source REF	PATO	PATO

STUDY ASSAYS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Assay Measurement Type	String	A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Assay Measurement Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Assay Measurement Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Assay Technology Type	String	Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Assay Technology Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Assay Technology Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Assay Technology Platform	String	Manufacturer and platform name, e.g. Bruker AVANCE
Study Assay File Name	String	A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell.

Example

For example, the STUDY ASSAYS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY ASSAYS
Study Assay File Name	assays/Proteomics/isa.assay.xlsx	assays/Transcriptomics/isa.assay.xlsx
Study Assay Measurement Type	Proteomics	transcription profiling
Study Assay Measurement Type Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C20085	http://purl.obolibrary.org/obo/OBI_0000424
Study Assay Measurement Type Term Source REF	NCIT	OBI
Study Assay Technology Type	Mass Spectrometry	nucleotide sequencing
Study Assay Technology Type Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C17156	http://purl.obolibrary.org/obo/OBI_0000626
Study Assay Technology Type Term Source REF	NCIT	OBI
Study Assay Technology Platform	Orbitrap Fusion Lumos	Illumina HiSeq 2000 Rapid Run

STUDY PROTOCOLS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Protocol Name	String	The name of the protocols used within the ISA-XLSX document. The names are used as identifiers within the ISA-XLSX document and will be referenced in the Study and Assay files in the Protocol REF columns. Names can be either local identifiers, unique within the ISA Archive which contains them, or fully qualified external accession numbers.
Study Protocol Type	String	Term to classify the protocol. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Protocol Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Protocol Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Protocol Description	String	A free-text description of the protocol.
Study Protocol URI	String	Pointer to protocol resources external to the ISA-XLSX document that can be accessed by their Uniform Resource Identifier (URI).
Study Protocol Version	String	An identifier for the version to ensure protocol tracking.
Study Protocol Parameters Name	String	A semicolon-delimited (“;”) list of parameter names, used as an identifier within the ISA-XLSX document. These names are used in the Study and Assay files (in the “Parameter Value []” column heading) to list the values used for each protocol parameter. Refer to section Multiple values fields in the Investigation File on how to encode multiple values in one field and match term sources.
Study Protocol Parameters Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Protocol Parameters Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Study Protocol Components Name	String	A semicolon-delimited (“;”) list of a protocol’s components; e.g. instrument names, software names, and reagents names. Refer to section Multiple values fields in the Investigation File on how to encode multiple components in one field and match term sources.
Study Protocol Components Type	String	Term to classify the protocol components listed, for example instrument, software, detector or reagent. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Study Protocol Components Type Term Accession Number	String	The accession number from the Source associated to the selected terms.
Study Protocol Components Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match a Term Source Name previously declared in the Ontology Source Reference section.

Example

For example, the STUDY PROTOCOLS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY PROTOCOLS
Study Protocol Name	Harvesting	Protein extraction	Measurement
Study Protocol Type	Biospecimen Collection	nucleic acid extraction	nucleic acid extraction
Study Protocol Type Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C70945
Study Protocol Type Term Source REF	NCIT
Study Protocol Description	Extraction and storage of algae cells from photo-bio reactor. Extracted and centrifuged cell pellets were frozen in liquid nitrogen.	Proteins were extracted from cells using a combination of chemical (lysis buffer) and physical (sonicator) methods. Digested peptides were purified and resuspended in LC loading buffer.	Peptides were separated by a nanoHPLC (C18 column) and detected using an Orbitrap mass spectrometry device.
Study Protocol URI
Study Protocol Version
Study Protocol Parameters Name	Centrifugation Time;sample volume setting	frequency; duration	duration;flow rate
Study Protocol Parameters Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C178881;http://purl.allotrope.org/ontologies/result#AFR_0002492	http://purl.obolibrary.org/obo/PATO_0000044;http://purl.obolibrary.org/obo/PATO_0001309	http://purl.obolibrary.org/obo/PATO_0001309;http://purl.obolibrary.org/obo/PATO_0001574
Study Protocol Parameters Term Source REF	NCIT;AFO	PATO;PATO	PATO;PATO
Study Protocol Components Name	liquid nitrogen	Sonicator; Extraction Kit	HPLC; Column; MS
Study Protocol Components Type	Liquid Nitrogen	VWR Aquasonic 250D; IST sample preparation kit (PreOmics GmbH, Germany)	U3000 RSLCnano HPLC; C18 column (Fritted Glass Column, 25 cm × 75 μm); Orbitrap Fusion Lumos
Study Protocol Components Type Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C68796		;;http://purl.obolibrary.org/obo/MS_1002732
Study Protocol Components Type Term Source REF	NCIT		;;MS
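
In the example above, empty positions in a semicolon-delimited list are kept (for instance ;;MS) so that entries stay aligned across related rows. A consistency check along these lines could look like the following sketch. It assumes that related cells of one protocol column should either be empty or hold the same number of entries as a chosen reference cell; this rule and the function names are assumptions of the example, not statements of the specification.

```python
from typing import Dict, List


def list_lengths(cells: Dict[str, str]) -> Dict[str, int]:
    """Count ';'-separated entries per label; an empty cell counts as zero entries."""
    return {
        label: (len(cell.split(";")) if cell.strip() else 0)
        for label, cell in cells.items()
    }


def mismatched_labels(cells: Dict[str, str], reference_label: str) -> List[str]:
    """Return labels whose entry count is neither zero nor that of the reference cell."""
    counts = list_lengths(cells)
    expected = counts[reference_label]
    return sorted(label for label, n in counts.items() if n not in (0, expected))
```

Applied to the third protocol column of the example, with Study Protocol Components Name as the reference, all four component rows hold three entries and no label is reported.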

STUDY CONTACTS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Study Person Last Name	String	The last name of a person associated with the study.
Study Person First Name	String	The first name of a person associated with the study.
Study Person Mid Initials	String	The middle initials of a person associated with the study.
Study Person Email	String formatted as email	The email address of a person associated with the study.
Study Person Phone	String	The telephone number of a person associated with the study.
Study Person Fax	String	The fax number of a person associated with the study.
Study Person Address	String	The address of a person associated with the study.
Study Person Affiliation	String	The organization affiliation for a person associated with the study.
Study Person Roles	String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs	Term to classify the role(s) performed by this person in the context of the study, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used, the Term Accession Number and Term Source REF fields below are required.
Study Person Roles Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Study Person Roles Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the STUDY CONTACTS section of an ISA-XLSX isa.investigation.xlsx file may look as follows:


STUDY CONTACTS
Study Person Last Name	Venn	Zimmer	Mühlhaus
Study Person First Name	Benedikt	David	Timo
Study Person Mid Initials
Study Person Email	venn@bio.rptu.de	d_zimmer@rptu.de	timo.muehlhaus@rptu.de
Study Person Phone
Study Person Fax
Study Person Address	TU Kaiserslautern, Kaiserslautern, 67663, Germany	TU Kaiserslautern, Kaiserslautern, 67663, Germany	TU Kaiserslautern, Kaiserslautern, 67663, Germany
Study Person Affiliation	Computational Systems Biology	Computational Systems Biology	Computational Systems Biology
Study Person Roles	author	author	corresponding author
Study Person Roles Term Accession Number
Study Person Roles Term Source REF
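
The semicolon-separated multi-value convention described for Study Person Roles could be read as in the following minimal sketch. The function name `split_annotations` and the dictionary keys are illustrative assumptions, not part of this specification. The sketch also assumes that an empty position (as in `;;MS`) stands for a missing annotation of the value at the same position.

```python
from itertools import zip_longest

def split_annotations(names, accessions="", source_refs=""):
    """Split semicolon-separated cells into one record per value (hypothetical helper)."""
    def parts(cell):
        # An empty cell means "no values", not one empty value.
        return [p.strip() for p in cell.split(";")] if cell.strip() else []

    records = []
    for name, acc, ref in zip_longest(
        parts(names), parts(accessions), parts(source_refs), fillvalue=""
    ):
        records.append({"name": name, "accession": acc, "source_ref": ref})
    return records

roles = split_annotations("submitter;funder;sponsor")
```

Pairing values by position keeps each role aligned with its accession number and source reference, even when some of them are left empty.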

ASSAY section

This section is organized in several subsections, described in detail below. The subsections in the block are arranged vertically; the intent being to enhance readability and presentation, and possibly to help with parsing. These subsections MUST remain within this block; the fields MUST remain within their subsection.

These sections implement the metadata for an Assay from the ISA Abstract Model.

ASSAY

This section MUST contain zero or one values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Assay Measurement Type	String	A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Assay Measurement Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Assay Measurement Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Assay Technology Type	String	Term to identify the technology used to perform the measurement, e.g. DNA microarray, mass spectrometry. The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required.
Assay Technology Type Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Assay Technology Type Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.
Assay Technology Platform	String	Manufacturer and platform name, e.g. Bruker AVANCE
Assay File Name	String	A field to specify the name of the Assay Table file corresponding to the definition of that assay. There can be only one file per cell.

Example

For example, the ASSAY section of an ISA-XLSX isa.assay.xlsx file may look as follows:


ASSAY
Assay File Name	assays/Proteomics/isa.assay.xlsx
Assay Measurement Type	Proteomics
Assay Measurement Type Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C20085
Assay Measurement Type Term Source REF	NCIT
Assay Technology Type	Mass Spectrometry
Assay Technology Type Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C17156
Assay Technology Type Term Source REF	NCIT
Assay Technology Platform	Orbitrap Fusion Lumos

ASSAY PERFORMERS

This section MUST contain zero or more values.

This section MUST contain the following labels, with the specified datatypes for values supported:

Label	Datatype	Description
Assay Person Last Name	String	The last name of a person associated with the Assay.
Assay Person First Name	String	The first name of a person associated with the assay.
Assay Person Mid Initials	String	The middle initials of a person associated with the Assay.
Assay Person Email	String formatted as email	The email address of a person associated with the Assay.
Assay Person Phone	String	The telephone number of a person associated with the Assay.
Assay Person Fax	String	The fax number of a person associated with the assay.
Assay Person Address	String	The address of a person associated with the assay.
Assay Person Affiliation	String	The organization affiliation for a person associated with the assay.
Assay Person Roles	String or Ontology Annotation if accompanied by Term Accession Numbers and Term Source REFs	Term to classify the role(s) performed by this person in the context of the assay, which means that the roles reported here need not correspond to roles held within their affiliated organization. Multiple annotations or values attached to one person can be provided by using a semicolon (“;”, Unicode U+003B) as a separator (e.g. submitter;funder;sponsor). The term can be free text or taken from, for example, a controlled vocabulary or an ontology. If the latter source is used, the Term Accession Number and Term Source REF fields below are required.
Assay Person Roles Term Accession Number	String	The accession number from the Term Source associated with the selected term.
Assay Person Roles Term Source REF	String	Identifies the controlled vocabulary or ontology that this term comes from. The Source REF has to match one of the Term Source Names declared in the Ontology Source Reference section.

Example

For example, the ASSAY PERFORMERS section of an ISA-XLSX isa.assay.xlsx file may look as follows:


ASSAY PERFORMERS
Assay Person Last Name	Zhang	Tzeng	Evans
Assay Person First Name	Ningning	Shin-Cheng	Bradley
Assay Person Mid Initials
Assay Person Email
Assay Person Phone
Assay Person Fax
Assay Person Address	St. Louis, Missouri 63132, USA	St. Louis, Missouri 63132, USA	St. Louis, Missouri 63132, USA
Assay Person Affiliation	Donald Danforth Plant Science Center	Donald Danforth Plant Science Center	Donald Danforth Plant Science Center
Assay Person Roles	Investigator	Laboratory Technologist	Laboratory Technologist
Assay Person Roles Term Accession Number	http://purl.obolibrary.org/obo/NCIT_C25936	http://purl.obolibrary.org/obo/NCIT_C51830	http://purl.obolibrary.org/obo/NCIT_C51830
Assay Person Roles Term Source REF	NCIT	NCIT	NCIT
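
A top-level metadata section like the one above stores labels in the first column and, as the example suggests, one entity per value column. The sketch below turns such rows into one record per person. The helper name `section_to_records` and the padding of short rows with empty strings are assumptions made for illustration only.

```python
def section_to_records(rows):
    """Turn label-first rows into one dict per value column (hypothetical helper)."""
    # The widest row determines how many entities the section describes.
    width = max((len(row) - 1 for row in rows), default=0)
    records = [{} for _ in range(width)]
    for row in rows:
        label, values = row[0], row[1:]
        for i in range(width):
            # Rows with fewer cells than the widest row are padded with empty strings.
            records[i][label] = values[i] if i < len(values) else ""
    return records

rows = [
    ["Assay Person Last Name", "Zhang", "Tzeng", "Evans"],
    ["Assay Person First Name", "Ningning", "Shin-Cheng", "Bradley"],
    ["Assay Person Email"],
    ["Assay Person Roles", "Investigator", "Laboratory Technologist", "Laboratory Technologist"],
]
performers = section_to_records(rows)
```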

Annotation Table sheets

Annotation Table sheets are used to describe the experimental flow in a detailed, machine-readable way. In each sheet, there is a mapping from input entities to output entities, placed in the Input and Output columns, respectively. The other columns are then used to describe either those entities or the processes that led to this mapping.

In the Annotation Table sheets, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).

The content of the annotation table MUST be placed in an xlsx table whose name starts with annotationTable. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.

Annotation Table sheets are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a Process node.

Inputs and Outputs

Each annotation table sheet MUST contain at most one Input and at most one Output column, which denote the Input and Output node of the Process node respectively. They MUST be formatted in the pattern Input [<NodeType>] and Output [<NodeType>].

NodeTypes MUST be one of the following:

A Source MUST be indicated with the node type Source Name. Sources MUST NOT be used as Output nodes.
A Sample MUST be indicated with the node type Sample Name.
An Extract Material MUST be indicated with the node type Material Name.
A Data object MUST be indicated with the node type Data.

Source Names, Sample Names, Material Names MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.
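
The header pattern and node type rules above could be checked as in the following sketch. The regular expression and the function name `parse_io_header` are assumptions derived from the pattern Input [<NodeType>]; they are not defined by this specification.

```python
import re

# Node types listed in this section.
NODE_TYPES = {"Source Name", "Sample Name", "Material Name", "Data"}
# Assumed reading of the pattern: direction, one space, node type in square brackets.
IO_HEADER = re.compile(r"^(Input|Output) \[(.+)\]$")

def parse_io_header(header):
    """Return (direction, node_type) for a valid header, else raise ValueError."""
    match = IO_HEADER.match(header)
    if match is None:
        raise ValueError(f"not an Input/Output header: {header!r}")
    direction, node_type = match.groups()
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type!r}")
    if direction == "Output" and node_type == "Source Name":
        raise ValueError("a Source must not be used as an Output node")
    return direction, node_type
```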

The Data node type MUST correspond to a relevant data resource location, following the Data Path Annotation patterns. If the annotation of the Data node refers not to the complete resource, but to a part of it, a Selector MAY be added. This Selector MUST be separated from the resource location using a #, with no whitespace between: location#selector. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the W3C.

The format of the data resource MAY be further qualified using a Data Format column. The Data Format SHOULD be expressed using a MIME format, most commonly consisting of two parts: a type and a subtype, separated by a slash (/), with no whitespace between: type/subtype. If appropriate, a format from the list composed by IANA SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.

The format and usage info about the Selector MAY be further qualified using a Data Selector Format column. The Data Selector Format SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.

Examples

Data Location and Selector

In this example, there is a measurement of two Samples, namely input1 and input2. The measured values are both written into the same data resource at the location result.csv, whose format is tabular, as indicated by the Data Format text/csv. To distinguish between the measurement values stemming from the different inputs, selectors were added to the resource location (separated by a #), namely col=1 and col=2. The specification of the formatting of these selectors can be found at the link given in the Data Selector Format column, namely https://datatracker.ietf.org/doc/html/rfc7111.

Input [Sample Name]	Output [Data]	Data Format	Data Selector Format
input1	result.csv#col=1	text/csv	https://datatracker.ietf.org/doc/html/rfc7111
input2	result.csv#col=2	text/csv	https://datatracker.ietf.org/doc/html/rfc7111
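
A Data value of the form location#selector can be separated into its two parts as sketched here. The function name `split_data_node` is hypothetical, and splitting at the first # is an assumption; the specification only states that a # separates location and selector.

```python
def split_data_node(value):
    """Split 'location#selector' into (location, selector); selector is None if absent."""
    # partition() splits at the first '#', so any later '#' stays in the selector.
    location, sep, selector = value.partition("#")
    return location, (selector if sep else None)

location, selector = split_data_node("result.csv#col=1")
```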

Protocol Columns

Protocol REF columns MAY be used to specify the name of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol REF column. The value MUST be free text.

Protocol Version columns MAY be used to specify the version of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Version column. The value MUST be free text.

Protocol Description columns MAY be used to specify the description of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Description column. The value MUST be free text.

Protocol Uri columns MAY be used to specify the URI of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Uri column. The value MUST be either a URI or a file path corresponding to a relevant protocol file location.

Protocol Type columns MAY be used to specify the type of the Protocol node implemented by the Process node. Per Annotation Table sheet there MUST be at most one Protocol Type column. The value MUST be free text, or an Ontology Annotation.

Ontology Annotations

Where a value is an Ontology Annotation in an annotation table, Term Accession Number and Term Source REF columns MUST follow the main column.

An Ontology Annotation MAY be applied to any appropriate Characteristic, Parameter, Factor, Component or Protocol Type.

This implements Ontology Annotation from the ISA Abstract Model.

Ontology Annotation Headers

The header of the main column MUST contain the structural column type followed by the name of the ontology term in [] brackets. There SHOULD be a space between the column type and the [ bracket.

The headers of the two annotation columns SHOULD contain further ontological information about the ontology term of the main header. In this case, following the static header string, separated by a single space, there MUST be a short ontology term identifier formatted as CURIEs (prefixed identifiers) of the form <IDSPACE>:<LOCALID> (specified here) inside () brackets.

In the other case, i.e. when the annotation columns do not contain further ontological information, the static header strings MUST be either followed by a single space and empty () brackets or nothing.
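
The header rules above could be recognised with two regular expressions, as in the sketch below. The patterns and function names are assumptions for illustration. Only the bracketed column types are covered here; Protocol Type, which carries no bracketed term, is left out.

```python
import re

# Column type, optional single space, then the term name in [] brackets.
MAIN_HEADER = re.compile(r"^(Characteristic|Parameter|Factor|Component) ?\[(.+)\]$")
# Static header string, optionally followed by one space and () brackets.
ANNOTATION_HEADER = re.compile(
    r"^(Term Source REF|Term Accession Number)(?: \(([^()]*)\))?$"
)

def parse_main_header(header):
    """Return (column type, term name), or None if the header does not match."""
    m = MAIN_HEADER.match(header)
    return (m.group(1), m.group(2)) if m else None

def parse_annotation_header(header):
    """Return (static header, CURIE or None), or None if the header does not match."""
    m = ANNOTATION_HEADER.match(header)
    if not m:
        return None
    # Empty brackets and a missing bracket part both mean "no further information".
    return m.group(1), (m.group(2) or None)
```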

Ontology Annotation Values

The value in the main column MUST contain the name of the ontology term.

The value in the Term Source REF column MUST either contain a short identifier for the IDSPACE, which identifies the ontology containing the term, or be left empty.

The value in the Term Accession Number column MUST either contain a value formatted in one of the following formats, or be left empty:

LOCALID of the ontology, which is only applicable if the matching IDSPACE is given in the Term Source REF column
short ontology term identifier formatted as CURIEs (prefixed identifiers) of the form <IDSPACE>:<LOCALID> (specified here)
URL pointing to the ontology term
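
The three accepted formats can be told apart roughly as follows. This is a sketch under stated assumptions: URLs are assumed to start with http:// or https://, the CURIE pattern is a simplification, and the names `classify_accession` and `CURIE` are hypothetical.

```python
import re

# Simplified CURIE shape <IDSPACE>:<LOCALID>; an assumption, not the formal grammar.
CURIE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*:\S+$")

def classify_accession(value, source_ref=""):
    """Classify a Term Accession Number cell as empty, url, curie, localid or invalid."""
    value = value.strip()
    if not value:
        return "empty"
    # URLs are checked first because they also contain a colon.
    if re.match(r"^https?://", value):
        return "url"
    if CURIE.match(value):
        return "curie"
    # A bare LOCALID is only applicable with an IDSPACE in the Term Source REF column.
    if source_ref.strip():
        return "localid"
    return "invalid"
```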

Ontology Annotation Example

For example, a characteristic type organism with a value of Homo sapiens can be qualified with an Ontology Annotation of a term from NCBI Taxonomy as follows:

Characteristic [organism]	Term Source REF (OBI:0100026)	Term Accession Number (OBI:0100026)
Homo sapiens	NCBITaxon	http://…/NCBITAXON_9606

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Unit

Where a value is numeric, a Unit MAY be used to qualify the quantity. In this case, the main column MUST be followed by a Unit column, which in turn SHOULD be further annotated as an Ontology Annotation by being followed by Term Accession Number and Term Source REF columns.

The headers of the annotation columns then refer to the header of the main column.
The values of the annotation columns then refer to the unit, and not to the numeric value of the main column.

For example, in the following, the header ontology term temperature is further qualified with the CURIE PATO:0000146. The value 300 is qualified with the Unit Kelvin, which is further qualified as an Ontology Annotation from the Units Ontology, declared in the Ontology Sources as UO:

Parameter [temperature]	Unit	Term Source REF (PATO:0000146)	Term Accession Number (PATO:0000146)
300	Kelvin	UO	http://…/obo/UO_0000012

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Characteristics

A Characteristic is used as an attribute column following Sources and Samples. This column contains terms describing each material according to the characteristics category indicated in the column header in the pattern Characteristic [<category term>]. For example, a column header Characteristic [organ part] would contain terms describing an organ part. Characteristic SHOULD be used as an attribute column following Input [Source Name], or Input [Sample Name]. The value MUST be free text, numeric, or an Ontology Annotation.

For example, a characteristic type organ part with a value of Liver can be qualified with an Ontology Annotation of a term from MeSH as follows:

Characteristic [organ part]	Term Source REF (UBERON:0000064)	Term Accession Number (UBERON:0000064)
Liver	MeSH	D008099

Note

In this example, the value in the Term Accession Number column is formatted as a LOCALID. The associated IDSPACE to identify the ontology term is given in the Term Source REF column.

Factors

A Factor is an independent variable manipulated by an experimentalist with the intention to affect biological systems in a way that can be measured by an assay. This field holds the actual data for the Factor named between the square brackets (as declared in the Study Factors section of a top-level metadata sheet) so MUST match, for example, Factor [compound]. The value MUST be free text, numeric, or an Ontology Annotation.

Factor [Gender]	Term Source REF (NCIT:C17357)	Term Accession Number (NCIT:C17357)
Male	MeSH	D008297

Note

In this example, the value in the Term Accession Number column is formatted as a LOCALID. The associated IDSPACE to identify the ontology term is given in the Term Source REF column.

Components

A Component is a consumable or reusable physical entity used in the experimental workflow. It is formatted in the pattern Component [<category term>]. The value MUST be free text, numeric, or an Ontology Annotation.

Component [Measurement Device]	Term Source REF (NCIT:C81182)	Term Accession Number (NCIT:C81182)
Illumina MiniSeq	OBI	http://…/obo/OBI_0003114

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Parameters

A Parameter can be used to specify any additional information about the experimental setup that does not fall under the three aforementioned categories. It is formatted in the pattern Parameter [<category term>]. The value MUST be free text, numeric, or an Ontology Annotation.

Parameter [temperature]	Unit	Term Source REF (NCRO:0000029)	Term Accession Number (NCRO:0000029)
300	Kelvin	UO	http://…/obo/UO_0000012

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Comments

A Comment can be used to provide some additional information. Columns headed with Comment [<comment name>] MAY appear anywhere in the Annotation Table. The comment always refers to the Annotation Table. The value MUST be free text.

Comment [Answer to everything]
forty-two

Others

Columns whose headers do not follow any of the formats described above are considered additional payload and are out of the scope of this specification.

Examples

For example, a simple source to sample may be represented as:

Input [Source Name]	Protocol REF	Output [Sample Name]
source1	sample collection	sample1

Where a graph splits or pools, we use the Input or Output column to represent the same nodes.

For example, if we split a source into two samples, we might represent this as:

Input [Source Name]	Protocol REF	Output [Sample Name]
source1	sample collection	sample1
source1	sample collection	sample2

If we pool two sources into a single sample, we might represent this as:

Input [Source Name]	Protocol REF	Output [Sample Name]
source1	sample collection	sample1
source2	sample collection	sample1
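
Because repeated names denote the same node, splits and pools can be recovered from the rows by grouping on the Input and Output values. The following sketch assumes each body row has already been reduced to an (input, output) pair; the function name `find_splits_and_pools` is hypothetical.

```python
from collections import defaultdict

def find_splits_and_pools(rows):
    """rows: iterable of (input_name, output_name) pairs, one per body row."""
    outputs_of = defaultdict(set)
    inputs_of = defaultdict(set)
    for source, target in rows:
        outputs_of[source].add(target)
        inputs_of[target].add(source)
    # A split is one input with several outputs; a pool is one output with several inputs.
    splits = {name: sorted(outs) for name, outs in outputs_of.items() if len(outs) > 1}
    pools = {name: sorted(ins) for name, ins in inputs_of.items() if len(ins) > 1}
    return splits, pools

splits, pools = find_splits_and_pools([("source1", "sample1"), ("source2", "sample1")])
```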

Datamap Table sheets

Datamap Table sheets are used to describe the contents of data files.

In the Datamap Table sheets, column headers MUST have the first letter of each word in upper case, with the exception of the referencing label (REF).

The content of the datamap table MUST be placed in an xlsx table whose name equals datamapTable. Each sheet MUST contain at most one such datamap table. Only cells inside this table are considered as part of the formatted metadata.

Datamap Table sheets are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a data node.

Data column

Every Datamap Table sheet MUST contain a Data column. Every object in this column MUST correspond to a relevant data resource location, following the Data Path Annotation patterns. If the annotation of the Data node refers not to the complete resource, but to a part of it, a Selector MAY be added. This Selector MUST be separated from the resource location using a #, with no whitespace between: location#selector. If appropriate, the Selector SHOULD be formatted according to the IRI fragment selectors specified by the W3C.

The format of the data resource MAY be further qualified using a Data Format column. The Data Format SHOULD be expressed using a MIME format, most commonly consisting of two parts: a type and a subtype, separated by a slash (/), with no whitespace between: type/subtype. If appropriate, a format from the list composed by IANA SHOULD be picked. Unregistered or niche encoding and file formats MAY be indicated instead via the most appropriate URL.

The format and usage info about the Selector MAY be further qualified using a Data Selector Format column. The Data Selector Format SHOULD point to a web resource containing instructions about how the Selector is formatted and how it should be interpreted.

Explication column

Every Datamap Table sheet SHOULD contain an Explication column. The Explication adds explicit meaning to the data node. The value MUST be free text, or an Ontology Annotation.

Explication	Term Source REF	Term Accession Number
average value	OBI	http://…/obo/OBI_0000679

Unit column

Every Datamap Table sheet SHOULD contain a Unit column. The Unit adds a unit of measurement to the data node. The value MUST be free text, or an Ontology Annotation.

Unit	Term Source REF	Term Accession Number
milligram per milliliter	UO	http://…/obo/UO_0000176

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Object Type column

Every Datamap Table sheet SHOULD contain an Object Type column. The Object Type defines the shape or format in which the data node is represented. The value MUST be free text, or an Ontology Annotation.

Object Type	Term Source REF	Term Accession Number
Float	NCIT	http://…/obo/NCIT_C48150

Note

In this example, the value in the Term Accession Number column is formatted as a URL, but shortened for the purpose of markdown-formatting.

Description column

Every Datamap Table sheet SHOULD contain a Description column. The Description gives additional, human-readable context about the data node. The value MUST be free text.

Description
The average protein concentration for the given gene

Generated By column

Every Datamap Table sheet SHOULD contain a Generated By column. The Generated By names the tool which led to the creation of the data node. The value MUST be free text.

If possible, the value in this column MUST correspond to a relevant data resource location, following the Data Path Annotation patterns.

Generated By
GeneStatisticsTool.exe

Comments

A Comment can be used to provide some additional information. Columns headed with Comment [<comment name>] MAY appear anywhere in the Datamap Table. The comment always refers to the Datamap Table. The value MUST be free text.

Comment [Answer to everything]
forty-two

Examples

For example, a simple datamap table representing a tabular datafile might look as follows:

Data	Explication	Term Source REF	Term Accession Number	Unit	Term Source REF	Term Accession Number	Object Type	Term Source REF	Term Accession Number	Description	Generated By
MyData.csv#col=1	Gene Identifier	NCIT	http://…/obo/NCIT_C48664				String	NCIT	http://…/obo/NCIT_C45253	Short hand identifier of the gene coding for the protein.	GeneStatisticsTool.exe
MyData.csv#col=2	average value	OBI	http://…/obo/OBI_0000679	milligram per milliliter	UO	http://…/obo/UO_0000176	Float	NCIT	http://…/obo/NCIT_C48150	The average protein concentration for the given gene	GeneStatisticsTool.exe
MyData.csv#col=3	p-value	OBI	http://…/obo/OBI_0000175				Float	NCIT	http://…/obo/NCIT_C48150	p-value of t-test against control.	GeneStatisticsTool.exe

In this example, the datamap table describes a single data file named MyData.csv. This file contains three columns. The first column contains gene identifiers, the other two results of a statistical analysis performed by the tool GeneStatisticsTool.exe.
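
Since the Term Source REF and Term Accession Number headers repeat within one datamap table, a reader has to attach each of them to the main column on its left. The sketch below shows one way to do this for a single row. The function name `group_row` and the nested dictionary layout are assumptions, and the accession URL is kept in the shortened form used in the example above.

```python
ANNOTATION_COLUMNS = ("Term Source REF", "Term Accession Number")

def group_row(headers, row):
    """Nest repeated annotation columns under the closest main column to their left."""
    record = {}
    current = None
    for header, cell in zip(headers, row):
        if header in ANNOTATION_COLUMNS:
            if current is None:
                raise ValueError(f"{header!r} has no main column to its left")
            record[current][header] = cell
        else:
            # Any other header starts a new main column.
            current = header
            record[current] = {"value": cell}
    return record

headers = ["Data", "Explication", "Term Source REF", "Term Accession Number",
           "Unit", "Term Source REF", "Term Accession Number", "Description"]
row = ["MyData.csv#col=2", "average value", "OBI", "http://…/obo/OBI_0000679",
       "milligram per milliliter", "UO", "http://…/obo/UO_0000176",
       "The average protein concentration for the given gene"]
record = group_row(headers, row)
```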
