From f2c40fd6c6d6abf9a290da89eacc971e630243fc Mon Sep 17 00:00:00 2001
From: j-uranic <117292295+j-uranic@users.noreply.github.com>
Date: Thu, 14 Nov 2024 15:50:37 -0500
Subject: [PATCH] J uranic/dirschema updates (#1384)
* Create segmentation-mask-v2.2.yaml
Add top-level "extras" directory
* Create af-v2.1.yaml
raw\/channel_layout\.tsv - make optional
raw\/images\/[^\/]+\.(?:xml|nd2|oir|lif|czi|tiff|qptiff) - add qptiff
* Create codex-v2.1.yaml
Add .* to end of processed\/drv_[^\/]*\/ and raw\/src_[^\/]*\/
* Update CHANGELOG.md
* Docs: Update dir schemas for af, seg-mask, and codex
---------
Co-authored-by: Juan Puerto <=>
---
CHANGELOG.md | 7 +-
docs/af/current/index.md | 27 +++++-
docs/codex/current/index.md | 24 ++++-
docs/segmentation-mask/current/index.md | 17 +++-
.../directory-schemas/af-v2.1.yaml | 89 +++++++++++++++++++
.../directory-schemas/codex-v2.1.yaml | 80 +++++++++++++++++
.../segmentation-mask-v2.2.yaml | 46 ++++++++++
7 files changed, 286 insertions(+), 4 deletions(-)
create mode 100644 src/ingest_validation_tools/directory-schemas/af-v2.1.yaml
create mode 100644 src/ingest_validation_tools/directory-schemas/codex-v2.1.yaml
create mode 100644 src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.2.yaml
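Reviewers may find it useful to see how the `files` entries in these directory schemas could be checked against an upload. Below is a minimal Python sketch. It has not been confirmed against the ingest-validation-tools source and rests on three assumptions. First, each `pattern` must match a whole path relative to the upload root (`re.fullmatch`). Second, every path must match at least one pattern. Third, each `required: True` pattern must match at least one path. `FileRule` and `check_paths` are hypothetical names and are not the package's API.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileRule:
    """One entry of a directory schema's `files` list (hypothetical helper)."""
    pattern: str
    required: bool


def check_paths(rules: List[FileRule], paths: List[str]) -> Tuple[List[str], List[str]]:
    """Return (paths matched by no rule, required patterns matched by no path).

    Assumes whole-path matching via re.fullmatch on paths relative to the
    upload root; an illustration only, not ingest-validation-tools code.
    """
    compiled = [(rule, re.compile(rule.pattern)) for rule in rules]
    unmatched = [
        path for path in paths
        if not any(regex.fullmatch(path) for _, regex in compiled)
    ]
    missing = [
        rule.pattern
        for rule, regex in compiled
        if rule.required and not any(regex.fullmatch(path) for path in paths)
    ]
    return unmatched, missing


# A subset of the required rules from segmentation-mask-v2.2.yaml.
rules = [
    FileRule(r"extras\/.*", True),
    FileRule(r"derived\/.*", True),
    FileRule(r"derived\/extras\/.*", True),
    FileRule(r"derived\/segmentation_masks\/.*", True),
    FileRule(r"derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff", True),
    FileRule(r"derived\/segmentation_masks\/[^\/]+-objects\.(?:tsv|xlsx)", True),
]

# Hypothetical upload listing (paths relative to the upload root).
paths = [
    "extras/notes.txt",
    "derived/extras/protocol.pdf",
    "derived/segmentation_masks/kidney.segmentations.ome.tiff",
    "derived/segmentation_masks/cells-objects.csv",
    "README.md",
]

unmatched, missing = check_paths(rules, paths)
print(unmatched)  # ['README.md']
for pattern in missing:
    print("missing required:", pattern)
```

In this run the stray `README.md` is reported as unmatched. The objects-table rule is reported as missing because `cells-objects.csv` does not satisfy `-objects\.(?:tsv|xlsx)`. Broad directory rules such as `derived\/.*` match every file beneath them, so under these assumptions the unmatched check mostly catches files outside the expected top-level directories.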
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 44051ccc..3c36a9a4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,12 @@
# Changelog
-## v0.0.29 (in progress)
+## v0.0.30 (in progress)
+
+## v0.0.29
- Add CosMX directory schema
- Update CosMX directory schema
+- Update Segmentation masks directory schema
+- Update Auto-fluorescence directory schema
+- Update CODEX directory schema
## v0.0.28
- Update Xenium directory schema
diff --git a/docs/af/current/index.md b/docs/af/current/index.md
index d3a941de..da75a822 100644
--- a/docs/af/current/index.md
+++ b/docs/af/current/index.md
@@ -28,7 +28,32 @@ This schema is for autofluorescence (AF). For an example of an AF dataset & dire
## Directory schemas
-Version 2.0 (use this one)
+Version 2.1 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| `extras\/.*` | ✓ | Folder for general lab-specific files related to the dataset. [Exists in all assays] |
+| `extras\/microscope_hardware\.json` | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| `extras\/microscope_settings\.json` | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| `raw\/.*` | ✓ | Raw data files for the experiment. |
+| `raw\/channel_layout\.tsv` | | Table that includes a dictionary for channel to moiety, which may be a protein given in an OMAP panel or captured in the ASCT+B table. |
+| `raw\/images\/.*` | ✓ | Raw image files. Using this subdirectory allows for harmonization with other imaging assays. [This directory must include at least one raw file.] |
+| `raw\/images\/[^\/]+\.(?:xml\|nd2\|oir\|lif\|czi\|tiff\|qptiff)` | ✓ | Raw microscope file for the experiment |
+| `lab_processed\/.*` | ✓ | Experiment files that were processed by the lab generating the data. |
+| `lab_processed\/images\/.*` | ✓ | Processed image files |
+| `lab_processed\/images\/[^\/]+\.ome\.tiff` (example: `lab_processed/images/HBM892.MDXS.293.ome.tiff`) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, a lossless compression algorithm must be used. See the following link for the set of fields that are required in the OME TIFF file XML header. |
+| `lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv` | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed |
+| `lab_processed\/transformations\/.*` | | This directory contains transformation matrices that capture how each modality is aligned with the others and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases, the data types may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another). |
+| `lab_processed\/transformations\/[^\/]+\.txt` | | Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). |
+| `qa_qc\/.*` | ✓ | Directory containing QA and/or QC information. |
+| `qa_qc\/resolution_report\/.*` | ✓ | Directory containing the results of resolution tests and/or vendor preventative maintenance reports. |
+| `qa_qc\/resolution_report\/resolution\.txt` | | This file summarizes the results of resolution tests or vendor reports from preventative maintenance visits. |
+| `qa_qc\/resolution_report\/[^\/]+\.pdf` | | This file is a PDF from a vendor preventative maintenance visit or resolution check tool demonstrating resolution. This file may include illumination test results. |
+| `qa_qc\/illumination_report\/.*` | ✓ | Directory containing the results of illumination tests and/or vendor preventative maintenance reports. |
+| `qa_qc\/illumination_report\/illumination\.txt` | | This file summarizes the results of illumination tests or vendor reports from preventative maintenance visits. |
+| `qa_qc\/illumination_report\/[^\/]+\.pdf` | | This file is a PDF from a vendor preventative maintenance visit or illumination check tool demonstrating illumination intensity. |
+
+Version 2.0
| pattern | required? | description |
| --- | --- | --- |
diff --git a/docs/codex/current/index.md b/docs/codex/current/index.md
index 137e3b72..f3a70869 100644
--- a/docs/codex/current/index.md
+++ b/docs/codex/current/index.md
@@ -28,7 +28,29 @@ Related files:
## Directory schemas
-Version 2.0 (use this one)
+Version 2.1 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| `extras\/.*` | ✓ | Folder for general lab-specific files related to the dataset. [Exists in all assays] |
+| `extras\/microscope_hardware\.json` | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| `extras\/microscope_settings\.json` | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| `raw\/.*` | ✓ | This is a directory containing raw data. |
+| `lab_processed\/images\/[^\/]+\.ome\.tiff` | ✓ | OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, a lossless compression algorithm must be used. See the following link for the set of fields that are required in the OME TIFF file XML header. |
+| `lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv` | ✓ | This file should describe any processing that was done to generate the images in each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. Two columns should be booleans "is this a channel to use for nuclei segmentation" and "is this a channel to use for cell segmentation". |
+| `[^\/]*NAV[^\/]*\.tif` (example: `NAV.tif`) | | Navigational Image showing Region of Interest (Keyence Microscope only) |
+| `[^\/]+\.pdf` (example: `summary.pdf`) | | **[QA/QC]** PDF export of PowerPoint slide deck containing the Image Analysis Report |
+| `extras\/dir-schema-v2-with-dataset-json` | ✓ | Empty file whose presence indicates the version of the directory schema in use |
+| `processed\/drv_[^\/]*\/.*` | ✓ | Processed files produced by the Akoya software or alternative software. |
+| `raw\/cyc[^\/]*_reg[^\/]*\/.*` | ✓ | Intermediary directory |
+| `raw\/src_[^\/]*\/.*` | ✓ | Intermediary directory |
+| `raw\/cyc[^\/]*_reg[^\/]*\/[^\/]*_z[^\/]*_CH[^\/]*\.tif` | ✓ | TIFF files produced by the experiment. General folder format: Cycle(n)_Region(n)_date; General file format: name_tileNumber(n)_zplaneNumber(n)_channelNumber(n) |
+| `raw\/src_[^\/]*\/cyc[^\/]*_reg[^\/]*_[^\/]*\/[^\/]+\.gci` | | Group Capture Information File (Keyence Microscope only) |
+| `raw\/dataset\.json` (example: `raw/dataset.json`) | ✓ | Data processing parameters file. This will include additional CODEX-specific metadata needed for the HIVE processing workflow. |
+| `raw\/reg_[^\/]*\.png` (example: `raw/reg_00.png`) | | Region overviews |
+| `raw\/experiment\.json` (example: `raw/experiment.json`) | | JSON file produced by the Akoya software which contains the metadata for the experiment, including the software version used, microscope parameters, channel names, pixel dimensions, etc. (required for HuBMAP pipeline) |
+
+Version 2.0
| pattern | required? | description |
| --- | --- | --- |
diff --git a/docs/segmentation-mask/current/index.md b/docs/segmentation-mask/current/index.md
index 2fc54f93..37fb0828 100644
--- a/docs/segmentation-mask/current/index.md
+++ b/docs/segmentation-mask/current/index.md
@@ -28,7 +28,22 @@ For additional documentation on Segmentation Masks, please visit [here](https://
## Directory schemas
-Version 2.1 (use this one)
+Version 2.2 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| `extras\/.*` | ✓ | Folder for general lab-specific files related to the dataset. |
+| `derived\/.*` | ✓ | The EPIC data is placed in TOP/derived/ so it does not conflict with any files if it is uploaded with a primary dataset. |
+| `derived\/extras\/.*` | ✓ | Folder for general lab-specific files related to the derived dataset. |
+| `derived\/segmentation_masks\/.*` | ✓ | Directory containing segmentation masks. |
+| `derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff` | ✓ | The segmentation masks should be stored as multi-channel pyramidal OME TIFF bitmasks with one channel per mask, where a single mask contains all instances of a type of object (e.g., all cells, a class of FTUs, etc.). The class of objects contained in the mask is documented in the segmentation-masks.csv file. Each individual object in a mask should be represented by a unique integer pixel value starting at 1, with 0 meaning background (e.g., all pixels belonging to the first instance of a T-cell have a value of 1, the pixels for the second instance of a T-cell have a value of 2, etc.). The pixel values should be unique within a mask. FTUs and other structural elements should be captured the same way as cells, with segmentation masks and the appropriate channel feature definitions. |
+| `derived\/segmentation_masks\/[^\/]+-objects\.(?:tsv\|xlsx)` | ✓ | This is a matrix where each row describes an individual object (e.g., one row per cell in the case where a mask contains all cells) and columns are features (e.g., object type, marker intensity, classification strategies, etc.). One file should be created per mask, with the name of the mask prepended to the file name. For example, if there is a cell segmentation mask called "cells", then you would include a file called "cells-objects.tsv", and that file would contain one row per cell in the "cells" mask and one column per feature, such as marker intensity and/or cell type. A minimum set of fields (required and optional) is included below. |
+| `derived\/segmentation_masks\/[^\/]+-centroid-adjacency\.csv` | | Objects are required to be in the same mask. A separate centroid-adjacency file can be included per mask. |
+| `derived\/segmentation_masks\/[^\/]+-linkage-adjacency\.csv` | | Objects are required to be in the same mask. A separate linkage-adjacency file can be included per mask. |
+| `derived\/segmentation_masks\/[^\/]+-mesh\.glb` | | This is a file with 3D mesh images for a 3D map. |
+| `derived\/segmentation_masks\/transformations\/.*` | | This directory should include any transformation files that pertain to a 3D reconstruction from serial sections. The mask protocol should explain the structure of these transformation files and how they can be used to reconstruct the 3D map from the 2D sections. |
+
+Version 2.1
| pattern | required? | description |
| --- | --- | --- |
diff --git a/src/ingest_validation_tools/directory-schemas/af-v2.1.yaml b/src/ingest_validation_tools/directory-schemas/af-v2.1.yaml
new file mode 100644
index 00000000..2aeff09f
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/af-v2.1.yaml
@@ -0,0 +1,89 @@
+files:
+  -
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset. [Exists in all assays]
+  -
+    pattern: extras\/microscope_hardware\.json
+    required: True
+    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: extras\/microscope_settings\.json
+    required: False
+    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: raw\/.*
+    required: True
+    description: Raw data files for the experiment.
+  -
+    pattern: raw\/channel_layout\.tsv
+    required: False
+    description: Table that includes a dictionary for channel to moiety, which may be a protein given in an OMAP panel or captured in the ASCT+B table.
+    is_qa_qc: False
+  -
+    pattern: raw\/images\/.*
+    required: True
+    description: Raw image files. Using this subdirectory allows for harmonization with other imaging assays. [This directory must include at least one raw file.]
+  -
+    pattern: raw\/images\/[^\/]+\.(?:xml|nd2|oir|lif|czi|tiff|qptiff)
+    required: True
+    description: Raw microscope file for the experiment
+    is_qa_qc: False
+  -
+    pattern: lab_processed\/.*
+    required: True
+    description: Experiment files that were processed by the lab generating the data.
+  -
+    pattern: lab_processed\/images\/.*
+    required: True
+    description: Processed image files
+  -
+    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
+    required: True
+    description: OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, a lossless compression algorithm must be used. See the following link for the set of fields that are required in the OME TIFF file XML header.
+    is_qa_qc: False
+    example: lab_processed/images/HBM892.MDXS.293.ome.tiff
+  -
+    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
+    required: True
+    description: This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed
+    is_qa_qc: False
+  -
+    pattern: lab_processed\/transformations\/.*
+    required: False
+    description: This directory contains transformation matrices that capture how each modality is aligned with the others and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases, the data types may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another).
+  -
+    pattern: lab_processed\/transformations\/[^\/]+\.txt
+    required: False
+    description: Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains).
+    is_qa_qc: False
+  -
+    pattern: qa_qc\/.*
+    required: True
+    description: Directory containing QA and/or QC information.
+  -
+    pattern: qa_qc\/resolution_report\/.*
+    required: True
+    description: Directory containing the results of resolution tests and/or vendor preventative maintenance reports.
+  -
+    pattern: qa_qc\/resolution_report\/resolution\.txt
+    required: False
+    description: This file summarizes the results of resolution tests or vendor reports from preventative maintenance visits.
+  -
+    pattern: qa_qc\/resolution_report\/[^\/]+\.pdf
+    required: False
+    description: This file is a PDF from a vendor preventative maintenance visit or resolution check tool demonstrating resolution. This file may include illumination test results.
+  -
+    pattern: qa_qc\/illumination_report\/.*
+    required: True
+    description: Directory containing the results of illumination tests and/or vendor preventative maintenance reports.
+  -
+    pattern: qa_qc\/illumination_report\/illumination\.txt
+    required: False
+    description: This file summarizes the results of illumination tests or vendor reports from preventative maintenance visits.
+  -
+    pattern: qa_qc\/illumination_report\/[^\/]+\.pdf
+    required: False
+    description: This file is a PDF from a vendor preventative maintenance visit or illumination check tool demonstrating illumination intensity.
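The following is a quick Python check of the AF v2.1 raw-image rule, which now accepts `qptiff`. It again assumes whole-path matching with `re.fullmatch`, and the file names are hypothetical. Under that assumption, `.tif` is rejected because only `tiff` is listed. Files in a nested subdirectory of `raw/images/` do not satisfy this required rule, because `[^\/]+` cannot cross a `/`. Matching is also case-sensitive.

```python
import re

# Raw-image rule from af-v2.1.yaml; v2.1 adds "qptiff" to the extension list.
RAW_IMAGE = re.compile(r"raw\/images\/[^\/]+\.(?:xml|nd2|oir|lif|czi|tiff|qptiff)")

# Hypothetical paths mapped to the expected full-match result.
cases = {
    "raw/images/slide_01.qptiff": True,
    "raw/images/slide_01.czi": True,
    "raw/images/slide_01.tif": False,          # only "tiff" is listed, not "tif"
    "raw/images/batch_1/slide_01.czi": False,  # [^\/]+ cannot cross a "/"
    "raw/images/slide_01.QPTIFF": False,       # matching is case-sensitive
}

for path, expected in cases.items():
    matched = RAW_IMAGE.fullmatch(path) is not None
    print(f"{path}: {'match' if matched else 'no match'}")
    assert matched == expected, path
```

A nested file would still be covered by the broader `raw\/images\/.*` rule. It just would not count toward the required "raw microscope file" entry.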
diff --git a/src/ingest_validation_tools/directory-schemas/codex-v2.1.yaml b/src/ingest_validation_tools/directory-schemas/codex-v2.1.yaml
new file mode 100644
index 00000000..4cb48a98
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/codex-v2.1.yaml
@@ -0,0 +1,80 @@
+files:
+  -
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset. [Exists in all assays]
+  -
+    pattern: extras\/microscope_hardware\.json
+    required: True
+    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: extras\/microscope_settings\.json
+    required: False
+    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: raw\/.*
+    required: True
+    description: This is a directory containing raw data.
+  -
+    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
+    required: True
+    description: OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, a lossless compression algorithm must be used. See the following link for the set of fields that are required in the OME TIFF file XML header.
+    is_qa_qc: False
+  -
+    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
+    required: True
+    description: This file should describe any processing that was done to generate the images in each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. Two columns should be booleans "is this a channel to use for nuclei segmentation" and "is this a channel to use for cell segmentation".
+  -
+    pattern: '[^\/]*NAV[^\/]*\.tif'
+    required: False
+    description: Navigational Image showing Region of Interest (Keyence Microscope only)
+    is_qa_qc: False
+    example: NAV.tif
+  -
+    pattern: '[^\/]+\.pdf'
+    required: False
+    description: PDF export of PowerPoint slide deck containing the Image Analysis Report
+    is_qa_qc: True
+    example: summary.pdf
+  -
+    pattern: extras\/dir-schema-v2-with-dataset-json
+    required: True
+    description: Empty file whose presence indicates the version of the directory schema in use
+    is_qa_qc: False
+  -
+    pattern: processed\/drv_[^\/]*\/.*
+    required: True
+    description: Processed files produced by the Akoya software or alternative software.
+  -
+    pattern: raw\/cyc[^\/]*_reg[^\/]*\/.*
+    required: True
+    description: Intermediary directory
+  -
+    pattern: raw\/src_[^\/]*\/.*
+    required: True
+    description: Intermediary directory
+  -
+    pattern: raw\/cyc[^\/]*_reg[^\/]*\/[^\/]*_z[^\/]*_CH[^\/]*\.tif
+    required: True
+    description: 'TIFF files produced by the experiment. General folder format: Cycle(n)_Region(n)_date; General file format: name_tileNumber(n)_zplaneNumber(n)_channelNumber(n)'
+  -
+    pattern: raw\/src_[^\/]*\/cyc[^\/]*_reg[^\/]*_[^\/]*\/[^\/]+\.gci
+    required: False
+    description: Group Capture Information File (Keyence Microscope only)
+  -
+    pattern: raw\/dataset\.json
+    required: True
+    description: Data processing parameters file. This will include additional CODEX-specific metadata needed for the HIVE processing workflow.
+    example: raw/dataset.json
+  -
+    pattern: raw\/reg_[^\/]*\.png
+    required: False
+    description: Region overviews
+    example: raw/reg_00.png
+  -
+    pattern: raw\/experiment\.json
+    required: False
+    description: JSON file produced by the Akoya software which contains the metadata for the experiment, including the software version used, microscope parameters, channel names, pixel dimensions, etc. (required for HuBMAP pipeline)
+    example: raw/experiment.json
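The commit message says `.*` was appended to `processed\/drv_[^\/]*\/` and `raw\/src_[^\/]*\/`. The Python sketch below shows the effect on the `drv_` rule, assuming patterns are applied with `re.fullmatch` to paths relative to the upload root. Under that assumption the old form matches only the directory path itself, while the new form also matches files beneath it. The paths are hypothetical.

```python
import re

# Before and after the change in the commit message: ".*" appended to
# processed\/drv_[^\/]*\/ (raw\/src_[^\/]*\/ was changed the same way).
OLD = re.compile(r"processed\/drv_[^\/]*\/")
NEW = re.compile(r"processed\/drv_[^\/]*\/.*")

# Hypothetical paths: the drv_ directory itself and a file nested inside it.
directory = "processed/drv_run1/"
nested = "processed/drv_run1/stitched/reg1_stitched.tif"

for path in (directory, nested):
    old_ok = OLD.fullmatch(path) is not None
    new_ok = NEW.fullmatch(path) is not None
    print(f"{path}: old={old_ok} new={new_ok}")

# Prefix matching (re.match) accepts the nested file even without ".*",
# so the suffix only matters if the validator requires a full match.
print(OLD.match(nested) is not None)  # True
```

With full matching, the printed results are `old=True new=True` for the directory and `old=False new=True` for the nested file.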
diff --git a/src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.2.yaml b/src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.2.yaml
new file mode 100644
index 00000000..95c888e2
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.2.yaml
@@ -0,0 +1,46 @@
+files:
+  -
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset.
+  -
+    pattern: derived\/.*
+    required: True
+    description: The EPIC data is placed in TOP/derived/ so it does not conflict with any files if it is uploaded with a primary dataset.
+  -
+    pattern: derived\/extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the derived dataset.
+  -
+    pattern: derived\/segmentation_masks\/.*
+    required: True
+    description: Directory containing segmentation masks.
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff
+    required: True
+    description: The segmentation masks should be stored as multi-channel pyramidal OME TIFF bitmasks with one channel per mask, where a single mask contains all instances of a type of object (e.g., all cells, a class of FTUs, etc.). The class of objects contained in the mask is documented in the segmentation-masks.csv file. Each individual object in a mask should be represented by a unique integer pixel value starting at 1, with 0 meaning background (e.g., all pixels belonging to the first instance of a T-cell have a value of 1, the pixels for the second instance of a T-cell have a value of 2, etc.). The pixel values should be unique within a mask. FTUs and other structural elements should be captured the same way as cells, with segmentation masks and the appropriate channel feature definitions.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-objects\.(?:tsv|xlsx)
+    required: True
+    description: This is a matrix where each row describes an individual object (e.g., one row per cell in the case where a mask contains all cells) and columns are features (e.g., object type, marker intensity, classification strategies, etc.). One file should be created per mask, with the name of the mask prepended to the file name. For example, if there is a cell segmentation mask called "cells", then you would include a file called "cells-objects.tsv", and that file would contain one row per cell in the "cells" mask and one column per feature, such as marker intensity and/or cell type. A minimum set of fields (required and optional) is included below.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-centroid-adjacency\.csv
+    required: False
+    description: Objects are required to be in the same mask. A separate centroid-adjacency file can be included per mask.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-linkage-adjacency\.csv
+    required: False
+    description: Objects are required to be in the same mask. A separate linkage-adjacency file can be included per mask.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-mesh\.glb
+    required: False
+    description: This is a file with 3D mesh images for a 3D map.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/transformations\/.*
+    required: False
+    description: This directory should include any transformation files that pertain to a 3D reconstruction from serial sections. The mask protocol should explain the structure of these transformation files and how they can be used to reconstruct the 3D map from the 2D sections.
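The segmentation-mask description sets a labeling convention for each mask channel: 0 is background, and every object gets its own positive integer starting at 1. Below is a minimal pure-Python sketch that checks one channel already loaded as a 2-D list of integers; reading the pyramidal OME-TIFF is not shown. `mask_object_labels` is a hypothetical helper. Requiring consecutive labels 1..N is an extra assumption: the schema text suggests this numbering but does not state it outright.

```python
from typing import List


def mask_object_labels(mask: List[List[int]]) -> List[int]:
    """Return the sorted object labels found in one mask channel.

    From the schema description: 0 is background and each object has its own
    positive integer label starting at 1. The consecutive-labels check is an
    added assumption, not a stated schema requirement.
    """
    labels = {value for row in mask for value in row}
    if any(not isinstance(v, int) or v < 0 for v in labels):
        raise ValueError("mask values must be non-negative integers")
    objects = sorted(labels - {0})
    if objects and objects != list(range(1, len(objects) + 1)):
        raise ValueError(f"expected labels 1..{len(objects)}, found {objects}")
    return objects


# Hypothetical 4x5 channel containing two T-cells (labels 1 and 2).
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0],
]
print(mask_object_labels(toy_mask))  # [1, 2]
```

The number of labels returned should equal the number of rows in that mask's `-objects` table, assuming the one-row-per-object layout the schema describes.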