From d4e411e308e95dda6eaa8eeeabed1628791c6bd0 Mon Sep 17 00:00:00 2001
From: j-uranic <117292295+j-uranic@users.noreply.github.com>
Date: Thu, 10 Oct 2024 11:41:25 -0400
Subject: [PATCH] Juranic/segmasks dirschema fix (#1369)
* Create segmentation-mask-v2.1.yaml
Updated parse for derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff
* Update CHANGELOG.md
* Documentation: Update seg mask docs
---------
Co-authored-by: Juan Puerto <=>
---
CHANGELOG.md | 1 +
docs/segmentation-mask/current/index.md | 16 ++++++-
.../segmentation-mask-v2.1.yaml | 42 +++++++++++++++++++
3 files changed, 58 insertions(+), 1 deletion(-)
create mode 100644 src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.1.yaml
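
Reviewer note: the v2.1 patterns added below can be tried against a list of
upload-relative paths with a short script. This is only a sketch, not the
repository's validation code. The helper name `check_paths` is hypothetical,
and whole-path matching with `re.fullmatch` is an assumption about how the
patterns are meant to be applied.

```python
import re

# (pattern, required) pairs copied from segmentation-mask-v2.1.yaml in this patch.
V2_1_PATTERNS = [
    (r"derived\/.*", True),
    (r"derived\/extras\/.*", True),
    (r"derived\/segmentation_masks\/.*", True),
    (r"derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff", True),
    (r"derived\/segmentation_masks\/[^\/]+-objects\.(?:tsv|xlsx)", True),
    (r"derived\/segmentation_masks\/[^\/]+-centroid-adjacency\.csv", False),
    (r"derived\/segmentation_masks\/[^\/]+-linkage-adjacency\.csv", False),
    (r"derived\/segmentation_masks\/[^\/]+-mesh\.glb", False),
    (r"derived\/segmentation_masks\/transformations\/.*", False),
]


def check_paths(paths):
    """Hypothetical helper: return (unmatched_paths, unsatisfied_required_patterns).

    Assumption: a path is acceptable if it fully matches at least one pattern,
    and a required pattern is satisfied if at least one path fully matches it.
    """
    compiled = [(re.compile(pattern), required) for pattern, required in V2_1_PATTERNS]
    unmatched = [
        path for path in paths
        if not any(rx.fullmatch(path) for rx, _ in compiled)
    ]
    missing = [
        rx.pattern for rx, required in compiled
        if required and not any(rx.fullmatch(path) for path in paths)
    ]
    return unmatched, missing
```

Under these assumptions, an upload containing `derived/extras/notes.txt`,
`derived/segmentation_masks/cells.segmentations.ome.tiff`, and
`derived/segmentation_masks/cells-objects.tsv` satisfies every required
pattern, while a file named `cells-objects.csv` would leave the required
`-objects` pattern unsatisfied because only `.tsv` and `.xlsx` are accepted.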
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6abf268d..a0bf9b4b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@
- Update Visium with probes directory schema
- Update Visium with probes directory schema pt 2
- Update Visium with probes directory schema pt 3
+- Update Segmentation masks directory schema
## v0.0.25
- Update GeoMx NGS directory schema
diff --git a/docs/segmentation-mask/current/index.md b/docs/segmentation-mask/current/index.md
index b9fb8b7b..049a9906 100644
--- a/docs/segmentation-mask/current/index.md
+++ b/docs/segmentation-mask/current/index.md
@@ -28,7 +28,21 @@ For additional documentation on Segmentation Masks, please visit [here](https://
## Directory schemas
-Version 2.0 (use this one)
+Version 2.1 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| derived\/.* | ✓ | The EPIC data is placed in TOP/derived/ so it does not conflict with any files if it is uploaded with a primary dataset. |
+| derived\/extras\/.* | ✓ | Folder for general lab-specific files related to the derived dataset. |
+| derived\/segmentation_masks\/.* | ✓ | Directory containing segmentation masks. |
+| derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff | ✓ | The segmentation masks should be stored as multi-channel pyramidal OME TIFF bitmasks with one channel per mask, where a single mask contains all instances of a type of object (e.g., all cells, a class of FTUs, etc). The class of objects contained in the mask is documented in the segmentation-masks.csv file. Each individual object in a mask should be represented by a unique integer pixel value starting at 1, with 0 meaning background (e.g., all pixels belonging to the first instance of a T-cell have a value of 1, the pixels for the second instance of a T-cell have a value of 2, etc). The pixel values should be unique within a mask. FTUs and other structural elements should be captured the same way as cells with segmentation masks and the appropriate channel feature definitions. |
+| derived\/segmentation_masks\/[^\/]+-objects\.(?:tsv|xlsx) | ✓ | This is a matrix where each row describes an individual object (e.g., one row per cell in the case where a mask contains all cells) and columns are features (e.g., object type, marker intensity, classification strategies). One file should be created per mask with the name of the mask prepended to the file name. For example, if there is a cell segmentation map called "cells" then you would include a file called "cells-objects.tsv" and that file would contain one row per cell in the "cells" mask and one column per feature, such as marker intensity and/or cell type. A minimum set of fields (required and optional) is included below. |
+| derived\/segmentation_masks\/[^\/]+-centroid-adjacency\.csv | | Objects are required to be in the same mask. A separate centroid-adjacency file can be included per mask. |
+| derived\/segmentation_masks\/[^\/]+-linkage-adjacency\.csv | | Objects are required to be in the same mask. A separate linkage-adjacency file can be included per mask. |
+| derived\/segmentation_masks\/[^\/]+-mesh\.glb | | This is a file with 3D mesh images for a 3D map. |
+| derived\/segmentation_masks\/transformations\/.* | | This directory should include any transformation files that pertain to a 3D reconstruction from serial sections. The mask protocol should explain the structure of these transformation files and how they can be used to reconstruct the 3D map from the 2D sections. |
+
+Version 2.0
| pattern | required? | description |
| --- | --- | --- |
diff --git a/src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.1.yaml b/src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.1.yaml
new file mode 100644
index 00000000..82b7a8ca
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/segmentation-mask-v2.1.yaml
@@ -0,0 +1,42 @@
+files:
+  -
+    pattern: derived\/.*
+    required: True
+    description: The EPIC data is placed in TOP/derived/ so it does not conflict with any files if it is uploaded with a primary dataset.
+  -
+    pattern: derived\/extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the derived dataset.
+  -
+    pattern: derived\/segmentation_masks\/.*
+    required: True
+    description: Directory containing segmentation masks.
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+\.segmentations\.ome\.tiff
+    required: True
+    description: The segmentation masks should be stored as multi-channel pyramidal OME TIFF bitmasks with one channel per mask, where a single mask contains all instances of a type of object (e.g., all cells, a class of FTUs, etc). The class of objects contained in the mask is documented in the segmentation-masks.csv file. Each individual object in a mask should be represented by a unique integer pixel value starting at 1, with 0 meaning background (e.g., all pixels belonging to the first instance of a T-cell have a value of 1, the pixels for the second instance of a T-cell have a value of 2, etc). The pixel values should be unique within a mask. FTUs and other structural elements should be captured the same way as cells with segmentation masks and the appropriate channel feature definitions.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-objects\.(?:tsv|xlsx)
+    required: True
+    description: This is a matrix where each row describes an individual object (e.g., one row per cell in the case where a mask contains all cells) and columns are features (e.g., object type, marker intensity, classification strategies). One file should be created per mask with the name of the mask prepended to the file name. For example, if there is a cell segmentation map called "cells" then you would include a file called "cells-objects.tsv" and that file would contain one row per cell in the "cells" mask and one column per feature, such as marker intensity and/or cell type. A minimum set of fields (required and optional) is included below.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-centroid-adjacency\.csv
+    required: False
+    description: Objects are required to be in the same mask. A separate centroid-adjacency file can be included per mask.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-linkage-adjacency\.csv
+    required: False
+    description: Objects are required to be in the same mask. A separate linkage-adjacency file can be included per mask.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/[^\/]+-mesh\.glb
+    required: False
+    description: This is a file with 3D mesh images for a 3D map.
+    is_qa_qc: False
+  -
+    pattern: derived\/segmentation_masks\/transformations\/.*
+    required: False
+    description: This directory should include any transformation files that pertain to a 3D reconstruction from serial sections. The mask protocol should explain the structure of these transformation files and how they can be used to reconstruct the 3D map from the 2D sections.