From 7950f59c1f07ad554c82ac28486981f6a3f9ef28 Mon Sep 17 00:00:00 2001
From: j-uranic <117292295+j-uranic@users.noreply.github.com>
Date: Thu, 12 Sep 2024 16:00:05 -0400
Subject: [PATCH] Juranic/merfish dirschema fixes (#1362)
* Create merfish-v2.3.yaml
Fixes previous version errors - *.dax extension changed to lowercase and pattern: raw/data/*.inf added.
* Update CHANGELOG.md
* Documentation: Update merfish
---------
Co-authored-by: Juan Puerto <=>
Co-authored-by: jpuerto-psc <68066250+jpuerto-psc@users.noreply.github.com>
---
CHANGELOG.md | 1 +
docs/merfish/current/index.md | 29 +++++-
.../directory-schemas/merfish-v2.3.yaml | 94 +++++++++++++++++++
3 files changed, 123 insertions(+), 1 deletion(-)
create mode 100644 src/ingest_validation_tools/directory-schemas/merfish-v2.3.yaml
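Reviewer note: the schema added below pairs each regular-expression `pattern` with a `required` flag. The following is a minimal sketch of how such entries could be checked against a list of upload-relative paths. It assumes full-match semantics and is not the logic used by ingest_validation_tools, which this patch does not show. `SCHEMA` is a hand-copied subset of merfish-v2.3.yaml, and `check_paths` is a hypothetical helper name.

```python
import re

# Subset of entries copied from merfish-v2.3.yaml as (pattern, required).
# Copied by hand for illustration; the real file has more entries.
SCHEMA = [
    (r"extras\/.*", True),
    (r"extras\/microscope_hardware\.json", True),
    (r"raw\/.*", True),
    (r"raw\/data\/[^\/]+\.dax", True),
    (r"raw\/data\/[^\/]+\.inf", True),
    (r"raw\/micron_to_mosaic_pixel_transform\.csv", False),
]


def check_paths(paths, schema):
    """Hypothetical helper: return (missing, unmatched).

    missing   -- required patterns that no path fully matches
    unmatched -- paths that fully match no pattern at all
    Full-match semantics are an assumption of this sketch.
    """
    compiled = [(re.compile(pattern), required) for pattern, required in schema]
    missing = [
        rx.pattern
        for rx, required in compiled
        if required and not any(rx.fullmatch(p) for p in paths)
    ]
    unmatched = [
        p for p in paths if not any(rx.fullmatch(p) for rx, _ in compiled)
    ]
    return missing, unmatched


if __name__ == "__main__":
    example_paths = [
        "extras/microscope_hardware.json",
        "raw/data/stack_001.DAX",  # upper-case extension
        "raw/data/stack_001.inf",
    ]
    missing, unmatched = check_paths(example_paths, SCHEMA)
    print("missing required patterns:", missing)
    print("paths matching no pattern:", unmatched)
```

Under these assumptions, an upper-case `.DAX` file still falls under the catch-all `raw\/.*` entry but does not satisfy the required lower-case `.dax` pattern, which is the case the extension change in this patch addresses.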
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 39440c3d..4af637c8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,7 @@
## v0.0.26 (in progress)
- Update GeoMx NGS directory schema
+- Update MERFISH directory schema
## v0.0.25
- Update GeoMx NGS directory schema
diff --git a/docs/merfish/current/index.md b/docs/merfish/current/index.md
index f518b3ad..3fedafc5 100644
--- a/docs/merfish/current/index.md
+++ b/docs/merfish/current/index.md
@@ -28,7 +28,34 @@ Related files:
## Directory schemas
-Version 2.2 (use this one)
+Version 2.3 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| extras\/.* | ✓ | Folder for general lab-specific files related to the dataset. |
+| extras\/microscope_hardware\.json | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| extras\/microscope_settings\.json | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| raw\/.* | ✓ | All raw data files for the experiment. |
+| raw\/additional_panels_used\.csv | | If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include:manufacturer, model/name, product code. |
+| raw\/gene_panel\.csv | ✓ | The list of target genes. The expected format is gene_id (Ensembl ID), gene_name. |
+| raw\/custom_probe_set\.csv | | This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include:target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see ). |
+| raw\/micron_to_mosaic_pixel_transform\.csv | | Matrix used to transform from pixels to physical distance. |
+| raw\/manifest\.json | ✓ | This file contains stain by channel details and pixel details. |
+| raw\/codebook\.csv | ✓ | CSV containing codebook information for the experiment. Rows are barcodes and columns are imaging rounds. The first column is the barcode target, and the following column IDs are expected to be sequential, and round identifiers are expected to be integers (not roman numerals). |
+| raw\/positions\.csv | ✓ | File that includes the top left coordinate of each tiled image. This is required to stitch the images. |
+| raw\/dataorganization\.csv | ✓ | Necessary image definitions |
+| raw\/data\/.* | ✓ | All raw stack data files for the MERFISH experiment. |
+| raw\/data\/[^\/]+\.dax | ✓ | The raw image stack. |
+| raw\/data\/[^\/]+\.inf | ✓ | Information file with dax image format specifications. Variables expected for downstream processing with PIPEFISH are frame dimensions, number of frames, little/big endian, stage X and Y locations, lock target, scalemin, and scalemax. |
+| raw\/images\/.* | ✓ | Directory containing raw image files. This directory should include at least one raw file. |
+| raw\/images\/[^\/]+\.tif | ✓ | Raw microscope file for the experiment. |
+| lab_processed\/.* | ✓ | Experiment files that were processed by the lab generating the data. |
+| lab_processed\/detected_transcripts\.csv | ✓ | A file containing the locations of each RNA target. |
+| lab_processed\/images\/.* | ✓ | Processed image files |
+| lab_processed\/images\/[^\/]+\.ome\.tiff (example: lab_processed/images/HBM892.MDXS.293.ome.tiff) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. |
+| lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed |
+
+Version 2.2
| pattern | required? | description |
| --- | --- | --- |
diff --git a/src/ingest_validation_tools/directory-schemas/merfish-v2.3.yaml b/src/ingest_validation_tools/directory-schemas/merfish-v2.3.yaml
new file mode 100644
index 00000000..a44c65b1
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/merfish-v2.3.yaml
@@ -0,0 +1,94 @@
+files:
+  -
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset.
+  -
+    pattern: extras\/microscope_hardware\.json
+    required: True
+    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: extras\/microscope_settings\.json
+    required: False
+    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: raw\/.*
+    required: True
+    description: All raw data files for the experiment.
+  -
+    pattern: raw\/additional_panels_used\.csv
+    required: False
+    description: If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include:manufacturer, model/name, product code.
+  -
+    pattern: raw\/gene_panel\.csv
+    required: True
+    description: The list of target genes. The expected format is gene_id (Ensembl ID), gene_name.
+  -
+    pattern: raw\/custom_probe_set\.csv
+    required: False
+    description: This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include:target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see ).
+  -
+    pattern: raw\/micron_to_mosaic_pixel_transform\.csv
+    required: False
+    description: Matrix used to transform from pixels to physical distance.
+  -
+    pattern: raw\/manifest\.json
+    required: True
+    description: This file contains stain by channel details and pixel details.
+  -
+    pattern: raw\/codebook\.csv
+    required: True
+    description: CSV containing codebook information for the experiment. Rows are barcodes and columns are imaging rounds. The first column is the barcode target, and the following column IDs are expected to be sequential, and round identifiers are expected to be integers (not roman numerals).
+  -
+    pattern: raw\/positions\.csv
+    required: True
+    description: File that includes the top left coordinate of each tiled image. This is required to stitch the images.
+  -
+    pattern: raw\/dataorganization\.csv
+    required: True
+    description: Necessary image definitions
+  -
+    pattern: raw\/data\/.*
+    required: True
+    description: All raw stack data files for the MERFISH experiment.
+  -
+    pattern: raw\/data\/[^\/]+\.dax
+    required: True
+    description: The raw image stack.
+  -
+    pattern: raw\/data\/[^\/]+\.inf
+    required: True
+    description: Information file with dax image format specifications. Variables expected for downstream processing with PIPEFISH are frame dimensions, number of frames, little/big endian, stage X and Y locations, lock target, scalemin, and scalemax.
+  -
+    pattern: raw\/images\/.*
+    required: True
+    description: Directory containing raw image files. This directory should include at least one raw file.
+  -
+    pattern: raw\/images\/[^\/]+\.tif
+    required: True
+    description: Raw microscope file for the experiment.
+  -
+    pattern: lab_processed\/.*
+    required: True
+    description: Experiment files that were processed by the lab generating the data.
+  -
+    pattern: lab_processed\/detected_transcripts\.csv
+    required: True
+    description: A file containing the locations of each RNA target.
+  -
+    pattern: lab_processed\/images\/.*
+    required: True
+    description: Processed image files
+  -
+    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
+    required: True
+    description: OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header.
+    is_qa_qc: False
+    example: lab_processed/images/HBM892.MDXS.293.ome.tiff
+  -
+    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
+    required: True
+    description: This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed
+    is_qa_qc: False