From 38863a321f9540feab9357bc98327f6bf54cb907 Mon Sep 17 00:00:00 2001
From: j-uranic <117292295+j-uranic@users.noreply.github.com>
Date: Tue, 27 Aug 2024 16:03:15 -0400
Subject: [PATCH] Juranic/mibi visium dirschema update (#1352)

* Create mibi-v2.1.yaml

Add extras\/microscope_hardware\.json and extras\/microscope_settings\.json; change raw\/images\/[^\/]+\.ome\.tiff from required to optional

* Create visium-no-probes-v2.2.yaml

Remove segmentation masks (lab_processed\/annotations\/.*, that has segmentation-masks-v2 dependency)

* Update CHANGELOG.md

* General: Update mibi & visium docs.

---------

Co-authored-by: jpuerto-psc <68066250+jpuerto-psc@users.noreply.github.com>
Co-authored-by: Juan Puerto <=>
---
 CHANGELOG.md                           |  2 +
 docs/mibi/current/index.md             | 18 ++++-
 docs/visium-no-probes/current/index.md | 24 +++++-
 .../directory-schemas/mibi-v2.1.yaml   | 48 ++++++++++++
 .../visium-no-probes-v2.2.yaml         | 78 +++++++++++++++++++
 5 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100644 src/ingest_validation_tools/directory-schemas/mibi-v2.1.yaml
 create mode 100644 src/ingest_validation_tools/directory-schemas/visium-no-probes-v2.2.yaml

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1814a1e60..aab2aecc7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,8 @@
 - Update Phenocycler docs
 - Update MERFISH directory schema
 - Add next-gen Cell DIVE directory schema
+- Update MIBI directory schema
+- Update Visium no probes directory schema
 
 ## v0.0.23
 - Add token to validation_utils.get_assaytype_data, replace URL string concatenation with urllib
diff --git a/docs/mibi/current/index.md b/docs/mibi/current/index.md
index 5da6052ad..8cbd8e9c3 100644
--- a/docs/mibi/current/index.md
+++ b/docs/mibi/current/index.md
@@ -28,7 +28,23 @@ Related files:
 
 ## Directory schemas
 
-Version 2.0 (use this one)
+Version 2.1 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| extras\/.* | ✓ | Folder for general lab-specific files related to the dataset. [Exists in all assays] |
+| extras\/microscope_hardware\.json | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| extras\/microscope_settings\.json | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| raw\/.* | ✓ | This is a directory containing raw data. |
+| raw\/images\/.* | ✓ | Raw image files. Using this subdirectory allows for harmonization with other more complex assays, like Visium that includes both raw imaging and sequencing data. |
+| raw\/images\/[^\/]+\.ome\.tiff | | Raw image file. |
+| raw\/images\/tiles\.csv | | This file contains the approximate coordinates for each of the tiled raw images. |
+| lab_processed\/.* | ✓ | Experiment files that were processed by the lab generating the data. |
+| lab_processed\/images\/.* | ✓ | This is a directory containing processed image files |
+| lab_processed\/images\/[^\/]+\.ome\.tiff | ✓ | OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. |
+| lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv | ✓ | This file provides essential documentation pertaining to each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed |
+
+Version 2.0
 
 | pattern | required? | description |
 | --- | --- | --- |
diff --git a/docs/visium-no-probes/current/index.md b/docs/visium-no-probes/current/index.md
index f6d37d4d5..c8254609e 100644
--- a/docs/visium-no-probes/current/index.md
+++ b/docs/visium-no-probes/current/index.md
@@ -30,7 +30,29 @@ REQUIRED - For this assay, you must also prepare and submit two additional metad
 
 ## Directory schemas
 
-Version 2.1 (use this one)
+Version 2.2 (use this one)
+
+| pattern | required? | description |
+| --- | --- | --- |
+| extras\/.* | ✓ | Folder for general lab-specific files related to the dataset |
+| extras\/microscope_hardware\.json | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| extras\/microscope_settings\.json | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document. |
+| raw\/.* | ✓ | All raw data files for the experiment. |
+| raw\/[^\/]+\.gpr | ✓ | This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form along with the unique 10X Visium slide ID. |
+| raw\/fastq\/.* | ✓ | Raw sequencing files for the experiment |
+| raw\/fastq\/RNA\/.* | ✓ | Directory containing fastq files pertaining to RNAseq sequencing. |
+| raw\/fastq\/RNA\/[^\/]+_R[^\/]+\.fastq\.gz | ✓ | This is a GZip'd version of the forward and reverse fastq files from RNAseq sequencing (R1 and R2). |
+| raw\/images\/.* | ✓ | Directory containing raw image files. This directory should include at least one raw file. |
+| raw\/images\/[^\/]+\.(?:xml|scn|vsi|svs|czi|tiff) | ✓ | Raw microscope file for the experiment |
+| lab_processed\/.* | ✓ | Experiment files that were processed by the lab generating the data. |
+| lab_processed\/images\/.* | ✓ | Processed image files |
+| lab_processed\/images\/[^\/]+\.ome\.tiff (example: lab_processed/images/HBM892.MDXS.293.ome.tiff) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use loss-less compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. |
+| lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv | ✓ | This file provides essential documentation pertaining to each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed |
+| lab_processed\/images\/[^\/]+\.json | | This file is the output from LoupeBrowser, when a data provider manually denotes which spots on the slide contain tissue. This file is optionally used by 10X SpaceRanger. |
+| lab_processed\/transformations\/.* | | This directory contains transformation matrices that capture how each modality is aligned with the other and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases data type may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another). |
+| lab_processed\/transformations\/[^\/]+\.txt | | Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). |
+
+Version 2.1
 
 | pattern | required? | description | dependent on |
 | --- | --- | --- | --- |
diff --git a/src/ingest_validation_tools/directory-schemas/mibi-v2.1.yaml b/src/ingest_validation_tools/directory-schemas/mibi-v2.1.yaml
new file mode 100644
index 000000000..f3d682cbf
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/mibi-v2.1.yaml
@@ -0,0 +1,48 @@
+files:
+  -
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset. [Exists in all assays]
+  -
+    pattern: extras\/microscope_hardware\.json
+    required: True
+    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: extras\/microscope_settings\.json
+    required: False
+    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: raw\/.*
+    required: True
+    description: This is a directory containing raw data.
+  -
+    pattern: raw\/images\/.*
+    required: True
+    description: Raw image files. Using this subdirectory allows for harmonization with other more complex assays, like Visium that includes both raw imaging and sequencing data.
+  -
+    pattern: raw\/images\/[^\/]+\.ome\.tiff
+    required: False
+    description: Raw image file.
+  -
+    pattern: raw\/images\/tiles\.csv
+    required: False
+    description: This file contains the approximate coordinates for each of the tiled raw images.
+  -
+    pattern: lab_processed\/.*
+    required: True
+    description: Experiment files that were processed by the lab generating the data.
+  -
+    pattern: lab_processed\/images\/.*
+    required: True
+    description: This is a directory containing processed image files
+  -
+    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
+    required: True
+    description: OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header.
+    is_qa_qc: False
+  -
+    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
+    required: True
+    description: This file provides essential documentation pertaining to each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed
diff --git a/src/ingest_validation_tools/directory-schemas/visium-no-probes-v2.2.yaml b/src/ingest_validation_tools/directory-schemas/visium-no-probes-v2.2.yaml
new file mode 100644
index 000000000..40fd4021c
--- /dev/null
+++ b/src/ingest_validation_tools/directory-schemas/visium-no-probes-v2.2.yaml
@@ -0,0 +1,78 @@
+files:
+  -
+    pattern: extras\/.*
+    required: True
+    description: Folder for general lab-specific files related to the dataset
+  -
+    pattern: extras\/microscope_hardware\.json
+    required: True
+    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: extras\/microscope_settings\.json
+    required: False
+    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk if help is required in generating this document.
+    is_qa_qc: True
+  -
+    pattern: raw\/.*
+    required: True
+    description: All raw data files for the experiment.
+  -
+    pattern: raw\/[^\/]+\.gpr
+    required: True
+    description: This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form along with the unique 10X Visium slide ID.
+    is_qa_qc: False
+  -
+    pattern: raw\/fastq\/.*
+    required: True
+    description: Raw sequencing files for the experiment
+  -
+    pattern: raw\/fastq\/RNA\/.*
+    required: True
+    description: Directory containing fastq files pertaining to RNAseq sequencing.
+  -
+    pattern: raw\/fastq\/RNA\/[^\/]+_R[^\/]+\.fastq\.gz
+    required: True
+    description: This is a GZip'd version of the forward and reverse fastq files from RNAseq sequencing (R1 and R2).
+    is_qa_qc: False
+  -
+    pattern: raw\/images\/.*
+    required: True
+    description: Directory containing raw image files. This directory should include at least one raw file.
+  -
+    pattern: raw\/images\/[^\/]+\.(?:xml|scn|vsi|svs|czi|tiff)
+    required: True
+    description: Raw microscope file for the experiment
+    is_qa_qc: False
+  -
+    pattern: lab_processed\/.*
+    required: True
+    description: Experiment files that were processed by the lab generating the data.
+  -
+    pattern: lab_processed\/images\/.*
+    required: True
+    description: Processed image files
+  -
+    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
+    required: True
+    description: OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use loss-less compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header.
+    is_qa_qc: False
+    example: lab_processed/images/HBM892.MDXS.293.ome.tiff
+  -
+    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
+    required: True
+    description: This file provides essential documentation pertaining to each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed
+    is_qa_qc: False
+  -
+    pattern: lab_processed\/images\/[^\/]+\.json
+    required: False
+    description: This file is the output from LoupeBrowser, when a data provider manually denotes which spots on the slide contain tissue. This file is optionally used by 10X SpaceRanger.
+  -
+    pattern: lab_processed\/transformations\/.*
+    required: False
+    description: This directory contains transformation matrices that capture how each modality is aligned with the other and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases data type may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another).
+  -
+    pattern: lab_processed\/transformations\/[^\/]+\.txt
+    required: False
+    description: Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains).
+    is_qa_qc: False
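
Reviewer note: as a rough illustration of how the `pattern` / `required` pairs in the new `mibi-v2.1.yaml` could be applied to an upload, here is a minimal Python sketch. It is not the validation code in `ingest_validation_tools`. The function name `check_paths`, the inlined `MIBI_V2_1` list, the sample file names, and the choice to match each regex against the whole dataset-relative path with `re.fullmatch` are all assumptions made for this example.

```python
import re

# The eleven patterns from mibi-v2.1.yaml, inlined as dicts so the sketch
# does not depend on a YAML parser. Only `pattern` and `required` are kept.
MIBI_V2_1 = [
    {"pattern": r"extras\/.*", "required": True},
    {"pattern": r"extras\/microscope_hardware\.json", "required": True},
    {"pattern": r"extras\/microscope_settings\.json", "required": False},
    {"pattern": r"raw\/.*", "required": True},
    {"pattern": r"raw\/images\/.*", "required": True},
    {"pattern": r"raw\/images\/[^\/]+\.ome\.tiff", "required": False},
    {"pattern": r"raw\/images\/tiles\.csv", "required": False},
    {"pattern": r"lab_processed\/.*", "required": True},
    {"pattern": r"lab_processed\/images\/.*", "required": True},
    {"pattern": r"lab_processed\/images\/[^\/]+\.ome\.tiff", "required": True},
    {"pattern": r"lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv", "required": True},
]


def check_paths(schema, paths):
    """Hypothetical helper: return (missing_required, unmatched_paths).

    missing_required: required patterns that no path fully matches.
    unmatched_paths:  paths that no pattern in the schema fully matches.
    Assumes `paths` are POSIX-style and relative to the dataset root.
    """
    compiled = [(entry, re.compile(entry["pattern"])) for entry in schema]
    missing_required = [
        entry["pattern"]
        for entry, rx in compiled
        if entry["required"] and not any(rx.fullmatch(p) for p in paths)
    ]
    unmatched_paths = [
        p for p in paths if not any(rx.fullmatch(p) for _, rx in compiled)
    ]
    return missing_required, unmatched_paths


if __name__ == "__main__":
    # Made-up upload: raw OME-TIFFs are omitted, which v2.1 now permits.
    upload = [
        "extras/microscope_hardware.json",
        "raw/images/tiles.csv",
        "lab_processed/images/sample.ome.tiff",
        "lab_processed/images/sample.ome-tiff.channels.csv",
    ]
    missing, unmatched = check_paths(MIBI_V2_1, upload)
    print("missing required patterns:", missing)
    print("paths not covered by the schema:", unmatched)
```

Under these assumptions, an upload without any `raw/images/*.ome.tiff` file still satisfies every required pattern, which is the behavioral change this patch makes for MIBI. The same helper would work for the `visium-no-probes-v2.2.yaml` entries if they were inlined the same way.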