Phenocycler and Histology bump to 2.2.0 #1283

Merged
merged 20 commits into from
Jan 10, 2024
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -1,11 +1,13 @@
# Changelog

## v0.0.17 - in progress

- Update atacseq cedar link
- Add Phenocycler next-gen directory schema
- Update Histology next-gen directory schema
- Add LC-MS next-gen directory schema
- Add GeoMx NGS next-gen directory schema
- Update PhenoCycler and Histology to 2.2.0

## v0.0.16

37 changes: 36 additions & 1 deletion docs/geomx-ngs/current/index.md
@@ -28,5 +28,40 @@ Related files:
<br>

## Directory schemas
<summary><a href="https://docs.google.com/spreadsheets/d/1LE-iyY2E6eP4E8jhgP6rhsvjESrdHXWYrMwKTvNkI5Y"><b>Version 2 (use this one)</b> (draft - submission of data prepared using this schema will be supported by Sept. 30) </a></summary>
<summary><b>Version 2 (use this one)</b></summary>

| pattern | required? | description | dependent on |
| --- | --- | --- | --- |
| <code>extras\/.*</code> | ✓ | Folder for general lab-specific files related to the dataset. | |
| <code>extras\/microscope_hardware\.json</code> | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. | |
| <code>extras\/microscope_settings\.json</code> | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. | |
| <code>raw\/.*</code> | ✓ | All raw data files for the experiment. | |
| <code>raw\/[^\/]+_LabWorksheet.txt</code> | ✓ | An Excel spreadsheet to refer to in setting up the library. This file documents all of the samples from a single collection plate. Generated by DSP run, prior to sequencing. | |
| <code>raw\/[^\/]+_config\.ini</code> | ✓ | Needed to generate the DCC file from the fastq file. Contains pipeline processing parameters. Generated by DSP run, prior to sequencing. | |
| <code>raw\/[^\/]+_SeqCodeIndices\.csv</code> | ✓ | A file with sample information needed by the Illumina software. Use the contents of the SeqCodeIndices.csv file to create a SampleSheet.csv for input to the Illumina sequencer. (NextSeq 1000/2000 users download a SampleSheet.csv and whitelist.txt instead of SeqCodeIndices.csv.) Generated by DSP run. | |
| <code>raw\/markers\.csv</code> | | A csv file describing any morphology markers used to guide ROI and/or AOI selection [this should be similar in structure to the antibodies file] | |
| <code>raw\/[^\/]*targets\.pkc</code> | ✓ | The file listing probe barcode sequence and corresponding gene symbol or proteins targeted by that probe. This should be consistent for the same probe panel. | |
| <code>raw\/additional_panels_used\.csv</code> | | If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include: manufacturer, model/name, product code. | |
| <code>raw\/custom_probe_set\.csv</code> | ✓ | This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include: target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see <https://support.10xgenomics.com/spatial-gene-expression-ffpe/probe-sets/probe-set-file-descriptions/probe-set-file-descriptions#probe_set_csv_file>). | |
| <code>raw\/fastq\/.*</code> | ✓ | Raw sequencing files for the experiment | |
| <code>raw\/fastq\/oligo\/.*</code> | ✓ | Directory containing fastq files pertaining to oligo sequencing. | |
| <code>raw\/fastq\/oligo\/[^\/]+\.fastq\.gz</code> | ✓ | This is a gzip version of the fastq file. This file contains the cell barcode and unique molecular identifier (technical). | |
| <code>raw\/images\/.*</code> | | Directory containing raw image files. This directory should include at least one raw file. | |
| <code>raw\/images\/overlay\.(?:jpeg&#124;tiff)</code> | | The overlay image used to guide ROI selection, if one was used. If an overlay was used, the overlay details will be provided in the protocols.io protocol and the image needs to be uploaded here, because it is not included in the OME TIFF. This can be a JPEG or TIFF file. | |
| <code>lab_processed\/.*</code> | ✓ | Experiment files that were processed by the lab generating the data. | |
| <code>lab_processed\/Initial\s{1}Dataset\.xlsx</code> | ✓ | **[QA/QC]** An Excel spreadsheet that is downloaded from the GeoMx DSP Data Analysis Suite containing QA/QC metrics based on raw, unprocessed target counts. This file contains one row per AOI/segment and no analyses span AOIs. The AOIs included in this file can come from different GeoMx runs and hence span Globus uploads, so care must be taken to make sure the appropriate AOIs are included in the file. | |
| <code>lab_processed\/annotations\.xlsx</code> | | AOI specific annotations. This might include cell type and anatomical information. | |
| <code>lab_processed\/dcc\/.*</code> | ✓ | DCC files generated from fastq by the Nanostring GeoMx NGS Pipeline. | |
| <code>lab_processed\/dcc\/[^\/]+\.dcc</code> | ✓ | DCC files containing target probe counts, generated from fastq by the Nanostring GeoMx NGS Pipeline. | |
| <code>lab_processed\/images\/.*</code> | ✓ | Processed image files | |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> | |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed here <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0> | |
| <code>lab_processed\/annotations\/.*</code> | | Directory containing segmentation masks. | |
| <code>lab_processed\/annotations\/[^\/]+\.segmentations\.ome\.tiff</code> | | The segmentation masks should be stored as multi-channel pyramidal OME TIFF bitmasks with one channel per mask, where a single mask contains all instances of a type of object (e.g., all cells, a class of FTUs, etc). The class of objects contained in the mask is documented in the segmentation-masks.csv file. Each individual object in a mask should be represented by a unique integer pixel value starting at 1, with 0 meaning background (e.g., all pixels belonging to the first instance of a T-cell have a value of 1, the pixels for the second instance of a T-cell have a value of 2, etc). The pixel values should be unique within a mask. FTUs and other structural elements should be captured the same way as cells with segmentation masks and the appropriate channel feature definitions. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/segmentation-masks\.csv</code> | | This file contains details about each mask, with one row per mask. Each column in this file contains details describing the mask (e.g., channel number, mask name, ontological ID, etc). Each mask is stored as a channel in the segmentations.ome.tiff file and the mask name should be ontologically based and linked to the ASCT+B table where possible. The number of rows in this file should equal the number of channels in the segmentations.ome.tiff. For example, one row in this file would ontologically describe cells, if the segmentations.ome.tiff file contained a mask of all cells. A minimum set of fields (required and optional) is included below. If multiple segmentations.ome.tiff files are used, this segmentation-masks.csv file should document the masks across all of the OME TIFF files. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/[^\/]+-objects\.csv</code> | | This is a matrix where each row describes an individual object (e.g., one row per cell in the case where a mask contains all cells) and columns are features (i.e., object type, marker intensity, classification strategies, etc). One file should be created per mask with the name of the mask prepended to the file name. For example, if there’s a cell segmentation map called “cells” then you would include a file called “cells-objects.csv” and that file would contain one row per cell in the “cells” mask and one column per feature, such as marker intensity and/or cell type. A minimum set of fields (required and optional) is included below. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/[^\/]+\.geojson</code> | | One or more GeoJSON files containing the geometries of each object within a mask. For example, if the mask contains multiple FTUs, multiple cells, etc, each of the objects in the mask would be independently documented in the GeoJSON file. There would be a single GeoJSON file per mask and the name of the file should be the name of the mask. If this file is generated by QuPath, the coordinates will be in pixel units with the origin (0, 0) as the top left corner of the full-resolution image. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/tissue-boundary\.geojson</code> | | **[QA/QC]** If the boundaries of the tissue have been identified (e.g., by manual efforts), then the boundary geometry can be included as a GeoJSON file named “tissue-boundary.geojson”. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/regions-of-concern\.csv</code> | | This file and the associated GeoJSON file can be used to denote any regions in the image that may contain QA/QC concerns. For example, if there are folds in the tissue, the region of the fold can be highlighted. This file should contain one row per region and include documentation about the region and why it's being flagged. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/regions-of-concern\.geojson</code> | | This file and the associated CSV file can be used to denote any regions in the image that may contain QA/QC concerns. For example, if there are folds in the tissue, the region of the fold can be highlighted. This file should contain the geometric coordinates of each region being flagged. | lab_processed\/annotations\/.* |
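
The labeling rule for segmentation masks in the table above (0 is background, each object instance gets its own integer starting at 1) can be illustrated with a minimal sketch. It assumes one mask channel has already been read into nested Python lists; the helper names `object_pixel_counts` and `labels_start_at_one` are hypothetical and are not part of ingest-validation-tools.

```python
from typing import Dict, List

Mask = List[List[int]]  # one mask channel as rows of integer pixel values


def object_pixel_counts(mask: Mask) -> Dict[int, int]:
    """Count pixels per object label, skipping background (0)."""
    counts: Dict[int, int] = {}
    for row in mask:
        for value in row:
            if value < 0:
                raise ValueError(f"negative pixel value {value} is not a valid label")
            if value != 0:
                counts[value] = counts.get(value, 0) + 1
    return counts


def labels_start_at_one(mask: Mask) -> bool:
    """True if the mask holds no objects or its smallest label is 1."""
    counts = object_pixel_counts(mask)
    return not counts or min(counts) == 1


# Toy 4x4 channel with two T-cell instances (labels 1 and 2).
t_cells = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

print(object_pixel_counts(t_cells))
print(labels_start_at_one(t_cells))
```

The number of keys returned by `object_pixel_counts` is the number of objects in the mask, which is also the number of rows one would expect in the matching `-objects.csv` file.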

1 change: 1 addition & 0 deletions docs/histology/current/index.md
@@ -42,6 +42,7 @@ Related files:
| <code>lab_processed\/images\/.*</code> | ✓ | Processed image files | |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> (example: <code>lab_processed/images/HBM892.MDXS.293.ome.tiff</code>) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> | |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed here <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0> | |
| <code>lab_processed\/images\/[^\/]+\.tissue-boundary\.geojson</code> | | **[QA/QC]** If the boundaries of the tissue have been identified (e.g., by manual efforts), then the boundary geometry can be included as a GeoJSON file named “*.tissue-boundary.geojson”. | |
| <code>lab_processed\/transformations\/.*</code> | | This directory contains transformation matrices that capture how each modality is aligned with the other and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases data type may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another). | |
| <code>lab_processed\/transformations\/[^\/]+\.txt</code> | | Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). | |
| <code>lab_processed\/probabilities\/.*</code> | | Directory containing probabilities pertaining to lab processed data (e.g., from Ilastik pixel classification). | |
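
The transformation rows above describe aligning modalities that differ in pixel size and orientation. As a rough illustration only, the sketch below assumes a 3x3 homogeneous 2D affine matrix (uniform scale, rotation, translation). The schema does not specify the matrix convention or the layout of the `.txt` files, so `make_affine` and `apply_affine` are hypothetical helpers, not a description of what a lab's files actually contain.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # assumed 3x3 homogeneous 2D affine matrix


def make_affine(scale: float, angle_deg: float, tx: float, ty: float) -> Matrix:
    """Build a matrix that scales, rotates counter-clockwise, then translates."""
    angle = math.radians(angle_deg)
    c, s = math.cos(angle), math.sin(angle)
    return [
        [scale * c, -scale * s, tx],
        [scale * s, scale * c, ty],
        [0.0, 0.0, 1.0],
    ]


def apply_affine(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a pixel coordinate from one modality into the other."""
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )


# Example: the target image has half the pixel size (scale 2),
# is rotated by 90 degrees, and is shifted 10 pixels along x.
to_target = make_affine(scale=2.0, angle_deg=90.0, tx=10.0, ty=0.0)
print(apply_affine(to_target, 1.0, 0.0))
```

Storing the full matrix rather than separate scale and angle values keeps the overlay step a single multiplication per coordinate, whatever combination of resizing and rotation was needed.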
22 changes: 21 additions & 1 deletion docs/phenocycler/current/index.md
@@ -28,5 +28,25 @@ Related files:
<br>

## Directory schemas
<summary><a href="https://docs.google.com/spreadsheets/d/1pZD2e51e4QkxzIk6xjHPPu1RBZpx5mzoykMmlaDK8rA"><b>Version 2 (use this one)</b> (draft - submission of data prepared using this schema will be supported by Sept. 30) </a></summary>
<summary><b>Version 2 (use this one)</b></summary>

| pattern | required? | description |
| --- | --- | --- |
| <code>extras\/.*</code> | ✓ | Folder for general lab-specific files related to the dataset. [Exists in all assays] |
| <code>extras\/microscope_hardware\.json</code> | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. |
| <code>extras\/microscope_settings\.json</code> | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. |
| <code>raw\/.*</code> | ✓ | This is a directory containing raw data. |
| <code>raw\/images\/.*</code> | ✓ | Raw image files. Using this subdirectory allows for harmonization with other more complex assays, like Visium that includes both raw imaging and sequencing data. |
| <code>raw\/images\/[^\/]+\.xpd</code> | ✓ | Experimental setup of the Phenocycler-Fusion run. The file includes cycle information, antibodies utilized, and the experimental design of the run. |
| <code>raw\/images\/[^\/]+\.qptiff</code> | ✓ | Final image file produced by the Phenocycler-Fusion. |
| <code>raw\/images\/phenocycler\/.*</code> | ✓ | These are the files from the temp directory generated by the PhenoCycler. The dataset should include all files from this directory except the "qptiff.intermediate" files. |
| <code>raw\/images\/phenocycler\/[^\/]+\.qptiff\.raw</code> | ✓ | Raw image files from the temp directory generated by the PhenoCycler. |
| <code>raw\/images\/phenocycler\/[^\/]+\.qptiff\.intermediate</code> | | Intermediate image files from the temp directory generated by the PhenoCycler. These files are not required. |
| <code>raw\/images\/phenocycler\/(?:CombineInputs.txt&#124;FocusMap.tif&#124;FocusTable.txt&#124;Label.tif&#124;MarkerList.txt&#124;OverviewBF.tif&#124;SampleMask.tif)</code> | ✓ | Required file from the temp directory generated by the PhenoCycler. The optional files depend on which version of the PhenoCycler software was being used. |
| <code>raw\/images\/phenocycler\/(?:CoverslipMask.tif&#124;FlowCellOverview.tif&#124;OverviewFL.tif&#124;SampleValMask.tif)</code> | | Optional file from the temp directory generated by the PhenoCycler. Which optional files are present depends on the version of the PhenoCycler software that was used. |
| <code>lab_processed\/.*</code> | ✓ | Experiment files that were processed by the lab generating the data. |
| <code>lab_processed\/images\/.*</code> | ✓ | This is a directory containing processed image files |
| <code>lab_processed\/images\/[^\/]+\.tissue-boundary\.geojson</code> | | **[QA/QC]** If the boundaries of the tissue have been identified (e.g., by manual efforts), then the boundary geometry can be included as a GeoJSON file named “*.tissue-boundary.geojson”. |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> | ✓ | OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use a lossless compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed here <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0> |
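
To see how the pattern and required? columns above might be applied to an upload, here is a minimal sketch that checks a list of relative paths against a handful of the PhenoCycler patterns. It assumes each pattern must match the whole relative path (`re.fullmatch`); the names `SCHEMA` and `check_paths` are hypothetical, and the actual validation logic in ingest-validation-tools may differ. The catch-all directory rows such as `raw\/.*` are left out on purpose so that unexpected files show up in the report.

```python
import re
from typing import Dict, Iterable, List

# (pattern, required) pairs copied from a subset of the table rows above.
SCHEMA = [
    (r"raw\/images\/[^\/]+\.xpd", True),
    (r"raw\/images\/[^\/]+\.qptiff", True),
    (r"raw\/images\/phenocycler\/[^\/]+\.qptiff\.raw", True),
    (r"raw\/images\/phenocycler\/[^\/]+\.qptiff\.intermediate", False),
    (r"lab_processed\/images\/[^\/]+\.ome\.tiff", True),
    (r"lab_processed\/images\/[^\/]+\.tissue-boundary\.geojson", False),
]


def check_paths(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Report paths matching no pattern and required patterns matching no path."""
    path_list = list(paths)
    compiled = [(re.compile(pattern), required) for pattern, required in SCHEMA]
    unmatched = [
        path
        for path in path_list
        if not any(rx.fullmatch(path) for rx, _ in compiled)
    ]
    missing_required = [
        rx.pattern
        for rx, required in compiled
        if required and not any(rx.fullmatch(path) for path in path_list)
    ]
    return {"unmatched": unmatched, "missing_required": missing_required}


example_upload = [
    "raw/images/run1.xpd",
    "raw/images/run1.qptiff",
    "raw/images/phenocycler/run1.qptiff.raw",
    "lab_processed/images/notes.docx",
]
print(check_paths(example_upload))
```

In this example upload the OME-TIFF is absent, so its pattern is listed under `missing_required`, and `notes.docx` is listed under `unmatched`.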

1 change: 1 addition & 0 deletions docs/phenocycler/deprecated/README.md
@@ -0,0 +1 @@
Moved to [github pages](https://hubmapconsortium.github.io/ingest-validation-tools/lcms/).