Merge branch 'main' of github.com:hubmapconsortium/ingest-validation-tools into phillips/prevent_multiple_assay_types
gesinaphillips committed Feb 2, 2024
2 parents cc0e108 + 390f5c5 commit 3f2022a
Showing 31 changed files with 155 additions and 48 deletions.
11 changes: 10 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# Changelog

## v0.0.17 - in progress
## v0.0.18 (in progress)
- Update PhenoCycler directory schema

## v0.0.17

- Update atacseq cedar link
- Add Phenocycler next-gen directory schema
@@ -10,6 +13,12 @@
- Update PhenoCycler and Histology to 2.2.0
- Update CEDAR links for PhenoCycler & Histology
- Refactor Upload to avoid validating the same contributors.tsv multiple times / running plugins over files multiple times
- Add entry for segmentation-mask
- Modify directory schema validation such that it takes empty directories into account
- Add Publication next-gen directory schema
- Update ATAC/RNA/10X documentation
- Update Cell Dive documentation
- Update to support passing list of data_paths to ingest-validation-tests plugins

## v0.0.16

2 changes: 1 addition & 1 deletion docs/10x-multiome/current/index.md
@@ -16,7 +16,7 @@ Related files:
- [📝 TSV template](https://raw.githubusercontent.com/hubmapconsortium/dataset-metadata-spreadsheet/main/10x-multiome/latest/10x-multiome.tsv): Alternative for metadata entry.


REQUIRED - For this assay, you must also prepare and submit two additional metadata.tsv files following the metadata schemas linked here for [RNAseq](https://hubmapconsortium.github.io/ingest-validation-tools/rnaseq/current/) and [ATACseq](https://hubmapconsortium.github.io/ingest-validation-tools/atacseq/current/).
REQUIRED - For this assay, you must also prepare and submit two additional metadata.tsv files following the metadata schemas linked here for [RNAseq](https://hubmapconsortium.github.io/ingest-validation-tools/rnaseq/current/) and [ATACseq](https://hubmapconsortium.github.io/ingest-validation-tools/atacseq/current/). For additional documentation on this dataset type, please visit [here](https://docs.google.com/document/d/1cVX_uMA5ehz3TBjrlXSb9KkRo8_5kcFUFhJaWeW9JyU).

## Metadata schema

2 changes: 1 addition & 1 deletion docs/_config.yml
@@ -13,7 +13,6 @@ url: "https://hubmapconsortium.github.io" # the base hostname & protocol for you
markdown: kramdown
theme: minima
categories-order:
- Organ
- Sample
- Clinical Imaging Modalities
- Fluorescence In Situ Hybridization (FISH)
@@ -24,6 +23,7 @@ categories-order:
- Sequence Assays
- Single-cycle Fluorescence Microscopy (SFM)
- Spatial Transcriptomics
- Derived Datasets
- Other TSVs

# Exclude from processing.
2 changes: 1 addition & 1 deletion docs/atacseq/current/index.md
@@ -16,7 +16,7 @@ Related files:
- [📝 TSV template](https://raw.githubusercontent.com/hubmapconsortium/dataset-metadata-spreadsheet/main/atacseq/latest/atacseq.tsv): Alternative for metadata entry.



For additional documentation on this dataset type, please visit [here](https://docs.google.com/document/d/1cVX_uMA5ehz3TBjrlXSb9KkRo8_5kcFUFhJaWeW9JyU).

## Metadata schema

2 changes: 1 addition & 1 deletion docs/celldive/current/index.md
@@ -21,7 +21,7 @@ Related files:
## Metadata schema


<summary><a href="https://openview.metadatacenter.org/templates/https:%2F%2Frepo.metadatacenter.org%2Ftemplates%2F4b41ed1c-30ae-4a54-839f-9d92750bcc05"><b>Version 2 (use this one)</b></a></summary>
<summary><a href="https://openview.metadatacenter.org/templates/https:%2F%2Frepo.metadatacenter.org%2Ftemplates%2F6f9eee7b-7ef1-4f32-a34e-706bbbbb09bf"><b>Version 2 (use this one)</b></a></summary>



2 changes: 1 addition & 1 deletion docs/codex/current/index.md
@@ -37,7 +37,7 @@ Related files:
| <code>extras\/microscope_settings\.json</code> | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. | |
| <code>raw\/.*</code> || This is a directory containing raw data. | |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> || OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> | |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> || This file should describe any processing that was done to generate the images in each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. Two columns should be booleans "is this a channel to use for nuclei segmentation" and "is this a channel to use for cell segmentation". | |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> || This file should describe any processing that was done to generate the images in each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. Two columns should be booleans "is this a channel to use for nuclei segmentation" and "is this a channel to use for cell segmentation". | |
| <code>lab_processed\/annotations\/.*</code> | | Directory containing segmentation masks. | |
| <code>lab_processed\/annotations\/[^\/]+\.segmentations\.ome\.tiff</code> | | The segmentation masks should be stored as multi-channel pyramidal OME TIFF bitmasks with one channel per mask, where a single mask contains all instances of a type of object (e.g., all cells, a class of FTUs, etc). The class of objects contained in the mask is documented in the segmentation-masks.csv file. Each individual object in a mask should be represented by a unique integer pixel value starting at 1, with 0 meaning background (e.g., all pixels belonging to the first instance of a T-cell have a value of 1, the pixels for the second instance of a T-cell have a value of 2, etc). The pixel values should be unique within a mask. FTUs and other structural elements should be captured the same way as cells with segmentation masks and the appropriate channel feature definitions. | lab_processed\/annotations\/.* |
| <code>lab_processed\/annotations\/segmentation-masks\.csv</code> | | This file contains details about each mask, with one row per mask. Each column in this file contains details describing the mask (e.g., channel number, mask name, ontological ID, etc). Each mask is stored as a channel in the segmentations.ome.tiff file and the mask name should be ontologically based and linked to the ASCT+B table where possible. The number of rows in this file should equal the number of channels in the segmentations.ome.tiff. For example, one row in this file would ontologically describe cells, if the segmentations.ome.tiff file contained a mask of all cells. A minimum set of fields (required and optional) is included below. If multiple segmentations.ome.tiff files are used, this segmentation-masks.csv file should document the masks across all of the OME TIFF files. | lab_processed\/annotations\/.* |
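The labelling rule described above for `segmentations.ome.tiff` (0 for background, each object a unique positive integer starting at 1) can be sketched on a single mask channel. This is only an illustration on a plain nested list; reading the OME TIFF itself is out of scope here. The names `object_labels` and `labels_start_at_one` are hypothetical, and treating labels as gap-free is an assumption drawn from the "1, 2, etc." wording rather than a stated requirement.

```python
def object_labels(mask: list[list[int]]) -> list[int]:
    """Return the sorted distinct non-background labels in one mask channel."""
    labels = {value for row in mask for value in row if value != 0}
    if any(value < 0 for value in labels):
        raise ValueError("labels must be non-negative integers")
    return sorted(labels)


def labels_start_at_one(mask: list[list[int]]) -> bool:
    """Check that object labels are 1..N with no gaps (assumed convention)."""
    labels = object_labels(mask)
    return labels == list(range(1, len(labels) + 1))


# Two objects on a 3x4 background: pixels of the first are 1, of the second 2.
example_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]
```

A mask whose only object is labelled 2 would fail the second check under this assumed convention, while an all-background mask passes it.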
3 changes: 3 additions & 0 deletions docs/field-assays.yaml
@@ -243,6 +243,7 @@ assay_category:
- Sample Section
- Sample Suspension
- Second Harmonic Generation
- Segmentation Mask
- Slide-seq
- Thick Section Multiphoton MxIF
- Ultrasound
@@ -326,6 +327,7 @@ assay_type:
- Sample Section
- Sample Suspension
- Second Harmonic Generation
- Segmentation Mask
- Slide-seq
- Thick Section Multiphoton MxIF
- Ultrasound
@@ -839,6 +841,7 @@ is_cedar:
- Sample Section
- Sample Suspension
- Second Harmonic Generation
- Segmentation Mask
- Slide-seq
- Thick Section Multiphoton MxIF
- Ultrasound
Binary file modified docs/field-schemas.xlsx
Binary file not shown.
3 changes: 3 additions & 0 deletions docs/field-schemas.yaml
@@ -171,6 +171,7 @@ assay_category:
- scrnaseq
- scrnaseq-hca
- second-harmonic-generation
- segmentation-mask
- seqfish
- sims
- slideseq
@@ -231,6 +232,7 @@ assay_type:
- scrnaseq
- scrnaseq-hca
- second-harmonic-generation
- segmentation-mask
- seqfish
- sims
- slideseq
@@ -583,6 +585,7 @@ is_cedar:
- sample-section
- sample-suspension
- second-harmonic-generation
- segmentation-mask
- seqfish
- sims
- slideseq
2 changes: 1 addition & 1 deletion docs/phenocycler/current/index.md
@@ -47,6 +47,6 @@ Related files:
| <code>lab_processed\/.*</code> || Experiment files that were processed by the lab generating the data. |
| <code>lab_processed\/images\/.*</code> || This is a directory containing processed image files |
| <code>lab_processed\/images\/[^\/]+\.tissue-boundary\.geojson</code> | | **[QA/QC]** If the boundaries of the tissue have been identified (e.g., by manual efforts), then the boundary geometry can be included as a GeoJSON file named “*.tissue-boundary.geojson”. |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> || OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> || OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0>. It is recommended that you confirm you're using the latest version of Bio-Formats, when generating the OME TIFF, as newer versions have improved XML handling. |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> || This file provides essential documentation pertaining to each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0> |

5 changes: 4 additions & 1 deletion docs/publication/current/index.md
@@ -28,6 +28,9 @@ Excel and TSV templates for this schema will be available when the draft next-ge

| pattern | required? | description |
| --- | --- | --- |
| <code>TODO</code> || Directory structure not yet specified. |
| <code>extras\/.*</code> || Folder for general lab-specific files related to the dataset. [Exists in all assays] |
| <code>data\/.+</code> (example: <code>data/file1.ext</code>) || Supplementary data files for the publication. All files referenced by the Vitessce visualization configurations in the vignettes must be included in this directory. |
| <code>vignettes\/.*</code> || Subdirectory containing Vitessce visualization files and a description of those files. |
| <code>vignettes\/vignette_\d+\/[^\/]+\.json</code> (example: <code>vignettes/vignette_01/file1.json</code>) | | Vitessce visualization configuration files. One or more visualization configurations can be provided per vignette. |
| <code>vignettes\/vignette_\d+\/description\.md</code> (example: <code>vignettes/vignette_02/description.md</code>) | | Description of the vignette and titles for the visualization configuration files. |
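The patterns in the table above are regular expressions over paths relative to the dataset directory. A minimal sketch of how a path could be checked against them follows. It assumes whole-path matching with Python's `re.fullmatch`; the validator's actual matching call is not shown in this diff, and `matching_patterns` and `unmatched` are hypothetical helper names.

```python
import re

# Patterns copied from the Publication directory schema table above.
PUBLICATION_PATTERNS = [
    r"extras\/.*",
    r"data\/.+",
    r"vignettes\/.*",
    r"vignettes\/vignette_\d+\/[^\/]+\.json",
    r"vignettes\/vignette_\d+\/description\.md",
]


def matching_patterns(path: str) -> list[str]:
    """Return every schema pattern that matches the whole relative path."""
    return [p for p in PUBLICATION_PATTERNS if re.fullmatch(p, path)]


def unmatched(paths: list[str]) -> list[str]:
    """Return the paths that no schema pattern accounts for."""
    return [p for p in paths if not matching_patterns(p)]


if __name__ == "__main__":
    for example in ["data/file1.ext", "vignettes/vignette_01/file1.json", "notes.txt"]:
        print(example, matching_patterns(example))
```

Note that a vignette file can match both the general `vignettes\/.*` pattern and a more specific one, so a path may be covered by more than one row of the table.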

2 changes: 1 addition & 1 deletion docs/rnaseq-with-probes/current/index.md
@@ -16,7 +16,7 @@ Related files:
- [📝 TSV template](https://raw.githubusercontent.com/hubmapconsortium/dataset-metadata-spreadsheet/main/rnaseq-with-probes/latest/rnaseq-with-probes.tsv): Alternative for metadata entry.



For additional documentation on this dataset type, please visit [here](https://docs.google.com/document/d/1cVX_uMA5ehz3TBjrlXSb9KkRo8_5kcFUFhJaWeW9JyU).

## Metadata schema

2 changes: 1 addition & 1 deletion docs/rnaseq/current/index.md
@@ -16,7 +16,7 @@ Related files:
- [📝 TSV template](https://raw.githubusercontent.com/hubmapconsortium/dataset-metadata-spreadsheet/main/rnaseq/latest/rnaseq.tsv): Alternative for metadata entry.



For additional documentation on this dataset type, please visit [here](https://docs.google.com/document/d/1cVX_uMA5ehz3TBjrlXSb9KkRo8_5kcFUFhJaWeW9JyU).

## Metadata schema

1 change: 1 addition & 0 deletions docs/segmentation-mask/current/README.md
@@ -0,0 +1 @@
Moved to [github pages](https://hubmapconsortium.github.io/ingest-validation-tools/segmentation-mask/).
28 changes: 28 additions & 0 deletions docs/segmentation-mask/current/index.md
@@ -0,0 +1,28 @@
---
title: Segmentation Mask
schema_name: segmentation-mask
category: Derived Datasets
all_versions_deprecated: False
exclude_from_index: False
layout: default

---

Related files:

Excel and TSV templates for this schema will be available when the draft next-generation schema, to be used in all future submissions, is finalized (no later than Sept. 30).

For additional documentation on Segmentation Masks, please visit [here](https://docs.google.com/document/d/1TSQon8nTIoyA5bEKxd8IAKYO6nsDabGbbQ8uKN1gj4E).

## Metadata schema


<summary><a href="https://docs.google.com/spreadsheets/d/1sMMyKtrxD_PO4TVj0JhOpeLF0fRYe2Fjmxnhp-fNzdM"><b>Version 2 (use this one)</b> (draft - submission of data prepared using this schema will be supported by Sept. 30)</a></summary>



<br>

## Directory schemas
<summary><a href="https://docs.google.com/spreadsheets/d/1lAGryXYfGIP0jBXjIopB7PtFfo6RqC2bw3RUI4GzTWQ"><b>Version 2 (use this one)</b> (draft - submission of data prepared using this schema will be supported by Sept. 30) </a></summary>

20 changes: 11 additions & 9 deletions examples/plugin-tests/expected-failure/README.md
@@ -1,20 +1,22 @@
/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/raw/images/faketiff.tiff is not a valid TIFF file: not a TIFF file
/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/lab_processed/images/Visium_90LC_A4_S1.ome.tiff is not a valid TIFF file: not a TIFF file
/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1/raw/images/faketiff.tiff is not a valid TIFF file: not a TIFF file
/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1/lab_processed/images/Visium_90LC_A4_S1.ome.tiff is not a valid TIFF file: not a TIFF file
/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2/raw/images/faketiff.tiff is not a valid TIFF file: not a TIFF file
Threading at 4
Threading at 4
Validating matching fastq files in /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1
Added files from /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1 to dirs_and_files: defaultdict(<class 'dict'>, {PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1'): defaultdict(<class 'list'>, {PosixPath('raw/fastq/RNA'): [PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/raw/fastq/RNA/empty_R_file.fastq.gz')]})})
Added files from /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2 to dirs_and_files: defaultdict(<class 'dict'>, {PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1'): defaultdict(<class 'list'>, {PosixPath('raw/fastq/RNA'): [PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/raw/fastq/RNA/empty_R_file.fastq.gz')]}), PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2'): defaultdict(<class 'list'>, {PosixPath('raw/fastq/RNA'): [PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2/raw/fastq/RNA/empty_R_file.fastq.gz')]})})
Added files from /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1 to dirs_and_files: defaultdict(<class 'dict'>, {PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1'): defaultdict(<class 'list'>, {PosixPath('raw/fastq/RNA'): [PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/raw/fastq/RNA/empty_R_file.fastq.gz')]}), PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2'): defaultdict(<class 'list'>, {PosixPath('raw/fastq/RNA'): [PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2/raw/fastq/RNA/empty_R_file.fastq.gz')]}), PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1'): defaultdict(<class 'list'>, {PosixPath('raw/fastq/RNA'): [PosixPath('/home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1/raw/fastq/RNA/empty_R_file.fastq.gz')]})})
Validating matching fastq file /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1/raw/fastq/RNA/empty_R_file.fastq.gz
Validating empty_R_file.fastq.gz...
→ /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1/raw/fastq/RNA/empty_R_file.fastq.gz
Validating matching fastq file /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/raw/fastq/RNA/empty_R_file.fastq.gz
Validating empty_R_file.fastq.gz...
→ /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S1/raw/fastq/RNA/empty_R_file.fastq.gz
Threading at 4
Threading at 4
Validating matching fastq files in /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2
Validating matching fastq file /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2/raw/fastq/RNA/empty_R_file.fastq.gz
Validating empty_R_file.fastq.gz...
→ /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_A4_S2/raw/fastq/RNA/empty_R_file.fastq.gz
Threading at 4
Threading at 4
Validating matching fastq files in /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1
Validating empty_R_file.fastq.gz...
→ /home/gesina/code/ingest-validation-tools/examples/plugin-tests/expected-failure/upload/Visium_9OLC_I4_S1/raw/fastq/RNA/empty_R_file.fastq.gz
```
Plugin Errors:
Recursively test all ome-tiff files for validity:
@@ -25,7 +25,7 @@ files:
-
pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
required: True
description: This file should describe any processing that was done to generate the images in each channel of the accommpanying OME TIFF. The file should contain one row per OME TIFF channel. Two columns should be booleans "is this a channel to use for nuclei segmentation" and "is this a channel to use for cell segmentation".
description: This file should describe any processing that was done to generate the images in each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. Two columns should be booleans "is this a channel to use for nuclei segmentation" and "is this a channel to use for cell segmentation".
-
pattern: lab_processed\/annotations\/.*
required: False
@@ -67,7 +67,7 @@ files:
-
pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
required: True
description: OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0>
description: OME-TIFF file (multichannel, multi-layered) produced by the experiment. If compressed, must use loss-less compression algorithm. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0>. It is recommended that you confirm you're using the latest version of Bio-Formats, when generating the OME TIFF, as newer versions have improved XML handling.
is_qa_qc: False
-
pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
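Each `files` entry in these directory schema YAML files pairs a `pattern` with a `required` flag. As a rough sketch of what that flag implies, the hypothetical helper below reports required patterns that no file in a dataset satisfies. The entries are modelled on the hunks above with descriptions omitted, and whole-path matching with `re.fullmatch` is an assumption rather than the tool's confirmed behaviour.

```python
import re

# Entries modelled on the schema hunks above (descriptions omitted).
ENTRIES = [
    {"pattern": r"lab_processed\/images\/[^\/]+\.ome\.tiff", "required": True},
    {"pattern": r"lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv", "required": True},
    {"pattern": r"lab_processed\/annotations\/.*", "required": False},
]


def missing_required(entries: list[dict], paths: list[str]) -> list[str]:
    """Return required patterns that none of the relative paths fully match."""
    return [
        e["pattern"]
        for e in entries
        if e["required"] and not any(re.fullmatch(e["pattern"], p) for p in paths)
    ]
```

For a dataset containing only `lab_processed/images/sample.ome.tiff`, this sketch would report the `channels.csv` pattern as missing, while the optional annotations pattern is never reported.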