Docker containers to pre-process data for visualization in the portal.
The subdirectories in this repo all have the same structure:

- `context/`: A Docker context, including a `Dockerfile` and typically `main.py`, `requirements.txt`, and `requirements-freeze.txt`.
- `test-input/`, `test-output-actual/`, `test-output-expected/`: Test fixtures.
- `VERSION`: Contains a semantic version number.
- and a `README.md`.
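For example, a single container directory might look like this (the directory name and exact file list are illustrative; check the actual subdirectories for specifics):

```
ome-tiff-offsets/            # hypothetical example container
├── context/
│   ├── Dockerfile
│   ├── main.py
│   ├── requirements.txt
│   └── requirements-freeze.txt
├── test-input/
├── test-output-actual/
├── test-output-expected/
├── VERSION
└── README.md
```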
Images are named by the containing directory.
Running `test.sh` will build (and test!) all the images.
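A typical check after cloning might look like this (assuming Docker is installed and running):

```sh
# Build and test every image in the repo:
./test.sh

# The freshly built images should now show up locally:
docker images
```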
You can then define `$INPUT_DIR`, `$OUTPUT_DIR`, and `$IMAGE` to run an image with your own data:

```sh
docker run \
  --mount type=bind,source=$INPUT_DIR,target=/input \
  --mount type=bind,source=$OUTPUT_DIR,target=/output \
  $IMAGE
```
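For a quick smoke test you could point an image at its own test fixtures; the paths and image name below are illustrative:

```sh
# Illustrative values; substitute your own paths and image name.
INPUT_DIR=$PWD/ome-tiff-offsets/test-input
OUTPUT_DIR=$PWD/my-output
IMAGE=ome-tiff-offsets
mkdir -p "$OUTPUT_DIR"
# ...then run the `docker run` command above.
```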
To push the latest versions to Docker Hub, just run:

```sh
test_docker.sh push
```
This repo is included as a submodule in `ingest-pipeline`.
When there are changes here that you want to run in production:

- bump the `VERSION` file in the corresponding container's subdirectory (see the sketch after this list)
- update the version referenced in the corresponding `.cwl` file in the root directory
- run `test_docker.sh push`
- make a PR in `ingest-pipeline` to update that submodule to the latest code here, and make Joel a reviewer on the PR.
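As a rough sketch of the first two steps (the container name and version numbers are hypothetical):

```sh
# Bump the container's semantic version (hypothetical name and numbers):
echo '0.2.0' > ome-tiff-offsets/VERSION

# Find where the old tag is referenced in the corresponding CWL file, then edit it to match:
grep -n '0.1.0' ome-tiff-offsets.cwl
```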
Depending on the rate of change, it might be good to have a weekly routine of making PRs to `ingest-pipeline`. TBD.
In addition, each workflow must have a corresponding `-manifest.json` file conforming to this schema, which has a `pattern`, `description`, and `edam_ontology_term` entry for each output file. (See here for information about EDAM.)
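A minimal sketch of such a manifest is below; the field names come from the paragraph above, but the file name, top-level shape, and values (including the EDAM term) are illustrative guesses, so consult the schema for the authoritative structure:

```sh
# Hypothetical example; values are illustrative only.
cat > ome-tiff-offsets-manifest.json << 'EOF'
[
  {
    "pattern": "*.offsets.json",
    "description": "Byte offsets into the OME-TIFF, used for efficient visualization",
    "edam_ontology_term": "EDAM_1.24.format_3464"
  }
]
EOF
```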
The submodule update in `ingest-pipeline` looks like this:

```sh
# In ingest-pipeline:
git checkout devel
git pull
git submodule update --init --recursive # This fails right now because we're not using plain URLs in .gitmodules.
git checkout -b username/update-portal-containers
cd src/ingest-pipeline/airflow/dags/cwl/portal-containers/
git checkout master
git pull
cd -
git commit -am 'Update portal-containers'
git push origin
# And then make a PR at: https://github.com/hubmapconsortium/ingest-pipeline
```
Here is a template for the PR into `ingest-pipeline`, if there is a new pipeline in `portal-containers` that needs to be run. This helps us communicate what the pipeline's input, output, and purpose are:
# --NAME OF THE PIPELINE--
## Input Pipeline/Original Dataset:
## Output Pipeline (Optional):
## Description:
For example:
# [ome-tiff-tiler](https://github.com/hubmapconsortium/portal-containers/blob/master/ome-tiff-tiler.cwl)
## Input Pipeline/Original Dataset + Files:
- High resolution imaging from Vanderbilt data (OME-TIFF) files, such as those in `/hive/hubmap/lz/Vanderbilt TMC/e6e9bb7c01d3cb9cdb31a8da857f8832/processedMicroscopy/`
## Output Pipeline:
- [ome-tiff-offsets](https://github.com/hubmapconsortium/portal-containers/blob/master/ome-tiff-offsets.cwl)
## Description:
This pipeline takes as input Vanderbilt's processed microscopy data and outputs an image pyramid for visualization. In addition, the `ome-tiff-offsets` pipeline needs to be run on the output of `ome-tiff-tiler` so that images with z-stacks/large numbers of channels can be efficiently visualized.
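Outside of `ingest-pipeline`, that chaining could be sketched with plain `docker run` calls like the ones earlier in this README (the image names assume images are named after their directories; the host-path variables are placeholders):

```sh
# 1. Build the image pyramid from the raw OME-TIFFs:
docker run \
  --mount type=bind,source=$RAW_DIR,target=/input \
  --mount type=bind,source=$PYRAMID_DIR,target=/output \
  ome-tiff-tiler

# 2. Compute byte offsets for the pyramid so it can be visualized efficiently:
docker run \
  --mount type=bind,source=$PYRAMID_DIR,target=/input \
  --mount type=bind,source=$OFFSETS_DIR,target=/output \
  ome-tiff-offsets
```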