Using gatb_tools_machine on the command line
This guide assumes that you have already built the gatb_tools_machine image using the docker build command.
This project contains a 'data' directory that illustrates how to run a GATB-Tool inside the Docker container named 'gatb_tools_machine' on real data located outside the container.
First of all, create a 'tmp' (i.e. working) directory:
cd <gatb-tools-machine Github home project>
mkdir tmp
Now, within the "gatb-tools-machine Github home project" directory, we have:
./tmp: a working directory
./data: test data for each tool; in turn, we have a sub-folder for each tool to test
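The setup step above can be scripted so that it is safe to re-run. This is only a minimal sketch: the function name prepare_workdir is hypothetical (it is not part of gatb-tools-machine), and it assumes the project home is either passed as the first argument or is the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of gatb-tools-machine): create the working
# directory and check that the test data directory is present.
prepare_workdir() {
    project_home="${1:-$PWD}"
    # -p makes the call idempotent: no error if 'tmp' already exists
    mkdir -p "$project_home/tmp" || return 1
    if [ ! -d "$project_home/data" ]; then
        echo "missing data directory: $project_home/data" >&2
        return 1
    fi
    # Print the working directory so callers can reuse it
    printf '%s\n' "$project_home/tmp"
}
```

Calling prepare_workdir from the project home prints the path of the working directory, or fails if './data' is not there.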
The general form of the command is as follows:
docker run --rm -i -t gatb_tools_machine -c <command> -- <arguments>
where:
<command>: must be one of (case-sensitive):
- simka, simka-visu,
- bloocoo, metabloocoo,
- dsk, dsk2ascii
- MindTheGap
- minia
- rconnector (Short Read Connector)
- discosnp
- takeabreak
- h5dump, dbgh5, dbginfo
<arguments>: tool arguments
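Because the command name is case-sensitive, a small wrapper can catch typos before Docker is started. The sketch below is an assumption-laden illustration, not part of the project: the names GATB_TOOLS, is_gatb_tool, gatb_run and the GATB_DRY_RUN variable are all hypothetical, and the tool list is simply copied from the list above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of gatb-tools-machine).
# Tool names copied from the list above; they are case-sensitive.
GATB_TOOLS="simka simka-visu bloocoo metabloocoo dsk dsk2ascii MindTheGap minia rconnector discosnp takeabreak h5dump dbgh5 dbginfo"

# Succeeds only if $1 is exactly one of the supported tool names.
is_gatb_tool() {
    for t in $GATB_TOOLS; do
        if [ "$t" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: gatb_run <command> [arguments...]
# With GATB_DRY_RUN=1 the docker command is printed instead of executed.
gatb_run() {
    tool="$1"
    if ! is_gatb_tool "$tool"; then
        echo "unknown GATB tool: $tool" >&2
        return 2
    fi
    shift
    if [ "${GATB_DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "docker run --rm -i -t gatb_tools_machine -c $tool -- $*"
    else
        docker run --rm -i -t gatb_tools_machine -c "$tool" -- "$@"
    fi
}
```

With GATB_DRY_RUN set to 1, gatb_run only prints the command it would run, which is a cheap way to check the tool name and arguments. The sketch mounts no volumes, so it mirrors the general form only.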
To illustrate the use of this GATB-Tools Docker image, let's take the example of running the Simka tool.
When using Simka directly on the command line, you run a command such as:
cd simka/bin
./simka -in ../example/simka_input.txt -out ./simka_results/ -out-tmp ./simka_temp_output
---1--- ---2-----------------------------------------------------------------------------
1: Simka program name
2: Simka arguments
Now, to run Simka using the Docker container gatb_tools_machine, you simply do this:
docker run --rm -i -t -v $PWD:/tmp gatb_tools_machine -c simka -- <Simka arguments>
---1------------------------------------------------- ---2---- ---3-------------
1: Start the Docker container (more on that, below)
2: We say that we want to use 'simka' (more on that, below)
3: Simka arguments
Here is a real example:
cd <gatb-tools-machine Github home project>
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/simka/:/simka gatb_tools_machine -c simka -- -in /simka/simka_input.txt -out /tmp/simka_results/ -out-tmp /tmp/simka_temp_output
You'll see results in '$PWD/tmp' when the command has finished executing.
This command explained:
docker run [1]
--rm [2]
-i -t [3]
-v $PWD/tmp/:/tmp [4]
-v $PWD/data/simka/:/simka [4']
gatb_tools_machine [5]
-c simka [6]
-- [7]
-in /simka/simka_input.txt [8]
-out /tmp/simka_results/ [9]
-out-tmp /tmp/simka_temp_output [10]
[1]-[5]: Docker arguments
[6]-[7]: arguments for the container's invoker program (here, selecting simka)
[8]-[10]: 'bin/simka' arguments
[1]: start Docker container
[2]: remove the container when the command finishes
(it does NOT delete the 'gatb_tools_machine' image)
[3]: start an interactive job: keep stdin open (-i) and allocate a terminal (-t)
(for instance, you'll see messages on stdout, if any)
[4]: mount a volume. This is required to get the results from Simka.
Here, we say that '$PWD/tmp' (the 'tmp' subdirectory of the
current local directory) will be viewed as '/tmp' from inside
the container. Then, with [4'], we say that the '$PWD/data/simka/'
directory will be viewed as '/simka' from inside the container. This
gives us an easy way to provide OUR data (located within
'$PWD/data/simka/') to the 'simka' program located within the
Docker container. In turn, 'simka' will write its results to '/tmp',
i.e. to '$PWD/tmp' on the host.
[5]: tell Docker which image to start: the 'gatb_tools_machine' of course.
[6]: ask to start the simka program. See companion file 'run-tool.sh' for
more information.
[7]: '--' is required to separate arguments [6] from the rest of the
command line
[8]: the data file to process with simka. Here we use a data file
provided with the simka software to test it.
[9]: tells simka where to put results. Of course, simka will write
within /tmp directory inside the container. However, since we
have directive [4], data writing is actually done in $PWD/tmp,
i.e. a local directory.
[10]: tells simka where to put temporary files.
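The volume mapping described in [4] and [9] amounts to swapping a path prefix: a path inside the container corresponds to the same relative path under the mounted host directory. The following sketch makes that translation explicit. The function name to_host_path and the example host paths are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate a path seen inside the container into the
# host path it corresponds to, given one "-v host_dir:container_dir" mapping.
# Usage: to_host_path <host_dir> <container_dir> <path_inside_container>
to_host_path() {
    host_dir="${1%/}"        # e.g. $PWD/tmp (trailing slash dropped)
    container_dir="${2%/}"   # e.g. /tmp
    path="$3"                # e.g. /tmp/simka_results/
    case "$path" in
        "$container_dir"|"$container_dir"/*)
            # Strip the container prefix, then prepend the host directory
            rest=${path#"$container_dir"}
            printf '%s\n' "$host_dir$rest"
            ;;
        *)
            echo "$path is not inside $container_dir" >&2
            return 1
            ;;
    esac
}
```

For example, with the mapping '-v $PWD/tmp/:/tmp', calling to_host_path "$PWD/tmp/" /tmp /tmp/simka_results/ prints the local directory where the Simka results end up.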
Now that you have seen how to start the 'simka' GATB-Tool, you can run all the other tools in the same way.
Here is a list of basic commands to test all provided GATB-Tools with sample data.
Simka visualization:
docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c simka-visu -- -in /tmp/simka_results/ -out /tmp/simka_results/ -pca -heatmap -tree
Bloocoo:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c bloocoo -- -file /bloocoo/errclose.fasta -out /tmp/errclose_bloocoo_corr_errs.tab -kmer-size 31 -abundance-min 5 -err-tab
MetaBloocoo:
cd $PWD/data/bloocoo
curl -O http://downloads.hmpdacc.org/data/Illumina/anterior_nares/SRS018585.tar.bz2
tar -xjf SRS018585.tar.bz2
cd ../..
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c metabloocoo -- count -file /bloocoo/SRS018585/SRS018585.denovo_duplicates_marked.trimmed.1.fastq -out /tmp/SRS018585
DSK:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/dsk/:/dsk gatb_tools_machine -c dsk -- -file /dsk/read50x_ref10K_e001.fasta.gz -kmer-size 27 -out /tmp/dsk27 -max-memory 200 -verbose 0
docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c h5dump -- -y -d histogram/histogram /tmp/dsk27.h5 | grep "^\ *[0-9]" | tr -d " " | tr -d "," | paste - - > $PWD/tmp/dsk27.histo
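The filter chain after h5dump in the command above can be tried on its own, without Docker. The sketch below assumes that each histogram entry is printed as two numeric lines; the sample text is made up for illustration and is not real h5dump output, and the function name to_histo is hypothetical.

```shell
#!/bin/sh
# Made-up sample resembling what the pipeline expects: numeric lines come in
# pairs, surrounded by non-numeric lines. This is NOT real h5dump output.
sample='DATA {
   {
      1,
      102
   },
   {
      2,
      34
   }
}'

# Same filters as in the h5dump command above, applied to stdin.
to_histo() {
    grep '^ *[0-9]' |   # keep lines starting with a digit (after optional spaces)
    tr -d ' ' |         # drop the indentation
    tr -d ',' |         # drop the separators
    paste - -           # join each pair of lines with a tab
}

printf '%s\n' "$sample" | to_histo
```

Each output line holds the two numbers of one pair separated by a tab, which is the two-column layout written to 'dsk27.histo'.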
MindTheGap:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/MindTheGap/:/mdg gatb_tools_machine -c MindTheGap -- find -in /mdg/master.fasta -ref /mdg/deleted.fasta -kmer-size 31 -out /tmp/mdg_find -insert-only
Short Read Connector:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/ShortReadConnector/:/src gatb_tools_machine -c rconnector -- -b /src/c1.fasta.gz -q /src/fof.txt -p src_linker
DiscoSNP++:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/DiscoSnp/:/disco gatb_tools_machine -c discosnp -- -r /disco/fof.txt -T
TakeABreak:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/TakeABreak/:/tab gatb_tools_machine -c takeabreak -- -in /tab/test4.fasta.gz -out /tmp/test4.takeabreak
In addition, please refer to the documentation of each GATB-Tool to review its command-line arguments.
Documentation is here: