Using gatb_tools_machine on the command line

Genscale Team edited this page Jun 26, 2017 · 5 revisions

Pre-requisites

We assume here that you have already built the gatb_tools_machine image using the 'docker build' command.
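
One way to confirm this prerequisite is to ask Docker whether the image is present locally. The sketch below assumes the image was tagged 'gatb_tools_machine' when it was built; the function name 'image_exists' is hypothetical and not part of this project.

```shell
#!/bin/sh
# Return 0 if a local Docker image with the given name exists,
# non-zero otherwise (including when Docker itself is unavailable).
# 'image_exists' is a hypothetical helper name used for illustration.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

if image_exists gatb_tools_machine; then
    echo "gatb_tools_machine image found"
else
    echo "gatb_tools_machine image not found: build it first with 'docker build'"
fi
```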

Using the Docker Image

This project contains a 'data' directory that can be used to illustrate how to execute a GATB-Tool inside the Docker container named 'gatb_tools_machine' with real data located outside the container.

First of all, create a 'tmp' (i.e. working) directory:

cd <gatb-tools-machine Github home project>
mkdir tmp

Now, within "gatb-tools-machine Github home project", we have:

./tmp: a working directory
./data: test data for each tool; in turn, we have a sub-folder for each tool to test
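
The layout described above can be prepared and checked with a few lines of shell. This is only a minimal sketch: the function name 'check_layout' is hypothetical, and it assumes you pass it the project home directory.

```shell
#!/bin/sh
# Create the working directory under the given project root and report
# whether 'tmp' and 'data' are present. 'check_layout' is a hypothetical
# helper name used only for this illustration.
check_layout() {
    root="${1:-.}"
    # -p makes mkdir a no-op if the directory already exists.
    mkdir -p "$root/tmp"
    for dir in tmp data; do
        if [ -d "$root/$dir" ]; then
            echo "$dir: ok"
        else
            echo "$dir: missing"
        fi
    done
}

# Usage, from the project home directory:
#   check_layout .
```

Using 'mkdir -p' means the check can be repeated without error once 'tmp' exists.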

How to use the image on the command-line?

The general form of the command is as follows:

docker run --rm -i -t gatb_tools_machine -c <command> -- <arguments>

where:
    <command>: must be one of (case-sensitive):
               - simka, simka-visu
               - bloocoo, metabloocoo
               - dsk, dsk2ascii
               - MindTheGap
               - minia
               - rconnector (Short Read Connector)
               - discosnp
               - takeabreak
               - h5dump, dbgh5, dbginfo
    <arguments>: tool arguments
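
The general form lends itself to a small wrapper. The sketch below is an illustration only: the function name 'gatb_cmd' is hypothetical, and the accepted tool names are copied from the list above. It prints the docker command instead of running it, so the command can be inspected first.

```shell
#!/bin/sh
# Print (do not run) the docker command for a given tool and its arguments.
# 'gatb_cmd' is a hypothetical helper; tool names are case-sensitive.
gatb_cmd() {
    tool="$1"
    shift
    case "$tool" in
        simka|simka-visu|bloocoo|metabloocoo|dsk|dsk2ascii|MindTheGap|minia|rconnector|discosnp|takeabreak|h5dump|dbgh5|dbginfo)
            printf '%s\n' "docker run --rm -i -t gatb_tools_machine -c $tool -- $*"
            ;;
        *)
            # Reject anything that is not in the list, e.g. 'Simka'.
            echo "unknown tool: $tool" >&2
            return 1
            ;;
    esac
}

gatb_cmd simka -in /simka/simka_input.txt
```

Note that this sketch leaves out the '-v' volume options, which are introduced in the Simka example.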

Example: running Simka

To illustrate the use of this GATB-Tools Docker Image, let's take the example of running the Simka tool.

When running Simka directly on the command line, you use a command such as:

cd simka/bin
./simka  -in ../example/simka_input.txt -out ./simka_results/ -out-tmp ./simka_temp_output
---1---  ---2-----------------------------------------------------------------------------

1: Simka program name
2: Simka arguments

Now, to run Simka using the Docker container gatb_tools_machine, you simply do this:

docker run --rm -i -t -v $PWD:/tmp gatb_tools_machine   -c simka -- <Simka arguments>
---1-------------------------------------------------   ---2----    ---3-------------

1: Start the Docker container (more on that, below)
2: We say that we want to use 'simka' (more on that, below)
3: Simka arguments

Here is a real example:

cd <gatb-tools-machine Github home project>
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/simka/:/simka gatb_tools_machine -c simka -- -in /simka/simka_input.txt -out /tmp/simka_results/ -out-tmp /tmp/simka_temp_output

You'll see the results in '$PWD/tmp' when the command has finished executing.

This command explained:

docker run                                 [1]
   --rm                                    [2]
   -i -t                                   [3]
   -v $PWD/tmp/:/tmp                       [4]
   -v $PWD/data/simka/:/simka              [4']
   gatb_tools_machine                      [5] 
   -c simka                                [6]
   --                                      [7]
   -in /simka/simka_input.txt              [8]
   -out /tmp/simka_results/                [9]
   -out-tmp /tmp/simka_temp_output         [10]

   [1]-[5]: Docker arguments
   [6]-[7]: arguments for the container's invoker program
   [8]-[10]: 'bin/simka' arguments

   [1]: start Docker container
   [2]: destroy container when Docker finishes
        (it does NOT delete the 'gatb_tools_machine' image)
   [3]: start an interactive job 
        (for instance, you'll see messages on stdout, if any)
   [4]: mount a volume. This is required to get the results from Simka.
        Here, we say that '$PWD/tmp' (the 'tmp' subdirectory located within
        the current local directory) will be viewed as '/tmp' from the inside
        of the container.
   [4']: mount a second volume: the $PWD/data/simka/ directory will be
        viewed as '/simka' from the inside of the container. This gives us
        an easy way to provide OUR data (located within $PWD/data/simka/)
        to the program 'simka' located within the Docker container. In
        turn, 'simka' will produce results in '/tmp', i.e. in '$PWD/tmp'.
   [5]: tell Docker which image to start: the 'gatb_tools_machine' of course.
   [6]: ask to start the simka program. See companion file 'run-tool.sh' for
        more information.
   [7]: '--' is required to separate arguments [6] from the rest of the
        command line
   [8]: the data file to process with simka. Here we use a data file
        provided with the simka software to test it.
   [9]: tells simka where to put results. Of course, simka will write 
        within /tmp directory inside the container. However, since we
        have directive [4], data writing is actually done in $PWD/tmp, 
        i.e. a local directory.
   [10]: tells simka where to put temporary files. 
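
Because of the two '-v' mounts, every path passed to simka is a container path that corresponds to a host path. The following sketch makes that mapping explicit for the two mounts used in the real example. The function name 'to_host_path' is hypothetical, and it only handles these two prefixes.

```shell
#!/bin/sh
# Translate a container path into the host path it is mounted from,
# for the two volumes used in the Simka example:
#   $PWD/tmp/        -> /tmp
#   $PWD/data/simka/ -> /simka
# 'to_host_path' is a hypothetical helper written for this illustration.
to_host_path() {
    case "$1" in
        /tmp|/tmp/*)     printf '%s\n' "$PWD/tmp${1#/tmp}" ;;
        /simka|/simka/*) printf '%s\n' "$PWD/data/simka${1#/simka}" ;;
        *)               printf '%s\n' "$1 (not on a mounted volume)" ;;
    esac
}

to_host_path /tmp/simka_results/
to_host_path /simka/simka_input.txt
```

Any path that falls outside a mounted volume lives only inside the container and disappears with it, since '--rm' removes the container on exit.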

Now that you have seen how to start the 'simka' GATB-Tool, you can use all the other tools in the same way.

Running other GATB-Tools

Here is a list of basic commands that you can use to test all the provided GATB-Tools with sample data.

Simka visualization:

docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c simka-visu -- -in /tmp/simka_results/ -out /tmp/simka_results/ -pca -heatmap -tree

Bloocoo:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c bloocoo -- -file /bloocoo/errclose.fasta -out /tmp/errclose_bloocoo_corr_errs.tab -kmer-size 31 -abundance-min 5 -err-tab

MetaBloocoo:

cd $PWD/data/bloocoo
curl -O http://downloads.hmpdacc.org/data/Illumina/anterior_nares/SRS018585.tar.bz2
tar -xjf SRS018585.tar.bz2
cd ../..
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c metabloocoo -- count -file /bloocoo/SRS018585/SRS018585.denovo_duplicates_marked.trimmed.1.fastq -out /tmp/SRS018585

DSK:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/dsk/:/dsk gatb_tools_machine -c dsk -- -file /dsk/read50x_ref10K_e001.fasta.gz -kmer-size 27 -out /tmp/dsk27 -max-memory 200 -verbose 0
docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c h5dump -- -y -d histogram/histogram /tmp/dsk27.h5 | grep "^\ *[0-9]" | tr -d " " | tr -d "," | paste - - > $PWD/tmp/dsk27.histo
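
The second DSK command post-processes the h5dump output with a small text pipeline: 'grep' keeps the lines that start with a digit, 'tr' strips spaces and commas, and 'paste - -' joins each pair of lines into one tab-separated row. The sketch below runs the same filters on made-up sample text. That sample is an assumption about the general shape of the dump and is not real h5dump output.

```shell
#!/bin/sh
# Made-up sample text standing in for an h5dump listing (NOT real output):
# each record contributes two numeric lines.
sample() {
    printf '%s\n' \
        'DATA {' \
        '   {' \
        '      1,' \
        '      234' \
        '   },' \
        '   {' \
        '      2,' \
        '      56' \
        '   }' \
        '}'
}

# Keep numeric lines, drop spaces and commas, then pair the lines up.
sample | grep '^ *[0-9]' | tr -d ' ' | tr -d ',' | paste - -
```

With this sample, the pipeline yields two rows, each holding two numbers separated by a tab.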

MindTheGap:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/MindTheGap/:/mdg gatb_tools_machine -c MindTheGap -- find -in /mdg/master.fasta -ref /mdg/deleted.fasta -kmer-size 31 -out /tmp/mdg_find -insert-only

Short Read Connector:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/ShortReadConnector/:/src gatb_tools_machine -c rconnector -- -b /src/c1.fasta.gz -q /src/fof.txt -p src_linker

DiscoSNP++:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/DiscoSnp/:/disco gatb_tools_machine -c discosnp -- -r /disco/fof.txt -T

TakeABreak:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/TakeABreak/:/tab gatb_tools_machine -c takeabreak -- -in /tab/test4.fasta.gz -out /tmp/test4.takeabreak

GATB-Tools documentation

In addition, please refer to the documentation of each GATB-Tool to review how to use its command-line arguments.

Documentation is here: