sudo: unknown uid 1012: who are you? #475

Closed
brainstorm opened this issue Nov 8, 2016 · 15 comments

@brainstorm

Moving on from issue #469 et al., I'm now hitting a wall with the Docker uid: ... From what I can tell, it has nothing to do with the DOCKSTORE_ROOT env var, but I assume Docker's --privileged flag might help here?

It seems cwltool passes my host's UID to the container, and of course that UID cannot be found inside the running Docker container.

@denis-yuen Thanks a ton for following up on this with me. Any hints on this one? How are you mapping UIDs between host and container now, or are you just running everything as root?

Executing: cwltool --enable-dev --non-strict --enable-net --outdir /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/outputs/ --tmpdir-prefix /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/working/ /home/rvalls/dev/CGP-Somatic-Docker/Dockstore.cwl /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/workflow_params.json
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
        at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
        at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
        at java.lang.Thread.run(Thread.java:745)
stderr : /home/rvalls/.miniconda/bin/cwltool 1.0.20160712154127
[job Dockstore.cwl] /tmp/tmp0xcfN1$ docker \
    run \
    -i \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/inputs/1870eb79-6cce-49a0-9d83-5e8de445c872/HCC1143.bam.bai:/var/lib/cwl/stgebf8302e-eea7-4be2-ade7-492b3c1048a9/HCC1143.bam.bai:ro \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/inputs/b7582d9b-caf6-4c09-9659-7c60478c703d/HCC1143_BL.bam.bai:/var/lib/cwl/stg8a35bc14-b785-441a-aaf5-fedc5f622d3d/HCC1143_BL.bam.bai:ro \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/inputs/b7582d9b-caf6-4c09-9659-7c60478c703d/HCC1143_BL.bam:/var/lib/cwl/stg8a35bc14-b785-441a-aaf5-fedc5f622d3d/HCC1143_BL.bam:ro \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/inputs/f54a1fa2-51f1-4389-9f82-4165f8a66bb3/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stg7d26b651-ed4d-4da3-a452-8eb2b0c23355/GRCh37d5_CGP_refBundle.tar.gz:ro \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/inputs/1870eb79-6cce-49a0-9d83-5e8de445c872/HCC1143.bam:/var/lib/cwl/stgebf8302e-eea7-4be2-ade7-492b3c1048a9/HCC1143.bam:ro \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/inputs/5dcd7597-a86e-4482-99c6-5440da5e7a2b/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stgf33af064-c789-467d-adcc-3ea11965b515/GRCh37d5_battenberg.tar.gz:ro \
    --volume=/tmp/tmp0xcfN1:/var/spool/cwl:rw \
    --volume=/home/rvalls/dev/CGP-Somatic-Docker/datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/workingen7nUi:/tmp:rw \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --user=1012 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.2 \
    python \
    /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
    --tumor \
    /var/lib/cwl/stgebf8302e-eea7-4be2-ade7-492b3c1048a9/HCC1143.bam \
    --normal \
    /var/lib/cwl/stg8a35bc14-b785-441a-aaf5-fedc5f622d3d/HCC1143_BL.bam \
    --refFrom \
    /var/lib/cwl/stg7d26b651-ed4d-4da3-a452-8eb2b0c23355/GRCh37d5_CGP_refBundle.tar.gz \
    --bbFrom \
    /var/lib/cwl/stgf33af064-c789-467d-adcc-3ea11965b515/GRCh37d5_battenberg.tar.gz
[ERROR] command: sudo mkdir -p /var/spool/cwl/.seqware && sudo chown -R seqware /var/spool/cwl/ exited with code: 1
sudo: unknown uid 1012: who are you?

Traceback (most recent call last):
  File "/home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py", line 277, in <module>
RUNNING...
 sudo mkdir -p /var/spool/cwl/.seqware && sudo chown -R seqware /var/spool/cwl/

    main()
  File "/home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py", line 233, in main
    execute("sudo mkdir -p /var/spool/cwl/.seqware && sudo chown -R seqware /var/spool/cwl/")
  File "/home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py", line 197, in execute
    raise RuntimeError
RuntimeError
Error deleting container: Error response from daemon: Cannot destroy container e75965a88bc303bd04c7cff0e45f73b690ce4739afeda2224b1aa3f124db8b1f: Unable to remove filesystem for e75965a88bc303bd04c7cff0e45f73b690ce4739afeda2224b1aa3f124db8b1f: remove /cm/shared/docker_volumes/containers/e75965a88bc303bd04c7cff0e45f73b690ce4739afeda2224b1aa3f124db8b1f: directory not empty
Error while running job: Error collecting output for parameter 'somatic_cnv_tar_gz': Did not find output file with glob pattern: '['*.somatic.cnv.tar.gz']'
[job Dockstore.cwl] completed permanentFail
Final process status is permanentFail
Workflow error, try again with --debug for more information:
  Process status is ['permanentFail']

stdout :
java.lang.RuntimeException: problems running command: cwltool --enable-dev --non-strict --enable-net --outdir /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/outputs/ --tmpdir-prefix /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/working/ /home/rvalls/dev/CGP-Somatic-Docker/Dockstore.cwl /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-4f257681-2907-4b2e-bd65-c9eaae282f84/workflow_params.json
@denis-yuen
Member

denis-yuen commented Nov 8, 2016

Hi,
A bit of a coincidence: we ran into this one on the Gitter chat, but we don't have a good solution yet.

The long explanation is that there is a mismatch between how cwltool approaches a Docker image and how we approached Docker images when we were running PCAWG.

cwltool sets the uid of the user inside the container to match the uid of the user outside the container, so that files written to the mounted output directories end up owned by the host user. This works in most cases, because an unprivileged tool does not care which user it runs as inside the container.

For PCAWG, many of the workflows were built on workflow systems that predate Docker, such as SeqWare or Roddy. Our strategy was to use the USER instruction (https://docs.docker.com/engine/reference/builder/#/user) in the Dockerfile to set an executing user, and then give that user the privileges needed to configure services like SeqWare, Roddy, or even SGE. In this specific workflow, it looks like we're just configuring file permissions with sudo.

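The conflict arises when you use both: cwltool overrides the user with a uid that is essentially arbitrary from the image's point of view, and the way we've set up the sudoers file then probably fails. sudo's "unknown uid 1012: who are you?" suggests it cannot even find a passwd entry for that uid inside the container.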
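The "who are you?" failure can be reproduced without sudo. The sketch below uses only the Python standard library; `describe_uid` is a hypothetical helper, not part of cwltool or the workflow. It performs the same passwd lookup through `pwd.getpwuid`, so running it inside the container under the overridden uid should show whether the image has an account for that uid.

```python
import os
import pwd


# Hypothetical diagnostic helper (sketch only, not part of cwltool/Dockstore).
def describe_uid(uid: int) -> str:
    """Return the account name for `uid`, or a message if it has no passwd entry.

    pwd.getpwuid() queries the same passwd database (normally /etc/passwd
    inside the container) that sudo consults; it raises KeyError when the
    uid is unknown.
    """
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return f"unknown uid {uid}: no passwd entry"


if __name__ == "__main__":
    # Run inside a container started with --user=<host uid> to see whether
    # the image knows that uid.
    print(describe_uid(os.getuid()))
```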

That's the long answer and we haven't implemented a solution yet. @briandoconnor @tetron feel free to chime in if I've mischaracterized anything.

The short answer is that we've been testing these workflows in a brand-new Ubuntu VM on OpenStack or AWS. This seems to work, probably because the uid of the default "ubuntu" user happens to match the uid of the user inside the image that cwltool overrides.

see also: common-workflow-language/cwltool#47

@tetron
Contributor

tetron commented Nov 8, 2016

A fix on the cwltool side may be to switch from matching the user id to setting the setgid bit (the "gid sticky bit") on the output directory. Files created there then inherit the directory's group, so the runner process can manipulate files written by a different user inside the container through group ownership.
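Here is a minimal sketch of the setgid idea, assuming a Linux host and Python 3. `make_shared_output_dir` is a hypothetical name, and this is not how cwltool actually implements anything. The sketch makes the output directory group-accessible and sets its setgid bit, so that everything created inside inherits the directory's group. Whether those files are also group-writable still depends on the umask of the process writing them inside the container.

```python
import os
import stat
import tempfile


# Hypothetical helper (sketch only, not cwltool code).
def make_shared_output_dir(path: str) -> None:
    """Make `path` group-accessible and set its setgid bit.

    On Linux, files and subdirectories created inside a setgid directory get
    the directory's group rather than the creator's primary group (new
    subdirectories also inherit the setgid bit), so members of that group,
    such as the runner process, can manage output written by another uid.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRWXG | stat.S_ISGID)


if __name__ == "__main__":
    out = tempfile.mkdtemp()  # stands in for the cwltool output directory
    make_shared_output_dir(out)
    print(out, oct(stat.S_IMODE(os.stat(out).st_mode)))
```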

@denis-yuen
Member

@tetron Yeah, I think that would be a possibility as well.

@brainstorm
Author

Reopening: while the --no-match-user flag did allow cwltool to run, there seem to be other (potential) underlying issues with this approach:

RUNNING...
 sudo mkdir -p /var/spool/cwl/.seqware && sudo chown -R seqware /var/spool/cwl/

RUNNING...
 sudo cp /home/seqware/.seqware/settings /var/spool/cwl/.seqware

RUNNING...
 sudo chmod a+wrx /var/spool/cwl/.seqware/settings
RUNNING...
 perl -pi -e 's/wrench.res/seqwaremaven/g' /home/seqware/bin/seqware

RUNNING...
 echo "options(bitmapType='cairo')" > /var/spool/cwl/.Rprofile

RUNNING...
 seqware bundle launch --dir /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/ --engine whitestar-parallel --ini /var/spool/cwl/workflow.ini --no-metadata
Downloading SeqWare to /var/spool/cwl/.seqware/self-installs/seqware-distribution-1.1.2-full.jar now...
Downloading SeqWare Check to /var/spool/cwl/.seqware/self-installs/seqware-sanity-check-1.1.2-jar-paired-with-distribution.jar now...
Performing launch of workflow 'CgpSomaticCore' version '0.0.0'
[--plugin, net.sourceforge.seqware.pipeline.plugins.BundleManager, --, --install-dir-only, --bundle, /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/, --out, /tmp/bundle_manager5075051775232464401out]
Installing Bundle (Working Directory Only)
Bundle: /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/
Added 'CgpSomaticCore' (SWID: 1)
Bundle Has Been Installed to the MetaDB and Provisioned to /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/!
[--plugin, io.seqware.pipeline.plugins.WorkflowScheduler, --, --workflow-accession, 1, --host, de16069cb2f9, --out, /tmp/scheduler4014052588321300011out, --i, /var/spool/cwl/workflow.ini, --workflow-engine, whitestar-parallel, --no-meta-db, --]
Created workflow run with SWID: 58
[--plugin, io.seqware.pipeline.plugins.WorkflowLauncher, --, --launch-scheduled, 58]

Above is plain docker logs output from the running container. On closer examination, the container doesn't seem to do anything except sit waiting on a lock:

[rvalls@node08 CGP-Somatic-Docker]$ ps auxwww | grep java
rvalls   13038  0.0  0.0 103248   888 pts/2    S+   14:50   0:00 grep java
rvalls   32893  0.7  2.7 16894740 1373344 pts/1 Sl+ 11:01   1:37 java io.dockstore.client.cli.Client tool launch --entry Dockstore.cwl --local-entry --json test1.json
root     33758  100  0.4 14719440 225568 ?     Sl   11:06 225:18 java io.seqware.cli.Main bundle launch --dir /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/ --engine whitestar-parallel --ini /var/spool/cwl/workflow.ini --no-metadata

[rvalls@node08 CGP-Somatic-Docker]$ sudo strace -p 33758
Process 33758 attached - interrupt to quit
futex(0x2aaaacff99d0, FUTEX_WAIT, 41, NULL

Any hints on what could be going wrong with this run? I'll keep digging on my own, but feedback or debugging tips on SeqWare/CGP/Dockstore/cwltool would be greatly appreciated.

@brainstorm brainstorm reopened this Nov 9, 2016
@denis-yuen
Member

denis-yuen commented Nov 9, 2016

@brainstorm

Hi,
Sorry about these issues. Try looking at /datastore/oozie-* inside the running container (you can open another session in a running container with a command like docker exec -ti <id of the container> /bin/bash).

If the workflow successfully started, then you should see generated log files. That's one approach.

Another approach: earlier in the first log above, cwltool prints the docker run command it uses. Make it interactive by adding -it, and replace the tool's command with /bin/bash. You can then check whether anything is wrong with the environment and call the entrypoint python /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py yourself. You can also use a command like lsof to see which file is being waited on inside the container.
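If lsof and strace are not installed in the image, much of the same information can be read from /proc. The sketch below uses standard-library Python, and `inspect_process` and its helpers are hypothetical names. It reports a process's state, its kernel wait channel (if the kernel exposes it), the files it holds open, and the state of each thread. Per-thread state matters for a multi-threaded JVM, where only some threads may be blocked.

```python
import os
from typing import Dict, List, Optional


# Hypothetical /proc-based stand-in for lsof (sketch only).
def read_text(path: str) -> Optional[str]:
    """Return the stripped contents of a /proc file, or None if unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def proc_state(base: str) -> Optional[str]:
    """Extract the 'State:' field from <base>/status, e.g. 'S (sleeping)'."""
    for line in (read_text(f"{base}/status") or "").splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return None


def open_files(base: str) -> List[str]:
    """Resolve each <base>/fd/N symlink, roughly what `lsof -p PID` lists."""
    result: List[str] = []
    try:
        fds = sorted(os.listdir(f"{base}/fd"), key=int)
    except OSError:
        return result  # e.g. not permitted to inspect another user's process
    for fd in fds:
        try:
            result.append(f"{fd} -> {os.readlink(f'{base}/fd/{fd}')}")
        except OSError:
            pass  # descriptor closed between listdir() and readlink()
    return result


def inspect_process(pid: int) -> Dict[str, object]:
    base = f"/proc/{pid}"
    threads: Dict[int, Optional[str]] = {}
    try:
        for tid in os.listdir(f"{base}/task"):
            threads[int(tid)] = proc_state(f"{base}/task/{tid}")
    except OSError:
        pass
    return {
        "state": proc_state(base),
        "wchan": read_text(f"{base}/wchan"),  # kernel function it sleeps in, if exposed
        "open_files": open_files(base),
        "threads": threads,
    }


if __name__ == "__main__":
    from pprint import pprint

    # Replace os.getpid() with the PID of the stuck process from `ps aux`.
    pprint(inspect_process(os.getpid()))
```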

@denis-yuen
Member

denis-yuen commented Nov 9, 2016

One additional clue.
The output of ps auxwww | grep java above looks like it was taken outside the container, since it includes the dockstore command. Can you also run ps aux inside the container (using the docker exec approach above) and paste the output here? Sorry for all this troubleshooting!

@denis-yuen
Member

@brainstorm
To explain debugging inside the container further: on my system, the log contains this output from cwltool:

[job Dockstore.cwl] /tmp/tmpCHSRib$ docker \
        run \
        -i \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/d9a0fe27-00e7-457d-ae5f-29833cbdb639/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stg2cf8dc73-8502-4ed0-8a35-32ad7e19226e/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/d9a0fe27-00e7-457d-ae5f-29833cbdb639/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stg2cf8dc73-8502-4ed0-8a35-32ad7e19226e/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/a6d23967-ba66-46f0-ba4c-90a6f5aea0c6/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg63cef54c-7a6a-433c-b05e-fea0640ec75c/7875b5196f6b8b52847f99bf370aada0.bam:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/f1b1d76b-6a0e-4501-932c-76948fce9b21/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg315d88b2-406f-4614-aac6-354875c9de11/GRCh37d5_battenberg.tar.gz:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/a6d23967-ba66-46f0-ba4c-90a6f5aea0c6/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg63cef54c-7a6a-433c-b05e-fea0640ec75c/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/c1b48667-b3a3-4369-bea1-90ec9f274163/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stgbfb48940-3c31-494a-bbb1-211d3f458f99/GRCh37d5_CGP_refBundle.tar.gz:ro \
        --volume=/tmp/tmpCHSRib:/var/spool/cwl:rw \
        --volume=/home/ubuntu/CGP-Somatic-Docker/datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/workingbdDaWV:/tmp:rw \
        --workdir=/var/spool/cwl \
        --read-only=true \
        --user=1000 \
        --env=TMPDIR=/tmp \
        --env=HOME=/var/spool/cwl \
        quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.2 \
        python \
        /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py \
        --tumor \
        /var/lib/cwl/stg63cef54c-7a6a-433c-b05e-fea0640ec75c/7875b5196f6b8b52847f99bf370aada0.bam \
        --normal \
        /var/lib/cwl/stg2cf8dc73-8502-4ed0-8a35-32ad7e19226e/fdcb1bd7cffca69d15383ca9566c58e0.bam \
        --refFrom \
        /var/lib/cwl/stgbfb48940-3c31-494a-bbb1-211d3f458f99/GRCh37d5_CGP_refBundle.tar.gz \
        --bbFrom \
        /var/lib/cwl/stg315d88b2-406f-4614-aac6-354875c9de11/GRCh37d5_battenberg.tar.gz

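You'll want to modify that to something like the following (-i becomes -ti, --user=1000 is dropped, and the tool's command is replaced by /bin/bash):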

docker \
        run \
        -ti \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/d9a0fe27-00e7-457d-ae5f-29833cbdb639/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:/var/lib/cwl/stg2cf8dc73-8502-4ed0-8a35-32ad7e19226e/fdcb1bd7cffca69d15383ca9566c58e0.bam.bai:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/d9a0fe27-00e7-457d-ae5f-29833cbdb639/fdcb1bd7cffca69d15383ca9566c58e0.bam:/var/lib/cwl/stg2cf8dc73-8502-4ed0-8a35-32ad7e19226e/fdcb1bd7cffca69d15383ca9566c58e0.bam:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/a6d23967-ba66-46f0-ba4c-90a6f5aea0c6/7875b5196f6b8b52847f99bf370aada0.bam:/var/lib/cwl/stg63cef54c-7a6a-433c-b05e-fea0640ec75c/7875b5196f6b8b52847f99bf370aada0.bam:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/f1b1d76b-6a0e-4501-932c-76948fce9b21/GRCh37d5_battenberg.tar.gz:/var/lib/cwl/stg315d88b2-406f-4614-aac6-354875c9de11/GRCh37d5_battenberg.tar.gz:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/a6d23967-ba66-46f0-ba4c-90a6f5aea0c6/7875b5196f6b8b52847f99bf370aada0.bam.bai:/var/lib/cwl/stg63cef54c-7a6a-433c-b05e-fea0640ec75c/7875b5196f6b8b52847f99bf370aada0.bam.bai:ro \
        --volume=/home/ubuntu/CGP-Somatic-Docker/./datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/inputs/c1b48667-b3a3-4369-bea1-90ec9f274163/GRCh37d5_CGP_refBundle.tar.gz:/var/lib/cwl/stgbfb48940-3c31-494a-bbb1-211d3f458f99/GRCh37d5_CGP_refBundle.tar.gz:ro \
        --volume=/tmp/tmpCHSRib:/var/spool/cwl:rw \
        --volume=/home/ubuntu/CGP-Somatic-Docker/datastore/launcher-d631eed9-e542-4881-a4f4-b5099d87a911/workingbdDaWV:/tmp:rw \
        --workdir=/var/spool/cwl \
        --read-only=true \
        --env=TMPDIR=/tmp \
        --env=HOME=/var/spool/cwl \
        quay.io/pancancer/pcawg-sanger-cgp-workflow:2.0.2 /bin/bash

Then you'll be able to examine the environment, run the workflow, and see what is going on inside the container at your leisure.
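The same edit can be scripted. This is a sketch under a few assumptions: the command is copied from the cwltool log exactly as printed, every option is a single `-i` or `--name=value` token, and Python 3 is available on the host. `parse_logged_command` and `to_debug_command` are hypothetical helpers. The script drops the `--user` override, turns `-i` into `-ti`, and replaces the tool's command with `/bin/bash`.

```python
import shlex
from typing import List


# Hypothetical helpers (sketch only, not part of Dockstore or cwltool).
def parse_logged_command(text: str) -> List[str]:
    """Split a multi-line `docker run` command copied from the cwltool log.

    Backslash-newline continuations are joined first, and anything before the
    'docker' token (such as the '[job Dockstore.cwl] /tmp/...$' prefix) is dropped.
    """
    tokens = shlex.split(text.replace("\\\n", " "))
    return tokens[tokens.index("docker"):]


def to_debug_command(argv: List[str], shell: str = "/bin/bash") -> List[str]:
    """Rewrite a `docker run` argv into an interactive shell in the same image."""
    if argv[:2] != ["docker", "run"]:
        raise ValueError("expected a 'docker run ...' command")
    out = ["docker", "run"]
    for tok in argv[2:]:
        if not tok.startswith("-"):
            return out + [tok, shell]  # first non-option token is the image
        if tok == "-i":
            out.append("-ti")          # interactive, with a TTY
        elif tok.startswith("--user="):
            continue                   # fall back to the image's own USER
        else:
            out.append(tok)
    raise ValueError("no image found in the command")
```

For example, `print(" ".join(shlex.quote(t) for t in to_debug_command(parse_logged_command(log_text))))` prints a command that can be pasted into a terminal, where `log_text` holds the lines copied from the log.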

@brainstorm
Author

Thanks a lot @denis-yuen for the help and feedback!

I've been looking at this problem for a while today and:

  1. /datastore/ is empty.
  2. I cannot install anything in the container, such as strace or lsof as you suggested above, without rebuilding the image: the container is started with --read-only=true, so /var/lib/dpkg is read-only and I cannot mount -oremount,rw /. Instead, I examined /proc/<PID> for hints.
  3. I suspect the issue might be in the execute/Popen part of the Python script (perhaps something similar to this Stack Overflow thread?):
    cmd_parts = ["seqware bundle launch",
                 "--dir {0}".format(seqware_bundle_dir),
                 "--engine whitestar-parallel",
                 "--ini {0}".format(ini_file),
                 "--no-metadata"]
    cmd = " ".join(cmd_parts)
    execute(cmd)
  4. All log folders are empty.
  5. No --output flag seems to be passed on the command line:
root@6509f7f1eeac:~# ps auxww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  32852  6880 ?        Ss   13:08   0:00 python /home/seqware/CGP-Somatic-Docker/scripts/run_seqware_workflow.py --tumor /var/lib/cwl/stg6ce6658c-10b6-4625-9152-27a8d704da27/HCC1143.bam --normal /var/lib/cwl/stg40585d9c-309e-42f8-b82d-7b7e78d1f1a7/HCC1143_BL.bam --refFrom /var/lib/cwl/stg89c7c2c7-af92-4cc4-95e4-11c980dcd339/GRCh37d5_CGP_refBundle.tar.gz --bbFrom /var/lib/cwl/stgf6ef5eb6-60e8-4c4f-9243-95d892aad61a/GRCh37d5_battenberg.tar.gz
root        19  0.0  0.0   4344   640 ?        S    13:08   0:00 /bin/sh -c seqware bundle launch --dir /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/ --engine whitestar-parallel --ini /var/spool/cwl/workflow.ini --no-metadata
root        20  0.0  0.0  17904  1664 ?        S    13:08   0:00 bash /home/seqware/bin/seqware bundle launch --dir /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/ --engine whitestar-parallel --ini /var/spool/cwl/workflow.ini --no-metadata
root        40  100  3.2 14719440 1608028 ?    Sl   13:10 319:36 java io.seqware.cli.Main bundle launch --dir /home/seqware/CGP-Somatic-Docker/target/Workflow_Bundle_CgpSomaticCore_0.0.0_SeqWare_1.1.1/ --engine whitestar-parallel --ini /var/spool/cwl/workflow.ini --no-metadata
  6. Running the (manual) docker command lines above has had the same effect so far (a java process stuck in a futex wait).

So I'll keep digging, but I'm a bit unsure what's really going on now.
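One classic Popen pitfall is a child that blocks on a full pipe because the parent redirects stdout/stderr to PIPE and then only calls wait(). We have not seen the script's real execute(), so the version below is a hypothetical rewrite. It is modelled on the RUNNING... and [ERROR] lines in the logs above. It merges both streams into one pipe and drains that pipe line by line, so the child cannot stall on pipe back-pressure. A pipe stall would normally show up as a thread blocked in write(), which the per-thread entries under /proc/<PID>/task can reveal. A bare futex wait does not confirm this theory on its own.

```python
import subprocess
import sys


# Hypothetical rewrite of the script's execute() helper (sketch only).
def execute(cmd: str) -> None:
    """Run `cmd` through the shell, echoing its output as it is produced.

    stderr is merged into stdout and the single pipe is read continuously,
    so the child can never block because an unread pipe buffer is full.
    """
    print("RUNNING...\n", cmd, flush=True)
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    for line in proc.stdout:  # ends when the child closes its output
        sys.stdout.write(line)
        sys.stdout.flush()
    returncode = proc.wait()
    if returncode != 0:
        print(f"[ERROR] command: {cmd} exited with code: {returncode}", file=sys.stderr)
        raise RuntimeError(f"{cmd!r} exited with code {returncode}")
```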

@denis-yuen
Member

Hmmm, I'm kind of puzzled as well. Can you describe your environment as thoroughly as possible, so we can try to reproduce the issue on our OpenStack cluster? In particular:

  1. What OS is the host?
  2. What version of Docker are you using?
  3. What is the full Dockstore tool launch command, what is the generated cwltool command-line and docker run command-line?

A java process stuck in a futex wait suggests that the Java portion of the workflow inside the container is having trouble creating or writing to a file and is left waiting forever. I'm not clear on why that would happen, though, since the user is root.

@brainstorm
Author

Yes, in the cluster we are running:

[rvalls@node08 ~]$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.6 (Final)
Release:        6.6
Codename:       Final

[rvalls@node08 ~]$ uname -a
Linux node08 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[rvalls@node08 ~]$ docker -v
Docker version 1.7.1, build 786b29d/1.7.1

I know, a bit legacy, we'll hopefully upgrade as soon as we can :/

The command lines you asked for are:

[rvalls@node08 CGP-Somatic-Docker]$ time dockstore tool launch --entry Dockstore.cwl --local-entry --json test1.json
(...)
Executing: cwltool --enable-dev --non-strict --enable-net --outdir /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/outputs/ --tmpdir-prefix /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/working/ /home/rvalls/dev/CGP-Somatic-Docker/Dockstore.cwl /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/workflow_params.json

@denis-yuen
Member

@brainstorm Thanks. I think the last bit of useful info would be the generated docker run command. What was it?

@denis-yuen denis-yuen self-assigned this Nov 11, 2016
@brainstorm
Author

brainstorm commented Nov 11, 2016

Sorry, where can I see that command? It doesn't show up after issuing the dockstore command...

[rvalls@node08 CGP-Somatic-Docker]$ time dockstore tool launch --entry Dockstore.cwl --local-entry --json test1.json
(...)
Downloading: #refFrom from https://s3-eu-west-1.amazonaws.com/wtsi-pancancer/reference/GRCh37d5_CGP_refBundle.tar.gz into directory: /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/inputs/f35d3680-938f-48e1-b387-54bed6a8cbbf
[##################################################] 100%
Downloading: #tumor from /scratch/rvalls/HCC1143_ds/HCC1143.bam into directory: /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/inputs/744c44e9-b639-4f0d-891e-8e95946e4e4d
13:07:13.541 [main] ERROR io.dockstore.common.FileProvisioning - Could not link /scratch/rvalls/HCC1143_ds/HCC1143.bam to /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/inputs/744c44e9-b639-4f0d-891e-8e95946e4e4d/HCC1143.bam , copying instead
java.nio.file.FileSystemException: /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/inputs/744c44e9-b639-4f0d-891e-8e95946e4e4d/HCC1143.bam -> /scratch/rvalls/HCC1143_ds/HCC1143.bam: Invalid cross-device link
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
        at java.nio.file.Files.createLink(Files.java:1086)
        at io.dockstore.common.FileProvisioning.provisionInputFile(FileProvisioning.java:246)
        at io.github.collaboratory.LauncherCWL.copyIndividualFile(LauncherCWL.java:669)
        at io.github.collaboratory.LauncherCWL.doProcessFile(LauncherCWL.java:635)
        at io.github.collaboratory.LauncherCWL.pullFilesHelper(LauncherCWL.java:606)
        at io.github.collaboratory.LauncherCWL.pullFiles(LauncherCWL.java:562)
        at io.github.collaboratory.LauncherCWL.run(LauncherCWL.java:184)
        at io.dockstore.client.cli.nested.AbstractEntryClient.launchCwl(AbstractEntryClient.java:897)
        at io.dockstore.client.cli.nested.AbstractEntryClient.checkEntryFile(AbstractEntryClient.java:748)
        at io.dockstore.client.cli.nested.AbstractEntryClient.launch(AbstractEntryClient.java:821)
        at io.dockstore.client.cli.nested.AbstractEntryClient.processEntryCommands(AbstractEntryClient.java:218)
        at io.dockstore.client.cli.Client.run(Client.java:698)
        at io.dockstore.client.cli.Client.main(Client.java:759)
13:07:43.980 [main] ERROR io.dockstore.common.FileProvisioning - Could not link /scratch/rvalls/HCC1143_ds/HCC1143.bam.bai to /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/inputs/744c44e9-b639-4f0d-891e-8e95946e4e4d/HCC1143.bam.bai , copying instead
java.nio.file.FileSystemException: /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/inputs/744c44e9-b639-4f0d-891e-8e95946e4e4d/HCC1143.bam.bai -> /scratch/rvalls/HCC1143_ds/HCC1143.bam.bai: Invalid cross-device link
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
        at java.nio.file.Files.createLink(Files.java:1086)
        at io.dockstore.common.FileProvisioning.provisionInputFile(FileProvisioning.java:246)
        at io.github.collaboratory.LauncherCWL.copyIndividualFile(LauncherCWL.java:669)
        at io.github.collaboratory.LauncherCWL.doProcessFile(LauncherCWL.java:649)
        at io.github.collaboratory.LauncherCWL.pullFilesHelper(LauncherCWL.java:606)
        at io.github.collaboratory.LauncherCWL.pullFiles(LauncherCWL.java:562)
        at io.github.collaboratory.LauncherCWL.run(LauncherCWL.java:184)
        at io.dockstore.client.cli.nested.AbstractEntryClient.launchCwl(AbstractEntryClient.java:897)
        at io.dockstore.client.cli.nested.AbstractEntryClient.checkEntryFile(AbstractEntryClient.java:748)
        at io.dockstore.client.cli.nested.AbstractEntryClient.launch(AbstractEntryClient.java:821)
        at io.dockstore.client.cli.nested.AbstractEntryClient.processEntryCommands(AbstractEntryClient.java:218)
        at io.dockstore.client.cli.Client.run(Client.java:698)
        at io.dockstore.client.cli.Client.main(Client.java:759)
Calling out to cwltool to run your tool
Executing: cwltool --enable-dev --non-strict --enable-net --outdir /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/outputs/ --tmpdir-prefix /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/working/ /home/rvalls/dev/CGP-Somatic-Docker/Dockstore.cwl /home/rvalls/dev/CGP-Somatic-Docker/./datastore/launcher-1ca4a864-eeaf-426c-8b49-050b2309223e/workflow_params.json

@denis-yuen
Member

Hi,
I think I have a couple of avenues of investigation, but while trying to set up your environment I realized we may be chasing a red herring. The immediate problem is that in the version of the Dockstore CLI we have in production (this was fixed later), cwltool's output is only printed after the program in the Docker container finishes. If it locks up, you'll need to kill -9 it (in this case, the java io.seqware.cli.Main bundle launch process) to see the output.
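The kill -9 step can also be scripted. The sketch below uses standard-library Python; `find_pids` and `kill_matching` are hypothetical helpers that behave roughly like `pkill -9 -f`. The hung java process belongs to root, so the script has to run with root privileges (for example under sudo). Otherwise the kill is silently skipped.

```python
import os
import signal
from typing import List


# Hypothetical helpers, roughly `pkill -9 -f PATTERN` (sketch only).
def find_pids(pattern: str) -> List[int]:
    """Return pids whose full command line contains `pattern`, excluding ourselves."""
    me = os.getpid()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def kill_matching(pattern: str) -> List[int]:
    """Send SIGKILL (kill -9) to every matching process; return the pids signalled."""
    killed = []
    for pid in find_pids(pattern):
        try:
            os.kill(pid, signal.SIGKILL)
            killed.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or owned by another user and we are not root
    return killed
```

For example, `kill_matching("io.seqware.cli.Main bundle launch")` should unblock the launcher so that cwltool's buffered output is printed.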

However, I think the bigger problem is one we've seen before on a different project (PCAWG): issues with file locking and other oddities on the container file system with older versions of Docker and/or older kernels.

https://docs.docker.com/engine/installation/binaries/#/check-kernel-dependencies indicates that you need a Linux kernel of at least 3.10, but it looks like you're running 2.6.32.

https://docs.docker.com/engine/installation/linux/centos/ indicates you can probably get that kernel with CentOS 7.

Could you maybe try again on a newer CentOS version before coming back to CentOS 6 to confirm?
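A quick way to check a node against that requirement before scheduling work on it is shown below. This is a sketch: `kernel_version` and `meets_docker_minimum` are hypothetical helpers. The (3, 10) default is the minimum Docker's installation documentation gave at the time, so adjust it if your Docker version documents something different.

```python
import platform
import re
from typing import Tuple

# Minimum from Docker's installation docs of that era; adjust for your Docker version.
MIN_KERNEL: Tuple[int, ...] = (3, 10)


# Hypothetical helpers (sketch only).
def kernel_version(release: str) -> Tuple[int, ...]:
    """Parse the numeric prefix of a `uname -r` string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_docker_minimum(release: str, minimum: Tuple[int, ...] = MIN_KERNEL) -> bool:
    # Tuples compare element by element, so (2, 6, 32) < (3, 10) <= (3, 10, 0).
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    verdict = "OK" if meets_docker_minimum(release) else "older than the Docker minimum"
    print(release, verdict)
```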

@brainstorm
Author

We'll be able to do so soon, since we're migrating anyway. I'll keep you guys posted.

Thanks a ton for all your support!

@denis-yuen
Member

For the issue in the original post, we also released a new version of the workflow (https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/releases/tag/2.0.3), which uses gosu and should help with the unknown-user issue. Feel free to reopen this issue, or create a new one, when you look at CentOS again for the second issue.
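The thread doesn't show how the 2.0.3 image wires gosu in. The usual pattern is for the entrypoint to start as root, do the privileged setup (here, the mkdir/chown that needed sudo), and then step down to the unprivileged user before running the workload. Below is a rough Python analogue of that step-down. It assumes the container starts as root and that a `seqware` account exists in the image. `resolve_user` and `exec_as` are hypothetical names, and this is not gosu itself.

```python
import os
import pwd
import sys
from typing import List, Tuple


# Hypothetical gosu-style helpers (sketch only, not the 2.0.3 entrypoint).
def resolve_user(name: str) -> Tuple[int, int, str]:
    """Look up (uid, gid, home directory) for `name` in the passwd database."""
    entry = pwd.getpwnam(name)
    return entry.pw_uid, entry.pw_gid, entry.pw_dir


def exec_as(name: str, argv: List[str]) -> None:
    """Drop from root to `name`, then replace this process with `argv`.

    Order matters: supplementary groups and the gid must be changed while
    still root, because after setuid() the process can no longer change them.
    """
    uid, gid, home = resolve_user(name)
    os.initgroups(name, gid)  # supplementary groups from /etc/group
    os.setgid(gid)
    os.setuid(uid)            # irreversible: root privileges are gone after this
    os.environ["HOME"] = home
    os.execvp(argv[0], argv)  # never returns on success


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage as an entrypoint started as root:
    #   python entrypoint.py seqware <workflow command> [args...]
    exec_as(sys.argv[1], sys.argv[2:])
```

Because the process execs into the target command instead of staying around as a parent, signals sent to the container reach the workload directly. That is one reason gosu is preferred over sudo or su in entrypoints.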


3 participants