
Job failing before starting with docker error #240

Closed
nithinjoshy opened this issue Apr 20, 2022 · 2 comments

Comments

@nithinjoshy

Jobs that I start keep failing immediately with an error that looks related to Docker. The first lines of the log file are below.

time="2022-04-20T17:43:09Z" level=error msg="error waiting for container: context canceled"
Error response from daemon: driver failed programming external connectivity on endpoint ssh (4b09db99808becfa8dbe7f000240cf72d29b6535f0b006e93c97391bc82904cf): Error starting userland proxy: listen tcp4 0.0.0.0:22: bind: address already in use

There are a few statements afterwards but they contain info specific to my project so I would not like to copy them here. They are similar to the two statements below and seem generally unrelated to any error.

2022-04-20 17:43:10 INFO: gsutil -h Content-Type:text/plain  -mq cp /tmp/continuous_logging_action/...
2022-04-20 17:43:10 INFO: mkdir -m 777 -p /mnt/data/input/...
2022-04-20 17:43:10 INFO: mkdir -m 777 -p /mnt/data/output/....

Below is the code I use to start the job.

dsub \
    --provider google-cls-v2 \
    --project ${PROJECT} \
    --logging gs://${DSUB_BUCKET}/logs \
    --input-recursive INPUT_PATH=gs://${OUTPUT_BUCKET}/${name}/ \
    --output-recursive OUTPUT_PATH=gs://${OUTPUT_BUCKET}/${name}/ \
    --image ${AGGREGATE_IMAGE} \
    --script aggregate.sh \
    --disk-size 1000 \
    --name "aggregate" \
    --machine-type n1-standard-16 \
    --ssh \
    --boot-disk-size 30

I am posting here because this issue started to occur without any change to my code. As far as I can tell, the error is unrelated to my own code and is instead an issue with how dsub deploys my Docker container to the VM. Does anyone have any idea what may be happening? Please excuse me if it turns out to be a trivial error in my code.

@wnojopra
Contributor

Hi @nj3252!
The error you're seeing seems related to the SSH port. If you're not actively using the ssh feature, could you remove the --ssh flag from your command and try again?

FWIW, we are seeing a few other issues related to SSH in #238 and #233

@nithinjoshy
Author

Thanks, that has fixed it. I was using the ssh flag previously and never removed it. I really appreciate the quick response.
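
For anyone landing here with the same "bind: address already in use" error on port 22, the fix discussed above amounts to submitting the same job without --ssh. A minimal sketch follows, assuming the variables from the original post (PROJECT, DSUB_BUCKET, OUTPUT_BUCKET, name, AGGREGATE_IMAGE) are already set. The function name submit_aggregate is made up for illustration and is not part of dsub; every flag is copied from the original command.

```shell
# Sketch only: the submission from the original post with --ssh removed.
# submit_aggregate is a hypothetical wrapper name, not a dsub feature.
# PROJECT, DSUB_BUCKET, OUTPUT_BUCKET, name, and AGGREGATE_IMAGE are the
# poster's own variables and are assumed to be set before the call.
submit_aggregate() {
    dsub \
        --provider google-cls-v2 \
        --project "${PROJECT}" \
        --logging "gs://${DSUB_BUCKET}/logs" \
        --input-recursive "INPUT_PATH=gs://${OUTPUT_BUCKET}/${name}/" \
        --output-recursive "OUTPUT_PATH=gs://${OUTPUT_BUCKET}/${name}/" \
        --image "${AGGREGATE_IMAGE}" \
        --script aggregate.sh \
        --disk-size 1000 \
        --name "aggregate" \
        --machine-type n1-standard-16 \
        --boot-disk-size 30
}
```

Calling submit_aggregate after exporting the variables submits the job. Quoting the expansions is a small addition to the original command so that values containing spaces are passed as single arguments.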
