Cannot run remote nn classification workflow #181

constantinpape · 2021-07-16T05:26:58Z

I tried to run the remote nn classification workflow (client: my laptop, server: embl gpu 7), but ran into some issues:

the server lists "127.0.0.0" instead of the correct ip
even when inserting the correct IP the client does not find the server

See screenshots (server was still running when I tried to connect):

akreshuk · 2021-07-16T07:28:31Z

You have to start it with --addr 0.0.0.0
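For context: a server bound to 127.0.0.1 only accepts connections from the same machine, while 0.0.0.0 makes it listen on all network interfaces. A minimal sketch for checking from the client side whether a server port is reachable at all is shown below. The helper names `can_connect` and `demo` are hypothetical, and the demo uses a throwaway local listener instead of a real tiktorch server; substitute the server's IP and port when testing a real setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def demo() -> tuple:
    # Listen on the loopback interface only, as a server bound to 127.0.0.1 would.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    reachable = can_connect("127.0.0.1", port)
    listener.close()
    # Once the listener is closed, the same port should refuse connections.
    closed = can_connect("127.0.0.1", port)
    return reachable, closed


if __name__ == "__main__":
    print(demo())
```

If `can_connect` returns False for the server's real IP while the server is running, the bind address or a firewall is the likely culprit rather than the client.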

constantinpape · 2021-07-17T09:26:42Z

> You have to start it with --addr 0.0.0.0

That works, we should either make this the default behaviour or at least document it.

Also, for me the server only runs on the cpu:

I am making sure that the gpu is visible to the server:

$ CUDA_VISIBLE_DEVICES=0 tiktorch-server --addr 0.0.0.0

m-novikov · 2021-07-17T10:03:09Z

It may be an issue with the CUDA versions. In that case, the device list doesn't show up.
Check the output of this command in the tiktorch-server-env:

python -c "import torch.cuda; print(torch.cuda.is_available()); from torch.version import cuda;  print(cuda); "

I added some expanded CUDA logging to the server startup sequence to simplify debugging.
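The one-liner above can also be written as a small script that is easier to extend. This is only a sketch, not the logging that was added to the server: the function name `describe_cuda` and the dictionary keys are made up for illustration, and the script degrades gracefully when torch is not installed.

```python
import importlib


def describe_cuda() -> dict:
    """Collect the facts that decide whether PyTorch can use a GPU."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if info["cuda_available"]:
        # List the GPUs that are visible to this process.
        info["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_cuda().items():
        print(f"{key}: {value}")
```

A `cuda_build` of None points to a CPU-only PyTorch package, whereas a CUDA build with `cuda_available` False points to a driver or visibility problem.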

constantinpape · 2021-07-19T08:55:39Z

@m-novikov you are right, the problem is that CUDA is not available:

$ CUDA_VISIBLE_DEVICES=0 python -c "import torch.cuda; print(torch.cuda.is_available()); from torch.version import cuda;  print(cuda)"
False
None

This is because my env has a CPU-only pytorch installed:

$ conda list | grep pytorch
pytorch                   1.9.0               py3.7_cpu_0    pytorch

Note that I followed the tiktorch installation instructions: https://github.com/ilastik/tiktorch#installation. So for some reason this env pulls in a cpu pytorch.
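The build string in the `conda list` output is what gives the CPU build away. A rough sketch of automating that check follows; the function name `pytorch_build_kind` is hypothetical, and the CUDA build string in the usage example is only an illustration of the naming pattern, not taken from this thread.

```python
def pytorch_build_kind(conda_list_output: str) -> str:
    """Classify the installed pytorch build from `conda list` text."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Data rows look like: name, version, build string, optional channel.
        if len(fields) >= 3 and fields[0] == "pytorch":
            build = fields[2].lower()
            if "cuda" in build:
                return "cuda"
            if "cpu" in build:
                return "cpu"
            return "unknown"
    return "not installed"


if __name__ == "__main__":
    # Row copied from the output above.
    cpu_row = "pytorch                   1.9.0               py3.7_cpu_0    pytorch"
    # Illustrative CUDA row (assumed build-string pattern).
    cuda_row = "pytorch  1.7.0  py3.7_cuda11.0.221_cudnn8.0.3_0  pytorch"
    print(pytorch_build_kind(cpu_row))
    print(pytorch_build_kind(cuda_row))
```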

m-novikov · 2021-07-21T17:58:48Z

I think this is an issue with the cudatoolkit version in https://github.com/ilastik/tiktorch/blob/master/conda-recipe/meta.yaml#L37
The last time I tried to find a stable configuration, I only managed to get it working with 10.1.

k-dominik · 2021-07-22T07:18:16Z

hey @constantinpape,

could you try installing the conda package instead of creating the devenv?

conda create -n tiktorch-server-env -c ilastik-forge -c conda-forge tiktorch cudatoolkit=YOURPREFERREDVERSION

I recently updated the conda recipe, while the environment.yaml might be outdated.

constantinpape · 2021-07-22T08:50:08Z

@k-dominik yes, that works; I had to add pytorch to the channels though, otherwise it would use the cpu pytorch package from conda-forge:

conda create -n tiktorch-server-env -c pytorch -c ilastik-forge -c conda-forge tiktorch cudatoolkit=11.0

k-dominik · 2021-07-22T09:02:20Z

sorry, of course, I forgot the pytorch channel!

m-novikov · 2021-07-25T14:01:56Z

The default installation command on Linux installs the CPU build:

conda create -n tiktorch-server-env -c ilastik-forge -c conda-forge -c pytorch tiktorch

results in

pytorch            conda-forge/linux-64::pytorch-1.8.0-cpu_py37ha70c682_1

This seems to be caused by strict channel priority. If I disable strict channel priority, or rearrange the command so that -c pytorch comes first, it picks up the CUDA version:

conda create -n tiktorch-server-env -c pytorch  -c ilastik-forge -c conda-forge tiktorch

Maybe we should consider specifying a build-string constraint for pytorch, e.g. pytorch=1.*=*cuda*
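To illustrate why the channel order matters, here is a toy model of strict channel priority: only the first channel in the list that ships a package is searched for it. This is a deliberate simplification and not conda's actual solver. The channel contents in `CHANNELS`, the CUDA build string, and the function name `pick_build` are all assumptions made for the example.

```python
from fnmatch import fnmatch
from typing import List, Optional

# Hypothetical channel contents, for illustration only.
CHANNELS = {
    "ilastik-forge": {},  # assumed not to ship pytorch
    "conda-forge": {"pytorch": ["cpu_py37ha70c682_1"]},
    "pytorch": {"pytorch": ["py3.7_cpu_0", "py3.7_cuda11.1_cudnn8.0.5_0"]},
}


def pick_build(package: str, channel_order: List[str],
               build_glob: str = "*") -> Optional[str]:
    """Toy strict priority: search only the first channel that has the package."""
    for channel in channel_order:
        builds = CHANNELS.get(channel, {}).get(package)
        if builds is None:
            continue  # this channel does not ship the package at all
        matches = [b for b in builds if fnmatch(b, build_glob)]
        # The toy model simply prefers the last listed matching build.
        return f"{channel}::{matches[-1]}" if matches else None
    return None


if __name__ == "__main__":
    print(pick_build("pytorch", ["ilastik-forge", "conda-forge", "pytorch"]))
    print(pick_build("pytorch", ["pytorch", "ilastik-forge", "conda-forge"]))
    print(pick_build("pytorch", ["ilastik-forge", "conda-forge", "pytorch"], "*cuda*"))
```

In this model, a build glob such as `*cuda*` that the first matching channel cannot satisfy yields no result instead of silently falling back to a CPU build, which is the appeal of a build-string constraint.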

m-novikov mentioned this issue Jul 17, 2021

Improve startup log #183

Merged

constantinpape closed this as completed in 4f0e75b Jul 19, 2021

constantinpape reopened this Jul 19, 2021

m-novikov mentioned this issue Jul 25, 2021

Update release install command to use strict channel priority #185

Merged
