
Cannot run remote nn classification workflow #181

Open
constantinpape opened this issue Jul 16, 2021 · 9 comments

Comments

@constantinpape
Member

I tried to run the remote nn classification workflow (client: my laptop, server: embl gpu 7), but ran into some issues:

  1. the server lists "127.0.0.0" instead of the correct IP
  2. even when inserting the correct IP, the client does not find the server

See screenshots (server was still running when I tried to connect):
Screenshot from 2021-07-16 07-19-26
Screenshot from 2021-07-16 07-18-24

@akreshuk
Member

You have to start it with --addr 0.0.0.0
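The difference matters because of how the listening socket is bound. A minimal sketch (plain Python sockets, not tiktorch code) of why a server bound to 127.0.0.1 is unreachable from another machine while 0.0.0.0 listens on all interfaces:

```python
import socket

def bound_address(addr: str) -> str:
    """Bind a throwaway TCP socket to `addr` and report the address the OS assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((addr, 0))  # port 0: let the OS pick any free port
        return s.getsockname()[0]

# Bound to the loopback interface: only clients on the same machine can connect.
print(bound_address("127.0.0.1"))  # → 127.0.0.1
# Bound to the wildcard address: reachable via any of the host's interfaces.
print(bound_address("0.0.0.0"))    # → 0.0.0.0
```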

@constantinpape
Member Author

> You have to start it with --addr 0.0.0.0

That works; we should either make this the default behaviour or at least document it.

Also, for me the server only runs on the cpu:
Screenshot from 2021-07-17 11-23-13

I am making sure that the gpu is visible to the server:

$ CUDA_VISIBLE_DEVICES=0 tiktorch-server --addr 0.0.0.0
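As a quick sanity check (a generic sketch, not tiktorch code), one can verify from inside the Python process that the variable actually reached the environment:

```python
import os

# CUDA_VISIBLE_DEVICES restricts which GPUs CUDA-using libraries may see.
# If it is unset or empty, torch.cuda may report no devices even on a GPU node.
visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>")
print("CUDA_VISIBLE_DEVICES =", visible)
```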

@m-novikov
Collaborator

m-novikov commented Jul 17, 2021

It may be an issue with the CUDA versions. In that case, the device list doesn't show any GPUs.
Check the output of this command in the tiktorch-server-env:

python -c "import torch.cuda; print(torch.cuda.is_available()); from torch.version import cuda; print(cuda)"

I added some expanded CUDA logging to the server startup sequence to simplify debugging.

@constantinpape
Member Author

@m-novikov you are right, the problem is that CUDA is not available:

$ CUDA_VISIBLE_DEVICES=0 python -c "import torch.cuda; print(torch.cuda.is_available()); from torch.version import cuda; print(cuda)"
False
None

and this is because my env has a CPU-only pytorch installed:

$ conda list | grep pytorch
pytorch                   1.9.0               py3.7_cpu_0    pytorch

Note that I followed the tiktorch installation instructions: https://github.com/ilastik/tiktorch#installation. So for some reason this env pulls in a CPU-only pytorch.
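The CPU build is recognizable from the conda build string (py3.7_cpu_0 above). A small hypothetical helper, not part of tiktorch, that flags such builds:

```python
def is_cpu_build(build_string: str) -> bool:
    """Return True if a conda pytorch build string marks a CPU-only build."""
    return "cpu" in build_string.split("_")

print(is_cpu_build("py3.7_cpu_0"))       # → True  (CPU-only build)
print(is_cpu_build("py3.9_cuda11.1_0"))  # → False (CUDA build)
```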

@m-novikov
Collaborator

I think this is an issue with the cudatoolkit version in https://github.com/ilastik/tiktorch/blob/master/conda-recipe/meta.yaml#L37.
Last time I tried to find a stable configuration, I only managed to get it working with 10.1.

@k-dominik
Collaborator

hey @constantinpape,

could you try installing the conda package instead of creating the devenv?

conda create -n tiktorch-server-env -c ilastik-forge -c conda-forge tiktorch cudatoolkit=YOURPREFERREDVERSION

I recently updated the conda recipe, but the environment.yaml might be outdated.

@constantinpape
Member Author

@k-dominik yes, that works; I had to add pytorch to the channels though, otherwise it would use the CPU pytorch package from conda-forge:

conda create -n tiktorch-server-env -c pytorch -c ilastik-forge -c conda-forge tiktorch cudatoolkit=11.0

@k-dominik
Collaborator

sorry, of course, I forgot the pytorch channel!

@m-novikov
Collaborator

m-novikov commented Jul 25, 2021

The default installation command on Linux installs the CPU build:

conda create -n tiktorch-server-env -c ilastik-forge -c conda-forge -c pytorch tiktorch

results in

pytorch            conda-forge/linux-64::pytorch-1.8.0-cpu_py37ha70c682_1

This seems to be because of strict channel priority; if I disable strict channel priority, or rearrange the command so that -c pytorch comes first, it picks up the CUDA version:

conda create -n tiktorch-server-env -c pytorch -c ilastik-forge -c conda-forge tiktorch

Maybe we should consider specifying a build-string constraint for pytorch, e.g. pytorch=1.*=*cuda*.
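Sketched as an environment file, such a pin might look like this (hypothetical fragment; the exact constraint would need testing against the channels):

```yaml
# hypothetical environment.yaml fragment
channels:
  - pytorch        # must come before conda-forge under strict channel priority
  - ilastik-forge
  - conda-forge
dependencies:
  - tiktorch
  - pytorch=1.*=*cuda*   # build-string constraint: reject CPU-only builds
```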
