torch==1.9.0+cu111 installation fails and results in training error #45

Open
ennemoser opened this issue Jun 5, 2023 · 3 comments

@ennemoser

It seems that torch==1.9.0+cu111 and torchvision==0.10.0+cu111 can't be installed, so torch-2.0.1+cu118 gets installed instead. This gives an error when I try to train the model.

This is the error I get: ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1, 1.12.1, 1.12.1+cpu, 1.12.1+cu102, 1.12.1+cu113, 1.12.1+cu116, 1.12.1+rocm5.0, 1.12.1+rocm5.1.1, 1.13.0, 1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2, 2.0.0, 2.0.0+cpu, 2.0.0+cpu.cxx11.abi, 2.0.0+cu117, 2.0.0+cu117.with.pypi.cudnn, 2.0.0+cu118, 2.0.0+rocm5.3, 2.0.0+rocm5.4.2, 2.0.1, 2.0.1+cpu, 2.0.1+cpu.cxx11.abi, 2.0.1+cu117, 2.0.1+cu117.with.pypi.cudnn, 2.0.1+cu118, 2.0.1+rocm5.3, 2.0.1+rocm5.4.2)
ERROR: No matching distribution found for torch==1.9.0+cu111
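
One way to read this message: pip only lists builds that have a wheel for the running Python and platform, so a list that starts at 1.11.0 suggests there is no torch 1.9.0 wheel for this runtime at all. The sketch below is just an illustration of that reading. The helper names are made up for this example and the message is a shortened copy of the one above.

```python
import re

def available_versions(pip_error):
    """Extract the version list from pip's '(from versions: ...)' message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]

def oldest_release(versions):
    """Return the lowest release number, ignoring local tags such as '+cu111'."""
    releases = [tuple(int(p) for p in v.split("+")[0].split(".")) for v in versions]
    return min(releases)

# Shortened copy of the error above; the real list is much longer.
message = (
    "ERROR: Could not find a version that satisfies the requirement "
    "torch==1.9.0+cu111 (from versions: 1.11.0, 1.11.0+cu113, 2.0.1, 2.0.1+cu118)"
)
versions = available_versions(message)
print("1.9.0+cu111" in versions)  # False
print(oldest_release(versions))   # (1, 11, 0)
```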

The error that I get when training is the following:

/content/drive/My Drive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 2.0.1+cu117. Falling back to torch.nn.functional.conv2d().
warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')

I am desperately trying to get this Colab running. Can anyone help?

@monolesan

StyleGAN works with torch 1.7, 1.8, and 1.9. Those torch versions only support CUDA 11.1 and Python 3.6, 3.7, 3.8, and 3.9.
After recent updates, Google Colab currently uses Python 3.10.11 and CUDA 11.8.0 by default.
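
A rough sketch of that compatibility check, in case it helps to see it spelled out. The supported versions are taken from this comment, the function names are hypothetical, and this is not the repo's actual check.

```python
import sys

# Assumption: supported versions as stated in this comment, not read from the repo.
SUPPORTED_TORCH_PREFIXES = ("1.7.", "1.8.", "1.9.")
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8), (3, 9)}

def split_torch_version(version):
    """Split '1.9.0+cu111' into ('1.9.0', 'cu111'); the tag is '' if absent."""
    release, _, local = version.partition("+")
    return release, local

def torch_supported(version):
    """True if the torch version string starts with a supported release prefix."""
    return version.startswith(SUPPORTED_TORCH_PREFIXES)

def python_supported(version_info=sys.version_info):
    """True if the (major, minor) pair of the interpreter is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON

print(split_torch_version("2.0.1+cu118"))  # ('2.0.1', 'cu118')
print(torch_supported("2.0.1+cu118"))      # False
print(python_supported((3, 10, 11)))       # False
```

In a Colab cell you could pass `torch.__version__` to `torch_supported` and call `python_supported()` with no arguments to check the running interpreter.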

So, to solve this problem:

  1. I installed CUDA 11.1.
  2. I installed Python 3.9.

It helped, but after this, I got new errors which I couldn't solve:
warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!
/content/drive/My Drive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/torch_utils/ops/upfirdn2d.py:34: UserWarning: Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation.

How to install CUDA 11.1 (follow the video step by step; when it asks you to reboot, just use Runtime --> Restart runtime and then continue with the remaining commands): https://www.youtube.com/watch?v=5eJTzhGe2QE

How to install python 3.9:
!apt-get install python3.9
!curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
!python3.9 get-pip.py
I also changed every command in the Colab notebook that starts with python or pip.
For example, it was like this:
!pip install ninja
!python train.py [...]
And I changed it to this:
!python3.9 -m pip install ninja
!python3.9 train.py [...]
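
The same rewrite rule as a small sketch, in case you want to apply it to many lines at once. `pin_interpreter` is a made-up helper name and it only handles the two prefixes shown above.

```python
def pin_interpreter(command, python="python3.9"):
    """Rewrite a Colab shell line so it runs under the given interpreter."""
    if command.startswith("!pip "):
        return f"!{python} -m pip " + command[len("!pip "):]
    if command.startswith("!python "):
        return f"!{python} " + command[len("!python "):]
    return command  # anything else is left untouched

for line in ["!pip install ninja", "!python train.py [...]", "!ls"]:
    print(pin_interpreter(line))
# !python3.9 -m pip install ninja
# !python3.9 train.py [...]
# !ls
```

Lines that already start with `!python3.9` are not touched, so running it twice gives the same result.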

@d-bohn

d-bohn commented Jul 17, 2023

Same issue here. Has anyone found a complete solution yet with the current Colab defaults (or around them)?

@ada-ada-ada-art

I think I fixed this, and I did it without changing the CUDA or Python version in Colab.

I've added the code changes to this PR: #48

The fix

I did two main things:

  • Copied some updates from StyleGAN3 into this repo
  • Changed the Colab notebook to use Colab default PyTorch and JAX

StyleGAN3 files

I changed two files in the repo: torch_utils/ops/conv2d_gradfix.py and torch_utils/ops/grid_sample_gradfix.py.

I copied the files from the StyleGAN3 repo, which has received an update to handle new PyTorch versions.

Here are links to the two SG3 files:
https://github.com/NVlabs/stylegan3/blob/main/torch_utils/ops/conv2d_gradfix.py
https://github.com/NVlabs/stylegan3/blob/main/torch_utils/ops/grid_sample_gradfix.py
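
If you would rather pull the two files yourself than use the PR, something like the sketch below should work. It is only one possible way to do it and not necessarily how the files were copied for the PR. `repo_root` is a placeholder for wherever your stylegan2-ada-pytorch checkout lives, and the helper names are made up.

```python
import urllib.request
from pathlib import Path

SG3_BLOB_URLS = [
    "https://github.com/NVlabs/stylegan3/blob/main/torch_utils/ops/conv2d_gradfix.py",
    "https://github.com/NVlabs/stylegan3/blob/main/torch_utils/ops/grid_sample_gradfix.py",
]

def raw_url(blob_url):
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    owner, repo, kind, rest = blob_url[len(prefix):].split("/", 3)
    if kind != "blob":
        raise ValueError(f"not a blob URL: {blob_url}")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"

def local_path(blob_url, repo_root):
    """Map the URL to the same torch_utils/ops/ path inside the local repo."""
    relative = blob_url.split("/blob/main/", 1)[1]
    return Path(repo_root) / relative

def copy_sg3_files(repo_root):
    """Download both files over the local copies (needs network access)."""
    for url in SG3_BLOB_URLS:
        target = local_path(url, repo_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(raw_url(url), str(target))

print(raw_url(SG3_BLOB_URLS[0]))
# https://raw.githubusercontent.com/NVlabs/stylegan3/main/torch_utils/ops/conv2d_gradfix.py
```

Calling `copy_sg3_files(repo_root)` overwrites the existing files, so keep a backup if you have local changes.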

Colab notebook changes

I also changed the Colab notebook. I removed all the JAX and PyTorch uninstall and reinstall commands, so that this:

#Uninstall new JAX
!pip uninstall jax jaxlib -y
#GPU frontend
!pip install "jax[cuda11_cudnn805]==0.3.10" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
#CPU frontend
#!pip install jax[cpu]==0.3.10
#Downgrade Pytorch
!pip uninstall torch torchvision -y
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
!pip install timm==0.4.12 ftfy==6.1.1 ninja==1.10.2 opensimplex

becomes this:

!pip install timm==0.4.12 ftfy==6.1.1 ninja==1.10.2 opensimplex
