Thank god for this, you saved me #95

Closed · arnicas opened this issue Oct 7, 2022 · 3 comments · Fixed by #103

Comments

arnicas commented Oct 7, 2022

I can't figure out how to say thank you except to file this and say you just saved me after 2 days of wasted broken installs and deps on an A100.

pmeier commented Oct 7, 2022

Thank god for this

Well, you could thank me instead 😛 Jokes aside, thanks for posting this. It means a lot. I'm glad that this project helped you out.

If you find the time, could you tell me how your envs were broken before? While working on #73, I figured that detecting broken envs would also be a good addition for this tool. In the future we might even have an ltt fix command to repair the env automatically. But all of this depends on correctly detecting broken envs and the reason why they are broken in the first place.
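
To give you an idea of what I mean, here is a minimal sketch of the kind of check such a detection could start from. This is not ltt's actual logic, just an illustration; the hypothetical ltt fix mentioned above would need something far more thorough:

import shutil

import torch


def env_looks_broken() -> bool:
    # Sketch only, not ltt's real detection logic.
    has_driver = shutil.which("nvidia-smi") is not None  # crude: is a driver present?
    if torch.version.cuda is None:
        # CPU-only wheel installed even though an NVIDIA driver is available
        return has_driver
    # CUDA wheel installed, but the driver cannot actually run it
    return not torch.cuda.is_available()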

Plus, did you encounter any issues with light-the-torch? Was the usage clear just from the README?

arnicas commented Oct 7, 2022

Hah - it was great. My only request would be to add torchvision (and maybe torchaudio) as extra possible installs alongside torch? I held my breath when I added it but it worked ok.

The envs were broken because I didn't know what I was doing on the installation. My nvidia-smi said one thing about CUDA (11.6), my nvcc -V said another, the PyTorch site had no supporting versions in their little "helper", and I couldn't figure out how to get the right +cu suffix on the install. So various libs that need torch tried to install it, in different versions, and actually broke my installs of older versions that I had gotten working with CUDA. So then I also got repeated errors from cuda/torch saying that my driver with sm_80 wasn't supported, etc. Shrug, hard to describe the chaos.

Thanks again!

pmeier commented Oct 7, 2022

My only request would be to add torchvision (and maybe torchaudio) as extra possible installs alongside torch? I held my breath when I added it but it worked ok.

Most PyTorch distributions, including torchvision and torchaudio, are already supported. There is no public list, but I periodically monitor their indices to check whether we are missing anything.

PYTORCH_DISTRIBUTIONS = {
    "torch",
    "torch_model_archiver",
    "torch_tb_profiler",
    "torcharrow",
    "torchaudio",
    "torchcsprng",
    "torchdata",
    "torchdistx",
    "torchserve",
    "torchtext",
    "torchvision",
}
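
For what it's worth, ltt install works like pip install, so a single command such as ltt install torch torchvision torchaudio should resolve matching binaries for all of them at once, assuming a reasonably recent light-the-torch version.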

If something breaks, feel free to reach out.

My nvidia-smi said one thing about CUDA (11.6), my nvcc -V said another

That is indeed very confusing and I fell for it myself in the beginning. nvcc reports the version of the CUDA toolkit you have installed, while nvidia-smi reports the version up to which compiled CUDA code is supported on your machine. So in your case, you can use everything with CUDA<=11.6, which I believe is most of the binaries PyTorch provides at the moment.
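
If you ever want to see both numbers side by side, something along these lines does the trick. This is just a sketch: it assumes nvcc and nvidia-smi are on PATH and that their output looks like the current format:

import re
import subprocess


def _cuda_version(cmd, pattern):
    # Run the command and pull the first "major.minor" match out of its output.
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    match = re.search(pattern, out)
    return match.group(1) if match else None


# nvcc reports the locally installed CUDA toolkit
toolkit = _cuda_version(["nvcc", "--version"], r"release (\d+\.\d+)")
# nvidia-smi reports the maximum CUDA version the driver supports
driver = _cuda_version(["nvidia-smi"], r"CUDA Version: (\d+\.\d+)")
print(f"toolkit: {toolkit}, driver supports up to: {driver}")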

Plus, unless you actually want to compile CUDA code, e.g. building PyTorch from source, you don't need the CUDA toolkit installed on your system at all. PyTorch ships everything you need at runtime inside the wheels. This is why they are so large. You only need to have the driver installed.
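
You can verify this directly from Python with the attributes torch ships anyway:

import torch

print(torch.__version__)          # e.g. 1.12.1+cu116, the wheel's build tag
print(torch.version.cuda)         # CUDA runtime bundled in the wheel, None for CPU-only wheels
print(torch.cuda.is_available())  # True only if the installed driver can run that runtime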

TL;DR if you ever have to install PyTorch wheels manually again, trust nvidia-smi.
