Torch210 #14107
Conversation
I'd like to hear about whether this breaks anything, so if anyone can test out this branch and write here about the experience, that would help.
Torch has already advanced to 2.1.1, and xformers is expected to have 2.1.1 support next week, so if there is no rush we could probably wait for it?
I also don't have any issues with Torch210. Actually, I'm currently using the nightly for ROCm 5.7 (Torch 2.11 I think?), which also works okay, though it needs a small change for one of the deps. It would be really nice to get the FP8 stuff in; it makes a big difference for people on low-memory GPUs.
@AUTOMATIC1111 xformers has published xformers 0.0.23, which is built for PyTorch 2.1.1.
There shouldn't be any problem going up a point release. It actually even works on the 2.2 nightly as well.
Can you help me check if fp8 works with PyTorch 2.2.0?
@KohakuBlueleaf I apologize, I didn't realize that was your pull. Great work! I am actually already using that branch, so I can confirm it does work just fine. It requires one small change (which isn't related to the fp8 stuff as far as I know): in
needs to be
As far as I can tell, there's no noticeable difference between using 2.2 and 2.1 - I just wanted the shiny new thing. Is there anything specific you'd like me to try? This is with ROCm, so I can't test any
I need to ensure fp8 storage with autocast can work. You said you are using ROCm, and I don't know whether it supports fp8 or not, so I will try it myself. Thanks for your info!
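For reference, here is a minimal sketch of the kind of check being discussed, assuming PyTorch >= 2.1 (which exposes `torch.float8_e4m3fn`) and a CUDA or ROCm device; the tensor names are illustrative and not taken from the PR. It stores a weight in fp8 and casts it back to fp16 at compute time under autocast, which is roughly what the FP8 storage option relies on.

```python
import torch

# Minimal sketch, assuming PyTorch >= 2.1 and a CUDA/ROCm device.
# Weights are stored in fp8 and cast back up to fp16 at compute time.
device = "cuda"
linear = torch.nn.Linear(64, 64, bias=False).to(device, torch.float16)

# "fp8 storage": keep the weight in 8-bit float; values are quantized here.
fp8_weight = linear.weight.detach().to(torch.float8_e4m3fn)

x = torch.randn(8, 64, device=device, dtype=torch.float16)

with torch.autocast(device_type="cuda", dtype=torch.float16):
    # fp8 tensors do not support matmul directly, so the weight is cast back
    # to fp16 first -- this extra cast is the expected source of the slowdown.
    y = x @ fp8_weight.to(torch.float16).t()

print(y.dtype, y.shape)  # expect torch.float16, (8, 64)
```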
I am pretty sure it's working since there's a noticeable difference in memory consumption with fp8 turned on vs not.
If there's anything I can help you test, please let me know. I know Python, etc., so I can follow relatively technical instructions. Happy to facilitate your testing if it helps (especially if it makes the ROCm stuff work better!)
Thanks for the info that ROCm also works.
The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and things start swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.
Thanks.
Actually, it is supposed to be slower due to type casting; it only reduces VRAM.
Yes, I need to check that we get the "slowdown" effect to ensure it is actually using type casting.
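A rough way to sanity-check both claims (smaller weight storage, slower step because of the extra cast) might look like the sketch below. This is not the PR's actual test; the sizes and iteration count are arbitrary, and it assumes the same PyTorch >= 2.1 / CUDA (or ROCm) setup as above.

```python
import time
import torch

# Rough sketch (not the PR's actual test) for eyeballing the expected
# slowdown from the extra cast and the reduced weight storage footprint.
device = "cuda"
w16 = torch.randn(4096, 4096, device=device, dtype=torch.float16)
w8 = w16.to(torch.float8_e4m3fn)  # half the storage of the fp16 copy
x = torch.randn(4096, 4096, device=device, dtype=torch.float16)

def bench(fn, iters=50):
    # Synchronize around the loop so GPU work is actually measured.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

t_fp16 = bench(lambda: x @ w16)
t_fp8 = bench(lambda: x @ w8.to(torch.float16))  # cast back on every step

print(f"fp16 matmul:       {t_fp16 * 1e3:.2f} ms/iter")
print(f"fp8-stored matmul: {t_fp8 * 1e3:.2f} ms/iter")
print(f"weight bytes: fp16={w16.element_size() * w16.nelement()}, "
      f"fp8={w8.element_size() * w8.nelement()}")
```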
PyTorch 2.1.2 and xformers 0.0.23.post1 were released today.
I haven't specifically tested this pull, but I haven't had any problems going from Torch 2.0 + CUDA 11.8 to Torch 2.1.2 + CUDA 12.1 on Win11. It was necessary to update xformers to 0.0.23, and I had to recompile bitsandbytes for Windows + CUDA 12.1. Everything seems to work correctly; on my system there seems to be a slight improvement of around 5-7%.
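As a quick, non-authoritative sanity check after that kind of upgrade, something like the snippet below prints the versions in play; the version strings in the comments are only illustrative of what one might expect to see.

```python
import torch

# Print the installed torch/CUDA versions after an upgrade like the one above.
print("torch:", torch.__version__)         # e.g. 2.1.2+cu121
print("cuda:", torch.version.cuda)         # e.g. 12.1
print("cuda available:", torch.cuda.is_available())

try:
    import xformers
    print("xformers:", xformers.__version__)  # e.g. 0.0.23.post1
except ImportError:
    print("xformers not installed")
```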
Description
Updates torch to 2.1.0
Checklist: