
Torch210 #14107

Merged
merged 3 commits into dev on Dec 16, 2023

Conversation

@AUTOMATIC1111 (Owner) commented Nov 26, 2023

Description

Updates torch to 2.1.0

Checklist:

@AUTOMATIC1111 (Owner, Author)

I'd like to hear about whether this breaks anything, so if anyone can test out this branch and write here about the experience, that would help.
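For anyone testing, a quick sanity check can make reports more useful. The snippet below is just a minimal sketch, run inside the webui's venv, that prints which torch/CUDA/xformers builds the environment actually resolved to after switching to this branch:

    # Minimal sketch: confirm which torch / CUDA / xformers builds the venv
    # actually resolved to, so test reports include exact versions.
    import torch

    print(torch.__version__)          # e.g. something like 2.1.0+cu121
    print(torch.version.cuda)         # CUDA toolkit the wheel targets (None on CPU/ROCm builds)
    print(torch.cuda.is_available())  # also True on ROCm builds
    try:
        import xformers
        print(xformers.__version__)   # should be a build matching the installed torch
    except ImportError:
        print("xformers not installed")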

@wfjsw (Contributor) commented Nov 26, 2023

Torch has already advanced to 2.1.1, and xformers is expected to have 2.1.1 support next week, so if there is no rush, perhaps we can wait for that?

@freecoderwaifu commented Dec 3, 2023

I didn't test this particular commit, but I tested the latest dev commit with Python 3.11.6, Torch 2.1.1, and the latest xformers; everything works as expected. Tested with these extensions, and all work so far.

[screenshots: list of tested extensions]

@KerfuffleV2

I also don't have any issues with Torch210. Actually, I'm currently using the nightly for ROCm 5.7 (Torch 2.2, I think?), which also works okay, though it needs a small change for one of the deps. It would be really nice to get the FP8 stuff in; it makes a big difference for people on low-memory GPUs.

@KohakuBlueleaf (Collaborator)

@AUTOMATIC1111 xformers has published xformers 0.0.23, which is built for pytorch 2.1.1.
And considering that pytorch 2.1.1 has fixed some known issues in pytorch 2.1.0,
I think it would be good to update this PR to Torch211 instead of Torch210.

@KerfuffleV2

There shouldn't be any problem going up a point release. It actually even works on the 2.2 nightly as well.

@KohakuBlueleaf (Collaborator)

> There shouldn't be any problem going up a point release. It actually even works on the 2.2 nightly as well.

Can you help me check whether fp8 works with pytorch 2.2.0?
They have changed some behaviours of fp8, and I have never tested it.

@KerfuffleV2

@KohakuBlueleaf I apologize; I didn't realize that was your PR. Great work! I am actually already using that branch, so I can confirm it does work just fine.

It requires one small change (which isn't related to the fp8 stuff as far as I know):

in .venv/lib/python3.11/site-packages/basicsr/data/degradations.py:

from torchvision.transforms.functional_tensor import rgb_to_grayscale

needs to be

from torchvision.transforms.functional import rgb_to_grayscale

As far as I can tell, there's no noticeable difference between using 2.2 and 2.1; I just wanted the shiny new thing. Is there anything specific you'd like me to try? This is with ROCm, so I can't test any xformers stuff.
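If editing files inside site-packages is undesirable, a non-invasive alternative (just a sketch, not part of this PR) is to alias the removed module before basicsr is imported; newer torchvision still exposes rgb_to_grayscale from torchvision.transforms.functional:

    # Sketch of a workaround that avoids patching site-packages: alias the
    # module name basicsr still imports to the module that now provides
    # rgb_to_grayscale. Assumes a torchvision recent enough to have dropped
    # transforms.functional_tensor.
    import sys
    import torchvision.transforms.functional as F

    sys.modules.setdefault("torchvision.transforms.functional_tensor", F)

    import basicsr.data.degradations  # the old import now resolves

The alias has to run before anything imports basicsr, so in practice it would need to live in an early startup hook.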

@KohakuBlueleaf (Collaborator)

> @KohakuBlueleaf I apologize; I didn't realize that was your PR. Great work! I am actually already using that branch, so I can confirm it does work just fine.
>
> It requires one small change (which isn't related to the fp8 stuff as far as I know):
>
> in .venv/lib/python3.11/site-packages/basicsr/data/degradations.py:
>
> from torchvision.transforms.functional_tensor import rgb_to_grayscale
>
> needs to be
>
> from torchvision.transforms.functional import rgb_to_grayscale
>
> As far as I can tell, there's no noticeable difference between using 2.2 and 2.1; I just wanted the shiny new thing. Is there anything specific you'd like me to try? This is with ROCm, so I can't test any xformers stuff.

I need to ensure fp8 storage with autocast can work, since some PRs to pytorch removed some of the "upgrade" behaviour of fp8 which my implementation may rely on.

You said you are using ROCm, and I don't know whether it supports fp8 or not.

I will try it myself.

Thx for your info!
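For reference, here is a minimal sketch of the idea under discussion (not the webui's actual implementation): weights stay stored in fp8 to save VRAM and get cast back up at compute time, which is where both the memory saving and the small speed cost come from. Fp8StoredLinear and its details are illustrative assumptions.

    # Illustrative sketch only: store a Linear layer's weights as float8_e4m3fn
    # (roughly half the memory of fp16) and cast them up to the input dtype for
    # the matmul. Requires a PyTorch build with float8 dtypes (2.1+).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Fp8StoredLinear(nn.Module):
        def __init__(self, linear: nn.Linear):
            super().__init__()
            self.register_buffer("weight", linear.weight.detach().to(torch.float8_e4m3fn))
            self.register_buffer("bias", None if linear.bias is None else linear.bias.detach().clone())

        def forward(self, x):
            # Casting up for compute is the price paid for the fp8 memory saving.
            return F.linear(x, self.weight.to(x.dtype), self.bias)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = Fp8StoredLinear(nn.Linear(4096, 4096).to(device, torch.float16))
    y = layer(torch.randn(1, 4096, device=device, dtype=torch.float16))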

@KerfuffleV2

> I need to ensure fp8 storage with autocast can work.

I am pretty sure it's working, since there's a noticeable difference in memory consumption with fp8 turned on vs. off.

> I will try it myself.

If there's anything I can help you test, please let me know. I know Python, etc., so I can follow relatively technical instructions. Happy to facilitate your testing if it helps (especially if it makes the ROCm stuff work better!)
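One hedged way to back that kind of observation with numbers (works on ROCm builds too, since they expose the torch.cuda API) is to compare peak VRAM between an fp8 run and an fp16 run:

    # Sketch: measure peak VRAM for a callable (e.g. one image generation),
    # run once with fp8 storage enabled and once without, and compare.
    import torch

    def peak_vram_mb(run):
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
        run()
        torch.cuda.synchronize()
        return torch.cuda.max_memory_allocated() / (1024 ** 2)

    # e.g. peak_vram_mb(lambda: generate_image(prompt))  # generate_image is hypothetical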

@KohakuBlueleaf (Collaborator)

> > I need to ensure fp8 storage with autocast can work.
>
> I am pretty sure it's working, since there's a noticeable difference in memory consumption with fp8 turned on vs. off.
>
> > I will try it myself.
>
> If there's anything I can help you test, please let me know. I know Python, etc., so I can follow relatively technical instructions. Happy to facilitate your testing if it helps (especially if it makes the ROCm stuff work better!)

Thx for the info that ROCm also works.
I want to know the speed differences.

@KerfuffleV2

> I want to know the speed differences.

The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.
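For what it's worth, a slightly more controlled way to get s/it numbers than eyeballing the progress bar (a sketch; step_fn stands in for whatever callable performs one sampling step):

    # Sketch: average seconds-per-iteration of a callable, with warmup and
    # explicit GPU synchronization so async kernel launches don't skew timing.
    import time
    import torch

    def seconds_per_it(step_fn, iters=20, warmup=3):
        for _ in range(warmup):
            step_fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            step_fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters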

@KohakuBlueleaf (Collaborator)

> > I want to know the speed differences.
>
> The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.

Thx. The behaviour is the same as on CUDA.
What good news!

@FurkanGozukara

> > I want to know the speed differences.
>
> The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.

Actually, it is supposed to be slower due to type casting; it only reduces VRAM.

@KohakuBlueleaf (Collaborator)

> > > I want to know the speed differences.
> >
> > The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.
>
> Actually, it is supposed to be slower due to type casting; it only reduces VRAM.

Yes, I need to check that we get the "slow down" effect to ensure it is actually using type casting.
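A sketch of one way to check that (illustrative, not webui internals): record each layer's stored weight dtype versus the dtype of the activations it receives. Fp8 storage with fp16 compute shows up as a mismatch, which is exactly where the casting overhead comes from.

    # Sketch: forward pre-hooks that record (stored weight dtype, input dtype)
    # per layer. A result like (torch.float8_e4m3fn, torch.float16) means the
    # weights are being cast up at compute time.
    import torch
    import torch.nn as nn

    def dtype_report(model: nn.Module, example_input: torch.Tensor):
        seen = {}
        def hook(module, inputs):
            weight = getattr(module, "weight", None)
            if isinstance(weight, torch.Tensor) and inputs:
                seen[module.__class__.__name__] = (weight.dtype, inputs[0].dtype)
        handles = [m.register_forward_pre_hook(hook) for m in model.modules()]
        with torch.no_grad():
            model(example_input)
        for h in handles:
            h.remove()
        return seen

Run against something like the Fp8StoredLinear sketch earlier in the thread, dtype_report would show the fp8-to-fp16 mismatch; on a pure fp16 model the dtypes match.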

@daxijiu (Contributor) commented Dec 15, 2023

PyTorch 2.1.2 and xformers 0.0.23.post1 were released today.

@AUTOMATIC1111 merged commit 60186c7 into dev on Dec 16, 2023
6 checks passed
@AUTOMATIC1111 deleted the torch210 branch on December 16, 2023 07:16
@Theliel commented Jan 2, 2024

I haven't specifically tested this PR, but I haven't had any problems going from Torch 2.0 + CUDA 11.8 to Torch 2.1.2 + CUDA 12.1 on Win11. It was necessary to update xformers to 0.0.23, and I had to recompile bitsandbytes for Windows + CUDA 12.1.

Everything seems to work correctly; on my system there seems to be a slight improvement of around 5-7%.
