
Torch210 #14107

Merged
merged 3 commits into dev on Dec 16, 2023

Conversation

@AUTOMATIC1111 (Owner) commented Nov 26, 2023

Description

Updates torch to 2.1.0

Checklist:

@AUTOMATIC1111 (Owner, Author)

I'd like to hear about whether this breaks anything, so if anyone can test out this branch and write here about the experience, that would help.
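For anyone testing, a quick sanity check can make reports more useful. The snippet below is just a minimal sketch, run inside the webui's venv, that prints which torch/CUDA/xformers builds the environment actually resolved to after switching to this branch:

    # Minimal sketch: confirm which torch / CUDA / xformers builds the venv
    # actually resolved to, so test reports include exact versions.
    import torch

    print(torch.__version__)          # e.g. something like 2.1.0+cu121
    print(torch.version.cuda)         # CUDA toolkit the wheel targets (None on CPU/ROCm builds)
    print(torch.cuda.is_available())  # also True on ROCm builds
    try:
        import xformers
        print(xformers.__version__)   # should be a build matching the installed torch
    except ImportError:
        print("xformers not installed")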

@wfjsw (Contributor) commented Nov 26, 2023

Torch has already advanced to 2.1.1, and xformers is expected to have 2.1.1 support next week, so if there is no rush, perhaps we can wait for that?

@freecoderwaifu commented Dec 3, 2023

I didn't test this particular commit, but I tested the latest dev commit with Python 3.11.6, Torch 2.1.1, and the latest xformers; everything works as expected. Tested with these extensions, and all work so far.

[screenshots: list of tested extensions]

@KerfuffleV2

I also don't have any issues with Torch210. Actually, I'm currently using the nightly for ROCm 5.7 (Torch 2.2, I think?), which also works okay, though it needs a small change for one of the deps. It would be really nice to get the FP8 stuff in; it makes a big difference for people on low-memory GPUs.

@KohakuBlueleaf (Collaborator)

@AUTOMATIC1111 xformers has published xformers 0.0.23, which is built for pytorch 2.1.1.
And considering that pytorch 2.1.1 has fixed some known issues in pytorch 2.1.0,
I think it would be good to update this PR to Torch211 instead of Torch210.

@KerfuffleV2

There shouldn't be any problem going up a point release. It actually even works on the 2.2 nightly as well.

@KohakuBlueleaf (Collaborator)

> There shouldn't be any problem going up a point release. It actually even works on the 2.2 nightly as well.

Can you help me check whether fp8 works with pytorch 2.2.0?
They have changed some behaviours of fp8, and I have never tested it.

@KerfuffleV2

@KohakuBlueleaf I apologize; I didn't realize that was your PR. Great work! I am actually already using that branch, so I can confirm it does work just fine.

It requires one small change (which isn't related to the fp8 stuff as far as I know):

in .venv/lib/python3.11/site-packages/basicsr/data/degradations.py:

from torchvision.transforms.functional_tensor import rgb_to_grayscale

needs to be

from torchvision.transforms.functional import rgb_to_grayscale

As far as I can tell, there's no noticeable difference between using 2.2 and 2.1; I just wanted the shiny new thing. Is there anything specific you'd like me to try? This is with ROCm, so I can't test any xformers stuff.
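If editing files inside site-packages is undesirable, a non-invasive alternative (just a sketch, not part of this PR) is to alias the removed module before basicsr is imported; newer torchvision still exposes rgb_to_grayscale from torchvision.transforms.functional:

    # Sketch of a workaround that avoids patching site-packages: alias the
    # module name basicsr still imports to the module that now provides
    # rgb_to_grayscale. Assumes a torchvision recent enough to have dropped
    # transforms.functional_tensor.
    import sys
    import torchvision.transforms.functional as F

    sys.modules.setdefault("torchvision.transforms.functional_tensor", F)

    import basicsr.data.degradations  # the old import now resolves

The alias has to run before anything imports basicsr, so in practice it would need to live in an early startup hook.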

@KohakuBlueleaf (Collaborator)

> @KohakuBlueleaf I apologize; I didn't realize that was your PR. Great work! I am actually already using that branch, so I can confirm it does work just fine.
>
> It requires one small change (which isn't related to the fp8 stuff as far as I know):
>
> in .venv/lib/python3.11/site-packages/basicsr/data/degradations.py:
>
> from torchvision.transforms.functional_tensor import rgb_to_grayscale
>
> needs to be
>
> from torchvision.transforms.functional import rgb_to_grayscale
>
> As far as I can tell, there's no noticeable difference between using 2.2 and 2.1; I just wanted the shiny new thing. Is there anything specific you'd like me to try? This is with ROCm, so I can't test any xformers stuff.

I need to ensure fp8 storage with autocast can work, since some PRs to pytorch removed some of the "upgrade" behaviour of fp8 which my implementation may rely on.

You said you are using ROCm, and I don't know whether it supports fp8 or not.

I will try it myself.

Thx for your info!
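For reference, here is a minimal sketch of the idea under discussion (not the webui's actual implementation): weights stay stored in fp8 to save VRAM and get cast back up at compute time, which is where both the memory saving and the small speed cost come from. Fp8StoredLinear and its details are illustrative assumptions.

    # Illustrative sketch only: store a Linear layer's weights as float8_e4m3fn
    # (roughly half the memory of fp16) and cast them up to the input dtype for
    # the matmul. Requires a PyTorch build with float8 dtypes (2.1+).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Fp8StoredLinear(nn.Module):
        def __init__(self, linear: nn.Linear):
            super().__init__()
            self.register_buffer("weight", linear.weight.detach().to(torch.float8_e4m3fn))
            self.register_buffer("bias", None if linear.bias is None else linear.bias.detach().clone())

        def forward(self, x):
            # Casting up for compute is the price paid for the fp8 memory saving.
            return F.linear(x, self.weight.to(x.dtype), self.bias)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = Fp8StoredLinear(nn.Linear(4096, 4096).to(device, torch.float16))
    y = layer(torch.randn(1, 4096, device=device, dtype=torch.float16))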

@KerfuffleV2

> I need to ensure fp8 storage with autocast can work.

I am pretty sure it's working, since there's a noticeable difference in memory consumption with fp8 turned on vs. off.

> I will try it myself.

If there's anything I can help you test, please let me know. I know Python, etc., so I can follow relatively technical instructions. Happy to facilitate your testing if it helps (especially if it makes the ROCm stuff work better!)
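One hedged way to back that kind of observation with numbers (works on ROCm builds too, since they expose the torch.cuda API) is to compare peak VRAM between an fp8 run and an fp16 run:

    # Sketch: measure peak VRAM for a callable (e.g. one image generation),
    # run once with fp8 storage enabled and once without, and compare.
    import torch

    def peak_vram_mb(run):
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
        run()
        torch.cuda.synchronize()
        return torch.cuda.max_memory_allocated() / (1024 ** 2)

    # e.g. peak_vram_mb(lambda: generate_image(prompt))  # generate_image is hypothetical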

@KohakuBlueleaf (Collaborator)

> > I need to ensure fp8 storage with autocast can work.
>
> I am pretty sure it's working, since there's a noticeable difference in memory consumption with fp8 turned on vs. off.
>
> > I will try it myself.
>
> If there's anything I can help you test, please let me know. I know Python, etc., so I can follow relatively technical instructions. Happy to facilitate your testing if it helps (especially if it makes the ROCm stuff work better!)

Thx for the info that ROCm also works.
I want to know the speed differences.

@KerfuffleV2

> I want to know the speed differences.

The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.
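For what it's worth, a slightly more controlled way to get s/it numbers than eyeballing the progress bar (a sketch; step_fn stands in for whatever callable performs one sampling step):

    # Sketch: average seconds-per-iteration of a callable, with warmup and
    # explicit GPU synchronization so async kernel launches don't skew timing.
    import time
    import torch

    def seconds_per_it(step_fn, iters=20, warmup=3):
        for _ in range(warmup):
            step_fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            step_fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters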

@KohakuBlueleaf (Collaborator)

> > I want to know the speed differences.
>
> The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.

Thx. The behaviour is the same as on CUDA.
What good news!

@FurkanGozukara

> > I want to know the speed differences.
>
> The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.

Actually, it is supposed to be slower due to type casting; it only reduces VRAM.

@KohakuBlueleaf (Collaborator)

> > > I want to know the speed differences.
> >
> > The speed difference seems pretty small to me; it actually seemed like fp8 was faster sometimes. With the generation I was currently running (1536x768, tiled diffusion), fp8 was about 4 s/it and fp16 was 3.65 s/it (not a very scientific test). Also, when VRAM runs low and stuff starts swapping, everything gets insanely slow and sometimes it's impossible to even cancel the job, so being able to avoid that is its own kind of performance boost.
>
> Actually, it is supposed to be slower due to type casting; it only reduces VRAM.

Yes, I need to check that we get the "slow down" effect to ensure it is actually using type casting.
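A sketch of one way to check that (illustrative, not webui internals): record each layer's stored weight dtype versus the dtype of the activations it receives. Fp8 storage with fp16 compute shows up as a mismatch, which is exactly where the casting overhead comes from.

    # Sketch: forward pre-hooks that record (stored weight dtype, input dtype)
    # per layer. A result like (torch.float8_e4m3fn, torch.float16) means the
    # weights are being cast up at compute time.
    import torch
    import torch.nn as nn

    def dtype_report(model: nn.Module, example_input: torch.Tensor):
        seen = {}
        def hook(module, inputs):
            weight = getattr(module, "weight", None)
            if isinstance(weight, torch.Tensor) and inputs:
                seen[module.__class__.__name__] = (weight.dtype, inputs[0].dtype)
        handles = [m.register_forward_pre_hook(hook) for m in model.modules()]
        with torch.no_grad():
            model(example_input)
        for h in handles:
            h.remove()
        return seen

Run against something like the Fp8StoredLinear sketch earlier in the thread, dtype_report would show the fp8-to-fp16 mismatch; on a pure fp16 model the dtypes match.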

@daxijiu (Contributor) commented Dec 15, 2023

PyTorch 2.1.2 and xformers 0.0.23.post1 were released today.

@AUTOMATIC1111 merged commit 60186c7 into dev on Dec 16, 2023
6 checks passed
@AUTOMATIC1111 deleted the torch210 branch on December 16, 2023 07:16
@Theliel commented Jan 2, 2024

I haven't specifically tested this PR, but I haven't had any problems going from Torch 2.0 + CUDA 11.8 to Torch 2.1.2 + CUDA 12.1 on Win11. It was necessary to update xformers to 0.0.23, and I had to recompile bitsandbytes for Windows + CUDA 12.1.

Everything seems to work correctly; on my system there seems to be a slight improvement of around 5-7%.
