Optimization Fixes and Improvements #575

Merged · 14 commits into main from optimizations · Apr 7, 2023

Conversation

@NullSenseStudio (Collaborator) commented Feb 21, 2023:

Bugs

Fixes a bug where most speed optimizations were not visible on CUDA and sequential CPU offload was wrongly visible on macOS. Also fixes a CPU offloading bug that was supposed to have been fixed already, but I believe was accidentally reintroduced and missed during a merge at some point.

Improvements

I've extracted the device-checking functionality from Optimizations.can_use() into its own classmethod, device_supports(), to simplify it and make optimization filtering clearer. Descriptions have been added to all optimizations, indicating what each one does and whether it has device limitations.
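For illustration, a minimal sketch of what that split looks like; the option names and the restriction table here are hypothetical, not the add-on's actual code:

```python
# Hypothetical sketch of splitting device checks out of Optimizations.can_use()
# into a classmethod device_supports(). Names and restrictions are illustrative.
from dataclasses import dataclass

@dataclass
class Optimizations:
    attention_slicing: bool = True
    sequential_cpu_offload: bool = False

    # assumed per-option device restrictions; a missing entry means "any device"
    _device_restrictions = {
        "sequential_cpu_offload": {"cuda"},
    }

    @classmethod
    def device_supports(cls, property: str, device: str) -> bool:
        # Pure device check, usable for filtering which options the UI shows.
        allowed = cls._device_restrictions.get(property)
        return allowed is None or device in allowed

    def can_use(self, property: str, device: str) -> bool:
        # An option is usable only if the device supports it and it is enabled.
        return self.device_supports(property, device) and getattr(self, property)
```

The point of the split is that the UI can call Optimizations.device_supports(name, device) to filter options without constructing an instance.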

Removed AMP

Automatic mixed precision was removed because 🤗 diffusers recommends against it. Any memory savings it offered can be achieved better with half precision, which is often faster and produces the same image quality.
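As a rough illustration (the model id is just an example, not necessarily what the add-on loads), half precision in diffusers is only a matter of loading the weights in float16:

```python
# Illustrative only: half precision replaces AMP's memory savings.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model id
    torch_dtype=torch.float16,         # half precision: less VRAM, usually faster, same quality
).to("cuda")
```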

New Optimizations

Memory efficient attention from xFormers saves VRAM and inference time. I've noticed memory savings around the same as attention slicing with slice size 1, and a roughly 20% increase in it/s for normal image generation. It can overtake attention slicing's memory savings when the image is larger than usual and when upscaling. xFormers may automatically select different attention optimizations for some GPUs/generations, so improvements will vary.
CPU offloading has been split into model and submodule options, submodule being a rename of sequential CPU offload. Model offloading is a lighter version of submodule offloading: the memory savings are not as significant, but inference is not severely slowed either. (Rough diffusers equivalents for both options are sketched below.)
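For reference, the two options above map roughly onto these diffusers calls (a sketch only, with an example model id, not the add-on's actual wiring):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # example model id
)

pipe.enable_xformers_memory_efficient_attention()  # memory efficient attention (requires xformers)

# CPU offloading, now split into two levels (enable one or the other):
pipe.enable_model_cpu_offload()          # "model": moderate VRAM savings, minor slowdown
# pipe.enable_sequential_cpu_offload()   # "submodule": larger savings, much slower
```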

Discussion

xFormers is often recommended for use with diffusers and may be suitable to enable by default. One issue: I've heard some GPUs won't generate quite the same image with the same settings while using xFormers, which could cause unneeded confusion. It also produces a warning when installed, "A matching Triton is not available, some optimizations will not be enabled.", which should be suppressed, though I don't know a good way to do that (one possible approach is sketched below). Triton is only officially available for Linux and could be added to a Linux-specific requirements.txt.
Model offloading requires accelerate 0.17.0, which isn't released yet. The Windows/Linux requirements.txt could be updated to install from GitHub, or we could wait for the official release (based on release history, that will likely be soon).
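For the Triton warning, one idea, assuming (as it appears) that xFormers emits it through Python's logging module; the logger name is a guess, and if the message comes from a plain print this won't catch it:

```python
# Possible suppression of the xFormers Triton warning (assumes it goes through `logging`).
import logging

class SuppressTritonWarning(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Drop only the specific Triton message, keep everything else.
        return "A matching Triton is not available" not in record.getMessage()

# Attach before xformers is imported; "xformers" as the logger name is an assumption.
logging.getLogger("xformers").addFilter(SuppressTritonWarning())
```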

@NullSenseStudio (Collaborator, Author) commented:

With the addition of memory efficient attention and VAE tiling it's possible to create some quite large images without requiring an equally enormous amount of VRAM. Great for textures and backgrounds, as long as the subject repeats easily and you're willing to wait. Example images (prompt and seed) are below, with a rough setup sketch after them.
magical forest
brick texture (269689527)
an ocean full of colorful fish (269689527)
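Outside the add-on, the closest plain-diffusers equivalent of this setup would be something like the following (the add-on's own VAE tiling additionally supports seamless-axis blending; the model id and resolution are examples):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # example model id
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # keeps the UNet pass within VRAM
pipe.enable_vae_tiling()                           # keeps the final VAE decode from spiking

image = pipe("brick texture", width=2048, height=2048).images[0]
```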

VAE tiling can have some issues with color and detail accuracy. AFAIK this is caused by normalization operations: with the latents tiled, each section won't normalize the same way. Tile size was set to 128 and blend to 0 for demonstration. While it can save a little memory on standard-size images, I don't recommend it for them.
a house on a hill (21247002) combined

Some prompts may not produce usable results at high resolutions. These were meant to be people.
selfie (1369286764)

Interestingly, tiled VAE decoding and encoding were recently merged into diffusers (huggingface/diffusers#1441), though that implementation doesn't support seamless-axis blending like the one here. Tiled encoding for img2img and inpainting is a nice idea and could be added in a later PR. The linked thread also has some discussion of improving tiling that might be worth looking into at some point.

@NullSenseStudio NullSenseStudio marked this pull request as ready for review March 11, 2023 01:43
@carson-katri (Owner) commented:

Would you be interested in trying out the new attention mechanisms built into PyTorch 2.0 for this? It should be automatically enabled in diffusers if PyTorch 2 is installed.

https://pytorch.org/blog/accelerated-diffusers-pt-20/
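For context, the built-in in question is torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused kernel automatically; a minimal standalone example (shapes and dtype are arbitrary):

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence, head_dim) — example shapes and dtype only
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)

# Dispatches to a flash / memory-efficient / math kernel based on device and dtype.
out = F.scaled_dot_product_attention(q, k, v)
```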

@NullSenseStudio (Collaborator, Author) commented:

Nice to see that PyTorch 2.0 just got released. I had already tried it out in earlier nightly builds and saw performance similar to xFormers attention, but looking at that blog post it ought to do better on newer GPUs.

From the blog post: "A native C++ implementation suitable for non-CUDA devices or when high-precision is required."

Hope that means there'll be some meaningful MPS and CPU improvement as well.

I'll see about adding this in soon.

@carson-katri (Owner) left a review comment:

👍 Everything looks good overall.

If PyTorch 2's scaled_dot_product_attention is similar enough in performance to xformers, I'd rather just upgrade that dependency than add another one, though.

@carson-katri carson-katri added this to the v0.2.0 milestone Mar 25, 2023
@NullSenseStudio (Collaborator, Author) commented:

"I'll see about adding this in soon."

That certainly didn't go as planned.

Anywho, I got the PyTorch 2.0 SDP attention optimization added and set it to be on by default, since that's the default in diffusers and it appears to alter the image less than xFormers attention does. I have also tested it on CPU, but I don't see any difference in speed or memory usage, so I'm not so sure it'll help on MPS now. DirectML is also not compatible for the time being; it will at least need a release built specifically against PyTorch 2.0 or newer.
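Roughly, at the diffusers level this corresponds to selecting the PyTorch 2.0 attention processor (sketch only; the import path can differ between diffusers versions, and the model id is an example):

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor, AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # example model id
).to("cuda")

pipe.unet.set_attn_processor(AttnProcessor2_0())  # PyTorch 2.0 scaled-dot-product attention
# pipe.unet.set_attn_processor(AttnProcessor())   # revert to the standard attention processor
```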

@carson-katri (Owner) left a review comment:

LGTM 👍

@carson-katri carson-katri merged commit 24821fa into main Apr 7, 2023
@carson-katri carson-katri deleted the optimizations branch April 7, 2023 17:05