FLUX Speed Improvements (~10% speedup) #7399

Merged: 3 commits, Nov 29, 2024

RyanJDick (Collaborator)

Summary

This PR includes several small speed improvements to FLUX inference:

  • Use torch.nn.functional.rms_norm(...) rather than the custom implementation
  • Reduce tensor type casting in apply_rope(...)
  • Use .view(...) instead of .reshape(...) to guarantee that the underlying tensor data is contiguous and shared rather than silently copied.
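
The difference between .view(...) and .reshape(...) in the last bullet can be illustrated with a small sketch. The tensors here are toy values chosen for illustration, not shapes taken from the FLUX code: .view(...) never copies and raises if the layout is incompatible, while .reshape(...) falls back to a copy without warning.

```python
import torch

# A contiguous tensor: .view(...) reinterprets the same storage, no copy.
x = torch.arange(12).reshape(3, 4)
v = x.view(4, 3)
shares_storage = v.data_ptr() == x.data_ptr()

# A transposed tensor is non-contiguous.
t = x.t()

# .reshape(...) succeeds here, but only by copying the data.
reshaped = t.reshape(12)
reshape_copied = reshaped.data_ptr() != t.data_ptr()

# .view(...) refuses to copy, so it raises instead.
try:
    t.view(12)
    view_raised = False
except RuntimeError:
    view_raised = True

print(shares_storage, reshape_copied, view_raised)
```

Using .view(...) therefore turns an accidental copy into an explicit error, which is one way to keep hidden copies out of a hot loop.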

After these changes, some operations are now run at a lower precision than before, which results in slight differences in the generated images.

Speedup

Configuration: 1024x1024, 15 steps
Before:

  • bf16: 0.481 secs / iter
  • BnB int8: 0.521 secs / iter

After:

  • bf16: 0.435 secs / iter (9.6% speedup)
  • BnB int8: 0.468 secs / iter (9.1% speedup)

Image Change

Left = before, right = after (comparison images not included here).

Prompt: "An architecture rendering of the reception area of a corporate office with modern decor."

Prompt: "A pixar cartoon rendering of a frog with big eyes watching a frog."

Prompt: "A portrait photo of a man with blonde hair and glasses wearing a suit and tie."

QA Instructions

  • Generated before / after comparison images with the same seed as shown above.
  • bf16 inference on CUDA
  • BnB int8 inference on CUDA
  • Test on MacOS

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels on Nov 29, 2024
hipsterusername force-pushed the ryan/flux-speed-improvements branch from 184e0f3 to a03721d on November 29, 2024 17:24
hipsterusername enabled auto-merge (rebase) on November 29, 2024 17:30
hipsterusername merged commit 021552f into main on Nov 29, 2024
14 checks passed
hipsterusername deleted the ryan/flux-speed-improvements branch on November 29, 2024 17:32
RyanJDick added a commit that referenced this pull request Dec 3, 2024
## Summary

#7422

As reported in the above ticket, a recent FLUX performance improvement
caused a regression on MacOS. This PR reverts the offending part of the
change.

## Related Issues / Discussions

- Closes #7422 
- Original perf improvement:
#7399

## QA Instructions

I don't have a Mac capable of running this test, so I am trusting the
report in #7422 that this fixes the problem.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_