Replies: 2 comments
- wow, NF4 got a huge quality degradation compared to llm.int8().
- Related: #9165
Introduction
See relevant threads here first: #6500, #7023.
We have a number of pipelines that use a transformer-based backbone for the diffusion process:
With more and more coming, we might want to take advantage of the lower-precision computation capabilities explored in the transformer area (from the LLM world). Two widely popular studies in this area:
So, I decided to give these exotic data types and methods a try through the bitsandbytes library. Before setting your expectations too high, please be aware of what to expect here: #6500 (comment).

Experiments
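For context, NF4 (4-bit NormalFloat) quantizes weights block-wise: each block is scaled by its absolute maximum, and every value is snapped to the nearest entry of a fixed 16-level codebook derived from the normal distribution. Below is a minimal pure-Python sketch of that idea — the 16 levels follow the NF4 codebook from the QLoRA paper, but everything else is illustrative, not bitsandbytes' actual packed-kernel implementation:

```python
# Illustrative NF4-style block-wise quantization (not bitsandbytes' real kernel).

NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(values):
    """Map a block of floats to 4-bit codebook indices plus a scale (absmax)."""
    absmax = max(abs(v) for v in values) or 1.0
    indices = []
    for v in values:
        normalized = v / absmax  # now in [-1, 1]
        # Snap to the nearest of the 16 NF4 levels.
        idx = min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - normalized))
        indices.append(idx)
    return indices, absmax

def dequantize_block(indices, absmax):
    """Recover approximate floats from the stored indices and scale."""
    return [NF4_CODEBOOK[i] * absmax for i in indices]

weights = [0.31, -0.42, 0.05, 0.0, -1.2, 0.77]
idx, scale = quantize_block(weights)
recovered = dequantize_block(idx, scale)
```

The per-block absmax is why NF4 stores a small amount of extra metadata alongside the 4-bit indices, and why quantization error grows with outliers inside a block.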
Code
Script to launch experiments in bulk
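A bulk launcher for sweeps like this is typically just a nested loop over checkpoint ids and quantization modes. A hypothetical sketch — the script name `benchmark_quant.py` and its flags are made up for illustration, and this version only prints the commands (dry-run) rather than executing them:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical checkpoint ids and quantization modes to sweep over.
CKPTS=("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS" "stabilityai/stable-diffusion-3-medium-diffusers")
QUANTS=("nf4" "int8")

for ckpt in "${CKPTS[@]}"; do
  for quant in "${QUANTS[@]}"; do
    # Dry-run: print each command; drop the `echo` to actually launch.
    echo python benchmark_quant.py --ckpt_id "$ckpt" --quantization "$quant"
  done
done
```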
Quantitative results
PixArt-Sigma
SD3
👁️ Interesting to see that the correlation between memory reduction and latency improvement doesn't always carry over equally to the transformer backbone being used. For example, the memory reduction is far more pronounced for SD3 than for PixArt-Sigma. 👁️
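For reference, the latency side of such comparisons is usually measured with a few warmup runs followed by several timed runs; on GPU you would additionally call torch.cuda.synchronize() around the timer and read peak memory from torch.cuda.max_memory_allocated(). A minimal, framework-free sketch of that harness (all names here are illustrative, not the script used above):

```python
import statistics
import time

def benchmark_latency(fn, warmup=2, runs=5):
    """Time a callable: discard warmup runs, report the median of timed runs."""
    for _ in range(warmup):
        fn()  # compilation / cache effects land here, not in the timings
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # on GPU, call torch.cuda.synchronize() before and after
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Usage with a stand-in workload; swap in e.g. `lambda: pipe(prompt)` in practice.
median_s = benchmark_latency(lambda: sum(i * i for i in range(100_000)))
```

Reporting the median rather than the mean keeps one slow outlier run from skewing the comparison between quantization modes.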
Visual results
All the results below were generated with the prompt "a golden vase with different flowers"; the default pipeline call arguments were left unchanged.