Replies: 1 comment
-
fp8 on a modern card such as a 4090 fully uses the GPU's native fp8 support. gguf on any card is naturally slower, because a custom kernel first has to translate the custom data format into a native format before execution. How much slower is situational, and there may be room for improvement in the relevant gguf implementation, but it will always be at least a bit slower.
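To make the "translation" step concrete, here is a rough sketch in plain Python. It assumes a Q8_0-style layout (blocks of 32 int8 values sharing one scale), and the function names are made up for illustration. This is not the actual gguf kernel, only the general shape of the extra work: the weights must be dequantized before the ordinary multiply can run.

```python
BLOCK = 32  # block size assumed for this sketch (Q8_0-style layout)


def quantize_blocks(weights):
    """Pack floats into blocks of (scale, list of int8-range values)."""
    blocks = []
    for i in range(0, len(weights), BLOCK):
        chunk = weights[i:i + BLOCK]
        amax = max(abs(w) for w in chunk)
        scale = amax / 127 if amax else 1.0  # avoid dividing by zero
        blocks.append((scale, [round(w / scale) for w in chunk]))
    return blocks


def dequantize_blocks(blocks):
    """The extra step: expand every block back into plain floats."""
    out = []
    for scale, quants in blocks:
        out.extend(scale * q for q in quants)
    return out


def dot(a, b):
    """Stand-in for the 'native' math the hardware runs directly."""
    return sum(x * y for x, y in zip(a, b))


def quantized_dot(blocks, x):
    """Block-quantized path: dequantize first, then do the native math."""
    return dot(dequantize_blocks(blocks), x)
```

With a native format the hardware goes straight to the equivalent of `dot`. In the block-quantized path every weight passes through `dequantize_blocks` first, and that is the part that costs extra time no matter how well the kernel is tuned.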
-
Is it right that gguf, at the moment, in combination with a LoRA is 2 times slower than fp8?
Can that become faster some day, or is that the nature of gguf, CUDA and the GPU?