[QST] Cute GPTQ with asymmetric 4-bit encoding #1149
Hi! Thanks a lot for your question.
So it is. Thank you!
This issue has been labeled
Closing due to inactivity. Please reopen if needed.
@kroburg
Hello. I don't store encoded data in smem. The pipeline is global -> load -> decode -> store in smem. Open-sourcing is not feasible because, at the time of implementation, CuTe had some issues with subbyte data types, and I fixed them in a not-very-clean manner. Since then CuTe got a large update to that functionality, and my code would need to be rewritten.
UPD: I published just the "kernel" (since all the other stuff is not even compilable). Maybe it will help you find some insights: https://github.com/kroburg/cute_gptq/blob/main/cute_gptq_70.hpp
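For readers following along, a minimal sketch of that pipeline (global -> load -> decode -> store in smem) might look like this. It assumes eight 4-bit codes packed per 32-bit word and 128-element quantization groups; all identifiers are illustrative and not taken from cute_gptq_70.hpp:

```cpp
// Illustrative sketch only, not the code from cute_gptq_70.hpp.
// Assumes 8 codes per uint32_t and one (scale, zero) pair per 128 weights.
#include <cuda_fp16.h>
#include <cstdint>

__global__ void decode_tile(const uint32_t* __restrict__ packed,  // 8 weights per word
                            const __half*   __restrict__ scales,  // one per group
                            const uint8_t*  __restrict__ zeros,   // one per group
                            int words_per_block) {
  extern __shared__ __half smem[];                 // decoded fp16 tile

  for (int w = threadIdx.x; w < words_per_block; w += blockDim.x) {
    int idx = blockIdx.x * words_per_block + w;
    uint32_t word = packed[idx];                   // load from global
    int group = idx * 8 / 128;                     // 128-element groups
    float s = __half2float(scales[group]);
    int   z = zeros[group];
#pragma unroll
    for (int i = 0; i < 8; ++i) {                  // decode 8 nibbles
      int q = (word >> (4 * i)) & 0xF;
      smem[w * 8 + i] = __float2half(s * float(q - z));  // store in smem
    }
  }
  __syncthreads();
  // ... the MMA on the decoded fp16 tile would follow here ...
}
```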
Hello.
I've implemented GPTQ with asymmetric 4-bit encoding using CuTe.
GPTQ quantizes matrix weights to N bits using a scale and zero-point per group of (128) elements. Activations stay in fp16. It works well in terms of model quality for medium-size models (7B, 33B).
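In other words, each 4-bit code q is mapped back to fp16 as w = scale * (q - zero_point), with one (scale, zero_point) pair shared by a group of 128 weights. A minimal scalar sketch of that mapping (the group size of 128 comes from the description above; the identifiers are hypothetical):

```cpp
#include <cuda_fp16.h>
#include <cstdint>

constexpr int kGroupSize = 128;  // one (scale, zero) pair per 128 weights

// Asymmetric dequantization of a single 4-bit code (illustrative only).
__host__ __device__ inline __half dequant4(uint8_t q,       // 4-bit code, 0..15
                                           __half scale,    // per-group scale
                                           uint8_t zero) {  // per-group zero-point
  return __float2half(__half2float(scale) * float(int(q) - int(zero)));
}
```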
While working on it I introduced some fixes into the CuTe codebase which are (as I feel) not really good from a design/architecture perspective.
Also, performance is disappointing. It takes 2.4 ms for MNK = 2048, 4096, 4096 vs. 1.7 ms with cuBLAS (fp16 x fp16) on an RTX 4090. It looks like it would be much faster to dequantize the weights into a temporary fp16 buffer and run cuBLAS, roughly as sketched below.
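That fallback might look roughly like this on the host side. This is a hedged sketch: `dequant_kernel` stands in for any 4-bit-to-fp16 decode kernel and is not from the repo, and error handling is omitted:

```cpp
// Sketch of the "dequantize to temporary fp16, then run cuBLAS" fallback.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gptq_gemm_via_cublas(cublasHandle_t handle,
                          const __half* A,       // activations, M x K, row-major
                          const uint32_t* B_q4,  // packed 4-bit weights
                          const __half* scales, const uint8_t* zeros,
                          __half* C, int M, int N, int K) {
  __half* B_fp16 = nullptr;
  cudaMalloc(&B_fp16, size_t(K) * N * sizeof(__half));  // temporary buffer

  // Decode packed weights into the temporary fp16 buffer.
  // dequant_kernel is a placeholder, not a real kernel from the repo:
  // dequant_kernel<<<grid, block>>>(B_q4, scales, zeros, B_fp16, K * N);

  const __half alpha = __float2half(1.f), beta = __float2half(0.f);
  // cuBLAS is column-major; compute C^T = B^T * A^T to get row-major C.
  cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
              N, M, K, &alpha,
              B_fp16, N, A, K, &beta, C, N);

  cudaFree(B_fp16);
}
```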
Examining existing questions/PRs shows me that there is demand for such a GEMM, so I'd like to contribute it somehow.
The questions are: