gguf : use Qn_K for k-quants instead of KQn #837

compilade · 2024-05-24T18:45:23Z

#822 (by @mofosyne) introduced a naming convention for GGUF model files, but the way it names k-quants doesn't follow established practice: everywhere else k-quants are named, they use Qn_K, where n is the number of bits per weight excluding the scales.

rg -i 'KQ\d' doesn't return anything related to quants except for this recently added section, while rg -i 'Q\d_K' returns many matches related to k-quants when run in the ggml and llama.cpp repos.

So this renames KQn to Qn_K (e.g. KQ2 to Q2_K), for consistency. This should avoid unnecessary confusion.

(Note that the recently added wiki page about "tensor encoding schemes" will need to be updated too, since it is the only other place I found that also uses this KQ<X> naming scheme.)
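For anyone with names already written in the KQn style, the rename described above can be sketched as a small text substitution. This is only an illustration: `to_qn_k` and the example file name are hypothetical and are not part of ggml, llama.cpp, or the GGUF spec.

```python
import re

# Hypothetical helper for illustration only; not part of ggml or llama.cpp.
# Matches a standalone "KQ<digit>" token (not embedded in a longer word).
_OLD_KQUANT = re.compile(r"(?<![A-Za-z0-9])KQ(\d)(?![A-Za-z0-9])")


def to_qn_k(name: str) -> str:
    """Rewrite KQn tokens (e.g. KQ2) to the Qn_K form (e.g. Q2_K)."""
    return _OLD_KQUANT.sub(r"Q\1_K", name)


# The file name below is made up for the example.
print(to_qn_k("example-model-KQ2.gguf"))
```

Names that already use the Qn_K form are left unchanged, so the substitution is safe to run more than once.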

gguf : use Qn_K for k-quants instead of KQn

85a895a

ggerganov approved these changes May 24, 2024


ggerganov merged commit 8d6b703 into ggerganov:master May 24, 2024
4 checks passed
