
Reading GGUF metadata with gguf-dump.py does not work for i-quants #5809

Closed
countzero opened this issue Mar 1, 2024 · 3 comments · Fixed by #5841
Labels: enhancement, good first issue


countzero commented Mar 1, 2024

The gguf-dump.py script in the llama.cpp release b2297 is missing support for i-quants.

Steps to reproduce

  1. Create or download a GGUF file in any IQ* format (e.g., miqu-1-70b-Requant-b2131-iMat-c32_ch400-IQ1_S_v3.gguf)
  2. Copy the file to .\models\miqu-1-70b-sf.IQ1_S.gguf
  3. Execute the following:
python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.IQ1_S.gguf
  4. See the error:
ValueError: 19 is not a valid GGMLQuantizationType
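This ValueError comes from gguf-py resolving each tensor's type id through its GGMLQuantizationType IntEnum; an id the enum does not list fails exactly like this, and type id 19 corresponds to IQ1_S in ggml's type enum. A minimal sketch of the mechanism (enum abbreviated, not the actual gguf-py listing):

from enum import IntEnum

# Abbreviated stand-in for gguf-py's GGMLQuantizationType before the fix;
# values mirror ggml's type ids, which stop at Q8_K = 15 here.
class GGMLQuantizationType(IntEnum):
    F32  = 0
    F16  = 1
    Q4_0 = 2
    Q5_K = 13
    Q6_K = 14
    Q8_K = 15
    # i-quant ids 16..20 (IQ2_XXS, IQ2_XS, IQ3_XXS, IQ1_S, IQ4_NL)
    # are missing, so an IQ1_S tensor (id 19) cannot be resolved

GGMLQuantizationType(19)  # ValueError: 19 is not a valid GGMLQuantizationType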

Expected behaviour

I expect the Python gguf-py library to support all possible GGUF formats.

Working example for k-quants:

python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.Q5_K_M.gguf
* Loading: .\models\miqu-1-70b-sf.Q5_K_M.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 26 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 723
      3: UINT64     |        1 | GGUF.kv_count = 23
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'R:\\AI\\LLM\\source'
      6: UINT32     |        1 | llama.context_length = 32764
      7: UINT32     |        1 | llama.embedding_length = 8192
      8: UINT32     |        1 | llama.block_count = 80
      9: UINT32     |        1 | llama.feed_forward_length = 28672
     10: UINT32     |        1 | llama.rope.dimension_count = 128
     11: UINT32     |        1 | llama.attention.head_count = 64
     12: UINT32     |        1 | llama.attention.head_count_kv = 8
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: FLOAT32    |        1 | llama.rope.freq_base = 1000000.0
     15: UINT32     |        1 | general.file_type = 17
     16: STRING     |        1 | tokenizer.ggml.model = 'llama'
     17: [STRING]   |    32000 | tokenizer.ggml.tokens
     18: [FLOAT32]  |    32000 | tokenizer.ggml.scores
     19: [INT32]    |    32000 | tokenizer.ggml.token_type
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 1
     21: UINT32     |        1 | tokenizer.ggml.eos_token_id = 2
     22: UINT32     |        1 | tokenizer.ggml.padding_token_id = 0
     23: BOOL       |        1 | tokenizer.ggml.add_bos_token = True
     24: BOOL       |        1 | tokenizer.ggml.add_eos_token = False
     25: STRING     |        1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['"
     26: UINT32     |        1 | general.quantization_version = 2

Use-Case

I am extracting the metadata from any given GGUF model to automatically calculate the optimal runtime arguments for the server in the following PowerShell script: https://github.com/countzero/windows_llama.cpp/blob/v1.12.0/examples/server.ps1#L104
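For reference, the same metadata-only read can be done directly from Python with gguf-py's GGUFReader, which memory-maps the file and never touches tensor data. A minimal sketch, assuming the ReaderField layout used by gguf-py (parts holding raw slices, data indexing the value parts); arrays are only counted, not decoded:

import sys
from gguf import GGUFReader, GGUFValueType

reader = GGUFReader(sys.argv[1])
for name, field in reader.fields.items():
    kind = field.types[0] if field.types else None
    if kind == GGUFValueType.STRING:
        # string payloads are stored as raw bytes in a single part
        print(f"{name} = {bytes(field.parts[field.data[0]]).decode('utf-8')!r}")
    elif kind == GGUFValueType.ARRAY:
        # report only the element count for large arrays like the tokenizer vocab
        print(f"{name} = [array of {len(field.data)} item(s)]")
    elif kind is not None:
        # scalar numeric/bool values sit in a one-element array part
        print(f"{name} = {field.parts[field.data[0]][0]}")

Saved as e.g. dump_metadata.py (a hypothetical file name), this prints roughly the same key/value listing as gguf-dump.py --no-tensors.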

Question

@ggerganov Is there another way to only dump the metadata from a given GGUF model? Perhaps this could be an --inspect option of the gguf binary?

@countzero changed the title from "Reading GGUF metadata with .\gguf-py\scripts\gguf-dump.py does not work for i-quants" to "Reading GGUF metadata with gguf-dump.py does not work for i-quants" on Mar 1, 2024
ggerganov (Owner) commented:

Yes, this could be added as extra functionality to the gguf example.

@ggerganov added the enhancement and good first issue labels and removed the bug-unconfirmed label on Mar 1, 2024
Nindaleth (Contributor) commented:

I'm working on this, hope to have a PR ready Sunday evening EU time.

countzero (Author) commented Mar 4, 2024

@Nindaleth & @ggerganov Thank you for the quick fix! It now works as expected for i-quants:

python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.IQ1_S.gguf
* Loading: .\models\miqu-1-70b-sf.IQ1_S.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 26 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 723
      3: UINT64     |        1 | GGUF.kv_count = 23
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'D:\\HF'
      6: UINT32     |        1 | llama.context_length = 32764
      7: UINT32     |        1 | llama.embedding_length = 8192
      8: UINT32     |        1 | llama.block_count = 80
      9: UINT32     |        1 | llama.feed_forward_length = 28672
     10: UINT32     |        1 | llama.rope.dimension_count = 128
     11: UINT32     |        1 | llama.attention.head_count = 64
     12: UINT32     |        1 | llama.attention.head_count_kv = 8
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: FLOAT32    |        1 | llama.rope.freq_base = 1000000.0
     15: UINT32     |        1 | general.file_type = 24
     16: STRING     |        1 | tokenizer.ggml.model = 'llama'
     17: [STRING]   |    32000 | tokenizer.ggml.tokens
     18: [FLOAT32]  |    32000 | tokenizer.ggml.scores
     19: [INT32]    |    32000 | tokenizer.ggml.token_type
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 1
     21: UINT32     |        1 | tokenizer.ggml.eos_token_id = 2
     22: UINT32     |        1 | tokenizer.ggml.unknown_token_id = 0
     23: BOOL       |        1 | tokenizer.ggml.add_bos_token = True
     24: BOOL       |        1 | tokenizer.ggml.add_eos_token = False
     25: STRING     |        1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['"
     26: UINT32     |        1 | general.quantization_version = 2
