
Reading GGUF metadata with gguf-dump.py does not work for i-quants #5809

Closed
countzero opened this issue Mar 1, 2024 · 3 comments · Fixed by #5841
Labels: enhancement, good first issue


countzero commented Mar 1, 2024

The gguf-dump.py script in the llama.cpp release b2297 is missing support for i-quants.

Steps to reproduce

  1. Create or download a GGUF file in any IQ* format (e.g., miqu-1-70b-Requant-b2131-iMat-c32_ch400-IQ1_S_v3.gguf)
  2. Copy the file to .\models\miqu-1-70b-sf.IQ1_S.gguf
  3. Execute the following:
python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.IQ1_S.gguf
  4. See the error:
ValueError: 19 is not a valid GGMLQuantizationType
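This ValueError comes from gguf-py resolving each tensor's type id through its GGMLQuantizationType IntEnum; an id the enum does not list fails exactly like this, and type id 19 corresponds to IQ1_S in ggml's type enum. A minimal sketch of the mechanism (enum abbreviated, not the actual gguf-py listing):

from enum import IntEnum

# Abbreviated stand-in for gguf-py's GGMLQuantizationType before the fix;
# values mirror ggml's type ids, which stop at Q8_K = 15 here.
class GGMLQuantizationType(IntEnum):
    F32  = 0
    F16  = 1
    Q4_0 = 2
    Q5_K = 13
    Q6_K = 14
    Q8_K = 15
    # i-quant ids 16..20 (IQ2_XXS, IQ2_XS, IQ3_XXS, IQ1_S, IQ4_NL)
    # are missing, so an IQ1_S tensor (id 19) cannot be resolved

GGMLQuantizationType(19)  # ValueError: 19 is not a valid GGMLQuantizationType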

Expected behaviour

I expect the Python gguf-py library to support all possible GGUF formats.

Working example for k-quants:

python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.Q5_K_M.gguf
* Loading: .\models\miqu-1-70b-sf.Q5_K_M.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 26 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 723
      3: UINT64     |        1 | GGUF.kv_count = 23
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'R:\\AI\\LLM\\source'
      6: UINT32     |        1 | llama.context_length = 32764
      7: UINT32     |        1 | llama.embedding_length = 8192
      8: UINT32     |        1 | llama.block_count = 80
      9: UINT32     |        1 | llama.feed_forward_length = 28672
     10: UINT32     |        1 | llama.rope.dimension_count = 128
     11: UINT32     |        1 | llama.attention.head_count = 64
     12: UINT32     |        1 | llama.attention.head_count_kv = 8
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: FLOAT32    |        1 | llama.rope.freq_base = 1000000.0
     15: UINT32     |        1 | general.file_type = 17
     16: STRING     |        1 | tokenizer.ggml.model = 'llama'
     17: [STRING]   |    32000 | tokenizer.ggml.tokens
     18: [FLOAT32]  |    32000 | tokenizer.ggml.scores
     19: [INT32]    |    32000 | tokenizer.ggml.token_type
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 1
     21: UINT32     |        1 | tokenizer.ggml.eos_token_id = 2
     22: UINT32     |        1 | tokenizer.ggml.padding_token_id = 0
     23: BOOL       |        1 | tokenizer.ggml.add_bos_token = True
     24: BOOL       |        1 | tokenizer.ggml.add_eos_token = False
     25: STRING     |        1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['"
     26: UINT32     |        1 | general.quantization_version = 2

Use-Case

I am extracting the metadata from any given GGUF model to automatically calculate the optimal runtime arguments for the server in the following PowerShell script: https://github.com/countzero/windows_llama.cpp/blob/v1.12.0/examples/server.ps1#L104
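For reference, the same metadata-only read can be done directly from Python with gguf-py's GGUFReader, which memory-maps the file and never touches tensor data. A minimal sketch, assuming the ReaderField layout used by gguf-py (parts holding raw slices, data indexing the value parts); arrays are only counted, not decoded:

import sys
from gguf import GGUFReader, GGUFValueType

reader = GGUFReader(sys.argv[1])
for name, field in reader.fields.items():
    kind = field.types[0] if field.types else None
    if kind == GGUFValueType.STRING:
        # string payloads are stored as raw bytes in a single part
        print(f"{name} = {bytes(field.parts[field.data[0]]).decode('utf-8')!r}")
    elif kind == GGUFValueType.ARRAY:
        # report only the element count for large arrays like the tokenizer vocab
        print(f"{name} = [array of {len(field.data)} item(s)]")
    elif kind is not None:
        # scalar numeric/bool values sit in a one-element array part
        print(f"{name} = {field.parts[field.data[0]][0]}")

Saved as e.g. dump_metadata.py (a hypothetical file name), this prints roughly the same key/value listing as gguf-dump.py --no-tensors.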

Question

@ggerganov Is there another way to only dump the metadata from a given GGUF model? Perhaps this could be an --inspect option of the gguf binary?

@countzero changed the title from "Reading GGUF metadata with .\gguf-py\scripts\gguf-dump.py does not work for i-quants" to "Reading GGUF metadata with gguf-dump.py does not work for i-quants" on Mar 1, 2024
ggerganov (Owner) commented:

Yes, this could be added as extra functionality to the gguf example.

@ggerganov added the enhancement and good first issue labels and removed the bug-unconfirmed label on Mar 1, 2024
Nindaleth (Contributor) commented:

I'm working on this, hope to have a PR ready Sunday evening EU time.

countzero (Author) commented Mar 4, 2024

@Nindaleth & @ggerganov Thank you for the quick fix! It now works as expected for i-quants:

python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.IQ1_S.gguf
* Loading: .\models\miqu-1-70b-sf.IQ1_S.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 26 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 723
      3: UINT64     |        1 | GGUF.kv_count = 23
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'D:\\HF'
      6: UINT32     |        1 | llama.context_length = 32764
      7: UINT32     |        1 | llama.embedding_length = 8192
      8: UINT32     |        1 | llama.block_count = 80
      9: UINT32     |        1 | llama.feed_forward_length = 28672
     10: UINT32     |        1 | llama.rope.dimension_count = 128
     11: UINT32     |        1 | llama.attention.head_count = 64
     12: UINT32     |        1 | llama.attention.head_count_kv = 8
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: FLOAT32    |        1 | llama.rope.freq_base = 1000000.0
     15: UINT32     |        1 | general.file_type = 24
     16: STRING     |        1 | tokenizer.ggml.model = 'llama'
     17: [STRING]   |    32000 | tokenizer.ggml.tokens
     18: [FLOAT32]  |    32000 | tokenizer.ggml.scores
     19: [INT32]    |    32000 | tokenizer.ggml.token_type
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 1
     21: UINT32     |        1 | tokenizer.ggml.eos_token_id = 2
     22: UINT32     |        1 | tokenizer.ggml.unknown_token_id = 0
     23: BOOL       |        1 | tokenizer.ggml.add_bos_token = True
     24: BOOL       |        1 | tokenizer.ggml.add_eos_token = False
     25: STRING     |        1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['"
     26: UINT32     |        1 | general.quantization_version = 2
