From b489520361077cfe3f7787dbb2fefbbfa321b9f4 Mon Sep 17 00:00:00 2001
From: brian khuu
Date: Wed, 15 May 2024 17:30:21 +1000
Subject: [PATCH] gguf.md: GGUF Filename Parsing Strategy

---
 docs/gguf.md | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/docs/gguf.md b/docs/gguf.md
index 1e21ba5c8..161e18822 100644
--- a/docs/gguf.md
+++ b/docs/gguf.md
@@ -20,12 +20,12 @@ The key difference between GGJT and GGUF is the use of a key-value structure for
 
 ### GGUF Naming Convention
 
-GGUF follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf`.
+GGUF follows a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf`
 
 The components are:
 1. **Model**: A descriptive name for the model type or architecture.
-2. **Version (Optional)**: Denotes the model version number, starting at `v1` if not specified, formatted as `v<Major>.<Minor>`.
-    - Best practice to include model version number only if model has multiple versions and assume the unversioned model to be the first version and/or check the model card.
+2. **Version**: (Optional) Denotes the model version number, formatted as `v<Major>.<Minor>`
+    - If the model is missing a version number, assume `v0.0` (Prerelease)
 3. **ExpertsCount**: Indicates the number of experts found in a Mixture of Experts based model.
 4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
     - `T`: Trillion parameters.
@@ -45,6 +45,26 @@ The components are:
     - Even Number (0 or 2): `<weights> = <quants> * <block scale>`
     - Odd Number (1 or 3): `<weights> = <block min> + <quants> * <block scale>`
 
+#### Parsing Above Naming Convention
+
+To correctly parse a well-formed filename that follows the naming convention above, it is recommended to read the components from right to left, using `-` as the delimiter. This strategy gives the model name the most flexibility, since it may itself contain dashes, while still allowing the version string to be optional. It also leaves room to extend the format in the future if needed.
+
+For example:
+
+ * `mixtral-v0.1-8x7B-Q2_K.gguf`:
+   - Model Name: Mixtral
+   - Version Number: v0.1
+   - Expert Count: 8
+   - Parameter Count: 7B
+   - Quantization: Q2_K
+
+ * `Hermes-2-Pro-Llama-3-8B-F16.gguf`:
+   - Model Name: Hermes 2 Pro Llama 3
+   - Version Number: v0.0 (`-<Version>` missing)
+   - Expert Count: 0 (`<ExpertsCount>x` missing)
+   - Parameter Count: 8B
+   - Quantization: F16
+
 ### File Structure
 
 ![image](https://github.com/ggerganov/ggml/assets/1991296/c3623641-3a1d-408e-bfaf-1b7c4e16aa63)
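For reference, the right-to-left parsing strategy the patch describes could be sketched roughly like this in Python. This is not part of the patch; the function name and the returned dict shape are invented for illustration.

```python
import re

def parse_gguf_filename(filename: str):
    """Sketch of parsing <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf
    by reading components right to left, splitting on '-'."""
    if not filename.endswith(".gguf"):
        return None
    parts = filename[: -len(".gguf")].split("-")

    # Rightmost component is always the quantization (e.g. Q2_K, F16).
    quantization = parts.pop()

    # Next, right to left: parameter count, optionally prefixed "<N>x" for experts.
    parameters = parts.pop()
    experts = 0
    if "x" in parameters:
        experts_str, parameters = parameters.split("x", 1)
        experts = int(experts_str)

    # Optional version component such as "v0.1"; default to v0.0 when absent.
    version = "v0.0"
    if parts and re.fullmatch(r"v\d+(\.\d+)*", parts[-1]):
        version = parts.pop()

    # Everything remaining is the model name; it may itself contain dashes.
    model = " ".join(parts)
    return {"model": model, "version": version, "experts": experts,
            "parameters": parameters, "quantization": quantization}
```

Because the split is driven from the right, a model name like `Hermes-2-Pro-Llama-3` survives intact even though it contains dashes and a bare number that could otherwise be mistaken for a version.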