gguf.md: GGUF Filename Parsing Strategy
mofosyne committed May 15, 2024
1 parent 1bf1ab5 commit b489520
Showing 1 changed file with 23 additions and 3 deletions.
26 changes: 23 additions & 3 deletions docs/gguf.md
@@ -20,12 +20,12 @@ The key difference between GGJT and GGUF is the use of a key-value structure for

### GGUF Naming Convention

GGUF follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf`.
GGUF follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf`

The components are:
1. **Model**: A descriptive name for the model type or architecture.
2. **Version (Optional)**: Denotes the model version number, starting at `v1` if not specified, formatted as `v<Major>.<Minor>`.
- Best practice to include model version number only if model has multiple versions and assume the unversioned model to be the first version and/or check the model card.
2. **Version**: (Optional) Denotes the model version number, formatted as `v<Major>.<Minor>`
- If the model is missing a version number, assume `v0.0` (Prerelease).
3. **ExpertsCount**: Indicates the number of experts found in a Mixture of Experts based model.
4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
- `T`: Trillion parameters.
@@ -45,6 +45,26 @@ The components are:
- Even Number (0 or 2): `<model weights> = <scaling factor> * <quantised weight>`
- Odd Number (1 or 3): `<model weights> = <offset factor> + <scaling factor> * <quantised weight>`
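
The two formulas above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `dequantize` and its parameter names are hypothetical, and it handles a single weight rather than the blocks of weights a real quantization format works on.

```python
def dequantize(quantised_weight: int,
               scaling_factor: float,
               offset_factor: float = 0.0,
               variant: int = 0) -> float:
    """Recover one model weight from its quantised value (hypothetical helper).

    `variant` stands for the even/odd number in the two bullets above.
    """
    if variant % 2 == 0:
        # Even number (0 or 2): the weight is a scaled quantised value.
        return scaling_factor * quantised_weight
    # Odd number (1 or 3): an offset is added to the scaled quantised value.
    return offset_factor + scaling_factor * quantised_weight
```

With a quantised weight of `3` and a scaling factor of `0.5`, the even form gives `1.5`; the odd form with an offset of `1.0` gives `2.5`.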

#### Parsing Above Naming Convention

To correctly parse a well-formed GGUF filename that follows this naming convention, it is recommended to read from right to left using `-` as the delimiter. This strategy allows model names to include dashes while still letting the version string be optional. It also leaves room to extend the format in the future.

For example:

* `mixtral-v0.1-8x7B-Q2_K.gguf`:
- Model Name: Mixtral
- Version Number: v0.1
- Expert Count: 8
- Parameter Count: 7B
- Quantization: Q2_K

* `Hermes-2-Pro-Llama-3-8B-F16.gguf`:
- Model Name: Hermes 2 Pro Llama 3
- Version Number: v0.0 (`<Version>-` missing)
- Expert Count: 0 (`<ExpertsCount>x` missing)
- Parameter Count: 8B
- Quantization: F16
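
The right-to-left strategy described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `parse_gguf_filename`, the `GGUFName` tuple, and the regular expressions are hypothetical, and the accepted scale letters (`T`, `B`, `M`, `K`) are an assumption, since only `T` is visible in this excerpt. Any token left over after the quantization, parameter, and optional `v`-prefixed version tokens have been consumed is treated as part of the model name.

```python
import re
from typing import NamedTuple


class GGUFName(NamedTuple):
    """Parsed filename components (hypothetical container for this sketch)."""
    model: str
    version: str
    experts: int
    parameters: str
    quantization: str


# Version tokens look like `v<Major>.<Minor>`, e.g. `v0.1`.
_VERSION_RE = re.compile(r"v\d+\.\d+")
# Parameter tokens look like `7B` or `8x7B`; the scale letters are assumed.
_PARAMS_RE = re.compile(r"(?:(\d+)x)?(\d+(?:\.\d+)?[TBMK])")


def parse_gguf_filename(filename: str) -> GGUFName:
    if not filename.endswith(".gguf"):
        raise ValueError(f"not a .gguf filename: {filename!r}")
    parts = filename[: -len(".gguf")].split("-")
    if len(parts) < 3:
        raise ValueError(f"too few components: {filename!r}")

    # Read from the right: quantization first, then the parameter token.
    quantization = parts.pop()
    match = _PARAMS_RE.fullmatch(parts.pop())
    if match is None:
        raise ValueError(f"unrecognised parameter token in {filename!r}")
    # A missing `<ExpertsCount>x` prefix is reported as 0 experts.
    experts = int(match.group(1)) if match.group(1) else 0
    parameters = match.group(2)

    # The version is optional; a missing version is assumed to be v0.0.
    version = "v0.0"
    if len(parts) > 1 and _VERSION_RE.fullmatch(parts[-1]):
        version = parts.pop()

    # Whatever remains on the left is the model name, dashes included.
    model = " ".join(parts)
    return GGUFName(model, version, experts, parameters, quantization)
```

Under these assumptions, `mixtral-v0.1-8x7B-Q2_K.gguf` parses to version `v0.1` with 8 experts, while `Hermes-2-Pro-Llama-3-8B-F16.gguf` falls back to `v0.0` with 0 experts. In the second case the trailing `3` stays in the model name because it lacks the `v` prefix that marks a version token.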

### File Structure

![image](https://github.com/ggerganov/ggml/assets/1991296/c3623641-3a1d-408e-bfaf-1b7c4e16aa63)