
gguf.md: Add GGUF Naming Convention Section #822

Merged: 9 commits, May 17, 2024
26 changes: 26 additions & 0 deletions docs/gguf.md

@@ -18,6 +18,32 @@ GGUF is a format based on the existing GGJT, but makes a few changes to the form

The key difference between GGJT and GGUF is the use of a key-value structure for the hyperparameters (now referred to as metadata), rather than a list of untyped values. This allows for new metadata to be added without breaking compatibility with existing models, and to annotate the model with additional information that may be useful for inference or for identifying the model.
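
As a rough illustration of this key-value structure (a sketch, not an excerpt from a real file; the values are hypothetical), the metadata can be pictured as a typed mapping whose keys live in the `general.*` and `<architecture>.*` namespaces:

```python
# Sketch only: GGUF metadata pictured as a plain Python dict.
# The keys follow the GGUF metadata namespaces; the values are made up.
metadata = {
    "general.architecture": "llama",    # architecture, used to resolve the other keys
    "general.name": "LLaMA v2",         # human-readable model name
    "general.quantization_version": 2,  # version of the quantization scheme
    "llama.context_length": 4096,       # trained context length
    "llama.embedding_length": 4096,     # embedding (hidden) size
}
```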

### GGUF Naming Convention

GGUF files follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf` (a parsing sketch follows the component list below).

The components are:
1. **Model**: A descriptive name for the model type or architecture.
2. **Version (Optional)**: Denotes the model version number, formatted as `v<Major>.<Minor>`. If no version is given, assume `v1`.
   - Best practice is to include the version number only when a model has multiple versions; treat an unversioned model as the first version, and check the model card when in doubt.
3. **ExpertsCount**: Indicates the number of experts in a Mixture-of-Experts model.
4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
   - `T`: Trillion parameters.
   - `B`: Billion parameters.
   - `M`: Million parameters.
   - `K`: Thousand parameters.
5. **Quantization**: This part specifies how the model parameters are quantized or compressed.
   - Uncompressed formats:
     - `F16`: 16-bit floats per weight
     - `F32`: 32-bit floats per weight
   - Quantization (Compression) formats:
     - `Q<X>`: X bits per weight, where `X` could be `4` (for 4 bits), `8` (for 8 bits), etc.
     - Variants provide further details on how the quantized weights are interpreted:
       - `_K`: k-quant models, which may carry a further size specifier: `_S`, `_M`, or `_L` for small, medium, and large respectively; if none is given, medium is assumed.
       - `_<num>`: Denotes the dequantization approach: even numbers indicate that each model weight is a scaling factor multiplied by the quantized weight, while odd numbers add an offset factor, as shown in the sketch after this list.
         - Even number (0 or 2): `<model weight> = <scaling factor> * <quantised weight>`
         - Odd number (1 or 3): `<model weight> = <offset factor> + <scaling factor> * <quantised weight>`
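
To make the two dequantization formulas concrete, here is a minimal sketch; the numeric values are made up for illustration and do not come from a real quantization block:

```python
# Minimal sketch of the two dequantization schemes named above.

def dequant_even(scale: float, q: int) -> float:
    """Even-numbered variants (e.g. Q4_0): weight = scale * quantised weight."""
    return scale * q

def dequant_odd(offset: float, scale: float, q: int) -> float:
    """Odd-numbered variants (e.g. Q4_1): weight = offset + scale * quantised weight."""
    return offset + scale * q

# A 4-bit quantised value q = 7 with an illustrative scale and offset:
print(dequant_even(0.02, 7))       # ≈ 0.14
print(dequant_odd(-0.1, 0.02, 7))  # ≈ 0.04
```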
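
The naming convention as a whole can be parsed mechanically. The regular expression and the example filename below are assumptions for illustration, not part of the specification:

```python
import re

# Sketch: parse <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf,
# treating the version and the experts count as optional components.
GGUF_NAME_RE = re.compile(
    r"^(?P<model>[A-Za-z0-9._\- ]+?)"          # descriptive model name
    r"(?:-(?P<version>v\d+(?:\.\d+)*))?"       # optional v<Major>.<Minor>
    r"-(?:(?P<experts>\d+)x)?"                 # optional experts count for MoE models
    r"(?P<parameters>\d+(?:\.\d+)?[TBMK])"     # parameter count and scale prefix
    r"-(?P<quantization>\w+)"                  # e.g. F16, Q4_0, Q5_K_M
    r"\.gguf$"
)

# Hypothetical filename used purely as an example:
m = GGUF_NAME_RE.match("Mixtral-v0.1-8x7B-Q2_K.gguf")
if m is not None:
    print(m.groupdict())
    # {'model': 'Mixtral', 'version': 'v0.1', 'experts': '8',
    #  'parameters': '7B', 'quantization': 'Q2_K'}
```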

### File Structure

![image](https://github.com/ggerganov/ggml/assets/1991296/c3623641-3a1d-408e-bfaf-1b7c4e16aa63)