
Yuya/add checkpoints section #9329

Merged
merged 15 commits
Jul 17, 2024
418 changes: 418 additions & 0 deletions docs/source/checkpoints/dist_ckpt.rst

Large diffs are not rendered by default.

64 changes: 64 additions & 0 deletions docs/source/checkpoints/intro.rst
@@ -0,0 +1,64 @@
Checkpoints
===========


In this section, we present key functionalities of NVIDIA NeMo related to checkpoint management.

Understanding Checkpoint Formats
--------------------------------

A ``.nemo`` checkpoint is fundamentally a tar file that bundles the model configuration (given as a YAML file), model weights, and other artifacts such as tokenizer models or vocabulary files. This consolidated design streamlines sharing, loading, tuning, evaluation, and inference.
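
Because a ``.nemo`` file is just a tar archive, you can inspect its contents with standard tools. A minimal example, where ``model.nemo`` is a placeholder path:

.. code-block:: bash

    # List the bundled artifacts without unpacking the archive.
    tar -tvf model.nemo

    # Typical entries: model_config.yaml, model_weights.ckpt (or a
    # model_weights/ folder), and tokenizer files such as tokenizer.model.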

In contrast, the ``.ckpt`` file, created during PyTorch Lightning training, contains both the model weights and the optimizer states, and is usually used to resume training.
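
For illustration, the following sketch resumes pretraining from the latest ``.ckpt`` file in an experiment directory. The script path is taken from the NeMo examples tree; the overrides and the log directory are assumptions to adapt to your setup:

.. code-block:: bash

    # exp_manager locates the latest .ckpt in the run directory and restores
    # both model weights and optimizer states before training continues.
    python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
        --config-path=conf --config-name=megatron_gpt_config \
        exp_manager.resume_if_exists=True \
        exp_manager.explicit_log_dir=/results/my_run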

Sharded Model Weights
---------------------

Within ``.nemo`` or ``.ckpt`` checkpoints, the model weights can be saved in either a regular format (a single file called ``model_weights.ckpt``, placed inside model parallelism folders) or a sharded format (a folder called ``model_weights``).

With sharded model weights, you can save and load the state of your training script across multiple GPUs or nodes more efficiently, and you avoid having to repartition the model when resuming tuning with a different model parallelism setup.

NeMo supports the distributed (sharded) checkpoint format from Megatron Core, which provides two backends: a PyTorch-based backend (recommended) and a Zarr-based backend (deprecated).
For a detailed explanation, see the :doc:`dist_ckpt` guide.
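
For orientation, a sharded checkpoint saved with the PyTorch-based backend is a directory rather than a single file. An illustrative layout follows; the exact file names vary with the Megatron Core version, so treat this as a sketch:

.. code-block:: bash

    model_weights/
    ├── metadata.json    # backend and version information
    ├── common.pt        # common, non-sharded state
    ├── __0_0.distcp     # sharded tensor data, one file per writer
    └── __0_1.distcp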


Quantized Checkpoints
---------------------

NeMo provides a :doc:`Post-Training Quantization <../nlp/quantization>` workflow that allows you to convert regular ``.nemo`` models into a `TensorRT-LLM checkpoint <https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html>`_, commonly referred to as ``.qnemo`` checkpoints in NeMo. These ``.qnemo`` checkpoints can then be used with the `NVIDIA TensorRT-LLM library <https://nvidia.github.io/TensorRT-LLM/index.html>`_ for efficient inference.

A ``.qnemo`` checkpoint, similar to ``.nemo`` checkpoints, is a tar file that bundles the model configuration specified in the ``config.json`` file along with the ``rank{i}.safetensors`` files. These ``.safetensors`` files store the model weights for each rank individually. In addition, a ``tokenizer_config.yaml`` file is saved, containing only the tokenizer section from the original NeMo ``model_config.yaml`` file. This configuration file defines the tokenizer used by the given model.

When working with large quantized LLMs, it is recommended that you use a directory rather than a tar file. You can control this behavior by setting the ``compress`` flag when exporting quantized models in the `PTQ configuration file <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_ptq.yaml>`_.
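
For example, assuming the flag sits under the ``export`` section of that configuration file (verify the exact key in the linked YAML), a Hydra override at export time could look like this:

.. code-block:: bash

    # Keep the exported quantized model as a directory instead of a tar file.
    python examples/nlp/language_modeling/megatron_gpt_ptq.py \
        export.compress=false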

The following example shows the contents of a quantized model intended to be served using two GPUs (ranks):

.. code-block:: bash

    model-qnemo
    ├── config.json
    ├── rank0.safetensors
    ├── rank1.safetensors
    ├── tokenizer.model
    └── tokenizer_config.yaml

Community Checkpoint Converter
------------------------------
We provide easy-to-use tools that enable users to convert community checkpoints into the NeMo format. These tools facilitate various operations, including resuming training, Supervised Fine-Tuning (SFT), Parameter-Efficient Fine-Tuning (PEFT), and deployment. For detailed instructions and guidelines, please refer to our documentation.

We offer comprehensive guides to assist both end users and developers:

- **User Guide**: Detailed steps on how to convert community model checkpoints for further training or deployment within NeMo. For more information, please see our :doc:`user_guide`.

- **Developer Guide**: Instructions for developers on how to implement converters for community model checkpoints, allowing for broader compatibility and integration within the NeMo ecosystem. For development details, refer to our :doc:`dev_guide`.

- **Megatron-LM Checkpoint Conversion**: NVIDIA NeMo and NVIDIA Megatron-LM share several foundational technologies. You can convert your GPT-style model checkpoints trained with Megatron-LM into the NeMo Framework using our scripts; for details, see our :doc:`convert_mlm` and the sketch below.
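
As a sketch of such a conversion, the invocation below uses the converter script from the NeMo examples tree; the argument values are placeholders, and ``model_optim_rng.pt`` is the default Megatron-LM checkpoint name:

.. code-block:: bash

    # Convert a Megatron-LM GPT checkpoint into a single .nemo file.
    python examples/nlp/language_modeling/megatron_lm_ckpt_to_nemo.py \
        --checkpoint_folder /path/to/megatron_lm/checkpoints \
        --checkpoint_name model_optim_rng.pt \
        --nemo_file_path /results/megatron_gpt.nemo \
        --model_type gpt \
        --tensor_model_parallel_size 1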

.. toctree::
   :maxdepth: 1
   :caption: NeMo Checkpoints

   dist_ckpt
   user_guide
   dev_guide
   convert_mlm
22 changes: 0 additions & 22 deletions docs/source/ckpt_converters/intro.rst

This file was deleted.

6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -58,10 +58,10 @@ For more information, browse the developer docs for your area of interest in the

 .. toctree::
    :maxdepth: 1
-   :caption: Community Model Converters
-   :name: CheckpointConverters
+   :caption: Training Checkpoints
+   :name: Checkpoints

-   ckpt_converters/intro
+   checkpoints/intro

 .. toctree::
    :maxdepth: 1