Yuya/add checkpoints section (NVIDIA#9329)
* Add checkpoints section

Signed-off-by: yaoyu-33 <[email protected]>

* Fix title

Signed-off-by: yaoyu-33 <[email protected]>

* update

Signed-off-by: yaoyu-33 <[email protected]>

* Add section on ".qnemo" checkpoints (NVIDIA#9503)

* Add 'Quantized Checkpoints' section

Signed-off-by: Jan Lasek <[email protected]>

* Address review comments

Signed-off-by: Jan Lasek <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>

* Distributed checkpointing user guide (NVIDIA#9494)

* Describe shardings and entrypoints

Signed-off-by: Mikołaj Błaż <[email protected]>

* Strategies, optimizers, finalize entrypoints

Signed-off-by: Mikołaj Błaż <[email protected]>

* Transformations

Signed-off-by: Mikołaj Błaż <[email protected]>

* Integration

Signed-off-by: Mikołaj Błaż <[email protected]>

* Add link from intro

Signed-off-by: Mikołaj Błaż <[email protected]>

* Apply grammar suggestions

Signed-off-by: Mikołaj Błaż <[email protected]>

* Explain the example

Signed-off-by: Mikołaj Błaż <[email protected]>

* Apply review suggestions

Signed-off-by: Mikołaj Błaż <[email protected]>

* Add zarr and torch_dist explanation

---------

Signed-off-by: Mikołaj Błaż <[email protected]>

* add subsection

Signed-off-by: yaoyu-33 <[email protected]>

* Update docs/source/checkpoints/intro.rst

Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Yu Yao <[email protected]>

* address comments

Signed-off-by: yaoyu-33 <[email protected]>

* fix

Signed-off-by: yaoyu-33 <[email protected]>

* fix code block

Signed-off-by: yaoyu-33 <[email protected]>

* address comments

Signed-off-by: yaoyu-33 <[email protected]>

* formatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix

Signed-off-by: yaoyu-33 <[email protected]>

* fix

Signed-off-by: yaoyu-33 <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
4 people authored and Malay Nagda committed Jul 26, 2024
1 parent 961dcf2 commit c06e0ae
Showing 7 changed files with 494 additions and 25 deletions.
File renamed without changes.
File renamed without changes.
427 changes: 427 additions & 0 deletions docs/source/checkpoints/dist_ckpt.rst

Large diffs are not rendered by default.

64 changes: 64 additions & 0 deletions docs/source/checkpoints/intro.rst
@@ -0,0 +1,64 @@
Checkpoints
===========


In this section, we present key NVIDIA NeMo functionality related to checkpoint management.

Understanding Checkpoint Formats
--------------------------------

A ``.nemo`` checkpoint is fundamentally a tar file that bundles the model configuration (specified in a YAML file), the model weights (in a ``.ckpt`` file), and other artifacts such as tokenizer models or vocabulary files. This consolidated design streamlines sharing, loading, tuning, evaluation, and inference.

In contrast, the ``.ckpt`` file, created during PyTorch Lightning training, contains both the model weights and the optimizer states, and is usually used to resume training.
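
For example, you can inspect or unpack a ``.nemo`` archive with standard ``tar`` commands. The checkpoint name below is a placeholder; the artifacts bundled inside vary by model:

.. code-block:: bash

    # List the contents of a .nemo checkpoint without extracting it
    tar -tvf megatron_gpt.nemo

    # Unpack the bundled config, weights, and tokenizer artifacts
    mkdir -p megatron_gpt_extracted
    tar -xvf megatron_gpt.nemo -C megatron_gpt_extracted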

Sharded Model Weights
---------------------

Within ``.nemo`` or ``.ckpt`` checkpoints, the model weights can be saved either in a regular format (a single file called ``model_weights.ckpt`` inside the model-parallelism folders) or in a sharded format (a folder called ``model_weights``).

With sharded model weights, you can save and load the training state across multiple GPUs or nodes more efficiently, and you avoid having to repartition the model when you resume tuning with a different model-parallelism setup.

NeMo supports the distributed (sharded) checkpoint format from Megatron Core. Megatron Core provides two checkpoint backends: a PyTorch-based backend (recommended) and a Zarr-based backend (deprecated).
For a detailed explanation, see the :doc:`dist_ckpt` guide.
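
For illustration, a checkpoint saved with the PyTorch-based backend is a directory rather than a single file. The listing below is a sketch only; the exact file names and counts depend on the backend, the model, and the Megatron Core version:

.. code-block:: bash

    # Illustrative layout of a sharded model_weights directory (PyTorch-based backend)
    model_weights
    ├── __0_0.distcp
    ├── __0_1.distcp
    ├── common.pt
    └── metadata.json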


Quantized Checkpoints
---------------------

NeMo provides a :doc:`Post-Training Quantization <../nlp/quantization>` workflow that allows you to convert regular ``.nemo`` models into a `TensorRT-LLM checkpoint <https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html>`_, commonly referred to as ``.qnemo`` checkpoints in NeMo. These ``.qnemo`` checkpoints can then be used with the `NVIDIA TensorRT-LLM library <https://nvidia.github.io/TensorRT-LLM/index.html>`_ for efficient inference.

A ``.qnemo`` checkpoint, similar to ``.nemo`` checkpoints, is a tar file that bundles the model configuration specified in the ``config.json`` file along with the ``rank{i}.safetensors`` files. These ``.safetensors`` files store the model weights for each rank individually. In addition, a ``tokenizer_config.yaml`` file is saved, containing only the tokenizer section from the original NeMo ``model_config.yaml`` file. This configuration file defines the tokenizer used by the given model.

When working with large quantized LLMs, it is recommended that you leave the checkpoint uncompressed as a directory rather than a tar file. You can control this behavior by setting the ``compress`` flag when exporting quantized models in the `PTQ configuration file <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_ptq.yaml>`_.
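
As a sketch, assuming a Hydra-style launch of the PTQ example script, the flag can be overridden from the command line. The script path and the exact key for ``compress`` are assumptions here; check the linked configuration file for the authoritative names:

.. code-block:: bash

    # Keep the exported quantized checkpoint as an uncompressed directory.
    # The script path and the "export.compress" key are illustrative only.
    python examples/nlp/language_modeling/megatron_gpt_ptq.py \
        export.compress=false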

The following example shows the contents of a quantized model intended to be served using two GPUs (ranks):

.. code-block:: bash

    model-qnemo
    ├── config.json
    ├── rank0.safetensors
    ├── rank1.safetensors
    ├── tokenizer.model
    └── tokenizer_config.yaml

Community Checkpoint Converter
------------------------------

We provide easy-to-use tools that enable users to convert community checkpoints into the NeMo format. These tools facilitate various operations, including resuming training, Supervised Fine-Tuning (SFT), Parameter-Efficient Fine-Tuning (PEFT), and deployment. For detailed instructions and guidelines, please refer to our documentation.

We offer comprehensive guides to assist both end users and developers:

- **User Guide**: Detailed steps on how to convert community model checkpoints for further training or deployment within NeMo. For more information, please see our :doc:`user_guide`.

- **Developer Guide**: Instructions for developers on how to implement converters for community model checkpoints, allowing for broader compatibility and integration within the NeMo ecosystem. For development details, refer to our :doc:`dev_guide`.

- **Megatron-LM Checkpoint Conversion**: NVIDIA NeMo and NVIDIA Megatron-LM share several foundational technologies. You can convert GPT-style model checkpoints trained with Megatron-LM into the NeMo Framework using our conversion scripts; a sketch of such an invocation follows this list, and the full instructions are in :doc:`convert_mlm`.
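
The sketch below shows what a Megatron-LM conversion invocation can look like. The script path and arguments are assumptions for illustration; the authoritative command and options are documented in :doc:`convert_mlm`:

.. code-block:: bash

    # Illustrative only: the script path and arguments may differ in your NeMo version.
    python examples/nlp/language_modeling/megatron_lm_ckpt_to_nemo.py \
        --checkpoint_folder /path/to/megatron_lm/checkpoints \
        --checkpoint_name model_optim_rng.pt \
        --nemo_file_path gpt_converted.nemo \
        --model_type gpt \
        --tensor_model_parallel_size 1 \
        --pipeline_model_parallel_size 1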

.. toctree::
   :maxdepth: 1
   :caption: NeMo Checkpoints

   dist_ckpt
   user_guide
   dev_guide
   convert_mlm
File renamed without changes.
22 changes: 0 additions & 22 deletions docs/source/ckpt_converters/intro.rst

This file was deleted.

6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -58,10 +58,10 @@ For more information, browse the developer docs for your area of interest in the

 .. toctree::
    :maxdepth: 1
-   :caption: Community Model Converters
-   :name: CheckpointConverters
+   :caption: Model Checkpoints
+   :name: Checkpoints

-   ckpt_converters/intro
+   checkpoints/intro

 .. toctree::
    :maxdepth: 1
