Add section on ".qnemo" checkpoints #9503
Signed-off-by: Jan Lasek <[email protected]>
docs/source/checkpoints/intro.rst
NeMo also offers :doc:`Post-Training Quantization <../nlp/quantization>` workflow to convert regular ``.nemo`` models into a `TensorRT-LLM checkpoint <https://nvidia.github.io/TensorRT-LLM/architecture/checkpoint.html>`_ conventionally referred to as ``.qnemo`` checkpoints in NeMo. Such a checkpoint can be used with `NVIDIA TensorRT-LLM library <https://nvidia.github.io/TensorRT-LLM/index.html>`_ for efficient inference.

Much as in the case of ``.nemo`` checkpoints, a ``.qnemo`` checkpoint is a tar file that bundles the model configuration given in ``config.json`` file and ``rank{i}.safetensors`` files storing model weights for each rank separately. Additionally a ``tokenizer_config.yaml`` file is saved which is just ``tokenizer`` section from ``model_config.yaml`` file from the original NeMo model. This configuration file defines a tokenizer for the model given.
So .qnemo would not support the distributed checkpoint format? That is, if you saved with world_size 2, do you have to load with world_size 2?
One thing to clarify is that this config.json + rank{i}.safetensors output is a TRT-LLM checkpoint. This should not be confused with a distributed checkpoint in the NeMo sense.
Anyway, the feature you asked for is not currently available in TRT-LLM. So to build a TRT-LLM engine with world_size=2, one needs to calibrate/quantize the model to a TRT-LLM checkpoint with world_size=2 and provide this as the input to the trtllm-build command. In other words, world_size cannot be changed at engine build.
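Since world_size is fixed at quantization time, it can help to sanity-check a ``.qnemo`` archive before passing it to trtllm-build. The following is a minimal sketch, not part of this PR: it assumes the world size is recorded in config.json under a "mapping" / "world_size" key, and the helper names check_rank_files and make_demo_qnemo are hypothetical. The demo archive holds empty placeholder files only.

```python
import io
import json
import os
import re
import tarfile
import tempfile
from typing import List, Tuple

RANK_FILE = re.compile(r"rank(\d+)\.safetensors")


def check_rank_files(qnemo_path: str) -> Tuple[int, List[str]]:
    """Return (world_size, rank files) or raise if their counts disagree."""
    with tarfile.open(qnemo_path, "r") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
        config_name = next(n for n in names if os.path.basename(n) == "config.json")
        config = json.load(tar.extractfile(config_name))

    # Assumption: the world size lives under config["mapping"]["world_size"].
    world_size = config["mapping"]["world_size"]

    rank_files = [n for n in names if RANK_FILE.fullmatch(os.path.basename(n))]
    # Sort numerically so that rank10 comes after rank2.
    rank_files.sort(key=lambda n: int(RANK_FILE.fullmatch(os.path.basename(n)).group(1)))

    if len(rank_files) != world_size:
        raise ValueError(
            f"config.json declares world_size={world_size}, "
            f"but the archive holds {len(rank_files)} rank file(s)"
        )
    return world_size, rank_files


def make_demo_qnemo(path: str, world_size: int, num_rank_files: int) -> None:
    """Write a tiny stand-in archive with placeholder contents (demo only)."""
    files = {"config.json": json.dumps({"mapping": {"world_size": world_size}}).encode()}
    for i in range(num_rank_files):
        files[f"rank{i}.safetensors"] = b""  # empty placeholder, not real weights
    files["tokenizer_config.yaml"] = b"# placeholder\n"
    with tarfile.open(path, "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.qnemo")
        make_demo_qnemo(demo, world_size=2, num_rank_files=2)
        print(check_rank_files(demo))
```

A mismatch raises ValueError early, which is cheaper than finding out during the engine build.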
@@ -20,6 +20,26 @@ With sharded model weights, you can save and load the state of your training scr

NeMo supports the distributed (sharded) checkpoint format from Megatron-Core. In Megatron-Core, it supports two backends: Zarr-based and PyTorch-based.
Edits for 1 - 21.
Checkpoints
This section presents the key functionalities of NVIDIA NeMo that pertain to checkpoint management.
Understand Checkpoint Formats
A ``.nemo`` checkpoint is essentially a tar file that combines various components of a trained model. These components include the model configurations (specified in a YAML file), the model weights, and other related artifacts such as tokenizer models or vocabulary files. This design simplifies tasks like sharing, loading, tuning, evaluating, and performing inference with the model.
On the other hand, the ``.ckpt`` file, generated during PyTorch Lightning training, contains both the model weights and the optimizer states. It is typically used to resume training from a paused state.
Sharded Model Weights
In both ``.nemo`` and ``.ckpt`` checkpoints, the model weights can be saved in either a regular format (as a single file named ``model_weights.ckpt`` within model parallelism folders) or a sharded format (where they are stored in a folder called ``model_weights``).
Sharded model weights allow you to efficiently save and load the state of your training script across multiple GPUs or nodes. This approach avoids the necessity to modify model partitions when resuming tuning with a different model parallelism setup.
NeMo supports the distributed (sharded) checkpoint format from Megatron Core. In Megatron Core, there are two supported backends: Zarr-based and PyTorch-based.
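To make the regular versus sharded distinction concrete, here is a minimal sketch that classifies a checkpoint from the list of paths inside it (for example, the member names of a ``.nemo`` tar). The function name weights_layout and the example paths are hypothetical; only the names ``model_weights.ckpt`` and ``model_weights`` come from the text above.

```python
from pathlib import PurePosixPath
from typing import Iterable


def weights_layout(member_names: Iterable[str]) -> str:
    """Classify checkpoint contents as 'regular', 'sharded', 'mixed', or 'unknown'."""
    has_regular = False
    has_sharded = False
    for name in member_names:
        parts = PurePosixPath(name).parts
        if not parts:
            continue
        if parts[-1] == "model_weights.ckpt":
            # Regular format: a single weights file, possibly inside a
            # model parallelism folder.
            has_regular = True
        elif "model_weights" in parts:
            # Sharded format: a folder called model_weights (or anything in it).
            has_sharded = True
    if has_regular and has_sharded:
        return "mixed"
    if has_regular:
        return "regular"
    if has_sharded:
        return "sharded"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical member names, for illustration only.
    regular = ["model_config.yaml", "parallel_folder_0/model_weights.ckpt"]
    sharded = ["model_config.yaml", "model_weights/shard_a", "model_weights/shard_b"]
    print(weights_layout(regular))
    print(weights_layout(sharded))
```

The check looks only at path names, so it does not need to unpack or load any weights.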
@yaoyu-33 that would be something for you to account for in the destination branch yuya/add_checkpoints_section
├── rank1.safetensors
├── tokenizer.model
└── tokenizer_config.yaml

Community Checkpoint Converter
Edits to 45-47
NVIDIA provides easy-to-use tools that enable users to convert community checkpoints into the NeMo format. These tools facilitate various operations, including resuming training, Supervised Fine-tuning (SFT), Parameter Efficient Fine-Tuning (PEFT), and deployment. Please consult our documentation for detailed instructions and guidelines. We provide comprehensive guides to assist both end users and developers.
Signed-off-by: Jan Lasek <[email protected]>
* Add checkpoints section
  Signed-off-by: yaoyu-33 <[email protected]>
* Fix title
  Signed-off-by: yaoyu-33 <[email protected]>
* update
  Signed-off-by: yaoyu-33 <[email protected]>
* Add section on ".qnemo" checkpoints (#9503)
  * Add 'Quantized Checkpoints' section
    Signed-off-by: Jan Lasek <[email protected]>
  * Address review comments
    Signed-off-by: Jan Lasek <[email protected]>
  ---------
  Signed-off-by: Jan Lasek <[email protected]>
* Distributed checkpointing user guide (#9494)
  * Describe shardings and entrypoints
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Strategies, optimizers, finalize entrypoints
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Transformations
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Integration
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Add link from intro
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Apply grammar suggestions
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Explain the example
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Apply review suggestions
    Signed-off-by: Mikołaj Błaż <[email protected]>
  * Add zarr and torch_dist explanation
  ---------
  Signed-off-by: Mikołaj Błaż <[email protected]>
* add subsection
  Signed-off-by: yaoyu-33 <[email protected]>
* Update docs/source/checkpoints/intro.rst
  Co-authored-by: Chen Cui <[email protected]>
  Signed-off-by: Yu Yao <[email protected]>
* address comments
  Signed-off-by: yaoyu-33 <[email protected]>
* fix
  Signed-off-by: yaoyu-33 <[email protected]>
* fix code block
  Signed-off-by: yaoyu-33 <[email protected]>
* address comments
  Signed-off-by: yaoyu-33 <[email protected]>
* formatting
  Signed-off-by: yaoyu-33 <[email protected]>
* fix
  Signed-off-by: yaoyu-33 <[email protected]>
* fix
  Signed-off-by: yaoyu-33 <[email protected]>
---------
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
What does this PR do?
Add section on ".qnemo" checkpoints to #9329.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this