Refactor Quantizer for reusing in QAT #9276
Conversation
nemo/export/quantize/quantizer.py (Outdated)
model_cfg.activations_checkpoint_method = None
model_cfg.activations_checkpoint_granularity = None
Can we remove this disabling of activation checkpointing as well? For the SFT model there is no accuracy regression even if we enable activation checkpointing, plus it reduces the GPU memory requirement.
I'm not sure about this; see also the related examples in #9276 (comment). Maybe you could ask internally?
So in the end it turns out that it's OK to remove setting these to None?

model_cfg.activations_checkpoint_method = None
model_cfg.activations_checkpoint_granularity = None
I hope it will not blow up in some large tests
Users always have the option to disable it in the yaml if they want to, but we should not restrict them from using it, as it reduces the GPU memory requirement.
I will set them to None in the yaml file so there is no difference from the previous version.
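Moving the defaults into yaml (rather than hardcoding them in quantizer.py) might look something like the sketch below. This is a hypothetical excerpt, not the actual config file; only the two activation-checkpointing keys come from the discussion above, the surrounding structure is assumed:

```yaml
# Hypothetical excerpt from the quantization yaml config.
# The two activation-checkpointing keys default to null here instead of
# being forced to None in quantizer.py, so users can override them
# (e.g. to re-enable checkpointing and reduce GPU memory usage).
model:
  activations_checkpoint_method: null
  activations_checkpoint_granularity: null
```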
Could you demonstrate/link how you want to use that refactored
Note that changes will need to be upstreamed to another project as well, here: https://github.com/NVIDIA/NeMo-Framework-Launcher/blob/main/launcher_scripts/conf/ptq/model/quantization.yaml. I think we'll need to chop this slightly
It's used in #9276
We'll need to do follow-up MRs to:
Signed-off-by: Keval Morabia <[email protected]>
@janekl should I rename
OK, good idea (the same applies to the Launcher, of course).
examples/nlp/language_modeling/conf/megatron_gpt_quantization.yaml
* Refactor Quantizer for reusing in QAT
* Address more reviewer comments
* update yaml config

Signed-off-by: Keval Morabia <[email protected]>
What does this PR do ?
This PR refactors the Quantizer class used for PTQ so it can be reused for QAT as well (follow-up PR).
Note that functionally it is the same as before.
Collection: N/A
Changelog
Usage
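To illustrate the kind of refactor this PR describes (shared quantization steps factored out of a PTQ-only entry point so a QAT flow can reuse them), here is a minimal standalone sketch. All class, method, and config names below are hypothetical stand-ins, not NeMo's actual API:

```python
# Sketch of the refactor pattern: a Quantizer whose shared setup step can
# be reused by both a PTQ pipeline and a QAT pipeline. Names are invented
# for illustration; the real NeMo Quantizer differs.
from dataclasses import dataclass


@dataclass
class QuantizationConfig:
    algorithm: str = "fp8"
    calib_samples: int = 512


class Quantizer:
    """Holds the config plus the steps shared by PTQ and QAT."""

    def __init__(self, cfg: QuantizationConfig):
        self.cfg = cfg

    def quantize(self, model: dict) -> dict:
        # Shared step: attach quantizers / calibrate (stubbed here by
        # tagging the model with the chosen algorithm).
        model = dict(model)
        model["quantized_with"] = self.cfg.algorithm
        return model

    def export(self, model: dict) -> dict:
        # Export step: produce a deployable artifact from the model.
        return {"engine": f"{model['quantized_with']}-engine"}


def run_ptq(model: dict, quantizer: Quantizer) -> dict:
    # PTQ: quantize, then export directly.
    return quantizer.export(quantizer.quantize(model))


def run_qat(model: dict, quantizer: Quantizer, train_fn) -> dict:
    # QAT reuses the same quantize() step, fine-tunes, then exports.
    model = quantizer.quantize(model)
    model = train_fn(model)
    return quantizer.export(model)
```

The point of the split is that `quantize()` carries no PTQ-specific assumptions, so the follow-up QAT PR can call it and insert a training step before export.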
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs to various areas.
Additional Information