Packed sequence bug fixes #10898
Merged
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: artbataev <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
…ally changing config Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
pablo-garay reviewed on Oct 17, 2024
pablo-garay reviewed on Oct 17, 2024
pablo-garay previously approved these changes on Oct 17, 2024
It looks like the GitHub Action uses a container with an older version of TE (Transformer Engine), so the new checkpoint I added couldn't be loaded. I'll revert to the old checkpoints for now.
Signed-off-by: Chen Cui <[email protected]>
[🤖]: Hi @cuichenx 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
pablo-garay approved these changes on Oct 18, 2024
cuichenx added a commit that referenced this pull request on Oct 18, 2024
(cherry picked from commit 76352fb) Signed-off-by: Chen Cui <[email protected]>
nithinraok pushed a commit that referenced this pull request on Oct 18, 2024
* save prepared dataset to different folders according to tokenizer name
  Signed-off-by: Chen Cui <[email protected]>
* fix hang
  Signed-off-by: Chen Cui <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: cuichenx <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: artbataev <[email protected]>
* fix hang
  Signed-off-by: Chen Cui <[email protected]>
* raise mbs>1 error and provide suggestion to user instead of automatically changing config
  Signed-off-by: Chen Cui <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: cuichenx <[email protected]>
* add ci for packed seq
  Signed-off-by: Chen Cui <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: cuichenx <[email protected]>
* fix bug
  Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: artbataev <[email protected]>
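Two of the commits listed above ("save prepared dataset to different folders according to tokenizer name" and "raise mbs>1 error and provide suggestion to user instead of automatically changing config") describe behavior that a short example can illustrate. The following is only a minimal sketch of the idea, not the code from this PR: `packed_data_dir` and `check_micro_batch_size` are hypothetical helper names, and the directory layout and the wording of the suggestion are assumptions made for illustration.

```python
from pathlib import Path


def packed_data_dir(dataset_root: str, tokenizer_name: str) -> Path:
    """Return a tokenizer-specific folder for prepared (packed) data.

    Hypothetical helper: keeping one folder per tokenizer avoids reusing
    data that was tokenized with a different tokenizer.
    """
    # Tokenizer names such as "org/model" contain slashes; flatten them
    # so the name maps to a single directory level (assumed convention).
    safe_name = tokenizer_name.replace("/", "--")
    return Path(dataset_root) / "packed" / safe_name


def check_micro_batch_size(micro_batch_size: int) -> None:
    """Fail loudly instead of silently rewriting the user's config.

    Hypothetical check; the suggestion text is an assumption.
    """
    if micro_batch_size > 1:
        raise ValueError(
            f"Packed sequences expect micro_batch_size=1, got {micro_batch_size}. "
            "Suggestion: set micro_batch_size=1 and adjust the packed "
            "sequence length to compensate."
        )


if __name__ == "__main__":
    print(packed_data_dir("/data/squad", "meta-llama/Llama-2-7b"))
    check_micro_batch_size(1)  # passes silently
```

Raising an error keeps the configuration the user wrote as the single source of truth; the earlier approach mentioned in the commit title (automatically changing the config) would make the effective settings differ from the ones on disk.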
akoumpa pushed a commit that referenced this pull request on Oct 22, 2024
…10893)

* Packed Sequence [NeMo 2] (#10445)
  * initial commit
    Signed-off-by: Chen Cui <[email protected]>
  * seq length bug fix
    Signed-off-by: Chen Cui <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: cuichenx <[email protected]>
  * support online concat for mbs>1
    Signed-off-by: Chen Cui <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: cuichenx <[email protected]>
  * switch to updating packed seq len with a warning message
    Signed-off-by: Chen Cui <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: cuichenx <[email protected]>
  * add header
    Signed-off-by: Chen Cui <[email protected]>
  * add docstrings
    Signed-off-by: Chen Cui <[email protected]>
  * fix issue with ssm model
    Signed-off-by: Chen Cui <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: cuichenx <[email protected]>

  ---------

  Signed-off-by: Chen Cui <[email protected]>
  Signed-off-by: cuichenx <[email protected]>
  Co-authored-by: cuichenx <[email protected]>
  (cherry picked from commit 58e4cc9)
* Packed sequence bug fixes (#10898)
  (cherry picked from commit 76352fb)
  Signed-off-by: Chen Cui <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: cuichenx <[email protected]>
* fix peft resume (#10887)
  Signed-off-by: Chen Cui <[email protected]>

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: cuichenx <[email protected]>
artbataev added a commit to artbataev/NeMo that referenced this pull request on Oct 22, 2024
akoumpa pushed a commit that referenced this pull request on Oct 24, 2024
Signed-off-by: Alexandros Koumparoulis <[email protected]>
yashaswikarnati pushed a commit that referenced this pull request on Oct 24, 2024
titu1994 pushed a commit that referenced this pull request on Oct 28, 2024
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request on Nov 5, 2024
Signed-off-by: Hainan Xu <[email protected]>
HuiyingLi pushed a commit to HuiyingLi/NeMo that referenced this pull request on Nov 15, 2024
ericharper added a commit that referenced this pull request on Nov 19, 2024
* nemo2-sft notebook initial draft Signed-off-by: HuiyingLi <[email protected]> * remove mixtral info Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * add import_ckpt script and minor changes Signed-off-by: HuiyingLi <[email protected]> * Random read for tarr files in lhotse dataloaders (#10536) * Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Solve failled tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Adding a testcase Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Some changs in tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * removing import Signed-off-by: Nune <[email protected]> --------- Signed-off-by: Nune <[email protected]> Signed-off-by: nune-tadevosyan <[email protected]> Co-authored-by: nune-tadevosyan <[email protected]> * training code for hybrid-autoregressive inference model (#10841) * training code for hybrid-autoregressive inference model Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> --------- Signed-off-by: Hainan Xu <[email protected]> Signed-off-by: hainan-xv <[email protected]> Co-authored-by: Hainan Xu <[email protected]> Co-authored-by: hainan-xv <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! 
(#10871) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Use trainer.local_rank/global_rank (#10860) * fix global_rank calculation Signed-off-by: Alexandros Koumparoulis <[email protected]> * use trainer's global/local rank Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove stacking operation from batched functions (#10524) * remove stacking operations Signed-off-by: lilithgrigoryan <[email protected]> * fixes im base class Signed-off-by: lilithgrigoryan <[email protected]> * clean up Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * remove potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * restore batch_intilize states funcname Signed-off-by: lilithgrigoryan <[email protected]> * fix typo Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable in stateless transduser Signed-off-by: lilithgrigoryan <[email protected]> * fix test Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * fix docstring, rm comment Signed-off-by: lilithgrigoryan <[email protected]> * fix dosctrings Signed-off-by: lilithgrigoryan <[email protected]> --------- Signed-off-by: lilithgrigoryan <[email protected]> Signed-off-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> * [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471) * Add llm.generate Signed-off-by: Hemil Desai <[email protected]> * Remove comment Signed-off-by: Hemil Desai <[email protected]> * Apply 
isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix launching with python Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add assert cp Signed-off-by: Hemil Desai <[email protected]> * Add example script Signed-off-by: Hemil Desai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Adding support for LightningDataModule inside Fabric-API (#10879) * Make FabricMegatronMixedPrecision match MegatronMixedPrecision Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Supporting DataModule in fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Adding support for LightningDataModule inside Fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Remove import in mock.py Signed-off-by: Marc Romeijn <[email protected]> --------- Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * initial draft Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Save yaml config for model in nemo.lightning.io (#10765) * Save yaml config for model in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * fix bug Signed-off-by: Hemil Desai <[email protected]> * Add explicit yaml comparison Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * relax test Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Move collectiob.nlp imports inline for t5 (#10877) * Move collectiob.nlp imports inline for t5 Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * add world_size/pp_size runtime check (#10842) * add world_size/pp_size runtime check Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix msg precision Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix test_init_parallel_ranks ws=3 pp=3 Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix peft resume (#10887) Signed-off-by: Chen Cui <[email protected]> * Update engine build step for TRT-LLM 0.13.0 (#10880) * Setting use_fused_mlp for TRT-LLM >= 0.13.0 Signed-off-by: Jan Lasek <[email protected]> * Unused import removal Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * Akoumparouli/nemo ux moe loss logging (#10128) * Move across pipeline loss reduction to a separate function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add support for MoE loss logging Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email 
protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * enable vboost and set LM SM margin (#10853) * enable vboost Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * env vars Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * add perf plugin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * revert default executor Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * fix typo Signed-off-by: Jimmy Zhang <[email protected]> * fix more typo Signed-off-by: Jimmy Zhang <[email protected]> * ln margin knob Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * specify lm margin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: malay-nagda <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> * use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608) * use _get_extra_te_kwargs_meta in fabric (call mcore's 
_get_extra_te_kwargs & overwrite device) Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Use torch sdpa implementation in ASR mha (#9590) * use pytorch sdpa Signed-off-by: WoodieDudy <[email protected]> * sdpa work Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: titu1994 <[email protected]> * sdpa flag to false & sdpa_backend arg Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * change arg name Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * fix config args Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * add condition on version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * update condition on version Signed-off-by: WoodieDudy <[email protected]> * remove condition on torch version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * move code to init Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> --------- Signed-off-by: WoodieDudy <[email protected]> Signed-off-by: titu1994 <[email protected]> Signed-off-by: WoodieDudy <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: titu1994 <[email protected]> Co-authored-by: WoodieDudy <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861) * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Remove cyclic import Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: artbataev <[email protected]> * call __post_init__ after altering config values (#10885) * call __post_init__ after altering config values Signed-off-by: Alexandros Koumparoulis <[email protected]> * test fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * turn off SP Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * Nemo 2.0 ckpt support in TRT-LLM export (#10891) * fix minor import bug Signed-off-by: Onur Yilmaz <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * nemo 2.0 support in export to trt-llm Signed-off-by: Onur Yilmaz <[email protected]> * get mixing from main Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * fix style Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> * [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171) * various simple docs source fixes Signed-off-by: Elena Rastorgueva <[email protected]> * fix docstrings and typing with forward reference Signed-off-by: Elena Rastorgueva <[email protected]> * Apply isort and black reformatting Signed-off-by: erastorgueva-nv <[email protected]> * fix typing forward reference for PromptedAudioToTextLhotseDataset Signed-off-by: Elena Rastorgueva <[email protected]> * fix feature warnings Signed-off-by: yaoyu-33 <[email protected]> * Try fix some model part errors Signed-off-by: yaoyu-33 <[email protected]> * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix indent in docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * update Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: 
yaoyu-33 <[email protected]> * fix imagen cite * fix ratio issues Signed-off-by: yaoyu-33 <[email protected]> * fix Dreambooth Signed-off-by: yaoyu-33 <[email protected]> * Fix activation recomputation Signed-off-by: yaoyu-33 <[email protected]> * fix sequence packing Signed-off-by: yaoyu-33 <[email protected]> * fix asr_language_modeling_and_customization Signed-off-by: yaoyu-33 <[email protected]> * fixes wip Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: erastorgueva-nv <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Yu Yao <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: erastorgueva-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Ao Tang <[email protected]> Co-authored-by: Huiying Li <[email protected]> * calculate step time batch end-batch end (#10202) * log step time at end Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * use nemo logging Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * cleanup Signed-off-by: Malay Nagda <[email protected]> * check remove Signed-off-by: Malay Nagda <[email protected]> * delta timing callback Signed-off-by: Malay Nagda <[email protected]> * comment and name change Signed-off-by: Malay Nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * late import prettytable (#10912) Signed-off-by: Maanu Grover <[email protected]> * [🤠]: 
Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Warning for missing FP8 checkpoint support for vLLM deployment (#10906) Signed-off-by: Jan Lasek <[email protected]> * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821) * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787) * Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching Signed-off-by: Nithin Rao Koluguri <nithinraok> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: nithinraok <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix ASR tests (#10794) * Make tests required Signed-off-by: Vladimir Bataev <[email protected]> * Debug torch.load issue Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Run only necessary tests Signed-off-by: Vladimir Bataev <[email protected]> * Try fix loading Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid caching fixture Signed-off-by: Vladimir Bataev <[email protected]> * Try restore model several times Signed-off-by: Vladimir Bataev <[email protected]> * Try customize temporary directory Signed-off-by: Vladimir Bataev <[email protected]> * Apply 
isort and black reformatting Signed-off-by: artbataev <[email protected]> * Reorder tests Signed-off-by: Vladimir Bataev <[email protected]> * Disable one test Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid xxlarge model Signed-off-by: Vladimir Bataev <[email protected]> * Disable test Signed-off-by: Vladimir Bataev <[email protected]> * Revert changes Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Magic fix Signed-off-by: Vladimir Bataev <[email protected]> * Revert unnecessary changes Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Disable all jobs except L0 Signed-off-by: Vladimir Bataev <[email protected]> * RNNT alignments - merge with unit tests Signed-off-by: Vladimir Bataev <[email protected]> * Fix CUDA graph frame-looping decoder to handle non-CUDA inputs Signed-off-by: Vladimir Bataev <[email protected]> * Fix config Signed-off-by: Vladimir Bataev <[email protected]> * Log test results Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Use less audio files for tests Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: artbataev <[email protected]> * Integrating mcore export (#10238) * Integrating mcore export * Integrating mcore export * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Move trt imports in nemo.collections.llm inside respective functions (#10234) Signed-off-by: Hemil Desai <[email protected]> * Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198) * Add tests for LazyNeMoIterator and fix case with 
manifest_only=True and offsets in manifest Signed-off-by: Piotr Żelasko <[email protected]> * Address code review Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939) * perfor serialization using relative paths to allow users to move checkpoints after they're saved Signed-off-by: ashors1 <[email protected]> * Apply isort and black reformatting Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> * fix artifact load Signed-off-by: ashors1 <[email protected]> * fix path artifact Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Co-authored-by: ashors1 <[email protected]> * Add MemoryProfileCallback (#10166) * Add MemoryProfileCallback Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Remove reference cycles, save snapshot on specific ranks Signed-off-by: Shriya Palsamudram <[email protected]> * Remove unnecessary imports Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Update docstring Signed-off-by: Shriya Palsamudram <[email protected]> --------- Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> * Lower bound transformers to support nemotron (#10240) Signed-off-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> 
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052) Flow matching generative model with SSL pretraining framework Signed-off-by: Pin-Jui Ku <[email protected]> Co-authored-by: Kuray107 <[email protected]> * Revert torchrun fix for model import (#10251) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [NeMo-UX[ Move nemotron imports inline (#10255) * Move nemotron transformers + tokenizer imports inline to reduce number of required deps Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * Wrap CPU model init with megatron_lazy_init_context (#10219) * Wrap CPU model init with megatron_lazy_init_context Signed-off-by: Alexandros Koumparoulis <[email protected]> * Cleanup checkpoint-dir if saving fails Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Bump `Dockerfile.ci` (2024-08-22) (#10227) * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff ! Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix bert flags Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * salm export trtllm (#10245) Signed-off-by: slyne deng <[email protected]> Co-authored-by: slyne deng <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! 
(#10250) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * Load model in the target export precision by default in PTQ (#10267) * Load model in the target export precision by default Signed-off-by: Jan Lasek <[email protected]> * Enable megatron_amp_O2=true to actually use half-precision Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223) * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Remove duplicate Signed-off-by: Hemil Desai <[email protected]> * Add entity to wandb logger Signed-off-by: Hemil Desai <[email protected]> * Add documentation Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add warning Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * [NeMo-UX] Handle absolute logger 
directories in nemo_logger (#10259) * handle absolute and relative logger directories Signed-off-by: Anna Shors <[email protected]> * merge lines Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: Anna Shors <[email protected]> Signed-off-by: ashors1 <[email protected]> * Add sdxl notebook (#10139) * Add sdxl notebook Signed-off-by: mingyuanm <[email protected]> * Rename Signed-off-by: mingyuanm <[email protected]> * final Update SDXL notebook Signed-off-by: mingyuanm <[email protected]> --------- Signed-off-by: mingyuanm <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Small change * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * ADD support for layernorm1p * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Signed-off-by: Dong Hyuk Chang <[email protected]> Signed-off-by: Pin-Jui Ku <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> 
Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: slyne deng <[email protected]> Signed-off-by: oliver könig <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Anna Shors <[email protected]> Signed-off-by: mingyuanm <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Anna Shors <[email protected]> Co-authored-by: ashors1 <[email protected]> Co-authored-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Slyne Deng <[email protected]> Co-authored-by: slyne deng <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Ming <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> * Fix artifact saving (#10914) Signed-off-by: Hemil Desai <[email protected]> * Lora improvement (#10918) * pull out freeze model Signed-off-by: Chen Cui <[email protected]> * add wildcard match to lora target modules Signed-off-by: Chen Cui <[email protected]> 
--------- Signed-off-by: Chen Cui <[email protected]> * Huvu/t5 nemo2.0 peft (#10916) * adding peft test and cicd * add setting mcore model to train in peft.py * adding test for T5 lora * fix follow Chen's fix * restore cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> * Add tie_word_embeddings=True (#10710) Signed-off-by: Yoshi Suhara <[email protected]> * Use a context-manager when opening files (#10895) * Use a context-manager when opening files Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: artbataev <[email protected]> * long context performance numbers in doc (#10784) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email 
protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm from __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * change the figure file name Signed-off-by: Youngeun Kwon <[email protected]> * Accommodating the reviewer's comment Signed-off-by: Youngeun Kwon <[email protected]> * update the y-axis title Signed-off-by: Youngeun Kwon <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294) * Add ModelOpt transformer model pruning example for Llama3 model Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * examples code is at wrong dir, move them Signed-off-by: Shengliang Xu <[email protected]> * changes as suggested in comment remove some logging and unused config code, update example model to llama3.1 Signed-off-by: Shengliang Xu <[email protected]> * Add pruning of hidden_size into example Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml Signed-off-by: Keval Morabia <[email protected]> * 
Add pruning test to cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> --------- Signed-off-by: Shengliang Xu <[email protected]> Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Keval Morabia <[email protected]> Co-authored-by: shengliangxu <[email protected]> Co-authored-by: Keval Morabia <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Update mamba.rst after dist ckpt addition (#10800) Signed-off-by: Ali Taghibakhshi <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * fix chunked infer (#10581) Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * fix state transform (#10728) Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * use ckpt_to_weights_subdir in restore (#10786) * use ckpt_to_weights_subdir in restore Signed-off-by: Alexandros Koumparoulis <[email protected]> * make ckpt_to_{weight,context}_subdir idempotent Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Mixtral set seq_length=4k (#10704) * enable SP & set seq_lenght=4k Signed-off-by: Alexandros Koumparoulis <[email protected]> * update test expected values Signed-off-by: Alexandros Koumparoulis <[email protected]> * 8x22b 4k Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- 
Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792) * Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA Signed-off-by: Valerie Sarge <[email protected]> * Apply isort and black reformatting Signed-off-by: vysarge <[email protected]> --------- Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: vysarge <[email protected]> Co-authored-by: vysarge <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Disable checkpoint conversion inside AutoResume (#10645) * Disable checkpoint conversion inside AutoResume Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Update resume docstrings Signed-off-by: Hemil Desai <[email protected]> * fix Signed-off-by: Hemil Desai <[email protected]> * add default finetuning recipe and refactor llama3 8b recipe Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * address comment Signed-off-by: Chen Cui <[email protected]> * refactor other recipes Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * remove 8x3b finetuning recipe for now because HF version not available Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * adjust unit tests based on recipe fixes Signed-off-by: Chen Cui <[email protected]> * fix failed unit test Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: cuichenx 
<[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * replace png file to github assets Signed-off-by: Youngeun Kwon <[email protected]> * change image url to github release Signed-off-by: Youngeun Kwon <[email protected]> --------- Signed-off-by: Youngeun Kwon <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Shengliang Xu <[email protected]> Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Keval Morabia <[email protected]> Signed-off-by: Ali Taghibakhshi <[email protected]> Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: vysarge <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Shengliang Xu <[email protected]> Co-authored-by: shengliangxu <[email protected]> Co-authored-by: Keval Morabia <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: vysarge <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: cuichenx <[email protected]> * perf recipes and Mcore DistOpt params (#10883) * 175b gpt3 recipe Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * dist opt params Signed-off-by: Malay Nagda <[email 
protected]> * 405b dist opt params Signed-off-by: Malay Nagda <[email protected]> * perf recipes and dist opt params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * MoE dist opt params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * gpt bias fusion params Signed-off-by: Malay Nagda <[email protected]> * 175b recipe Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * perf params comments Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * MoE perf params comments Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * perf recipes suffix Signed-off-by: Malay Nagda <[email protected]> * specific models fusion params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * ci: Fix cherry pick team (#10945) Signed-off-by: Oliver Koenig <[email protected]> * Packed sequence bug fixes (#10898) * save prepared dataset to different folders according to tokenizer name Signed-off-by: Chen Cui <[email protected]> * fix hang Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * fix hang Signed-off-by: Chen Cui <[email protected]> * raise mbs>1 error and provide suggestion to user instead of automatically changing config Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting 
Signed-off-by: cuichenx <[email protected]> * add ci for packed seq Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * fix bug Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: cuichenx <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix requirements for MacOS (#10930) Signed-off-by: Vladimir Bataev <[email protected]> * Fix nemo 2.0 recipes (#10915) * Fix recipe num_nodes and long context docstring * Fix typo * Fix PP issue * Fix unit test * Change recipes * fix test * Fix unit tests * Fix recipes * Add general legal test on parallelization settings * Rename test * Apply isort and black reformatting Signed-off-by: BoxiangW <[email protected]> --------- Signed-off-by: BoxiangW <[email protected]> Co-authored-by: BoxiangW <[email protected]> * Akoumparouli/nemo ux fix dir or string artifact (#10936) * Add __repr__ to Artifact Signed-off-by: Alexandros Koumparoulis <[email protected]> * nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration Signed-off-by: Alexandros Koumparoulis <[email protected]> * t5 test minification Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * ckpt convert bug fixes (#10878) * Mistral-NeMo-12B recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * rename mistral to mistral_7b Signed-off-by: Alexandros Koumparoulis <[email protected]> * include mistral_nemo_12b in __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa 
<[email protected]> * add to __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2 Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove finetune_reci[e Signed-off-by: Alexandros Koumparoulis <[email protected]> * Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion Signed-off-by: Alexandros Koumparoulis <[email protected]> * update config names in tests Signed-off-by: Alexandros Koumparoulis <[email protected]> * mistral-nemo-12b from llama_8b Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2; SP=True Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix overlap value Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * update mistral-nemo-base-12b finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * bug fix Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> * remove extra file Signed-off-by: dimapihtar <[email protected]> * remove extra changes Signed-off-by: dimapihtar <[email protected]> * revert changes Signed-off-by: dimapihtar <[email protected]> * add ckpt_format configurable Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * revert changes Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> 
Signed-off-by: dimapihtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: artbataev <[email protected]> * fix typo in docstring (#10955) Signed-off-by: ashors1 <[email protected]> * remove deprecated ci tests (#10922) * remove deprecated tutorial Signed-off-by: dimapihtar <[email protected]> * remove deprecated ci tests Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * remove bart tests Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: dimapihtar <[email protected]> * [Nemo CICD] Remove deprecated tests (#10960) * remove deprecated tutorial Signed-off-by: dimapihtar <[email protected]> * remove deprecated ci tests Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * remove bart tests Signed-off-by: dimapihtar <[email protected]> * Remove deleted CI tests --------- Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Pablo Garay <[email protected]> Co-authored-by: dimapihtar <[email protected]> * Adithyare/oai chat completion (#10785) * updates Signed-off-by: adithyare <[email protected]> * open ai chat completion wip Signed-off-by: adithyare <[email protected]> * responding with model responses Signed-off-by: adithyare <[email protected]> * Apply isort and black reformatting Signed-off-by: arendu <[email protected]> * also support general completion Signed-off-by: adithyare <[email protected]> * Apply isort and black reformatting Signed-off-by: arendu <[email protected]> --------- Signed-off-by: adithyare <[email protected]> Signed-off-by: arendu 
<[email protected]> Co-authored-by: arendu <[email protected]> * Update megatron_t5_pretraining.py (#10952) Signed-off-by: Huy Vu <[email protected]> * Convert perf plugin env vars to strings (#10947) Signed-off-by: Hemil Desai <[email protected]> * disable dynamo for ddp checker (#10961) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Mistral-NeMo-12B recipe (#10607) * Mistral-NeMo-12B recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * rename mistral to mistral_7b Signed-off-by: Alexandros Koumparoulis <[email protected]> * include mistral_nemo_12b in __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * add to __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2 Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove finetune_reci[e Signed-off-by: Alexandros Koumparoulis <[email protected]> * Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion Signed-off-by: Alexandros Koumparoulis <[email protected]> * update config names in tests Signed-off-by: Alexandros Koumparoulis <[email protected]> * mistral-nemo-12b from llama_8b Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2; SP=True Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix overlap value Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * update mistral-nemo-base-12b finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * 
Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Make nemo text processing optional in TTS (#10584) * move TN guard to better location; make guard print error message rather than throwing error Signed-off-by: Jason <[email protected]> * Apply isort and black reformatting Signed-off-by: blisc <[email protected]> * Forgot to add the actual normalizer Signed-off-by: Jason <[email protected]> * Apply isort and black reformatting Signed-off-by: blisc <[email protected]> --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Co-authored-by: blisc <[email protected]> * respect warnings' filters (#10953) * respect warnings' filters Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972) * initial commit * restore t5_pretraining * Apply isort and black reformatting Signed-off-by: huvunvidia <[email protected]> --------- Signed-off-by: huvunvidia <[email protected]> Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: huvunvidia <[email protected]> * Alit/mamba recipe (#10935) * add some mamba recipe * add 130m * add the rest of the recipes * add tokenizer * add tokenizer * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * add fixes to ssm for nemorun recipes * add hybrid tokenizer * updating some recipes * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * remove comments * update gbs * fix ckpt resume * fix ckpt resume * fix 
ckpt resume * update recipes final * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * remove redundant imports * ckpt convertor dtype fix * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> --------- Signed-off-by: JRD971000 <[email protected]> Signed-off-by: Ali Taghibakhshi <[email protected]> Co-authored-by: JRD971000 <[email protected]> * Long context performance doc hot fix (#10946) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm fr…
What does this PR do?
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
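The Usage placeholder is left unfilled in this PR. Going only by the commit messages ("save prepared dataset to different folders according to tokenizer name" and "raise mbs>1 error and provide suggestion to user instead of automatically changing config"), the two behaviours might look roughly like the sketch below. Every name here (`PackedSequenceSettings`, `validate_packed_settings`, `prepared_dataset_dir`, the directory layout, and the wording of the error) is hypothetical and is not taken from the NeMo code base.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PackedSequenceSettings:
    """Hypothetical settings container; this is not a NeMo class."""

    packed_sequence_size: int  # values <= 0 mean packing is disabled
    micro_batch_size: int
    tokenizer_name: str


def validate_packed_settings(cfg: PackedSequenceSettings) -> None:
    """Fail loudly on an unsupported combination instead of silently
    rewriting the user's config."""
    if cfg.packed_sequence_size > 0 and cfg.micro_batch_size > 1:
        # Illustrative message only; the real suggestion text may differ.
        raise ValueError(
            f"micro_batch_size={cfg.micro_batch_size} is not supported with "
            "packed sequences. Consider setting micro_batch_size=1 and "
            "adjusting packed_sequence_size instead."
        )


def prepared_dataset_dir(root: str, cfg: PackedSequenceSettings) -> Path:
    """Key the prepared-data folder by tokenizer name so that data packed
    with one tokenizer is never reused with another."""
    # Assumed layout: <root>/packed/<tokenizer>/<pack size>
    safe_name = cfg.tokenizer_name.replace("/", "--")
    return Path(root) / "packed" / safe_name / str(cfg.packed_sequence_size)
```

Raising an error rather than adjusting the value keeps the effective configuration identical to what the user wrote, which is the design choice the commit message describes.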
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the items above, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information