Draft+ for SDXL [draft] #222

Merged
merged 37 commits into from
Jul 12, 2024

Conversation

rohitrango
Contributor

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

Collaborator

@terrykong terrykong left a comment

Thanks @rohitrango for the contribution!

I've left some comments. Let me know if you have any questions.

@rohitrango rohitrango requested a review from terrykong July 8, 2024 18:54
@terrykong
Collaborator

terrykong commented Jul 8, 2024

Requires NVIDIA/NeMo#9543

Edit: Also requires NVIDIA/NeMo#9654 so that the feature is available in the RC branch.

Rohit Jena added 14 commits July 8, 2024 16:01
Signed-off-by: Rohit Jena <[email protected]>
Rohit Jena and others added 23 commits July 8, 2024 16:05
Signed-off-by: Rohit Jena <[email protected]>
Collaborator

@terrykong terrykong left a comment

Thanks @rohitrango for adding this new feature!

@terrykong terrykong merged commit 2a454f9 into NVIDIA:main Jul 12, 2024
3 checks passed
abukharin3 pushed a commit to abukharin3/NeMo-Aligner that referenced this pull request Sep 25, 2024
* leftover commit

Signed-off-by: Rohit Jena <[email protected]>

* leftover commit

Signed-off-by: Rohit Jena <[email protected]>

* commits

Signed-off-by: Rohit Jena <[email protected]>

* update gitignore

Signed-off-by: Rohit Jena <[email protected]>

* init model for SDXL

Signed-off-by: Rohit Jena <[email protected]>

* correct path in eos script

Signed-off-by: Rohit Jena <[email protected]>

* modified generate and log_generate scripts to use diffusion engine sampling instead

Signed-off-by: Rohit Jena <[email protected]>

* fixed most runtime bugs -- check for logical bugs

Signed-off-by: Rohit Jena <[email protected]>

* examining mapping between hf and nemo

Signed-off-by: Rohit Jena <[email protected]>

* writing converter script for unet

Signed-off-by: Rohit Jena <[email protected]>

* tmp commit (moving to eos)

Signed-off-by: Rohit Jena <[email protected]>

* more changes to draftp xl

Signed-off-by: Rohit Jena <[email protected]>

* changed batch scripts

Signed-off-by: Rohit Jena <[email protected]>

* changed launch scripts

Signed-off-by: Rohit Jena <[email protected]>

* check adapter control

Signed-off-by: Rohit Jena <[email protected]>

* clean up lora hotswap debugging

Signed-off-by: Rohit Jena <[email protected]>

* adding fsdp to draftp training

Signed-off-by: Rohit Jena <[email protected]>

* FSDP now works for SDXL?!

Signed-off-by: Rohit Jena <[email protected]>

* added custom rule to enable sharding of decoder

Signed-off-by: Rohit Jena <[email protected]>

* also shard the clip embeddings

Signed-off-by: Rohit Jena <[email protected]>

* multinode script created + testing 2048 config

Signed-off-by: Rohit Jena <[email protected]>

* add activation checkpointing

Signed-off-by: Rohit Jena <[email protected]>

* added activation checkpointing

Signed-off-by: Rohit Jena <[email protected]>

* added SFT and PEFT support with Draft+

Signed-off-by: Rohit Jena <[email protected]>

* corrected init denoise bug

Signed-off-by: Rohit Jena <[email protected]>

* added multinode setup script - test it

Signed-off-by: Rohit Jena <[email protected]>

* enable sharding for sdlora

Signed-off-by: Rohit Jena <[email protected]>

* added multinode script for OCI

Signed-off-by: Rohit Jena <[email protected]>

* some more refactoring

Signed-off-by: Rohit Jena <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Rohit Jena <[email protected]>

* address Terry's comments

Signed-off-by: Rohit Jena <[email protected]>

* remove + 0 to clone

Signed-off-by: Rohit Jena <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Rohit Jena <[email protected]>

* add version guard

Signed-off-by: Rohit Jena <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Rohit Jena <[email protected]>

* added todo to merge these APIs later

Signed-off-by: Rohit Jena <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Rohit Jena <[email protected]>

---------

Signed-off-by: Rohit Jena <[email protected]>
Co-authored-by: Rohit Jena <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexander Bukharin <[email protected]>
abukharin3 pushed a commit to abukharin3/NeMo-Aligner that referenced this pull request Nov 7, 2024
3 participants