
FSDP oneshot #1939

Merged
merged 84 commits into main from sgpt_fsdp on Jan 11, 2024
Conversation

@Satrat commented Jan 4, 2024

This PR updates the one-shot modifiers SparseGPT, Wanda, SmoothQuant, and Quantization to be compatible with FSDP. This enables us to run alternating one-shot/finetuning flows with FSDP.

**NOTE:** #1912 should be merged first; it covers the initial alternating flow implementation.

Summary of Changes

  • Removed any references to specific devices from the one-shot modifiers; device is now handled by SparseCausalLM. The default was originally "auto", which splits the model across multiple GPUs (this isn't FSDP related; we can split the model even outside of FSDP). However, "auto" turns out not to be compatible with quantization :( so the default stays "cuda:0"; "auto" can still be passed through the CLI for a non-quantized oneshot
  • Refactored the SparseGPT class to be a module wrapper, so that we can update weights using module.apply as required for FSDP compatibility (a sketch of this pattern follows this list)
  • Refactored Wanda in the same way, and cleaned up the code shared between SparseGPT and Wanda (@rahul-tuli, I'd like your input specifically on this)
  • Bug fixes related to quantizing FSDP models
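
For context, here is a minimal sketch of the module-wrapper pattern described above. The class and helper names (SparseGptWrapper, apply_compression, finalize) are hypothetical stand-ins, not this PR's actual API; the point is that module.apply visits every submodule, which lets weight updates work the same way whether or not the model is wrapped in FSDP:

    # Hypothetical sketch of the module-wrapper pattern; not sparseml's actual code.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    class SparseGptWrapper(torch.nn.Module):
        """Wraps a layer so its pruned weight can be written back via module.apply."""

        def __init__(self, layer: torch.nn.Module):
            super().__init__()
            self.layer = layer
            self.compressed_weight = None  # filled in by the one-shot algorithm

        def update_compressed_weight(self):
            if self.compressed_weight is not None:
                self.layer.weight.data.copy_(self.compressed_weight)

    def apply_compression(module: torch.nn.Module):
        # module.apply() calls this on every submodule, so wrapped layers
        # update themselves regardless of how the model is sharded
        if isinstance(module, SparseGptWrapper):
            module.update_compressed_weight()

    def finalize(model: torch.nn.Module):
        if isinstance(model, FSDP):
            # gather the full (unsharded) parameters before writing weights back;
            # writeback is enabled by default, so the updates persist to the shards
            with FSDP.summon_full_params(model):
                model.apply(apply_compression)
        else:
            model.apply(apply_compression)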

@Satrat requested review from bfineran and rahul-tuli January 9, 2024 19:14
@bfineran previously approved these changes Jan 9, 2024
@Satrat (Author) commented Jan 9, 2024

> Remove any references of specific devices from the one-shot modifiers, device is now handled by SparseCausalLM, and defaults to "auto" for splitting the model across multiple GPUs (this isn't FSDP related, we can split the model even outside of FSDP)

> It doesn't seem like device defaults to "auto", if that is an intended change. The current obcq.py arg:
>
>     parser.add_argument("--device", type=str, default="cuda:0")

See the updated PR comment :( device_map="auto" doesn't seem to be compatible with quantization, so I'm leaving it out of the default. It can still be specified on the CLI for a non-quantized one-shot.
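
A hedged sketch of how the flag could interact with model loading (the model name is a placeholder and the loading call is illustrative, not the exact obcq.py code):

    # Illustrative only: how --device could feed Hugging Face's device_map.
    import argparse

    from transformers import AutoModelForCausalLM

    parser = argparse.ArgumentParser()
    # default stays "cuda:0" because device_map="auto" breaks quantization
    parser.add_argument("--device", type=str, default="cuda:0")
    args = parser.parse_args()

    # device_map="auto" splits the model across all visible GPUs (requires
    # accelerate); "cuda:0" keeps everything on a single device
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-1.3b",  # placeholder model name
        device_map=args.device,
    )

With this shape, passing --device auto on the CLI enables the multi-GPU split for a non-quantized one-shot, while the default keeps quantized runs on a single device.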

@Satrat requested review from bfineran and mgoin January 9, 2024 22:56
@bfineran previously approved these changes Jan 10, 2024
@bfineran merged commit 5007b8c into main Jan 11, 2024
13 checks passed
@bfineran deleted the sgpt_fsdp branch January 11, 2024 16:37