
Add snip_momentum structured pruning example with 80% sparsity ratio #348

Merged · 1 commit · May 8, 2023

Conversation

ftian1 (Contributor) commented Apr 19, 2023

This PR demonstrates the functionality of the snip_momentum structured pruning algorithm implemented here.

Users can reproduce the results below by running `source ./bash_script/pruning_sparse_snip_momentum.sh` with the PR mentioned above.

| pattern | sparsity ratio | pruning method | epochs | acc / mm-acc |
|---------|----------------|----------------|--------|--------------|
| 1x1 | 80% | DeepSpeed L1 | 2 | 0.8113 / 0.822 |
| 1x1 | 80% | snip_momentum | 2 | 0.8176 / 0.822 |
| 4x1 | 80% | snip_momentum | 10 | 0.8248 / 0.8305 |
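For context, enabling snip_momentum in a DeepSpeed compression config takes roughly the following shape. This is a minimal sketch: the key names follow DeepSpeed's sparse-pruning config section, but the values below are illustrative placeholders, not the exact settings behind the table above.

```python
# Illustrative snip_momentum sparse-pruning section of a DeepSpeed config.
# The values here are placeholders, NOT the settings used for the results
# in the table above.
ds_config = {
    "compression_training": {
        "sparse_pruning": {
            "shared_parameters": {
                "enabled": True,
                "method": "snip_momentum",
                "block_pattern": "4x1",     # structured pruning pattern (e.g. 1x1 or 4x1)
                "dense_ratio": 0.2,         # keep 20% of weights -> 80% sparsity
                "schedule_offset": 30,      # step at which pruning starts
                "schedule_offset_end": 90,  # step at which target sparsity is reached
            },
            "different_groups": {},
        }
    }
}
```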

xiaoxiawu-microsoft (Contributor) commented

@ftian1, thanks for your patience. I have tested the method and it looks good. I will make some minor changes to the checkpoint saving.

yaozhewei added a commit that referenced this pull request May 9, 2023
* add mask for generation, otherwise the generation is broken (#468)

* Update model_utils.py (#471)

* Support huggyllama/llama-7b with DSPipeline (#484)

During token encoding for the huggyllama/llama-7b model, self.tokenizer.batch_encode_plus returns a token_type_ids kwarg that isn't recognized in the model's generate function.

This is due to AutoTokenizer returning LlamaTokenizerFast instead of LlamaTokenizer as seen in the model tokenizer config:
https://huggingface.co/huggyllama/llama-7b/blob/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/tokenizer_config.json#L24

As a workaround, the DSPipeline checks whether the tokenizer is LlamaTokenizerFast and, if so, passes only input_tokens.input_ids to the self.model.generate(...) call (see the sketch after this commit message).

* Add snip_momentum structured pruning example with 80% sparsity ratio (#348)

---------

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Tian, Feng <[email protected]>
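The tokenizer workaround above, together with passing an explicit attention mask so generation is not broken (#468), can be sketched as follows. The function and variable names here are assumptions for illustration; the actual DSPipeline code is structured differently.

```python
# Hedged sketch of the LlamaTokenizerFast workaround described above.
# Names (generate_text, gen_kwargs) are assumptions, not DSPipeline's code.
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizerFast


def generate_text(model, tokenizer, prompt: str, **gen_kwargs) -> str:
    tokens = tokenizer.batch_encode_plus([prompt], return_tensors="pt")
    if isinstance(tokenizer, LlamaTokenizerFast):
        # LlamaTokenizerFast emits token_type_ids, which this model's
        # generate() does not accept, so pass input_ids (plus the attention
        # mask) explicitly instead of unpacking the whole encoding.
        output = model.generate(
            tokens.input_ids,
            attention_mask=tokens.attention_mask,
            **gen_kwargs,
        )
    else:
        output = model.generate(**tokens, **gen_kwargs)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Example usage (model name is just an example):
# model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
# tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
# print(generate_text(model, tokenizer, "Hello", max_new_tokens=20))
```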
ftian1 (Contributor, Author) commented May 11, 2023

@xiaoxiawu-microsoft, thanks for your effort.

yaozhewei added a commit that referenced this pull request May 12, 2023
* add mask for generation, otherwise the generation is broken

* merge master (#504)


* change eos

* resolve tokenizer issue

* add script

---------

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Tian, Feng <[email protected]>
Syulin7 pushed a commit to Syulin7/DeepSpeedExamples that referenced this pull request May 15, 2023
leocnj pushed a commit to leocnj/DeepSpeedExamples that referenced this pull request May 27, 2023