
Optimize the tokenization #143

Open
HillZhang1999 opened this issue Dec 9, 2021 · 9 comments

@HillZhang1999

First, thanks for your excellent work. Here is my question:

  • I used your code to reproduce the results in your paper, but found that CPU utilization was very high during training, especially in stage 1. Meanwhile, GPU utilization was not always 100%; it sometimes dropped to 50~60% and fluctuated.
  • After debugging, I believe the cause is the on-the-fly word-piece tokenization performed in the indexer.
  • I also made minor changes to adapt your code for Chinese GEC, e.g., 1) upgrading AllenNLP to the latest version, and 2) dropping the word-piece step, since for Chinese we use characters directly as input units. In the Chinese experiments I found that, after these adaptations, CPU usage dropped sharply and training was greatly accelerated (e.g., 900M sentences take ~5 days for English, but only 1 day for Chinese).
  • So I wonder whether your implementation could be accelerated by upgrading AllenNLP to the latest version, or by preprocessing the data (i.e., doing word-piece or BPE segmentation) statically before training; a rough sketch of the latter is shown below.
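For illustration, here is a rough sketch of what static pre-tokenization could look like (this is not part of the gector code base; the file names and the tokenizer checkpoint are placeholders):

```python
# Pre-compute the word-piece segmentation once and cache it on disk, so the
# training-time indexer only has to look the pieces up instead of calling the
# tokenizer again on every epoch.
import pickle
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # placeholder checkpoint

cache = {}
with open("train.src", encoding="utf-8") as f:  # one tokenized sentence per line
    for line in f:
        tokens = line.strip().split()
        # keep one list of word pieces per source token, so the edit labels
        # still align with the original tokens
        cache[line.strip()] = [tokenizer.tokenize(tok) for tok in tokens]

with open("train.wordpieces.pkl", "wb") as f:
    pickle.dump(cache, f)
```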
@skurzhanskyi
Collaborator

skurzhanskyi commented Dec 9, 2021

That's a good suggestion. Indeed, tokenization may require heavy CPU usage.
I don't see how updating AllenNLP would help, though. Do you have a suggestion for how we could optimize the code?

@HillZhang1999
Author

Hi! Regarding upgrading AllenNLP: training can be accelerated because we can simply set the use_amp parameter to true in the GradientDescentTrainer and thus train GECToR with Automatic Mixed Precision (of course, some extra adaptations are needed to support GECToR-specific features such as cold_steps).
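Something along these lines (a rough sketch assuming AllenNLP >= 2.x; `model`, `optimizer`, and the data loaders stand for whatever the training script already builds):

```python
import torch
from allennlp.training import GradientDescentTrainer

trainer = GradientDescentTrainer(
    model=model,                          # the seq2edit GECToR model
    optimizer=optimizer,
    data_loader=train_loader,
    validation_data_loader=dev_loader,
    num_epochs=num_epochs,
    serialization_dir="serialization_dir",
    cuda_device=0 if torch.cuda.is_available() else -1,
    use_amp=True,                         # run forward/backward under torch.cuda.amp
)
trainer.train()
```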
As for the tokenization problem, I guess we could tokenize the training data before training starts and just load the results during training, to avoid the heavy CPU usage and redundant tokenization?
Thank you for your kind reply!

@Jason3900

Hey! I'm wondering how to modify the code to support cold_steps with AMP. I tried it, but if I freeze the encoder for the first few epochs, I can't unfreeze it later with `params.requires_grad = True`: the loss does not decrease and the accuracy does not improve. Have you figured out a possible solution? I need some help.

@HillZhang1999
Author

One simple solution is to save the model parameters to disk after finishing the cold steps. Then you can start a new training run, reload the saved parameters, and unfreeze the BERT encoder.
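Roughly like this (the checkpoint file name and the encoder attribute name below are placeholders, not the exact gector ones):

```python
import torch

# --- end of stage 1: cold steps finished, encoder still frozen ---
torch.save(model.state_dict(), "after_cold_steps.th")

# --- stage 2: a fresh training run ---
model.load_state_dict(torch.load("after_cold_steps.th"))
for name, param in model.named_parameters():
    if "text_field_embedder" in name:   # the BERT encoder parameters
        param.requires_grad = True      # unfreeze for the rest of training
```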

@Jason3900

Jason3900 commented Feb 13, 2022

Okay, I found that it works if the requires_grad flag is set inside the forward method. Thank you, by the way~
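For anyone else hitting this, a toy sketch of the pattern (the class and attribute names are illustrative, not the actual gector ones):

```python
from torch import nn

class ToyTagger(nn.Module):
    """Toy stand-in for the seq2edit model, just to show the pattern."""

    def __init__(self, encoder: nn.Module, hidden_size: int,
                 num_labels: int, cold_epochs: int):
        super().__init__()
        self.encoder = encoder            # stand-in for the BERT encoder
        self.head = nn.Linear(hidden_size, num_labels)
        self.cold_epochs = cold_epochs
        self.current_epoch = 0            # updated by the training loop

    def forward(self, x):
        # Re-apply the freeze flag on every call, so it still takes effect
        # after the trainer / AMP machinery has wrapped the model.
        freeze = self.current_epoch < self.cold_epochs
        for param in self.encoder.parameters():
            param.requires_grad = not freeze
        return self.head(self.encoder(x))
```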

@damien2012eng

Hi @HillZhang1999, could you please describe what changes you made to use the latest AllenNLP? Thanks!

@HillZhang1999
Author

Maybe you can refer to this repo: https://github.com/HillZhang1999/MuCGEC/tree/main/models/seq2edit-based-CGEC

@damien2012eng

Thanks for replying so quickly!
It looks like you did not use the tokenization file from this code base? I tried to replace the existing ones with pretrainedIndexer and pretrainedEmbedder directly, but the predicted results are different.

@Jason3900

Jason3900 commented Aug 31, 2022

Also, if you would like to train seq2edit GEC without the AllenNLP bundle but with faster training, I made a DeepSpeed + PyTorch + Transformers implementation; you can refer to this repo:
https://github.com/blcuicall/CCL2022-CLTC/tree/main/baselines/track3/seq2edit
