How to generate text with the Megatron-LM model trained with DeepSpeed #507
I was able to modify the text-generation utility generate_samples.py to wrap the Megatron-LM model in a DeepSpeed engine and use it to generate text from a checkpoint created by the DeepSpeed Megatron-LM wrapper. I can create a pull request for the change, but the Megatron-LM code in this repository is over a year old, and I don't think fixing it would serve much purpose. The preferred solution is for the Microsoft team to complete their effort to incorporate the more recent version of Megatron-LM into the DeepSpeed Examples repo. I left a note apprising the developers working on that branch of this bug.

Please note that another bug in DeepSpeed makes it hard to wrap a Megatron-LM model in a DeepSpeed engine for text generation. Text generation does not require an optimizer, and deepspeed.initialize() states that the optimizer is optional; in practice, passing None for the optimizer raises an exception. Creating a "fake" optimizer just to make the DeepSpeed code work complicates the code and makes it hard to test on a machine with a small amount of GPU memory. My team fixed the optimizer issue and sent a pull request to the Microsoft team. Once it is merged, it will be much easier to use text generation with the DeepSpeed code.
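For illustration, here is a minimal sketch of what inference-only initialization could look like once `optimizer=None` is accepted. The model is a stand-in, the config values are placeholders, and the `config=` keyword assumes a reasonably recent DeepSpeed; this is not the code from the pull request.

```python
import torch
import deepspeed

# Stand-in module; in generate_samples.py this would be the Megatron-LM
# GPT-2 model built by the existing model-construction path.
model = torch.nn.Linear(16, 16)

# Placeholder config: DeepSpeed requires a batch size even for inference.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": False},
}

# With the optimizer fix described above, no optimizer or model_parameters
# are needed for generation.
model_engine, _, _, _ = deepspeed.initialize(
    model=model,
    optimizer=None,
    model_parameters=None,
    config=ds_config,
)
model_engine.eval()  # generation only; backward()/step() are never called
```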
Hi @msmolyak, many, many thanks to you and your team for the multiple contributions along this line. I am back after some sick leave and am catching up. Let me work on your submitted bugfix and ramp back up on the DSE side of things. We'd love to get the text-generation PR from you once we're set up to properly track Megatron-LM's upstream branch.
Hi @msmolyak, I'd like to ask you a question. I modified generate_samples.py to use the setup_model_and_optimizer() function from pretrain_gpt2.py. It can load the model-parallel checkpoint successfully, but it can't generate any text; it gets stuck in …
Hi @hujian233, here are the notes I took when trying to make text generation work in the current version of DeepSpeedExamples (I did not commit the code anywhere; it was a proof-of-concept effort): https://docs.google.com/document/d/1My6UA-2n_MHMZO8w-xwnXoKBU1KWEr4B4_iYveFoXPs/edit?usp=sharing

Step 6, which deals with generate_samples, contains three changes; the document includes the diff. The second change was a hack to create an optimizer, which is totally superfluous. The preferred approach would be to modify the DeepSpeed code to allow initializing a model without an optimizer (my colleague submitted a pull request to that effect). This code was able to generate text from the DeepSpeed-trained model. If the document does not offer any clues, let me know and I will try to run text generation with your code in my test environment.
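As a rough illustration of the hack in question (not the actual diff from the linked document), the workaround amounts to handing deepspeed.initialize() a throwaway optimizer that generation never steps:

```python
import torch
import deepspeed

# Same stand-ins as the sketch above: `model` is whatever the generation
# script builds, and the config values are placeholders.
model = torch.nn.Linear(16, 16)
ds_config = {"train_micro_batch_size_per_gpu": 1}

# Throwaway optimizer created only to satisfy initialize(); it is never
# stepped, but it still allocates optimizer state, which is what made
# testing on a small-memory GPU painful.
fake_optimizer = torch.optim.Adam(model.parameters(), lr=0.0)

model_engine, _, _, _ = deepspeed.initialize(
    model=model,
    optimizer=fake_optimizer,
    config=ds_config,
)
model_engine.eval()
```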
Hi @msmolyak, thank you. I can now generate text normally.
Hi @msmolyak, have you solved the problem you raised? Can deepspeed.initialize() now be used without an optimizer?

It doesn't work well with the latest DeepSpeed version; do you know what I'm missing? How should the DeepSpeed config file be set up?
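The thread never answers the config question. As a hedged guess, a generation-only config would omit the optimizer, scheduler, and ZeRO-partitioning sections entirely; the keys below are standard DeepSpeed config keys, but the values are illustrative placeholders:

```python
import json

# Illustrative generation-only config: no "optimizer", "scheduler", or
# "zero_optimization" sections, since nothing is trained.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # still required by DeepSpeed
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
}

# Written to disk so it can be passed as --deepspeed_config ds_config.json.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```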
Closing this issue as it appears to be stale. If you are hitting new issues or have more questions, please open a new issue with the latest DeepSpeed/Megatron-DeepSpeed repo and we would be happy to take a look. Thanks!
The DeepSpeed tutorial https://www.deepspeed.ai/tutorials/megatron/ does not offer any guidance on text generation after training the model. All the information in the tutorial deals with training, which implies that the logic for generation does not change.
I tried running the Megatron-LM script for text generation (which works fine with a model trained using the Megatron-LM scripts), but it was unable to load the checkpoint generated by the DeepSpeed wrapper.
Are checkpoints generated by Megatron-LM and by the DeepSpeed wrapper of Megatron-LM binary-compatible? Do I need to wrap the text-generation code with the deepspeed module the way it was done with the pre-training logic? Are there examples of text generation based on models trained with DeepSpeed?
Thank you,
Michael
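One plausible answer to the wrapping question, sketched under the assumption that the generation script wraps the model in a DeepSpeed engine as in the sketches above: a checkpoint written by model_engine.save_checkpoint() is read back through the engine rather than through Megatron-LM's own loader. The path below is a placeholder.

```python
# Assuming `model_engine` was created by deepspeed.initialize() as above,
# read back a checkpoint written by model_engine.save_checkpoint().
load_path, client_state = model_engine.load_checkpoint(
    "/path/to/checkpoints",        # placeholder: the save_checkpoint() directory
    tag=None,                      # None -> use the tag recorded in the 'latest' file
    load_optimizer_states=False,   # generation restores no optimizer state
    load_lr_scheduler_states=False,
)
if load_path is None:
    raise RuntimeError("No DeepSpeed checkpoint found to load")
model_engine.eval()
```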