
Actor loss nan and Resizing model embedding #922

Open
ouyanmei opened this issue Aug 29, 2024 · 1 comment

Comments

@ouyanmei

The model I use is GPT-2 124M. When I resize the model embeddings during SFT and reward model (RW) training, I often find that the generated answers consist entirely of zeros, which makes both the log probabilities and the actor loss NaN (Not a Number). I have noticed that resizing the embeddings can lead to generated token IDs that exceed the vocabulary size, and I suspect this contributes to the problem. When I train SFT and RW without resizing the model's embeddings, the issue does not occur during RLHF training, and I don't know why.
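A minimal plain-Python sketch of the out-of-range problem described above. It assumes, hypothetically, that the embedding is padded up to a multiple of 8 (a common resizing choice; the issue does not say which size was used). The helper names `padded_vocab_size`, `out_of_range_ids` and `mask_padded_logits` are illustrative and not from any library. Masking the logits of the padded rows before sampling is one possible workaround, not something confirmed in this thread.

```python
import math

GPT2_VOCAB_SIZE = 50257  # GPT-2 tokenizer size; valid token IDs are 0..50256


def padded_vocab_size(vocab_size: int, multiple: int = 8) -> int:
    """Round the embedding row count up to a multiple (hypothetical resize policy)."""
    return multiple * math.ceil(vocab_size / multiple)


def out_of_range_ids(token_ids, vocab_size=GPT2_VOCAB_SIZE):
    """Return generated IDs that have no corresponding token in the tokenizer."""
    return [t for t in token_ids if not 0 <= t < vocab_size]


def mask_padded_logits(logits, vocab_size=GPT2_VOCAB_SIZE):
    """Set logits of padded embedding rows to -inf so sampling never picks them."""
    return [x if i < vocab_size else -math.inf for i, x in enumerate(logits)]


size = padded_vocab_size(GPT2_VOCAB_SIZE)
print(size, size - GPT2_VOCAB_SIZE)              # 50264 7
print(out_of_range_ids([464, 50256, 50260]))     # [50260]
print(mask_padded_logits([1.0, 2.0, 3.0, 4.0], vocab_size=3))  # [1.0, 2.0, 3.0, -inf]
```

Under this assumption the resized embedding has 7 extra rows (IDs 50257 to 50263) that the tokenizer cannot decode. With PyTorch tensors, the same mask is usually written as a slice assignment such as `logits[..., vocab_size:] = float("-inf")`.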

@ouyanmei
Author

The GPT-2 124M vocabulary has 50257 tokens (IDs 0 to 50256). Resizing the embedding layer beyond that size adds rows that have no corresponding token in the tokenizer, so the model can generate token IDs above 50256. In Reinforcement Learning from Human Feedback (RLHF), the logprobs of these out-of-range tokens may become extremely small and act as outliers, which can make training numerically unstable and potentially produce NaN values. Clipping the logprobs may help mitigate this issue (some experiments have been conducted, but the approach has not yet been widely validated).
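The sketch below shows one way a near-zero-probability token can turn a PPO-style actor loss into NaN, and how a floor on the logprobs avoids it. It is plain Python. `min_logprob=-20.0` is an arbitrary, hypothetical threshold, `ppo_ratio` stands in for the exp(new - old) importance ratio of a PPO-style loss, and "clipping" is read here as clamping from below, which may differ from what was tried in the experiments mentioned in the comment.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def clipped_logprob(logits, token_id, min_logprob=-20.0):
    """Log-probability of token_id, floored at min_logprob (hypothetical threshold)."""
    return max(log_softmax(logits)[token_id], min_logprob)


def ppo_ratio(new_logprob, old_logprob):
    """Importance ratio exp(new - old) as used in a PPO-style actor loss."""
    return math.exp(new_logprob - old_logprob)


logits = [0.0, 0.0, -math.inf]  # token 2 has effectively zero probability
lp = log_softmax(logits)[2]
print(lp, ppo_ratio(lp, lp))      # -inf nan  (because -inf - (-inf) is nan)
lp_c = clipped_logprob(logits, 2)
print(lp_c, ppo_ratio(lp_c, lp_c))  # -20.0 1.0
```

The floor keeps every logprob finite, so differences between the new and old policy logprobs can no longer be `-inf - (-inf)`. In PyTorch the equivalent step would typically be `torch.clamp(logprobs, min=min_logprob)`.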
