The training and validation loss become nan #166
Comments
What are your gradient values for the few steps before the loss becomes NaN?
Can you tell me how to output the gradient values when using OpenSTL? I don't know how to do that. Thanks!
How are you writing the above output? Is that the default? (Apologies, I do not have the code open in front of me.) I would search for where these print statements happen in the code and amend them to also print the norm of the gradient. My conjecture, given that your loss isn't really decreasing anymore, is that you are in a region of flat geometry and the gradient is running into trouble.
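If it helps, here is a minimal sketch of how one might log the global gradient norm right after the backward pass in a PyTorch training loop. The variable names (`model`, `loss`, `step`, `optimizer`) are placeholders, not OpenSTL's actual identifiers:

```python
import torch

def grad_norm(model: torch.nn.Module) -> float:
    """Return the global L2 norm of all parameter gradients."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5

# Inside the training loop, after loss.backward() and before optimizer.step():
# loss.backward()
# print(f"step {step}: loss={loss.item():.6f} grad_norm={grad_norm(model):.6f}")
# optimizer.step()
```

Watching this value in the steps leading up to the NaN should show whether the gradient is exploding or collapsing to zero.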
May I ask what should be done to solve this problem in this case? For example, tweaking the training data, removing some training samples, or something like that?
You can change your learning rate schedule to decrease more rapidly. Your learning rate of 5e-4 might be too big to get better estimates, so I would definitely use a more aggressive scheduler.
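As one illustration of a more aggressive setup, here is a sketch with a smaller base learning rate and a cosine-annealing scheduler in PyTorch. The toy model and the concrete values are placeholders for illustration, not OpenSTL's config or a recommendation from this thread:

```python
import torch
import torch.nn as nn

# Toy stand-in model; in practice this would be the PredRNN++ model built by OpenSTL.
model = nn.Linear(10, 10)

max_epochs = 50
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # smaller than the original 5e-4
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)

for epoch in range(max_epochs):
    # ... run the training steps for this epoch ...
    scheduler.step()
    print(f"epoch {epoch}: lr={scheduler.get_last_lr()[0]:.2e}")
```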
Thanks! I'll set a smaller learning rate. Although the loss converged at around 0.022, the results were not very good.
"The results were not very good." what does this mean? Is there something that you are doing to discern this that isn't being represented in your model? Like I asked above: what makes you think the loss will go less than |
Honestly, I have no strict basis for thinking that the loss should be lower than 0.022. My previous model, trained on a dataset with less data, was better at prediction than this current model, which 'converged' at epoch 25. I think a model with more training data should perform better, so I don't think the model trained now is optimal. But it is true that the optimal model will not necessarily have a loss lower than 0.022.
Hello, everyone! Recently, I used OpenSTL to train a PredRNN++ model, but after several epochs the training and validation loss became NaN. Before training the model, I normalized the training data to the range 0-1. Why does this problem occur, and how should it be solved? Thanks!!!
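Regarding the normalization mentioned above, here is a minimal sketch of 0-1 min-max scaling, assuming the data are NumPy arrays; the statistics are taken from the training split only so that the validation data is scaled consistently:

```python
import numpy as np

def minmax_normalize(train: np.ndarray, val: np.ndarray):
    """Scale both splits to [0, 1] using statistics from the training split only."""
    lo, hi = train.min(), train.max()
    scale = hi - lo if hi > lo else 1.0  # guard against a constant-valued array
    return (train - lo) / scale, (val - lo) / scale
```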
The following are the model parameters:
The following is the output when I train the model: