ADAMW not stable #1920
There's not enough here for an MWE (specifically the W part). In this case I would recommend providing the equivalent Python code as well, since you're doing a head-to-head comparison (a rough sketch of such an MWE is below). A couple of other things to note: …
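For reference, a minimal sketch of the kind of MWE that would help, assuming a recent Flux version; the model, data, and hyperparameters here are purely illustrative, not the reporter's actual setup:

```julia
using Flux

# Illustrative model and data, not the reporter's actual setup.
model = Chain(Dense(10, 32, relu), Dense(32, 2))
x = rand(Float32, 10, 16)
y = Flux.onehotbatch(rand(1:2, 16), 1:2)

# In Flux, ADAMW(η, β, decay) is Optimiser(ADAM(η, β), WeightDecay(decay)),
# so the decay value is the "W part" that a head-to-head comparison
# with PyTorch's AdamW needs to pin down.
opt = ADAMW(1f-3, (0.9, 0.999), 1f-4)

ps = Flux.params(model)
gs = gradient(() -> Flux.logitcrossentropy(model(x), y), ps)
Flux.update!(opt, ps, gs)
```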
Thanks for your reply!
Thanks again!
I'm sorry, there is something different between the model in PyTorch and the one in Flux: for the embedding in PyTorch I use an argument named …
Yes, it is the … What's remaining is:
1. …
2. …
Great that you managed to get things working and see an improvement. For 1. and 2., this has come up a few times in various help forums, but JuliaLang/julia#21672 (comment) is one of the more canonical answers. TL;DR: being able to preallocate is fast; splatting tons of elements is not.
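A minimal sketch of that answer in action (the sizes here are made up):

```julia
xs = [rand(Float32, 128) for _ in 1:10_000]

# Splatting: every element becomes a separate argument to a single call,
# which is slow to compile and dispatch for thousands of elements.
# X_slow = hcat(xs...)

# reduce(hcat, xs) hits a specialized Base method that preallocates the
# 128×10_000 output once and copies each vector into its column.
X = reduce(hcat, xs)
```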
If I'm reading this correctly, you are doing …
Yes, my data is generated on the fly because of its volume; random selection would make it difficult to determine when to terminate.
Thanks!
Closing, since "ADAMW not stable" seems not to be a real issue. You can file separate issues for the performance concerns if you want.
The loss becomes NaN while training on the WiLI-2018 dataset, at about epoch 40, when test-set accuracy is > 0.8.
I am sure it's a problem, because the same model trains fine in PyTorch.
By the way, `hcat(map(f, d)...)` is very slow! `Flux.batch(map(f, d))` is very fast, but causes an error:

Mutating arrays is not supported
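For what it's worth, a sketch of the trade-off being described, with `f` and `d` as placeholders for the actual transform and data, assuming Zygote as Flux's AD backend:

```julia
using Flux

d = [rand(Float32, 4) for _ in 1:1_000]   # placeholder data
f = identity                               # placeholder per-sample transform
W = rand(Float32, 2, 4)

# hcat(map(f, d)...) is slow because of the huge splat, and
# Flux.batch(map(f, d)) fills its output in place, which is what triggers
# Zygote's "Mutating arrays is not supported" inside gradient().
# reduce(hcat, ...) avoids both problems: it is fast and has a
# differentiation rule, so gradients flow through the batching step.
loss(W) = sum(W * reduce(hcat, map(f, d)))
g = gradient(loss, W)
```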