LSTM gradient calculation fails on GPU, works on CPU #1586
Cc @mzgubic, should we relax this bound somewhat? I can take a look in a bit too.
For reference: FluxML/Zygote.jl#924. In a nutshell, Zygote used to quietly drop certain derivatives when accumulating two NamedTuples; after that PR it throws an error instead. The underlying issue this exposes is that one of the custom adjoints involved is producing an incomplete gradient that Zygote previously dropped in silence.

That said, I don't know why it would work on CPU and not on GPU. My only guess is that CUDA has some compatibility requirements which change the Zygote version compared to the one running on CPU.
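As a hypothetical illustration of that accumulation behavior (the field names here are made up; `Zygote.accum` is Zygote's gradient-accumulation function):

```julia
using Zygote

# Two gradient NamedTuples for the same layer; the second is missing the
# `bias` field, as a buggy custom adjoint might produce.
g1 = (weight = [1.0, 2.0], bias = [0.5])
g2 = (weight = [3.0, 4.0],)

# Older Zygote silently dropped the `bias` derivative here; after
# FluxML/Zygote.jl#924, accumulating mismatched NamedTuples errors instead.
Zygote.accum(g1, g2)
```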
@mzgubic Good catch: differing Zygote versions look like they were the root issue. After updating Zygote and CUDA the error disappears. I'll go ahead and close the issue.
Flux depends on CUDA unconditionally, so the Zygote version should be the same regardless of whether GPU functionality is used. @mahowald, can you update to a newer version of Flux? From the stacktrace, it looks like this is a version that still relies on the (often buggy) CuDNN RNN path, which we've more or less removed in recent versions. The presence of a CuDNN-related frame in the stacktrace points to that path.

Edit: looks like Zygote and CUDA may have been held back for some reason; glad you resolved it. For posterity, do you mind listing which Flux, Zygote and CUDA versions you were and are on?
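As an aside, a quick way to check whether the resolver has held packages back is something along these lines (a sketch; the pinned version is illustrative):

```julia
using Pkg

Pkg.status()   # inspect which Flux, Zygote and CUDA versions are resolved
Pkg.update()   # let the resolver pull everything up to the latest compatible set

# If a package is still held back, requesting a specific version forces the
# resolver to surface the conflict (version number here is illustrative):
Pkg.add(name = "Flux", version = "0.12.1")
```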
@ToucheSir Current versions are:

- Flux - v0.12.1

If I remember correctly, the previous versions were:

- Flux - v0.11.6
I have encountered an error when attempting to train an LSTM model that only appears on GPU. Below is a minimal example to reproduce the error. The code works fine without the `|> gpu`, and it also works fine on GPU if the LSTM layer is replaced with `RNN(5, 64)`.
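A sketch of the kind of reproducer described, assuming a stock Flux setup; the exact model shape, data, and loss here are illustrative assumptions, not the original snippet:

```julia
using Flux

# LSTM wrapped in a Chain and moved to GPU; per the description above,
# replacing LSTM(5, 64) with RNN(5, 64) avoids the error.
model = Chain(LSTM(5, 64), Dense(64, 1)) |> gpu

x = rand(Float32, 5, 10) |> gpu   # 5 features, batch of 10
y = rand(Float32, 1, 10) |> gpu

loss(x, y) = Flux.Losses.mse(model(x), y)

Flux.reset!(model)
gs = gradient(() -> loss(x, y), Flux.params(model))  # fails on GPU, fine on CPU
```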
Here is the stacktrace for the error:
This occurs on Linux, CUDA v10.1, Julia v1.6. Any ideas about what's going wrong?