
Dropout2d and residual #42

Closed
AliVard opened this issue Jun 15, 2022 · 2 comments

AliVard commented Jun 15, 2022

Dear authors and contributors,

There is an observation that I would be happy to get your confirmation on :-)
Throughout the model hierarchy (SequenceModel, SequenceResidualBlock, and S4) you are using Dropout2d, which zeros along the batch dimension, i.e. drops entire samples. Without a residual link, with multiple layers, the probability that a sample survives every layer becomes negligible. Consequently, the model effectively never sees its inputs and will not train!
In SequenceResidualBlock, the dropout is applied only if a residual link is present, and that residual link also compensates for the dropout applied inside S4.
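
To make this concrete, here is a minimal sketch (the shapes and dropout rate are illustrative, not taken from the repo) of Dropout2d zeroing whole samples of a 3-D input; under PyTorch 1.11 a 3-D tensor is treated as an unbatched (C, H, W) input, so the mask is drawn along the first dimension:

```python
import torch
import torch.nn as nn

# Illustrative shapes only: (batch, length, channels)
B, L, D = 8, 16, 4
x = torch.ones(B, L, D)

drop = nn.Dropout2d(p=0.25)
drop.train()  # dropout is only active in training mode
y = drop(x)

# Count how many samples were zeroed out in their entirety.
n_dropped = int((y.abs().sum(dim=(1, 2)) == 0).sum())
print(f"{n_dropped}/{B} samples were dropped entirely")
```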
So my issue is two-fold:

  • When using dropout > 0, we should never set residual = None in the parameters of SequenceResidualBlock, right? Could a check be added to the initialization to catch this misconfiguration?
  • The dropinp argument of SequenceModel should not be used, since there is no residual link at that point. All of the configs I've seen set dropinp: 0.0, so why is it there at all?

Thanks and regards,

albertfgu (Contributor) commented

There is a bug in PyTorch 1.11 that causes the Dropout2d behavior you've observed: pytorch/pytorch#77081

We will add a warning and a fix for this.

dropinp is a hyperparameter that people sometimes use; we also used it in earlier experiments on WikiText-103.

albertfgu (Contributor) commented

The READMEs have been updated to mention this issue, and we have implemented a custom dropout function to avoid problems with the PyTorch implementation. Perhaps in the far future, when everyone is using torch 1.12 or later, we can switch back to the official functions.
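
For reference, a rough sketch of what such a version-independent replacement can look like (the class name, interface, and shape convention below are illustrative, not the exact code in this repository): one Bernoulli mask is drawn per (batch, channel) pair and broadcast over the remaining dimensions, so no whole sample is ever dropped.

```python
import torch
import torch.nn as nn

class TiedDropout(nn.Module):
    """Dropout with one mask per (batch, channel) entry, broadcast over the
    remaining (e.g. sequence) dimensions. Hypothetical sketch, not the repo code."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        if not 0.0 <= p < 1.0:
            raise ValueError(f"dropout probability must be in [0, 1), got {p}")
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, *dims), e.g. (B, D, L) for sequence models
        if not self.training or self.p == 0.0:
            return x
        mask_shape = x.shape[:2] + (1,) * (x.ndim - 2)
        mask = torch.rand(mask_shape, device=x.device) < 1.0 - self.p
        return x * mask / (1.0 - self.p)

# Usage: drops ~25% of the (batch, channel) slices, never an entire sample.
y = TiedDropout(p=0.25)(torch.randn(8, 4, 16))
```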
