Dear authors and contributors,

There is an observation that I would be happy to get your confirmation on :-)
Across the model hierarchy (`SequenceModel`, `SequenceResidualBlock`, and `S4`), you are using `Dropout2d`, which, on these 3D inputs, zeros along the batch dimension, i.e. drops entire samples. Without a residual link, with multiple stacked layers, the probability that a given sample survives every layer becomes negligible. Consequently, the model effectively never sees its inputs and will not train!
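To make the "negligible probability" claim concrete, here is a minimal sketch of the survival calculation (the values of `p` and `depth` below are hypothetical, just for illustration):

```python
# Probability that a given sample survives the whole stack when each
# layer independently drops the entire sample with probability p.
def survival_prob(p: float, depth: int) -> float:
    return (1.0 - p) ** depth

# With e.g. p = 0.25 and 6 stacked layers, (0.75)**6 is roughly 0.178,
# so a sample reaches the output less than 18% of the time; with deeper
# stacks or higher p, most of the batch carries no gradient signal at all.
```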
In `SequenceResidualBlock`, dropout is applied only when a residual link is present; the residual link of `SequenceResidualBlock` also compensates for the dropout inside `S4`.
So my issue is two-fold:
1. When using `dropout > 0`, we should never set `residual = None` in the parameters of `SequenceResidualBlock`, right? Would it be possible to add a check in the initialization to prevent this misconfiguration?
2. The `dropinp` argument of `SequenceModel` should not be used, as there is no residual link there. All of the configs I've seen set `dropinp: 0.0`, so why is it there at all?
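The check suggested in point 1 could look something like the sketch below. The function name and the shape of the `residual` argument are illustrative, not the repo's actual API; the point is only to fail fast at construction time:

```python
from typing import Optional

def check_block_config(dropout: float, residual: Optional[str]) -> None:
    """Hypothetical init-time guard for SequenceResidualBlock: reject a
    configuration where sample-wise dropout is active but there is no
    residual path through which dropped samples can still flow."""
    if dropout > 0.0 and residual is None:
        raise ValueError(
            "dropout > 0 requires a residual connection; otherwise "
            "Dropout2d can zero out entire samples with no skip path"
        )
```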
Thanks and regards,
The READMEs have been updated to mention this issue, and we have implemented a custom dropout function to avoid problems with the PyTorch implementation. Perhaps in the far future when everyone is using torch 1.12 or later we can switch back to using the official functions.
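For readers hitting the same problem, a version-independent replacement can be written in a few lines. This is only a sketch of the idea (tie one Bernoulli mask per sample-feature pair across the length dimension, instead of relying on `Dropout2d`'s version-dependent handling of 3D tensors); it is not the repo's exact implementation:

```python
import torch

def feature_dropout(x: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    """Dropout for (batch, length, d) inputs that never drops whole samples.

    Draws one mask entry per (sample, feature) pair and shares it across
    the length dimension, with the usual 1/(1-p) rescaling at train time.
    """
    if not training or p == 0.0:
        return x
    # mask shape (batch, 1, d): broadcast along the length dimension
    keep = torch.rand(x.shape[0], 1, x.shape[-1], device=x.device) >= p
    return x * keep.to(x.dtype) / (1.0 - p)
```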