Added AlphaDropout which is used in SNNs. #656
Conversation
Seems generally reasonable, but the implementation is currently dropping gradients. You'll need to avoid the in-place broadcast to get this working.

Thank you @MikeInnes. I have made the necessary changes so that gradients are not lost. I am no longer using `collect` and `param`.
src/layers/normalise.jl (outdated)

```julia
a.active || return x
α = -1.75813631
noise = randn(Float64, size(x.data))
x.data .= x.data .* (noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
```
Do not assign to `x.data`, otherwise you will drop the gradients. Just do

```julia
x = x .* (noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
```

or even better

```julia
d = rand(size(x)...) .> a.p
x = @. x * d + α * !d
```
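For illustration, here is a minimal sketch of what the non-mutating forward pass could look like, assuming an `AlphaDropout` layer with fields `p` and `active` as in this PR; the constant follows the diff above, and the `eltype` conversion anticipates a point raised later in the review.

```julia
# Sketch only: build a new array instead of writing into x.data,
# so the AD tracker sees every operation and gradients flow through.
function (a::AlphaDropout)(x)
  a.active || return x
  α = eltype(x)(-1.75813631)          # constant from the diff above
  noise = randn(eltype(x), size(x))   # same noise source as the PR
  keep = noise .> (1 - a.p)           # Boolean mask, never mutated in place
  return x .* keep .+ α .* .!keep     # out-of-place, gradient-friendly
end
```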
src/layers/normalise.jl (outdated)

```julia
function (a::AlphaDropout)(x)
  a.active || return x
  α = -1.75813631
  noise = randn(Float64, size(x.data))
```
`randn` gives normally distributed numbers; maybe you want `rand` here.
I think `randn` is more suitable here, as we are required to have a standard normally distributed output.
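As a side note (an illustration, not part of the PR), `rand` and `randn` draw from different distributions, which is what the two suggestions above hinge on:

```julia
using Statistics

u = rand(10^6)     # rand  -> Uniform(0, 1): values lie in [0, 1)
n = randn(10^6)    # randn -> Normal(0, 1): unbounded, mean ≈ 0, std ≈ 1

extrema(u)         # always within [0, 1)
mean(n), std(n)    # close to (0, 1) for this many samples
```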
src/layers/normalise.jl (outdated)

```julia
function (a::AlphaDropout)(x)
  a.active || return x
  α = -1.75813631
```
This is going to change the return value to Float64 irrespective of `x`; `α` should be initialized to the same element type as `x`.
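A small illustration of the type-stability point (throwaway values, not PR code): a bare Float64 literal promotes a Float32 array, while converting the constant with `eltype` keeps the output type.

```julia
x = rand(Float32, 3)

α64 = -1.75813631            # bare Float64 literal
eltype(x .+ α64)             # Float64: the literal promotes the whole result

α32 = eltype(x)(-1.75813631) # convert the constant to x's element type first
eltype(x .+ α32)             # Float32: element type preserved
```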
Thank you @CarloLucibello. I have made the changes you suggested. Just curious, what would have been the problem if we had assigned to `x.data`?

This looks generally reasonable to me. I'd like a quick look over from @staticfloat to make sure it makes sense.
Any time you work with `x.data` you are operating on the raw array underneath the AD tracker, so those operations are invisible to the gradient calculation; that's why assigning to it drops gradients.
Also, this needs some numerical tests, e.g. comparing against hand-coded output from another implementation, or similar.
Please also introduce some numerical tests, preferably calculated by hand or in PyTorch/TF so that we can make sure we never break the numerics here.
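As a sketch of what such a test could look like (assuming the layer ends up exported from Flux, is active by default, and that these tolerances are acceptable; none of this is agreed-upon test code), one option is a seeded statistical sanity check on the self-normalizing property:

```julia
using Flux, Statistics, Random, Test

# Alpha dropout on standard-normal input should roughly preserve
# zero mean and unit variance.
Random.seed!(0)
x = randn(Float32, 1000, 1000)
m = AlphaDropout(0.2)

y = m(x)
@test isapprox(mean(y), 0; atol = 0.1)
@test isapprox(std(y), 1;  atol = 0.1)
```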
src/layers/normalise.jl (outdated)

```julia
a.active || return x
α = eltype(x)(-1.75813631)
noise = randn(eltype(x), size(x))
x = @. x*(noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
```
I think if you use `@.` you don't need the `.>`, `.*` and `.<=` operators; either use `@.` to get them all, or manually write them all as dotted operators.
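To make the point concrete, the following two forms are equivalent (a throwaway example, not PR code):

```julia
x = rand(3); d = rand(3) .> 0.5; α = -1.0

a = @. x * d + α * !d      # @. dots every call and operator for you
b = x .* d .+ α .* .!d     # the same thing written out by hand
a == b                     # true
```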
src/layers/normalise.jl (outdated)

```julia
function (a::AlphaDropout)(x)
  a.active || return x
  α = eltype(x)(-1.75813631)
```
You should use the same `α` parameter as given by the `selu()` function in NNlib; see the definition here: https://github.com/FluxML/NNlib.jl/blob/d07ac0bfd3c71c3a29bc9c22becbba19227bbeb5/src/activation.jl#L100-L104
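For reference, the constants behind `selu` (also quoted later in this thread) could be pulled into the layer like this. This is only a sketch, since NNlib does not export them as named constants, and `selu_constants` is a hypothetical helper name:

```julia
# Constants from the SELU activation (Klambauer et al., 2017), matching
# the literals used in NNlib's selu definition linked above.
const λ_selu = 1.0507009873554804934193349852946
const α_selu = 1.6732632423543772848170429916717

# Hypothetical helper: convert both to the element type of the input.
selu_constants(x) = (eltype(x)(λ_selu), eltype(x)(α_selu))
```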
src/layers/normalise.jl (outdated)

```julia
function AlphaDropout(p)
  @assert 0 ≤ p ≤ 1
  AlphaDropout{typeof(p)}(p,true)
```
You shouldn't need `{typeof(p)}` here; it should be able to determine this from the `p` parameter:

```julia
julia> struct Foo{F}
         p::F
         active::Bool
       end

julia> Foo(1.0, true)
Foo{Float64}(1.0, true)
```
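Applied to this layer, the constructor could then simply be as follows (a sketch mirroring the diff; the struct definition and its mutability are assumptions, not shown in this PR excerpt):

```julia
mutable struct AlphaDropout{F}   # assuming `active` is toggled at test time
  p::F
  active::Bool
end

function AlphaDropout(p)
  @assert 0 ≤ p ≤ 1
  AlphaDropout(p, true)          # {typeof(p)} is inferred from p
end
```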
src/layers/normalise.jl (outdated)

```julia
x = @. x*(noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
A = (a.p + a.p * (1 - a.p) * α ^ 2)^0.5
B = -A * α * (1 - a.p)
x = @. A .* x .+ B
```
Same thing here; this `@.` is not doing anything, because you have already dotted your operators.
I realized `randn()`, which I am using to initialize the noise, is not giving a standard normal distribution, so the mean and variance of the output are slightly different from those of the input. This could be solved by using `Normal()` from the Distributions package, but Flux doesn't have a dependency on Distributions. What would be the best way to get a standard normal distribution? Shall I add Distributions to the Flux dependencies?

Thank you @staticfloat. I have used the values of lambda and alpha as defined in NNlib.jl, and I have removed the redundant dot operations. I will be adding test cases, but before that I wanted to solve the problem of `randn()` not giving a standard normal distribution. Can I use the Distributions package and `Normal()` so that I get a distribution with mean = 0 and variance = 1?

What do you mean by this? Why do you think that `randn()` is not giving a standard normal distribution?
```julia
julia> x = randn(2,2)

julia> using Statistics

julia> mean(x)

julia> std(x)
```

I was just testing `randn()` and it doesn't seem to be giving a standard normal distribution.

That's just sampling error; the sample mean only converges to zero in the limit of a large number of samples.
This is completely expected; the mean and standard deviation you compute on normally-distributed random numbers will not always be exactly zero and one. That's not the way randomness works. When you take only a handful of samples, the statistics you compute are merely estimates of the true values.

Let's imagine we are measuring the height of people in a city. Let's further assume that across the entire city, the heights form a normal distribution with a certain mean and standard deviation. If I just take 4 random people from the city, the mean of their heights will probably not be the "true" mean; it will just be an estimate of it. I may be unlucky and grab four people who are all on the very tall side of things, so the "error" of my mean estimate may be very high.

You can read up on calculating the error bounds of statistical estimates for more information, but to convince yourself, try making a matrix that is much larger than 2×2 and you will see the sample mean and standard deviation get much closer to 0 and 1.
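A quick way to see this in the REPL (illustration only, not part of the PR):

```julia
using Statistics

mean(randn(2, 2)), std(randn(2, 2))              # only 4 samples: quite noisy
mean(randn(1000, 1000)), std(randn(1000, 1000))  # 10^6 samples: very close to (0, 1)
```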
Oh! That was a nice example. Thank you for clearing my doubt.
src/layers/normalise.jl (outdated)

```julia
function (a::AlphaDropout)(x)
  a.active || return x
  λ = 1.0507009873554804934193349852946
  α = 1.6732632423543772848170429916717
```
I think it would be better to have `λ` and `α` be `oftype(eltype(x), ....)` so that we don't have to convert lower on down; that might help us generate code to run on platforms where, for instance, Float64 literals are not supported (that's not the case on CPUs, but for things like TPUs it's helpful to stay in the proper datatype as much as possible).
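Putting the type handling together, the forward pass might end up looking roughly like this. This is a sketch assembled from the snippets quoted in this thread, not the final merged code; it uses `eltype(x)(...)` to convert the constants (note that `oftype` expects a value, not a type, as its first argument).

```julia
# Sketch only: λ, α and the dropped-unit value kept in x's element type,
# following the constants and the affine correction from the diffs above.
function (a::AlphaDropout)(x)
  a.active || return x
  λ = eltype(x)(1.0507009873554804934193349852946)
  α = eltype(x)(1.6732632423543772848170429916717)
  α1 = -λ * α                              # value assigned to dropped units
  noise = randn(eltype(x), size(x))
  x = @. x * (noise > (1 - a.p)) + α1 * (noise <= (1 - a.p))
  A = (a.p + a.p * (1 - a.p) * α1^2)^0.5   # affine rescale, as in the diff above
  B = -A * α1 * (1 - a.p)
  return @. A * x + B
end
```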
Done.
The Travis CI build failed after I resolved the merge conflict that arose from the InstanceNorm changes.

Thanks @thebhatman, this looks pretty good!
Awesome, thanks a lot @thebhatman and @staticfloat for the review! @thebhatman can you also add an entry to NEWS.md about this? |