
Added AlphaDropout, which is used in SNNs. #656

Merged: 10 commits merged into FluxML:master on Mar 7, 2019

Conversation

@thebhatman (Contributor)

No description provided.

@MikeInnes (Member)

Seems generally reasonable, but the implementation is currently dropping gradients. You'll need to avoid in-place broadcast to get this working without param and collect.

@thebhatman (Contributor Author)

Thank you @MikeInnes. I have made the necessary changes so that gradients are not lost; I am no longer using collect and param.

In src/layers/normalise.jl:
a.active || return x
α = -1.75813631
noise = randn(Float64, size(x.data))
x.data .= x.data .* (noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
Member:

Do not assign to x.data, otherwise you will lose the gradients. Just do

 x = x .* (noise .> (1 - a.p)) .+ α .* (noise .<= (1 - a.p))

or even better

d = rand(size(x)...) .> a.p
x = @. x * d + α * !d

function (a::AlphaDropout)(x)
a.active || return x
α = -1.75813631
noise = randn(Float64, size(x.data))
Member:

randn gives normally distributed numbers, maybe you want rand here

Contributor Author:

I think randn is more suitable here, as we need the noise to follow a standard normal distribution.
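(For reference, the distinction under discussion; these two lines are illustrative, not part of the diff:)

rand(3)    # uniform samples in [0, 1)
randn(3)   # samples from the standard normal distribution, mean 0 and variance 1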


function (a::AlphaDropout)(x)
a.active || return x
α = -1.75813631
Member:

This is going to change the return value to Float64 irrespective of x; α should be initialized with the same element type as x.

@thebhatman (Contributor Author)

Thank you @CarloLucibello. I have made the changes you suggested. Just curious, I wanted to know what would have been the problem if we assigned to x.data.

@MikeInnes requested a review from @staticfloat on March 6, 2019, 16:41
@MikeInnes (Member)

This looks generally reasonable to me. I'd like a quick look over from @staticfloat to confirm it makes sense.

Just curious, I wanted to know what would have been the problem if we assigned to x.data.

Any time you work with .data you drop gradients; the reason we wrap the data in a TrackedArray is exactly so that we can record what happens to it and get gradients.
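A minimal sketch of what that means with the Tracker-based AD Flux used at the time (an illustration, not code from this PR; it assumes the param/back! API from Flux.Tracker):

using Flux.Tracker: param, back!

x = param([1.0, 2.0, 3.0])   # wraps the data in a TrackedArray so operations on it are recorded
y = sum(2 .* x)              # this broadcast is recorded on the tape
back!(y)                     # backpropagate through the recorded operations
x.grad                       # => [2.0, 2.0, 2.0]

# Writing through x.data mutates the underlying array directly, bypassing the tape,
# so that change never shows up in any gradient.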

@MikeInnes (Member)

Also this needs some numerical tests, e.g. comparing hand-coded output from another implementation, or similar.

@staticfloat (Contributor) left a comment:

Please also introduce some numerical tests, preferably calculated by hand or in PyTorch/TF so that we can make sure we never break the numerics here.
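A minimal scaffold for such a test might look like the sketch below; this is only a shape/behaviour check and an assumption about how the test could be structured, not the hand-computed comparison being requested:

using Flux, Test

x = randn(Float32, 1_000)
m = AlphaDropout(0.5)
y = m(x)

@test size(y) == size(x)   # the layer must not change the shape of its input
@test y != x               # with the layer active, the output should differ from the input
# A full numerical test would additionally fix the RNG seed and compare m(x) against
# values computed by hand or by another framework, as requested above.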

a.active || return x
α = eltype(x)(-1.75813631)
noise = randn(eltype(x), size(x))
x = @. x*(noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
Contributor:

I think if you use @. you don't need the .> and .* and .<= operators; either use @. to get them all, or manually write them all as dotted operators.
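For example, the line above could be written either way (same semantics, shown here only to illustrate the point):

x = @. x * (noise > (1 - a.p)) + α * (noise <= (1 - a.p))      # let @. insert every dot
x = x .* (noise .> (1 - a.p)) .+ α .* (noise .<= (1 - a.p))    # or dot every operator by hand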


function (a::AlphaDropout)(x)
a.active || return x
α = eltype(x)(-1.75813631)
Contributor:

You should use the same α parameter as given by the selu() function in NNlib, see the definition here: https://github.com/FluxML/NNlib.jl/blob/d07ac0bfd3c71c3a29bc9c22becbba19227bbeb5/src/activation.jl#L100-L104
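For reference, the standard SELU uses roughly these constants (a sketch of the definition, not NNlib's verbatim code; the exact values are in the linked source):

λ = 1.0507009873554804934193349852946
α = 1.6732632423543772848170429916717
selu(x) = λ * ifelse(x > 0, x, α * (exp(x) - 1))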


function AlphaDropout(p)
@assert 0 ≤ p ≤ 1
AlphaDropout{typeof(p)}(p,true)
Contributor:

You shouldn't need {typeof(p)} here; it should be able to determine this from the p parameter:

julia> struct Foo{F}
       p::F
       active::Bool
       end

julia> Foo(1.0, true)
Foo{Float64}(1.0, true)
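Applied to this PR, a sketch of the simplified constructor (assuming the struct keeps its type parameter on p):

function AlphaDropout(p)
  @assert 0 ≤ p ≤ 1
  AlphaDropout(p, true)   # the type parameter is inferred from p by the default constructor
end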

x = @. x*(noise .> (1 - a.p)) + α .* (noise .<= (1 - a.p))
A = (a.p + a.p * (1 - a.p) * α ^ 2)^0.5
B = -A * α * (1 - a.p)
x = @. A .* x .+ B
Contributor:

Same thing here; this @. is not doing anything, because you have already dotted your operators.
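In other words, that last line can simply be one of:

x = A .* x .+ B
x = @. A * x + B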

@thebhatman (Contributor Author)

I realized randn(), which I am using to initialize noise, is not giving a standard normal distribution. Because of this, the mean and variance are slightly different from the mean and variance of the input. This could be solved by using Normal() from the Distributions package, but Flux doesn't have a dependency on Distributions. What would be the best way to get a standard normal distribution? Shall I add Distributions to Flux's dependencies?

@thebhatman (Contributor Author)

Thank you @staticfloat. I have used the values of lambda and alpha as defined in NNlib.jl and removed the redundant dot operations. I will be adding test cases, but before that I wanted to solve the problem of randn() not giving a standard normal distribution. Can I use the Distributions package and Normal() to get a distribution with mean = 0 and variance = 1?

@staticfloat (Contributor)

I realized randn(), which I am using to initialize noise, is not giving a standard normal distribution.

What do you mean by this? Why do you think that randn() is not a standard normal distribution? Note that the Normal distribution from Distributions.jl uses randn() to generate its random numbers internally.
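A quick way to see the two agree (a sketch; it assumes the Distributions package is installed, which is not a Flux dependency):

using Distributions

rand(Normal(0, 1), 5)   # samples from Normal(0, 1), which is built on randn internally
randn(5)                # the same distribution, sampled directly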

@thebhatman (Contributor Author)

julia> x = randn(2,2)
2×2 Array{Float64,2}:
 0.332405  -0.813627
 0.446137  -0.644151

julia> using Statistics

julia> mean(x)
-0.16980890659280456

julia> std(x)
0.650925294719868

@thebhatman (Contributor Author)

I was just testing randn() and it doesn't seem to be giving a standard normal distribution.

@MikeInnes (Member)

That's just sampling error; the sample mean only approaches 0 in the limit of a large number of samples.

@staticfloat (Contributor)

This is completely expected; the mean and standard deviation you compute on normally-distributed random numbers will not always be exactly zero and one. That's not the way randomness works. When you take mean(x), you are calculating an estimate of the true mean of the population that your four numbers were taken from.

Let's imagine we are measuring the height of people in a city. Let's further assume that across the entire city, the heights form a normal distribution with a certain mean and standard deviation. If I just take 4 random people from the city, the mean of their heights will probably not be the "true" mean; it will just be an estimate of it. I may be unlucky and grab four people that are all on the very tall side of things. The "error" of my mean estimate may be very high.

You can read up on calculating the error bounds of your statistical estimates, but to convince yourself, try making a matrix that is much larger than 2x2, say 10000x10000, and see what the statistics of that look like. I guarantee they will be much closer to a mean of 0 and a variance of 1, but they will also not be exactly 0 and 1.
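For example:

julia> using Statistics

julia> x = randn(10_000, 10_000);

julia> mean(x), std(x)   # both land very close to 0 and 1, but never exactly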

@thebhatman (Contributor Author)

Oh! That was a nice example. Thank you for clearing my doubt.

function (a::AlphaDropout)(x)
a.active || return x
λ = 1.0507009873554804934193349852946
α = 1.6732632423543772848170429916717
Contributor:

I think it would be better to have λ and α converted to eltype(x) so that we don't have to convert lower down; that might help us generate code to run on platforms where, for instance, Float64 literals are not supported (that's not the case on CPUs, but for things like TPUs it's helpful to stay in the proper datatype as much as possible).
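For example, mirroring the eltype(x)(...) pattern already used for α earlier in the diff (a sketch, not the final code):

λ = eltype(x)(1.0507009873554804934193349852946)
α = eltype(x)(1.6732632423543772848170429916717)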

Contributor Author:

Done.

@thebhatman (Contributor Author)

The Travis CI build failed after I resolved the merge conflict, which had been caused by InstanceNorm.

@staticfloat (Contributor)

Thanks @thebhatman, this looks pretty good!

@staticfloat merged commit bc12a4d into FluxML:master on Mar 7, 2019
@MikeInnes (Member)

Awesome, thanks a lot @thebhatman and @staticfloat for the review!

@thebhatman can you also add an entry to NEWS.md about this?
