Add MaxOut layer #647
Conversation
src/layers/basic.jl
Outdated
```julia
end

"""
    MaxOut(f, n_alts, args...; kwargs...)
```
Why not just pass in those layers, like `MaxOut(a, b, c)`? Or at least avoid simulating a closure by writing `MaxOut(n -> Dense(...), 4)`. We could have both, but then we would have to pass a tuple in the first case.
In general, using different layers is not a thing, and using layers that are not dense/linear is not a thing either. The paper is just about identically structured dense linear layers. Allowing them to be different, and allowing them to be any kind of layer, felt like a natural generalisation, and it made testing easier.

Here is the line in my code using this:

```julia
MaxOut(Dense, 4, insize, hlsize)
```

Here is the Keras code this is replacing:

```python
MaxoutDense(hlsize, input_dim=input_units, nb_feature=4, bias=True)
```

What are your proposals?
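(For context, here is a minimal sketch of what the paper's maxout unit computes: an elementwise max over k identically structured affine layers. The sizes and names below are illustrative, mirroring the `MaxOut(Dense, 4, insize, hlsize)` call above, not code from this PR.)

```julia
using Flux

# k identically structured affine maps (no nonlinearity), as in the paper.
insize, hlsize, k = 10, 32, 4
pieces = Tuple(Dense(insize, hlsize) for _ in 1:k)

# A maxout unit takes an elementwise max over the k affine outputs.
maxout(x) = mapreduce(d -> d(x), (a, b) -> max.(a, b), pieces)

x = rand(Float32, insize)
size(maxout(x))   # (32,)
```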
Makes sense to optimise for the common case, of course. But in that case I think `MaxOut(_ -> Dense(insize, outsize), 4)` gives a clearer sense that this is doing something with 4 dense layers (it looks like `ntuple`, which is basically what it is).
This is growing on me.

Another option is `MaxOut(Dense(insize, outsize), 4)`, and then we call `deepcopy` on the first argument to get 4 versions of it. Not sure I like that. Very hard to know if it would do it correctly.
Ok, I've sold myself on your idea. It separates which things are arguments of the inner layer, and which things are arguments of the outer layer. Going forward we should make this a convention if it comes up again.
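(A sketch of the convention agreed on here, using a hypothetical `MyMaxout` rather than restating the PR's actual code: inner-layer arguments live in the closure, and outer-layer arguments, i.e. the number of alternatives, go to the constructor.)

```julia
using Flux

# Hypothetical illustration of the "closure for the inner layer" convention.
struct MyMaxout{T<:Tuple}
    over::T
end

# The closure carries the inner layer's arguments; n_alts is the outer layer's.
MyMaxout(f, n_alts::Integer) = MyMaxout(Tuple(f() for _ in 1:n_alts))

m = MyMaxout(() -> Dense(5, 7), 4)   # four independently initialised Dense(5, 7) layers
```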
```julia
end

function (mo::MaxOut)(input::AbstractArray)
    mapreduce(f -> f(input), (acc, out) -> max.(acc, out), mo.over)
```
I'd find `max.(map(f -> f(input), mo.over)...)` clearer.
Hmmm... will the compiler unroll the `map` and the splat? It should be possible to do so, since everything is tuples.
Mapreduce is noticeably faster: nearly 2x faster.

```julia
julia> foo(fs,x) = max.(map(f->f(x), fs)...)
foo (generic function with 1 method)

julia> bar(fs,x) = mapreduce(f->f(x), (a,b)->max.(a,b), fs)
bar (generic function with 1 method)

julia> @btime foo((x->2x, x->0.5x, x->3x, x->4x, x->6x), $[1:1000.0;]);
  9.469 μs (9 allocations: 47.72 KiB)

julia> @btime bar((x->2x, x->0.5x, x->3x, x->4x, x->6x), $[1:1000.0;]);
  4.989 μs (11 allocations: 71.47 KiB)
```
Just to record what I mentioned on slack, tracked arrays might have slightly different perf here (broadcast is much heavier and SIMD probably won't kick in either way), so it'd be worth giving that case a quick test.
Even for TrackedArrays:

```julia
julia> data = Flux.TrackedArray(collect(1:1000));

julia> using BenchmarkTools

julia> foo(fs,x) = max.(map(f->f(x), fs)...)
foo (generic function with 1 method)

julia> bar(fs,x) = mapreduce(f->f(x), (a,b)->max.(a,b), fs)
bar (generic function with 1 method)

julia> @btime bar((x->2x, x->0.5x, x->3x, x->4x, x->6x), $data);
  7.891 μs (65 allocations: 73.08 KiB)

julia> @btime foo((x->2x, x->0.5x, x->3x, x->4x, x->6x), $data);
  12.359 μs (56 allocations: 49.13 KiB)
```

We have resolved on Slack that we will use `mapreduce`, and if anyone complains that it is too slow, rewrite it using a generated function.
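(For reference, a rough sketch of the generated-function fallback mentioned above, in case `mapreduce` ever proves too slow. The function name is hypothetical and this is not part of the PR.)

```julia
# Because the layers live in a tuple, the number of alternatives is known at
# compile time, so the chain of elementwise maxes can be unrolled explicitly.
@generated function maxout_unrolled(fs::Tuple, x)
    n = length(fs.parameters)          # tuple length is encoded in the type
    ex = :(fs[1](x))
    for i in 2:n
        ex = :(max.($ex, fs[$i](x)))   # max.(max.(f1(x), f2(x)), f3(x)), ...
    end
    return ex
end

maxout_unrolled((x -> 2x, x -> 3x), [1.0, 2.0])   # == [3.0, 6.0]
```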
Very minor: the name

Done
src/layers/basic.jl
Outdated
````julia
```
"""
function Maxout(f, n_alts, args...; kwargs...)
    over = Tuple(f(args...; kwargs...) for _ in 1:n_alts)
````
The signature needs an update now that we're supporting the closure-ised version instead of passing args through. Also I think it'd make sense to use `ntuple(f, n_alts)` to construct the tuple. It's clean, and it means you can write `_ -> Dense(...)`, which saves a character :) but the `n` might be useful in some cases also.
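(A sketch of the `ntuple`-based construction suggested above, with a hypothetical helper name; `ntuple(f, n)` calls `f(1), ..., f(n)`, so the closure receives the index and may ignore it.)

```julia
using Flux

# Hypothetical helper: build the tuple of inner layers with ntuple.
maxout_layers(f, n_alts::Integer) = ntuple(f, n_alts)

maxout_layers(_ -> Dense(5, 7), 4)                                      # ignore the index
maxout_layers(n -> Dense(5, 7, (relu, tanh, identity, sigmoid)[n]), 4)  # or use it
```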
It could be useful. I worry that we are overengineering generality into this that is not present in the paper. If we wanted to make it easier/less complex to just do the thing in the paper (and the thing Keras lets you do), we could add:

```julia
MaxoutDense(insize, outsize, n_alts) = Maxout(_ -> Dense(insize, outsize), n_alts)
```

But looking at that written, I don't know that I like the LHS more than the RHS anyway.
I've thought about it more. I can't imagine passing in the number would ever be useful. Maxout is defined as the max over structurally identical internal layers. Most things you might think about doing with it are:

- going to break it: `Maxout(n -> Dense(128, 128n), 4)`
- pointless: `Maxout(n -> n*Dense(128, 128), 4)`
- weird: `Maxout(n -> Dense(128, 128; σ=(relu, identity, tanh, sigmoid)[n]), 4)`
- complex enough that they should be written separately:

  `Maxout(n -> Chain(Dense(128, 128n, σ=sigmoid), Dense(128n, 128)), 4)`

  vs

  ```julia
  inner_layers = ntuple(n -> Chain(Dense(128, 128n, σ=sigmoid), Dense(128n, 128)), 4)
  Maxout(inner_layers)
  ```

I think it is better to be able to say that if you want to construct differing internal layers, then you should do so explicitly and pass them in as a tuple. This extra flexibility I think makes the code harder to read (even if it can be 1 character shorter). It is readily apparent what a `() -> f(z)` does, and how I can change/tweak it, whereas the purpose of the argument in `_ -> f(z)` is not clear, and it isn't obvious what the `n` is doing in `n -> g(z, n)`.
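(The two construction routes argued for above, side by side; this assumes a constructor that also accepts a pre-built tuple of layers, as in this PR.)

```julia
using Flux

# Identical inner layers: closure plus a count.
m1 = Maxout(() -> Dense(128, 128), 4)

# Differing inner layers: build them explicitly and pass them in as a tuple.
inner_layers = ntuple(n -> Chain(Dense(128, 128n, sigmoid), Dense(128n, 128)), 4)
m2 = Maxout(inner_layers)
```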
Co-Authored-By: oxinabox <[email protected]>
Any further thoughts?

I think we are in general agreement on how to tweak the API; I think it'll be good to go once that's done.

API tweaks are done.
src/layers/basic.jl
Outdated
```julia
end

"""
    Maxout(f, n_alts, args...; kwargs...)
```
If we're supporting the closure-based API, we can remove the splatting from the signature. I think we may as well document the signature as `Maxout(() -> ...)` if that's always the intended use.
Ah yes, my bad, I forgot to remove the arguments.
It is `Maxout(f, n_alts)`. I feel like `Maxout(() -> ..., n_alts)` is a bit less clear than having `f` and then being able to reference `f` in the docstring below. The docstring does include an example:

```julia
Maxout(() -> Dense(insize, outsize), 4)
```
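(A quick end-to-end example of the documented constructor; sizes are arbitrary and the output-shape check is just for illustration.)

```julia
using Flux

insize, outsize = 5, 7
m = Maxout(() -> Dense(insize, outsize), 4)   # four independently initialised Dense layers

x = rand(Float32, insize, 3)   # a batch of three inputs
y = m(x)                       # elementwise max over the four Dense outputs
size(y)                        # (7, 3)
```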
Alright, I think this is pretty perfect. Thanks a lot @oxinabox!
1794: Tidy up `Maxout` r=mcabbott a=mcabbott

Maxout is from #698. This:

* adds pretty printing
* changes the explicit signature to `Maxout(layer, layer, layer)`, rather than providing a tuple, to be more like other layers (with deprecation)
* adds more examples to the docstring, and combines the two
* changes not to use `mapreduce`. I see now this was a performance choice at the time, discussed here #647 (comment), but with Zygote this is much slower.

Before:

```
julia> using Flux

julia> m3 = Maxout(() -> Dense(5, 7, tanh), 3)
Maxout{Tuple{Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}, Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}, Dense{typeof(tanh), Matrix{Float32}, Vector{Float32}}}}((Dense(5, 7, tanh), Dense(5, 7, tanh), Dense(5, 7, tanh)))

julia> x = rand(Float32, 5, 11);

julia> @btime gradient(sum∘m3, $x);
  min 112.792 μs, mean 123.774 μs (930 allocations, 49.09 KiB. GC mean 3.71%)
```

After:

```
julia> m3 = Maxout(() -> Dense(5, 7, tanh), 3)
Maxout(
  Dense(5, 7, tanh),  # 42 parameters
  Dense(5, 7, tanh),  # 42 parameters
  Dense(5, 7, tanh),  # 42 parameters
)                     # Total: 6 arrays, 126 parameters, 888 bytes.

julia> x = rand(Float32, 5, 11);

julia> @btime gradient(sum∘m3, $x);
  min 34.541 μs, mean 38.448 μs (493 allocations, 32.48 KiB. GC mean 6.63%)
```

Co-authored-by: Michael Abbott <[email protected]>