generalize softmax #77
Comments
Is this solved? I have some interest in this to add attention support in Flux.
Looking at this https://github.com/FluxML/NNlib.jl/blob/342928eb4478da9c7b1433ec75c8eb8a9b155747/src/softmax.jl
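For context on the comparison below: the comment contrasts the tagged release, which uses an in-place `softmax!`, with master, which reduces via a `dims` keyword. A rough sketch of the column-wise, in-place pattern (the names `softmax_sketch!`/`softmax_sketch` are illustrative, not the actual code at the link above):

```julia
# Rough sketch of a column-wise, in-place softmax; illustrative only.
function softmax_sketch!(out::AbstractMatrix, xs::AbstractMatrix)
    @inbounds for j in axes(xs, 2)
        m = typemin(eltype(xs))
        for i in axes(xs, 1)
            m = max(m, xs[i, j])      # column maximum, for numerical stability
        end
        s = zero(eltype(out))
        for i in axes(xs, 1)
            out[i, j] = exp(xs[i, j] - m)
            s += out[i, j]
        end
        for i in axes(xs, 1)
            out[i, j] /= s            # normalize so the column sums to 1
        end
    end
    return out
end
softmax_sketch(xs) = softmax_sketch!(similar(xs), xs)  # single allocation
```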
```julia
mm = rand(20,100);
@btime softmax($mm);  # tagged, with softmax!, 23.076 μs (1 allocation: 15.75 KiB)
@btime softmax1($mm); # master, with dims=1, 47.657 μs (13 allocations: 33.45 KiB)
```

Was this discussed somewhere? Are there goals besides being generic? Some variants which are almost as fast:

```julia
function softmax2(xs::AbstractArray{T}; dims=1) where {T}
    temp = maximum(xs, dims=dims)
    out = exp.(xs .- temp)
    out ./= sum!(temp, out)
end

function softmax3(xs::AbstractArray{T}; dims=1) where {T}
    max = maximum(xs, dims=dims)
    out = exp.(xs .- max)
    out ./ sum(out, dims=dims)
end

@btime softmax2($mm); # re-using temp, 27.382 μs (11 allocations: 16.83 KiB)
@btime softmax3($mm); # no mutation, 26.462 μs (13 allocations: 33.45 KiB)
```
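If it helps when reproducing these numbers: the two variants should agree with a reference softmax up to floating-point error. A quick sanity check, assuming `softmax2`/`softmax3` as defined above and NNlib's `softmax` in scope:

```julia
using NNlib: softmax   # assuming NNlib's softmax as the reference

mm = rand(20, 100)
softmax2(mm) ≈ softmax(mm)           # true: same values, reduced over dims=1
softmax3(mm) ≈ softmax(mm)           # true
all(sum(softmax2(mm); dims=1) .≈ 1)  # true: each column sums to 1
```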
The softmax functions should be generalized to handle reduction across any dimensions, e.g.:
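Something along these lines: a minimal sketch of a numerically stable softmax reducing over arbitrary `dims` (the name `gsoftmax` is just a placeholder, not a proposed API):

```julia
function gsoftmax(xs::AbstractArray; dims = 1)
    m = maximum(xs; dims = dims)    # max along `dims`, for numerical stability
    out = exp.(xs .- m)
    out ./ sum(out; dims = dims)    # normalize along the same dimensions
end

x = rand(20, 100)
all(sum(gsoftmax(x; dims = 2); dims = 2) .≈ 1)  # true: each row sums to 1
```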