generalize softmax #77
Comments
Is this solved? I have some interest in this to add attention support in Flux.
Looking at this https://github.com/FluxML/NNlib.jl/blob/342928eb4478da9c7b1433ec75c8eb8a9b155747/src/softmax.jl
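For context on the comparison below: the comment contrasts the tagged release, which uses an in-place `softmax!`, with master, which reduces via a `dims` keyword. A rough sketch of the column-wise, in-place pattern (the names `softmax_sketch!`/`softmax_sketch` are illustrative, not the actual code at the link above):

```julia
# Rough sketch of a column-wise, in-place softmax; illustrative only.
function softmax_sketch!(out::AbstractMatrix, xs::AbstractMatrix)
    @inbounds for j in axes(xs, 2)
        m = typemin(eltype(xs))
        for i in axes(xs, 1)
            m = max(m, xs[i, j])      # column maximum, for numerical stability
        end
        s = zero(eltype(out))
        for i in axes(xs, 1)
            out[i, j] = exp(xs[i, j] - m)
            s += out[i, j]
        end
        for i in axes(xs, 1)
            out[i, j] /= s            # normalize so the column sums to 1
        end
    end
    return out
end
softmax_sketch(xs) = softmax_sketch!(similar(xs), xs)  # single allocation
```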
```julia
mm = rand(20,100);
@btime softmax($mm);  # tagged, with softmax!, 23.076 μs (1 allocation: 15.75 KiB)
@btime softmax1($mm); # master, with dims=1, 47.657 μs (13 allocations: 33.45 KiB)
```

Was this discussed somewhere? Are there goals besides being generic? Some variants which are almost as fast:

```julia
function softmax2(xs::AbstractArray{T}; dims=1) where {T}
    temp = maximum(xs, dims=dims)
    out = exp.(xs .- temp)
    out ./= sum!(temp, out)
end

function softmax3(xs::AbstractArray{T}; dims=1) where {T}
    max = maximum(xs, dims=dims)
    out = exp.(xs .- max)
    out ./ sum(out, dims=dims)
end

@btime softmax2($mm); # re-using temp, 27.382 μs (11 allocations: 16.83 KiB)
@btime softmax3($mm); # no mutation, 26.462 μs (13 allocations: 33.45 KiB)
```
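If it helps when reproducing these numbers: the two variants should agree with a reference softmax up to floating-point error. A quick sanity check, assuming `softmax2`/`softmax3` as defined above and NNlib's `softmax` in scope:

```julia
using NNlib: softmax   # assuming NNlib's softmax as the reference

mm = rand(20, 100)
softmax2(mm) ≈ softmax(mm)           # true: same values, reduced over dims=1
softmax3(mm) ≈ softmax(mm)           # true
all(sum(softmax2(mm); dims=1) .≈ 1)  # true: each column sums to 1
```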
The softmax functions should be generalized to handle reduction across any dimensions, e.g.:
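Something along these lines: a minimal sketch of a numerically stable softmax reducing over arbitrary `dims` (the name `gsoftmax` is just a placeholder, not a proposed API):

```julia
function gsoftmax(xs::AbstractArray; dims = 1)
    m = maximum(xs; dims = dims)    # max along `dims`, for numerical stability
    out = exp.(xs .- m)
    out ./ sum(out; dims = dims)    # normalize along the same dimensions
end

x = rand(20, 100)
all(sum(gsoftmax(x; dims = 2); dims = 2) .≈ 1)  # true: each row sums to 1
```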