Make softmax! dimension-agnostic #130
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   80.78%   79.03%   -1.76%
==========================================
  Files          24       24
  Lines         760      763       +3
==========================================
- Hits          614      603      -11
- Misses        146      160      +14
Continue to review full report at Codecov.
If we're going to do this, I think the best way might be to just remove the in-place version entirely and write the concise array-level version. Probably, the gradient issue is because …
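For illustration, a minimal sketch of what such a concise, array-level softmax could look like, assuming a `dims` keyword argument (an assumption about the interface, not necessarily the PR's final code):

```julia
# Sketch: out-of-place, dimension-agnostic softmax.
# Subtracting the per-slice maximum keeps exp from overflowing.
function softmax(xs::AbstractArray; dims = 1)
    exp_ = exp.(xs .- maximum(xs; dims = dims))
    return exp_ ./ sum(exp_; dims = dims)
end
```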
make softmax out-of-place; remove gradient code
Sorry, I don't think we should actually remove the definitions here, even if they aren't used directly by NNlib (e.g. the gradients and in-place versions). If you can simplify how they're implemented, that's still useful. But they get overloaded by things like CUDNN, so we'll need to keep them for compatibility at least.
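One way to satisfy both goals (simpler code, but an in-place method that CUDNN/CuArrays can still overload) would be to implement the logic once in `softmax!` and make the out-of-place method a thin wrapper; the exact signature below is an assumption, not the library's committed API:

```julia
# Sketch: keep an overloadable in-place method and derive the
# out-of-place one from it.
function softmax!(out::AbstractArray, xs::AbstractArray; dims = 1)
    out .= exp.(xs .- maximum(xs; dims = dims))
    out ./= sum(out; dims = dims)
    return out
end

softmax(xs::AbstractArray; dims = 1) = softmax!(similar(xs), xs; dims = dims)
```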
Right, I'm still figuring out all this git stuff.
Would it be easy to generalise the definition of the gradient to handle dimensions as well? Not necessarily essential, but it would be a big help in updating our AD, etc.
I'm not sure what the derivative function should look like; presumably it isn't as straightforward as adding …

A huge thank you also for guiding me through this PR, as this is my first real code contribution, albeit a rather small one.
I'm pretty sure that's all you need to do.
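For reference, the usual dims-aware vector–Jacobian product for softmax is `y .* (Δ .- sum(Δ .* y; dims))`. A sketch (the `∇softmax` name follows NNlib's convention, but the dims-handling signature here is an assumption):

```julia
# Sketch: dims-aware softmax gradient (vector-Jacobian product).
# For y = softmax(x), applying the Jacobian transpose to Δ gives
# y .* (Δ .- sum(Δ .* y)) along the softmax dims.
function ∇softmax(Δ::AbstractArray, xs::AbstractArray; dims = 1)
    y = softmax(xs; dims = dims)
    return y .* (Δ .- sum(Δ .* y; dims = dims))
end
```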
Great, thanks. We'll need to update the CUDA wrappers so that CUDNN gets called, but this shouldn't in itself break stuff. |
Thanks @merckxiaan! |
I tried reducing the allocations for a dimension-agnostic softmax implementation, but there are still a few allocations. On my PC, however, benchmarking this against the original implementation gives very similar results.
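As an illustration of the kind of comparison meant here (the `softmax_baseline` name is a placeholder for the original column-wise implementation, not an existing function):

```julia
# Rough timing/allocation check with BenchmarkTools;
# `softmax_baseline` stands in for the original implementation.
using BenchmarkTools

x = randn(Float32, 1000, 128)
@btime softmax_baseline($x)
@btime softmax($x; dims = 1)
```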
As for `logsoftmax`, I'm not quite sure what's happening in the original implementation, so I've not yet tried to generalize that.

EDIT: Of course, I should note I also haven't changed the derivatives yet, since I'm not sure what they should be.
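For what it's worth, one common way to write a dimension-agnostic, numerically stable logsoftmax is sketched below (an assumption about the approach, not a description of the existing implementation):

```julia
# Sketch: dimension-agnostic, numerically stable logsoftmax.
# log(softmax(x)) = (x - max) - log(sum(exp(x - max))) along dims.
function logsoftmax(xs::AbstractArray; dims = 1)
    shifted = xs .- maximum(xs; dims = dims)
    return shifted .- log.(sum(exp.(shifted); dims = dims))
end
```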