
Make softmax! dimension-agnostic #130

Merged: 10 commits into FluxML:master on Jul 8, 2019

Conversation

@jumerckx (Contributor) commented Jun 22, 2019

I tried reducing the allocations for a dimension-agnostic softmax implementation, but there are still a few allocations. On my PC, benchmarking this against the original implementation gives very similar results, however.
As for logsoftmax, I'm not quite sure what's happening in the original implementation, so I haven't yet tried to generalize it.
EDIT:
I should also note that I haven't changed the derivatives yet, since I'm not sure what they should be.
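For reference, a dimension-agnostic softmax! along these lines might look roughly as follows (a minimal sketch under my own naming, not necessarily the PR's code):

# Illustrative sketch only: subtract the per-slice maximum for numerical
# stability, exponentiate, then normalise along the requested dimension.
function softmax!(out::AbstractArray, xs::AbstractArray; dims = 1)
    out .= exp.(xs .- maximum(xs; dims = dims))   # maximum(...; dims) allocates an intermediate
    out ./= sum(out; dims = dims)                 # so does sum(...; dims)
    return out
end

softmax(xs; dims = 1) = softmax!(similar(xs), xs; dims = dims)

The few remaining allocations mentioned above would come from the maximum and sum reductions, which each return a small intermediate array.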

@codecov-io commented Jun 22, 2019

Codecov Report

Merging #130 into master will decrease coverage by 1.75%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   80.78%   79.03%   -1.76%     
==========================================
  Files          24       24              
  Lines         760      763       +3     
==========================================
- Hits          614      603      -11     
- Misses        146      160      +14
Impacted Files Coverage Δ
src/softmax.jl 50% <100%> (-43.11%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4ea355e...16262e8.

@jumerckx (Contributor, Author)

softmax(Flux.param(rand(10,10)), dims=2) throws an error:
MethodError: no method matching Float64(::Tracker.TrackedReal{Float64}).
I believe this is because similar(TrackedArray) returns a regular array, but I'm not quite sure why the error doesn't occur when using dims=1.
Thanks,
Jules
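A small sketch of the suspected cause (assuming the Tracker-based Flux of mid-2019; whether similar really returns an untracked array here is an assumption drawn from the error message, not something confirmed in this thread):

using Flux  # assumption: Flux with Tracker-based AD

x   = Flux.param(rand(10, 10))  # TrackedArray wrapping a Matrix{Float64}
buf = similar(x)                # plausibly a plain Array{Float64}, untracked
# buf[1, 1] = x[1, 1]           # x[1, 1] is a TrackedReal{Float64}; writing it into a
#                               # Float64 buffer gives the MethodError quoted above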

@MikeInnes (Member)

If we're going to do this, I think the best way might be to just remove the in-place version entirely and write the concise array-level version.

Probably the gradient issue is because softmax(x) has an explicit gradient but softmax(x; kw...) doesn't, or something similar. After adding this we'll need follow-up PRs to Tracker, Zygote, etc.
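The concise array-level version being suggested might be sketched like this (illustrative only, not the code that was eventually merged):

function softmax(xs; dims = 1)
    e = exp.(xs .- maximum(xs; dims = dims))  # stabilise, then exponentiate
    e ./ sum(e; dims = dims)                  # normalise along dims
end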

@MikeInnes (Member)

Sorry, I don't think we should actually remove definitions here, even if they aren't used directly by NNlib (e.g. gradients and in-place versions). If you can simplify how they're implemented, that's still useful. But they get overloaded by things like CUDNN, so we'll need to keep them for compatibility at least.

@jumerckx (Contributor, Author)

Right, I'm still figuring out all this git stuff.
So only the softmax(xs; dims) should be added and the rest should just stay, correct?

@MikeInnes (Member)

Would it be easy to generalise the definition of the gradient to handle dimensions as well? Not necessarily essential, but it would be a big help in updating our AD, etc.

@jumerckx (Contributor, Author)

I'm not sure what the derivative function should look like; presumably it isn't as straightforward as adding dims=dims everywhere?
I believe this is something that should be implemented by someone more experienced than myself.

A huge thank you as well for guiding me through this PR, as this is my first real code contribution, albeit a rather small one.

@jekbradbury (Contributor)

I'm not sure what the derivative function should look like, presumably it isn't as straightforward as adding dims=dims everywhere?

I'm pretty sure that's all you need to do.
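Concretely, threading dims through the usual softmax adjoint would look something like the sketch below; the ∇softmax name mirrors NNlib's existing gradient helper, but the exact signature here is an assumption:

function ∇softmax(Δ, xs; dims = 1)
    y = softmax(xs; dims = dims)
    y .* (Δ .- sum(Δ .* y; dims = dims))  # reduce along the same dims as the forward pass
end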

@MikeInnes (Member)

Great, thanks. We'll need to update the CUDA wrappers so that CUDNN gets called, but this shouldn't in itself break stuff.

@MikeInnes merged commit 342928e into FluxML:master on Jul 8, 2019
@MikeInnes (Member)

Thanks @merckxiaan!

@mcabbott mentioned this pull request on Sep 26, 2019