Make softmax! dimension-agnostic #130
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   80.78%   79.03%   -1.76%
==========================================
  Files          24       24
  Lines         760      763       +3
==========================================
- Hits          614      603      -11
- Misses        146      160      +14
Continue to review full report at Codecov.
If we're going to do this, I think the best way might be to just remove the in-place version entirely and write the concise array-level version. Probably, the gradient issue is because …
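For illustration, a minimal sketch of what such a concise, array-level softmax could look like, assuming a `dims` keyword argument (an assumption about the interface, not necessarily the PR's final code):

```julia
# Sketch: out-of-place, dimension-agnostic softmax.
# Subtracting the per-slice maximum keeps exp from overflowing.
function softmax(xs::AbstractArray; dims = 1)
    exp_ = exp.(xs .- maximum(xs; dims = dims))
    return exp_ ./ sum(exp_; dims = dims)
end
```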
make softmax out-of-place; remove gradient code
Sorry, I don't think we should actually remove the definitions here, even if they aren't used directly by NNlib (e.g. the gradients and in-place versions). If you can simplify how they're implemented, that's still useful. But they get overloaded by things like CUDNN, so we'll need to keep them for compatibility at least.
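One way to satisfy both goals (simpler code, but an in-place method that CUDNN/CuArrays can still overload) would be to implement the logic once in `softmax!` and make the out-of-place method a thin wrapper; the exact signature below is an assumption, not the library's committed API:

```julia
# Sketch: keep an overloadable in-place method and derive the
# out-of-place one from it.
function softmax!(out::AbstractArray, xs::AbstractArray; dims = 1)
    out .= exp.(xs .- maximum(xs; dims = dims))
    out ./= sum(out; dims = dims)
    return out
end

softmax(xs::AbstractArray; dims = 1) = softmax!(similar(xs), xs; dims = dims)
```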
Right, I'm still figuring out all this git stuff.
Would it be easy to generalise the definition of the gradient to handle dimensions as well? Not necessarily essential, but it would be a big help in updating our AD, etc.
I'm not sure what the derivative function should look like; presumably it isn't as straightforward as adding …

A huge thank you also for guiding me through this PR, as this is my first real code contribution, albeit a rather small one.
I'm pretty sure that's all you need to do.
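For reference, the usual dims-aware vector–Jacobian product for softmax is `y .* (Δ .- sum(Δ .* y; dims))`. A sketch (the `∇softmax` name follows NNlib's convention, but the dims-handling signature here is an assumption):

```julia
# Sketch: dims-aware softmax gradient (vector-Jacobian product).
# For y = softmax(x), applying the Jacobian transpose to Δ gives
# y .* (Δ .- sum(Δ .* y)) along the softmax dims.
function ∇softmax(Δ::AbstractArray, xs::AbstractArray; dims = 1)
    y = softmax(xs; dims = dims)
    return y .* (Δ .- sum(Δ .* y; dims = dims))
end
```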
Great, thanks. We'll need to update the CUDA wrappers so that CUDNN gets called, but this shouldn't in itself break stuff. |
Thanks @merckxiaan! |
I tried reducing the allocations for a dimension-agnostic softmax implementation, but there are still a few allocations. On my PC, however, benchmarking this against the original implementation gives very similar results.
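As an illustration of the kind of comparison meant here (the `softmax_baseline` name is a placeholder for the original column-wise implementation, not an existing function):

```julia
# Rough timing/allocation check with BenchmarkTools;
# `softmax_baseline` stands in for the original implementation.
using BenchmarkTools

x = randn(Float32, 1000, 128)
@btime softmax_baseline($x)
@btime softmax($x; dims = 1)
```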
As for `logsoftmax`, I'm not quite sure what's happening in the original implementation, so I've not yet tried to generalize that.

EDIT: Of course, I should note I also haven't changed the derivatives yet, since I'm not sure what they should be.
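For what it's worth, one common way to write a dimension-agnostic, numerically stable logsoftmax is sketched below (an assumption about the approach, not a description of the existing implementation):

```julia
# Sketch: dimension-agnostic, numerically stable logsoftmax.
# log(softmax(x)) = (x - max) - log(sum(exp(x - max))) along dims.
function logsoftmax(xs::AbstractArray; dims = 1)
    shifted = xs .- maximum(xs; dims = dims)
    return shifted .- log.(sum(exp.(shifted); dims = dims))
end
```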