Define activation functions taking arrays as input #423
Conversation
Convolution fuzzing tests are already failing on master

The canonical definition would almost always broadcast in the forward pass, which GPUCompiler usually catches and dispatches to the kernels with array arguments anyway. Can we get an idea of how we dispatch to these when most forward passes broadcast by default?
What do you mean? The canonical definition of what? If it's the layer forward definitions, then this doesn't change those. All this does is allow the activation functions to be called on arrays directly.
The forward passes in many cases look like `act.(f(x))`, i.e. with the activation broadcast over an object. In this case, if the activation function provided is `x -> relu.(x)`, what actually gets evaluated is `(x -> relu.(x)).(f(x))`. The anonymous function therefore receives a scalar and works anyway, since numbers are iterable. In most cases it doesn't make much difference, but that is what is happening unless the compiler can optimise the extra broadcast away. In AD, we would actually have to see both the outer broadcast and the inner broadcast and generate a pullback for both (this hopefully isn't the case with a lens-based system that can interleave optimisation and compilation, but it is the case elsewhere).

To be clear, I'm in favour of picking points of optimisation and simplification; I just wanted to clarify that it's mostly useful for cases when the activation function sees an iterable of arrays (like a tuple or vector of arrays), and if there are specific advantages to automatically calling broadcasted operations (say, for fusion), then perhaps overloading …
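A minimal sketch of the double-broadcast point above (plain Julia, with `relu` redefined locally so the snippet is self-contained; not the NNlib definition):

```julia
# Scalar activation, as the canonical definition would have it.
relu(x::Real) = max(zero(x), x)

f(x) = 2 .* x                   # hypothetical stand-in for a layer's affine part
x = randn(Float32, 3, 4)

# Canonical forward pass: a single broadcast that can fuse.
y1 = relu.(f(x))

# If the user instead supplies `x -> relu.(x)` as the activation, the layer
# still broadcasts it, so every element goes through an extra inner broadcast
# over a scalar; this works only because numbers are iterable in Julia.
act = x -> relu.(x)
y2 = act.(f(x))

y1 == y2    # true, but AD must now generate pullbacks for both broadcasts in y2
```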
Could we get a patch release with this? Would be helpful 😅
I missed this, but before we release, is it a good idea? It means …

Another level at which this "do what I mean" could be implemented is to make the …
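For reference, the change being questioned amounts to something like the following sketch (a minimal version of the idea, not the exact NNlib code):

```julia
# Sketch: a scalar activation plus a companion method that broadcasts over arrays.
relu(x::Real) = max(zero(x), x)
relu(x::AbstractArray) = relu.(x)    # the "do what I mean" behaviour under discussion

relu([-1.0, 2.0])    # [0.0, 2.0]
```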
Unfortunately, we already released this change.
My immediate thought is: when is …

Random thoughts: …
I agree it's what's normally meant; it just seems a bit contrary to Julia's normal behaviour. My …
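For comparison, Base's elementwise functions deliberately do not broadcast when handed an array, which is the "normal behaviour" referred to above:

```julia
sin([1.0, 2.0])     # MethodError: no method matching sin(::Vector{Float64})
sin.([1.0, 2.0])    # the idiomatic elementwise call
```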
The nuclear option is to do what PyTorch does and make vectorized activation layer types/constructors, then not export any of the activation functions themselves, so that users are incentivized to use the former. That gets around some of the confusion with …
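A rough sketch of what such vectorized activation layer types might look like (the names here are hypothetical, not an existing Flux/NNlib API):

```julia
# Hypothetical PyTorch-style activation layer: a callable struct that always
# broadcasts its scalar function, so users never call the scalar activation
# on arrays directly.
struct Activation{F}
    f::F
end

(a::Activation)(x::AbstractArray) = a.f.(x)

ReLU() = Activation(x -> max(zero(x), x))

layer = ReLU()
layer(randn(Float32, 2, 3))    # applies the activation elementwise
```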
An attempt to fix #422...hopefully this doesn't break anything