Pure-Julia convolutions #9

Closed
wants to merge 2 commits into from

MikeInnes
Member

Accepts an input W+ C N and a kernel W+ Cin Cout (similar to Knet, although we don't flip the channel dimension of the kernel).

This is fairly naive – it's very fast for basic convolutions but struggles a bit more when you have multiple channels.

@iblislin
Contributor

need to bump REQUIRE?

@MikeInnes
Member Author

Good catch.

FWIW there's a bunch of stuff up for grabs here, e.g. implementing gradients and pooling. I'll happily take extremely slow / naive implementations to get things going.

@iblislin
Contributor

Any example usage for this API?

@dfdx
Contributor

dfdx commented Nov 21, 2017

A couple of notes to keep in mind:

  1. At least padding and strides should be supported.
  2. Backpropagation also requires the gradient of the convolution. The basic gradient is itself a convolution, but I'm not sure whether the same parameters (padding and strides) can be passed the same way.
  3. I haven't seen a deep learning practitioner doing many convolutions on CPU yet, so it's worth aligning the API with that of the GPU counterparts (most notably, cuDNN). In particular, cuDNN always assumes 4D data (width, height, channels and batch size).
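
To make the 4D layout concrete, here is a minimal sketch of how a wrapper might reshape a lower-rank array into (width, height, channels, batch) before handing it to a 4D-only backend. The helper name `to4d` and its `nspatial` argument are hypothetical and are not part of this PR or of cuDNN; the sketch assumes the (spatial..., C, N) layout with trailing dimensions possibly dropped.

```julia
# Hypothetical helper (not part of the PR): reinterpret an array laid out as
# (spatial..., C, N), possibly with trailing dims dropped, as (W, H, C, N).
# `nspatial` is passed explicitly because the rank of `x` alone is ambiguous.
function to4d(x::AbstractArray, nspatial::Integer)
    1 <= nspatial <= 2 ||
        throw(ArgumentError("only 1D or 2D spatial inputs fit in a 4D layout"))
    sp = ntuple(i -> size(x, i), nspatial)
    # size(x, i) returns 1 for i > ndims(x), so dropped dims become singletons.
    c, n = size(x, nspatial + 1), size(x, nspatial + 2)
    w, h = nspatial == 2 ? sp : (sp[1], 1)   # a 1D input gets a singleton height
    return reshape(x, w, h, c, n)
end

size(to4d(rand(100, 100, 3), 2))   # (100, 100, 3, 1)
size(to4d(rand(10, 3, 4), 1))      # (10, 1, 3, 4)
```

For a plain `Array`, `reshape` shares memory with its argument, so a wrapper along these lines would not need to copy the data.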

@MikeInnes
Member Author

MikeInnes commented Nov 21, 2017

  1. Yes, although in Julia we can implement e.g. a pad function that doesn't copy, so it doesn't have to be special-cased in the core convolution algorithm; you can just write conv(pad(x, 2), w). I'm hoping to prototype this soon.
  2. If you do the above you will also get gradients for free (though you might still want fused versions of common cases).
  3. We do need to be able to take advantage of cudnn's optimisations, but I don't think that will pose an issue (e.g. we can easily reshape to 4D where necessary in wrappers). Cudnn is also not known for being an exceptionally clean API, so it's not necessarily where we want to take inspiration from.
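
A non-copying pad of the kind described in point 1 could look roughly like the sketch below. The type `ZeroPadded` and the function `lazypad` are hypothetical names used only for illustration, and the `nspatial` default assumes the input carries its full (spatial..., C, N) shape; none of this is taken from the PR itself.

```julia
# Hypothetical lazy zero-padding wrapper: it stores the parent array and the
# amount of padding per dimension, and computes padded values on demand.
struct ZeroPadded{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    pad::NTuple{N,Int}   # padding on each side, 0 for channel/batch dims
end

# Pad the first `nspatial` dimensions by `p` on both sides.
# Assumption: by default the last two dimensions are channels and batch.
function lazypad(x::AbstractArray{T,N}, p::Integer; nspatial::Integer = N - 2) where {T,N}
    pad = ntuple(i -> i <= nspatial ? Int(p) : 0, N)
    return ZeroPadded{T,N,typeof(x)}(x, pad)
end

Base.size(a::ZeroPadded) = size(a.parent) .+ 2 .* a.pad

function Base.getindex(a::ZeroPadded{T,N}, I::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(a, I...)
    J = I .- a.pad   # corresponding index into the parent array
    # Inside the parent: read through. Outside: this is the zero border.
    return checkbounds(Bool, a.parent, J...) ? a.parent[J...] : zero(T)
end

x = ones(2, 2, 1, 1)
px = lazypad(x, 1)
size(px)          # (4, 4, 1, 1)
px[1, 1, 1, 1]    # 0.0 (border)
px[2, 2, 1, 1]    # 1.0 (read from x)
```

Because `ZeroPadded` is an `AbstractArray`, any convolution written against generic indexing would accept it directly, although scalar indexing through a wrapper like this is unlikely to be as fast as a fused implementation.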

@FluxML FluxML deleted a comment from codecov-io Nov 21, 2017
@MikeInnes
Member Author

@iblis17 the API is pretty simple, we just define a weight like

w = randn(2, 2, 3, 5)

which is a 2x2 convolution from 3 channels to 5. Then we can call it with

im = rand(100, 100, 3)
conv(im, w)

A nice property of this is that you can drop trailing dimensions of the image (particularly the batch dimension, which is implicitly 1 in this case). It's also completely generic across number of dimensions, which seems a lot nicer than having several convNd functions up to some arbitrary N. (Most current systems do this, but only because being generic is impractical in other languages). Would be interested to hear of potential downsides though.
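
For readers who want to see the shape conventions in action, the following is a minimal, deliberately slow sketch of a dimension-generic convolution with the layout described above. It is not the code from this PR: the name `naive_conv` is hypothetical, it performs a "valid" cross-correlation (no kernel flipping, no padding, stride 1), and it always returns the batch dimension even when the input omitted it.

```julia
# Hypothetical naive convolution, generic over the number of spatial dims.
# x: (spatial..., Cin, N) with trailing dims optionally dropped
# w: (spatial..., Cin, Cout)
function naive_conv(x::AbstractArray{T,M}, w::AbstractArray{S,K}) where {T,S,M,K}
    D = K - 2                                    # number of spatial dimensions
    # size(x, i) is 1 for i > ndims(x), so dropped trailing dims are restored.
    xs = reshape(x, ntuple(i -> size(x, i), D + 2))
    Cin, Cout = size(w, D + 1), size(w, D + 2)
    size(xs, D + 1) == Cin || throw(DimensionMismatch("channel count mismatch"))
    N = size(xs, D + 2)
    ksz = ntuple(i -> size(w, i), D)                    # kernel extent
    osz = ntuple(i -> size(xs, i) - ksz[i] + 1, D)      # "valid" output extent
    y = zeros(promote_type(T, S), osz..., Cout, N)
    for n in 1:N, co in 1:Cout, ci in 1:Cin
        for o in CartesianIndices(osz), k in CartesianIndices(ksz)
            # Input position = output position shifted by the kernel offset.
            i = CartesianIndex(Tuple(o) .+ Tuple(k) .- 1)
            y[o, co, n] += xs[i, ci, n] * w[k, ci, co]
        end
    end
    return y
end

w = randn(2, 2, 3, 5)
img = rand(100, 100, 3)
size(naive_conv(img, w))                         # (99, 99, 5, 1)
size(naive_conv(rand(10, 3, 4), randn(3, 3, 2))) # (8, 2, 4), a 1D convolution
```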

@dfdx
Contributor

dfdx commented Nov 22, 2017

A nice property of this is that you can drop trailing dimensions of the image (particularly the batch dimension, which is implicitly 1 in this case). It's also completely generic across number of dimensions, which seems a lot nicer than having several convNd functions up to some arbitrary N

Not quite, this way you wouldn't be able to distinguish between a single 2D image and a batch of 1D inputs, or a single 3D image and a batch of 2D ones. Maybe we can properly dispatch using the second argument, but it still doesn't sound like a clear API to me.
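
The ambiguity, and the idea of resolving it from the second argument, can be illustrated with a short sketch. The function names `nspatial` and `interpret_input` are hypothetical, and the kernel layout (spatial..., Cin, Cout) is assumed from the PR description.

```julia
# Assumption: the kernel is laid out as (spatial..., Cin, Cout), so its rank
# fixes the number of spatial dimensions.
nspatial(w::AbstractArray) = ndims(w) - 2

# Hypothetical helper: decide how to read the shape of `x` given the kernel.
function interpret_input(x::AbstractArray, w::AbstractArray)
    d = nspatial(w)
    ndims(x) <= d + 2 || throw(DimensionMismatch("input has too many dimensions"))
    # size(x, i) is 1 past the last dimension, so dropped dims read as 1.
    return (spatial = ntuple(i -> size(x, i), d),
            channels = size(x, d + 1),
            batch = size(x, d + 2))
end

x = rand(8, 8, 3)
interpret_input(x, rand(2, 2, 3, 5))  # (spatial = (8, 8), channels = 3, batch = 1)
interpret_input(x, rand(2, 8, 5))     # (spatial = (8,), channels = 8, batch = 3)
```

The same 8x8x3 array reads as one 2D image or as a batch of three 1D signals depending only on the kernel's rank, which is the behaviour being questioned here.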

you can just write conv(pad(x, 2), w)

I believe we need to preserve the way people use convolutions in other languages/libraries, which is via keyword arguments. Also, it's again unclear how to map this to the cuDNN case.

Anyway, I'm more worried about strides. I don't think you can implement them using any kind of view, but even if you can, this would break array memory contiguity and, I suppose, invalidate or slow down some algorithms.

All this stuff requires quite a lot of investigation, I should say.
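
As a small data point for that investigation, Base Julia does support non-copying views with a step, and they show exactly the contiguity trade-off raised above. The sketch below only demonstrates the view mechanics on a plain matrix; it makes no claim about how a strided convolution should be implemented.

```julia
x = reshape(collect(1.0:36.0), 6, 6)

# Every second element along both dimensions, without copying.
v = @view x[1:2:end, 1:2:end]

size(v)      # (3, 3)
strides(x)   # (1, 6): column-major, elements of a column are adjacent
strides(v)   # (2, 12): neighbouring elements of the view are not adjacent

# Writing through the view changes the parent, so no copy was made.
v[1, 1] = -1.0
x[1, 1]      # -1.0
```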

If you do the above you will also get gradients for free (though you might still want fused versions of common cases).

Once again, strides are harder to handle than padding. Also keep in mind pooling, which requires strides ~99% of the time, and gradients for them should be thought out separately.
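
To show where the stride enters in pooling, here is a minimal sketch of a naive 2D max pool over a (W, H, C, N) array. The name `naive_maxpool` and its keyword are hypothetical; the stride defaults to the window size, which is the common non-overlapping case, and gradients are not addressed.

```julia
# Hypothetical naive 2D max pooling with a k x k window.
function naive_maxpool(x::AbstractArray{T,4}, k::Integer; stride::Integer = k) where {T}
    W, H, C, N = size(x)
    ow = (W - k) ÷ stride + 1      # number of window positions along width
    oh = (H - k) ÷ stride + 1      # number of window positions along height
    y = similar(x, ow, oh, C, N)
    for n in 1:N, c in 1:C, j in 1:oh, i in 1:ow
        i0 = (i - 1) * stride + 1  # top-left corner of the window
        j0 = (j - 1) * stride + 1
        y[i, j, c, n] = maximum(@view x[i0:i0+k-1, j0:j0+k-1, c, n])
    end
    return y
end

x = reshape(collect(1.0:16.0), 4, 4, 1, 1)
naive_maxpool(x, 2)[:, :, 1, 1]   # [6.0 14.0; 8.0 16.0]
```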

@ChrisRackauckas
Member

w = randn(2, 2, 3, 5)

How come the weight is like that instead of rand(2,2,3)? It seems to me like you're talking about a 2x2 weight stencil for 3 channels, so I don't understand what the 5 is for.

@ChrisRackauckas
Member

Does this accept static array stencils?

@iblislin
Contributor

How come the weight is like that instead of rand(2,2,3)? It seems to me like you're talking about a 2x2 weight stencil for 3 channels, so I don't understand what the 5 is for.

5 indicates there are 5 filters (each filter is 2x2 with 3 channels)

@ChrisRackauckas
Member

ChrisRackauckas commented Feb 15, 2018

But if the channels are independent (the stencil doesn't apply between channels, just on each channel), is this the same operation as if it were reshaped to rand(2,2,15)?

@iblislin
Contributor

iblislin commented Feb 15, 2018

But if the channels are independent, is this the same operation as if it were reshaped to rand(2,2,15)?

Well, I think it depends on your data. There is a slight difference.

Consider the 2x2x3x5 case: the image input is in R,G,B order, so the filter learns the relationship between R, G and B. So, in the testing phase, we won't want to change the R,G,B input order.

For the "channels are independent" case, please check out depth-wise convolution.
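
The difference between the two kernel shapes can be sketched as follows. Both functions are hypothetical, naive 2D illustrations written for this comparison (they are not from the PR): the first mixes all input channels into each output channel, while the second filters each input channel on its own, assuming the 15 stencils are grouped as 5 per input channel.

```julia
# Full convolution: each output channel sums over all input channels.
# x: (W, H, Cin), w: (kw, kh, Cin, Cout)  ->  (W-kw+1, H-kh+1, Cout)
function full_conv2d(x::AbstractArray{<:Real,3}, w::AbstractArray{<:Real,4})
    kw, kh, cin, cout = size(w)
    ow, oh = size(x, 1) - kw + 1, size(x, 2) - kh + 1
    y = zeros(ow, oh, cout)
    for co in 1:cout, ci in 1:cin, j in 1:oh, i in 1:ow
        y[i, j, co] += sum(x[i:i+kw-1, j:j+kh-1, ci] .* w[:, :, ci, co])
    end
    return y
end

# Depth-wise convolution: each input channel is filtered independently.
# w: (kw, kh, Cin * m) holds m stencils per input channel (assumed grouping).
function depthwise_conv2d(x::AbstractArray{<:Real,3}, w::AbstractArray{<:Real,3})
    kw, kh, cm = size(w)
    cin = size(x, 3)
    cm % cin == 0 || throw(DimensionMismatch("stencil count must be a multiple of Cin"))
    m = cm ÷ cin
    ow, oh = size(x, 1) - kw + 1, size(x, 2) - kh + 1
    y = zeros(ow, oh, cm)
    for ci in 1:cin, q in 1:m, j in 1:oh, i in 1:ow
        c = (ci - 1) * m + q       # output channel fed only by input channel ci
        y[i, j, c] = sum(x[i:i+kw-1, j:j+kh-1, ci] .* w[:, :, c])
    end
    return y
end

x = ones(4, 4, 3)
yf = full_conv2d(x, ones(2, 2, 3, 5))      # size (3, 3, 5), every entry 12.0
yd = depthwise_conv2d(x, ones(2, 2, 15))   # size (3, 3, 15), every entry 4.0
```

With the 2x2x3x5 kernel each output value sums over 2*2*3 inputs and there are 5 output channels; with the 2x2x15 depth-wise kernel each value sums over only 2*2 inputs and there are 15 output channels, so the two are not the same operation.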

@staticfloat
Contributor

Superseded by an absolutely mind-bogglingly less efficient amount of code (in terms of SLOC) in #94.

ToucheSir pushed a commit that referenced this pull request Feb 13, 2023
Add group support for convolutions
5 participants