
Commit

Re-organise "built-in layers" section (#2112)
* re-organise built-in layers section

* fixup

* move create_bias somewhere more logical

* add a warning about other AD breaking automagic train mode

* remove mention of at-layer macro for now

* fix some links

Co-authored-by: Saransh Chopra <[email protected]>

mcabbott and Saransh-cpp authored Nov 30, 2022
1 parent bf3cf8b commit a8dfcfc
Showing 4 changed files with 85 additions and 32 deletions.
1 change: 0 additions & 1 deletion docs/src/models/basics.md
@@ -233,5 +233,4 @@ Affine(3 => 1, bias=false, init=ones) |> gpu

```@docs
Functors.@functor
Flux.create_bias
```
110 changes: 80 additions & 30 deletions docs/src/models/layers.md
@@ -1,86 +1,136 @@
# Basic Layers
# Built-in Layer Types

If you started at the beginning of the guide, then you have already met the
basic [`Dense`](@ref) layer, and seen [`Chain`](@ref) for combining layers.
These core layers form the foundation of almost all neural networks.

The `Dense` layer exemplifies several features:

* It contains an [activation function](@ref man-activations), which is broadcast over the output. Because this broadcast can be fused with other operations, doing so is more efficient than applying the activation function separately.

* It takes an `init` keyword, which accepts a function acting like `rand`. That is, `init(2,3,4)` should create an array of this size. Flux has [many such functions](@ref man-init-funcs) built-in. All make a CPU array, moved later with [`gpu`](@ref Flux.gpu) if desired.

* The bias vector is always initialised with [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keep the bias permanently zero.

* It is annotated with [`@functor`](@ref Functors.@functor), which means that [`params`](@ref Flux.params) will see the contents, and [`gpu`](@ref Flux.gpu) will move its arrays to the GPU.
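
Putting these together, here is a minimal sketch; the layer sizes, the `relu` activation, and the choice of `Flux.randn32` as `init` are picked purely for illustration:

```julia
using Flux

# A rough sketch of the keywords above; sizes and init function chosen for illustration.
layer = Dense(3 => 2, relu; init=Flux.randn32, bias=false)

x = rand(Float32, 3, 5)   # 3 features, a batch of 5 samples
size(layer(x))            # (2, 5); relu has been broadcast over the output
```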

By contrast, `Chain` itself contains no parameters, but connects other layers together.
The section on [dataflow layers](@ref man-dataflow-layers) introduces other layers like this.

## Fully Connected

```@docs
Chain
Dense
Flux.Bilinear
Flux.Scale
```

## Convolution and Pooling Layers
Perhaps `Scale` isn't quite fully connected, but it may be thought of as `Dense(Diagonal(s.weights), s.bias)`, and LinearAlgebra's `Diagonal` is a matrix which just happens to contain many zeros.
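
As a rough illustration of that equivalence, a `Scale` layer simply rescales each feature and adds a bias (the numbers below are made up):

```julia
using Flux

# A small sketch; the scale values are invented for the example.
s = Flux.Scale([1f0, 2f0, 3f0])   # the bias defaults to a zero vector of matching length
s([10f0, 10f0, 10f0])             # ≈ [10, 20, 30], i.e. elementwise scale .* x .+ bias
```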

## Convolution Models

These layers are used to build convolutional neural networks (CNNs).

They all expect images in what is called WHCN order: a batch of 32 colour images, each 50 x 50 pixels, will have `size(x) == (50, 50, 3, 32)`. A single grayscale image might instead have `size(x) == (28, 28, 1, 1)`.

Besides 2D data such as images, they also work with 1D data, where for instance a stereo sound recording with 1000 samples might have `size(x) == (1000, 2, 1)`. They will also work with 3D data, where `ndims(x) == 5`, and again the last two dimensions are channel and batch.

To understand how strides and padding work, the article by [Dumoulin & Visin](https://arxiv.org/abs/1603.07285) has great illustrations.
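
For instance, a brief sketch of the WHCN convention; the kernel size and channel counts below are arbitrary choices, not recommendations:

```julia
using Flux

x = rand(Float32, 50, 50, 3, 32)                    # 50×50 pixels, 3 colour channels, batch of 32
layer = Conv((5, 5), 3 => 7, relu; pad=SamePad())   # 5×5 kernel, 3 input and 7 output channels
size(layer(x))                                      # (50, 50, 7, 32); SamePad preserves the spatial size
```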

```@docs
Conv
Conv(weight::AbstractArray)
AdaptiveMaxPool
MaxPool
GlobalMaxPool
AdaptiveMeanPool
MeanPool
GlobalMeanPool
DepthwiseConv
ConvTranspose
ConvTranspose(weight::AbstractArray)
CrossCor
CrossCor(weight::AbstractArray)
DepthwiseConv
SamePad
Flux.flatten
```

## Upsampling Layers
### Pooling

These layers are commonly used after a convolution layer, and reduce the size of its output. They have no trainable parameters.
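
For example (array sizes chosen only for illustration):

```julia
using Flux

x = rand(Float32, 28, 28, 1, 16)
size(MaxPool((2, 2))(x))   # (14, 14, 1, 16): each non-overlapping 2×2 window reduced to its maximum
```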

```@docs
AdaptiveMaxPool
MaxPool
GlobalMaxPool
AdaptiveMeanPool
MeanPool
GlobalMeanPool
```

## Upsampling

The opposite of pooling, these layers increase the size of an array. They have no trainable parameters.
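
A quick sketch, assuming nearest-neighbour mode and a made-up scale factor:

```julia
using Flux

x = rand(Float32, 10, 10, 3, 1)
size(Upsample(scale = 2)(x))   # (20, 20, 3, 1): both spatial dimensions doubled
```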

```@docs
Upsample
PixelShuffle
```

## Recurrent Layers
## Embedding Vectors

Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).
These layers accept an index, and return a vector (or several indices, and several vectors). The possible embedding vectors are learned parameters.
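
A rough sketch, assuming the `in => out` constructor form; the vocabulary size and vector length are invented:

```julia
using Flux

emb = Flux.Embedding(26 => 4)   # 26 possible indices, each mapped to a learned length-4 vector
size(emb(3))                    # (4,)
size(emb([7, 7, 13]))           # (4, 3): one column per index
```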

```@docs
RNN
LSTM
GRU
GRUv3
Flux.Recur
Flux.reset!
Flux.Embedding
Flux.EmbeddingBag
```

## Other General Purpose Layers
## [Dataflow Layers, or Containers](@id man-dataflow-layers)

These are marginally more obscure than the Basic Layers.
But in contrast to the layers described in the other sections are not readily grouped around a particular purpose (e.g. CNNs or RNNs).
The basic `Chain(F, G, H)` applies the layers it contains in sequence, equivalent to `H ∘ G ∘ F`. Flux has some other layers which contain layers, but connect them up in a more complicated way: `SkipConnection` allows ResNet's residual connection.
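
For instance, a minimal sketch of a residual-style block; the inner layers and sizes here are invented for the example:

```julia
using Flux

inner = Chain(Dense(4 => 4, relu), Dense(4 => 4))
model = SkipConnection(inner, +)   # output is inner(x) + x
x = rand(Float32, 4, 2)
size(model(x))                     # (4, 2)
```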

```@docs
Chain
Flux.activations
Maxout
SkipConnection
Parallel
Flux.Bilinear
Flux.Scale
Flux.Embedding
PairwiseFusion
```

## Recurrent Models

Much like the core layers above, these can be used to process sequence data (as well as other kinds of structured data).
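
As a sketch of how state is carried between calls (assuming the `in => out` constructor form, with made-up dimensions):

```julia
using Flux

r = RNN(2 => 5)                         # 2 inputs, 5-dimensional hidden state
Flux.reset!(r)                          # clear the state before starting a new sequence
xs = [rand(Float32, 2) for _ in 1:3]    # a sequence of 3 time steps
ys = [r(x) for x in xs]                 # each call updates the hidden state
size(ys[end])                           # (5,)
```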

```@docs
RNN
LSTM
GRU
GRUv3
Flux.Recur
Flux.reset!
```

## Normalisation & Regularisation

These layers don't affect the structure of the network but may improve training times or reduce overfitting.
These layers don't affect the structure of the network but may improve training times or reduce overfitting. Some of them contain trainable parameters, while others do not.
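
As a rough comparison of which of these layers carry trainable parameters:

```julia
using Flux

length(Flux.params(BatchNorm(3)))   # 2 trainable arrays: a shift β and a scale γ per channel
length(Flux.params(Dropout(0.5)))   # 0: dropout has nothing to train
```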

```@docs
Flux.normalise
BatchNorm
Dropout
Flux.dropout
AlphaDropout
LayerNorm
InstanceNorm
GroupNorm
Flux.normalise
Flux.dropout
```

### Testmode
### Test vs. Train

Several normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference.

!!! warning
    This automatic train/test detection works best with Zygote, the default
    automatic differentiation package. It may not work with other packages
    such as Tracker, Yota, or ForwardDiff.

Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `Flux.testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
The functions `Flux.trainmode!` and `Flux.testmode!` let you manually specify which behaviour you want. When called on a model, they will place all layers within the model into the specified mode.
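
A brief sketch of this manual control; the model itself is arbitrary:

```julia
using Flux

m = Chain(Dense(10 => 10), Dropout(0.5), BatchNorm(10))
Flux.testmode!(m)             # all layers inside m now use their inference behaviour
y = m(rand(Float32, 10, 3))
Flux.trainmode!(m)            # force training behaviour, overriding automatic detection
```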

```@docs
Flux.testmode!
3 changes: 2 additions & 1 deletion docs/src/utilities.md
@@ -1,4 +1,4 @@
# Random Weight Initialisation
# [Random Weight Initialisation](@id man-init-funcs)

Flux initialises convolutional layers and recurrent cells with `glorot_uniform` by default.
Most layers accept a function as an `init` keyword, which replaces this default. For example:
@@ -42,6 +42,7 @@ Flux.ones32
Flux.zeros32
Flux.rand32
Flux.randn32
Flux.create_bias
```

These functions call:
3 changes: 3 additions & 0 deletions src/layers/basic.jl
@@ -182,6 +182,9 @@ function Base.show(io::IO, l::Dense)
print(io, ")")
end

Dense(W::LinearAlgebra.Diagonal, bias = true, σ = identity) =
  Scale(W.diag, bias, σ)

"""
Scale(size::Integer..., σ=identity; bias=true, init=ones32)
Scale(scale::AbstractArray, [bias, σ])
