Commit: tweaks

mcabbott committed Sep 29, 2022
1 parent c874eb7, commit f8d5cd0
Showing 8 changed files with 41 additions and 32 deletions.
6 changes: 3 additions & 3 deletions docs/make.jl
@@ -30,7 +30,7 @@ makedocs(
"Training" => "training/training.md",
"Regularisation" => "models/regularisation.md",
"Loss Functions 📚" => "models/losses.md",
"Optimisation Rules 📚" => "training/optimisers.md", # TODO move optimiser intro up to Training, destructure to new section
"Optimisation Rules 📚" => "training/optimisers.md", # TODO move optimiser intro up to Training
"Callback Helpers 📚" => "training/callbacks.md",
"Zygote.jl 📚 (`gradient`, ...)" => "training/zygote.md",
],
@@ -44,8 +44,8 @@ makedocs(
],
"Performance Tips" => "performance.md",
"Flux's Ecosystem" => "ecosystem.md",
"Tutorials" => [ # TODO, maybe
],
# "Tutorials" => [ # TODO, maybe
# ],
],
format = Documenter.HTML(
sidebar_sitename = false,
10 changes: 4 additions & 6 deletions docs/src/data/mlutils.md
@@ -1,25 +1,23 @@
- # Working with data using MLUtils.jl
+ # Working with Data, using MLUtils.jl

Flux re-exports the `DataLoader` type and utility functions for working with
data from [MLUtils](https://github.com/JuliaML/MLUtils.jl).

- ## DataLoader
+ ## `DataLoader`

- `DataLoader` can be used to handle iteration over mini-batches of data.
+ The `DataLoader` can be used to create mini-batches of data, in the format [`train!`](@ref) expects.

`Flux`'s website has a [dedicated tutorial](https://fluxml.ai/tutorials/2021/01/21/data-loader.html) on `DataLoader` for more information.
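
As an illustration (not part of this commit), here is a minimal sketch of typical usage; the array names and sizes are made up for the example:

```julia
using Flux  # re-exports DataLoader from MLUtils

X = rand(Float32, 10, 100)  # 100 samples with 10 features each, as columns
Y = rand(Float32, 2, 100)   # matching targets

loader = Flux.DataLoader((X, Y), batchsize=32, shuffle=true)

for (x, y) in loader
    # x is 10×32 and y is 2×32, except for the final batch,
    # which holds the remaining 4 samples.
    size(x), size(y)
end
```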

```@docs
MLUtils.DataLoader
```

- ## Utility functions for working with data
+ ## Utility Functions

The utility functions are meant to be used while working with data;
these functions help create inputs for your models or batch your dataset.

Below is a non-exhaustive list of such utility functions.
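
As a quick illustration of two of them (a sketch, not part of this commit; the shapes are made up):

```julia
using Flux  # unsqueeze and flatten are re-exported from MLUtils

x = rand(Float32, 28, 28, 3)       # say, one 28×28 RGB image
batch = Flux.unsqueeze(x; dims=4)  # 28×28×3×1, adds a trailing batch dimension
Flux.flatten(batch)                # 2352×1, flattens all but the last dimension
```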

```@docs
MLUtils.unsqueeze
MLUtils.flatten
```
14 changes: 11 additions & 3 deletions docs/src/destructure.md
@@ -3,8 +3,6 @@

A Flux model is a nested structure, with parameters stored within many layers. Sometimes you may want a flat representation of them, to interact with functions expecting just one vector. This is provided by `destructure`.

- For example, this computes the Hessian `∂²L/∂θᵢ∂θⱼ` of some loss function, with respect to all parameters of a Flux model. The resulting matrix has off-diagonal entries, which cannot really be expressed in a nested structure:
-
```julia
julia> model = Chain(Dense(2=>1, tanh), Dense(1=>1))
Chain(
@@ -15,6 +13,16 @@ Chain(
julia> flat, rebuild = Flux.destructure(model)
(Float32[0.863101, 1.2454957, 0.0, -1.6345707, 0.0], Restructure(Chain, ..., 5))

+ julia> rebuild(zeros(5)) # same structure, new parameters
+ Chain(
+   Dense(2 => 1, tanh),  # 3 parameters (all zero)
+   Dense(1 => 1),        # 2 parameters (all zero)
+ )  # Total: 4 arrays, 5 parameters, 276 bytes.
+ ```
+
+ This can be used within gradient computations. For instance, this computes the Hessian `∂²L/∂θᵢ∂θⱼ` of some loss function, with respect to all parameters of the Flux model. The resulting matrix has off-diagonal entries, which cannot really be expressed in a nested structure:
+
+ ```
julia> x = rand(Float32, 2, 16);
julia> grad = gradient(m -> sum(abs2, m(x)), model) # nested gradient
@@ -26,7 +34,7 @@ julia> function loss(v::Vector)
sum(abs2, y)
end;
- julia> gradient(loss, flat) # same numbers
+ julia> gradient(loss, flat) # flat gradient, same numbers
(Float32[10.339018, 11.379145, 22.845667, -29.565302, -37.644184],)
julia> Zygote.hessian(loss, flat) # second derivative
```
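
One sketch of why the flat form is useful (illustrative, not part of this commit): with the gradient and Hessian above, a plain Newton step can be taken directly on the flat vector, and `rebuild` then recovers the nested model. In practice `H` may be ill-conditioned, so a real second-order method would damp or regularise this step.

```julia
g = gradient(loss, flat)[1]     # flat gradient vector, as above
H = Zygote.hessian(loss, flat)  # 5×5 matrix, as above
flat2 = flat .- H \ g           # one (undamped) Newton update on the flat parameters
model2 = rebuild(flat2)         # back to a Chain with the new parameters
```
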
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -12,15 +12,15 @@ Download [Julia 1.6](https://julialang.org/downloads/) or later, preferably the

This will automatically install several other packages, including [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) which supports Nvidia GPUs. To directly access some of its functionality, you may want to run `] add CUDA` too. The page on [GPU support](gpu.md) has more details.
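
For reference, a sketch of the same installation in Pkg's functional form (equivalent to the `]` commands above; not part of this commit):

```julia
using Pkg
Pkg.add("Flux")  # same as typing `] add Flux` at the REPL prompt
Pkg.add("CUDA")  # optional, for direct access to CUDA.jl functionality
```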

- Other closely associated packages, which are also installed, are [Zygote.jl](https://github.com/FluxML/Zygote.jl), [Optimisers.jl](https://github.com/FluxML/Optimisers.jl), [NNlib.jl](https://github.com/FluxML/NNlib.jl), [Functors.jl](https://github.com/FluxML/Functors.jl) and [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl).
+ Other closely associated packages, also installed automatically, include [Zygote](https://github.com/FluxML/Zygote.jl), [Optimisers](https://github.com/FluxML/Optimisers.jl), [NNlib](https://github.com/FluxML/NNlib.jl), [Functors](https://github.com/FluxML/Functors.jl) and [MLUtils](https://github.com/JuliaML/MLUtils.jl).

## Learning Flux

The [quick start](models/quickstart.md) page trains a simple neural network.

The rest of this documentation provides a from-scratch introduction to Flux's take on models and how they work, starting with [fitting a line](models/overview.md). Once you understand these docs, congratulations, you also understand [Flux's source code](https://github.com/FluxML/Flux.jl), which is intended to be concise, legible and a good reference for more advanced concepts.

- Sections with 📚 contain API listings. The same text is avalable at the Julia prompt by typing `?gpu`.
+ Sections with 📚 contain API listings. The same text is available at the Julia prompt, by typing for example `?gpu`.

If you just want to get started writing models, the [model zoo](https://github.com/FluxML/model-zoo/) gives good starting points for many common ones.

2 changes: 1 addition & 1 deletion docs/src/models/functors.md
@@ -4,7 +4,7 @@ Flux models are deeply nested structures, and [Functors.jl](https://github.com/F

New layers should be annotated using the `Functors.@functor` macro. This will enable [`params`](@ref Flux.params) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU.

- `Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](@ref Advanced-Model-Building-and-Customisation) page covers the use cases of `Functors` in greater details.
+ `Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](../models/advanced.md) page covers the use cases of `Functors` in greater detail.
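
As a minimal sketch of the annotation (the `Affine` layer here is a made-up example, not part of this commit):

```julia
using Flux, Functors

struct Affine
    W
    b
end

Affine(in::Int, out::Int) = Affine(randn(Float32, out, in), zeros(Float32, out))

(m::Affine)(x) = m.W * x .+ m.b

Functors.@functor Affine  # lets params, gpu, cpu, etc. reach W and b

Flux.params(Affine(3, 2))  # Params holding the two arrays
```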

```@docs
Functors.@functor
```
25 changes: 14 additions & 11 deletions docs/src/models/quickstart.md
@@ -1,25 +1,22 @@
- # Flux Neural Networks in One Minute
+ # A Neural Network in One Minute

If you have used neural networks before, then this simple example might be helpful for seeing how the major parts of Flux work together. Try pasting the code into the REPL prompt.

- If you haven't, then you might prefer the [Fitting a Straight Line](models/overview.jl) page.
+ If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# With Julia 1.7+, this will prompt if necessary to install everything, including CUDA:
- using Flux, Plots, Statistics
+ using Flux, Statistics

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
truth = map(col -> xor(col...), eachcol(noisy .> 0.5)) # 1000-element Vector{Bool}

- p_true = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=truth, lab="", title="True classification")
-
# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2), softmax)

# The model encapsulates parameters, randomly initialised. Its initial output is:
- out = model(noisy) # 2×1000 Matrix{Float32}
- p_raw = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out[1,:], legend=false, title="Untrained network")
+ out1 = model(noisy) # 2×1000 Matrix{Float32}

# To train the model, we use batches of 64 samples:
mat = Flux.onehotbatch(truth, [true, false]) # 2×1000 OneHotMatrix
@@ -29,7 +26,7 @@ first(data) .|> summary # ("2×64 Matr
pars = Flux.params(model) # contains references to arrays in model
opt = Flux.Adam(0.01) # will store optimiser momentum, etc.

- # Training loop, using whole data set 1000 times:
+ # Training loop, using the whole data set 1000 times:
for epoch in 1:1_000
Flux.train!(pars, data, opt) do x, y
# First argument of train! is a loss function, here defined by a `do` block.
@@ -43,13 +40,19 @@ opt
out2 = model(noisy)

mean((out2[1,:] .> 0.5) .== truth) # accuracy 94% so far!

- p_done = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out2[1,:], legend=false, title="Trained network")
- plot(p_true, p_raw, p_done, layout=(1,3), size=(1000,330)) # combined plot, shown below
```

![](../assets/oneminute.png)

+ ```
+ using Plots # to draw the above figure
+ p_true = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=truth, lab="", title="True classification")
+ p_raw = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out1[1,:], legend=false, title="Untrained network")
+ p_done = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out2[1,:], legend=false, title="Trained network")
+ plot(p_true, p_raw, p_done, layout=(1,3), size=(1000,330))
+ ```

This XOR ("exclusive or") problem is a variant of the famous one which drove Minsky and Papert to invent deep neural networks in 1969. For small values of "deep" -- this has one hidden layer, while earlier perceptrons had none. (What they call a hidden layer, Flux calls the output of the first layer, `model[1](noisy)`.)

4 changes: 2 additions & 2 deletions docs/src/training/zygote.md
@@ -1,13 +1,13 @@
# Automatic Differentiation using Zygote.jl

- Flux re-exports the `gradient` from [Zygote](https://github.com/FluxML/Zygote.jl), and uses this function within [`train!`](@ref) to differentiate the model. Zygote has its own [documentation](https://fluxml.ai/Zygote.jl/dev/), in particulat listing some [limitations](https://fluxml.ai/Zygote.jl/dev/limitations/).
+ Flux re-exports the `gradient` from [Zygote](https://github.com/FluxML/Zygote.jl), and uses this function within [`train!`](@ref) to differentiate the model. Zygote has its own [documentation](https://fluxml.ai/Zygote.jl/dev/), in particular listing some [important limitations](https://fluxml.ai/Zygote.jl/dev/limitations/).

### Implicit style

Flux uses primarily what Zygote calls "implicit" gradients, [described here](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1) in its documentation.
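
A minimal sketch of the implicit style (illustrative, not part of this commit):

```julia
using Flux

model = Dense(2 => 1)
x = rand(Float32, 2, 8)

ps = Flux.params(model)  # implicit container of the parameter arrays
gs = gradient(() -> sum(abs2, model(x)), ps)  # returns a Zygote.Grads
gs[model.weight]  # gradients are looked up by the parameter arrays themselves
```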

```@docs
- Zygote.gradient(f, pars::Zygote.Params)
+ Zygote.gradient
Zygote.Params
Zygote.Grads
```
8 changes: 4 additions & 4 deletions src/outputsize.jl
@@ -65,7 +65,7 @@ If `m` is a `Tuple` or `Vector`, its elements are applied in sequence, like `Cha
```julia-repl
julia> using Flux: outputsize
- julia> outputsize(Dense(10, 4), (10,); padbatch=true)
+ julia> outputsize(Dense(10 => 4), (10,); padbatch=true)
(4, 1)
julia> m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32));
@@ -84,7 +84,7 @@ julia> try outputsize(m, (10, 10, 7, 64)) catch e println(e) end
└ @ Flux ~/.julia/dev/Flux/src/outputsize.jl:114
DimensionMismatch("Input channels must match! (7 vs. 3)")
- julia> outputsize([Dense(10, 4), Dense(4, 2)], (10, 1)) # Vector of layers becomes a Chain
+ julia> outputsize([Dense(10 => 4), Dense(4 => 2)], (10, 1)) # Vector of layers becomes a Chain
(2, 1)
```
"""
@@ -121,12 +121,12 @@ this returns `size(m((x, y, ...)))` given `size_x = size(x)`, etc.
```jldoctest
julia> x, y = rand(Float32, 5, 64), rand(Float32, 7, 64);
- julia> par = Parallel(vcat, Dense(5, 9), Dense(7, 11));
+ julia> par = Parallel(vcat, Dense(5 => 9), Dense(7 => 11));
julia> Flux.outputsize(par, (5, 64), (7, 64))
(20, 64)
- julia> m = Chain(par, Dense(20, 13), softmax);
+ julia> m = Chain(par, Dense(20 => 13), softmax);
julia> Flux.outputsize(m, (5,), (7,); padbatch=true)
(13, 1)
```
