Commit: tweaks

mcabbott committed Sep 29, 2022
1 parent c874eb7, commit f8d5cd0
Showing 8 changed files with 41 additions and 32 deletions.
6 changes: 3 additions & 3 deletions docs/make.jl
@@ -30,7 +30,7 @@ makedocs(
"Training" => "training/training.md",
"Regularisation" => "models/regularisation.md",
"Loss Functions 📚" => "models/losses.md",
"Optimisation Rules 📚" => "training/optimisers.md", # TODO move optimiser intro up to Training, destructure to new section
"Optimisation Rules 📚" => "training/optimisers.md", # TODO move optimiser intro up to Training
"Callback Helpers 📚" => "training/callbacks.md",
"Zygote.jl 📚 (`gradient`, ...)" => "training/zygote.md",
],
@@ -44,8 +44,8 @@ makedocs(
],
"Performance Tips" => "performance.md",
"Flux's Ecosystem" => "ecosystem.md",
"Tutorials" => [ # TODO, maybe
],
# "Tutorials" => [ # TODO, maybe
# ],
],
format = Documenter.HTML(
sidebar_sitename = false,
10 changes: 4 additions & 6 deletions docs/src/data/mlutils.md
@@ -1,25 +1,23 @@
- # Working with data using MLUtils.jl
+ # Working with Data, using MLUtils.jl

Flux re-exports the `DataLoader` type and utility functions for working with
data from [MLUtils](https://github.com/JuliaML/MLUtils.jl).

- ## DataLoader
+ ## `DataLoader`

- `DataLoader` can be used to handle iteration over mini-batches of data.
+ The `DataLoader` can be used to create mini-batches of data, in the format [`train!`](@ref) expects.

`Flux`'s website has a [dedicated tutorial](https://fluxml.ai/tutorials/2021/01/21/data-loader.html) on `DataLoader` for more information.
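
As an illustration (not part of this commit), here is a minimal sketch of typical usage; the array names and sizes are made up for the example:

```julia
using Flux  # re-exports DataLoader from MLUtils

X = rand(Float32, 10, 100)  # 100 samples with 10 features each, as columns
Y = rand(Float32, 2, 100)   # matching targets

loader = Flux.DataLoader((X, Y), batchsize=32, shuffle=true)

for (x, y) in loader
    # x is 10×32 and y is 2×32, except for the final batch,
    # which holds the remaining 4 samples.
    size(x), size(y)
end
```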

```@docs
MLUtils.DataLoader
```

- ## Utility functions for working with data
+ ## Utility Functions

The utility functions are meant to be used while working with data;
these functions help create inputs for your models or batch your dataset.

Below is a non-exhaustive list of such utility functions.
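
As a quick illustration of two of them (a sketch, not part of this commit; the shapes are made up):

```julia
using Flux  # unsqueeze and flatten are re-exported from MLUtils

x = rand(Float32, 28, 28, 3)       # say, one 28×28 RGB image
batch = Flux.unsqueeze(x; dims=4)  # 28×28×3×1, adds a trailing batch dimension
Flux.flatten(batch)                # 2352×1, flattens all but the last dimension
```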

```@docs
MLUtils.unsqueeze
MLUtils.flatten
```
14 changes: 11 additions & 3 deletions docs/src/destructure.md
@@ -3,8 +3,6 @@

A Flux model is a nested structure, with parameters stored within many layers. Sometimes you may want a flat representation of them, to interact with functions expecting just one vector. This is provided by `destructure`.

- For example, this computes the Hessian `∂²L/∂θᵢ∂θⱼ` of some loss function, with respect to all parameters of a Flux model. The resulting matrix has off-diagonal entries, which cannot really be expressed in a nested structure:
-
```julia
julia> model = Chain(Dense(2=>1, tanh), Dense(1=>1))
Chain(
@@ -15,6 +13,16 @@ Chain(
julia> flat, rebuild = Flux.destructure(model)
(Float32[0.863101, 1.2454957, 0.0, -1.6345707, 0.0], Restructure(Chain, ..., 5))

+ julia> rebuild(zeros(5)) # same structure, new parameters
+ Chain(
+   Dense(2 => 1, tanh),  # 3 parameters (all zero)
+   Dense(1 => 1),        # 2 parameters (all zero)
+ )  # Total: 4 arrays, 5 parameters, 276 bytes.
+ ```
+
+ This can be used within gradient computations. For instance, this computes the Hessian `∂²L/∂θᵢ∂θⱼ` of some loss function, with respect to all parameters of the Flux model. The resulting matrix has off-diagonal entries, which cannot really be expressed in a nested structure:
+
+ ```
julia> x = rand(Float32, 2, 16);
julia> grad = gradient(m -> sum(abs2, m(x)), model) # nested gradient
@@ -26,7 +34,7 @@ julia> function loss(v::Vector)
sum(abs2, y)
end;
- julia> gradient(loss, flat) # same numbers
+ julia> gradient(loss, flat) # flat gradient, same numbers
(Float32[10.339018, 11.379145, 22.845667, -29.565302, -37.644184],)
julia> Zygote.hessian(loss, flat) # second derivative
```
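
One sketch of why the flat form is useful (illustrative, not part of this commit): with the gradient and Hessian above, a plain Newton step can be taken directly on the flat vector, and `rebuild` then recovers the nested model. In practice `H` may be ill-conditioned, so a real second-order method would damp or regularise this step.

```julia
g = gradient(loss, flat)[1]     # flat gradient vector, as above
H = Zygote.hessian(loss, flat)  # 5×5 matrix, as above
flat2 = flat .- H \ g           # one (undamped) Newton update on the flat parameters
model2 = rebuild(flat2)         # back to a Chain with the new parameters
```
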
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -12,15 +12,15 @@ Download [Julia 1.6](https://julialang.org/downloads/) or later, preferably the

This will automatically install several other packages, including [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) which supports Nvidia GPUs. To directly access some of its functionality, you may want to run `] add CUDA` too. The page on [GPU support](gpu.md) has more details.
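
For reference, a sketch of the same installation in Pkg's functional form (equivalent to the `]` commands above; not part of this commit):

```julia
using Pkg
Pkg.add("Flux")  # same as typing `] add Flux` at the REPL prompt
Pkg.add("CUDA")  # optional, for direct access to CUDA.jl functionality
```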

- Other closely associated packages, which are also installed, are [Zygote.jl](https://github.com/FluxML/Zygote.jl), [Optimisers.jl](https://github.com/FluxML/Optimisers.jl), [NNlib.jl](https://github.com/FluxML/NNlib.jl), [Functors.jl](https://github.com/FluxML/Functors.jl) and [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl).
+ Other closely associated packages, also installed automatically, include [Zygote](https://github.com/FluxML/Zygote.jl), [Optimisers](https://github.com/FluxML/Optimisers.jl), [NNlib](https://github.com/FluxML/NNlib.jl), [Functors](https://github.com/FluxML/Functors.jl) and [MLUtils](https://github.com/JuliaML/MLUtils.jl).

## Learning Flux

The [quick start](models/quickstart.md) page trains a simple neural network.

The rest of this documentation provides a from-scratch introduction to Flux's take on models and how they work, starting with [fitting a line](models/overview.md). Once you understand these docs, congratulations, you also understand [Flux's source code](https://github.com/FluxML/Flux.jl), which is intended to be concise, legible and a good reference for more advanced concepts.

- Sections with 📚 contain API listings. The same text is avalable at the Julia prompt by typing `?gpu`.
+ Sections with 📚 contain API listings. The same text is available at the Julia prompt, by typing for example `?gpu`.

If you just want to get started writing models, the [model zoo](https://github.com/FluxML/model-zoo/) gives good starting points for many common ones.

2 changes: 1 addition & 1 deletion docs/src/models/functors.md
@@ -4,7 +4,7 @@ Flux models are deeply nested structures, and [Functors.jl](https://github.com/F

New layers should be annotated using the `Functors.@functor` macro. This will enable [`params`](@ref Flux.params) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU.

- `Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](@ref Advanced-Model-Building-and-Customisation) page covers the use cases of `Functors` in greater details.
+ `Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](../models/advanced.md) page covers the use cases of `Functors` in greater detail.
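
As a minimal sketch of the annotation (the `Affine` layer here is a made-up example, not part of this commit):

```julia
using Flux, Functors

struct Affine
    W
    b
end

Affine(in::Int, out::Int) = Affine(randn(Float32, out, in), zeros(Float32, out))

(m::Affine)(x) = m.W * x .+ m.b

Functors.@functor Affine  # lets params, gpu, cpu, etc. reach W and b

Flux.params(Affine(3, 2))  # Params holding the two arrays
```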

```@docs
Functors.@functor
```
25 changes: 14 additions & 11 deletions docs/src/models/quickstart.md
@@ -1,25 +1,22 @@
- # Flux Neural Networks in One Minute
+ # A Neural Network in One Minute

If you have used neural networks before, then this simple example might be helpful for seeing how the major parts of Flux work together. Try pasting the code into the REPL prompt.

- If you haven't, then you might prefer the [Fitting a Straight Line](models/overview.jl) page.
+ If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# With Julia 1.7+, this will prompt if necessary to install everything, including CUDA:
- using Flux, Plots, Statistics
+ using Flux, Statistics

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
truth = map(col -> xor(col...), eachcol(noisy .> 0.5)) # 1000-element Vector{Bool}

- p_true = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=truth, lab="", title="True classification")
-
# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2), softmax)

# The model encapsulates parameters, randomly initialised. Its initial output is:
- out = model(noisy) # 2×1000 Matrix{Float32}
- p_raw = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out[1,:], legend=false, title="Untrained network")
+ out1 = model(noisy) # 2×1000 Matrix{Float32}

# To train the model, we use batches of 64 samples:
mat = Flux.onehotbatch(truth, [true, false]) # 2×1000 OneHotMatrix
@@ -29,7 +26,7 @@ first(data) .|> summary # ("2×64 Matr
pars = Flux.params(model) # contains references to arrays in model
opt = Flux.Adam(0.01) # will store optimiser momentum, etc.

- # Training loop, using whole data set 1000 times:
+ # Training loop, using the whole data set 1000 times:
for epoch in 1:1_000
Flux.train!(pars, data, opt) do x, y
# First argument of train! is a loss function, here defined by a `do` block.
@@ -43,13 +40,19 @@ opt
out2 = model(noisy)

mean((out2[1,:] .> 0.5) .== truth) # accuracy 94% so far!

- p_done = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out2[1,:], legend=false, title="Trained network")
- plot(p_true, p_raw, p_done, layout=(1,3), size=(1000,330)) # combined plot, shown below
```

![](../assets/oneminute.png)

+ ```
+ using Plots # to draw the above figure
+ p_true = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=truth, lab="", title="True classification")
+ p_raw = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out1[1,:], legend=false, title="Untrained network")
+ p_done = Plots.scatter(noisy[1,:], noisy[2,:], zcolor=out2[1,:], legend=false, title="Trained network")
+ plot(p_true, p_raw, p_done, layout=(1,3), size=(1000,330))
+ ```

This XOR ("exclusive or") problem is a variant of the famous one which drove Minsky and Papert to invent deep neural networks in 1969. For small values of "deep" -- this has one hidden layer, while earlier perceptrons had none. (What they call a hidden layer, Flux calls the output of the first layer, `model[1](noisy)`.)

4 changes: 2 additions & 2 deletions docs/src/training/zygote.md
@@ -1,13 +1,13 @@
# Automatic Differentiation using Zygote.jl

- Flux re-exports the `gradient` from [Zygote](https://github.com/FluxML/Zygote.jl), and uses this function within [`train!`](@ref) to differentiate the model. Zygote has its own [documentation](https://fluxml.ai/Zygote.jl/dev/), in particulat listing some [limitations](https://fluxml.ai/Zygote.jl/dev/limitations/).
+ Flux re-exports the `gradient` from [Zygote](https://github.com/FluxML/Zygote.jl), and uses this function within [`train!`](@ref) to differentiate the model. Zygote has its own [documentation](https://fluxml.ai/Zygote.jl/dev/), in particular listing some [important limitations](https://fluxml.ai/Zygote.jl/dev/limitations/).

### Implicit style

Flux uses primarily what Zygote calls "implicit" gradients, [described here](https://fluxml.ai/Zygote.jl/dev/#Explicit-and-Implicit-Parameters-1) in its documentation.
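
A minimal sketch of the implicit style (illustrative, not part of this commit):

```julia
using Flux

model = Dense(2 => 1)
x = rand(Float32, 2, 8)

ps = Flux.params(model)  # implicit container of the parameter arrays
gs = gradient(() -> sum(abs2, model(x)), ps)  # returns a Zygote.Grads
gs[model.weight]  # gradients are looked up by the parameter arrays themselves
```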

```@docs
- Zygote.gradient(f, pars::Zygote.Params)
+ Zygote.gradient
Zygote.Params
Zygote.Grads
```
8 changes: 4 additions & 4 deletions src/outputsize.jl
@@ -65,7 +65,7 @@ If `m` is a `Tuple` or `Vector`, its elements are applied in sequence, like `Cha
```julia-repl
julia> using Flux: outputsize
- julia> outputsize(Dense(10, 4), (10,); padbatch=true)
+ julia> outputsize(Dense(10 => 4), (10,); padbatch=true)
(4, 1)
julia> m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32));
@@ -84,7 +84,7 @@ julia> try outputsize(m, (10, 10, 7, 64)) catch e println(e) end
└ @ Flux ~/.julia/dev/Flux/src/outputsize.jl:114
DimensionMismatch("Input channels must match! (7 vs. 3)")
- julia> outputsize([Dense(10, 4), Dense(4, 2)], (10, 1)) # Vector of layers becomes a Chain
+ julia> outputsize([Dense(10 => 4), Dense(4 => 2)], (10, 1)) # Vector of layers becomes a Chain
(2, 1)
```
"""
@@ -121,12 +121,12 @@ this returns `size(m((x, y, ...)))` given `size_x = size(x)`, etc.
```jldoctest
julia> x, y = rand(Float32, 5, 64), rand(Float32, 7, 64);
- julia> par = Parallel(vcat, Dense(5, 9), Dense(7, 11));
+ julia> par = Parallel(vcat, Dense(5 => 9), Dense(7 => 11));
julia> Flux.outputsize(par, (5, 64), (7, 64))
(20, 64)
- julia> m = Chain(par, Dense(20, 13), softmax);
+ julia> m = Chain(par, Dense(20 => 13), softmax);
julia> Flux.outputsize(m, (5,), (7,); padbatch=true)
(13, 1)
```
