Update to explicit Flux, small fix for arrays.md #21

Merged (1 commit, Nov 27, 2023)
2 changes: 1 addition & 1 deletion docs/src/lecture_02/arrays.md
@@ -448,7 +448,7 @@ ERROR: ArgumentError: number of columns of each array must match (got (4, 1))
<div class="admonition-body">
```

Create two vectors: vector of all odd positive integers smaller than `10` and vector of all even positive integers smaller than `10`. Then concatenate these two vectors horizontally and fill the third row with `4`.
Create two vectors: vector of all odd positive integers smaller than `10` and vector of all even positive integers smaller than or equal to `10`. Then concatenate these two vectors horizontally and fill the third row with `4`.
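
One possible solution of this exercise (a sketch added here for reference, not part of the lecture source; the variable names are illustrative):

```julia
# Sketch: build the two vectors, concatenate them horizontally,
# and overwrite the third row with 4.
odds  = collect(1:2:9)    # odd positive integers smaller than 10
evens = collect(2:2:10)   # even positive integers smaller than or equal to 10
A = hcat(odds, evens)     # 5×2 matrix
A[3, :] .= 4              # fill the third row with 4
A
```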

```@raw html
</div></div>
50 changes: 50 additions & 0 deletions docs/src/lecture_11/Iris_train_test_acc.svg
46 changes: 25 additions & 21 deletions docs/src/lecture_11/iris.md
@@ -41,8 +41,8 @@ using Flux

n_hidden = 5
m = Chain(
Dense(size(X_train,1), n_hidden, relu),
Dense(n_hidden, size(y_train,1), identity),
Dense(size(X_train,1) => n_hidden, relu),
Dense(n_hidden => size(y_train,1), identity),
softmax,
)

@@ -59,7 +59,7 @@ m(X_train)

Because there are ``3`` classes and ``120`` samples in the training set, it returns an array of size ``3\times 120``. Each column corresponds to one sample and forms a vector of probabilities due to the last layer of softmax.
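
As a quick sanity check (a sketch, not part of the original text), we can verify this shape and that each column sums to one:

```julia
# Sketch: every column of the softmax output is a probability
# distribution over the 3 classes, so it sums to (roughly) one.
ŷ = m(X_train)
size(ŷ)                                   # (3, 120)
all(s -> isapprox(s, 1), sum(ŷ; dims=1))  # true
```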

We access the neural network parameters by using `params(m)`. We can select the second layer of `m` by `m[2]`. Since the second layer has ``5 `` input and ``3`` output neurons, its parameters are a matrix of size ``3\times 5`` and a vector of length ``3``. The parameters `params(m[2])` are a tuple of the matrix and the vector. This also implies that the parameters are initialized randomly, and we do not need to take care of it. We can easily modify any parameters.
We access the neural network parameters by using `params(m)`. We can select the second layer of `m` by `m[2]`. Since the second layer has ``5`` input and ``3`` output neurons, its parameters are a matrix of size ``3\times 5`` and a vector of length ``3``. The parameters `params(m[2])` are a tuple of the matrix and the vector. The parameters are initialized randomly, so we do not need to set them ourselves. We can also easily modify any parameters.

```@example iris
using Flux: params
@@ -76,17 +76,19 @@ To train the network, we need to define the objective function ``L``. Since we a
```@example iris
using Flux: crossentropy

L(x,y) = crossentropy(m(x), y)
L(ŷ, y) = crossentropy(ŷ, y)

nothing # hide
```

The `loss` function does not have `m` as input. Even though there could be an additional input parameter, it is customary to write it without it. We can evaluate the objective function by
The `loss` function should be defined between the predicted label $\hat{y}$ and the true label $y$. Therefore, we can evaluate the objective function by

```@example iris
L(X_train, y_train)
L(m(X_train), y_train)
```

where `ŷ = m(x)`.

This computes the objective function on the whole training set. Since Flux is (unlike our implementation from the last lecture) smart, there is no need to take care of individual samples.
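
A sketch of what this means in practice (not part of the lecture source): `crossentropy` averages the per-sample losses by default, so the value on the full training set equals the mean of the per-column losses.

```julia
# Sketch: the loss on the whole set matches the mean of the per-sample
# losses, because Flux.crossentropy aggregates with agg = mean by default.
using Statistics: mean

per_sample = [Flux.crossentropy(m(X_train[:, i:i]), y_train[:, i:i]) for i in axes(X_train, 2)]
isapprox(mean(per_sample), L(m(X_train), y_train))  # true up to numerical error
```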

!!! info "Notation:"
@@ -95,46 +97,48 @@ This computes the objective function on the whole training set. Since Flux is (u
Since we have the model and the loss function, the only remaining thing is the gradient. Flux again provides a smart way to compute it.

```@example iris
ps = params(m)
grad = gradient(() -> L(X_train, y_train), ps)
grads = Flux.gradient(m -> L(m(X_train), y_train), m)

nothing # hide
```

The function `gradient` takes two inputs. The first one is the function we want to differentiate, and the second one are the parameters. The `L` function needs to be evaluated at the correct points `X_train` and `y_train`. In some applications, we may need to differentiate with respect to other parameters such as `X_train`. This can be achieved by changing the second parameters of the `gradient` function.
The function `gradient` takes as inputs a function to differentiate and the arguments with respect to which we want to differentiate. Since the argument is the model `m` itself, the gradient is taken with respect to the parameters of `m`. The `L` function needs to be evaluated at the correct points `m(X_train)` (the predictions) and `y_train` (the true labels).

```@example iris
grad = gradient(() -> L(X_train, y_train), params(X_train))
The `grads` structure is a tuple holding a named tuple with the `:layers` key. Each layer entry then holds the gradients for that layer's parameters, in this case the weights $W$, the bias $b$, and optionally the parameters of the activation function $\sigma$.

size(grad[X_train])
```julia
julia> grads[1][:layers][2]
(weight = Float32[0.30140522 0.007785671 … -0.070617765 0.014230583; 0.06814249 -0.07018863 … 0.17996183 -0.20995824; -0.36954764 0.062402964 … -0.10934405 0.19572766], bias = Float32[0.0154182855, 0.022615476, -0.03803377], σ = nothing)
```

Since `X_train` has shape ``4\times 120``, the gradient needs to have the same size.
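
With the explicit API, differentiating with respect to the input data instead of the model is analogous; a sketch (not part of the PR):

```julia
# Sketch: take the gradient with respect to X_train by making it the
# differentiated argument; the result has the same shape as X_train.
grad_X = Flux.gradient(X -> L(m(X), y_train), X_train)
size(grad_X[1])  # (4, 120)
```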

We train the classifiers for 250 iterations. In each iteration, we compute the gradient with respect to all network parameters and perform the gradient descent with stepsize ``0.1``.
Now, we train the classifier for 250 iterations. In each iteration, we compute the gradient with respect to all network parameters and perform gradient descent with stepsize ``0.1``. Recent versions of Flux replaced the implicit definition of optimisers with an explicit one: we now need to call `Flux.setup(optimiser, model)` to create an optimiser state tied to the model's parameters.

```@example iris
opt = Descent(0.1)
opt_state = Flux.setup(opt, m)
max_iter = 250

acc_train = zeros(max_iter)
acc_test = zeros(max_iter)
for i in 1:max_iter
gs = gradient(() -> L(X_train, y_train), ps)
Flux.Optimise.update!(opt, ps, gs)
gs = Flux.gradient(m -> L(m(X_train), y_train), m)
Flux.update!(opt_state, m, gs[1])
acc_train[i] = accuracy(X_train, y_train)
acc_test[i] = accuracy(X_test, y_test)
end

nothing # hide
```
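
The `accuracy` helper used above is defined in an earlier, collapsed part of the lecture. A typical definition consistent with these calls might look as follows (an assumption, not shown in this diff):

```julia
# Sketch of an accuracy function matching the calls above;
# onecold maps probability/one-hot columns back to class indices.
using Statistics: mean
using Flux: onecold

accuracy(X, y) = mean(onecold(m(X)) .== onecold(y))
```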

The accuracy on the testing set keeps increasing as the training progresses.
The accuracy on both the training and testing sets keeps increasing as the training progresses. This is a good indication that we are not over-fitting.

```@example iris
using Plots

plot(acc_test, xlabel="Iteration", ylabel="Test accuracy", label="", ylim=(-0.01,1.01))
plot(acc_train, xlabel="Iteration", ylabel="Accuracy", label="train", ylim=(-0.01,1.01))
plot!(acc_test, xlabel="Iteration", label="test", ylim=(-0.01,1.01))

savefig("Iris_acc.svg") # hide
savefig("Iris_train_test_acc.svg") # hide
```

![](Iris_acc.svg)
![](Iris_train_test_acc.svg)
20 changes: 12 additions & 8 deletions docs/src/lecture_11/nn.md
@@ -21,7 +21,7 @@ This lecture shows how to train more complex networks using stochastic gradient

## Preparing data

During the last lecture, we implemented everything from scratch. This lecture will introduce the package [Flux](https://fluxml.ai/Flux.jl/stable/models/basics/) which automizes most of the things needed for neural networks.
During the last lecture, we implemented everything from scratch. This lecture will introduce the package [Flux](https://fluxml.ai/Flux.jl/stable/models/basics/) (and [Optimisers](https://fluxml.ai/Optimisers.jl/stable/)), which automates most of the things needed for neural networks.
- It creates many layers, including convolutional layers.
- It creates the model by chaining layers together.
- It efficiently represents model parameters.
@@ -381,35 +381,39 @@ m = Chain(
nothing # hide
```

The objective function ``L`` then applies the cross-entropy loss to the predictions and labels.
The objective function ``L`` then applies the cross-entropy loss to the predictions and labels. To be able to use the `Flux.train!` function to easily train the neural network, we define the loss ``L`` with the model as its first argument:

```@example nn
using Flux: crossentropy

L(X, y) = crossentropy(m(X), y)
L(model, X, y) = crossentropy(model(X), y)

nothing # hide
```

We now write the function `train_model!` to train the neural network `m`. Since this function modifies the input model `m`, its name should end with an exclamation mark. Besides the loss function `L`, the data `X`, and the labels `y`, it also takes as keyword arguments the optimiser `opt`, the minibatch size `batchsize`, the number of epochs `n_epochs`, and the file name `file_name` to which the model should be saved.

!!! info "Optimiser and optimiser state:"
    Note that we have to initialize the optimiser state `opt_state`. For plain gradient descent `Descent(learning_rate)`, the optimiser has no internal state or parameters. However, for parametrized optimisers such as Adam, the internal state in `opt_state` is updated in each iteration, just like the parameters of the model. Therefore, if we want to save a model and continue its training later on, we need to save both the model (or its parameters) and the optimiser state.
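
A small sketch of the difference (added for illustration only):

```julia
# Sketch: Descent keeps no per-parameter state, while Adam tracks moment
# estimates for every parameter; its setup state must therefore be saved
# together with the model to resume training later.
opt_state_descent = Flux.setup(Descent(0.1), m)
opt_state_adam    = Flux.setup(Adam(1.0f-3), m)
```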


```@example nn
using BSON
using Flux: params

function train_model!(m, L, X, y;
opt = Descent(0.1),
batchsize = 128,
n_epochs = 10,
file_name = "")

opt_state = Flux.setup(opt, m)
batches = DataLoader((X, y); batchsize, shuffle = true)

for _ in 1:n_epochs
Flux.train!(L, params(m), batches, opt)
Flux.train!(L, m, batches, opt_state)
end

!isempty(file_name) && BSON.bson(file_name, m=m)
!isempty(file_name) && BSON.bson(file_name, m=m, opt_state=opt_state)

return
end
@@ -498,7 +502,7 @@ Use this function to load the model from `data/mnist.bson` and evaluate the perf

The optional arguments should contain `kwargs...`, which will be passed to `train_model!`. Besides that, we include `force` which enforces that the model is trained even if it already exists.

First, we should check whether the directory exists ```!isdir(dirname(file_name))``` and if not, we create it ```mkpath(dirname(file_name))```. Then we check whether the file exists (or whether we want to enforce the training). If yes, we train the model, which already modifies ```m```. If not, we ```BSON.load``` the model and copy the loaded parameters into ```m``` by ```Flux.loadparams!(m, params(m_loaded))```. We cannot load directly into ```m``` instead of ```m_loaded``` because that would create a local copy of ```m``` and the function would not modify the external ```m```.
First, we should check whether the directory exists ```!isdir(dirname(file_name))``` and, if not, create it with ```mkpath(dirname(file_name))```. Then we check whether the file is missing (or whether we want to enforce the training). If so, we train the model, which modifies ```m``` in place. Otherwise, we ```BSON.load``` the model and copy the loaded parameters into ```m``` by ```Flux.loadparams!(m, Flux.params(m_loaded))```. We cannot load directly into ```m``` instead of ```m_loaded``` because that would create a local copy of ```m``` and the function would not modify the external ```m```.

```@example nn
function train_or_load!(file_name, m, args...; force=false, kwargs...)
Expand All @@ -509,7 +513,7 @@ function train_or_load!(file_name, m, args...; force=false, kwargs...)
train_model!(m, args...; file_name=file_name, kwargs...)
else
m_weights = BSON.load(file_name)[:m]
Flux.loadparams!(m, params(m_weights))
Flux.loadparams!(m, Flux.params(m_weights))
end
end
