Commit
add some docs for the binary classifier
tiemvanderdeure committed May 30, 2024
1 parent decef9b commit 5842ed9
Showing 4 changed files with 198 additions and 2 deletions.
4 changes: 4 additions & 0 deletions docs/src/interface/Classification.md
@@ -1,3 +1,7 @@
```@docs
MLJFlux.NeuralNetworkClassifier
```

```@docs
MLJFlux.NeuralNetworkBinaryClassifier
```
1 change: 1 addition & 0 deletions docs/src/interface/Summary.md
@@ -12,6 +12,7 @@ Model Type | Prediction type | `scitype(X) <: _` | `scitype(y) <: _`
`NeuralNetworkRegressor` | `Deterministic` | `Table(Continuous)` with `n_in` columns | `AbstractVector{<:Continuous}` (`n_out = 1`)
`MultitargetNeuralNetworkRegressor` | `Deterministic` | `Table(Continuous)` with `n_in` columns | `<: Table(Continuous)` with `n_out` columns
`NeuralNetworkClassifier` | `Probabilistic` | `<:Table(Continuous)` with `n_in` columns | `AbstractVector{<:Finite}` with `n_out` classes
`NeuralNetworkBinaryClassifier` | `Probabilistic` | `<:Table(Continuous)` with `n_in` columns | `AbstractVector{<:Finite{2}}` (`n_out = 2`)
`ImageClassifier` | `Probabilistic` | `AbstractVector{<:Image{W,H}}` with `n_in = (W, H)` | `AbstractVector{<:Finite}` with `n_out` classes
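
As a minimal sketch (not taken from Summary.md itself), the requirements in this table can be checked before fitting; `X`, `y`, and the column name below are hypothetical:

```julia
using MLJ

scitype(X) <: Table(Continuous)           # input requirement for the tabular models above
scitype(y) <: AbstractVector{<:Finite}    # target requirement for the classifiers
X = coerce(X, :cyl => Continuous)         # fix a column stored as Count, for example
```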


2 changes: 1 addition & 1 deletion src/MLJFlux.jl
@@ -29,7 +29,7 @@ include("image.jl")
include("mlj_model_interface.jl")

export NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor
export NeuralNetworkClassifier, ImageClassifier
export NeuralNetworkClassifier, NeuralNetworkBinaryClassifier, ImageClassifier
export CUDALibs, CPU1


193 changes: 192 additions & 1 deletion src/types.jl
@@ -282,11 +282,202 @@ plot(curve.parameter_values,
```
See also [`ImageClassifier`](@ref).
See also [`ImageClassifier`](@ref), [`NeuralNetworkBinaryClassifier`](@ref).
"""
NeuralNetworkClassifier

"""
$(MMI.doc_header(NeuralNetworkBinaryClassifier))

`NeuralNetworkBinaryClassifier` is for training a data-dependent Flux.jl neural network
for making probabilistic predictions of a binary (`Multiclass{2}` or `OrderedFactor{2}`) target,
given a table of `Continuous` features. Users provide a recipe for constructing
the network, based on properties of the data encountered, by specifying
an appropriate `builder`. See the MLJFlux documentation for more on builders.
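
As a minimal sketch (an illustration, not part of this commit), a custom builder might be defined with the `MLJFlux.@builder` macro, which makes `n_in`, `n_out`, and `rng` available when the chain is constructed during `fit!`; adapt the layer sizes to your data:

```julia
import MLJFlux
using Flux

# hypothetical custom builder: two dense layers sized from the data
builder = MLJFlux.@builder begin
    init = Flux.glorot_uniform(rng)
    Chain(
        Dense(n_in => 32, relu, init=init),
        Dense(32 => n_out, init=init),  # `finaliser` (σ by default) is applied afterwards
    )
end
```

The resulting `builder` can then be passed as the `builder` hyper-parameter described below.
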
# Training data

In MLJ or MLJBase, bind an instance `model` to data with

    mach = machine(model, X, y)

Here:

- `X` is either a `Matrix` or any table of input features (eg, a `DataFrame`) whose columns are of scitype
  `Continuous`; check column scitypes with `schema(X)`. If `X` is a `Matrix`,
  it is assumed to have columns corresponding to features and rows corresponding to observations.

- `y` is the target, which can be any `AbstractVector` whose element scitype is `Multiclass{2}`
  or `OrderedFactor{2}`; check the scitype with `scitype(y)`.

Train the machine with `fit!(mach, rows=...)`.

# Hyper-parameters

- `builder=MLJFlux.Short()`: An MLJFlux builder that constructs a neural network. Possible
  `builders` include: `MLJFlux.Linear`, `MLJFlux.Short`, and `MLJFlux.MLP`. See
  MLJFlux.jl documentation for examples of user-defined builders. See also `finaliser`
  below.

- `optimiser=Flux.Adam()`: A `Flux.Optimise` optimiser. The optimiser performs the
  updating of the weights of the network. For further reference, see [the Flux optimiser
  documentation](https://fluxml.ai/Flux.jl/stable/training/optimisers/). To choose a
  learning rate (the update rate of the optimiser), a good rule of thumb is to start out
  at `10e-3`, and tune using powers of 10 between `1` and `1e-7`.

- `loss=Flux.binarycrossentropy`: The loss function which the network will optimize. Should be a
  function which can be called in the form `loss(yhat, y)`. Possible loss functions are
  listed in [the Flux loss function
  documentation](https://fluxml.ai/Flux.jl/stable/models/losses/). For a classification
  task, the most natural loss functions are:

  - `Flux.binarycrossentropy`: Standard binary classification loss, also known as the log
    loss.

  - `Flux.logitbinarycrossentropy`: Mathematically equivalent to `binarycrossentropy`, but numerically more
    stable than finalising the outputs with `σ` and then calculating
    crossentropy. You will need to specify `finaliser=identity` to remove MLJFlux's
    default sigmoid finaliser, and understand that the output of `predict` is then
    unnormalized (no longer probabilistic). See the sketch after this list.

  - `Flux.tversky_loss`: Used with imbalanced data to give more weight to false negatives.

  - `Flux.binary_focal_loss`: Used with highly imbalanced data. Weights harder examples more than
    easier examples.

  Currently MLJ measures are not supported values of `loss`.

- `epochs::Int=10`: The duration of training, in epochs. Typically, one epoch represents
  one pass through the complete training dataset.

- `batch_size::Int=1`: The batch size to be used for training, representing the number of
  samples per update of the network weights. Typically, batch size is between 8 and
  512. Increasing batch size may accelerate training if `acceleration=CUDALibs()` and a
  GPU is available.

- `lambda::Float64=0`: The strength of the weight regularization penalty. Can be any value
  in the range `[0, ∞)`.

- `alpha::Float64=0`: The L2/L1 mix of regularization, in the range `[0, 1]`. A value of 0
  represents L2 regularization, and a value of 1 represents L1 regularization.

- `rng::Union{AbstractRNG, Int64}`: The random number generator or seed used during
  training.

- `optimiser_changes_trigger_retraining::Bool=false`: Defines what happens when re-fitting
  a machine if the associated optimiser has changed. If `true`, the associated machine
  will retrain from scratch on `fit!` call; otherwise it will not.

- `acceleration::AbstractResource=CPU1()`: Defines on what hardware training is done. For
  training on a GPU, use `CUDALibs()`.

- `finaliser=Flux.σ`: The final activation function of the neural network (applied
  after the network defined by `builder`). Defaults to `Flux.σ`.
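
As a minimal sketch of the `logitbinarycrossentropy` option referenced above, the model might be configured like this (all keywords are hyper-parameters documented in this list):

```julia
using MLJ, Flux
NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

# numerically stable loss; `finaliser=identity` removes the default sigmoid, so
# `predict` then returns unnormalised scores rather than probabilities
clf = NeuralNetworkBinaryClassifier(
    loss = Flux.logitbinarycrossentropy,
    finaliser = identity,
    epochs = 20,
    batch_size = 32,
)
```
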
# Operations
- `predict(mach, Xnew)`: return predictions of the target given new features `Xnew`, which
should have the same scitype as `X` above. Predictions are probabilistic but uncalibrated.
- `predict_mode(mach, Xnew)`: Return the modes of the probabilistic predictions returned
above.
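
As a minimal sketch, assuming a fitted machine `mach`, the training target `y`, and new features `Xnew` (hypothetical names, along the lines of the example further below), per-class probabilities can be extracted from the predicted distributions with `pdf`:

```julia
yhat = predict(mach, Xnew)      # vector of UnivariateFinite distributions
pdf.(yhat, last(levels(y)))     # probability of the second class, per observation
predict_mode(mach, Xnew)        # most probable class, per observation
```
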
# Fitted parameters
The fields of `fitted_params(mach)` are:
- `chain`: The trained "chain" (Flux.jl model), namely the series of layers,
  functions, and activations which make up the neural network. This includes
  the final layer specified by `finaliser` (eg, `σ`).

# Report
The fields of `report(mach)` are:
- `training_losses`: A vector of training losses (penalised if `lambda != 0`) in
historical order, of length `epochs + 1`. The first element is the pre-training loss.
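
For example (a minimal sketch, assuming a fitted machine `mach`), the losses can be retrieved for inspection or plotting:

```julia
losses = report(mach).training_losses  # length epochs + 1; losses[1] is the pre-training loss
```
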
# Examples
In this example we build a classification model to predict the binary `VS` column of the
mtcars dataset. This is a very basic example, using a default builder and no
standardization. For a more advanced illustration, see [`NeuralNetworkRegressor`](@ref) or
[`ImageClassifier`](@ref), and examples in the MLJFlux.jl documentation.

```julia
using MLJ, Flux
import RDatasets
```
First, we can load the data:

```julia
mtcars = RDatasets.dataset("datasets", "mtcars");
y, X = unpack(mtcars, ==(:VS), in([:MPG, :Cyl, :Disp, :HP, :WT, :QSec])); # a vector and a table
y = categorical(y) # the classifier requires a categorical target
X_f32 = Float32.(X) # To match floating point type of the neural network layers
NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux
bclf = NeuralNetworkBinaryClassifier()
```
Next, we can train the model:

```julia
mach = machine(bclf, X_f32, y)
fit!(mach)
```
We can train the model in an incremental fashion, altering the learning rate as we go,
provided `optimiser_changes_trigger_retraining` is `false` (the default). Here, we also
change the number of (total) iterations:

```julia
bclf.optimiser.eta = bclf.optimiser.eta * 2
bclf.epochs = bclf.epochs + 5
fit!(mach, verbosity=2) # trains 5 more epochs
```
We can inspect the mean training loss using the `cross_entropy` function:

```julia
training_loss = cross_entropy(predict(mach, X_f32), y) |> mean
```
And we can access the Flux chain (model) using `fitted_params`:

```julia
chain = fitted_params(mach).chain
```
Finally, we can see how the out-of-sample performance changes over time, using MLJ's
`learning_curve` function:

```julia
r = range(bclf, :epochs, lower=1, upper=200, scale=:log10)
curve = learning_curve(bclf, X_f32, y,
range=r,
resampling=Holdout(fraction_train=0.7),
measure=cross_entropy)
using Plots
plot(curve.parameter_values,
curve.measurements,
xlab=curve.parameter_name,
xscale=curve.parameter_scale,
ylab = "Cross Entropy")
```
See also [`ImageClassifier`](@ref), [`NeuralNetworkClassifier`](@ref).
"""
NeuralNetworkBinaryClassifier

"""
$(MMI.doc_header(ImageClassifier))
