
Commit

change verbose with verbosity
pasq-cat committed Oct 22, 2024
1 parent 37f6bda commit 9ce642f
Showing 30 changed files with 305 additions and 976 deletions.
10 changes: 0 additions & 10 deletions Project.toml
@@ -6,16 +6,11 @@ version = "1.1.1"
[deps]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
ComputationalResources = "ed09eef8-17a6-5b46-8889-db040fac31e3"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJFlux = "094fc8d1-fd35-5302-93ea-dabda2abf845"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
@@ -26,16 +21,11 @@ Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
Aqua = "0.8"
ChainRulesCore = "1.23.0"
Compat = "4.7.0"
ComputationalResources = "0.3.2"
Distributions = "0.25.109"
Flux = "0.12, 0.13, 0.14"
LinearAlgebra = "1.7, 1.10"
MLJBase = "1"
MLJFlux = "0.5"
MLJModelInterface = "1.8.0"
MLUtils = "0.4"
Optimisers = "0.2, 0.3"
ProgressMeter = "1.7.2"
Random = "1.9, 1.10"
Statistics = "1"
Tables = "1.10.1"
2 changes: 1 addition & 1 deletion _freeze/docs/src/tutorials/logit/execute-results/md.json
@@ -2,7 +2,7 @@
"hash": "64cc61b7b60f8aef12841a8bd09bc8bb",
"result": {
"engine": "jupyter",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# set seed\nseed= 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=seed)\nX = hcat(xs...) # bring into tabular format\n```\n:::\n\n\nsplit in a training and test set\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](logit_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nNow we can test the level of calibration of the neural network.\nFirst we collect the predicted results over the test dataset\n\n::: {.cell 
execution_count=8}\n``` {.julia .cell-code}\n predicted_distributions= predict(la, X_test,ret_distr=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n1×20 Matrix{Distributions.Bernoulli{Float64}}:\n Distributions.Bernoulli{Float64}(p=0.13122) … Distributions.Bernoulli{Float64}(p=0.109559)\n```\n:::\n:::\n\n\nthen we plot the calibration plot\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](logit_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\nas we can see from the plot, although extremely accurate, the neural network does not seem to be calibrated well. This is, however, an effect of the extreme accuracy reached by the neural network which causes the lack of predictions with high uncertainty (low certainty). We can see this by looking at the level of sharpness for the two classes which are extremely close to 1, indicating the high level of trust that the neural network has in the predictions.\n\n::: {.cell execution_count=10}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n(0.9131870336577175, 0.8865055827351365)\n```\n:::\n:::\n\n\n",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# set seed\nseed= 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=seed)\nX = hcat(xs...) # bring into tabular format\n```\n:::\n\n\nsplit in a training and test set\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbosity=1, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](logit_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nNow we can test the level of calibration of the neural network.\nFirst we collect the predicted results over the test dataset\n\n::: {.cell 
execution_count=8}\n``` {.julia .cell-code}\n predicted_distributions= predict(la, X_test,ret_distr=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n1×20 Matrix{Distributions.Bernoulli{Float64}}:\n Distributions.Bernoulli{Float64}(p=0.13122) … Distributions.Bernoulli{Float64}(p=0.109559)\n```\n:::\n:::\n\n\nthen we plot the calibration plot\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](logit_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\nas we can see from the plot, although extremely accurate, the neural network does not seem to be calibrated well. This is, however, an effect of the extreme accuracy reached by the neural network which causes the lack of predictions with high uncertainty (low certainty). We can see this by looking at the level of sharpness for the two classes which are extremely close to 1, indicating the high level of trust that the neural network has in the predictions.\n\n::: {.cell execution_count=10}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n(0.9131870336577175, 0.8865055827351365)\n```\n:::\n:::\n\n\n",
"supporting": [
"logit_files"
],
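For reference, the functional change captured in the frozen notebook above is the renamed keyword in the `optimize_prior!` call of the logistic regression tutorial. Below is a minimal sketch of that step, assembled from the tutorial code already shown in the diff (the Flux training loop is omitted for brevity); the old and new keyword usages are taken verbatim from the removed and added `md.json` lines.

```julia
using Flux, LaplaceRedux

# Synthetic linearly separable data, as in the tutorial.
xs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=1234)
data = zip(xs, ys)

# Single dense layer (logistic regression) with a last-layer Laplace approximation.
nn = Chain(Dense(2, 1))
la = Laplace(nn; likelihood=:classification, λ=0.5, subset_of_weights=:last_layer)
fit!(la, data)

# Before this commit the tutorial passed a boolean flag:
#   optimize_prior!(la; verbose=true, n_steps=500)
# After this commit it passes an integer verbosity level instead:
optimize_prior!(la; verbosity=1, n_steps=500)
```

Only the keyword rename (`verbose=true` to `verbosity=1`) is evidenced by this diff; the rest of the snippet mirrors the tutorial cells that the commit leaves unchanged.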

