Skip to content

Commit

Permalink
Merge pull request #127 from JuliaTrustworthyAI/125-change-the-verbos…
Browse files Browse the repository at this point in the history
…e-param-of-optimize_prior-from-a-boolean-to-an-integer

change verbose::Bool  with verbosity::Int
  • Loading branch information
pasq-cat authored Oct 29, 2024
2 parents 37f6bda + 9ce642f commit 38b6e69
Show file tree
Hide file tree
Showing 30 changed files with 305 additions and 976 deletions.
10 changes: 0 additions & 10 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,11 @@ version = "1.1.1"
[deps]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
ComputationalResources = "ed09eef8-17a6-5b46-8889-db040fac31e3"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJFlux = "094fc8d1-fd35-5302-93ea-dabda2abf845"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
Expand All @@ -26,16 +21,11 @@ Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
Aqua = "0.8"
ChainRulesCore = "1.23.0"
Compat = "4.7.0"
ComputationalResources = "0.3.2"
Distributions = "0.25.109"
Flux = "0.12, 0.13, 0.14"
LinearAlgebra = "1.7, 1.10"
MLJBase = "1"
MLJFlux = "0.5"
MLJModelInterface = "1.8.0"
MLUtils = "0.4"
Optimisers = "0.2, 0.3"
ProgressMeter = "1.7.2"
Random = "1.9, 1.10"
Statistics = "1"
Tables = "1.10.1"
Expand Down
2 changes: 1 addition & 1 deletion _freeze/docs/src/tutorials/logit/execute-results/md.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"hash": "64cc61b7b60f8aef12841a8bd09bc8bb",
"result": {
"engine": "jupyter",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# set seed\nseed= 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=seed)\nX = hcat(xs...) # bring into tabular format\n```\n:::\n\n\nsplit in a training and test set\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](logit_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nNow we can test the level of calibration of the neural network.\nFirst we collect the predicted results over the test dataset\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\n predicted_distributions= predict(la, X_test,ret_distr=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n1×20 Matrix{Distributions.Bernoulli{Float64}}:\n Distributions.Bernoulli{Float64}(p=0.13122) … Distributions.Bernoulli{Float64}(p=0.109559)\n```\n:::\n:::\n\n\nthen we plot the calibration plot\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](logit_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\nas we can see from the plot, although extremely accurate, the neural network does not seem to be calibrated well. This is, however, an effect of the extreme accuracy reached by the neural network which causes the lack of predictions with high uncertainty (low certainty). We can see this by looking at the level of sharpness for the two classes which are extremely close to 1, indicating the high level of trust that the neural network has in the predictions.\n\n::: {.cell execution_count=10}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n(0.9131870336577175, 0.8865055827351365)\n```\n:::\n:::\n\n\n",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# set seed\nseed= 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=seed)\nX = hcat(xs...) # bring into tabular format\n```\n:::\n\n\nsplit in a training and test set\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbosity=1, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](logit_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nNow we can test the level of calibration of the neural network.\nFirst we collect the predicted results over the test dataset\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\n predicted_distributions= predict(la, X_test,ret_distr=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n1×20 Matrix{Distributions.Bernoulli{Float64}}:\n Distributions.Bernoulli{Float64}(p=0.13122) … Distributions.Bernoulli{Float64}(p=0.109559)\n```\n:::\n:::\n\n\nthen we plot the calibration plot\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](logit_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\nas we can see from the plot, although extremely accurate, the neural network does not seem to be calibrated well. This is, however, an effect of the extreme accuracy reached by the neural network which causes the lack of predictions with high uncertainty (low certainty). We can see this by looking at the level of sharpness for the two classes which are extremely close to 1, indicating the high level of trust that the neural network has in the predictions.\n\n::: {.cell execution_count=10}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n(0.9131870336577175, 0.8865055827351365)\n```\n:::\n:::\n\n\n",
"supporting": [
"logit_files"
],
Expand Down
Loading

0 comments on commit 38b6e69

Please sign in to comment.