Merge pull request #10 from pat-alt/develop
Develop
pat-alt authored Oct 15, 2022
2 parents 042cc39 + 987aa82 commit 66bd956
Showing 31 changed files with 803 additions and 356 deletions.
1 change: 1 addition & 0 deletions .github/workflows/CI.yml
Original file line number Diff line number Diff line change
@@ -3,6 +3,7 @@ on:
push:
branches:
- main
- develop
tags: ['*']
pull_request:
concurrency:
3 changes: 3 additions & 0 deletions Project.toml
@@ -5,6 +5,9 @@ version = "0.1.0"

[deps]
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
PkgTemplates = "14b8a8f1-9102-5b29-a752-f990bacb7fe1"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[compat]
44 changes: 28 additions & 16 deletions README.md
@@ -30,7 +30,7 @@ using Pkg
Pkg.add(url="https://github.com/pat-alt/ConformalPrediction.jl")
```

## Usage Example - Regression 🔍
## Usage Example - Inductive Conformal Regression 🔍

To illustrate the intended use of the package, let’s have a quick look at a simple regression problem. Using [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) we first generate some synthetic data and then determine indices for our training, calibration and test data:

@@ -40,35 +40,47 @@ X, y = MLJ.make_regression(1000, 2)
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
```

We then train a boosted tree ([EvoTrees](https://github.com/Evovest/EvoTrees.jl)) and follow the standard [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) training procedure.
We then train a decision tree ([DecisionTree](https://github.com/JuliaAI/DecisionTree.jl)) and follow the standard [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) training procedure.

``` julia
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
model = EvoTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=train)
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()
```

To turn our conventional machine into a conformal machine, we just need to declare it as such and then calibrate it using our calibration data:
To turn our conventional model into a conformal model, we just need to declare it as such by using the `conformal_model` wrapper function. The generated conformal model instance can then be wrapped in data to create a *machine*, following standard MLJ convention. By default, `conformal_model` instantiates a `SimpleInductiveRegressor`.

Fitting an Inductive Conformal Predictor with `fit!` trains the underlying machine learning model, but it does not compute nonconformity scores. That is because Inductive Conformal Predictors rely on a separate set of calibration data. Consequently, conformal models of type `InductiveConformalModel <: ConformalModel` require a separate calibration step before they can be used for conformal prediction. This step is implemented by calling the generic `calibrate!` method on the model instance.

``` julia
using ConformalPrediction
conf_mach = conformal_machine(mach)
calibrate!(conf_mach, selectrows(X, calibration), y[calibration])
conf_model = conformal_model(model)
mach = machine(conf_model, X, y)
fit!(mach, rows=train)
calibrate!(conf_model, selectrows(X, calibration), y[calibration])
```

Predictions can then be computed using the generic `predict` method. The code below produces predictions for a random subset of test samples:

``` julia
predict(conf_mach, selectrows(X, rand(test,5)))
predict(conf_model, selectrows(X, rand(test,5)))
```

5-element Vector{Vector{Pair{String, Vector{Float64}}}}:
["lower" => [-2.5656268495995658], "upper" => [1.4558014252276577]]
["lower" => [-2.5656268495995658], "upper" => [1.4558014252276577]]
["lower" => [-2.5656268495995658], "upper" => [1.4558014252276577]]
["lower" => [-3.906072026876036], "upper" => [0.11535624795118737]]
["lower" => [-1.9725646439635294], "upper" => [2.048863630863694]]
╭────────────────────────────────────────────────────────────────────╮
│ │
│ (1) ["lower" => [0.3963962694045419], "upper" => │
│ [1.0933093154587168]] │
│ (2) ["lower" => [0.819397821856154], "upper" => │
│ [1.516310867910329]] │
│ (3) ["lower" => [-0.6332868767933615], "upper" => │
│ [0.06362616926081349]] │
│ (4) ["lower" => [0.7215947047552422], "upper" => │
│ [1.4185077508094173]] │
│ (5) ["lower" => [2.0323107892753947], "upper" => │
│ [2.7292238353295697]] │
│ │
│ │
│ │
╰──────────────────────────────────────────────────────── 5 items ───╯
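As a sanity check on intervals like the ones above, one can estimate their empirical coverage on the held-out test set. The sketch below is not part of this commit; it assumes the `conf_model`, `X`, `y` and `test` objects defined in the preceding snippets, as well as the `"lower"`/`"upper"` output format shown:

``` julia
# Hypothetical sketch: estimate empirical coverage of the prediction
# intervals on the test set (assumes objects from the snippets above).
preds = predict(conf_model, selectrows(X, test))
covered = map(zip(preds, y[test])) do (p, yi)
    d = Dict(p)                      # "lower" => [l], "upper" => [u]
    d["lower"][1] <= yi <= d["upper"][1]
end
println("Empirical coverage: ", sum(covered) / length(covered))
```

For a well-calibrated conformal predictor, the printed fraction should be close to the nominal coverage level implied by the chosen significance level.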

## Contribute 🛠

12 changes: 5 additions & 7 deletions README.qmd
@@ -4,17 +4,15 @@ format:
variant: -raw_html
wrap: none
self-contained: true
execute:
freeze: auto
echo: true
eval: true
output: false
crossref:
fig-prefix: Figure
tbl-prefix: Table
bibliography: https://raw.githubusercontent.com/pat-alt/bib/main/bib.bib
output: asis
execute:
eval: true
echo: true
output: false
freeze: auto # re-render only when source changes
jupyter: julia-1.7
---

# ConformalPrediction
@@ -1,7 +1,7 @@
{
"hash": "23e5ff6ddc8b19eba4e8290b20658f33",
"result": {
"markdown": "---\nformat:\n commonmark:\n variant: '-raw_html'\n wrap: none\n self-contained: true\ncrossref:\n fig-prefix: Figure\n tbl-prefix: Table\nbibliography: 'https://raw.githubusercontent.com/pat-alt/bib/main/bib.bib'\noutput: asis\nexecute:\n output: false\n freeze: auto\n eval: true\n echo: true\n---\n\n# Classification Tutorial\n\n[INCOMPLETE]\n\nWe firstly generate some synthetic data with three classes and partition it into a training set, a calibration set and a test set:\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing MLJ\nX, y = MLJ.make_blobs(1000, 2, centers=3, cluster_std=2)\ntrain, calibration, test = partition(eachindex(y), 0.4, 0.4)\n```\n:::\n\n\nFollowing the standard [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) procedure, we train a boosted tree for the classification task:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nEvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees\nmodel = EvoTreeClassifier() \nmach = machine(model, X, y)\nfit!(mach, rows=train)\n```\n:::\n\n\nNext we instantiate our conformal machine and calibrate using the calibration data:\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nusing ConformalPrediction\nconf_mach = conformal_machine(mach)\ncalibrate!(conf_mach, selectrows(X, calibration), y[calibration])\n```\n:::\n\n\nUsing the generic `predict` method we can generate prediction sets like so:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\npredict(conf_mach, selectrows(X, rand(test,5)))\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n╭──────────────────────────────────────────────────────────────────────────╮\n│ │\n│ (1) Pair[1 => missing, 2 => 0.6448661054062889, 3 => missing] │\n│ (2) Pair[1 => missing, 2 => missing, 3 => 0.8197529347049547] │\n│ (3) Pair[1 => missing, 2 => 0.8229512785953512, 3 => missing] │\n│ (4) Pair[1 => missing, 2 => 0.7858778376049668, 3 => missing] │\n│ (5) Pair[1 => missing, 2 => missing, 3 => 
0.8197529347049547] │\n│ │\n│ │\n╰────────────────────────────────────────────────────────────── 5 items ───╯\n```\n:::\n:::\n\n\n",
"markdown": "---\nformat:\n commonmark:\n variant: '-raw_html'\n wrap: none\n self-contained: true\ncrossref:\n fig-prefix: Figure\n tbl-prefix: Table\nbibliography: 'https://raw.githubusercontent.com/pat-alt/bib/main/bib.bib'\noutput: asis\nexecute:\n output: false\n freeze: auto\n eval: true\n echo: true\n---\n\n# Classification Tutorial\n\n[INCOMPLETE]\n\nWe firstly generate some synthetic data with three classes and partition it into a training set, a calibration set and a test set:\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing MLJ\nX, y = MLJ.make_blobs(1000, 2, centers=3, cluster_std=2)\ntrain, calibration, test = partition(eachindex(y), 0.4, 0.4)\n```\n:::\n\n\nFollowing the standard [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) procedure, we train a decision tree for the classification task:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nEvoTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree\nmodel = DecisionTreeClassifier() \nmodel = machine(model, X, y)\nfit!(model, rows=train)\n```\n:::\n\n\nNext we instantiate our conformal model and calibrate using the calibration data:\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nusing ConformalPrediction\nconformal_model = conformal_model(model)\ncalibrate!(conf_model, selectrows(X, calibration), y[calibration])\n```\n:::\n\n\nUsing the generic `predict` method we can generate prediction sets like so:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\npredict(conf_model, selectrows(X, rand(test,5)))\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n╭──────────────────────────────────────────────────────────────────────────╮\n│ │\n│ (1) Pair[1 => missing, 2 => 0.6448661054062889, 3 => missing] │\n│ (2) Pair[1 => missing, 2 => missing, 3 => 0.8197529347049547] │\n│ (3) Pair[1 => missing, 2 => 0.8229512785953512, 3 => missing] │\n│ (4) Pair[1 => missing, 2 => 0.7858778376049668, 3 => missing] │\n│ (5) Pair[1 => missing, 2 
=> missing, 3 => 0.8197529347049547] │\n│ │\n│ │\n╰────────────────────────────────────────────────────────────── 5 items ───╯\n```\n:::\n:::\n\n\n",
"supporting": [
"simple_files"
],
10 changes: 10 additions & 0 deletions _freeze/docs/src/index/execute-results/md.json
@@ -0,0 +1,10 @@
{
"hash": "c56dcfed5fce5fece3f8dd90b08af0bd",
"result": {
"markdown": "```@meta\nCurrentModule = ConformalPrediction\n```\n\n# ConformalPrediction\n\nDocumentation for [ConformalPrediction.jl](https://github.com/pat-alt/ConformalPrediction.jl).\n\n\n\n`ConformalPrediction.jl` is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/). Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic. \n\n## Disclaimer ⚠️\n\nThis package is in its very early stages of development. In fact, I've built this package largely to gain a better understanding of the topic myself. So far only the most simple approaches have been implemented:\n\n- Naive method for regression.\n- LABEL approach for classification [@sadinle2019least].\n\nI have only tested it for a few of the supervised models offered by [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).\n\n## Installation 🚩\n\nYou can install the first stable release from the general registry:\n\n```julia\nusing Pkg\nPkg.add(\"ConformalPrediction\")\n```\n\nThe development version can be installed as follows:\n\n```julia\nusing Pkg\nPkg.add(url=\"https://github.com/pat-alt/ConformalPrediction.jl\")\n```\n\n## Usage Example - Inductive Conformal Regression 🔍\n\nTo illustrate the intended use of the package, let's have a quick look at a simple regression problem. 
Using [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) we first generate some synthetic data and then determine indices for our training, calibration and test data:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nusing MLJ\nX, y = MLJ.make_regression(1000, 2)\ntrain, calibration, test = partition(eachindex(y), 0.4, 0.4)\n```\n:::\n\n\nWe then train a decision tree ([DecisionTree](https://github.com/Evovest/DecisionTree.jl)) and follow the standard [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) training procedure.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nDecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree\nmodel = DecisionTreeRegressor() \n```\n:::\n\n\nTo turn our conventional machine into a conformal model, we just need to declare it as such by using `conformal_model` wrapper function. The generated conformal model instance can wrapped in data to create a *machine* following standard MLJ convention. By default that function instantiates a `SimpleInductiveRegressor`. \n\nFitting Inductive Conformal Predictors using `fit!` trains the underlying machine learning model, but it does not compute nonconformity scores. That is because Inductive Conformal Predictors rely on a separate set of calibration data. Consequently, conformal models of type `InductiveConformalModel <: ConformalModel` require a separate calibration step to be trained for conformal prediction. This can be implemented by calling the generic `calibrate!` method on the model instance. \n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing ConformalPrediction\nconf_model = conformal_model(model)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\ncalibrate!(conf_model, selectrows(X, calibration), y[calibration])\n```\n:::\n\n\nPredictions can then be computed using the generic `predict` method. 
The code below produces predictions a random subset of test samples:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\npredict(conf_model, selectrows(X, rand(test,5)))\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\n╭────────────────────────────────────────────────────────────────────╮\n│ │\n│ (1) [\"lower\" => [0.27243371134520067], \"upper\" => │\n│ [1.0198357965554317]] │\n│ (2) [\"lower\" => [0.6621889092109277], \"upper\" => │\n│ [1.4095909944211586]] │\n│ (3) [\"lower\" => [0.6835568713212139], \"upper\" => │\n│ [1.430958956531445]] │\n│ (4) [\"lower\" => [0.6835568713212139], \"upper\" => │\n│ [1.430958956531445]] │\n│ (5) [\"lower\" => [0.005568859502752321], \"upper\" => │\n│ [0.7529709447129833]] │\n│ │\n│ │\n│ │\n╰──────────────────────────────────────────────────────── 5 items ───╯\n```\n:::\n:::\n\n\n## Contribute 🛠\n\nContributions are welcome! Please follow the [SciML ColPrac guide](https://github.com/SciML/ColPrac).\n\n## References 🎓\n\n",
"supporting": [
"index_files"
],
"filters": []
}
}
10 changes: 10 additions & 0 deletions _quarto.yml
@@ -2,5 +2,15 @@ project:
title: "ConformalPrediction.jl"
execute-dir: project

crossref:
fig-prefix: Figure
tbl-prefix: Table
bibliography: https://raw.githubusercontent.com/pat-alt/bib/main/bib.bib

execute:
freeze: auto
echo: true
eval: true
output: false


27 changes: 25 additions & 2 deletions docs/Manifest.toml
@@ -2,7 +2,7 @@

julia_version = "1.8.1"
manifest_format = "2.0"
project_hash = "5ff5a6a704a15c1ad1e3ac182ca933ebf64a0761"
project_hash = "a9c53b8831f0d9c33b8a54796051d54892c81b58"

[[deps.ANSIColoredPrinters]]
git-tree-sha1 = "574baf8110975760d391c710b6341da1afa48d8c"
@@ -21,6 +21,11 @@ git-tree-sha1 = "69f7020bd72f069c219b5e8c236c1fa90d2cb409"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.2.1"

[[deps.AbstractTrees]]
git-tree-sha1 = "5c0b629df8a5566a06f5fef5100b53ea56e465a0"
uuid = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
version = "0.4.2"

[[deps.Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "195c5505521008abea5aee4f96930717958eac6f"
@@ -158,7 +163,7 @@ uuid = "ed09eef8-17a6-5b46-8889-db040fac31e3"
version = "0.3.2"

[[deps.ConformalPrediction]]
deps = ["MLJ", "Statistics"]
deps = ["MLJ", "MLJBase", "MLJModelInterface", "Statistics"]
path = ".."
uuid = "98bfc277-1877-43dc-819b-a3e38c30242f"
version = "0.1.0"
@@ -205,6 +210,12 @@ version = "1.0.0"
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[deps.DecisionTree]]
deps = ["AbstractTrees", "DelimitedFiles", "LinearAlgebra", "Random", "ScikitLearnBase", "Statistics"]
git-tree-sha1 = "fb3f7ff27befb9877bee84076dd9173185d7d86a"
uuid = "7806a523-6efd-50cb-b5f6-3fa6f1930dbb"
version = "0.11.2"

[[deps.DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"
@@ -684,6 +695,12 @@ git-tree-sha1 = "f68deea1f25727f24a4afa9f941763e6fc44f5af"
uuid = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
version = "0.20.19"

[[deps.MLJDecisionTreeInterface]]
deps = ["DecisionTree", "MLJModelInterface", "Random", "Tables"]
git-tree-sha1 = "d0d682ef8504e1ab705f10307c587239ebb20c4d"
uuid = "c6f25543-311c-4c74-83dc-3ea6d1015661"
version = "0.2.5"

[[deps.MLJEnsembles]]
deps = ["CategoricalArrays", "CategoricalDistributions", "ComputationalResources", "Distributed", "Distributions", "MLJBase", "MLJModelInterface", "ProgressMeter", "Random", "ScientificTypesBase", "StatsBase"]
git-tree-sha1 = "ed2f724be26d0023cade9d59b55da93f528c3f26"
@@ -1004,6 +1021,12 @@ git-tree-sha1 = "a8e18eb383b5ecf1b5e6fc237eb39255044fd92b"
uuid = "30f210dd-8aff-4c5f-94ba-8e64358c1161"
version = "3.0.0"

[[deps.ScikitLearnBase]]
deps = ["LinearAlgebra", "Random", "Statistics"]
git-tree-sha1 = "7877e55c1523a4b336b433da39c8e8c08d2f221f"
uuid = "6e75b9c4-186b-50bd-896f-2d2496a4843e"
version = "0.5.0"

[[deps.Scratch]]
deps = ["Dates"]
git-tree-sha1 = "f94f779c94e58bf9ea243e77a37e16d9de9126bd"
3 changes: 2 additions & 1 deletion docs/Project.toml
@@ -1,8 +1,9 @@
[deps]
ConformalPrediction = "98bfc277-1877-43dc-819b-a3e38c30242f"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DecisionTree = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
EvoTrees = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
MLJDecisionTreeInterface = "c6f25543-311c-4c74-83dc-3ea6d1015661"
PlotThemes = "ccf2f8ad-2431-5c83-bf29-c5338b663b6a"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
5 changes: 5 additions & 0 deletions docs/_metadata.yml
@@ -0,0 +1,5 @@
format:
commonmark:
variant: -raw_html
wrap: none
self-contained: true
