
Commit

Merge pull request #80 from JuliaTrustworthyAI/79-revisit-sample-correction
pat-alt authored Aug 2, 2023
2 parents 1a4ceff + 4fc2024 commit 5013ca1
Showing 25 changed files with 2,809 additions and 2,447 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"hash": "f6be6dff987885f970c8b1d4c7f00f62",
"result": {
"markdown": "---\ntitle: Finite-sample Correction\n---\n\n\n\n\n\n\nWe follow the convention used in @angelopoulos2021gentle and @barber2021predictive to correct for the finite-sample bias of the empirical quantile. Specifically, we use the following definition of the $(1-\\alpha)$ empirical quantile:\n\n```math\n\\hat{q}_{n,\\alpha}^{+}\\{v\\} = \\frac{\\lceil (n+1)(1-\\alpha)\\rceil}{n}\n```\n\n@barber2021predictive further define as the $\\alpha$ empirical quantile:\n\n```math\n\\hat{q}_{n,\\alpha}^{-}\\{v\\} = \\frac{\\lfloor (n+1)\\alpha \\rfloor}{n} = - \\hat{q}_{n,\\alpha}^{+}\\{-v\\}\n```\n\nBelow we test this equality numerically by generating a large number of random vectors and comparing the two quantiles. We then plot the density of the difference between the two quantiles. While the errors are small, they are not negligible for small $n$. In our computations, we use $\\hat{q}_{n,\\alpha}^{-}\\{v\\}$ exactly as it is defined above, rather than relying on $- \\hat{q}_{n,\\alpha}^{+}\\{-v\\}$.\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nusing ConformalPrediction: qplus, qminus\nnobs = [100, 1000, 10000]\nn = 1000\nalpha = 0.1\nplts = []\nΔ = Float32[]\nfor _nobs in nobs\n for i in 1:n\n v = rand(_nobs)\n δ = qminus(v, alpha) - (-qplus(-v, 1-alpha))\n push!(Δ, δ)\n end\n plt = density(Δ)\n vline!([mean(Δ)], color=:red, label=\"mean\")\n push!(plts, plt)\nend\nplot(plts..., layout=(1,3), size=(900, 300), legend=:topleft, title=[\"nobs = 100\" \"nobs = 1000\" \"nobs = 10000\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n![](finite_sample_correction_files/figure-commonmark/cell-3-output-1.svg){}\n:::\n:::\n\n\nSee also this related [discussion](https://github.com/JuliaTrustworthyAI/ConformalPrediction.jl/discussions/17).\n\n## References\n\n",
"supporting": [
"finite_sample_correction_files"
],
"filters": []
}
}


2,315 changes: 1,156 additions & 1,159 deletions _freeze/docs/src/tutorials/classification/figure-commonmark/cell-10-output-1.svg
25 changes: 24 additions & 1 deletion docs/Manifest.toml
@@ -2,7 +2,7 @@

julia_version = "1.9.2"
manifest_format = "2.0"
project_hash = "31cd76ebe2189514892e5384387c85920222063e"
project_hash = "e1685a29d6d370eab88233cc5ac9d849e2f3994f"

[[deps.ANSIColoredPrinters]]
git-tree-sha1 = "574baf8110975760d391c710b6341da1afa48d8c"
@@ -1754,6 +1754,11 @@ git-tree-sha1 = "5ee110f3d54e0f29daacc3bdde01b638bf05b9bc"
uuid = "12afc1b8-fad6-47e1-9132-84abc478905f"
version = "0.2.10"

[[deps.Observables]]
git-tree-sha1 = "6862738f9796b3edc1c09d0890afce4eca9e7e93"
uuid = "510215fc-4207-5dde-b226-833fc4488ee2"
version = "0.5.4"

[[deps.OffsetArrays]]
deps = ["Adapt"]
git-tree-sha1 = "2ac17d29c523ce1cd38e27785a7d23024853a4bb"
@@ -2391,6 +2396,12 @@ git-tree-sha1 = "8cc7a5385ecaa420f0b3426f9b0135d0df0638ed"
uuid = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
version = "0.7.2"

[[deps.StatsPlots]]
deps = ["AbstractFFTs", "Clustering", "DataStructures", "Distributions", "Interpolations", "KernelDensity", "LinearAlgebra", "MultivariateStats", "NaNMath", "Observables", "Plots", "RecipesBase", "RecipesPipeline", "Reexport", "StatsBase", "TableOperations", "Tables", "Widgets"]
git-tree-sha1 = "9115a29e6c2cf66cf213ccc17ffd61e27e743b24"
uuid = "f3b207a7-027a-5e70-b257-86293d7955fd"
version = "0.15.6"

[[deps.StrTables]]
deps = ["Dates"]
git-tree-sha1 = "5998faae8c6308acc25c25896562a1e66a3bb038"
@@ -2465,6 +2476,12 @@ deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
version = "1.0.3"

[[deps.TableOperations]]
deps = ["SentinelArrays", "Tables", "Test"]
git-tree-sha1 = "e383c87cf2a1dc41fa30c093b2a19877c83e1bc1"
uuid = "ab02a1b2-a7df-11e8-156e-fb1833f50b87"
version = "1.2.0"

[[deps.TableTraits]]
deps = ["IteratorInterfaceExtensions"]
git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39"
@@ -2674,6 +2691,12 @@ git-tree-sha1 = "b1be2855ed9ed8eac54e5caff2afcdb442d52c23"
uuid = "ea10d353-3f73-51f8-a26c-33c1cb351aa5"
version = "1.4.2"

[[deps.Widgets]]
deps = ["Colors", "Dates", "Observables", "OrderedCollections"]
git-tree-sha1 = "fcdae142c1cfc7d89de2d11e08721d0f2f86c98a"
uuid = "cc8bc4a8-27d6-5769-a93b-9d913e69aa62"
version = "0.6.6"

[[deps.WoodburyMatrices]]
deps = ["LinearAlgebra", "SparseArrays"]
git-tree-sha1 = "de67fa59e33ad156a590055375a30b23c40299d3"
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -35,6 +35,7 @@ PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
Serialization = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
ShiftedArrays = "1277b4bf-5013-50f5-be3d-901d8477a67a"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"
Transformers = "21ca0261-441d-5938-ace7-c90938fde4d4"
UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228"
1 change: 1 addition & 0 deletions docs/make.jl
@@ -47,6 +47,7 @@ makedocs(;
"🤓 Explanation" => [
"Overview" => "explanation/index.md",
"Package Architecture" => "explanation/architecture.md",
"Finite-sample Correction" => "explanation/finite_sample_correction.md",
],
"🧐 Reference" => "reference.md",
"🛠 Contribute" => "contribute.md",
1 change: 1 addition & 0 deletions docs/setup_docs.jl
@@ -18,6 +18,7 @@ setup_docs = quote
using Serialization
using SharedArrays
using StatsBase
using StatsPlots
using Transformers
using Transformers.TextEncoders
using Transformers.HuggingFace
45 changes: 45 additions & 0 deletions docs/src/explanation/finite_sample_correction.md
@@ -0,0 +1,45 @@
# Finite-sample Correction

We follow the convention used in Angelopoulos and Bates (2021) and Barber et al. (2021) to correct for the finite-sample bias of the empirical quantile. Specifically, we use the following definition of the (1−*α*) empirical quantile:

``` math
\hat{q}_{n,\alpha}^{+}\{v\} = \frac{\lceil (n+1)(1-\alpha)\rceil}{n}
```

Barber et al. (2021) further define the *α* empirical quantile as:

``` math
\hat{q}_{n,\alpha}^{-}\{v\} = \frac{\lfloor (n+1)\alpha \rfloor}{n} = - \hat{q}_{n,\alpha}^{+}\{-v\}
```
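
To make the correction concrete: for *n* = 100 and *α* = 0.1, the naive empirical quantile level would be 0.9, while the corrected numerator is ⌈101 × 0.9⌉ = 91, i.e. the corrected level 0.91. A minimal order-statistic sketch of both quantities (hypothetical helper names `q_plus`/`q_minus`; the package's own `qplus`/`qminus` may differ in argument conventions and interpolation behaviour):

``` julia
# Hedged sketch, not the package implementation: the corrected
# quantiles written as exact order statistics of the sorted vector.
function q_plus(v::AbstractVector, alpha::Real)
    n = length(v)
    k = clamp(ceil(Int, (n + 1) * (1 - alpha)), 1, n)  # ⌈(n+1)(1-α)⌉
    return sort(v)[k]
end

function q_minus(v::AbstractVector, alpha::Real)
    n = length(v)
    k = clamp(floor(Int, (n + 1) * alpha), 1, n)       # ⌊(n+1)α⌋
    return sort(v)[k]
end

v = rand(100)
q_minus(v, 0.1) == -q_plus(-v, 0.1)  # → true
```

For these pure order statistics the identity holds exactly (since ⌈(*n*+1)(1−*α*)⌉ = *n* + 1 − ⌊(*n*+1)*α*⌋ whenever *n* + 1 is an integer), which suggests that any small discrepancies between the two sides in practice stem from implementation details such as quantile interpolation rather than from the definitions themselves.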

Below we test this equality numerically by generating a large number of random vectors and comparing the two quantiles. We then plot the density of the difference between the two quantiles. While the errors are small, they are not negligible for small *n*. In our computations, we use ``\hat{q}_{n,\alpha}^{-}\{v\}`` exactly as it is defined above, rather than relying on ``-\hat{q}_{n,\alpha}^{+}\{-v\}``.

``` julia
using ConformalPrediction: qplus, qminus
using StatsPlots            # `density`, `vline!`, `plot`
using Statistics: mean

nobs = [100, 1000, 10000]   # calibration-set sizes to compare
n = 1000                    # random vectors drawn per size
alpha = 0.1
plts = []
for _nobs in nobs
    Δ = Float32[]           # reset per size so panels are not cumulative
    for i in 1:n
        v = rand(_nobs)
        δ = qminus(v, alpha) - (-qplus(-v, 1 - alpha))
        push!(Δ, δ)
    end
    plt = density(Δ)
    vline!([mean(Δ)], color=:red, label="mean")
    push!(plts, plt)
end
plot(plts..., layout=(1, 3), size=(900, 300), legend=:topleft, title=["nobs = 100" "nobs = 1000" "nobs = 10000"])
```

![](finite_sample_correction_files/figure-commonmark/cell-3-output-1.svg)

See also this related [discussion](https://github.com/JuliaTrustworthyAI/ConformalPrediction.jl/discussions/17).

## References

Angelopoulos, Anastasios N., and Stephen Bates. 2021. “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” <https://arxiv.org/abs/2107.07511>.

Barber, Rina Foygel, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. 2021. “Predictive Inference with the Jackknife+.” *The Annals of Statistics* 49 (1): 486–507. <https://doi.org/10.1214/20-AOS1965>.
47 changes: 47 additions & 0 deletions docs/src/explanation/finite_sample_correction.qmd
@@ -0,0 +1,47 @@
# Finite-sample Correction

```{julia}
#| echo: false
include("$(pwd())/docs/setup_docs.jl")
eval(setup_docs)
```

We follow the convention used in @angelopoulos2021gentle and @barber2021predictive to correct for the finite-sample bias of the empirical quantile. Specifically, we use the following definition of the $(1-\alpha)$ empirical quantile:

```math
\hat{q}_{n,\alpha}^{+}\{v\} = \frac{\lceil (n+1)(1-\alpha)\rceil}{n}
```

@barber2021predictive further define the $\alpha$ empirical quantile as:

```math
\hat{q}_{n,\alpha}^{-}\{v\} = \frac{\lfloor (n+1)\alpha \rfloor}{n} = - \hat{q}_{n,\alpha}^{+}\{-v\}
```

Below we test this equality numerically by generating a large number of random vectors and comparing the two quantiles. We then plot the density of the difference between the two quantiles. While the errors are small, they are not negligible for small $n$. In our computations, we use $\hat{q}_{n,\alpha}^{-}\{v\}$ exactly as it is defined above, rather than relying on $- \hat{q}_{n,\alpha}^{+}\{-v\}$.

```{julia}
#| output: true
using ConformalPrediction: qplus, qminus
nobs = [100, 1000, 10000]   # calibration-set sizes to compare
n = 1000                    # random vectors drawn per size
alpha = 0.1
plts = []
for _nobs in nobs
    Δ = Float32[]           # reset per size so panels are not cumulative
    for i in 1:n
        v = rand(_nobs)
        δ = qminus(v, alpha) - (-qplus(-v, 1 - alpha))
        push!(Δ, δ)
    end
    plt = density(Δ)
    vline!([mean(Δ)], color=:red, label="mean")
    push!(plts, plt)
end
plot(plts..., layout=(1, 3), size=(900, 300), legend=:topleft, title=["nobs = 100" "nobs = 1000" "nobs = 10000"])
```

See also this related [discussion](https://github.com/JuliaTrustworthyAI/ConformalPrediction.jl/discussions/17).

## References
