Other methods for PIs #6
Sure thing! I'll probably turn back to this later next week - happy to have a chat then. Obviously do also feel free to create a PR in the meantime.
Here is some code for the Jackknife+ (and friends) prediction intervals:

using Distributions, Plots, Random, Tables, GLMNet, MLJ
n= 1_000; σ=1.0;
X = [ones(n) sin.(1:n) cos.(n:-1:1) exp.(sin.(1:n))]
θ = [9.0; 7.0; 3.0; 21]
Noise = randn(MersenneTwister(49), n) # n rows
cef(x;θ=θ) = x*θ; # Conditional Expectation Function
skedastic(x;σ=σ) = σ; # Skedastic Function. Homoskedastic.
y = cef(X;θ=θ) + skedastic(X;σ=σ) .* Noise
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
#train, calibration, test = partition(eachindex(y), 0.4, 0.4, shuffle=true, rng=444)
i_new = test[1] # index of new data point.
y_train = y[train]; X_train = X[train,:];
scfit(x,y) = GLMNet.glmnetcv(x, y, nlambda=1_000, alpha = 1.0) # Ridge/Lasso alpha = 0/1
scpr(m,x) = GLMNet.predict(m, x);
α = 0.05 # miscoverage rate α∈[0,1]
# compute in-sample residuals in the training data
m = scfit(X_train, y_train);
ŷ = scpr(m, X);
ŷ_new_is = ŷ[i_new];
res_is = y_train .- ŷ[train] # in-sample residuals.
# compute LOO residuals in the training data
res_LOO, ŷ_new_LOO = Float64[], Float64[];  # concretely typed for the quantile calls below
for tt in train
println(tt)
#
ty = y_train[train .!= tt] # y_train leave out tt
tX = X_train[train .!= tt,:] # X_train leave out tt
#
tm = scfit(tX, ty) # fit on training data minus row==tt
tŷ = scpr(tm, X) # predict on dataset
#
push!(res_LOO, y[tt] - tŷ[tt]) # LOO residual
push!(ŷ_new_LOO, tŷ[i_new]) # LOO prediction
end
res_LOO
ŷ_new_LOO
"Naive PI: use in-sample residuals to estimate oos residuals"
"problem: in-sample residuals usually smaller than oos residuals"
err_Naive = quantile(abs.(res_is), 1 - α)
LB_Naive = ŷ_new_is - err_Naive
UB_Naive = ŷ_new_is + err_Naive
"Jackknife PI: use training LOO residuals to estimate oos residuals"
"problem: sensitive to instability"
err_LOO = quantile(abs.(res_LOO), 1 - α)
LB_J = ŷ_new_is - err_LOO
UB_J = ŷ_new_is + err_LOO
"Jackknife+ PI: use sample quantile of each ŷ_new_LOO[tt] adjusted by its own res_LOO[tt]"
LB_Jp = quantile(ŷ_new_LOO - abs.(res_LOO), α)
UB_Jp = quantile(ŷ_new_LOO + abs.(res_LOO), 1-α)
"Jackknife-minmax PI"
LB_Jmm = minimum(ŷ_new_LOO) - err_LOO
UB_Jmm = maximum(ŷ_new_LOO) + err_LOO
#
#v = ŷ_new_LOO - abs.(res_LOO);
#quantile(v,α) == -quantile(-v,1-α) # SANITY CHECK
LB_Naive, LB_J, LB_Jp, LB_Jmm
UB_Naive, UB_J, UB_Jp, UB_Jmm
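# --- hedged sketch (mine, not part of the original snippet): a quick check of
# the claims above, reusing ŷ, err_Naive, err_LOO, test, y from this script.
# In-sample residuals tend to understate out-of-sample error, so the Naive
# interval should undercover relative to the Jackknife one when the model
# overfits. (JK+ and minmax intervals above are specific to i_new, so only the
# Naive and Jackknife widths are checked across the whole test fold.)
coverage(lb, ub, ys) = sum((lb .<= ys) .& (ys .<= ub)) / length(ys)
ŷ_test, y_test = ŷ[test], y[test]
cov_Naive = coverage(ŷ_test .- err_Naive, ŷ_test .+ err_Naive, y_test)
cov_J     = coverage(ŷ_test .- err_LOO,   ŷ_test .+ err_LOO,   y_test)
cov_Naive, cov_J   # often cov_Naive < cov_J, with cov_J ≈ 1-α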
#TODO:
"Split Conformal/Full Conformal/K-Fold CV+/K-fold cross-conformal" @ablaom (I had trouble implementing in MLJ so I used GLMNet.jl) Naive prediction intervals (ignores overfitting): @ryantibs this is prob a dumb question (I prob misunderstood the paper), but suppose I want a prediction interval that contains |
Yes, that is correct. Of course, if you're seeking guaranteed 90% coverage, so α = 0.1, then you could always use the JK+ with α = 0.05. However, I wouldn't say the interpretation is that the JK-minmax is better. The JK+ at level α often has close to level 1-α coverage in practice, and provably so under stability conditions (as the paper shows). In practice, the JK-minmax is often overly conservative. So practically, I would favor the JK+ in most scenarios.
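In code, against the snippet above, that suggestion is just a halved level (a sketch; it assumes the ŷ_new_LOO and res_LOO vectors from the earlier comment):

α_target = 0.10   # want guaranteed ≥ 90% coverage
# JK+ at level α guarantees ≥ 1-2α coverage, so run it at α_target/2:
LB_Jp_safe = quantile(ŷ_new_LOO - abs.(res_LOO), α_target / 2)
UB_Jp_safe = quantile(ŷ_new_LOO + abs.(res_LOO), 1 - α_target / 2)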
Thanks @ryantibs! I re-read the paper.
Nice one @azev77 👍🏽 should be straightforward to add this here. Happy to do that myself, but perhaps you'd prefer creating a PR so you'll show up as a contributor? (I'd like to turn to #5 first anyway and have some work to do this week on some other packages)

Edit: I've invited you as a collaborator and created a separate branch for this.
As a general comment, I would say that validity just means that the coverage is at least 1-α. This just means you don't undercover. The upper bound is of course nice, and says that you don't overcover by "too much". Traditional conformal methods have this property.
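As a rough sketch of that check in code (a hypothetical helper, not from the package):

# validity: empirical coverage on held-out data should be at least 1-α
is_valid(lb, ub, ys, α) = sum((lb .<= ys) .& (ys .<= ub)) / length(ys) >= 1 - α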
@pat-alt thanks for being so open to collaborating.
I just tried to understand what the package is doing to compute Naive PIs for regression. The JK+ paper (linked above) calls this the "naive" method.

using MLJ
import Statistics: quantile
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
#Make Data
X, y = randn(1_000, 2), randn(1_000)
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
Xcal = X[calibration,:]; ycal = y[calibration];
Xnew = X[test,:]; ynew = y[test];
#Fit training data
model = EvoTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=train)
#Get PIs manually w/o a pkg (NaiveConformalRegressor)
## 2: non-conformity scores w/ calibration data ##
ŷcal = MLJ.predict(mach, Xcal) # MLJ.predict(mach, rows=calibration)
scores_cal = @.(abs(ŷcal - ycal))
scores_cal = sort(scores_cal, rev=true) # sorted non-conformity scores
## 3: get PIs ##
α=0.05; # miscoverage α ∈ [0,1]
n = length(scores_cal)
p̂ = ceil(((n+1) * (1-α))) / n
p̂ = clamp(p̂, 0.0, 1.0)
q̂ = quantile(scores_cal, p̂)
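# note: ceil((n+1)*(1-α))/n, clamped to [0,1], is the finite-sample
# conformal adjustment; with exchangeable calibration and test data the
# resulting interval has marginal coverage of at least 1-α.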
ŷnew = MLJ.predict(mach, Xnew) # MLJ.predict(mach, rows=test)
PInew = [["lower" => x - q̂, "upper" => x + q̂] for x in ŷnew]  # interval per test point

some comments (to consider maybe at some point): [...]
PS: before I risk letting "scope creep" get me too carried away.
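One quick sanity check worth running on the snippet above (a sketch; it assumes the ŷnew, ynew, q̂, and α objects from that code):

emp_cov = sum((ŷnew .- q̂ .<= ynew) .& (ynew .<= ŷnew .+ q̂)) / length(ynew)
emp_cov >= 1 - α   # should hold (approximately) by the conformal guarantee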
Thanks @azev77, you're right, that was a misnomer. In fact, what I had referred to as [...]. I've changed the code base (see #10) to incorporate that broad distinction: models of type [...]. I've also implemented Jackknife now (see #15). Adding other approaches is very straightforward from here; authors/contributors just need to subtype the relevant [...].

I'll close this issue and related branch now, but feel free to continue discussion here.
Hi @pat-alt, I was actually hoping to add JK+ etc. I'm actually traveling (in Vienna rn, but wanna look into the code when back...)
Great - will be good to have a second pair of eyes glance over it 🙏 safe travels!
I have a weird question about prediction interval terminology. Are there other approaches to PIs besides conformal & JK+? @ryantibs @pat-alt @ablaom What about packages such as ngboost.py, which return an entire predicted distribution? If we compute PIs implied by that predicted distribution, what are those PIs called? I prob should've asked this question sooner, but is there any advantage to naming this pkg [...]?
There are many other ways to compute prediction intervals, but I think we should limit our focus here to CP. I also don't think I want to change the name of this package. One reason is that it also implements conformal classification, and those predictions are set-valued. That being said, [...]
When you recover from COVID, can we discuss implementing other methods for prediction intervals besides the naive method?
I have code for the Jackknife+
https://www.stat.cmu.edu/~ryantibs/papers/jackknife.pdf
and additional conformal methods that can handle heteroskedasticity…
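(For reference, one standard example of such a method is normalized, a.k.a. locally weighted, split conformal: scale each calibration residual by a local spread estimate σ̂(x), so intervals widen where the data are noisier. Below is a minimal self-contained sketch; the toy data, the least-squares point model, and the σ̂ helper are illustrative assumptions, not package code.)

using Random
import Statistics: quantile

# toy heteroskedastic data: noise grows with |x|
rng = MersenneTwister(1)
n   = 2_000
x   = randn(rng, n)
y   = 2 .* x .+ (0.5 .+ abs.(x)) .* randn(rng, n)
tr, cal, te = 1:800, 801:1600, 1601:2000

# point model ŷ(x): simple least squares on the training fold
β̂    = [ones(length(tr)) x[tr]] \ y[tr]
ŷ(v) = β̂[1] .+ β̂[2] .* v

# spread model σ̂(x): regress absolute training residuals on [1, |x|]
γ̂    = [ones(length(tr)) abs.(x[tr])] \ abs.(y[tr] .- ŷ(x[tr]))
σ̂(v) = max.(γ̂[1] .+ γ̂[2] .* abs.(v), 1e-6)   # keep strictly positive

# normalized non-conformity scores on the calibration fold
α  = 0.1
s  = abs.(y[cal] .- ŷ(x[cal])) ./ σ̂(x[cal])
q̂  = quantile(s, clamp(ceil((length(cal) + 1) * (1 - α)) / length(cal), 0, 1))

# intervals ŷ(x) ± q̂·σ̂(x) widen where σ̂ is large
lo = ŷ(x[te]) .- q̂ .* σ̂(x[te])
hi = ŷ(x[te]) .+ q̂ .* σ̂(x[te])
sum((lo .<= y[te]) .& (y[te] .<= hi)) / length(te)   # ≈ 1 - α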