
Question and documentation #13

Closed
lcrmorin opened this issue Jul 4, 2023 · 6 comments
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@lcrmorin

lcrmorin commented Jul 4, 2023

I have some relatively beginner questions after trying the intro code:

  • The wrapper does not modify predict_proba, is that right?
  • When compared with calibrating an rf on the complete training dataset X_train (same seed), the conformal approach doesn't always improve the metric (70% of the time it improves the Brier score); is there any reason that it doesn't work all the time?
  • Small comment, but we would need:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

for the code to work out of the box.
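
For reference, here is a minimal sketch of the Quickstart with those imports added (the OpenML dataset, split sizes and variable names are my own illustrative choices, not necessarily those of the docs):

from crepes import WrapClassifier
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# any binary classification dataset from OpenML will do; "qsar-biodeg" is just an example
X, y = fetch_openml("qsar-biodeg", version=1, return_X_y=True, as_frame=False)
y = LabelEncoder().fit_transform(y)  # encode the labels as 0/1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# split off a calibration set from the training data
X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)

rf = WrapClassifier(RandomForestClassifier(n_jobs=-1, random_state=42))
rf.fit(X_prop_train, y_prop_train)
rf.calibrate(X_cal, y_cal)

print(rf.predict_p(X_test))                     # conformal p-values, one column per class
print(rf.predict_set(X_test, confidence=0.95))  # 0/1 label membership in the prediction set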

@henrikbostrom
Owner

Hi,

Thanks for your questions and comment! Here are my answers in order:

  • It is correct that the wrapper does not affect predict_proba (or any of the other existing methods).
  • I am not sure that I fully follow you, but note that predict_p outputs p-values, not class probabilities (they do not sum to 1), so the Brier score is not expected to improve; the p-values do work in the sense that the p-values for the true labels are uniformly distributed (see the sketch after this list).
  • Thanks a lot for informing me; the Quickstart section has now been updated.
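
A quick sketch of that uniformity check (assuming a calibrated WrapClassifier rf and integer-encoded test labels y_test as in the sketch above; these names are assumptions for illustration):

import numpy as np
from scipy.stats import kstest

p_values = rf.predict_p(X_test)                    # one p-value per class and test object
p_true = p_values[np.arange(len(y_test)), y_test]  # p-value assigned to the true label
print(kstest(p_true, "uniform"))                   # should be close to uniform on [0, 1]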

Best regards,
Henrik

@JuleanA

JuleanA commented Jul 5, 2023

Can you clarify predict_p and predict_set?

If you have a predict_p that outputs [0.46552707, 0.04407598], as in your example, shouldn't the predict_set output be [0, 1] (in your example it is [1, 0])? I assume predict_set is predicting the class labels, and thus the second class, with a p-value of 0.04407598, is below the threshold for the 95% confidence level?

Thank you.

@henrikbostrom
Owner

predict_set provides the labels that cannot be rejected at the chosen confidence level; a 1 indicates that the corresponding label is present in the prediction set, i.e., it has not been rejected.
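
To illustrate with the p-values quoted above (a sketch of the thresholding idea only; the exact boundary handling inside the package may differ):

import numpy as np

p_values = np.array([[0.46552707, 0.04407598]])
confidence = 0.95
prediction_set = (p_values > 1 - confidence).astype(int)
print(prediction_set)  # [[1 0]]: the first label is kept, the second is rejected at the 5% significance level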

Best regards,
Henrik

@lcrmorin
Author

lcrmorin commented Jul 5, 2023

@henrikbostrom thanks for your answer. Where I come from, 'calibrating a model in probabilities' means changing the output of the model so that it matches historical probabilities (think probability calibration curves, usually handled with things like isotonic regression). Maybe this is a cultural thing... as I understand it, it can be translated into p-value calibration.
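
For concreteness, this is the kind of calibration I mean (a sketch using scikit-learn's isotonic calibration; the splits reuse the names from my earlier sketch):

from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.ensemble import RandomForestClassifier

# isotonic regression fitted on held-out folds, post-processing predict_proba
cal_rf = CalibratedClassifierCV(RandomForestClassifier(n_jobs=-1, random_state=42),
                                method="isotonic", cv=5)
cal_rf.fit(X_train, y_train)

prob_pos = cal_rf.predict_proba(X_test)[:, 1]
frac_of_positives, mean_predicted_prob = calibration_curve(y_test, prob_pos, n_bins=10)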

It seemed to me that conformal prediction would allow something like this. At least the Venn-Abers approach seems to offer something similar, based on the metric used (see discussion). I was wondering 1) whether there would be a similar approach here to build an optimal prediction that accounts for the calculated p-values, and 2) whether this would depend on the metric used, as in the Venn-Abers approach.

Regarding my second question, I was trying to evaluate the performance gain of the calibration process, and tried to compare random forest performance with and without the wrapper. Since the predict_proba method is not modified in the way I expected, the experiment is a bit void (the results depend on the seed and on the vanilla rf having access to the whole training data).

@henrikbostrom
Owner

Thanks for the clarification!

Venn-Abers predictors would indeed be a natural choice for obtaining calibrated class probabilities; these are not (currently) implemented in the package.

When evaluating the output of predict_proba (which is not affected by the calibrate method), one would indeed expect better performance from fitting on the full (rather than only the proper) training set.
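
Concretely, the comparison would look something like this (a sketch reusing the names from the earlier sketches; predict_proba on the wrapper is the unmodified output of the forest fitted on the proper training set):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss

full_rf = RandomForestClassifier(n_jobs=-1, random_state=42).fit(X_train, y_train)

# Brier scores of the unmodified predict_proba outputs
print("fit on full training set:  ", brier_score_loss(y_test, full_rf.predict_proba(X_test)[:, 1]))
print("fit on proper training set:", brier_score_loss(y_test, rf.predict_proba(X_test)[:, 1]))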

Best regards,
Henrik

@henrikbostrom added the documentation and question labels on Jul 10, 2023
@henrikbostrom
Owner

I will move this thread to "Discussions" (the proposed documentation change has been fixed).

Repository owner locked and limited conversation to collaborators on Jul 10, 2023
@henrikbostrom converted this issue into discussion #17 on Jul 10, 2023

This issue was moved to a discussion.

You can continue the conversation there.
