
usage of p0 and p1 ? #2

Open
lcrmorin opened this issue Nov 19, 2022 · 3 comments

Comments

@lcrmorin

I am not entirely sure what the difference between p0 and p1 is supposed to be. By analogy with the sklearn framework I would expect something like p1 = 1 - p0, but this is not the case. From the paper I understand that the final prediction is a function of p0 and p1 that depends on the target metric. Wouldn't other names for p0 and p1 be better? Wouldn't a predict_proba method be useful?

@ptocca
Owner

ptocca commented Nov 19, 2022

Thanks for the comment and the suggestions.
I kept the names p0 and p1 to stay aligned with the notation used in the paper. Perhaps p_low and p_high would be a clearer alternative?
Let me try to clarify their meaning: p0 and p1 are the probability predictions for y_t=1 that we would obtain if we added to the training set a hypothetical example z=(x_t, y_t) made up of the test object x_t, with y_t=0 in one case and y_t=1 in the other.
So you could view the interval (p0, p1) as an indication of the sensitivity of the prediction to the training data.
Just to reiterate: both p0 and p1 are predictions for the probability that the label y_t is 1, so they do not sum to 1.
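To make this concrete, here is a minimal (and deliberately naive) sketch of the idea using sklearn's IsotonicRegression. It is only an illustration of where p0 and p1 come from, not the fast algorithm implemented in this repository, and the function name is made up for the example:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_pair(cal_scores, cal_labels, test_score):
    """Return (p0, p1) for a single test score.

    p_k is the isotonic-calibrated probability of y_t = 1 after appending
    the hypothetical example (test_score, y_t = k) to the calibration set.
    """
    preds = []
    for hypothetical_label in (0, 1):
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, hypothetical_label)
        # Refit the calibrator with the test point labelled 0, then 1
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        preds.append(float(iso.predict([test_score])[0]))
    p0, p1 = preds
    return p0, p1
```

Both returned values are predictions for P(y_t = 1), which is why they do not sum to 1.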

@francescopisu

Would it be correct to use the formula
p = p1 / (1 - p0 + p1)
to get a "point" prediction (a single-value prediction, as in this presentation) and to report it with p0 and p1 acting as confidence bounds? E.g., p [p0, p1]
I'm new to the topic, so what I'm saying could well be wrong.

Thank you

@ptocca
Owner

ptocca commented Nov 23, 2022

@francescopisu Yes, that is correct.
Strictly speaking, the suggested formula is optimal in a certain sense (see below) when you use log loss as the loss function.
If you use square loss, however, optimality is achieved by a different way of combining the two predictions.
For this alternative and for the precise meaning of optimality in this context, please refer to chapter 4 of the paper Venn-Abers Predictors by Vovk and Petej.
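As a quick sketch, here is what the two merging rules look like. The log-loss rule is the one you quoted; the square-loss rule below is what I get from a minimax-regret calculation (equalising the extra square loss incurred for y_t=0 and y_t=1), which should match the formula in the paper, so please double-check it there:

```python
def merge_log_loss(p0, p1):
    """Merge (p0, p1) into a point prediction under log loss."""
    return p1 / (1.0 - p0 + p1)

def merge_square_loss(p0, p1):
    """Merge (p0, p1) under square (Brier) loss.

    Choosing p so that the regret is the same for both labels,
    (1 - p)**2 - (1 - p1)**2 == p**2 - p0**2,
    gives p = p1 + (p0**2 - p1**2) / 2.
    """
    return p1 + (p0**2 - p1**2) / 2.0

p0, p1 = 0.55, 0.70
print(merge_log_loss(p0, p1))     # 0.7 / 1.15 ≈ 0.609
print(merge_square_loss(p0, p1))  # 0.7 - 0.09375 ≈ 0.606
```

Note that both rules reduce to p when p0 = p1 = p, so they only differ appreciably when the interval (p0, p1) is wide.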
