FeatureImp + mlr3 Learner that predicts probabilities does not work #134
Ah, I need to define the positive class; I could solve the issue using

Maybe printing an error message could help here?
Thanks for reporting! The underlying issue is that we do not have information about the task type. I've tried to partially address this in #137 by checking whether the supplied learner has the attributes of a classification learner. I only did so for mlr3 learners for now, though. This should probably be tackled in a more robust fashion, but it does the job for now.
I was wrong. What is correct is that {iml} by default does not know about the positive class. Also, your loss function does not seem to be suited here. With the default loss:

```r
library("mlr3")
library("iml")

credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob")
FeatureImp$new(pred, loss = "ce", n.repetitions = 1)
#> Interpretation method: FeatureImp
#> error function: ce
#>
#> Analysed predictor:
#> Prediction task: classification
#> Classes:
#>
#> Analysed data:
#> Sampling from data.frame with 1000 rows and 20 columns.
#>
#> Head of results:
#>          feature importance.05 importance importance.95 permutation.error
#> 1         status      1.517241   1.517241      1.517241             0.308
#> 2       duration      1.295567   1.295567      1.295567             0.263
#> 3         amount      1.157635   1.157635      1.157635             0.235
#> 4 credit_history      1.137931   1.137931      1.137931             0.231
#> 5        purpose      1.098522   1.098522      1.098522             0.223
#> 6        savings      1.098522   1.098522      1.098522             0.223
```

<sup>Created on 2020-07-29 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0.9001)</sup>
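For intuition about what `FeatureImp` reports here, a minimal sketch in Python (this is an illustration of the permutation-importance idea, not iml's actual implementation; the toy model, data, and function names are made up): the importance of a feature is the loss after permuting that feature's column divided by the original loss.

```python
import random

def classification_error(actual, predicted):
    """Fraction of misclassified observations (the "ce" loss)."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(predict, X, y, feature, loss, seed=0):
    """Loss after shuffling one feature column, divided by the original loss."""
    original = loss(y, [predict(row) for row in X])
    col = [row[feature] for row in X]
    random.Random(seed).shuffle(col)
    permuted_rows = [{**row, feature: v} for row, v in zip(X, col)]
    permuted = loss(y, [predict(row) for row in permuted_rows])
    return permuted / original

# toy model: predict "good" for short credit durations
predict = lambda row: "good" if row["duration"] < 20 else "bad"
X = [{"duration": d} for d in (5, 30, 10, 40, 15, 50)]
y = ["good", "bad", "good", "bad", "bad", "bad"]  # one case the model gets wrong
print(permutation_importance(predict, X, y, "duration", classification_error))
```

A ratio above 1 means permuting the feature made predictions worse, i.e. the model relied on it.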
I see. Example:

```r
library("mlr3")
library("iml")

credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

brier = function(actual, predicted) {
  sum((actual - predicted)^2)
}

pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob", class = "good")
FeatureImp$new(pred, loss = brier, n.repetitions = 1)
## Error in if (self$original.error == 0 & self$compare == "ratio") { :
##   missing value where TRUE/FALSE needed
## In addition: Warning message:
## In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
```

To understand why this happens, let's look at the values:

```r
library("mlr3")
library("iml")

credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

measure_print = function(actual, predicted) {
  cat(head(actual), fill = TRUE)
  cat(head(predicted), fill = TRUE)
}

pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob", class = "good")
FeatureImp$new(pred, loss = measure_print, n.repetitions = 1)
## 1 2 1 1 2 1
## 0.8767123 0.1388889 0.868709 0.379562 0.379562 0.868709
## Error in if (self$original.error == 0 & self$compare == "ratio") { :
##   argument is of length zero

# PS: if I don't use class = "good", the value of "actual" from the measure is still a factor:
pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob")
FeatureImp$new(pred, loss = measure_print, n.repetitions = 1)
## 1 2 1 1 2 1
## 1 2 1 2 2 1
## Error in if (self$original.error == 0 & self$compare == "ratio") { :
##   argument is of length zero
```
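What goes wrong here is a scale mismatch, which can be reproduced outside of R entirely. A minimal Python sketch (the probabilities are taken from the output above, rounded; the variable names are mine, and this is not iml code): the loss receives the factor's integer codes (1/2) as `actual`, while `predicted` holds probabilities in [0, 1], so a squared-error loss compares incompatible scales.

```python
codes = [1, 2, 1, 1, 2, 1]                    # factor codes: 1 = "good", 2 = "bad"
probs = [0.88, 0.14, 0.87, 0.38, 0.38, 0.87]  # P(class == "good"), rounded

def brier(actual, predicted):
    """Sum of squared differences, as in the thread's brier() loss."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# With raw factor codes the "errors" are meaningless (e.g. 2 - 0.14 = 1.86):
print(round(brier(codes, probs), 3))  # 6.517

# Recoding the target as a 0/1 indicator of the positive class fixes the scale:
indicator = [1 if c == 1 else 0 for c in codes]  # 1 = "good"
print(round(brier(indicator, probs), 3))  # 0.597
```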
Thanks. Yes, something bad is happening internally. But in any case,
Maybe if iml allows that the
Thanks for addressing this. It is quite difficult to capture the many possible losses and also the different types of outcomes (regression, binary classification, multiclass, probabilities).

```r
library("mlr3")
library("iml")

credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

brier = function(actual, predicted) {
  sum((actual - predicted)^2)
}

# encode the target as a 0/1 indicator of the positive class
y = 1 * (credit.task$data()$credit_risk == "good")
pred = Predictor$new(model, data = data, y = y, type = "prob", class = "good")
FeatureImp$new(pred, loss = brier, n.repetitions = 1)
```
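The error message quoted earlier ("Error in if (self$original.error == 0 & self$compare == 'ratio')") points at a guard around the importance computation. Conceptually, in a Python sketch with made-up function names (not iml's actual code): with `compare = "ratio"` the importance is the permuted error over the original error, which is undefined when the original error is 0 or when the loss returned nothing (NA).

```python
def importance(original_error, permuted_error, compare="ratio"):
    """Permuted/original for "ratio", permuted - original for "difference"."""
    if original_error is None:  # e.g. the loss function returned no value (NA)
        raise ValueError("loss returned no value; cannot compute importance")
    if compare == "ratio":
        if original_error == 0:
            raise ValueError("original error is 0; try compare = 'difference'")
        return permuted_error / original_error
    return permuted_error - original_error

print(importance(0.25, 0.5))                # 2.0
print(importance(0.25, 0.5, "difference"))  # 0.25
```

A loss like `measure_print` that only `cat()`s and returns nothing would trip the first check, which matches the "argument is of length zero" failure in the thread.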
I am not sure how to improve the situation while still allowing very general settings for the loss function. Maybe having some more examples in the help file would have helped?
My problem was that I passed my own predict.fun, which was ignored completely:

```r
library("mlr3")
library("iml")

credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

# print actual and predicted
measure_print = function(actual, predicted) {
  cat(head(actual), fill = TRUE)
  cat(head(predicted), fill = TRUE)
}

# use a manually written predict function that returns probabilities
predict_good_prob = function(model, newdata) predict(model, newdata, predict_type = "prob")[, "good"]
head(predict_good_prob(model, data))
# [1] 0.8767123 0.1388889 0.8687090 0.3795620 0.3795620 0.8687090

pred = Predictor$new(model, data = data, y = "credit_risk", predict.function = predict_good_prob)
imp = FeatureImp$new(pred, loss = measure_print, n.repetitions = 1)
# 1 2 1 1 2 1
# 1 2 1 2 2 1
# Error in if (self$original.error == 0 & self$compare == "ratio") { :
#   argument is of length zero
```

Usually, the user knows how

In case of multiclass, I could have written a predict function that passes a matrix of all probabilities for each class, e.g.:

```r
predict_good_prob = function(model, newdata) predict(model, newdata, predict_type = "prob")
head(predict_good_prob(model, data))
#           good       bad
# [1,] 0.8767123 0.1232877
# [2,] 0.1388889 0.8611111
# [3,] 0.8687090 0.1312910

pred = Predictor$new(model, data = data, y = "credit_risk", predict.function = predict_good_prob)
```

Then, I'd expect that I can reuse this matrix of probabilities in the measure:

```r
my_cool_measure = function(actual, predicted) {
  class1 = predicted[, "good"]
  class2 = predicted[, "bad"]
  # do some cool computations with probabilities of each class
}
```
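One concrete way such a matrix-based measure could look, sketched in Python with made-up data (a multiclass Brier score against a one-hot encoding of the labels; this is an illustration of the idea, not an iml or mlr3 API):

```python
def multiclass_brier(actual, predicted, classes):
    """actual: list of labels; predicted: list of per-class probability dicts.

    Mean squared distance between the probability vector and the one-hot
    encoding of the true label.
    """
    total = 0.0
    for label, probs in zip(actual, predicted):
        for c in classes:
            onehot = 1.0 if c == label else 0.0
            total += (probs[c] - onehot) ** 2
    return total / len(actual)

classes = ["good", "bad"]
actual = ["good", "bad", "good"]
predicted = [{"good": 0.9, "bad": 0.1},
             {"good": 0.2, "bad": 0.8},
             {"good": 0.6, "bad": 0.4}]
print(round(multiclass_brier(actual, predicted, classes), 3))  # 0.14
```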
If I want to compute the importance for a measure based on probabilities (e.g., the Brier score), FeatureImp is never calculated on the probabilities, even if I manually use a predict.function. It seems that internally the class is converted to numeric values (1 and 2), which makes it impossible to compute measures based on probabilities. I then tried to directly use a manually written predict.function, which also did not work.