problem obtaining VIP for SVM results #159
Thanks @kyleGrealis, I posted a solution on the Stack Overflow post! Let me know if you still have issues. I'll leave this open so I can think of a proper "fix" for this.
library(vip)
#>
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#>
#> vi
library(MASS)
library(tidymodels)
data(Boston, package = "MASS")
# Make a classification outcome
df <- Boston |>
  mutate(is_big = factor(if_else(medv > 22, 1, 0)))
# Split the data into train and test set
set.seed(7)
splits <- initial_split(df)
train <- training(splits)
test <- testing(splits)
# Preprocess with recipe
rec <- recipe(
  formula = is_big ~ .,
  data = train
)
svm_spec <- svm_rbf(margin = 0.0937, cost = 20, rbf_sigma = 0.0208) %>%
  set_engine("kernlab") %>%
  set_mode("classification")
# Putting into workflow
svr_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(svm_spec) %>%
  fit(data = train)
# Extract the raw underlying fit
original_fit <- workflows::extract_fit_engine(svr_fit)
# Prediction wrapper should return a vector of probabilities for the second class
pfun <- function(object, newdata) {
  kernlab::predict(object, newdata, type = "probabilities")[, 2L]
}
# Sanity check
original_fit %>%
  pfun(train) %>%
  head()
#> [1] 0.0181795600 0.0907770250 0.0471238327 0.0002242344 0.0086771825
#> [6] 0.9999855454
# Now this should work
original_fit %>%
  vip(
    method = "permute",
    nsim = 5,
    target = "is_big", metric = "roc_auc", event_level = "second",
    pred_wrapper = pfun,
    train = train
  )
# Alternatively, you can define a prediction wrapper for the workflow object
# directly; vip() seems to be bugged with tidymodels workflows
svr_fit %>%
  vi(
    method = "permute",
    nsim = 5,
    target = "is_big", metric = "roc_auc", event_level = "second",
    pred_wrapper = function(object, newdata) {
      predict(object, newdata, type = "prob")[[".pred_1"]]
    },
    train = train
  )
#> # A tibble: 14 × 3
#>    Variable Importance    StDev
#>    <chr>         <dbl>    <dbl>
#>  1 medv      0.341     0.0306
#>  2 rad       0.0106    0.00233
#>  3 ptratio   0.0102    0.00270
#>  4 rm        0.00719   0.00141
#>  5 age       0.00480   0.00215
#>  6 lstat     0.00315   0.00165
#>  7 dis       0.00210   0.000703
#>  8 nox       0.00199   0.000971
#>  9 tax       0.00197   0.000934
#> 10 chas      0.00132   0.000322
#> 11 crim      0.000374  0.000587
#> 12 black     0.000284  0.000388
#> 13 indus     0.000221  0.000488
#> 14 zn        0.0000681 0.000306
Created on 2024-08-06 with reprex v2.1.0
Careful though! Your example includes leakage since your binary outcome is a direct function of …
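One way to act on the leakage warning, assuming it refers to `medv` (the reprex defines `is_big` as `medv > 22`, and `medv` dominates the importance table), is to keep `medv` for building the outcome but drop it from the predictors. A minimal sketch using `recipes::step_rm()`; the name `rec_no_leak` is hypothetical:

```r
library(tidymodels) # recipes, rsample, dplyr

data(Boston, package = "MASS")

# Same outcome as in the reprex: is_big is derived directly from medv
df <- Boston |>
  mutate(is_big = factor(if_else(medv > 22, 1, 0)))

set.seed(7)
splits <- initial_split(df)
train <- training(splits)

# Hypothetical leak-free recipe (assumes medv is the leaking column):
# remove medv so the model cannot read the answer off it
rec_no_leak <- recipe(is_big ~ ., data = train) |>
  step_rm(medv)

# Inspect the columns the model would actually be trained on
baked <- bake(prep(rec_no_leak), new_data = NULL)
names(baked)
```

Swapping `rec_no_leak` in for `rec` in the workflow would then report importance for the 13 remaining predictors instead of letting `medv` dominate.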
Thank you for taking the time to answer this! And I appreciate your disclaimer here too. Applied to our working dataset, this solution produced exactly what we were looking for. So MANY thanks to you!!
Hi! My boss and I are stuck on a problem and are looking for advice.
Here is a reprex that was posted to Stack Overflow outlining our issue.
TL;DR: We need VIP (variable importance) info for an SVM model that does classification.
Thank you for your time!
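For context on the TL;DR: with `method = "permute"`, importance is computed without refitting the model. Each predictor is shuffled in turn, and importance is measured as the drop in a metric such as ROC AUC. The base-R sketch below illustrates that idea under stated assumptions. A logistic regression on simulated data stands in for the SVM, AUC is computed with the rank (Mann-Whitney) formula, and `auc_rank()`, `permute_importance()`, and `pfun_glm()` are hypothetical helpers, not vip functions. The approach is conceptually similar to vip's own implementation but is not that code.

```r
# Hypothetical helper: ROC AUC via the rank-sum (Mann-Whitney) formula
auc_rank <- function(truth, prob, event) {
  pos <- truth == event
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(prob) # ties get average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: permutation importance = baseline AUC minus AUC after
# shuffling one feature, averaged over nsim shuffles
permute_importance <- function(object, data, target, event, pred_wrapper, nsim = 5) {
  baseline <- auc_rank(data[[target]], pred_wrapper(object, data), event)
  features <- setdiff(names(data), target)
  imp <- sapply(features, function(f) {
    drops <- replicate(nsim, {
      shuffled <- data
      shuffled[[f]] <- sample(shuffled[[f]])
      baseline - auc_rank(data[[target]], pred_wrapper(object, shuffled), event)
    })
    mean(drops)
  })
  sort(imp, decreasing = TRUE)
}

# Simulated data (an assumption): y depends on x1 only; x2 is pure noise
set.seed(1)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n, sd = 0.5) > 0, 1, 0))

fit <- glm(y ~ x1 + x2, data = toy, family = binomial)

# Same contract as pfun in the reprex: return P(second class) as a numeric vector
pfun_glm <- function(object, newdata) predict(object, newdata, type = "response")

imp <- permute_importance(fit, toy, target = "y", event = "1", pred_wrapper = pfun_glm)
imp
```

Shuffling `x1` destroys most of the signal, so it should rank well ahead of `x2`, whose score stays near zero. The prediction wrapper is the only model-specific piece, which is why supplying `pred_wrapper` is enough to make the permutation method work for a kernlab SVM.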