problem obtaining VIP for SVM results #159

Open
kyleGrealis opened this issue Aug 6, 2024 · 2 comments

@kyleGrealis

Hi! My boss and I are stuck on a problem and are looking for advice.
Here is a reprex that was posted to Stack Overflow outlining our issue.

TL;DR: How do we get VIP info for an SVM model that is doing classification?

Thank you for your time!

@bgreenwell
Member

bgreenwell commented Aug 6, 2024

Thanks @kyleGrealis, I posted a solution on the Stack Overflow post! Let me know if you still have issues. I'll leave this open so I can think of a proper "fix" for vip() and tidymodels workflows; in short, it's tricky since tidymodels wraps the fitted model in its own class. Here's the reprex from my end:

library(vip)
#> 
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#> 
#>     vi
library(MASS)
library(tidymodels)

data(Boston, package = "MASS")

# Make a classification outcome
df <- Boston |> 
  mutate(is_big = factor(if_else(medv > 22, 1, 0)))

# Split the data into train and test set
set.seed(7)
splits <- initial_split(df)
train <- training(splits)
test <- testing(splits)

# Preprocess with recipe
rec <- recipe(
  formula = is_big ~ .,
  data = train
) 

svm_spec <- svm_rbf(margin = 0.0937, cost = 20, rbf_sigma = 0.0208) %>%
  set_engine("kernlab") %>%
  set_mode("classification")


# Putting into workflow
svr_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(svm_spec) %>%
  fit(data = train)

# Extract the raw underlying fit
original_fit <- workflows::extract_fit_engine(svr_fit)

# Prediction wrapper should return a vector of probabilities for the second class
pfun <- function(object, newdata) {
  kernlab::predict(object, newdata, type = "probabilities")[, 2L]
}

# Sanity check
original_fit %>%
  pfun(train) %>%
  head()
#> [1] 0.0181795600 0.0907770250 0.0471238327 0.0002242344 0.0086771825
#> [6] 0.9999855454

# Now this should work
original_fit %>%
  vip(
    method = "permute",
    nsim = 5,
    target = "is_big", metric = "roc_auc", event_level = "second",
    pred_wrapper = pfun,
    train = train
  )

# Alternatively, you can define a prediction wrapper for the workflow object 
# directly and call vi(); vip() appears to have a bug with tidymodels workflows
svr_fit %>%
  vi(
    method = "permute",
    nsim = 5,
    target = "is_big", metric = "roc_auc", event_level = "second",
    pred_wrapper = function(object, newdata) predict(object, newdata, type = "prob")[[".pred_1"]],
    train = train
  )
#> # A tibble: 14 × 3
#>    Variable Importance    StDev
#>    <chr>         <dbl>    <dbl>
#>  1 medv      0.341     0.0306  
#>  2 rad       0.0106    0.00233 
#>  3 ptratio   0.0102    0.00270 
#>  4 rm        0.00719   0.00141 
#>  5 age       0.00480   0.00215 
#>  6 lstat     0.00315   0.00165 
#>  7 dis       0.00210   0.000703
#>  8 nox       0.00199   0.000971
#>  9 tax       0.00197   0.000934
#> 10 chas      0.00132   0.000322
#> 11 crim      0.000374  0.000587
#> 12 black     0.000284  0.000388
#> 13 indus     0.000221  0.000488
#> 14 zn        0.0000681 0.000306

Created on 2024-08-06 with reprex v2.1.0

Careful though! Your example includes leakage, since your binary outcome is a direct function of medv, which is also included as an input; hence the large importance score for medv.
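For illustration only, here is a minimal sketch of one way to avoid that leakage in the reprex data: build the outcome from medv and then drop medv before splitting. The name `df_noleak` is hypothetical and not part of the original reprex, and dropping the column is just one assumed option; your real dataset may call for a different fix.

```r
library(dplyr)

data(Boston, package = "MASS")

# Build the binary outcome from medv, then drop medv itself so the
# outcome's source column is no longer available as a predictor.
# (`df_noleak` is a hypothetical name used only for this sketch.)
df_noleak <- Boston |>
  mutate(is_big = factor(if_else(medv > 22, 1, 0))) |>
  dplyr::select(-medv)  # dplyr:: prefix avoids the clash with MASS::select

# Check which columns remain; medv should not be among them
names(df_noleak)
```

With `df_noleak` in place of `df`, the rest of the reprex (split, recipe, workflow, `vi()`) should run unchanged, and the importance scores would then reflect only the remaining predictors.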

@kyleGrealis
Author

Thank you for your time answering this! And I appreciate your disclaimer here too. Applied to our working dataset, this solution produced exactly what we're looking for. So MANY thanks to you!!
