Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement our own partial dependence method #2502

Closed
bchen1116 opened this issue Jul 13, 2021 · 1 comment · Fixed by #2834
Closed

Implement our own partial dependence method #2502

bchen1116 opened this issue Jul 13, 2021 · 1 comment · Fixed by #2834
Assignees
Labels
enhancement An improvement to an existing feature.

Comments

@bchen1116
Copy link
Contributor

With this PR, we add a section of code to handle datetime support for partial dependence. @chukarsten brings up a good point about risks involved in using the private functions of sklearn's partial dependence, _grid_from_X and _partial_dependence_brute.

In my look into the issue, I found the following:

  • sklearn's partial_dependence makes calls to _grid_from_X and _partial_dependence_brute
  • _grid_from_X doesn't accept datetime features. If a datetime is passed in, it will result in the original datetimes to be returned, ignoring the grid_resolution parameter. We resolve this by turning the datetime into seconds (an int value), then pass it forward
  • _partial_dependence_brute calls the pipeline.predict/predict_proba methods on the trained pipelines. This means that we can't alter the dataset in only partial dependence if it isn't altered similarly for training (ie we cannot introduce a new column or alter the X data)

If we wanted to not use the the private functions, we need to find a way to properly pass the data to sklearn's partial_dependence. The issue here is that we somehow need to handle converting the datetime to seconds for obtaining the grid, but we'd need to use datetime for fitting, predicting, and computing the partial dependence otherwise.

It might be beneficial to build our own partial dependence feature or to find another way to pass this through without necessarily relying on private methods. This issue tracks finding the next best-steps to clean up our current partial dependence implementation.

fyi @chukarsten @freddyaboulton

@bchen1116 bchen1116 added the enhancement An improvement to an existing feature. label Jul 13, 2021
@dsherry
Copy link
Contributor

dsherry commented Jul 14, 2021

Agreed. Goal is to avoid more issues like #2475 , make our code easier to read and maintain.

Seems like we have two options. 1) clean up our current impl 2) don't use sklearn impl any more, write our own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An improvement to an existing feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants