With this PR, we add a section of code to handle datetime support for partial dependence. @chukarsten brings up a good point about the risks involved in using the private functions of sklearn's partial dependence, `_grid_from_X` and `_partial_dependence_brute`.
Looking into the issue, I found the following:
- sklearn's `partial_dependence` calls `_grid_from_X` and `_partial_dependence_brute`.
- `_grid_from_X` doesn't accept datetime features. If a datetime column is passed in, it returns the original datetimes and ignores the `grid_resolution` parameter. We work around this by converting the datetimes to seconds (an int value) and passing those forward.
- `_partial_dependence_brute` calls the `predict`/`predict_proba` methods of the trained pipeline. This means we can't alter the dataset only for partial dependence if it wasn't altered the same way for training (i.e., we cannot introduce a new column or otherwise alter the X data).
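As a rough illustration of the datetime-to-seconds workaround described above, the sketch below builds an evenly spaced grid for a datetime column by going through integer seconds and back. The helper name `datetime_grid`, its parameters, and the percentile clipping are assumptions made for this example; it is not the code from the PR and it does not reproduce what `_grid_from_X` does internally.

```python
import numpy as np
import pandas as pd


def datetime_grid(column: pd.Series, grid_resolution: int = 100,
                  percentiles=(0.05, 0.95)) -> pd.DatetimeIndex:
    """Hypothetical helper: evenly spaced datetime grid built via integer seconds."""
    # Convert each datetime to whole seconds since the Unix epoch (an int value).
    seconds = (column - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
    # Clip the range to the requested percentiles, then space points evenly.
    low, high = np.quantile(seconds.to_numpy(), percentiles)
    grid_seconds = np.linspace(low, high, num=grid_resolution)
    # Convert the grid back to datetimes so it matches the training dtype.
    return pd.to_datetime(grid_seconds.round().astype("int64"), unit="s")
```

Because the grid is converted back to datetimes before use, the data handed to the pipeline keeps the same column type it had during training.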
If we wanted to avoid the private functions, we would need a way to pass the data properly to sklearn's `partial_dependence`. The issue is that we need to convert the datetimes to seconds to obtain the grid, but we still need datetimes for fitting, predicting, and computing the partial dependence itself.
It might be beneficial to build our own partial dependence implementation, or to find another way to pass the data through without relying on private methods. This issue tracks finding the best next steps to clean up our current partial dependence implementation.
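To make the "build our own" option a bit more concrete, here is a minimal sketch of a brute-force partial dependence loop that relies only on a public `predict` method. The function name `brute_partial_dependence` and its signature are hypothetical, and it assumes `model` accepts a pandas DataFrame with the same columns and dtypes it was trained on. It is one possible shape for such a helper, not a proposal for the final design.

```python
import numpy as np
import pandas as pd


def brute_partial_dependence(model, X: pd.DataFrame, feature: str, grid) -> np.ndarray:
    """Hypothetical helper: average prediction with `feature` fixed to each grid value."""
    averaged = []
    for value in grid:
        # Copy so the caller's data is never modified.
        X_eval = X.copy()
        # Overwrite the whole column with one grid value; a datetime grid
        # keeps the column as datetime, matching what the model saw in training.
        X_eval[feature] = value
        averaged.append(np.mean(model.predict(X_eval)))
    return np.asarray(averaged)
```

Since the grid values are ordinary datetimes, no new column is introduced and the X data keeps the layout used for fitting. The grid itself could be computed separately in seconds and converted back before calling this function.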
fyi @chukarsten @freddyaboulton