
Add fit_transform methods? #546

Open
CameronBieganek opened this issue May 22, 2020 · 4 comments
Labels
design discussion Discussing design issues enhancement New feature or request

Comments

@CameronBieganek

Sometimes it's handy to have a combined fit_transform method. For example, right now I'm doing my one-hot encoding once at the beginning of my analysis (not in a pipeline). This is safe to do, since it's a static transformer that doesn't learn from the data. So for this use case, it would be cool if I could do something like this:

Xencoded = fit_transform(OneHotEncoder(), X)

Or perhaps it would make sense to just overload transform?

Xencoded = transform(OneHotEncoder(), X)
@OkonSamuel OkonSamuel added the enhancement New feature or request label May 22, 2020
@ablaom
Member

ablaom commented May 26, 2020

Thanks for that. Yes, scikit-learn has this, so you are presumably not the first to wonder about it.

Users can sometimes get confused when there are a lot of methods and many ways to do the same thing. In balancing convenience against simplicity, I feel some reluctance to do this.
However, there is also a performance consideration, which you did not raise: it is sometimes cheaper to fit and transform in one go, so we may want to allow model-specific implementations of such methods as well.
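The fallback-plus-override idea can be sketched in Python (names here are illustrative, not MLJ's actual API): a generic fit_transform that defaults to fit followed by transform, which a particular model may override when a fused single pass is cheaper.

```python
# Hypothetical sketch of the pattern under discussion, not MLJ's API.

class Transformer:
    def fit(self, X):
        raise NotImplementedError

    def transform(self, X):
        raise NotImplementedError

    def fit_transform(self, X):
        # Generic fallback: fit, then transform.
        return self.fit(X).transform(X)


class Scaler(Transformer):
    """Centres columns on their means."""

    def fit(self, X):
        n = len(X)
        # Learn one mean per column.
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[x - m for x, m in zip(row, self.means_)] for row in X]

    def fit_transform(self, X):
        # Model-specific override: a real implementation could reuse
        # intermediate quantities from fitting to save a pass over X.
        self.fit(X)
        return self.transform(X)
```

Models without an override would simply inherit the fit-then-transform fallback.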

I wonder what others think of the suggestion?

@ablaom ablaom added the design discussion Discussing design issues label May 26, 2020
@CameronBieganek
Author

I kind of like the approach of overloading transform for this. (If that's possible given the current method table.) I think it has a couple of advantages:

  • It doesn't introduce a new function to the API.
  • For some transformers the split between fit! and transform seems artificial. E.g., what does it mean to fit a OneHotEncoder? It seems like OneHotEncoder is really just a transformer with no fitting required.
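The suggested overload might look something like the following Python sketch (purely hypothetical; it does not reflect MLJ's existing method table): calling transform directly on a static transformer, with no prior fitting step.

```python
import math

class LogTransform:
    """A static transformer: it learns nothing from the data."""
    is_static = True

    def apply(self, X):
        return [math.log(x) for x in X]

def transform(model, X):
    # For static transformers, no fitted state is needed, so the
    # one-argument-plus-data call can apply the model directly.
    if getattr(model, "is_static", False):
        return model.apply(X)
    raise TypeError("non-static transformers must be fitted before transform")

print(transform(LogTransform(), [1.0]))  # [0.0]
```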

@ablaom
Member

ablaom commented Jun 2, 2020

If we are going to do this, then I agree this might be the best option.

It seems the method table would allow this.

Still, I'd like to ponder this a bit more before committing.

@davidbp
Contributor

davidbp commented Dec 21, 2020

I do not see the motivation for removing fit for OneHotEncoder.

As @CameronBieganek mentions, sklearn has the option to fit_transform, but it also has the option to first fit and then transform.

    For some transformers the split between fit! and transform seems artificial. E.g., what does it mean to fit a OneHotEncoder? It seems like OneHotEncoder is really just a transformer with no fitting required.

I would argue otherwise. When you are doing cross-validation, fitting a preprocessor can be very handy. For example, one of the values of a categorical feature might appear only rarely. It might happen that the folds on which the OneHotEncoder is fitted do not contain that particular value, so the encoder simply skips creating a new column that might do more harm than good.

I do not have any counter-argument to adding a fit_transform method, but I would vote against removing the ability to fit preprocessors that adapt to the input data when fitted.
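The point about rare categorical levels can be made concrete with a minimal Python sketch (illustrative only, not MLJ's OneHotEncoder): fitting records the set of levels seen, so a level absent from the training fold produces no indicator column at transform time.

```python
# Toy one-hot encoder: the "fitting" step is learning the levels.

class OneHotEncoder:
    def fit(self, column):
        # Record the distinct levels present in the fitting data.
        self.levels_ = sorted(set(column))
        return self

    def transform(self, column):
        # One indicator column per *fitted* level; values unseen
        # during fitting map to all zeros instead of new columns.
        return [[int(v == level) for level in self.levels_] for v in column]

enc = OneHotEncoder().fit(["a", "b", "a"])  # rare level "c" absent from this fold
print(enc.levels_)            # ['a', 'b']
print(enc.transform(["c"]))   # [[0, 0]] -- no column is created for "c"
```

If the encoder were purely static, every fold could sprout a different set of columns, which is exactly the hazard described above.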

4 participants