[WIP] Accept fold index for TargetEncoder #4453

Merged
merged 14 commits into from
Jan 20, 2022

Conversation

daxiongshu
Contributor

@daxiongshu daxiongshu commented Dec 16, 2021

As requested in issue #4441, this PR lets TargetEncoder accept a user-supplied fold index array in fit(). For example:

from cuml.preprocessing import TargetEncoder

X = [1, 2, 3, 1, 2]
y = [1, 0, 0, 0, 1]
fold_ids = [0, 1, 0, 0, 1]  # fold assignment for each row of X
encoder = TargetEncoder(split_method='customize')
encoder.fit(X, y, fold_ids=fold_ids)

The target encoder uses the rows of X and y where fold_ids == 0 to compute the encodings for the rows of X where fold_ids == 1, and vice versa.
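
To make the fold behaviour described above concrete, here is a minimal NumPy sketch of out-of-fold target encoding with caller-supplied fold indices. It is not the cuML implementation: the function name `out_of_fold_target_encode` is hypothetical, no smoothing is applied, and falling back to the global mean of `y` for categories that never appear outside the held-out fold is an assumption made for this illustration.

```python
import numpy as np


def out_of_fold_target_encode(x, y, fold_ids):
    """Encode each row of x with the mean of y over rows in *other* folds.

    Hypothetical helper for illustration only; not part of cuML.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    fold_ids = np.asarray(fold_ids)
    if not (len(x) == len(y) == len(fold_ids)):
        raise ValueError("x, y and fold_ids must have the same length")

    # Assumption: categories unseen outside the held-out fold
    # fall back to the global mean of y.
    global_mean = y.mean()
    encoded = np.full(len(x), global_mean)

    for fold in np.unique(fold_ids):
        held_out = fold_ids == fold  # rows to encode in this pass
        fit_rows = ~held_out         # rows used to compute the statistics
        for category in np.unique(x[held_out]):
            match = fit_rows & (x == category)
            if match.any():
                encoded[held_out & (x == category)] = y[match].mean()
    return encoded


# Each category appears in both folds, so every row gets an
# out-of-fold mean rather than the fallback.
example = out_of_fold_target_encode(
    x=[1, 1, 2, 2], y=[1, 0, 1, 1], fold_ids=[0, 1, 0, 1]
)
```

With the five-row example from the description, no category appears in both folds, so under this sketch's fallback assumption every row would receive the global mean of `y`.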

@daxiongshu daxiongshu requested a review from a team as a code owner December 16, 2021 15:21
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Dec 16, 2021
@daxiongshu
Contributor Author

stop

@daxiongshu daxiongshu added non-breaking Non-breaking change feature request New feature or request labels Jan 7, 2022
@daxiongshu daxiongshu added the 3 - Ready for Review Ready for review by team label Jan 7, 2022
Member

@dantegd dantegd left a comment

PR looks great, just a request for updated docstring

@@ -114,7 +118,7 @@ def __init__(self, n_folds=4, smooth=0, seed=42,
         self.train = None
         self.output_type = output_type
 
-    def fit(self, x, y):
+    def fit(self, x, y, fold_ids=None):
Member

Can you add fold_ids to the docstring?

Contributor Author

Yeah, thank you for the comments.
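
For reference, a docstring entry along the lines requested in this thread might look like the sketch below. `TargetEncoderSketch` is a hypothetical stand-in, not the cuML class: the `'random'` default, the wording of the parameter descriptions, the `fold_ids_` attribute, and the validation checks are all assumptions made for illustration. Only the `fit(self, x, y, fold_ids=None)` signature and the `'customize'` value come from this PR.

```python
class TargetEncoderSketch:
    """Illustrative stand-in used to show a possible docstring layout."""

    def __init__(self, split_method="random"):
        # "random" is a placeholder default chosen for this sketch.
        self.split_method = split_method

    def fit(self, x, y, fold_ids=None):
        """
        Fit the encoder to the training data.

        Parameters
        ----------
        x : array-like
            Categorical feature(s) to encode.
        y : array-like
            Target values, one per row of ``x``.
        fold_ids : array-like of int, optional
            Fold index of each row of ``x``. Only consulted when
            ``split_method='customize'``.

        Returns
        -------
        self : TargetEncoderSketch
            The fitted instance.
        """
        # Assumed validation: custom splitting needs explicit fold indices.
        if self.split_method == "customize" and fold_ids is None:
            raise ValueError(
                "fold_ids is required when split_method='customize'"
            )
        if fold_ids is not None and len(fold_ids) != len(x):
            raise ValueError("fold_ids must have one entry per row of x")
        self.fold_ids_ = fold_ids
        return self
```

The same `fold_ids` entry would apply to `fit_transform`, which gains the identical keyword argument in this PR.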

         self.train_encode = res
         self.train = train
         self._fitted = True
         return self
 
-    def fit_transform(self, x, y):
+    def fit_transform(self, x, y, fold_ids=None):
Member

Same comment as above

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.02@fed3774).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.02    #4453   +/-   ##
===============================================
  Coverage                ?   85.74%           
===============================================
  Files                   ?      236           
  Lines                   ?    19322           
  Branches                ?        0           
===============================================
  Hits                    ?    16567           
  Misses                  ?     2755           
  Partials                ?        0           
Flag       Coverage Δ
dask       46.52% <0.00%> (?)
non-dask   78.64% <0.00%> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fed3774...84a04b2.

@dantegd
Member

dantegd commented Jan 20, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit fffee47 into rapidsai:branch-22.02 Jan 20, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Authors:
  - Jiwei Liu (https://github.com/daxiongshu)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4453