Basic implementation of `OrdinalEncoder`. #5646

trivialfis · 2023-11-06T21:10:36Z

Implement OrdinalEncoder.
Implement dask version.
Fix dask transformers with DataFrame input by using dask_cudf to construct return df.
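
For readers unfamiliar with the transform, here is a minimal pure-Python sketch of what an ordinal encoder does: learn a sorted category list per column, then replace each value with its index in that list. `TinyOrdinalEncoder` is a hypothetical name used only for illustration. It is not the cuML class added in this PR and it ignores GPU data types entirely.

```python
class TinyOrdinalEncoder:
    """Illustrative stand-in for an ordinal encoder (not the cuML class)."""

    def fit(self, rows):
        # Collect the values of each column and keep one sorted category list per column.
        columns = list(zip(*rows))
        self.categories_ = [sorted(set(col)) for col in columns]
        # Per-column lookup from category to its integer code.
        self._lookup = [
            {cat: code for code, cat in enumerate(cats)}
            for cats in self.categories_
        ]
        return self

    def transform(self, rows):
        # Replace every value with the code learned for its column.
        return [
            [self._lookup[j][value] for j, value in enumerate(row)]
            for row in rows
        ]

    def inverse_transform(self, codes):
        # Map codes back to the original categories.
        return [
            [self.categories_[j][code] for j, code in enumerate(row)]
            for row in codes
        ]


rows = [["low", "red"], ["high", "blue"], ["medium", "red"]]
enc = TinyOrdinalEncoder().fit(rows)
encoded = enc.transform(rows)
decoded = enc.inverse_transform(encoded)
```

The sketch returns plain integers and raises `KeyError` on unseen categories. scikit-learn's encoder returns floating-point codes by default and has options for unknown values, so treat this only as a picture of the core mapping.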

Some other scikit-learn features are not available yet, for instance `encoded_missing_value`, `min_frequency`, and `max_categories`.
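
To make one of the missing features concrete, the sketch below illustrates the grouping behavior that scikit-learn documents for `min_frequency`: categories seen fewer than `min_frequency` times share a single code placed after the frequent ones. The helper name `fit_with_min_frequency` is hypothetical, the code is pure Python, and nothing here reflects how cuML might eventually implement the option.

```python
from collections import Counter


def fit_with_min_frequency(values, min_frequency):
    """Return a category -> code mapping that groups rare categories (illustration only)."""
    counts = Counter(values)
    # Categories seen at least `min_frequency` times keep their own code, in sorted order.
    frequent = sorted(c for c, n in counts.items() if n >= min_frequency)
    mapping = {cat: code for code, cat in enumerate(frequent)}
    # All remaining categories share one extra code after the frequent ones.
    for cat, n in counts.items():
        if n < min_frequency:
            mapping[cat] = len(frequent)
    return mapping


values = ["dog"] * 5 + ["cat"] * 20 + ["rabbit"] * 10 + ["snake"] * 3
mapping = fit_with_min_frequency(values, min_frequency=6)
codes = [mapping[v] for v in ["dog", "cat", "rabbit", "snake"]]
```

Here `dog` and `snake` fall below the threshold and end up with the same code, while `cat` and `rabbit` keep distinct codes.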

The implementation is mostly based on the existing one hot encoder and label encoder.

I'm a bit confused by the `output_type` parameter and not sure how strictly it's enforced. From looking around, it seems some estimators ignore this parameter in their return values. It would be great to have a guideline on how to handle this parameter, along with #5645.

Close #4456.

trivialfis · 2023-11-07T21:31:12Z

I think the failure with dask is not related to this PR.

csadorf · 2023-11-14T22:39:48Z

> I'm a bit confused by the output_type parameter and not sure how strictly it's enforced. I looked around, it seems some estimators can ignore this parameter in their returns. Would be great if there's a guideline on how to handle this parameter, along with #5645.

I don't think the value should be ignored, but I am not sure how consistently it is being used within the dask implementation. To the best of my understanding, it is implemented correctly here.

csadorf

I really only found some superficial issues and would like to see one more test to explicitly test array inputs. Other than that, I ran some local tests to check whether the output_types are actually respected and it seems to work as expected. Nice job!

python/cuml/dask/preprocessing/encoders.py

python/cuml/preprocessing/ordinalencoder_mg.py

python/cuml/tests/test_ordinal_encoder.py

python/cuml/tests/dask/test_dask_ordinal_encoder.py

trivialfis · 2023-11-15T08:42:56Z

I think an updated cudf is causing the device ordinal error:

  - cudf                    23.12.00a140  cuda12_py310_231107_g2463b3ad53_140  rapidsai-nightly                    
  + cudf                    23.12.00a697  cuda12_py310_231115_g8a0a08f34f_697  rapidsai-nightly/linux-64        8MB
  - dask-cuda                23.12.00a26  py310_231107_ge5b240c_26             rapidsai-nightly                    
  + dask-cuda                23.12.00a28  py310_231115_gd026d6e_28             rapidsai-nightly/linux-64      189kB
  - dask-cudf               23.12.00a140  cuda12_py310_231107_g2463b3ad53_140  rapidsai-nightly                    
  + dask-cudf               23.12.00a697  cuda12_py310_231115_g8a0a08f34f_697  rapidsai-nightly/linux-64      137kB

trivialfis · 2023-11-16T10:08:53Z

After the first review, I picked some fixes from docformatter. However, due to the use of generated documents, it's not easy to simply apply all changes or enforce them in CI. But that could be something interesting in the long term.

trivialfis · 2023-11-21T16:19:23Z

This is ready for another review. :-)

csadorf

Thank you for addressing my comments!

csadorf · 2023-11-21T21:32:40Z

/merge

Implement basic support for ordinal encoder.

80fe90a

- Implement `OrdinalEncoder`.
- Implement dask version.
- Fix dask transformers with DataFrame input by using `dask_cudf` to construct return df.

trivialfis added the `feature request` and `non-breaking` labels Nov 6, 2023

trivialfis requested a review from a team as a code owner November 6, 2023 21:10

github-actions bot added the `Cython / Python` label Nov 6, 2023

trivialfis added 8 commits November 7, 2023 05:14

black.

c26eca7

Merge branch 'branch-23.12' into ordinal-encoder

ae179c6

Fixes for doc.

5b92aa0

Indirect class.

c921158

Use Base instead.

9e217b5

lint.

098c0c0

Doc fix.

4855a68

doc test.

9eb3bc5

csadorf and others added 5 commits November 9, 2023 15:22

Merge branch 'branch-23.12' into ordinal-encoder

6a0429e

Merge branch 'branch-23.12' into ordinal-encoder

a1f6e70

Cleanup doc strings in local.

983d4b5

black.

692e426

black.

7c21a1a

trivialfis added the `3 - Ready for Review` label Nov 14, 2023

Merge branch 'branch-23.12' into ordinal-encoder

c2a5653

csadorf self-assigned this Nov 14, 2023

csadorf requested changes Nov 14, 2023

csadorf added the `4 - Waiting on Author` label and removed the `3 - Ready for Review` label Nov 14, 2023

trivialfis added 4 commits November 15, 2023 12:08

Add tests for array. Cleanup.

4bf271b

remove init.

dee2a99

Merge remote-tracking branch 'jiamingy/ordinal-encoder' into ordinal-encoder

b2ac9c7

black.

5526933

Merge branch 'branch-23.12' into ordinal-encoder

afb582b

extract a test sample.

dff373f

csadorf added the `4 - Waiting on Reviewer` label and removed the `4 - Waiting on Author` label Nov 16, 2023

trivialfis added 3 commits November 21, 2023 17:17

Merge branch 'branch-23.12' into ordinal-encoder

b7c5ab4

doc checker is now happy.

1030d5b

Merge branch 'branch-23.12' into ordinal-encoder

6652249

csadorf approved these changes Nov 21, 2023

rapids-bot bot merged commit 21fbf04 into rapidsai:branch-23.12 Nov 21, 2023
49 checks passed

trivialfis deleted the ordinal-encoder branch November 21, 2023 21:52
