[FEA] Deprecate label_encoding method in favor of categorical columns #8608
Comments
What does the string data look like? You may be able to use the subword_tokenize function to assign integers to words. If you want to replace an entire string row with an integer you could use
In addition to @davidwendt's suggestions, you could do this in a couple of other ways that avoid enumerating your categories (the former is probably the preferred method in pandas):

import cudf
s = cudf.Series(["a","b","c"])
s.astype("category").cat.codes
0 0
1 1
2 2
dtype: uint8

import cudf
from cuml.preprocessing import LabelEncoder
s = cudf.Series(["a","b","c"])
le = LabelEncoder()
le.fit(s)
le.transform(s)
0 0
1 1
2 2
dtype: uint8

cc @shwina perhaps we should consider deprecating
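As a rough CPU-side sketch of how the categorical approach could feed the 2D histogram use case from the original request: the example below uses pandas and NumPy as stand-ins, on the assumption that the same calls carry over to cuDF and CuPy's histogram2d. The sample data, column names, and bin choices are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two string columns to be binned jointly.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "M", "L", "S", "S"],
})

# Converting to a categorical dtype assigns one integer code per distinct value.
color_cat = df["color"].astype("category")
size_cat = df["size"].astype("category")

# Integer codes are numeric, so they can be passed to a histogram routine.
color_codes = color_cat.cat.codes.to_numpy()
size_codes = size_cat.cat.codes.to_numpy()

n_color = len(color_cat.cat.categories)
n_size = len(size_cat.cat.categories)

# Use one bin per category, with bin edges centered on the integer codes.
counts, _, _ = np.histogram2d(
    color_codes,
    size_codes,
    bins=[n_color, n_size],
    range=[[-0.5, n_color - 0.5], [-0.5, n_size - 0.5]],
)

# The categories index maps each code back to its original string label.
color_labels = list(color_cat.cat.categories)
size_labels = list(size_cat.cat.categories)
```

Row i and column j of counts then correspond to color_labels[i] and size_labels[j], so the original strings are never lost, only set aside while the numeric routine runs.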
Thank you @davidwendt and @beckernick for your suggestions. I think they all sound like they will do what I need. The strings I need to replace are all values in a single column. I haven't had time to try these out yet but I just wanted to say thank you and that I really appreciate your quick answers and advice.
Ok, I've had some time to read about and test the options you suggested.
Thank you both again for your help and suggestions. I really appreciate it. My question is resolved. Should I close this, or would you like me to keep it open for issue tracking?
Glad this is resolved. Let's keep this open and edit the title to be a discussion around deprecating
This PR addresses issue #8608 by adding a deprecation warning before we remove the functionality entirely.
Authors:
- Mayank Anand (https://github.com/mayankanand007)
Approvers:
- Ashwin Srinath (https://github.com/shwina)
URL: #9289
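For readers unfamiliar with the pattern, here is a minimal sketch of what adding a deprecation warning ahead of removal can look like in Python. This is not the code from the PR: the function name, signature, message text, and the choice of FutureWarning are all assumptions made for illustration.

```python
import warnings


def label_encoding_sketch(values, cats):
    """Hypothetical stand-in for a method that is being deprecated."""
    # Warn on every call so users see the notice before the removal lands.
    # stacklevel=2 attributes the warning to the caller's line of code.
    warnings.warn(
        "label_encoding is deprecated and will be removed in a future "
        "release. Consider converting the column to a categorical dtype "
        "instead.",
        FutureWarning,
        stacklevel=2,
    )
    # The existing behavior is kept unchanged during the deprecation period:
    # map each value to its position in cats, or -1 if it is not listed.
    lookup = {cat: code for code, cat in enumerate(cats)}
    return [lookup.get(v, -1) for v in values]


# Usage: capture the warning to confirm it is emitted alongside the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    codes = label_encoding_sketch(["a", "c", "b"], cats=["a", "b", "c"])
```

The point of the pattern is that the old call keeps working for a release or two while steering users toward the replacement.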
Dear cuDF developers,
Is your feature request related to a problem? Please describe.
My feature request is for cuDF to add the ability for the label_encoding function to encode string data. My specific use-case is to encode string values as ints.
Describe the solution you'd like
I would like to use the label_encoding function to encode some values that are currently strings as ints. My larger goal is to use these data in CuPy's histogram2d, but since CuPy does not yet support string arrays, I was hoping that label encoding would help me get around this problem.
Describe alternatives you've considered
I have also tried the cuDF replace function for this, but it looks like it doesn't support encoding strings as ints either:
I'd be equally happy with replace supporting the ability to replace objects of different types.
Additional context
If there are better ways to accomplish this task, I would be grateful for your suggestions.
Thank you very much,
Laurie