Converting a GPU DF to DMatrix doesn't handle uint8 #468

Closed
raghavmi opened this issue Dec 7, 2018 · 5 comments
@raghavmi
Contributor

raghavmi commented Dec 7, 2018

Reporting a bug

Converting a GPU DF to a DMatrix throws an error if some of the columns are of type uint8.

cuDF's one_hot_encoding function automatically creates columns of type uint8. But these columns then cause an error when you try to convert the DataFrame to a DMatrix.

You can reproduce this issue on the latest RAPIDS Docker image using the attached Python code.

GPU DF to DMatrix Conversion Error #1.txt

@raghavmi raghavmi changed the title from "Converting a GPU DF to DMatrix throws doesn't handle uint8" to "Converting a GPU DF to DMatrix doesn't handle uint8" Dec 7, 2018
@kkraus14
Collaborator

@raghavmi I don't believe we support uint dtypes yet. Have you tried with an int8 or int16? Does it still show this issue?

@kkraus14 kkraus14 self-assigned this Dec 10, 2018
@raghavmi
Contributor Author

It works when I remove the uint8 columns.

How do I convert a column dtype in cuDF? I don't see an option like Pandas' astype or to_numeric.

I created this issue to capture the lack of support for uint8. Is that already being tracked elsewhere?

Note, uint8 is the default type that cuDF's one hot encoding generates. So people are likely to run into it frequently.

There's a separate issue once you get past the uint8 issue, which I've documented here: #469

@kkraus14
Collaborator

astype is supported. You should be able to specify the dtype as int8 or int16 instead of uint8 as well.
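For readers landing here, a minimal sketch of the cast being suggested. It uses pandas as a stand-in so it runs without a GPU, on the assumption that the cuDF column's astype behaves the same way; the column names come from this thread, and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# pandas stands in for cuDF here (assumption: cuDF's astype mirrors this call).
# The values below are hypothetical sample data.
df = pd.DataFrame({
    "AGE": np.array([34, 51, 27], dtype=np.int64),
    "APPT_WEEKDAY_1": np.array([1, 0, 1], dtype=np.uint8),
})

# Cast the one-hot column from uint8 to int8 and write it back into the frame.
# One-hot values are only 0 or 1, so the narrower signed type cannot overflow.
df["APPT_WEEKDAY_1"] = df["APPT_WEEKDAY_1"].astype(np.int8)

print(df.dtypes)
```

The cast is written back with item assignment (`df["name"] = ...`) so that the frame's own column is replaced rather than a temporary copy.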

@raghavmi
Contributor Author

raghavmi commented Dec 10, 2018

My apologies. I found the documentation for it now. I was looking for it as a function on the dataframe object itself.

I still see some weirdness. My attempt to change the dtype doesn't quite seem to work.

In my case, APPT_WEEKDAY_1 is a column of dtype uint8.

print(X_train.APPT_WEEKDAY_1.dtype)

uint8

  1. I try to convert it to int8 and it looks like it worked.
X_train.APPT_WEEKDAY_1 = X_train.APPT_WEEKDAY_1.astype(np.int8)
print(X_train.APPT_WEEKDAY_1.dtype)

int8

  2. But when I print all dtypes, it still shows APPT_WEEKDAY_1 has a dtype of uint8.
print(X_train.dtypes)

AGE int64
DISTANCE int64
APPT_WEEKDAY_1 uint8

  3. Finally, when I try to create a DMatrix, it still errors out with a key error for numpy.uint8.
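One way to check a cast like the one in the steps above is to replace the columns with item assignment and then inspect the frame-level dtypes, which is what the DMatrix conversion appears to consult. The sketch below is a rough illustration, not a confirmed fix: it uses pandas as a stand-in for cuDF, `cast_unsigned_columns` is a hypothetical helper, and the sample values are invented. The thread does not say why the attribute-style assignment left `dtypes` unchanged.

```python
import numpy as np
import pandas as pd


def cast_unsigned_columns(df, target=np.int8):
    """Cast every unsigned-integer column of df to target, in place.

    Hypothetical helper for illustration. Returns the names of the columns
    that were cast. Only safe when the values fit in target (one-hot 0/1 do).
    """
    changed = []
    for name in list(df.columns):
        if df[name].dtype.kind == "u":  # "u" marks unsigned integer dtypes
            df[name] = df[name].astype(target)
            changed.append(name)
    return changed


# pandas stands in for cuDF; values are hypothetical sample data.
X_train = pd.DataFrame({
    "AGE": np.array([34, 51, 27], dtype=np.int64),
    "DISTANCE": np.array([5, 12, 3], dtype=np.int64),
    "APPT_WEEKDAY_1": np.array([1, 0, 1], dtype=np.uint8),
})

changed = cast_unsigned_columns(X_train)

# Verify against the frame-level dtypes, not just the column attribute.
remaining = [name for name, dt in X_train.dtypes.items() if dt.kind == "u"]
print(changed, remaining)
```

If `remaining` is non-empty after the cast, the frame still holds unsigned columns and the DMatrix conversion would be expected to hit the same error.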

@kkraus14
Collaborator

kkraus14 commented Feb 5, 2019

This should be fixed in the latest release, which had numerous dtype handling fixes.

@kkraus14 kkraus14 closed this as completed Feb 5, 2019