OneHotEncoder doesn't support dtype 'category' #751

Closed
npatki opened this issue Jan 8, 2024 · 0 comments · Fixed by #759

Labels
bug Something isn't working

npatki commented Jan 8, 2024

Environment Details

  • RDT version: 1.9.0

Error Description

The OneHotEncoder transformer crashes when the column's dtype is 'category', similar to the LabelEncoder crash described in #617. Certain synthesizers, such as CTGANSynthesizer and TVAESynthesizer, use one-hot encoding, so this error prevents them from modeling data of this type.

Steps to reproduce

import pandas as pd
from rdt.transformers.categorical import OneHotEncoder

# create some discrete data and store it as type 'category'
test_data = pd.DataFrame(data={
    'A': ['Yes', 'No', 'Yes', 'Maybe', 'No']
})
test_data['A'] = test_data['A'].astype('category')

# try to use one hot encoder
transformer = OneHotEncoder()
transformed_data = transformer.fit_transform(test_data, column='A')

Output:

TypeError: Cannot interpret 'CategoricalDtype(categories=['Maybe', 'No', 'Yes'], ordered=False)' as a data type

Stack Trace:
stack_trace.txt
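
Possible workaround (a sketch only, not confirmed in this issue): since the crash is tied to the 'category' dtype, casting those columns to plain object dtype before fitting may sidestep it. The helper name cast_categories_to_object is hypothetical and not part of RDT; it only uses pandas calls.

```python
import pandas as pd


def cast_categories_to_object(data):
    """Return a copy of `data` with every 'category' column cast to object dtype.

    Hypothetical helper for illustration; it is not part of RDT.
    """
    converted = data.copy()
    category_columns = converted.select_dtypes(include='category').columns
    for column in category_columns:
        # astype(object) keeps the values but drops the CategoricalDtype
        converted[column] = converted[column].astype(object)
    return converted


# same data as in the steps to reproduce
test_data = pd.DataFrame(data={
    'A': ['Yes', 'No', 'Yes', 'Maybe', 'No']
})
test_data['A'] = test_data['A'].astype('category')

# the original frame is left untouched; only the copy is converted
workaround_data = cast_categories_to_object(test_data)
```

Assuming the encoder handles object columns as usual, workaround_data could then be passed to transformer.fit_transform(workaround_data, column='A') in place of test_data. Note that the cast drops the categorical dtype, including any declared categories that do not appear in the data.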
