Variable types not preserved after call to normalize_entity() #10
Comments
@j-grover is this an issue with autonormalize or Featuretools? If it is Featuretools, please post it as an issue in that repo: https://github.com/featuretools/featuretools/
For reference: autonormalize.py

The normalization of an EntitySet follows this call graph:

As I understand it, the variable types are not carried forward from normalize_entity to auto_entityset. So when the entities are created in make_entityset, the variable types are missing:

```python
if time_index in current.df.columns:
    entities[current.index[0]] = (current.df, current.index[0], time_index)
else:
    entities[current.index[0]] = (current.df, current.index[0])
```

Entities definition:

```python
"""
entities (dict[str -> tuple(pd.DataFrame, str, str)]): Dictionary of
    entities. Entries take the format
    {entity id -> (dataframe, id column, (time_column), (variable_types))}.
    Note that time_column and variable_types are optional.
"""
```
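One possible way to carry the types through, as a rough sketch only: build each entry with the optional fourth slot from the definition above. The helper `make_entity_entry`, the sample column names, and the use of a `None` placeholder for a missing time column are all assumptions for illustration, not existing autonormalize or Featuretools code. A `SimpleNamespace` stands in for the DataFrame so the snippet runs without pandas, and the string `"IPAddress"` stands in for `featuretools.variable_types.IPAddress`.

```python
from types import SimpleNamespace


def make_entity_entry(df, index, time_index=None, variable_types=None):
    """Build one entry of the `entities` dict, keeping variable types.

    Hypothetical helper, not part of autonormalize or Featuretools.
    """
    columns = list(df.columns)
    # Keep the time index only if that column exists in this table.
    time_column = time_index if time_index in columns else None
    # Keep only the type overrides for columns that ended up in this table.
    kept_types = {
        col: vtype
        for col, vtype in (variable_types or {}).items()
        if col in columns
    }
    if kept_types:
        # Assumption: None is an acceptable placeholder for a missing time column.
        return (df, index, time_column, kept_types)
    if time_column is not None:
        return (df, index, time_column)
    return (df, index)


# Stand-ins for pandas DataFrames: only `.columns` is needed here.
sessions = SimpleNamespace(columns=["session_id", "ip_address", "started_at"])
devices = SimpleNamespace(columns=["device_id", "device_name"])
types = {"ip_address": "IPAddress"}  # placeholder for the real variable type class

sessions_entry = make_entity_entry(sessions, "session_id", "started_at", types)
devices_entry = make_entity_entry(devices, "device_id", "started_at", types)
```

Filtering the type mapping per table matters because, after normalization, a column such as ip_address lives in only one of the resulting entities.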
@j-grover thanks for the clarification. I see the issue now. You're right that we aren't carrying the variable types through. Would you be interested in submitting a PR that does that?
Yeah sure, I'll give it a go.
@kmax12 I have a branch ready, but I believe I do not have access to push.
@j-grover can you create a fork to make the pull request?
Thanks, created PR.
Reproducible example:
Column ip_address is set to dtype featuretools.variable_types.IPAddress:
After normalisation, ip_address resolves back to categorical:
To get the desired features, the variable types need to be preserved so that the right primitives can be applied when running dfs. My question is whether this is the intended behaviour, or whether the variable types need to be set manually again.