Describe the bug
When the PyTorch data loader reads a parquet file that has both list columns and "simple" (non-list) columns, there is no way to know which column names correspond to the "simple" features.
Steps/Code to reproduce bug
Read a parquet file that has both list columns and simple columns among its categorical columns (the same problem applies to continuous columns).
The cat_sequence_features will be a dictionary with the key list_col2 and the tensor as its value. The cat_single_features will be a 2D tensor with two columns, one for simple_col1 and one for simple_col3.
The problem is that the data loader does not provide a way to know the column names corresponding to cat_single_features dimensions.
I have checked `train_set.cat_names` and `train_set.cont_names`, but they do not correspond only to `cat_single_features`, because they also contain the list column names.
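A possible workaround, sketched below in plain Python, is to filter the list column names out of `cat_names` using the keys of `cat_sequence_features`. This relies on an unverified assumption: that the columns of `cat_single_features` follow the order of `cat_names` once the list columns are removed. The helper `single_column_names` is hypothetical and not part of NVTabular, and the values below are placeholders standing in for what the data loader returns.

```python
def single_column_names(all_names, sequence_features):
    """Return the names in all_names that are not keys of sequence_features.

    Hypothetical helper (not an NVTabular API). The original order of
    all_names is preserved.
    """
    sequence_names = set(sequence_features)
    return [name for name in all_names if name not in sequence_names]


# Placeholder values mirroring the example in this report.
cat_names = ["simple_col1", "list_col2", "simple_col3"]  # e.g. train_set.cat_names
cat_sequence_features = {"list_col2": None}  # None stands in for the tensor

single_names = single_column_names(cat_names, cat_sequence_features)

# ASSUMPTION: column i of cat_single_features corresponds to single_names[i].
column_index = {name: i for i, name in enumerate(single_names)}
```

Under that assumption, `cat_single_features[:, column_index["simple_col3"]]` would select the values for simple_col3. If the data loader orders the columns differently, this mapping would be wrong, which is why a property exposed by the data loader itself would be more reliable.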
Expected behavior
It would be better if cat_single_features were also a dict of tensors, keyed by column name. If that carries a significant performance penalty, the data loader should instead provide a property with the cat_names corresponding to the columns of the cat_single_features tensor.
Environment details (please complete the following information):
NVTabular 0.3