Describe the bug
When a TargetEncoding op is present in an NVTabular workflow, NVTabular computes the mean (count, sum) statistics for the categorical values with respect to the target column during fit().
However, when the fitted workflow is used to transform() a dataset for prediction, NVTabular requires the prediction dataset to contain the target columns (the very columns we want to predict) and raises the following error:
File "preprocessing.py", line 199, in run
new_predict_dataset = nvt_workflow_features.transform(predict_dataset)
File "/usr/lib/python3.8/functools.py", line 912, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/nvtabular/workflow/workflow.py", line 115, in _
return self._transform_impl(dataset)
File "/usr/local/lib/python3.8/dist-packages/nvtabular/workflow/workflow.py", line 271, in _transform_impl
ddf = dataset.to_ddf(columns=self._input_columns())
File "/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py", line 401, in to_ddf
ddf = self.engine.to_ddf(columns=columns)
File "/usr/local/lib/python3.8/dist-packages/merlin/io/dataframe_engine.py", line 44, in to_ddf
return _ddf[columns]
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 4648, in __getitem__
meta = self._meta[_extract_meta(key)]
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py", line 1169, in __getitem__
return self._get_columns_by_label(mask)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py", line 1893, in _get_columns_by_label
new_data = super()._get_columns_by_label(labels, downcast)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py", line 418, in _get_columns_by_label
return self._data.select_by_label(labels)
File "/usr/local/lib/python3.8/dist-packages/cudf/core/column_accessor.py", line 338, in select_by_label
return self._select_by_label_list_like(key)
File "/usr/local/lib/python3.8/dist-packages/cudf/core/column_accessor.py", line 453, in _select_by_label_list_like
data = {k: self._grouped_data[k] for k in key}
File "/usr/local/lib/python3.8/dist-packages/cudf/core/column_accessor.py", line 453, in <dictcomp>
data = {k: self._grouped_data[k] for k in key}
KeyError: 'is_installed'
The target encoded values should be retrieved from statistics computed in fit() and target columns shouldn't be required.
P.S. As a workaround, I have been creating dummy target columns in the prediction dataset to avoid that error in transform().
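The workaround can be sketched in plain Python as follows. This is only an illustration on column-oriented data held in a dict of lists; the helper name `add_dummy_targets` and the fill value `0` are hypothetical, not part of NVTabular. With a pandas or cuDF dataframe the same idea amounts to assigning a constant column (for example `df["is_installed"] = 0`) before calling transform().

```python
def add_dummy_targets(columns, target_names, fill_value=0):
    """Return a copy of column-oriented data padded with placeholder targets.

    `columns` maps column name -> list of values (hypothetical layout).
    """
    n_rows = len(next(iter(columns.values()))) if columns else 0
    padded = dict(columns)  # shallow copy, the input mapping is not modified
    for name in target_names:
        # Only add the placeholder when the target column is really missing.
        padded.setdefault(name, [fill_value] * n_rows)
    return padded


# Prediction data without the target column, as in the report.
predict_columns = {"feature1": ["a", "c"], "feature2": [5, 6]}
padded = add_dummy_targets(predict_columns, ["is_installed"])
```

The placeholder values carry no information; they exist only so that column selection on the target name does not fail.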
Steps/Code to reproduce bug
Create an NVTabular workflow that includes a target encoded feature
Call workflow.fit() with a dataset that contains feature1, feature2, is_installed
Call workflow.transform() on a dataset that contains feature1, feature2, but NOT is_installed
Expected behavior
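A minimal sketch of the behavior being asked for, in plain Python rather than NVTabular: fit() stores per-category (count, sum) statistics, and transform() only looks them up, so the target column is never read at prediction time. The class `SimpleTargetEncoder`, the output name `TE_feature1_is_installed`, and the fallback to the global mean for unseen categories are assumptions made for illustration; this is not NVTabular's implementation, and it omits any smoothing or fold logic a real target encoder may apply.

```python
from collections import defaultdict


class SimpleTargetEncoder:
    """Toy mean target encoder (illustrative only, not NVTabular code)."""

    def __init__(self, cat_column, target_column):
        self.cat_column = cat_column
        self.target_column = target_column
        self.stats = {}  # category -> (count, sum) collected in fit()
        self.global_mean = 0.0

    def fit(self, rows):
        counts = defaultdict(int)
        sums = defaultdict(float)
        for row in rows:
            key = row[self.cat_column]
            counts[key] += 1
            sums[key] += row[self.target_column]
        self.stats = {key: (counts[key], sums[key]) for key in counts}
        total = sum(counts.values())
        self.global_mean = sum(sums.values()) / total if total else 0.0
        return self

    def transform(self, rows):
        # Only the categorical column is read here; the target is not needed.
        out_name = f"TE_{self.cat_column}_{self.target_column}"
        encoded = []
        for row in rows:
            count, total = self.stats.get(row[self.cat_column], (0, 0.0))
            value = total / count if count else self.global_mean
            encoded.append({**row, out_name: value})
        return encoded


train = [
    {"feature1": "a", "feature2": 1, "is_installed": 1},
    {"feature1": "a", "feature2": 2, "is_installed": 0},
    {"feature1": "b", "feature2": 3, "is_installed": 1},
]
# Prediction rows carry the features but NOT is_installed.
predict = [
    {"feature1": "a", "feature2": 5},
    {"feature1": "c", "feature2": 6},
]

encoder = SimpleTargetEncoder("feature1", "is_installed").fit(train)
result = encoder.transform(predict)
```

Here category "a" is encoded from the statistics gathered during fit(), and the unseen category "c" falls back to the global mean, all without `is_installed` being present in the prediction rows.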