The current HyperTransformer accepts, in its transform and reverse_transform methods, a DataFrame with additional columns that were not seen in the training data. When that happens, the HyperTransformer ignores those columns and leaves them unmodified.
The opposite case, passing a DataFrame that contains only a subset of the columns seen during training, is not supported and makes the HyperTransformer crash. This case should also be supported.
Example
This example shows how the HyperTransformer currently crashes when it is passed a DataFrame containing only a subset of the training columns.
In [1]: import pandas as pd
In [2]: data = pd.DataFrame({
...: 'category': ['a', 'b', 'c'],
...: 'float': [1., 2., 3.],
...: })
In [3]: import rdt
In [4]: ht = rdt.HyperTransformer()
In [5]: ht.fit(data)
In [6]: ht.transform(data[['category']])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/mnt/nvme0n1p2/xals/.virtualenvs/RDT/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'float'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-6-b9f4d352f138> in <module>
----> 1 ht.transform(data[['category']])
/mnt/nvme0n1p2/xals/Projects/MIT/RDT/rdt/hyper_transformer.py in transform(self, data)
187
188 for column_name, transformer in self._transformers.items():
--> 189 column = data.pop(column_name)
190 transformed = transformer.transform(column)
191
/mnt/nvme0n1p2/xals/.virtualenvs/RDT/lib/python3.8/site-packages/pandas/core/frame.py in pop(self, item)
4369 3 monkey NaN
4370 """
-> 4371 return super().pop(item=item)
4372
4373 @doc(NDFrame.replace, **_shared_doc_kwargs)
/mnt/nvme0n1p2/xals/.virtualenvs/RDT/lib/python3.8/site-packages/pandas/core/generic.py in pop(self, item)
659
660 def pop(self, item: Label) -> Union["Series", Any]:
--> 661 result = self[item]
662 del self[item]
663 if self.ndim == 2:
/mnt/nvme0n1p2/xals/.virtualenvs/RDT/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2904 if self.columns.nlevels > 1:
2905 return self._getitem_multilevel(key)
-> 2906 indexer = self.columns.get_loc(key)
2907 if is_integer(indexer):
2908 indexer = [indexer]
/mnt/nvme0n1p2/xals/.virtualenvs/RDT/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'float'
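The traceback points at the loop in transform, which calls data.pop(column_name) for every fitted column regardless of whether that column is present in the input. One possible fix is to skip fitted columns that are missing. The sketch below illustrates the idea with hypothetical stand-in classes (SketchHyperTransformer, UppercaseTransformer and ScaleTransformer are not part of RDT); it is a simplified illustration of the guard, not the actual RDT implementation.

```python
import pandas as pd


class UppercaseTransformer:
    """Hypothetical stand-in for a fitted per-column transformer."""

    def transform(self, column):
        return column.str.upper()


class ScaleTransformer:
    """Hypothetical stand-in for a fitted numerical transformer."""

    def transform(self, column):
        return column * 10


class SketchHyperTransformer:
    """Simplified stand-in that mirrors only the loop shown in the traceback."""

    def __init__(self, transformers):
        # Maps each column name seen during fit to its transformer.
        self._transformers = transformers

    def transform(self, data):
        data = data.copy()
        for column_name, transformer in self._transformers.items():
            # Proposed guard: skip fitted columns absent from the input
            # instead of letting data.pop raise a KeyError.
            if column_name not in data:
                continue

            column = data.pop(column_name)
            data[column_name] = transformer.transform(column)

        return data


ht = SketchHyperTransformer({
    'category': UppercaseTransformer(),
    'float': ScaleTransformer(),
})
data = pd.DataFrame({
    'category': ['a', 'b', 'c'],
    'float': [1., 2., 3.],
})

# Subset of the fitted columns: no KeyError, only 'category' is transformed.
subset_result = ht.transform(data[['category']])

# Extra, unseen column: it is left unmodified, as described above.
extra_result = ht.transform(data.assign(extra=[7, 8, 9]))
```

With this guard, columns unknown to the HyperTransformer and fitted columns missing from the input are both tolerated. The same check would presumably be needed in reverse_transform as well.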