The LogisticParentChildDetection metric crashes with a KeyError when the primary_key and foreign_key have different names and either table also contains a field with the same name as the key of the other table.
For example, the parent table has the field id as its primary key, and the child table contains both id as its own primary key and parent_id as the foreign key to the parent. In this case, the two id columns are renamed to id_x and id_y during the merge, so the del statements that follow fail because there is no longer a column called id.
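The renaming can be seen with pandas alone. This is a minimal sketch that assumes `_denormalize` performs a plain column-on-column outer merge; the exact merge call is not visible in the traceback, so the arguments used here are an assumption. The child ids are changed to 10 through 40 only to make the two id columns easy to tell apart.

```python
import pandas as pd

parent = pd.DataFrame({'id': [1, 2, 3, 4]})
child = pd.DataFrame({'id': [10, 20, 30, 40], 'parent_id': [1, 2, 3, 4]})

# Assumed shape of the merge: join the parent key against the child foreign key.
# Both tables have an 'id' column that is not a shared join key, so pandas
# disambiguates them with its default suffixes '_x' and '_y'.
flat = parent.merge(child, how='outer', left_on='id', right_on='parent_id')

print(list(flat.columns))

# The original column name is gone, so deleting it raises KeyError.
try:
    del flat['id']
except KeyError as error:
    print('KeyError:', error)
```

After the merge, the frame has the columns id_x, id_y and parent_id, which is why `del flat[parent_key]` cannot find id.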
How to reproduce
In [1]: import pandas as pd
In [2]: parent = pd.DataFrame({'id': [1, 2, 3, 4]})
In [3]: child = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [1, 2, 3, 4]})
In [4]: foreign_keys = [('parent', 'id', 'child', 'parent_id')]
In [5]: data = {'parent': parent, 'child': child}
In [6]: from sdmetrics.multi_table import LogisticParentChildDetection
In [7]: LogisticParentChildDetection.compute(data, data, foreign_keys=foreign_keys)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'id'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-7-91c73836e519> in <module>
----> 1 LogisticParentChildDetection.compute(data, data, foreign_keys=foreign_keys)
~/Projects/MIT/SDMetrics/sdmetrics/multi_table/detection/parent_child.py in compute(cls, real_data, synthetic_data, metadata, foreign_keys)
104 scores = []
105 for foreign_key in foreign_keys:
--> 106 real = cls._denormalize(real_data, foreign_key)
107 synth = cls._denormalize(synthetic_data, foreign_key)
108 scores.append(cls.single_table_metric.compute(real, synth))
~/Projects/MIT/SDMetrics/sdmetrics/multi_table/detection/parent_child.py in _denormalize(data, foreign_key)
61 )
62
---> 63 del flat[parent_key]
64 if child_key != parent_key:
65 del flat[child_key]
~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/generic.py in __delitem__(self, key)
3709 # there was no match, this call should raise the appropriate
3710 # exception:
-> 3711 loc = self.axes[-1].get_loc(key)
3712 self._mgr.idelete(loc)
3713
~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'id'
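One possible way to avoid the clash is to make every column name unique before merging. The sketch below is not the project's fix: `denormalize` is a hypothetical stand-in for `_denormalize`, and the `'parent.'` and `'child.'` prefixes are illustrative choices. It only assumes the foreign key tuple layout used in the reproduction above.

```python
import pandas as pd


def denormalize(data, foreign_key):
    """Hypothetical helper: flatten one parent-child relationship.

    Columns are prefixed per table before merging, so pandas never has to
    add the '_x' / '_y' suffixes and the key columns keep predictable names.
    """
    parent_table, parent_key, child_table, child_key = foreign_key

    parent = data[parent_table].add_prefix('parent.')
    child = data[child_table].add_prefix('child.')

    left_key = 'parent.' + parent_key
    right_key = 'child.' + child_key

    flat = parent.merge(child, how='outer', left_on=left_key, right_on=right_key)

    # Drop the two join keys by their prefixed names, which always exist.
    return flat.drop(columns=[left_key, right_key])


parent = pd.DataFrame({'id': [1, 2, 3, 4]})
child = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [1, 2, 3, 4]})
data = {'parent': parent, 'child': child}

flat = denormalize(data, ('parent', 'id', 'child', 'parent_id'))
print(list(flat.columns))
```

With the tables from the reproduction, only the child's own id column remains, under the name child.id. Whether that column should also be dropped before running the detection metric is a separate question that this sketch does not address.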