ParentChildDetection metrics KeyError #39

Closed
csala opened this issue Jan 26, 2021 · 0 comments · Fixed by #40

csala commented Jan 26, 2021

  • SDMetrics version: v0.1.1

Description

The LogisticParentChildDetection metric crashes with a KeyError when the primary key and the foreign key have different names and either table also contains a field with the same name as the key of the other table.

For example, the parent table uses the field id as its primary key, and the child table contains both id as its own primary key and parent_id as the foreign key to the parent. In this case the merge renames the two id fields to id_x and id_y, so the del statements that follow fail because no column named id exists any more.
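The renaming can be seen with pandas alone. The snippet below is a minimal sketch, not the code from `_denormalize`: the exact arguments SDMetrics passes to `merge` are not shown in this issue, so the `left_on`/`right_on` call here is an assumption that only illustrates the suffix behavior.

```python
import pandas as pd

parent = pd.DataFrame({'id': [1, 2, 3, 4]})
child = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [1, 2, 3, 4]})

# Assumed merge call: join the parent key to the child's foreign key.
# Both tables have a column named 'id', so pandas applies its default
# suffixes ('_x', '_y') to tell the two apart.
flat = parent.merge(child, left_on='id', right_on='parent_id')

columns = list(flat.columns)
print(columns)

# A plain 'id' column no longer exists, so `del flat['id']` would raise KeyError.
has_plain_id = 'id' in flat.columns
print(has_plain_id)
```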

How to reproduce

In [1]: import pandas as pd

In [2]: parent = pd.DataFrame({'id': [1, 2, 3, 4]})

In [3]: child = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [1, 2, 3, 4]})

In [4]: foreign_keys = [('parent', 'id', 'child', 'parent_id')]

In [5]: data = {'parent': parent, 'child': child}

In [6]: from sdmetrics.multi_table import LogisticParentChildDetection

In [7]: LogisticParentChildDetection.compute(data, data, foreign_keys=foreign_keys)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'id'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-91c73836e519> in <module>
----> 1 LogisticParentChildDetection.compute(data, data, foreign_keys=foreign_keys)

~/Projects/MIT/SDMetrics/sdmetrics/multi_table/detection/parent_child.py in compute(cls, real_data, synthetic_data, metadata, foreign_keys)
    104         scores = []
    105         for foreign_key in foreign_keys:
--> 106             real = cls._denormalize(real_data, foreign_key)
    107             synth = cls._denormalize(synthetic_data, foreign_key)
    108             scores.append(cls.single_table_metric.compute(real, synth))

~/Projects/MIT/SDMetrics/sdmetrics/multi_table/detection/parent_child.py in _denormalize(data, foreign_key)
     61         )
     62 
---> 63         del flat[parent_key]
     64         if child_key != parent_key:
     65             del flat[child_key]

~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/generic.py in __delitem__(self, key)
   3709             # there was no match, this call should raise the appropriate
   3710             # exception:
-> 3711             loc = self.axes[-1].get_loc(key)
   3712             self._mgr.idelete(loc)
   3713 

~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'id'
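
One possible way to avoid the collision is sketched below. This is not the fix merged in #40, whose contents are not shown here. `denormalize_sketch` is a hypothetical helper, and the `(parent_table, parent_key, child_table, child_key)` order of the foreign key tuple is inferred from the reproduction steps above. The idea is to prefix every column with its table name before merging, so pandas never needs to add suffixes and the key columns can be dropped by their known names.

```python
import pandas as pd


def denormalize_sketch(data, foreign_key):
    """Hypothetical helper, not the SDMetrics implementation."""
    # Tuple order assumed from the reproduction example in this issue.
    parent_table, parent_key, child_table, child_key = foreign_key

    # Prefix columns with the table name so names from the two tables
    # cannot collide during the merge.
    parent = data[parent_table].add_prefix(f'{parent_table}.')
    child = data[child_table].add_prefix(f'{child_table}.')

    left_on = f'{parent_table}.{parent_key}'
    right_on = f'{child_table}.{child_key}'
    flat = parent.merge(child, how='outer', left_on=left_on, right_on=right_on)

    # Drop the key columns by their prefixed, unambiguous names.
    return flat.drop(columns=[left_on, right_on])


parent = pd.DataFrame({'id': [1, 2, 3, 4]})
child = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [1, 2, 3, 4]})
data = {'parent': parent, 'child': child}

flat = denormalize_sketch(data, ('parent', 'id', 'child', 'parent_id'))
print(list(flat.columns))
```

The sketch assumes the parent and child are different tables. A self-referencing foreign key would give both sides the same prefix and would need separate handling.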
csala added the bug label on Jan 26, 2021
csala added this to the 0.1.2 milestone on Jan 26, 2021
csala self-assigned this on Jan 26, 2021
csala closed this as completed in #40 on Jan 27, 2021