
Fix drop_unknown_references for null foreign keys #1820

Closed
R-Palazzo opened this issue Feb 27, 2024 · 0 comments · Fixed by #1822

Labels
bug Something isn't working

R-Palazzo (Contributor) commented Feb 27, 2024

Environment Details

SDV main

Error Description

Currently, `drop_unknown_references` does not work as expected when a foreign key contains null values.
This happens because null values in a foreign key are now considered valid, so `metadata.validate_data(data)` no longer raises an error for them, and the rows containing them are not dropped.
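
A minimal pandas illustration of why the validation step passes. It assumes, as described above, that the referential-integrity check skips null foreign keys; it mimics that check with plain pandas and is not SDV's actual validation code:

```python
import numpy as np
import pandas as pd

parent = pd.DataFrame({'id': [0, 1, 2, 3, 4]})
child = pd.DataFrame({'parent_id': [0, 1, 2, 2, np.nan]})

# Null foreign keys are treated as valid, so only non-null keys are checked
non_null_keys = child['parent_id'].dropna()
references_ok = bool(non_null_keys.isin(parent['id']).all())

# The check passes even though one child row has no parent at all
has_null_keys = bool(child['parent_id'].isna().any())
print(references_ok, has_null_keys)  # prints: True True
```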

To fix this, `drop_unknown_references` should explicitly check for null foreign keys and drop or keep the affected rows according to its `drop_missing_values` parameter.
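
A minimal sketch of the proposed behavior in plain pandas. The function `drop_unknown_references_sketch` is hypothetical, and the relationship dicts only mirror the keys used in SDV multi-table metadata (an assumption); this is not SDV's implementation. It processes each relationship once, so it does not cascade drops through grandchild tables:

```python
import numpy as np
import pandas as pd


def drop_unknown_references_sketch(data, relationships, drop_missing_values=True):
    """Hypothetical helper, not SDV's implementation.

    Drops child rows whose foreign key does not match a parent primary key.
    Rows with a null foreign key are dropped only if drop_missing_values is True.
    """
    # Work on copies so the caller's DataFrames are left untouched
    result = {name: table.copy() for name, table in data.items()}
    for rel in relationships:
        child = result[rel['child_table_name']]
        parent_keys = result[rel['parent_table_name']][rel['parent_primary_key']]
        foreign_keys = child[rel['child_foreign_key']]

        # NaN never matches a non-null primary key, so null keys count as unknown here
        is_known = foreign_keys.isin(parent_keys)
        if drop_missing_values:
            keep = is_known
        else:
            # Null foreign keys are valid, so keep them when not dropping missing values
            keep = is_known | foreign_keys.isna()

        result[rel['child_table_name']] = child[keep]

    return result


relationships = [{
    'parent_table_name': 'parent',
    'parent_primary_key': 'id',
    'child_table_name': 'child',
    'child_foreign_key': 'parent_id',
}]
data = {
    'parent': pd.DataFrame({'id': [0, 1, 2, 3, 4]}),
    'child': pd.DataFrame({
        'parent_id': [0, 1, 5, 2, np.nan],  # 5 is an unknown reference, NaN is a null key
        'C': ['Yes', 'No', 'Maybe', 'No', 'No'],
    }),
}

dropped = drop_unknown_references_sketch(data, relationships)
kept = drop_unknown_references_sketch(data, relationships, drop_missing_values=False)
print(len(dropped['child']), len(kept['child']))  # prints: 3 4
```

With the default, both the unknown reference and the null key are removed; with `drop_missing_values=False`, only the unknown reference is removed.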

Steps to reproduce

The following snippet currently fails, but it should pass:

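```python
import numpy as np
import pandas as pd

from sdv.metadata import MultiTableMetadata
# Import path assumed for SDV main at the time of this report; other versions may differ
from sdv.utils.poc import drop_unknown_references

# Metadata assumed for this repro (it is not included in the original report)
metadata = MultiTableMetadata.load_from_dict({
    'tables': {
        'parent': {'primary_key': 'id', 'columns': {
            'id': {'sdtype': 'id'}, 'A': {'sdtype': 'boolean'}, 'B': {'sdtype': 'numerical'}}},
        'child': {'columns': {
            'parent_id': {'sdtype': 'id'}, 'C': {'sdtype': 'categorical'}}},
    },
    'relationships': [{
        'parent_table_name': 'parent', 'parent_primary_key': 'id',
        'child_table_name': 'child', 'child_foreign_key': 'parent_id'}],
})

parent = pd.DataFrame(data={
    'id': [0, 1, 2, 3, 4],
    'A': [True, True, False, True, False],
    'B': [0.434, 0.312, 0.212, 0.339, 0.491]
})

child = pd.DataFrame(data={
    'parent_id': [0, 1, 2, 2, 5],
    'C': ['Yes', 'No', 'Maybe', 'No', 'No']
})

data = {
    'parent': parent,
    'child': child
}
# Turn the unknown reference (5) into a null foreign key
data['child'].loc[4, 'parent_id'] = np.nan

# Expected: the row with the null foreign key is dropped
cleaned_data = drop_unknown_references(metadata, data)
metadata.validate_data(cleaned_data)

pd.testing.assert_frame_equal(cleaned_data['parent'], data['parent'])
pd.testing.assert_frame_equal(cleaned_data['child'], data['child'].iloc[:4])
assert len(cleaned_data['child']) == 4
```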
2 participants