The internal implementation of the Unique constraint relies on the input data index.
Because of this, errors such as #616 and #610 happen, and other unexpected behaviors can be observed, such as the constraint treating valid data as invalid:
In [30]: data=pd.DataFrame({'unique': [1, 2, 3]}, index=[0, 0, 0])
In [31]: unique.is_valid(data)
Out[31]:
0     True
1    False
2    False
dtype: bool
It can also produce inconsistent results:
In [38]: data=pd.DataFrame({'unique': [1, 2, 1]}, index=[0, 2, 1])
In [39]: unique.is_valid(data)
Out[39]:
0     True
1    False
2     True
dtype: bool
We should review its implementation to prevent all these scenarios.
…#619)
* fix unique constraint on data with column named index
- alters implementation of Unique constraint
- adds test for this behavior
* - tests to show problem of #616 and #617
- first implementation that fixes this
* refactors unique constraint
- removes usage of list
- uses cumcount on group by
- numbers the occurrences within each group
- number 0 is the first occurrence within its group
- rows with cumcount equal to 0 are the valid rows
* reference Github issues in test
* remove the call of reset_index
- keeps the original index specified
* remove metadata specification
- pass constraint directly as parameter
* remove metadata specification
- specify constraint directly in model creation
- change docstrings to new test
* add test unique constraint with index column
- unique constraint set on test_column
- column named index in dataframe
* use older graphviz version
- version 0.18.1 errors
* return to old setup.py
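The cumcount approach described in the refactor notes above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the function name `unique_is_valid` and its signature are hypothetical, and it assumes the constraint columns contain no missing values. It is run on the two DataFrames from the issue description to show that the result follows row order and keeps the original index.

```python
import pandas as pd


def unique_is_valid(table_data: pd.DataFrame, columns: list) -> pd.Series:
    """Hypothetical helper: mark the first occurrence of each value combination as valid."""
    # cumcount numbers the rows inside each group 0, 1, 2, ... in row order,
    # so the outcome does not depend on the index values of the input.
    occurrence = table_data.groupby(columns).cumcount()
    # Only the first occurrence (number 0) of each combination is valid.
    return occurrence == 0


# The two examples from the issue: a repeated index and a shuffled index.
all_distinct = pd.DataFrame({'unique': [1, 2, 3]}, index=[0, 0, 0])
with_repeat = pd.DataFrame({'unique': [1, 2, 1]}, index=[0, 2, 1])

first_result = unique_is_valid(all_distinct, ['unique'])
second_result = unique_is_valid(with_repeat, ['unique'])

print(first_result)
print(second_result)
```

Under these assumptions, every row of the first DataFrame is reported as valid, and in the second only the third row (the repeated value 1) is flagged as invalid. No call to reset_index is needed, since the returned Series carries the index of the input.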