Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validate to SingleTableMetadata #930

Merged
merged 18 commits into from
Aug 4, 2022

Conversation

fealho
Copy link
Member

@fealho fealho commented Jul 29, 2022

Resolve #879.

@fealho fealho changed the base branch from master to V1.0.0.dev July 29, 2022 16:42

@staticmethod
def _validate_datetime(column_name, **kwargs):
datetime_format = kwargs.get('datetime_format')
if datetime_format:
if datetime_format is not None:
Copy link
Member Author

@fealho fealho Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated. It was not validating if the value was an empty string.

sdv/metadata/single_table.py Outdated Show resolved Hide resolved
@fealho
Copy link
Member Author

fealho commented Aug 2, 2022

The #879 issue asks us to test Real World (Semantic) Types (ie. phone_number, but this was not done in the validate_columns PR. Do we still want to add this?

@fealho fealho force-pushed the issue-879-validate-SingleTableMetadata branch from 14d8108 to 5e4f658 Compare August 2, 2022 17:52
@codecov-commenter
Copy link

codecov-commenter commented Aug 3, 2022

Codecov Report

Merging #930 (af34779) into V1.0.0.dev (2fde2d0) will increase coverage by 0.43%.
The diff coverage is 96.55%.

@@              Coverage Diff               @@
##           V1.0.0.dev     #930      +/-   ##
==============================================
+ Coverage       72.73%   73.17%   +0.43%     
==============================================
  Files              40       40              
  Lines            3323     3377      +54     
==============================================
+ Hits             2417     2471      +54     
  Misses            906      906              
Impacted Files Coverage Δ
sdv/metadata/dataset.py 64.78% <25.00%> (ø)
sdv/metadata/table.py 80.99% <50.00%> (ø)
sdv/constraints/base.py 97.16% <100.00%> (+0.02%) ⬆️
sdv/constraints/tabular.py 99.00% <100.00%> (ø)
sdv/metadata/__init__.py 100.00% <100.00%> (ø)
sdv/metadata/errors.py 100.00% <100.00%> (ø)
sdv/metadata/multi_table.py 98.52% <100.00%> (ø)
sdv/metadata/single_table.py 99.14% <100.00%> (+0.24%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@fealho fealho force-pushed the issue-879-validate-SingleTableMetadata branch from c7c61c9 to 06b9dbe Compare August 3, 2022 15:51
Copy link
Member Author

@fealho fealho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments to help understand the code. Messages marked with Unrelated mean they are standalone and unrelated to the PR itself.

sdv/constraints/base.py Show resolved Hide resolved
@@ -135,16 +136,16 @@ def _validate_metadata_columns(cls, metadata, **kwargs):
else:
column_names = kwargs.get('column_names')

missing_columns = set(column_names) - set(metadata._columns)

missing_columns = set(column_names) - set(metadata._columns) - {None}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated. This happens when a column name was not passed/is missing.

raise ConstraintMetadataError(
f'A {cls.__name__} constraint is being applied to invalid column names '
f'{article} {cls.__name__} constraint is being applied to invalid column names '
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated.

sdv/constraints/base.py Show resolved Hide resolved
@@ -250,55 +248,90 @@ def _validate_datatype(id):
"""Check whether id is a string or a tuple of strings."""
return isinstance(id, str) or isinstance(id, tuple) and all(isinstance(i, str) for i in id)

def _validate_key(self, id, key_type):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation had the validation inside the set_key methods. Pulling it out of the key methods so I can use it in the validate method.


self._metadata['sequence_index'] = column_name
def _validate_constraint(self, constraint_name, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling this out of add_constraint because I need it in for validate.

try:
self._validate_constraint(constraint_name, **kwargs)
except MultipleConstraintsErrors as e:
reformated_errors = '\n'.join(map(str, e.errors))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to do this because MultipleConstraintsErrors sets it's own __str__, which messes up the formatting of the error message.


def to_dict(self):
"""Return a python ``dict`` representation of the ``SingleTableMetadata``."""
metadata = {}
for key, value in self._metadata.items():
for key in self._KEYS:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change needed because we deleted _metadata.

@@ -990,7 +990,7 @@ def test__validate_metadata_specific_to_constraint_datetime_error(self):

# Run
error_message = re.escape(
'An Inequality constraint is being applied to mismatched sdtypes '
'An Inequality constraint is being applied to mismatched sdtype columns '
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated. Phrasing was confusing, it made it seem the values were sdtypes when they are actually columns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: can we reword to is being applied to columns with mismatched sdtypes then? I think that makes more sense

@fealho fealho marked this pull request as ready for review August 3, 2022 17:02
@fealho fealho requested a review from a team as a code owner August 3, 2022 17:02
@fealho fealho requested review from amontanez24 and pvk-developer and removed request for a team and amontanez24 August 3, 2022 17:02
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. A few minor comments

sdv/constraints/base.py Show resolved Hide resolved
sdv/constraints/base.py Show resolved Hide resolved
@@ -990,7 +990,7 @@ def test__validate_metadata_specific_to_constraint_datetime_error(self):

# Run
error_message = re.escape(
'An Inequality constraint is being applied to mismatched sdtypes '
'An Inequality constraint is being applied to mismatched sdtype columns '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: can we reword to is being applied to columns with mismatched sdtypes then? I think that makes more sense

tests/unit/constraints/test_tabular.py Outdated Show resolved Hide resolved
tests/unit/constraints/test_tabular.py Outdated Show resolved Hide resolved
tests/unit/constraints/test_tabular.py Outdated Show resolved Hide resolved
tests/unit/metadata/test_single_table.py Show resolved Hide resolved
tests/unit/metadata/test_single_table.py Show resolved Hide resolved
tests/unit/metadata/test_single_table.py Show resolved Hide resolved
@fealho
Copy link
Member Author

fealho commented Aug 3, 2022

@amontanez24 I'm not sure what you mean here: #930 (comment), it was already changed in the base.

@amontanez24
Copy link
Contributor

@amontanez24 I'm not sure what you mean here: #930 (comment), it was already changed in the base.

you're right, I read something wrong

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very minor comments, but I think this is good to go!

tests/unit/constraints/test_base.py Outdated Show resolved Hide resolved
sdv/metadata/single_table.py Outdated Show resolved Hide resolved
self._append_error(errors, self._validate_column, column, **kwargs)

if errors:
raise InvalidMetadataError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary for this PR but later I think we might want to make a custom error that takes a list of errors and joins them in the message. We're probably going to be doing that a lot in the future

@fealho fealho force-pushed the issue-879-validate-SingleTableMetadata branch from ee9284f to af34779 Compare August 3, 2022 20:30
@fealho fealho merged commit aabcaad into V1.0.0.dev Aug 4, 2022
@fealho fealho deleted the issue-879-validate-SingleTableMetadata branch August 4, 2022 02:18
pvk-developer pushed a commit that referenced this pull request Aug 9, 2022
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
pvk-developer pushed a commit that referenced this pull request Aug 10, 2022
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
amontanez24 pushed a commit that referenced this pull request Sep 13, 2022
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
amontanez24 pushed a commit that referenced this pull request Nov 22, 2022
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
amontanez24 pushed a commit that referenced this pull request Dec 21, 2022
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
amontanez24 pushed a commit that referenced this pull request Feb 17, 2023
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
amontanez24 pushed a commit that referenced this pull request Mar 23, 2023
* Add key validation methods (the methods themselves still gotta be revised)

* Combined errors implementation

* Finish constraints

* Add silly constraint solution

* Add try/catches for validation + add unit and integration tests

* Change MetadataError to InvalidMetadataError

* Simplify tests to one element + fix lint

* Remove _metadata

* Change way _metadata implemented in to_dict

* Add constraint validation + integration test + change _constraints to be list of tuple

* Fix error message for validate + add error integration test

* Fix all constraint related error messages

* Add helper _append_error to validate + fix constraint related errors

* Add constraint unit test

* Fix lint

* change _metadata[pk] to pk

* Add tests requested + addresses feedback

* Address minor feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add validate method to SingleTableMetadata
4 participants