Columns described as sdtype=numerical should only contain numerical or missing data; no strings or other values are allowed. Error: Invalid values found for numerical column 'age': ('a', 'b', 'c', +more)
Columns described as sdtype=datetime should only contain datetime or missing data. Try to convert every value to the datetime format, and raise an error if the conversion fails. Error: Invalid values found for datetime column 'start_date': (0.0, 30.0, 4.4, +more)
Columns described as sdtype=boolean should only contain True, False, or missing values. Error: Invalid values found for boolean column 'is_subscribed': (0.0, 30.0, 4.4, +more)
Columns marked as a key (primary, alternate, sequence, or foreign) should not have any missing values. Error: Key column 'user_id' contains missing values
Columns marked as primary or alternate keys should be unique within the table. Error: Primary key column 'user_id' contains repeating values: ('UID_000', 'UID_001', 'UID_002', +more)
(Sequential only) Context columns (stored in the model's parameters) should be fixed for each sequence, i.e. if you group by sequence key, the context columns should not vary. Error: Context column 'patient_address' is changing inside sequence ('Patient_ID'='ID_004').
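The non-sequential checks above could be sketched roughly as follows. This is only an illustration of the requested behavior, not SDV code: the `validate` function, its `column_sdtypes` and `primary_key` parameters, and the helper names are hypothetical, and plain Python lists stand in for DataFrame columns so the sketch has no dependencies. The datetime check assumes ISO-formatted strings, which is narrower than a real conversion would be.

```python
import math
from datetime import datetime


class InvalidDataError(Exception):
    """Hypothetical error type that collects every validation message."""

    def __init__(self, errors):
        self.errors = errors
        super().__init__("\n".join(errors))


def is_missing(value):
    # None and float NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize(values, limit=3):
    # Show the first few distinct offending values, then "+more".
    distinct = list(dict.fromkeys(values))
    shown = ", ".join(repr(v) for v in distinct[:limit])
    suffix = ", +more" if len(distinct) > limit else ""
    return f"({shown}{suffix})"


def is_valid(value, sdtype):
    if sdtype == "numerical":
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if sdtype == "boolean":
        return isinstance(value, bool)
    if sdtype == "datetime":
        if isinstance(value, datetime):
            return True
        if not isinstance(value, str):
            return False
        try:
            datetime.fromisoformat(value)  # assumption: ISO strings only
            return True
        except ValueError:
            return False
    return True  # other sdtypes are not checked in this sketch


def validate(data, column_sdtypes, primary_key=None):
    """Raise InvalidDataError listing every problem found in `data`."""
    errors = []
    for column, sdtype in column_sdtypes.items():
        invalid = [v for v in data[column]
                   if not is_missing(v) and not is_valid(v, sdtype)]
        if invalid:
            errors.append(f"Invalid values found for {sdtype} column "
                          f"'{column}': {summarize(invalid)}")
    if primary_key is not None:
        values = data[primary_key]
        if any(is_missing(v) for v in values):
            errors.append(f"Key column '{primary_key}' contains missing values")
        seen, repeated = set(), []
        for v in values:
            if is_missing(v):
                continue
            if v in seen:
                repeated.append(v)
            seen.add(v)
        if repeated:
            errors.append(f"Primary key column '{primary_key}' contains "
                          f"repeating values: {summarize(repeated)}")
    if errors:
        raise InvalidDataError(errors)
```

Collecting all messages before raising lets a user fix every problem in one pass instead of rerunning validation after each error.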
Additional context
For the sequential case, we should override the method in the PARSynthesizer
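As a rough illustration of the extra check a sequential override might run, the sketch below groups rows by sequence key and reports any context column whose value changes within a sequence. The function name `find_varying_context` and its parameters are hypothetical, and rows are plain dictionaries rather than DataFrame rows; it is not the PARSynthesizer implementation.

```python
def find_varying_context(rows, sequence_key, context_columns):
    """Return one message per (sequence, context column) whose value changes."""
    first_seen = {}   # (sequence id, column) -> first value observed
    reported = set()  # pairs already reported, to avoid duplicate messages
    errors = []
    for row in rows:
        seq_id = row[sequence_key]
        for column in context_columns:
            key = (seq_id, column)
            if key not in first_seen:
                first_seen[key] = row[column]
            elif row[column] != first_seen[key] and key not in reported:
                reported.add(key)
                errors.append(
                    f"Context column '{column}' is changing inside sequence "
                    f"('{sequence_key}'={seq_id!r})."
                )
    return errors


# Usage: the second sequence changes its address partway through.
rows = [
    {"Patient_ID": "ID_003", "patient_address": "1 Main St", "heart_rate": 70},
    {"Patient_ID": "ID_003", "patient_address": "1 Main St", "heart_rate": 72},
    {"Patient_ID": "ID_004", "patient_address": "9 Oak Ave", "heart_rate": 65},
    {"Patient_ID": "ID_004", "patient_address": "2 Elm Rd", "heart_rate": 66},
]
messages = find_varying_context(rows, "Patient_ID", ["patient_address"])
```

An override could append these messages to the ones produced by the base checks before raising a single error.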
Problem Description
As a user, it would be helpful if I could check if my data was valid according to my metadata.
Expected behavior
Add a validate method to the BaseSynthesizer. If the data is invalid, it should raise an InvalidDataError in the following format:
Error: Invalid values found for numerical column 'age': ('a', 'b', 'c', +more)
Error: Invalid values found for datetime column 'start_date': (0.0, 30.0, 4.4, +more)
Error: Invalid values found for boolean column 'is_subscribed': (0.0, 30.0, 4.4, +more)
Error: Key column 'user_id' contains missing values
Error: Primary key column 'user_id' contains repeating values: ('UID_000', 'UID_001', 'UID_002', +more)
Error: Context column 'patient_address' is changing inside sequence ('Patient_ID'='ID_004').