Problem Description
The Detection metrics use machine learning to determine whether real and synthetic data can be told apart. For this to work, they should use only columns that are statistically modeled.
Expected behavior
When running any of the detection metrics, the following columns should be ignored:
Primary keys
Foreign keys (Edit: foreign keys do not need to be considered because Detection metrics are only implemented at the single-table level.)
Any other kinds of IDs
PII or sensitive data
Text data (or data created by RegEx)
None of these columns provide any useful information for detection.
The remaining data types are statistically modeled and should be included: numerical, datetime, categorical (non-PII), boolean
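The filtering rule above could be sketched roughly as follows. This is only an illustration and not the library's actual implementation: the `select_modeled_columns` helper, the `"type"` and `"pii"` keys, and the type labels are all assumptions made for this example, and the real metadata format may differ.

```python
# Hypothetical type labels; a real metadata schema may name these differently.
MODELED_TYPES = {"numerical", "datetime", "categorical", "boolean"}


def select_modeled_columns(column_metadata):
    """Return the names of columns that a detection metric should keep.

    `column_metadata` maps each column name to a dict with a "type" label and
    an optional "pii" flag (an assumed layout for this sketch).
    """
    kept = []
    for name, info in column_metadata.items():
        if info.get("pii", False):
            continue  # PII or sensitive data is not statistically modeled
        if info.get("type") in MODELED_TYPES:
            kept.append(name)
        # Anything else (primary keys, other IDs, text) is dropped.
    return kept


# Made-up table description, for illustration only.
example_metadata = {
    "user_id": {"type": "id"},
    "email": {"type": "categorical", "pii": True},
    "bio": {"type": "text"},
    "age": {"type": "numerical"},
    "signup_date": {"type": "datetime"},
    "plan": {"type": "categorical"},
    "is_active": {"type": "boolean"},
}

modeled = select_modeled_columns(example_metadata)
```

The resulting list of names could then be used to subset both the real and the synthetic table before they are passed to a detection metric, so that the classifier only sees statistically modeled columns.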
Additional context
We already filtered out primary keys in #119. The issue of foreign keys is discussed in #285.