Detection metrics should only use statistically modeled columns (filter out the rest) #286

npatki · 2022-12-20T22:13:24Z

Problem Description

The Detection metrics train a machine learning classifier to tell real rows apart from synthetic rows. The harder the synthetic data is to detect, the better the score. For the score to be meaningful, the classifier should only see columns that are statistically modeled.

Expected behavior

When running any of the detection metrics, the following columns should be ignored:

Primary keys
~~Foreign keys~~ Edit: Foreign keys do not need to be considered because Detection metrics are only implemented at the single-table level.
Any other kinds of IDs
PII or sensitive data
Text data (or data created by RegEx)

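None of these columns provides useful information for detection. Their values are generated or anonymized rather than learned from the real data's distribution, so they add noise and can let the classifier separate real from synthetic rows for reasons unrelated to modeling quality.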

The remaining data types are statistically modeled and should be included: numerical, datetime, categorical (non-PII), and boolean.
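To make the expected behavior concrete, here is a minimal sketch of the column filter in plain Python. It assumes a metadata dict loosely shaped like SDV single-table metadata (`primary_key`, plus a `columns` mapping with an `sdtype` and an optional `pii` flag). The helper name `modeled_columns`, the `pii` flag placement, and the example column names are all hypothetical and are not the SDMetrics API.

```python
# Hypothetical sketch: pick the columns a detection metric should use.
# ASSUMPTION: metadata looks like
#   {"primary_key": ..., "columns": {name: {"sdtype": ..., "pii": bool}}}
# This is not the SDMetrics API; names here are illustrative only.

# Data types the issue lists as statistically modeled.
MODELED_SDTYPES = {"numerical", "datetime", "categorical", "boolean"}


def modeled_columns(metadata: dict) -> list[str]:
    """Return the names of columns that should feed the detection classifier."""
    primary_key = metadata.get("primary_key")
    keep = []
    for name, spec in metadata.get("columns", {}).items():
        if name == primary_key:
            continue  # primary keys carry no distributional signal
        if spec.get("sdtype") not in MODELED_SDTYPES:
            continue  # other IDs, text, regex-generated and PII sdtypes
        if spec.get("pii", False):
            continue  # categorical columns explicitly flagged as PII
        keep.append(name)
    return keep


# Usage with hypothetical metadata for a single table.
example_metadata = {
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id"},
        "age": {"sdtype": "numerical"},
        "signup": {"sdtype": "datetime"},
        "country": {"sdtype": "categorical"},
        "nickname": {"sdtype": "categorical", "pii": True},
        "email": {"sdtype": "email", "pii": True},
        "bio": {"sdtype": "text"},
        "active": {"sdtype": "boolean"},
        "account_ref": {"sdtype": "id", "regex_format": "ACC[0-9]{6}"},
    },
}

print(modeled_columns(example_metadata))  # ['age', 'signup', 'country', 'active']
```

With pandas, the real and synthetic tables would then be subset the same way (e.g. `real_data[cols]` and `synthetic_data[cols]`) before training the classifier, so both sides see an identical set of modeled columns.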

Additional context

We already filtered out primary keys in #119. The issue of foreign keys is discussed in #285.

npatki added the feature request label Dec 20, 2022

npatki mentioned this issue Dec 20, 2022

Does removing foreign keys in detection metrics for multi-tables make sense? #285 (Closed)

mohamedgy mentioned this issue Jan 2, 2023

Issue 286 detection remove fkey #289 (Open)

lajohn4747 mentioned this issue Nov 17, 2023

Filter out keys that cannot be statistically modeled #525 (Merged)

lajohn4747 closed this as completed in #525 Nov 22, 2023

amontanez24 assigned lajohn4747 Nov 30, 2023

amontanez24 added this to the 0.13.0 milestone Nov 30, 2023

