Add `KeyUniqueness` metric #460

frances-h · 2023-10-09T21:33:02Z

Problem Description

As a user, I would like a metric that gives me information about the uniqueness of my primary key and alternate key columns.

Expected behavior

Add a new single_column metric that calculates the percent of keys that are unique and not null.
This metric takes in primary key or alternate key columns (either ID or PII sdtypes).
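One way to make "unique and not null" precise is the formula below. It is a sketch that rests on two assumptions the issue does not pin down. First, the score is taken over the synthetic column values $s_1, \dots, s_n$ with $n \ge 1$. Second, only repeats after a value's first occurrence count as duplicates.

```math
\text{score} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[\, s_i \text{ is not null} \;\wedge\; s_i \notin \{s_1, \dots, s_{i-1}\} \,\right]
```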

Attributes

The metric should have the following attributes:

name: 'KeyUniqueness'
goal: Goal.MAXIMIZE
min_value: 0.0
max_value: 1.0

Methods

The metric should also define the following methods:

compute(real_data, synthetic_data): Compute the score for the metric. The returned score should be the fraction of keys in the synthetic data that are unique and not null (e.g. a score of 0.6 means 60% of the keys are unique and not null, while the remaining 40% are duplicates or null).
- Parameters:
  - (required) real_data: A pandas.Series object with the column of real data
  - (required) synthetic_data: A pandas.Series object with the column of synthetic data
- Returns: The score for this metric, a float between 0.0 and 1.0
- If the real data does not pass this test (e.g. it contains duplicate or null values), the metric should raise an error.

>>> from sdmetrics.single_column import KeyUniqueness
>>> KeyUniqueness.compute(
...     real_data=real_table['user_id'],
...     synthetic_data=synthetic_table['user_id'])
1.0
>>> KeyUniqueness.compute_breakdown(
...     real_data=real_table['user_id'],
...     synthetic_data=synthetic_table['user_id'])
{'score': 1.0}
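Below is a minimal pandas-only sketch of the proposed behavior. It is not the sdmetrics implementation. It makes the following assumptions:

- The score is computed on the synthetic column.
- The first occurrence of a repeated value counts as unique and later copies count as duplicates. This is the default `keep='first'` behavior of `pandas.Series.duplicated()`.
- An invalid real column raises `ValueError`, since the issue only says the metric should error.
- An empty synthetic column also raises an error, which the issue does not specify.

`Goal` here is a hypothetical stand-in enum, not the library's class. `compute_breakdown` mirrors the `{'score': ...}` shape shown in the example above.

```python
from enum import Enum

import pandas as pd


class Goal(Enum):
    """Hypothetical stand-in for the library's Goal enum (assumption)."""
    MAXIMIZE = 'maximize'
    MINIMIZE = 'minimize'


class KeyUniqueness:
    """Sketch of the proposed single-column KeyUniqueness metric."""

    name = 'KeyUniqueness'
    goal = Goal.MAXIMIZE
    min_value = 0.0
    max_value = 1.0

    @classmethod
    def compute_breakdown(cls, real_data, synthetic_data):
        # The real column must itself be a valid key: no nulls, no duplicates.
        if real_data.isna().any() or real_data.duplicated().any():
            raise ValueError(
                'The real data contains null or duplicate values, '
                'so it is not a valid key column.'
            )

        # Assumption: an empty synthetic column has no defined score.
        if len(synthetic_data) == 0:
            raise ValueError('The synthetic data is empty.')

        # A row is a valid key if it is non-null and not a repeat of an earlier
        # value (duplicated() marks every occurrence after the first).
        is_valid_key = synthetic_data.notna() & ~synthetic_data.duplicated()
        return {'score': float(is_valid_key.mean())}

    @classmethod
    def compute(cls, real_data, synthetic_data):
        return cls.compute_breakdown(real_data, synthetic_data)['score']


real = pd.Series([1, 2, 3, 4, 5])
synthetic = pd.Series([1, 2, 2, None, 5])
print(KeyUniqueness.compute(real, synthetic))  # 0.6
```

In the usage lines, the repeated `2` and the null each count against the score. That leaves 3 valid keys out of 5, which matches the "0.6 means 60%" reading in the `compute` description.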

frances-h added the feature request and new labels Oct 9, 2023

frances-h mentioned this issue Oct 12, 2023: Add DataValidity property #467 (Closed)

R-Palazzo mentioned this issue Oct 23, 2023: Add KeyUniqueness metric #474 (Merged)

amontanez24 added this to the 0.13.0 milestone Oct 23, 2023

amontanez24 removed the new label Oct 23, 2023

R-Palazzo mentioned this issue Nov 7, 2023: New Diagnostic Reports #499 (Merged)

R-Palazzo closed this as completed in #499 Nov 27, 2023

amontanez24 assigned R-Palazzo Nov 30, 2023
