Problem Description
As a user, I would like a metric that gives me information about the uniqueness of my primary key and alternate key columns.
Expected behavior
Add a new single_column metric that calculates the percent of keys that are unique and not null.
This metric takes in primary key or alternate key columns (either ID or PII sdtypes).
Attributes
The metric should have the following attributes:
name: 'KeyUniqueness'
goal: Goal.MAXIMIZE
min_value: 0.0
max_value: 1.0
Methods
The metric should also define the following methods:
compute(real_data, synthetic_data): Compute the score for the metric. The returned score should be the fraction of keys that are unique and not null (e.g. a score of 0.6 means 60% of the keys are unique and the remaining 40% are duplicates or nulls).
Parameters:
(required) real_data: A pandas.Series object with the column of real data
(required) synthetic_data: A pandas.Series object with the column of synthetic data
Returns: The score for this metric
If the real data does not pass this test (e.g. it contains duplicate or null values), the metric should raise an error.
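To make the request concrete, here is a minimal sketch of how such a metric could look. It is not an existing implementation. The `Goal` enum below is a local stand-in for the library's own, `ValueError` is an assumed error type, and the counting rule (distinct non-null values divided by the total number of synthetic rows) is one possible reading of "percent of keys that are unique and not null".

```python
from enum import Enum

import pandas as pd


class Goal(Enum):
    """Local stand-in for the library's Goal enum (assumption)."""

    MAXIMIZE = 'maximize'
    MINIMIZE = 'minimize'


class KeyUniqueness:
    """Sketch of the proposed single-column metric."""

    name = 'KeyUniqueness'
    goal = Goal.MAXIMIZE
    min_value = 0.0
    max_value = 1.0

    @classmethod
    def compute(cls, real_data, synthetic_data):
        real_data = pd.Series(real_data)
        synthetic_data = pd.Series(synthetic_data)

        # The real keys must themselves be valid: no nulls, no duplicates.
        # ValueError is an assumed choice; the issue only says "Error".
        if real_data.isna().any() or real_data.duplicated().any():
            raise ValueError(
                'The real data contains null or duplicate key values.'
            )

        # Behavior for an empty synthetic column is not specified in the
        # issue; raising here is an assumption that avoids dividing by zero.
        if len(synthetic_data) == 0:
            raise ValueError('The synthetic data is empty.')

        # Assumed counting rule: each distinct non-null value counts once,
        # so extra copies of a key and null entries both lower the score.
        # Series.nunique() ignores nulls by default.
        return synthetic_data.nunique() / len(synthetic_data)


real = pd.Series([1, 2, 3, 4, 5])
synthetic = pd.Series([1, 2, 2, None, 3])
print(KeyUniqueness.compute(real, synthetic))  # 0.6
```

In the usage example, three distinct non-null keys out of five rows give a score of 0.6. A different counting rule (for instance, treating every row that shares a duplicated value as invalid) would give a lower score for the same data, so the exact definition is worth settling before implementation.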