Problem Description
As of SDV v1.8.0, the metadata auto-detection can identify a wide variety of sdtypes:
statistical columns, such as 'numerical', 'datetime', 'categorical', and 'boolean'
semantic concepts, such as 'email', 'phone_number', 'latitude', etc.
structured identifiers, such as 'id'
However, when detecting a primary key, it only considers columns of sdtype 'id'. In reality, semantic columns such as 'email' or 'phone_number' may also be primary keys and should be considered as candidates.
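The gap can be illustrated with a minimal sketch. This is not SDV's implementation: the function `detect_primary_key`, the sets `CURRENT_KEY_SDTYPES` and `PROPOSED_KEY_SDTYPES`, the toy table, and the "unique and no missing values" rule are all hypothetical, chosen only to show how restricting candidates to 'id' misses a unique 'email' column.

```python
# Hypothetical sketch, NOT SDV's actual detection logic.
from typing import Dict, Hashable, Iterable, Optional, Sequence

# Sdtypes allowed as primary-key candidates (illustrative assumption).
CURRENT_KEY_SDTYPES = frozenset({"id"})
PROPOSED_KEY_SDTYPES = frozenset({"id", "email", "phone_number"})


def detect_primary_key(
    columns: Dict[str, Sequence[Hashable]],
    sdtypes: Dict[str, str],
    key_sdtypes: Iterable[str],
) -> Optional[str]:
    """Return the first column with an allowed sdtype whose values are unique and non-null."""
    allowed = set(key_sdtypes)
    for name, values in columns.items():
        if sdtypes.get(name) not in allowed:
            continue  # sdtype is not eligible to be a key
        if any(value is None for value in values):
            continue  # a primary key cannot contain missing values
        if len(set(values)) == len(values):
            return name  # every value is distinct, so it can identify rows
    return None


# Toy table with made-up values (not the SDV demo data).
columns = {
    "guest_email": ["a@example.com", "b@example.com", "c@example.com"],
    "room_type": ["BASIC", "DELUXE", "BASIC"],
}
sdtypes = {"guest_email": "email", "room_type": "categorical"}

print(detect_primary_key(columns, sdtypes, CURRENT_KEY_SDTYPES))   # None
print(detect_primary_key(columns, sdtypes, PROPOSED_KEY_SDTYPES))  # guest_email
```

Widening the allowed sdtypes only adds candidates; the uniqueness check still applies, so an 'email' column with repeated values would not be chosen.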
Expected behavior
Consider the default demo dataset. The first column, 'guest_email', is the primary key. The metadata should continue to detect it as sdtype 'email', and it should also mark it as the primary key.
Additional context
Note that if the column were named 'guest_id' and contained random numerical IDs, the metadata script would correctly identify the sdtype as 'id' and mark it as the primary key.
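The expected behavior for 'guest_email' can be phrased as a simple check. The sketch below assumes the dictionary layout returned by `SingleTableMetadata.to_dict()` (a "columns" mapping plus a "primary_key" entry); the helper `has_semantic_primary_key` and both example dictionaries are hypothetical and hand-written, not actual detection output.

```python
# Hypothetical acceptance check for this issue. Assumes the metadata dict has
# a "columns" mapping (column name -> {"sdtype": ...}) and a "primary_key" entry.
from typing import Any, Dict


def has_semantic_primary_key(metadata_dict: Dict[str, Any], column: str, sdtype: str) -> bool:
    """True if `column` keeps the given sdtype AND is marked as the primary key."""
    column_info = metadata_dict.get("columns", {}).get(column, {})
    return column_info.get("sdtype") == sdtype and metadata_dict.get("primary_key") == column


# Hand-written dictionaries for illustration only.
expected = {
    "primary_key": "guest_email",
    "columns": {"guest_email": {"sdtype": "email"}},
}
missing_key = {"columns": {"guest_email": {"sdtype": "email"}}}

print(has_semantic_primary_key(expected, "guest_email", "email"))     # True
print(has_semantic_primary_key(missing_key, "guest_email", "email"))  # False
```

Checking both conditions together captures the point of the issue: the column should not be reclassified as 'id' just to become the key.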
npatki changed the title from "Metadata auto-detection should find primary keys of any sdtype" to "Metadata auto-detection should find primary keys of semantic sdtypes" (Dec 19, 2023)