Support for multiple uuid4 types in a table #1303

Closed
HotDiggityDogz opened this issue Mar 9, 2023 · 3 comments
Labels
feature request Request for a new feature resolution:resolved The issue was fixed, the question was answered, etc.


@HotDiggityDogz

Problem Description

If a table has more than one column of datatype uuid4, then when generating synthetic data, all of those columns get the same uuid4 value in any given row. In the original data, the UUIDs are different (and refer to different entities).

Expected behavior

Each column of type 'uuid4' should have its own value generated within a row.

Additional context

Changing models or distribution within a model did not fix this.
The values are also exactly the same every time data is generated (unlike data of other types).

@HotDiggityDogz HotDiggityDogz added feature request Request for a new feature new Automatic label applied to new issues labels Mar 9, 2023
@npatki
Contributor

npatki commented Mar 9, 2023

Hi @HotDiggityDogz (great username!), thanks for testing out the SDV 1.0 Beta and providing feedback.

I can replicate the issue. I agree it's strange that both columns always have the same (randomly generated) values. In real-world settings, I imagine you would expect the values to be different.

Context

Starting from SDV 1.0, we control the randomization so that it's possible to deterministically sample the same data whenever you create a synthesizer. The Controlling Randomization section has more details about this.

It seems like we are fixing the same seed for each uuid column, which is resulting in the same random values for both columns.
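To make the suspected cause concrete, here is a minimal sketch using only the Python standard library. It is not the SDV or RDT implementation: the function names, the column names `user_id` and `order_id`, and the idea of mixing the column name into the seed are all assumptions made for illustration. It shows that starting every column from the same seed yields identical values per row, while deriving a distinct seed per column keeps the output deterministic but different across columns.

```python
import random
import uuid
import zlib


def make_uuid4(rng):
    """Build a version-4 UUID from 128 bits drawn from a seeded generator."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def sample_shared_seed(columns, num_rows, seed=0):
    """Problem pattern (assumed): every column starts from the same seed."""
    table = {}
    for name in columns:
        rng = random.Random(seed)  # identical starting state for each column
        table[name] = [make_uuid4(rng) for _ in range(num_rows)]
    return table


def sample_per_column_seed(columns, num_rows, seed=0):
    """One possible fix (hypothetical): mix the column name into the seed."""
    table = {}
    for name in columns:
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        column_seed = seed + zlib.crc32(name.encode("utf-8"))
        rng = random.Random(column_seed)
        table[name] = [make_uuid4(rng) for _ in range(num_rows)]
    return table


shared = sample_shared_seed(["user_id", "order_id"], num_rows=3)
fixed = sample_per_column_seed(["user_id", "order_id"], num_rows=3)
print(shared["user_id"] == shared["order_id"])  # the columns match row for row
print(fixed["user_id"] == fixed["order_id"])    # the columns now differ
```

Both functions stay reproducible for a given seed; only the second one avoids the row-wise duplicates described in this issue.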

> The values are also exactly the same every time data is generated (unlike data of other types)

I'm not quite sure what you mean by this. I see that every time I call sample, there are new values being generated.
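The two observations can both hold under controlled randomization. The toy class below is a hypothetical stand-in, not the SDV API: `ToySynthesizer`, its `sample` method, and its `reset` method are invented for this sketch. It assumes the generator is seeded once at creation, so successive `sample` calls on one object return new values, while a freshly created object replays the same sequence from the start.

```python
import random
import uuid


class ToySynthesizer:
    """Hypothetical stand-in for a synthesizer with a fixed seed."""

    def __init__(self, seed=0):
        self._seed = seed
        self._rng = random.Random(seed)  # seeded once, at creation

    def sample(self, num_rows):
        # Each call advances the generator, so repeated calls differ.
        return [
            uuid.UUID(int=self._rng.getrandbits(128), version=4)
            for _ in range(num_rows)
        ]

    def reset(self):
        # Return to the initial state so the sequence replays.
        self._rng = random.Random(self._seed)


synth = ToySynthesizer()
first = synth.sample(2)
second = synth.sample(2)
print(first != second)                     # new values on each call
print(ToySynthesizer().sample(2) == first)  # a fresh object repeats them
```

Under this assumption, re-running a script that creates a new synthesizer each time would show the same values on every run, even though consecutive calls within one session differ.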


@npatki npatki added under discussion Issue is currently being discussed SDV 1.0 and removed new Automatic label applied to new issues labels Mar 9, 2023
@npatki
Contributor

npatki commented Mar 13, 2023

FYI, this issue will be resolved in our RDT library, which is responsible for the data preprocessing and anonymization.

I've filed RDT issue 619 with more details, and we'll use it to track the fix. In the meantime, we can keep this issue open until the fix is propagated back into the SDV library.

@npatki npatki removed the SDV 1.0 label Mar 29, 2023
@npatki
Contributor

npatki commented Apr 21, 2023

Hi @HotDiggityDogz, this issue has now been fixed and is part of the latest release (SDV 1.0.1).

@npatki npatki closed this as completed Apr 21, 2023
@npatki npatki added resolution:resolved The issue was fixed, the question was answered, etc. and removed under discussion Issue is currently being discussed labels Apr 21, 2023