Support for multiple uuid4 types in a table #1303
Comments
Hi @HotDiggityDogz (great username!), thanks for testing out the SDV 1.0 Beta and providing feedback. I can replicate the issue. I agree it's strange that both columns always have the same (randomly generated) values. In real-world settings, I imagine you would expect the values to be different.
Context
Starting from SDV 1.0, we control the randomization so that it's possible to deterministically sample the same data whenever you create a synthesizer. The Controlling Randomization section has more details about this. It seems like we are fixing the same seed for each uuid column, which results in the same random values for both columns.
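To illustrate the suspected cause, here is a minimal sketch using only the Python standard library. It is not the SDV or RDT implementation: the helper names (`make_uuid4`, `sample_column`, `column_seed`), the column names, and the hash-based seed derivation are all assumptions made for illustration. It shows that two generators started from the same seed produce identical UUID columns, while deriving one seed per column keeps the output reproducible but distinct.

```python
import hashlib
import random
import uuid


def make_uuid4(rng):
    """Build a version-4 UUID from 128 bits drawn from the given generator."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def sample_column(seed, num_rows):
    """Generate a deterministic column of UUID strings from one seed."""
    rng = random.Random(seed)
    return [str(make_uuid4(rng)) for _ in range(num_rows)]


def column_seed(base_seed, column_name):
    """Derive a distinct, reproducible seed per column (hypothetical scheme)."""
    digest = hashlib.sha256(f"{base_seed}:{column_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


BASE_SEED = 42  # arbitrary value chosen for this example

# Same seed for both columns: each row holds the same value in both columns.
shared_a = sample_column(BASE_SEED, 3)
shared_b = sample_column(BASE_SEED, 3)

# Per-column seeds: values differ across columns but repeat across runs.
distinct_a = sample_column(column_seed(BASE_SEED, "user_id"), 3)
distinct_b = sample_column(column_seed(BASE_SEED, "order_id"), 3)

print("shared seed, columns identical:", shared_a == shared_b)
print("per-column seeds, columns identical:", distinct_a == distinct_b)
```

Mixing the column name into the seed is only one possible way to decorrelate columns; the actual fix in RDT may work differently.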
I'm not quite sure what you mean by this? I see that every time I call
FYI, this issue will be resolved in our RDT library, which is responsible for the data preprocessing and anonymization. I've filed RDT issue 619 with more details, and we'll use it to track the fix. In the meantime, we can keep this issue open until the fix is propagated back into the SDV library.
Hi @HotDiggityDogz, this issue has now been fixed and is part of the latest release (SDV 1.0.1).
Problem Description
If a table has more than one column of datatype uuid4, then when generating synthetic data both columns get the same uuid4 value in any given row. In the original data, the uuids are different (and refer to different entities).
[Image: Screenshot 2023-03-09 at 11 47 15 AM]
Expected behavior
Each column of type 'uuid4' should have its own value generated within a row.
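One way to check this expectation against sampled output is sketched below. The function name, the column names, and the example rows are hypothetical and invented for illustration; they are not taken from the reporter's data.

```python
def rows_with_same_value(rows, col_a, col_b):
    """Return the indices of rows where both columns hold the same value."""
    return [i for i, row in enumerate(rows) if row[col_a] == row[col_b]]


# Made-up rows standing in for sampled synthetic data.
synthetic_rows = [
    {"user_id": "11111111-1111-4111-8111-111111111111",
     "order_id": "11111111-1111-4111-8111-111111111111"},
    {"user_id": "22222222-2222-4222-8222-222222222222",
     "order_id": "33333333-3333-4333-8333-333333333333"},
]

# Row 0 shows the reported bug (same value in both columns); row 1 does not.
duplicates = rows_with_same_value(synthetic_rows, "user_id", "order_id")
print("rows with duplicated uuids:", duplicates)
```

If the sampled table is a pandas DataFrame, `synthetic_data.to_dict("records")` produces rows in this shape. An empty result means no row shares a value across the two columns.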
Additional context
Changing models or distribution within a model did not fix this.
The values are also exactly the same every time data is generated (unlike data of other types).