Add in the ability to fingerprint JSON columns #11002

Merged (4 commits) on Jun 12, 2024

Conversation

revans2
Collaborator

@revans2 revans2 commented Jun 7, 2024

This gives datagen the ability to automatically gather statistics about JSON-formatted columns, so that the tool can generate data that works with JSON parsing functions such as get_json_object or from_json. It replaces some of the previous JSON generation code.

It does this by introducing the concept of using multiple different data generators to produce a single column. Right now that is limited to string columns, but it could be expanded to other column types in the future. I also want to extend the same concepts so that we could fingerprint a whole table the same way, as a good starting point.
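
To illustrate the two ideas in the description (gathering statistics about a JSON column, and producing one string column from several generators), here is a minimal sketch in Python. It is not the datagen code from this PR: `fingerprint_json_column`, `make_mixed_column`, `gen_object`, and `gen_invalid` are hypothetical names, and the choice of statistics (a count per JSON path and value type) is an assumption made for the example.

```python
import json
import random
from collections import Counter


def fingerprint_json_column(rows):
    """Hypothetical helper: count (path, type) pairs seen in a JSON string column."""
    stats = Counter()

    def walk(value, path):
        # Record the Python type name observed at this JSON path.
        stats[(path, type(value).__name__)] += 1
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")
        elif isinstance(value, list):
            for item in value:
                walk(item, f"{path}[*]")

    for row in rows:
        try:
            parsed = json.loads(row)
        except (TypeError, ValueError):
            # Rows that are null or not valid JSON are counted separately.
            stats[("$", "invalid")] += 1
            continue
        walk(parsed, "$")
    return stats


def gen_object(rng):
    """Hypothetical generator: a small, valid JSON object."""
    return json.dumps({"a": rng.randint(0, 9)})


def gen_invalid(rng):
    """Hypothetical generator: a string that is deliberately not valid JSON."""
    return "{not json"


def make_mixed_column(generators, weights, n, seed=0):
    """Build one string column by picking a generator per row according to weights."""
    rng = random.Random(seed)
    return [rng.choices(generators, weights=weights, k=1)[0](rng) for _ in range(n)]


sample = ['{"a": 1, "b": [true, "x"]}', '{"a": "s"}', "oops"]
stats = fingerprint_json_column(sample)
column = make_mixed_column([gen_object, gen_invalid], weights=[3, 1], n=8)
```

In this sketch the fingerprint could be used to pick the generators and their weights, for example by giving `gen_invalid` a weight proportional to the observed share of invalid rows.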

@revans2
Collaborator Author

revans2 commented Jun 7, 2024

build

@revans2
Collaborator Author

revans2 commented Jun 10, 2024

build

@revans2
Collaborator Author

revans2 commented Jun 12, 2024

@jlowe please take another look when you get a chance

@revans2
Collaborator Author

revans2 commented Jun 12, 2024

build

@revans2 revans2 merged commit d9686d4 into NVIDIA:branch-24.08 Jun 12, 2024
44 checks passed
@revans2 revans2 deleted the json_datagen branch June 12, 2024 21:26
revans2 added a commit to revans2/spark-rapids that referenced this pull request Jun 13, 2024
revans2 added a commit that referenced this pull request Jun 13, 2024
Revert "Add in the ability to fingerprint JSON columns (#11002)" [skip ci]
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024
3 participants