Add in basic support for running tpcds like queries #506

Merged
merged 4 commits into from
Aug 4, 2020

Conversation

revans2
Collaborator

@revans2 revans2 commented Aug 4, 2020

This pulls several TPC-DS-like queries into the integration tests. They are based on the Databricks version at https://github.com/databricks/spark-sql-perf, but I modified a few of the queries slightly so that our test code could sort the results properly to verify their correctness.
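
To illustrate why sortable results matter, here is a minimal sketch of an order-insensitive comparison between two result sets. It is not the integration test code from this PR; the names `sorted_rows` and `assert_same_rows` are hypothetical, and rows are assumed to be plain tuples of basic values that may include `None`.

```python
from typing import Any, Iterable, List, Sequence, Tuple

Row = Tuple[Any, ...]


def _sort_key(row: Sequence[Any]) -> Tuple[Tuple[int, str, str], ...]:
    # Hypothetical helper: None sorts first, and every other value sorts by
    # type name and then repr, so values of mixed types are never compared
    # directly (which would raise TypeError in Python 3).
    return tuple(
        (0, "", "") if value is None else (1, type(value).__name__, repr(value))
        for value in row
    )


def sorted_rows(rows: Iterable[Sequence[Any]]) -> List[Row]:
    # Put rows into a canonical order that does not depend on input order.
    return sorted((tuple(row) for row in rows), key=_sort_key)


def assert_same_rows(
    cpu_rows: Iterable[Sequence[Any]], gpu_rows: Iterable[Sequence[Any]]
) -> None:
    # Compare both result sets after canonical sorting, so row order
    # differences between the two runs do not cause a failure.
    assert sorted_rows(cpu_rows) == sorted_rows(gpu_rows), "result sets differ"


if __name__ == "__main__":
    cpu = [(2, "b"), (1, None), (1, "a")]
    gpu = [(1, "a"), (2, "b"), (1, None)]
    assert_same_rows(cpu, gpu)
```

A query whose output cannot be put into a stable order (for example, one that returns only a `LIMIT`ed subset of tied rows) cannot be checked this way, which is the kind of case where a small change to the query helps.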

I have tested this on data at a scale factor of 1 and everything passes, but not everything is running on the GPU.

I disabled partitioning of the data when writing Parquet for now, because it produced very small files, and we might need to look into the right thing to do for larger test runs.

I plan on doing some more testing at a larger scale factor to see how things go.

These are here mostly to help us debug issues with customers who are running TPC-DS. These are not official runs in any way.

@revans2 revans2 added the test Only impacts tests label Aug 4, 2020
@revans2 revans2 self-assigned this Aug 4, 2020
@revans2
Collaborator Author

revans2 commented Aug 4, 2020

build

@jlowe jlowe added this to the Aug 3 - Aug 14 milestone Aug 4, 2020
Collaborator

@abellina abellina left a comment

small nits. This is pretty awesome to have.

Signed-off-by: Robert (Bobby) Evans <[email protected]>
@revans2
Collaborator Author

revans2 commented Aug 4, 2020

build

@revans2
Collaborator Author

revans2 commented Aug 4, 2020

@abellina please take another look

jlowe
jlowe previously approved these changes Aug 4, 2020
abellina
abellina previously approved these changes Aug 4, 2020
@tgravescs
Collaborator

Can we document the restrictions and diffs in the code for reference, like you have in the description?

@revans2 revans2 dismissed stale reviews from abellina and jlowe via d0cda08 August 4, 2020 18:38
@revans2
Collaborator Author

revans2 commented Aug 4, 2020

build

@revans2
Collaborator Author

revans2 commented Aug 4, 2020

@tgravescs please take another look

@revans2 revans2 merged commit b8da990 into NVIDIA:branch-0.2 Aug 4, 2020
@revans2 revans2 deleted the tpc-ds-like branch August 4, 2020 20:05
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
4 participants