Rework some python tests of Parquet delta encodings #15693

etseidl · 2024-05-07T19:30:56Z

Description

test_parquet.py currently takes around 55s to run on an RTX A6000 system. A large portion of that run time is spent in two tests of the Parquet DELTA_LENGTH_BYTE_ARRAY and DELTA_BYTE_ARRAY encodings. These tests are parameterized with varying row counts to exercise certain encoding edge cases, but the final two row counts (10,000 and 50,000) are larger than needed for adequate test coverage. This PR trims the list of row counts (some were redundant) and lowers the maximum row count to 1,000, which drops the execution time to just under 26s on the same system.

This PR also corrects an oversight from #15239. DELTA_BYTE_ARRAY encoding should have been added to the tests at that time.
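
For illustration only, here is a minimal sketch of how a compact row-count list for this kind of parameterization could be derived. It is not the list used in test_parquet.py. The helper `boundary_row_counts` and the constants are hypothetical. The boundary sizes follow the Parquet spec's rule for delta binary packing (block size a multiple of 128, miniblock size a multiple of 32), and the writer under test may use different sizes.

```python
# Hypothetical sizes, chosen from the Parquet spec's constraints on delta
# binary packing: block size is a multiple of 128 values and each miniblock
# holds a multiple of 32 values. The actual writer may differ.
MINIBLOCK_SIZE = 32
BLOCK_SIZE = 128
MAX_ROWS = 1_000  # upper bound named in the PR description


def boundary_row_counts(boundaries, max_rows):
    """Return sorted, de-duplicated row counts at and around each boundary.

    Always includes the tiny cases 1 and 2 plus the overall maximum, so the
    list stays short while still straddling every boundary of interest.
    """
    counts = {1, 2, max_rows}
    for b in boundaries:
        counts.update(n for n in (b - 1, b, b + 1) if 1 <= n <= max_rows)
    return sorted(counts)


# One miniblock, two miniblocks, and one full block (all assumed sizes).
DELTA_NUM_ROWS = boundary_row_counts(
    [MINIBLOCK_SIZE, 2 * MINIBLOCK_SIZE, BLOCK_SIZE], MAX_ROWS
)
print(DELTA_NUM_ROWS)
```

A list like this could then be passed to `pytest.mark.parametrize("nrows", DELTA_NUM_ROWS)`, keeping the boundary cases while avoiding very large row counts.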

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2024-05-07T19:30:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

vuule · 2024-05-07T19:44:20Z

/ok to test

vuule · 2024-05-08T00:10:47Z

/merge

etseidl and others added 3 commits May 7, 2024 19:13

speed up delta encoding tests, fix an omission

966771d

Merge branch 'rapidsai:branch-24.06' into delta_pytest_speedup

471386f

add more delta_byte_array

a64514d

etseidl requested a review from a team as a code owner May 7, 2024 19:30

etseidl requested review from vyasr and bdice May 7, 2024 19:30

github-actions bot added the Python (Affects Python cuDF API.) label May 7, 2024

vuule added the tests (Unit testing for project), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels May 7, 2024

vuule approved these changes May 7, 2024


mroeschke approved these changes May 7, 2024


vyasr approved these changes May 8, 2024


rapids-bot bot merged commit d29af84 into rapidsai:branch-24.06 May 8, 2024
70 checks passed

etseidl deleted the delta_pytest_speedup branch May 8, 2024 05:32
