Ignore the order locally for repartition tests #163

firestarman · 2020-06-12T10:26:38Z

This PR fixes a result-comparison issue in the repartition tests.

Repartitioning is likely to change the order of the rows in a DataFrame, especially when running on multiple executors, so it is better to ignore the order locally when comparing the results.
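
To illustrate the idea, here is a minimal pure-Python sketch, not the actual test harness. The helpers `round_robin_repartition` and `collect` are hypothetical stand-ins for Spark's repartition and collect, and the local sort here only approximates what an order-ignoring comparison might do on the driver:

```python
def round_robin_repartition(rows, num_parts):
    """Hypothetical stand-in for a repartition: deal rows across partitions."""
    parts = [[] for _ in range(num_parts)]
    for i, row in enumerate(rows):
        parts[i % num_parts].append(row)
    return parts


def collect(parts):
    """Concatenate the partitions in partition order, as a collect would."""
    return [row for part in parts for row in part]


def rows_match_ignoring_order(expected, actual):
    """Sort both sides locally (on the driver) and then compare them."""
    return sorted(expected) == sorted(actual)


rows = [(i, i * 2) for i in range(10)]
collected = collect(round_robin_repartition(rows, 3))

# The same rows come back, but no longer in the original order.
print(collected == rows)
print(rows_match_ignoring_order(rows, collected))
```

Sorting after the rows are collected leaves the partitioning under test untouched, at the cost of doing the sort on a single machine.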

revans2 · 2020-06-12T12:20:11Z

integration_tests/src/main/python/repart_test.py

@@ -37,6 +38,7 @@ def test_coalesce_df(num_parts, length):

 @pytest.mark.parametrize('num_parts', [1, 10, 100, 1000, 2000], ids=idfn)
 @pytest.mark.parametrize('length', [0, 2048, 4096], ids=idfn)
+@ignore_order('local')


Why local? A local sort does not scale well. Is it because of the floating-point -0.0 vs 0.0 comparison? If so, we should mark it as such with a comment.

No, I am just trying to avoid adding any extra operation on the input DataFrame, to keep the test pure, since as far as I know a sort on Spark will shuffle the data again.
I verified that the test also passes with the sort on Spark.

If you prefer, I can remove the 'local'.

The local sort is very slow compared to the distributed sort, and it does not scale for larger tests. I am fine with it being local, especially because this does deal with partitioning, but I would prefer a comment explaining why you picked local. Perhaps something like:

@ignore_order('local') # don't repartition again for sort

I forgot to add that this is just a nit. If you want to merge it in as-is, that is fine.

Thanks Bobby, that is a really good suggestion. I updated the PR to add a comment.

revans2 · 2020-06-12T12:20:24Z

build

revans2 · 2020-06-12T18:05:33Z

build

revans2 · 2020-06-13T13:43:10Z

build

* Ignore the order locally for repartition tests
* Update repart_test.py

Co-authored-by: Liangcai Li <[email protected]>

Signed-off-by: spark-rapids automation <[email protected]>

Ignore the order locally for repartition tests

dcde0f4

revans2 reviewed Jun 12, 2020

revans2 previously approved these changes Jun 13, 2020

Update repart_test.py

a07078b

firestarman dismissed revans2’s stale review via a07078b June 13, 2020 12:03

revans2 approved these changes Jun 13, 2020

revans2 merged commit 4f12285 into NVIDIA:branch-0.1 Jun 13, 2020

sameerz added the "test" (Only impacts tests) label Jun 15, 2020

sameerz added this to the Jun 8 - Jun 19 milestone Jun 16, 2020

firestarman deleted the ignore_order_repart branch June 17, 2020 02:46

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Ignore the order locally for repartition tests (NVIDIA#163)

cee2a89

* Ignore the order locally for repartition tests
* Update repart_test.py

Co-authored-by: Liangcai Li <[email protected]>

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Ignore the order locally for repartition tests (NVIDIA#163)

c1ef857

* Ignore the order locally for repartition tests
* Update repart_test.py

Co-authored-by: Liangcai Li <[email protected]>

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023

Update submodule cudf to fb03c8b (NVIDIA#163)

d9d20a9

Signed-off-by: spark-rapids automation <[email protected]>
