[BUG] Implement getReaderForRange in the RapidsShuffleManager #362

abellina · 2020-07-15T18:33:55Z

If we have a skewed join that has GPU-written output, we may end up needing to read by range.

Currently the RapidsShuffleManager will "delegate" to the CPU shuffle for this, but this will fail (the output was written to the catalog, not to files).

This issue tracks implementing getReaderForRange in the RapidsShuffleManager.

Note that in Spark 3.1, getReader became final, and we should only use getReaderForRange.
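
For illustration only, here is a toy model of what "reading by range" means when shuffle output lives in an in-memory catalog rather than in files. It is plain Python, not the plugin's Scala code: `ToyShuffleCatalog`, `add`, and `read_range` are hypothetical names invented for this sketch, and the half-open `[start, end)` ranges are an assumption of the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyShuffleCatalog:
    """Hypothetical stand-in for a shuffle catalog.

    Buffers are kept in memory, keyed by (map_index, partition),
    instead of being written to shuffle files.
    """

    def __init__(self) -> None:
        self._blocks: Dict[Tuple[int, int], List[str]] = defaultdict(list)

    def add(self, map_index: int, partition: int, rows: List[str]) -> None:
        """Record the rows that one map task produced for one partition."""
        self._blocks[(map_index, partition)].extend(rows)

    def read_range(self, start_map: int, end_map: int,
                   start_part: int, end_part: int) -> List[str]:
        """Return rows from map outputs [start_map, end_map) restricted to
        partitions [start_part, end_part), in (map, partition) order."""
        out: List[str] = []
        for (m, p) in sorted(self._blocks):
            if start_map <= m < end_map and start_part <= p < end_part:
                out.extend(self._blocks[(m, p)])
        return out


# Three map tasks, two reduce partitions.
catalog = ToyShuffleCatalog()
for m in range(3):
    for p in range(2):
        catalog.add(m, p, [f"m{m}-p{p}"])

# A skewed partition 1 split so that one task reads only map outputs 0 and 1.
skewed_slice = catalog.read_range(0, 2, 1, 2)
print(skewed_slice)  # ['m0-p1', 'm1-p1']
```

In this model, a regular whole-partition read is just the special case where the map range covers every map output. A reader that only knows how to open shuffle files has nothing to look up here, which is the failure described above.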

sameerz · 2020-08-19T19:22:53Z

Closing in favor of issue #455

abellina added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and shuffle (things that impact the shuffle plugin) labels Jul 15, 2020

abellina removed the ? - Needs Triage (Need team to review and classify) label Jul 15, 2020

abellina mentioned this issue Jul 15, 2020

Update partitioning logic in ShuffledBatchRDD #319

Merged

sameerz closed this as completed Aug 19, 2020

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023

Add new native parquet footer API and deprecate the old one (NVIDIA#362)

ba5de10

Signed-off-by: Robert (Bobby) Evans <[email protected]>
