[BUG] decompressed batches corrupt if they are made spillable #7827

Open
abellina opened this issue Feb 28, 2023 · 1 comment
Labels
bug: Something isn't working
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
shuffle: things that impact the shuffle plugin

Comments

@abellina
Collaborator

While working on #7777, I ran into an issue where a decompressed batch (via nvcomp/UCX) was made spillable, but I then got a corrupted batch back when calling getColumnarBatch. This is likely an issue in 23.04.

Currently, when we decompress batches in GpuCoalesceBatches (https://github.com/NVIDIA/spark-rapids/blob/branch-23.02/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCoalesceBatches.scala#L645), we take the TableMeta directly from the compressed vector instead of building a new one without compression info.

Code that uses this metadata to rebuild the ColumnarBatch would then produce an invalid batch, because we do different things when the batch has codecs defined (see https://github.com/NVIDIA/spark-rapids/blob/branch-23.02/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsDeviceMemoryStore.scala#L282), and in that case we call:

GpuCompressedColumnVector.from(devBuffer, meta)
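
To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use the plugin's classes: `ToyTableMeta`, `SpillableMetaSketch`, `rebuildPath`, and `withoutCodecs` are hypothetical stand-ins, and the real TableMeta carries far more than these three fields. The only assumption taken from the description above is that the rebuild code picks its path from the metadata alone, so metadata that still lists codecs sends an already-decompressed buffer down the compressed path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for TableMeta: only the fields this sketch needs.
final class ToyTableMeta {
    final long rowCount;
    final long uncompressedSize;
    final List<String> codecs; // empty means the buffer holds uncompressed data

    ToyTableMeta(long rowCount, long uncompressedSize, List<String> codecs) {
        this.rowCount = rowCount;
        this.uncompressedSize = uncompressedSize;
        this.codecs = Collections.unmodifiableList(new ArrayList<>(codecs));
    }

    boolean hasCodecs() {
        return !codecs.isEmpty();
    }

    // Build a new meta describing the same table, minus the compression info.
    ToyTableMeta withoutCodecs() {
        return new ToyTableMeta(rowCount, uncompressedSize, Collections.<String>emptyList());
    }
}

final class SpillableMetaSketch {
    // Assumed behavior: the metadata alone decides how the batch is rebuilt.
    static String rebuildPath(ToyTableMeta meta) {
        return meta.hasCodecs() ? "compressed" : "uncompressed";
    }
}
```

In this toy model, a meta that still carries one codec entry makes `rebuildPath` return "compressed" even if the buffer behind it was already decompressed, while the result of `withoutCodecs()` returns "uncompressed" and keeps the same row count. That is the shape of the fix suggested above: attach fresh metadata without compression info to the decompressed batch before making it spillable.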

I think I can fix this with #7777, but I am filing an issue since it's a bug, and I am not sure whether it was made worse in 23.04 by #7572, since we now rely on TableMeta when first creating a batch from a RapidsBuffer.

@abellina added the labels bug (Something isn't working), ? - Needs Triage (Need team to review and classify), shuffle (things that impact the shuffle plugin), and reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Feb 28, 2023
@sameerz removed the ? - Needs Triage (Need team to review and classify) label on Feb 28, 2023
@abellina
Collaborator Author

abellina commented Mar 6, 2023

Related issue: #7850
