Fix race in ORC string dictionary creation #13214
Signed-off-by: Robert (Bobby) Evans <[email protected]>
@@ -72,6 +72,10 @@ static __device__ void LoadNonNullIndices(volatile dictinit_state_s* s,
                                          Storage& temp_storage)
 {
   if (t == 0) { s->nnz = 0; }
   if (s->chunk.num_rows <= 0) {
copyrights need to be updated
@nvdbaranec Is there anything similar in parquet code?
Nice catch :)
/merge
Thank you @revans2 for investigating this - excellent work
Description
Unfortunately, this is really hard to reproduce. For whatever reason, I could only reproduce it on a relatively small data set with at least 140,001 rows, where one column is a LIST column whose lists are all empty and another is a STRUCT column with two STRING child columns whose strings are all empty. I also had to sort and partition the data before doing the write, and it had to be in a very specific environment with T4 GPUs. I don't know why all of those were needed to make the race happen regularly, but they were.
Because reproducing it is so complex, I have not added any unit tests.
The problem was essentially a race when calculating dictionary duplication for strings in ORC. As part of this, a function `LoadNonNullIndices` was called that was supposed to set a value `nnz` in a shared memory location `s`. In the normal case a loop was taken in which `__syncthreads()` was called, but if there were no rows in the column (the LIST column), the loop was not taken, and it was a race whether the `nnz` value that thread 0 had set to 0 showed up in all of the threads or not.
What made this crash is that the `nnz` value determines what happens in the rest of the kernel: whether it reads data, writes to temp memory (which is not allocated if previous processing shows that there is no need for it), and so on. If `nnz` is non-zero, the kernel tries to do all of those things and bad stuff starts to happen.
Checklist
As a side note, I am not a C++ or CUDA expert, so I am happy to take any suggestions.
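For anyone who wants to see the shape of the race in isolation, here is a minimal sketch. It is not the cuDF code: `state_s`, `load_non_null_indices`, `count_kernel` and `all_threads_agree` are hypothetical names made up for illustration, and every row is treated as non-null to keep it short. It only shows the pattern described above: thread 0 resets `nnz` in shared memory, and the early-exit path needs its own `__syncthreads()` because the loop that normally provides one is skipped.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel's shared state; only nnz matters here.
struct state_s {
  int nnz;
  int num_rows;
};

// Counts rows into s->nnz, one block-sized batch per loop iteration.
__device__ void load_non_null_indices(volatile state_s* s, int t)
{
  if (t == 0) { s->nnz = 0; }
  if (s->num_rows <= 0) {
    // The loop below would be skipped, so nothing else publishes nnz = 0.
    // Without this barrier, other threads may read a stale nnz.
    __syncthreads();
    return;
  }
  for (int i = 0; i < s->num_rows; i += blockDim.x) {
    int const remaining = s->num_rows - i;
    int const batch     = remaining < (int)blockDim.x ? remaining : (int)blockDim.x;
    if (t == 0) { s->nnz = s->nnz + batch; }
    __syncthreads();  // the barrier that the non-empty path always had
  }
}

__global__ void count_kernel(int num_rows, int* out)
{
  __shared__ state_s state_g;
  volatile state_s* const s = &state_g;
  int const t = threadIdx.x;
  if (t == 0) { s->num_rows = num_rows; }
  __syncthreads();  // num_rows is visible to the whole block from here on
  load_non_null_indices(s, t);
  out[t] = s->nnz;  // each thread records the nnz value it observed
}

// Host helper: true if every thread in the block observed the expected nnz.
bool all_threads_agree(int num_rows)
{
  constexpr int block_size = 256;
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, block_size * sizeof(int)) != cudaSuccess) { return false; }
  count_kernel<<<1, block_size>>>(num_rows, d_out);
  int h_out[block_size];
  cudaError_t const err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) { return false; }
  int const expected = num_rows > 0 ? num_rows : 0;
  for (int i = 0; i < block_size; ++i) {
    if (h_out[i] != expected) { return false; }
  }
  return true;
}
```

Calling `all_threads_agree(0)` exercises the empty-column path and `all_threads_agree(1000)` the normal one. Removing the `__syncthreads()` in the early-exit branch brings back the race, although, as with the original bug, it may not show up reliably on any given GPU.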