Remove unused variable and fix memory issue in ORC writer #12984
cpp/src/io/orc/writer_impl.cu
segmentation,
stripes,
orc_table,
pd_masks, /* unused, but needs to be kept alive */
Can you help me understand why the lifetime matters if it's unused? I don't see what this is holding, or what is holding this data, that makes its lifetime relevant.
That buffer contains the intermediate null masks for the intermediate data (`orc_table`) produced during null pushdown. We need to keep it alive so that the null masks can be accessed later in the `write_` function. Otherwise, the null mask pointers in `orc_table` would be dangling.
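To make the hazard concrete, the sketch below is a host-only C++ analogy, not cuDF code. It assumes simplified stand-ins: a `std::vector<std::uint32_t>` plays the role of the device buffer holding the pushed-down null masks, and the hypothetical `column_view_sketch` plays the role of a column in `orc_table` that stores only a raw pointer to its mask. The names `init_mask_sketch`, `count_valid` and `write_sketch` are likewise hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view: like a column in orc_table, it only
// stores a raw pointer to a null mask that some other object owns.
struct column_view_sketch {
  std::uint32_t const* null_mask = nullptr;
  std::size_t num_rows           = 0;

  bool is_valid(std::size_t row) const { return (null_mask[row / 32] >> (row % 32)) & 1u; }
};

// Hypothetical stand-in for the pushdown step: fills `owner` with a mask
// (even rows valid) and points the view at it. The view is only valid
// while `owner` is alive.
void init_mask_sketch(column_view_sketch& view, std::vector<std::uint32_t>& owner)
{
  owner.assign((view.num_rows + 31) / 32, 0u);
  for (std::size_t row = 0; row < view.num_rows; row += 2) {
    owner[row / 32] |= 1u << (row % 32);
  }
  view.null_mask = owner.data();
}

// A later consumer that reads through the view's raw mask pointer.
std::size_t count_valid(column_view_sketch const& view)
{
  std::size_t n = 0;
  for (std::size_t row = 0; row < view.num_rows; ++row) {
    n += view.is_valid(row) ? 1 : 0;
  }
  return n;
}

std::size_t write_sketch(std::size_t num_rows)
{
  column_view_sketch view{nullptr, num_rows};
  std::vector<std::uint32_t> pd_masks;
  init_mask_sketch(view, pd_masks);
  // pd_masks is never referenced again by name, but it must stay in scope:
  // if it were destroyed here, view.null_mask would dangle and the read
  // below would be a use-after-free.
  return count_valid(view);
}
```

For example, `write_sketch(10)` marks rows 0, 2, 4, 6 and 8 as valid and returns 5, which is only well-defined because `pd_masks` outlives the last read through the view.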
It is easy to overlook and ignore such a variable because it is unused, which leads to hidden invalid memory accesses when the buffer is destroyed earlier than needed, as in this case.
Why is it needed after `convert_table_to_orc_data`? For statistics computation?
It is not used directly, but it stores the null masks that `orc_table` needs in order to remain valid. So as long as we still need `orc_table`, we need to keep it.
> So as long as we still need `orc_table`, we need to keep it.

Not exactly. We only need it as long as we need the null masks in `orc_table`. I can't find where they are used after `convert_table_to_orc_data`.
It is used in `calculate_aligned_rowgroup_bounds`, which is called in `encode_columns`.
The memory lifetime is extremely non-obvious and hidden behind several layers of functions. There are a few issues I'd like to improve here:
- The line `auto pd_masks = init_pushdown_null_masks(orc_table, stream);` modifies `orc_table` in place, with no docstrings or comments indicating that the input `orc_table` is modified (!) or that the lifetime of `pd_masks` is relevant. I would recommend that we avoid in-place modifications of input parameters and make the return values meaningful, rather than relying only on their lifetimes.
- The struct type `pushdown_null_masks` is never really used. It has members `data` and `masks`, but it only stores things. It could be a `std::tuple` or a pair instead.
- Should the lifetime of the pushed-down null masks be tied to the lifetime of the `orc_table_view`? Are there other similar objects? Could a single object be used to tie their lifetimes together?
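One way to read the first and last suggestions together is sketched below: instead of mutating its input and returning an owner whose only job is to stay alive, the init step returns a single object that owns the mask storage and exposes the view into it. This is a host-only C++ illustration under assumptions, not a proposed cuDF API; `table_with_masks_sketch`, `column_view_sketch` and `make_pushdown_sketch` are hypothetical names, and `std::vector` stands in for a device buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical non-owning view over a column's null mask.
struct column_view_sketch {
  std::uint32_t const* null_mask = nullptr;
  std::size_t num_rows           = 0;
};

// Hypothetical owning wrapper: the mask storage and the view that points
// into it live and die together, so the view cannot outlive its masks.
class table_with_masks_sketch {
 public:
  table_with_masks_sketch(std::vector<std::uint32_t> masks, std::size_t num_rows)
    : masks_(std::move(masks)), view_{masks_.data(), num_rows}
  {
  }

  // A copy's view would still point at the original's storage, so copying
  // is disabled (which also suppresses the implicit move operations).
  table_with_masks_sketch(table_with_masks_sketch const&)            = delete;
  table_with_masks_sketch& operator=(table_with_masks_sketch const&) = delete;

  column_view_sketch const& view() const { return view_; }

 private:
  std::vector<std::uint32_t> masks_;  // declared first, so initialized before view_
  column_view_sketch view_;
};

// Hypothetical init step: returns a meaningful value instead of modifying
// an input parameter. Returning a prvalue relies on C++17 guaranteed copy
// elision, since the type is neither copyable nor movable.
table_with_masks_sketch make_pushdown_sketch(std::size_t num_rows)
{
  std::vector<std::uint32_t> masks((num_rows + 31) / 32, 0xFFFFFFFFu);  // all rows valid
  return table_with_masks_sketch{std::move(masks), num_rows};
}
```

Usage would be `auto table = make_pushdown_sketch(n);` followed by passing `table.view()` to later stages; the masks then stay alive exactly as long as `table` does. A `std::tuple` or pair would also return both pieces, but it would not stop a caller from dropping the owner while keeping the view.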
I don't want to block this PR indefinitely. I'd be happy with a partial solution to any of these issues, or at least a discussion of how we can improve and an issue with next steps.
@bdice Maybe
Maybe so. That would fix a lot of the lifetime questions. I don't know if there is precedent in the I/O codebase for this, or if it's confusing to have a
It's not a big deal either way, but I assumed the struct would be using accessors like
Converting this to a draft, since I can't verify or reproduce the issue and so I'm not sure whether my fix is correct. I also found something else that could be the source of the problem. Will report later.
I thought that we had both
EOD update: I still can't find where the pushdown null masks are used after encoding. I'm really confused by the reported fix.
This reverts commit b43de26.
Alright, with the help of GCC's AddressSanitizer (ASan), I caught the bug at this line, which deref
Looks good. We can rework the code to eliminate the need to store the dictionaries separately in a later PR.
/merge
This removes an unused variable in the ORC writer and also fixes a memory issue: dangling pointers were being dereferenced because a device buffer was destroyed early, when it should have been kept alive.