Column to JCUDF row for strings #10234

hyperbolic2346 · 2022-02-07T17:14:08Z

Is your feature request related to a problem? Please describe.
Now that the row offset iterator is written, the next step toward string support in the row-to-column and column-to-row conversion code is to implement one direction. This issue tracks the column-to-row portion of the work: the code will accept a table containing string columns and convert it into the JCUDF row format for the spark-rapids plugin.

Describe the solution you'd like
The kernel will assign one warp to each row. Thread 0 of the warp will write the offset/length of the string data, and then all threads in the warp will participate in the memcpy_async call that copies the actual string data.
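To make the division of work concrete, here is a minimal host-side C++ sketch that simulates the warp-per-row idea on the CPU. It is not the kernel from the PR. The row layout (`[offset][length][string bytes]`), the names `copy_row` and `to_rows`, and the strided byte loop standing in for the cooperative memcpy_async call are all assumptions made for illustration; the real JCUDF row also holds other fixed-width columns, validity bits, and padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <utility>
#include <vector>

constexpr int kWarpSize = 32;

// Hypothetical simplified row: [uint32 offset][uint32 length][string bytes].
constexpr std::uint32_t kHeaderBytes = 2 * sizeof(std::uint32_t);

// Simulates what one warp would do for one row. `row` must have room for
// kHeaderBytes + src.size() bytes.
void copy_row(const std::string& src, std::uint8_t* row) {
  assert(src.size() <= std::numeric_limits<std::uint32_t>::max() - kHeaderBytes);

  // "Lane 0" writes the offset/length pair for the string data.
  const std::uint32_t offset = kHeaderBytes;
  const std::uint32_t length = static_cast<std::uint32_t>(src.size());
  std::memcpy(row, &offset, sizeof(offset));
  std::memcpy(row + sizeof(offset), &length, sizeof(length));

  // All lanes cooperate on the copy: lane i handles bytes i, i + 32, ...
  // On the GPU this loop would be replaced by a cooperative async copy.
  for (int lane = 0; lane < kWarpSize; ++lane) {
    for (std::uint32_t i = static_cast<std::uint32_t>(lane); i < length; i += kWarpSize) {
      row[offset + i] = static_cast<std::uint8_t>(src[i]);
    }
  }
}

// One simulated "warp" per row of a single string column.
std::vector<std::vector<std::uint8_t>> to_rows(const std::vector<std::string>& column) {
  std::vector<std::vector<std::uint8_t>> rows;
  rows.reserve(column.size());
  for (const auto& s : column) {
    std::vector<std::uint8_t> row(kHeaderBytes + s.size());
    copy_row(s, row.data());
    rows.push_back(std::move(row));
  }
  return rows;
}
```

Because each row is owned by exactly one warp, no lane needs to search for its destination: the row start plus the offset written by lane 0 is enough.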

Describe alternatives you've considered
Other ways to parallelize the work were considered, including having each thread copy a fixed number of bytes to the proper destination. That would require lower_bound calls to figure out the destination for each chunk of data, and it had issues with chunks spanning multiple destinations. Given the complexity of that approach, we chose the current solution in the interest of time.
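The difficulty with the fixed-chunk alternative can be illustrated with a small host-side C++ sketch. It assumes a cumulative offsets array like the one a strings column carries (one entry per string plus a final total). The helpers `owning_string` and `strings_spanned` are hypothetical names, and the search uses `std::upper_bound`, the sibling of lower_bound, because it maps a byte position directly to the string that contains it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: index of the string that owns the byte at global
// position `byte`. Requires byte < offsets.back().
std::size_t owning_string(const std::vector<std::uint32_t>& offsets, std::uint32_t byte) {
  assert(!offsets.empty() && byte < offsets.back());
  // First offset strictly greater than `byte`; the owner is the entry before it.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), byte);
  return static_cast<std::size_t>(it - offsets.begin()) - 1;
}

// Hypothetical helper: how many consecutive strings (and so destination rows)
// the byte chunk [begin, end) reaches across. Requires begin < end.
std::size_t strings_spanned(const std::vector<std::uint32_t>& offsets,
                            std::uint32_t begin, std::uint32_t end) {
  assert(begin < end);
  return owning_string(offsets, end - 1) - owning_string(offsets, begin) + 1;
}
```

With offsets `{0, 5, 12, 40}` and 16-byte chunks, the chunk covering bytes 0 to 15 reaches three strings, so a single thread would have to split its copy across three destination rows, each found by a binary search. That per-chunk bookkeeping is the complexity the warp-per-row design avoids.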

Additional context
This is part of the larger feature tracked in #10033.

github-actions · 2022-03-13T20:07:21Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

This is the code for the column to row portion of the string work. This code will convert a table that includes strings into the JCUDF row format. This depends on #10157 and as such, is a draft PR until that is merged. I am putting this up now so people working on reviewing that PR can see where it is headed.

closes #10234

Authors:
- Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- MithunR (https://github.com/mythrocks)
- https://github.com/nvdbaranec

URL: #10235

hyperbolic2346 mentioned this issue Feb 7, 2022

[FEA] Add string support to row/column conversion #10033

Closed

3 tasks

hyperbolic2346 changed the title ~~write column to row kernel code for variable-width data copy.~~ Column to JCUDF row for strings Feb 7, 2022

hyperbolic2346 self-assigned this Feb 7, 2022

hyperbolic2346 added the feature request (New feature or request) and Spark (Functionality that helps Spark RAPIDS) labels Feb 7, 2022

This was referenced Feb 7, 2022

Column to JCUDF row for tables with strings #10235

Merged

[FEA]Add support in column to row conversion for strings #10160

Closed

github-actions bot added the inactive-30d label Mar 13, 2022

rapids-bot bot closed this as completed in #10235 Mar 22, 2022
