
Column to JCUDF row for strings #10234

Closed
Tracked by #10033
hyperbolic2346 opened this issue Feb 7, 2022 · 1 comment · Fixed by #10235
Labels: feature request (New feature or request), Spark (Functionality that helps Spark RAPIDS)

Comments

hyperbolic2346 (Contributor) commented Feb 7, 2022

Is your feature request related to a problem? Please describe.
Now that the row offset iterator is written, the next step toward supporting strings in the row-to-column and column-to-row conversion code is to implement one direction. This issue covers the column-to-row direction: it will accept a table containing string columns and convert it into the JCUDF row format used by the spark-rapids plugin.

Describe the solution you'd like
The kernel will divide the work so that each warp handles a single row. Lane 0 of the warp writes the offset and length of the string data into the row, and then all threads in the warp participate in a memcpy_async call to copy the string data itself.
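As an illustration only, the warp-per-row split can be simulated on the host. This sketch assumes a simplified, hypothetical row layout with one string column stored as a `uint32` offset (from the row start) and a `uint32` length, followed by the string bytes; it omits validity bits and any alignment the real JCUDF format may require. The names `convert_row_as_warp` and `strings_to_rows` are hypothetical, and the strided per-lane loop only stands in for the cooperative `memcpy_async` copy on the GPU.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical simplified layout (an assumption, not the exact JCUDF spec):
// [uint32 offset][uint32 length][string bytes], offset measured from row start.
constexpr int kWarpSize = 32;
constexpr uint32_t kFixedBytes = 2 * sizeof(uint32_t);

// Simulates one warp converting one row.
void convert_row_as_warp(const std::string& str, uint8_t* row) {
  const uint32_t offset = kFixedBytes;  // string data follows the offset/length pair
  const uint32_t length = static_cast<uint32_t>(str.size());

  // Step 1 (lane 0 only on the GPU): write the offset/length pair.
  std::memcpy(row, &offset, sizeof(offset));
  std::memcpy(row + sizeof(offset), &length, sizeof(length));

  // Step 2 (all 32 lanes): lane i copies bytes i, i + 32, i + 64, ...
  // This stands in for the warp-wide cooperative copy.
  for (int lane = 0; lane < kWarpSize; ++lane) {
    for (uint32_t b = static_cast<uint32_t>(lane); b < length; b += kWarpSize) {
      row[offset + b] = static_cast<uint8_t>(str[b]);
    }
  }
}

// Host driver: rows are packed back to back; row_starts is an exclusive
// prefix sum of row sizes (the role a row offset iterator could play).
std::vector<uint8_t> strings_to_rows(const std::vector<std::string>& col,
                                     std::vector<uint32_t>& row_starts) {
  row_starts.clear();
  uint32_t total = 0;
  for (const auto& s : col) {
    row_starts.push_back(total);
    total += kFixedBytes + static_cast<uint32_t>(s.size());
  }
  std::vector<uint8_t> rows(total);
  // One "warp" per row.
  for (size_t r = 0; r < col.size(); ++r) {
    convert_row_as_warp(col[r], rows.data() + row_starts[r]);
  }
  return rows;
}
```

For example, `strings_to_rows({"hi", "", "hello"}, starts)` produces rows starting at byte offsets 0, 10, and 18. Giving a whole warp to one row keeps the destination of every byte trivially known, at the cost of idle lanes on short strings.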

Describe alternatives you've considered
Other ways to parallelize the work were considered, including assigning each thread a fixed number of bytes to copy to the proper destination. That approach requires lower_bound calls to find the destination of each chunk of data, and it has to handle chunks that span multiple destinations. Its complexity led us to the current solution in the interest of time.

Additional context
This is part of the larger feature tracked in #10033.

hyperbolic2346 changed the title from "write column to row kernel code for variable-width data copy." to "Column to JCUDF row for strings" on Feb 7, 2022
hyperbolic2346 self-assigned this on Feb 7, 2022
hyperbolic2346 added the feature request and Spark labels on Feb 7, 2022
github-actions commented:

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot pushed a commit that referenced this issue on Mar 22, 2022:
This is the code for the column-to-row portion of the string work. This code will convert a table that includes strings into the JCUDF row format. It depends on #10157 and, as such, is a draft PR until that is merged. I am putting this up now so that reviewers of that PR can see where it is headed.

closes #10234

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - MithunR (https://github.com/mythrocks)
  - https://github.com/nvdbaranec

URL: #10235