
JCUDF row to cuDF column for tables with strings #10286

Closed
Tracked by #10033
hyperbolic2346 opened this issue Feb 14, 2022 · 4 comments · Fixed by #10871

@hyperbolic2346
Contributor

hyperbolic2346 commented Feb 14, 2022

Is your feature request related to a problem? Please describe.
Now that the row offset iterator is written, the next step toward supporting strings in the row-to-column and column-to-row conversion code is to implement one direction. This is the implementation issue for the row-to-column portion: it will accept JCUDF rows containing strings and produce a table with string columns for use by the spark-rapids plugin.

Describe the solution you'd like
The plan is to use the existing fixed-width code to fill in a device_uvector with length and source offset values, since that data is written inside the fixed-width section. Scanning the length vector (a prefix sum) will produce the offsets column. The string data itself then needs to be copied, which should be fairly straightforward given the length, source offset, and destination offset arrays. The initial version will assign one string per warp, but this will scale poorly if string sizes vary widely.
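A minimal host-side C++ sketch of the two planned steps, to make the data flow concrete. All names here (StringSlot, StringColumn, gather_strings) are hypothetical and do not come from cuDF; the real code would run on the GPU with device_uvector buffers and a CUDA kernel, and the serial copy loop below only stands in for the one-warp-per-string kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one (length, source offset) pair read out of the
// fixed-width section of a JCUDF row.
struct StringSlot {
  int32_t length;      // number of bytes in the string
  int64_t src_offset;  // where the string bytes start in the row buffer
};

// Hypothetical host-side stand-in for a cuDF strings column.
struct StringColumn {
  std::vector<int32_t> offsets;  // n + 1 entries; offsets[n] is the byte total
  std::vector<char> chars;       // all string bytes, back to back
};

StringColumn gather_strings(const std::vector<char>& row_data,
                            const std::vector<StringSlot>& slots) {
  const std::size_t n = slots.size();
  StringColumn out;
  out.offsets.assign(n + 1, 0);

  // Step 1: prefix-sum the lengths. offsets[i] is where string i begins in
  // the output (its destination offset); offsets[n] is the total size.
  for (std::size_t i = 0; i < n; ++i) {
    out.offsets[i + 1] = out.offsets[i] + slots[i].length;
  }
  out.chars.resize(static_cast<std::size_t>(out.offsets[n]));

  // Step 2: copy each string from its source offset to its destination
  // offset. The issue plans one warp per string on the GPU; this serial
  // loop only mirrors that per-string decomposition.
  for (std::size_t i = 0; i < n; ++i) {
    const StringSlot& s = slots[i];
    assert(s.length >= 0 && s.src_offset >= 0);
    assert(static_cast<std::size_t>(s.src_offset) +
               static_cast<std::size_t>(s.length) <= row_data.size());
    if (s.length > 0) {  // skip empty strings (avoids memcpy on null data())
      std::memcpy(out.chars.data() + out.offsets[i],
                  row_data.data() + s.src_offset,
                  static_cast<std::size_t>(s.length));
    }
  }
  return out;
}
```

Because each worker owns a whole string, a row set with one very long string and many short ones leaves most workers idle while one finishes, which is the scaling concern raised above.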

Describe alternatives you've considered
Other ways to parallelize the work were considered, including splitting it so that each thread copies a fixed number of bytes to the proper destination. The complexity of that approach led us to the current solution in the interest of time.
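For comparison, a rough C++ sketch of the byte-partitioned alternative, which the issue did not adopt. It assumes the destination offsets have already been computed as above; copy_byte_chunk and copy_all_chunks are hypothetical names, and the outer loop simulates launching one GPU thread per fixed-size chunk of output bytes. Each chunk must binary-search the offsets to find which string owns its first byte, and may have to split its copy across several strings; that bookkeeping is the extra complexity being traded against better load balance.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies output bytes [chunk_id * chunk_size, ...) regardless of string
// boundaries. offsets has n + 1 entries (offsets[n] == total bytes);
// src_offsets[i] is where string i starts in row_data.
void copy_byte_chunk(std::size_t chunk_id, std::size_t chunk_size,
                     const std::vector<int32_t>& offsets,
                     const std::vector<int64_t>& src_offsets,
                     const std::vector<char>& row_data,
                     std::vector<char>& chars) {
  const std::size_t total = static_cast<std::size_t>(offsets.back());
  std::size_t pos = chunk_id * chunk_size;
  const std::size_t end = std::min(pos + chunk_size, total);
  while (pos < end) {
    // Find the string owning output byte `pos`: the last i with
    // offsets[i] <= pos. upper_bound also skips over empty strings.
    auto it = std::upper_bound(offsets.begin(), offsets.end(),
                               static_cast<int32_t>(pos));
    const std::size_t str = static_cast<std::size_t>(it - offsets.begin()) - 1;
    const std::size_t str_end = static_cast<std::size_t>(offsets[str + 1]);
    const std::size_t in_str = pos - static_cast<std::size_t>(offsets[str]);
    // Copy up to the end of this chunk or this string, whichever is first.
    const std::size_t n = std::min(end, str_end) - pos;
    std::memcpy(chars.data() + pos,
                row_data.data() + src_offsets[str] + in_str, n);
    pos += n;
  }
}

// Simulates one thread per chunk; chunk_size must be greater than zero.
std::vector<char> copy_all_chunks(std::size_t chunk_size,
                                  const std::vector<int32_t>& offsets,
                                  const std::vector<int64_t>& src_offsets,
                                  const std::vector<char>& row_data) {
  const std::size_t total = static_cast<std::size_t>(offsets.back());
  std::vector<char> chars(total);
  const std::size_t num_chunks = (total + chunk_size - 1) / chunk_size;
  for (std::size_t c = 0; c < num_chunks; ++c) {
    copy_byte_chunk(c, chunk_size, offsets, src_offsets, row_data, chars);
  }
  return chars;
}
```

Every chunk does the same amount of copying no matter how string lengths are distributed, but each one pays for a search over the offsets and has to handle strings that straddle chunk boundaries.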

Additional context
This is part of the larger feature tracked in #10033.

hyperbolic2346 changed the title from "write row to column kernel code for variable-width data copy." to "JCUDF row to cuDF column for tables with strings" on Feb 14, 2022
hyperbolic2346 self-assigned this on Feb 14, 2022
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@sameerz
Contributor

sameerz commented Mar 31, 2022

Still needed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@sameerz
Contributor

sameerz commented May 2, 2022

Still needed
