[FEA] Add iterator for variable-length string data row offsets #10111

Closed
Tracked by #10033
hyperbolic2346 opened this issue Jan 24, 2022 · 0 comments · Fixed by #10157
Labels
feature request (New feature or request), Spark (Functionality that helps Spark RAPIDS)


@hyperbolic2346
Contributor

Is your feature request related to a problem? Please describe.
The next step for the row-to-column and column-to-row conversion code is to support strings. The first step toward that is an iterator over row offsets that handles string data: similar to row_offset_functor, but able to account for variable-length string data.

Describe the solution you'd like
Since every element will query its row offset, this iterator should avoid expensive calculations or lookups. Materializing the row offsets into a device_uvector that the iterator indexes into seems the best approach. The array will be built by reading the offsets columns of the string data for each row.
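A minimal host-side sketch of building that array, assuming each string column's offsets child has `num_rows + 1` entries and that a row's size is its fixed-width part plus the bytes of its strings. It ignores validity bytes and any alignment padding the real row format may require; `build_row_offsets` is a hypothetical name.

```cpp
#include <numeric>
#include <vector>

// Assumption: stand-in for cudf::size_type.
using size_type = int;

// Returns num_rows + 1 row start offsets; the last entry is the total size.
// Each element of string_offsets is one string column's offsets child, where
// offsets[r + 1] - offsets[r] is the byte length of row r's string.
std::vector<size_type> build_row_offsets(
  size_type fixed_width_row_size,
  std::vector<std::vector<size_type>> const& string_offsets,
  size_type num_rows)
{
  // Step 1: per-row size = fixed-width part + string bytes from every column.
  std::vector<size_type> row_sizes(num_rows, fixed_width_row_size);
  for (auto const& offsets : string_offsets) {
    for (size_type r = 0; r < num_rows; ++r) {
      row_sizes[r] += offsets[r + 1] - offsets[r];
    }
  }

  // Step 2: a prefix sum of the sizes gives each row's start offset.
  std::vector<size_type> row_offsets(num_rows + 1, 0);
  std::partial_sum(row_sizes.begin(), row_sizes.end(), row_offsets.begin() + 1);
  return row_offsets;
}
```

With string offsets `{0, 3, 3, 8}` and `{0, 1, 4, 4}` and an 8-byte fixed part, the row sizes are 12, 11 and 13, so the offsets come out as `{0, 12, 23, 36}`. On the device, step 1 maps naturally onto a per-row transform and step 2 onto a prefix scan such as `thrust::inclusive_scan`, writing into the `device_uvector`.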

Describe alternatives you've considered
Performing multiple lookups per offset query was briefly considered, but judged too expensive for performance.
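One possible reading of the rejected alternative, sketched under the assumption that rows carry no per-row padding: a row's start can be recomputed on every query from the string offsets themselves, because each offsets child already records how many string bytes precede each row. `row_offset_on_the_fly` is a hypothetical name used only for illustration.

```cpp
#include <vector>

// Assumption: stand-in for cudf::size_type.
using size_type = int;

// Recomputes a row's start offset without a materialized array. Each query
// reads two entries from every string column's offsets child, so the cost
// grows with the number of string columns and is paid on every access.
size_type row_offset_on_the_fly(
  size_type row,
  size_type fixed_width_row_size,
  std::vector<std::vector<size_type>> const& string_offsets)
{
  size_type offset = row * fixed_width_row_size;
  for (auto const& offsets : string_offsets) {
    // String bytes of this column that belong to rows before `row`.
    offset += offsets[row] - offsets[0];
  }
  return offset;
}
```

With string offsets `{0, 3, 3, 8}` and `{0, 1, 4, 4}` and an 8-byte fixed part, this returns 0, 12, 23 and 36 for rows 0 through 3, but every call touches each string column, whereas a materialized array answers with one read. The shortcut also stops being exact once rows are padded for alignment, since padded sizes cannot be recovered from the string prefix sums alone.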

@hyperbolic2346 added the feature request (New feature or request) and Needs Triage (Need team to review and classify) labels Jan 24, 2022
@hyperbolic2346 self-assigned this Jan 24, 2022
@hyperbolic2346 added the Spark (Functionality that helps Spark RAPIDS) label Jan 24, 2022
rapids-bot pushed a commit that referenced this issue Feb 11, 2022
…onversion (#10157)

This is the first step to supporting variable-width strings in the row-to-column and column-to-row code. It adds an iterator that reads the offsets columns inside string columns to compute the row sizes of this variable-width data.

Note that this doesn't add support for strings yet, but is the first step in that direction.

closes #10111

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)
  - https://github.com/nvdbaranec

URL: #10157
@bdice removed the Needs Triage (Need team to review and classify) label Mar 4, 2024

2 participants