[FEA] Faster transition from columns to rows #14

Closed
revans2 opened this issue May 28, 2020 · 1 comment
Labels: duplicate, feature request, performance, SQL

Comments

revans2 commented May 28, 2020

Is your feature request related to a problem? Please describe.
The current columnar-to-row conversion code copies the columnar data back to the CPU and then walks through it. We have seen a lot of issues with this and the cache, and it can be a real performance problem.

In the past we tried to create the UnsafeRow format for fixed-width types using a CUDA kernel, and it worked. The problem was that it used a lot of memory. It might be good to explore a hybrid approach where we create a more compressed row-based format that can be expanded into UnsafeRow very easily on the fly on the CPU, possibly using code generation.
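
To make the hybrid idea concrete, here is a minimal sketch of the CPU-side expansion step. It is not the plugin's implementation. The compact layout (one validity bit per field, then the fields packed at their natural widths with no padding) and the names `CompactRowExpander`, `compactRowSize`, and `expand` are assumptions made up for illustration. The target layout follows Spark's UnsafeRow for fixed-width types: a null bitset rounded up to 8-byte words, followed by one 8-byte slot per field. The byte order is assumed to be little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: expands one compact row into an UnsafeRow-style byte array.
final class CompactRowExpander {
    // UnsafeRow null bitset width: one bit per field, rounded up to 8-byte words.
    static int bitsetBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Size of one row in the assumed compact layout:
    // ceil(n / 8) validity bytes, then each field at its natural width.
    static int compactRowSize(int[] widths) {
        int size = (widths.length + 7) / 8;
        for (int w : widths) {
            size += w;
        }
        return size;
    }

    // widths[i] is the byte width (1, 2, 4, or 8) of field i.
    static byte[] expand(ByteBuffer compact, int rowStart, int[] widths) {
        int n = widths.length;
        int bitset = bitsetBytes(n);
        // Use a little-endian view so the caller's buffer settings do not matter.
        ByteBuffer in = compact.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(bitset + 8 * n).order(ByteOrder.LITTLE_ENDIAN);
        int src = rowStart + (n + 7) / 8; // first field starts after the validity bytes
        for (int i = 0; i < n; i++) {
            boolean valid = ((in.get(rowStart + i / 8) >> (i % 8)) & 1) == 1;
            int dst = bitset + 8 * i; // every field gets a full 8-byte slot
            if (!valid) {
                // A set bit in the UnsafeRow bitset marks a null; the slot stays zero.
                int word = (i / 64) * 8;
                out.putLong(word, out.getLong(word) | (1L << (i % 64)));
            } else {
                switch (widths[i]) {
                    case 1: out.put(dst, in.get(src)); break;
                    case 2: out.putShort(dst, in.getShort(src)); break;
                    case 4: out.putInt(dst, in.getInt(src)); break;
                    case 8: out.putLong(dst, in.getLong(src)); break;
                    default:
                        throw new IllegalArgumentException("unsupported width: " + widths[i]);
                }
            }
            src += widths[i];
        }
        return out.array();
    }
}
```

For a row with an int, a byte, and a long, this compact layout takes 14 bytes, while the UnsafeRow form takes 32 (8 for the bitset plus 3 slots of 8 bytes). The padding to 8 bytes per field is one plausible contributor to the memory cost described above, although the issue does not say so explicitly. A code-generated version would unroll the loop for a known schema instead of switching on the width at runtime.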

revans2 added the feature request, ? - Needs Triage, SQL, and performance labels May 28, 2020
sameerz removed the ? - Needs Triage label Oct 13, 2020
sameerz added the duplicate label Dec 1, 2020
sameerz commented Dec 1, 2020

Duplicate of #507

sameerz marked this as a duplicate of #507 Dec 1, 2020
sameerz closed this as completed Dec 1, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
binmahone added a commit to binmahone/spark-rapids that referenced this issue Jun 12, 2024
commit doc change
refine naming
fix only reduction case
fix compile
fix
clean
fix doc

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>