[FEA] Optimize join followed by a project/drop #23

Closed
revans2 opened this issue May 28, 2020 · 2 comments
Labels
cudf_dependency: An issue or PR with this label depends on a new feature in cudf
feature request: New feature or request
performance: A performance related task/issue
SQL: part of the SQL/Dataframe plugin
wontfix: This will not be worked on

Comments

@revans2
Collaborator

revans2 commented May 28, 2020

Is your feature request related to a problem? Please describe.
There are cases, such as TPC-H query 17, where a join happens and then many of the resulting columns are dropped. It would be great if we could tell cudf not to bother materializing the columns that we know are no longer needed and will be dropped.
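
To make the request concrete, here is a minimal sketch in plain Python of the general idea: compute the matching row indices from the join keys first, then gather only the columns that survive the later project/drop. This is not cudf's API or the plugin's implementation; the function names (`inner_join_gather_maps`, `gather`, `join_then_project`) and the toy tables are hypothetical and used only for illustration.

```python
from collections import defaultdict


def inner_join_gather_maps(left_keys, right_keys):
    """Return (left_idx, right_idx) row indices for an inner equi-join.

    Hypothetical helper: only the key columns are touched here.
    """
    # Build a hash table from right-side key values to their row indices.
    index = defaultdict(list)
    for j, key in enumerate(right_keys):
        index[key].append(j)

    left_idx, right_idx = [], []
    # Probe with the left-side keys and emit one index pair per match.
    for i, key in enumerate(left_keys):
        for j in index.get(key, ()):
            left_idx.append(i)
            right_idx.append(j)
    return left_idx, right_idx


def gather(column, idx):
    """Materialize the rows of one column selected by idx."""
    return [column[i] for i in idx]


def join_then_project(left, right, left_key, right_key, keep_left, keep_right):
    """Join two column-oriented tables, materializing only the kept columns."""
    li, ri = inner_join_gather_maps(left[left_key], right[right_key])
    out = {name: gather(left[name], li) for name in keep_left}
    out.update({name: gather(right[name], ri) for name in keep_right})
    return out


# Toy tables loosely shaped like the lineitem/part join in TPC-H query 17.
lineitem = {
    "l_partkey": [1, 2, 1, 3],
    "l_quantity": [5, 7, 9, 2],
    "l_extendedprice": [10.0, 20.0, 30.0, 40.0],
    "l_comment": ["a", "b", "c", "d"],  # never gathered below
}
part = {
    "p_partkey": [1, 3],
    "p_brand": ["Brand#23", "Brand#23"],  # never gathered below
}

result = join_then_project(
    lineitem, part, "l_partkey", "p_partkey",
    keep_left=["l_quantity", "l_extendedprice"], keep_right=[],
)
```

In this sketch `l_comment` and `p_brand` are never copied into the join output, which is the saving the request is after; a naive join would materialize every column and leave the drop to a later step.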

@revans2 added the feature request, ? - Needs Triage, SQL, and performance labels May 28, 2020
@sameerz removed the ? - Needs Triage label Oct 20, 2020
@jlowe
Member

jlowe commented Oct 20, 2020

This should be possible once rapidsai/cudf#6480 is implemented.

wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue Oct 29, 2020
@revans2 added the cudf_dependency label Feb 18, 2021
@mattahrens
Collaborator

Closing as won't fix for now

@sameerz added the wontfix label May 26, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
Signed-off-by: spark-rapids automation <[email protected]>
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this issue Jan 18, 2024
Signed-off-by: Firestarman <[email protected]>
res-life pushed a commit to res-life/spark-rapids that referenced this issue Jun 27, 2024
* add a heuristic to skip agg pass

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* commit doc change

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* refine naming

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix only reduction case

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix compile

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* clean

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix doc

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* reduce premergeci2

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* reduce premergeci2, 2

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* use test_parallel to workaround flaky array test

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* address review comment

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* remove comma

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* workaround for  ci_scala213

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* disable agg ratio heuristic by default

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix doc

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Co-authored-by: Hongbin Ma (Mahone) <[email protected]>

4 participants