[FEA] Support array_remove #5225

Closed
viadea opened this issue Apr 12, 2022 · 2 comments · Fixed by #7115
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request

Comments

@viadea
Collaborator

viadea commented Apr 12, 2022

I wish we could support array_remove.

For example:

from pyspark.sql.functions import array_remove
# Assumes an active SparkSession named `spark`, as in the pyspark shell.
df = spark.createDataFrame([(["a", "b", "a"], ["b", "c"]), (["a", "a"], ["b", "c"]), (["aa"], ["b", "c"])], ["x", "y"])
df.write.format("parquet").mode("overwrite").save("/tmp/testparquet")
df = spark.read.parquet("/tmp/testparquet")
# Remove every occurrence of "a" from the array column x.
df.select(array_remove(df.x, "a").alias("remove")).collect()
    ! <ArrayRemove> array_remove(x#72, a) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.ArrayRemove
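
For reference, here is a minimal plain-Python sketch of what the query above is expected to return once array_remove runs, using the same three rows. The helper name array_remove_py is hypothetical and only models the non-null case; null handling is not covered here.

```python
def array_remove_py(arr, element):
    """Return a copy of `arr` without any item equal to `element`.

    Hypothetical helper for illustration only; nulls are not modeled.
    """
    return [item for item in arr if item != element]


# The x column from the example above.
rows = [["a", "b", "a"], ["a", "a"], ["aa"]]
removed = [array_remove_py(x, "a") for x in rows]
print(removed)  # [['b'], [], ['aa']]
```

Note that the second row becomes an empty array rather than being dropped, and "aa" is kept because only exact matches are removed.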
@viadea viadea added feature request New feature or request ? - Needs Triage Need team to review and classify labels Apr 12, 2022
@revans2
Collaborator

revans2 commented Apr 13, 2022

This one we might be able to do without any help from CUDF, but it is probably simpler to get them involved.

I think the ideal way to do this would be to do a segmented filter, but that does not exist. There is a segmented gather, and we could build our own segmented_filter with it, but I would rather work with CUDF on it. Also there is a map_filter function that we could implement with this same functionality.
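
To make the idea concrete, here is a rough CPU-side sketch of a segmented filter built from a gather, assuming the list column is stored as row offsets plus one flattened child column. This is plain Python, not the cuDF API; the function name segmented_filter and its parameters are made up for illustration.

```python
from itertools import accumulate


def segmented_filter(offsets, values, keep):
    """Filter each row of a flattened list column (illustrative sketch).

    offsets: row boundaries into `values`, length n_rows + 1
    values:  flattened child elements of all rows
    keep:    predicate applied to every child element
    """
    # Boolean mask over the flattened child column.
    mask = [keep(v) for v in values]
    # Gather map: positions of surviving elements, in original order.
    gather_map = [i for i, m in enumerate(mask) if m]
    # Surviving element count per row, then a prefix sum for new offsets.
    counts = [sum(mask[offsets[r]:offsets[r + 1]])
              for r in range(len(offsets) - 1)]
    new_offsets = [0] + list(accumulate(counts))
    # The gather step: pull the surviving elements into the new child column.
    new_values = [values[i] for i in gather_map]
    return new_offsets, new_values


# Rows [a, b, a], [a, a], [aa] from the example in this issue.
offsets = [0, 3, 5, 6]
values = ["a", "b", "a", "a", "a", "aa"]
new_offsets, new_values = segmented_filter(offsets, values, lambda v: v != "a")
print(new_offsets, new_values)  # [0, 1, 1, 2] ['b', 'aa']
```

With this shape, array_remove is just the predicate `v != element`, and a map_filter-style function could reuse the same routine with a different predicate.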

@revans2
Collaborator

revans2 commented Apr 13, 2022

I filed rapidsai/cudf#10650 for this with CUDF

@revans2 revans2 added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Apr 13, 2022
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Apr 19, 2022
@cindyyuanjiang cindyyuanjiang self-assigned this Nov 18, 2022

4 participants