[FEA] Support concatenating ArrayType columns #2013

Closed
Dooyoung-Hwang opened this issue Mar 24, 2021 · 1 comment · Fixed by #2379
Labels: cudf_dependency, feature request

Comments

@Dooyoung-Hwang
Contributor

I would like to request support for concatenating ArrayType columns into a single column on the GPU.

  • Test code
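from pyspark.sql.functions import concat
from pyspark.sql.types import ArrayType, FloatType, IntegerType, StructField, StructType

# Assumes an active SparkSession named `spark` (e.g. the pyspark shell).
data = [
    (1, [0.1, 0.4], [0.1, 0.4], 1.0, -1.0),
    (2, [0.3, 0.5], [0.3, 0.5], 2.0, -1.0),
    (3, [0.3, 0.5], [2.1, 2.2], 3.0, -1.0),
]
schema = StructType([
    StructField("a", IntegerType(), True),
    StructField("b", ArrayType(FloatType()), True),
    StructField("c", ArrayType(FloatType()), True),
    StructField("d", FloatType(), True),
    StructField("e", FloatType(), True),
])

# Concatenate the two array columns row by row, then print the query plans.
df1 = spark.createDataFrame(data=data, schema=schema).withColumn("arr1", concat("b", "c"))
print(df1._jdf.queryExecution())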
  • Output log from spark-driver
!Expression concat(b#7221, c#7222) cannot run on GPU because expression Concat concat(b#7221, c#7222) produces an unsupported type ArrayType(FloatType,true); expression AttributeReference b#7221 produces an unsupported type ArrayType(FloatType,true); expression AttributeReference c#7222 produces an unsupported type ArrayType(FloatType,true)
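
For reference, here is a minimal pure-Python sketch of the row-wise result that `concat("b", "c")` is expected to produce on the test data above. It is only an illustration of the semantics (including Spark's rule that the result is null when any input is null), not the plugin's or cudf's implementation. The helper name `concat_array_rows` is hypothetical.

```python
from typing import List, Optional, Sequence


def concat_array_rows(
    *columns: Sequence[Optional[List[float]]]
) -> List[Optional[List[float]]]:
    """Concatenate list columns row by row (hypothetical helper).

    Mirrors Spark's concat semantics for arrays: if any input cell in a
    row is null (None), the output cell for that row is null.
    """
    if not columns:
        return []
    num_rows = len(columns[0])
    if any(len(col) != num_rows for col in columns):
        raise ValueError("all columns must have the same number of rows")

    result: List[Optional[List[float]]] = []
    for row in zip(*columns):
        if any(cell is None for cell in row):
            result.append(None)
            continue
        merged: List[float] = []
        for cell in row:
            merged.extend(cell)
        result.append(merged)
    return result


# Columns "b" and "c" from the test data in this issue.
b = [[0.1, 0.4], [0.3, 0.5], [0.3, 0.5]]
c = [[0.1, 0.4], [0.3, 0.5], [2.1, 2.2]]
arr1 = concat_array_rows(b, c)
```

A GPU implementation would presumably have to match this output row for row, including the null handling, to be a drop-in replacement for the CPU expression.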
@Dooyoung-Hwang added the "? - Needs Triage" and "feature request" labels on Mar 24, 2021
@sameerz added the "cudf_dependency" label and removed the "? - Needs Triage" label on Mar 24, 2021
@revans2
Collaborator

revans2 commented Mar 30, 2021

I filed rapidsai/cudf#7767 for this.

@sameerz linked a pull request on May 11, 2021 that will close this issue
sperlingxx added a commit that referenced this issue May 20, 2021
Closes #2013

Support GpuConcat on ArrayType. And introduce some refinement for GpuConcat on StringType.

Signed-off-by: sperlingxx <[email protected]>
abellina pushed a commit to abellina/spark-rapids that referenced this issue May 24, 2021
Closes NVIDIA#2013

Support GpuConcat on ArrayType. And introduce some refinement for GpuConcat on StringType.

Signed-off-by: sperlingxx <[email protected]>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
Closes NVIDIA#2013

Support GpuConcat on ArrayType. And introduce some refinement for GpuConcat on StringType.

Signed-off-by: sperlingxx <[email protected]>