[BUG]The Scala UDF function cannot invoke the UDF compiler when it's passed to "explode" #1079

Closed
wjxiz1992 opened this issue Nov 6, 2020 · 1 comment · Fixed by #1153
Labels: bug (Something isn't working), P1 (Nice to have for release)

wjxiz1992 commented Nov 6, 2020

Describe the bug
When a UDF is passed to the explode() function, as in explode(myUdf), the UDF compiler makes no attempt to compile myUdf.

Root cause
The UDF plugin does not try to replace a UDF when that UDF is in a child plan of Project.

Steps/Code to reproduce bug

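    import org.apache.spark.sql.functions.{col, explode}
    import spark.implicits._  // for toDF; assumes a SparkSession named spark

    // UDF that returns an array, so its result can be passed to explode
    val myudf: (String) => Array[String] = a => {
      a.split(",")
    }
    val u = makeUdf(myudf)
    val dataset = List("first,second").toDF("x").repartition(1)
    val result = dataset.withColumn("new", explode(u(col("x"))))
    result.explain(true)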

Expected behavior
The "Analyzed Logical Plan" section of the explain output should show the compiled Catalyst expressions, such as "split(x....) AS new".

Environment details

  • Environment location: Local in an IDEA debug environment
  • Spark configuration settings related to the issue:
    .set("spark.sql.extensions", "com.nvidia.spark.udf.Plugin")
    .set("spark.rapids.sql.udfCompiler.enabled", "true")
    .set("spark.rapids.sql.test.enabled", "true")
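For context, the three settings above could be applied to a local session roughly as follows. This is a minimal sketch: the master URL and application name are assumptions for a local run and are not part of the report, and the plugin jars are assumed to be on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// The three .set(...) calls are copied from the report.
// setMaster and setAppName are assumptions for a local run.
val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("udf-explode-repro")
  .set("spark.sql.extensions", "com.nvidia.spark.udf.Plugin")
  .set("spark.rapids.sql.udfCompiler.enabled", "true")
  .set("spark.rapids.sql.test.enabled", "true")

val spark = SparkSession.builder().config(conf).getOrCreate()
```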
wjxiz1992 added the "bug" and "? - Needs Triage" labels Nov 6, 2020

wjxiz1992 commented Nov 13, 2020

With the fix from @abellina, the rule is also applied to the child of Project, so the UDF passed to explode can be compiled now.

diff --git a/udf-compiler/src/main/scala/com/nvidia/spark/udf/Plugin.scala b/udf-compiler/src/main/scala/com/nvidia/spark/udf/Plugin.scala
index 679d126b9..98164e70e 100644
--- a/udf-compiler/src/main/scala/com/nvidia/spark/udf/Plugin.scala
+++ b/udf-compiler/src/main/scala/com/nvidia/spark/udf/Plugin.scala
@@ -82,7 +82,7 @@ case class LogicalPlanRules() extends Rule[LogicalPlan] with Logging {
       plan match {
         case project: Project =>
           Project(project.projectList.map(e => attemptToReplaceExpression(plan, e))
-              .asInstanceOf[Seq[NamedExpression]], project.child)
+              .asInstanceOf[Seq[NamedExpression]], apply(project.child))
         case x => {
           x.transformExpressions(replacePartialFunc(plan))
         }

The real problem, and the correct description for this issue, turns out to be "child of Project cannot be replaced".
@abellina Do you think we can close this issue with your fix and create another issue for GpuExplode? That one is on the SQL plugin side, not a UDF problem.
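
To illustrate why the one-line change matters, here is a toy model of the rule, not the plugin's code. All types and names below (Expr, Plan, ToyRule, compile, applyBefore, applyAfter) are hypothetical stand-ins, and the plan shape assumes the explode ends up in a node beneath the Project, as the root cause above implies.

```scala
// Toy model only: hypothetical stand-ins, not Spark's or the plugin's classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Udf(name: String, arg: Expr) extends Expr    // opaque UDF call
case class Split(arg: Expr, sep: String) extends Expr   // "compiled" form of the UDF
case class Explode(arg: Expr) extends Expr

sealed trait Plan
case class Relation(cols: Seq[String]) extends Plan
case class Generate(generator: Expr, child: Plan) extends Plan
case class Project(projectList: Seq[Expr], child: Plan) extends Plan

object ToyRule {
  // Stand-in for the UDF compiler: replace every Udf node with Split.
  def compile(e: Expr): Expr = e match {
    case Udf(_, arg)     => Split(compile(arg), ",")
    case Explode(arg)    => Explode(compile(arg))
    case Split(arg, sep) => Split(compile(arg), sep)
    case c: Col          => c
  }

  // Non-Project nodes rewrite only their own expressions.
  private def rewriteOwn(plan: Plan): Plan = plan match {
    case Generate(g, child) => Generate(compile(g), child)
    case other              => other
  }

  // Before the fix: Project rewrites its list but passes its child through untouched.
  def applyBefore(plan: Plan): Plan = plan match {
    case Project(list, child) => Project(list.map(compile), child)
    case other                => rewriteOwn(other)
  }

  // After the fix: Project also applies the rule to its child.
  def applyAfter(plan: Plan): Plan = plan match {
    case Project(list, child) => Project(list.map(compile), applyAfter(child))
    case other                => rewriteOwn(other)
  }

  // Assumed plan shape: Project over a node that holds explode(udf(x)).
  val plan: Plan = Project(
    Seq(Col("x"), Col("new")),
    Generate(Explode(Udf("myudf", Col("x"))), Relation(Seq("x"))))
}
```

In this model, applyBefore returns the plan unchanged because the UDF lives below the Project, while applyAfter reaches the Generate node and replaces the UDF with Split.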

sameerz removed the "? - Needs Triage" label Nov 17, 2020
GaryShen2008 added the "P1" label Dec 3, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023