Add Support for Renaming of PythonMapInArrow [databricks] #10931

Merged 4 commits on May 31, 2024

Conversation

razajafri
Collaborator

Spark commit apache/spark@ed9a3a8 renamed PythonMapInArrow to MapInArrow. This PR adds shims to support that rename.

fixes #10673
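
A minimal sketch of the shim idea, for readers unfamiliar with the pattern. It is not the code in this PR: `OldSpark`, `NewSpark`, `MapInArrowShim`, `PreRenameShim`, `PostRenameShim`, and `Shared.describe` are all hypothetical stand-ins, and the case classes only mimic the renamed plan node. The point is that each Spark version gets its own small adapter with identical member names, so the shared code never mentions the version-specific class.

```scala
// Hypothetical stand-ins for the plan node before and after the Spark rename.
object OldSpark { final case class PythonMapInArrowExec(func: String) }
object NewSpark { final case class MapInArrowExec(func: String) }

// Every shim exposes the same members; only the concrete node type differs.
trait MapInArrowShim {
  type Node
  def nodeName: String
  def funcOf(node: Node): String
}

// Adapter for Spark versions that still use the old class name.
object PreRenameShim extends MapInArrowShim {
  type Node = OldSpark.PythonMapInArrowExec
  val nodeName = "PythonMapInArrowExec"
  def funcOf(node: Node): String = node.func
}

// Adapter for Spark versions that use the new class name.
object PostRenameShim extends MapInArrowShim {
  type Node = NewSpark.MapInArrowExec
  val nodeName = "MapInArrowExec"
  def funcOf(node: Node): String = node.func
}

// Shared code is written once against the shim interface.
object Shared {
  def describe(shim: MapInArrowShim)(node: shim.Node): String =
    s"${shim.nodeName}(${shim.funcOf(node)})"
}
```

In a real multi-version build the adapter is usually selected at compile time by placing it in a per-version source directory, rather than passed around as a value as in this sketch.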

@razajafri razajafri changed the base branch from branch-24.06 to branch-24.08 May 28, 2024 21:00
Signed-off-by: Raza Jafri <[email protected]>
@@ -0,0 +1,72 @@
/*
* Copyright (c) 2022-2024, NVIDIA CORPORATION.
Member

Suggested change
- * Copyright (c) 2022-2024, NVIDIA CORPORATION.
+ * Copyright (c) 2024, NVIDIA CORPORATION.

import org.apache.spark.sql.execution.python.MapInArrowExec
import org.apache.spark.sql.rapids.execution.python.GpuMapInBatchExec

class GpuMapInArrowExecMetaBase(
Member

I'm confused about why this is here and not in GpuMapInArrowExecMeta, and also about why the base class has a convertToGpu that must be overridden by the only class that actually uses it. Why have a base class if there's only one user, and why declare a method in the base class that the base class never uses?

Collaborator Author

Ugh, thank you!

Collaborator Author

I was fighting the compiler while trying to introduce a common base class. I gave up, but some remnants were left behind.
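
To illustrate the review point above, here is a rough sketch of the shape the reviewer is asking for: one concrete meta class that owns the conversion directly, with no intermediate base class. Everything in it is an assumption for illustration. `CpuMapInArrow` and `GpuMapInArrow` are hypothetical stand-ins, and this `GpuMapInArrowExecMeta` takes a single argument, which is not the real constructor signature.

```scala
object MetaSketch {
  // Hypothetical stand-ins for the CPU plan node and its GPU replacement.
  final case class CpuMapInArrow(func: String)
  final case class GpuMapInArrow(func: String)

  // With a single user there is no need for a separate *Base class:
  // the concrete meta implements convertToGpu itself.
  class GpuMapInArrowExecMeta(cpu: CpuMapInArrow) {
    def convertToGpu(): GpuMapInArrow = GpuMapInArrow(cpu.func)
  }
}
```

A base class would only pay for itself once a second subclass needs to share or vary this behavior.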

@razajafri razajafri changed the title from "Add Support for Renaming of PythonMapInArrow" to "Add Support for Renaming of PythonMapInArrow [databricks]" May 28, 2024
@razajafri
Collaborator Author

build

@sameerz sameerz added the Spark 4.0+ Spark 4.0+ issues label May 29, 2024
@razajafri
Collaborator Author

build

@razajafri
Collaborator Author

build

ExecChecks((TypeSig.commonCudfTypes + TypeSig.ARRAY + TypeSig.STRUCT).nested(),
    TypeSig.all),
(mapPy, conf, p, r) => new GpuMapInArrowExecMeta(mapPy, conf, p, r) {
  override def tagPlanForGpu(): Unit = {
Collaborator

Any specific reason this method is overridden here as opposed to sql-plugin/src/main/spark400/scala/org/apache/spark/sql/rapids/shims/GpuMapInArrowExecMeta.scala?

Collaborator Author

Thanks for the pointer. I have addressed it. PTAL
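
For context, a hedged sketch of what moving the override out of the rule registration can look like. It does not reproduce the actual check in this PR. `ExecMeta`, the `usesUnsupportedFeature` flag, the fallback message, and `rule` are hypothetical, and the classes are heavily simplified stand-ins that share only their names with the plugin's types.

```scala
object TagPlacementSketch {
  // Hypothetical, heavily simplified stand-in for a plan meta class.
  class ExecMeta(val usesUnsupportedFeature: Boolean) {
    private var reasons: List[String] = Nil
    def willNotWorkOnGpu(reason: String): Unit = reasons = reason :: reasons
    def canRunOnGpu: Boolean = reasons.isEmpty
    def tagPlanForGpu(): Unit = ()
  }

  // The version-specific meta carries the tagging logic itself...
  class GpuMapInArrowExecMeta(flag: Boolean) extends ExecMeta(flag) {
    override def tagPlanForGpu(): Unit =
      if (usesUnsupportedFeature) {
        willNotWorkOnGpu("hypothetical: feature not supported in this sketch")
      }
  }

  // ...so the registration is a plain constructor call, with no anonymous
  // subclass overriding tagPlanForGpu at the call site.
  val rule: Boolean => ExecMeta = flag => new GpuMapInArrowExecMeta(flag)
}
```

Keeping the override in the class means every place that constructs the meta gets the same tagging behavior, instead of only the one registration that happened to subclass it inline.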

@razajafri
Collaborator Author

build

@razajafri razajafri merged commit 2977c14 into NVIDIA:branch-24.08 May 31, 2024
44 checks passed
@razajafri razajafri deleted the SP-9259-rename-arrow-class branch May 31, 2024 23:27
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024
* Add support for the renaming of PythonMapInArrow to MapInArrow

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* Removed the unnecessary base class from 400

* addressed review comments

---------

Signed-off-by: Raza Jafri <[email protected]>
Labels
Spark 4.0+ Spark 4.0+ issues
Development

Successfully merging this pull request may close these issues.

[AUDIT] Rename plan nodes for PythonMapInArrowExec
4 participants