[FEA] Support mapInArrow introduced by pyspark 3.3.0+ #6313

Closed
wbo4958 opened this issue Aug 15, 2022 · 0 comments · Fixed by #6823
Labels
audit_3.3.0 Audit related tasks for 3.3.0 feature request New feature or request

wbo4958 commented Aug 15, 2022

Spark 3.3.0 introduced a new PySpark DataFrame API, mapInArrow; see SPARK-37228 and PR apache/spark#34505. mapInArrow is very similar to mapInPandas. The only difference is the input type: mapInArrow takes an Iterator[pa.RecordBatch], while mapInPandas takes an Iterator[pd.DataFrame].

PyArrow already supports CUDA integration (see https://arrow.apache.org/docs/python/integration/cuda.html) and potentially CUDA IPC. This means the RAPIDS Accelerator may be able to support zero-copy data transfer between the JVM process and the Python process, which could improve performance.

I hope this can be supported in the spark-rapids 22.12 release.

@wbo4958 wbo4958 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Aug 15, 2022
@wbo4958 wbo4958 self-assigned this Aug 15, 2022
@wbo4958 wbo4958 changed the title [FEA] Support mapInArrow in pyspark 3.3.0+ [FEA] Support mapInArrow introduced by pyspark 3.3.0+ Aug 15, 2022
@amahussein amahussein added the audit_3.3.0 Audit related tasks for 3.3.0 label Aug 16, 2022
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Aug 16, 2022
@GaryShen2008 GaryShen2008 assigned firestarman and unassigned wbo4958 Oct 18, 2022