Spark 3.3.0 introduced a new PySpark DataFrame API, mapInArrow; see SPARK-37228 and PR apache/spark#34505. mapInArrow is very similar to mapInPandas. The only difference is the input type: the user function receives an Iterator[pa.RecordBatch] for mapInArrow, while it receives an Iterator[pd.DataFrame] for mapInPandas.
PyArrow already supports CUDA integration (see https://arrow.apache.org/docs/python/integration/cuda.html), potentially including CUDA IPC. This means the Rapids Accelerator may have the opportunity to support zero-copy data exchange between the JVM process and the Python process, and thereby improve performance.
I hope this can be supported in the Spark-Rapids 22.12 release.