Add options to build Arrow with Python and Parquet support (#8670)
This PR adds two options when CPM builds Arrow during a libcudf source build: `CUDF_ENABLE_ARROW_PYTHON`, and `CUDF_ENABLE_ARROW_PARQUET`. These options enable building `libarrow.so` with Python and Parquet support, so that we can build the pyarrow and cuDF Cython after building libcudf.

For example:

```shell
export PARALLEL_LEVEL=$(nproc --ignore=2)

# Clone cuDF
git clone --depth 1 --branch branch-21.08 https://github.com/rapidsai/cudf.git /opt/rapids/cudf

# Build and install libcudf (also builds libarrow/libarrow_cuda)
cmake -GNinja \
    -S /opt/rapids/cudf/cpp \
    -B /opt/rapids/cudf/cpp/build \
    -D CUDF_ENABLE_ARROW_S3=OFF \
    -D CUDF_ENABLE_ARROW_PYTHON=ON \
    -D CUDF_ENABLE_ARROW_PARQUET=ON \
 && cmake --build /opt/rapids/cudf/cpp/build -j${PARALLEL_LEVEL} -v --target install

# Build and install pyarrow
cd /opt/rapids/cudf/cpp/build/_deps/arrow-src/python \
 && ARROW_HOME=/usr/local \
    PYARROW_WITH_S3=OFF \
    PYARROW_WITH_ORC=ON \
    PYARROW_WITH_CUDA=ON \
    PYARROW_WITH_HDFS=OFF \
    PYARROW_WITH_FLIGHT=OFF \
    PYARROW_WITH_PLASMA=OFF \
    PYARROW_WITH_DATASET=ON \
    PYARROW_WITH_GANDIVA=OFF \
    PYARROW_WITH_PARQUET=ON \
    PYARROW_BUILD_TYPE=Release \
    PYARROW_CMAKE_GENERATOR=Ninja \
    PYARROW_PARALLEL=${PARALLEL_LEVEL} \
    ARROW_PYTHON_DIR=/opt/rapids/cudf/cpp/build/_deps/arrow-src/python \
 && python setup.py install --single-version-externally-managed --record=record.txt

# Build and install cudf python
cd /opt/rapids/cudf/python/cudf \
 && pip install --upgrade \
    "nvtx>=0.2.1" \
    "numba>=0.53.1" \
    "fsspec>=0.6.0" \
    "protobuf>=3.0.0" \
    "fastavro>=0.22.9" \
    "transformers>=4.8" \
    "pandas>=1.0,<1.3.0dev0" \
    "cmake-setuptools>=0.1.3" \
    "cupy-cuda112>7.1.0,<10.0.0a0" \
    "git+https://github.com/dask/dask.git@main" \
    "git+https://github.com/dask/distributed.git@main" \
    "git+https://github.com/rapidsai/[email protected]" \
 && python setup.py build_ext -j${PARALLEL_LEVEL} --inplace \
 && python setup.py install --single-version-externally-managed --record=record.txt

# Build and install dask_cudf python
cd /opt/rapids/cudf/python/dask_cudf \
 && python setup.py build_ext -j${PARALLEL_LEVEL} --inplace \
 && python setup.py install --single-version-externally-managed --record=record.txt
```

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)

URL: #8670
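
Before starting the pyarrow build, it may help to confirm that the libcudf install step actually produced Arrow libraries with Python and Parquet support. The following is a minimal sketch, not part of this PR: `check_arrow_libs` is a hypothetical helper, and the library names `libarrow_python` and `libparquet` are assumptions based on Arrow's usual naming for builds with Python and Parquet enabled. The `/usr/local` default mirrors the `ARROW_HOME` used in the example above.

```shell
# Hypothetical helper: report which Arrow shared libraries exist under PREFIX.
# Library names are assumed from Arrow's usual conventions, not taken from this PR.
# Usage: check_arrow_libs /usr/local
check_arrow_libs() {
  prefix="${1:-/usr/local}"
  missing=0
  for lib in libarrow libarrow_cuda libarrow_python libparquet; do
    # Match e.g. PREFIX/lib/libarrow.so or PREFIX/lib64/libarrow.so.<version>
    if ls "$prefix"/lib*/"$lib".so* >/dev/null 2>&1; then
      echo "found:   $lib"
    else
      echo "missing: $lib"
      missing=1
    fi
  done
  # Return non-zero if any expected library was not found
  return "$missing"
}
```

Running `check_arrow_libs /usr/local` after the `--target install` step returns a non-zero status if any of the listed libraries is absent, which usually means one of the `CUDF_ENABLE_ARROW_*` options was not picked up.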