[FEA] Add options to build Arrow with Python and Parquet support #8670

Merged

trxcllnt
Contributor

@trxcllnt trxcllnt commented Jul 7, 2021

This PR adds two options that apply when CPM builds Arrow during a libcudf source build: CUDF_ENABLE_ARROW_PYTHON and CUDF_ENABLE_ARROW_PARQUET.

These options enable building libarrow.so with Python and Parquet support, so that pyarrow and the cuDF Cython extensions can be built after libcudf. For example:

export PARALLEL_LEVEL=$(nproc --ignore=2)

# Clone cuDF
git clone --depth 1 --branch branch-21.08 https://github.com/rapidsai/cudf.git /opt/rapids/cudf

# Build and install libcudf (also builds libarrow/libarrow_cuda)
cmake -GNinja \
          -S /opt/rapids/cudf/cpp \
          -B /opt/rapids/cudf/cpp/build \
          -D CUDF_ENABLE_ARROW_S3=OFF \
          -D CUDF_ENABLE_ARROW_PYTHON=ON \
          -D CUDF_ENABLE_ARROW_PARQUET=ON \
 && cmake --build /opt/rapids/cudf/cpp/build -j${PARALLEL_LEVEL} -v --target install

# Build and install pyarrow
cd /opt/rapids/cudf/cpp/build/_deps/arrow-src/python \
 && ARROW_HOME=/usr/local \
    PYARROW_WITH_S3=OFF \
    PYARROW_WITH_ORC=ON \
    PYARROW_WITH_CUDA=ON \
    PYARROW_WITH_HDFS=OFF \
    PYARROW_WITH_FLIGHT=OFF \
    PYARROW_WITH_PLASMA=OFF \
    PYARROW_WITH_DATASET=ON \
    PYARROW_WITH_GANDIVA=OFF \
    PYARROW_WITH_PARQUET=ON \
    PYARROW_BUILD_TYPE=Release \
    PYARROW_CMAKE_GENERATOR=Ninja \
    PYARROW_PARALLEL=${PARALLEL_LEVEL} \
    ARROW_PYTHON_DIR=/opt/rapids/cudf/cpp/build/_deps/arrow-src/python \
    python setup.py install --single-version-externally-managed --record=record.txt

# Build and install cudf python
cd /opt/rapids/cudf/python/cudf \
 && pip install --upgrade \
    "nvtx>=0.2.1" \
    "numba>=0.53.1" \
    "fsspec>=0.6.0" \
    "protobuf>=3.0.0" \
    "fastavro>=0.22.9" \
    "transformers>=4.8" \
    "pandas>=1.0,<1.3.0dev0" \
    "cmake-setuptools>=0.1.3" \
    "cupy-cuda112>7.1.0,<10.0.0a0" \
    "git+https://github.com/dask/dask.git@main" \
    "git+https://github.com/dask/distributed.git@main" \
    "git+https://github.com/rapidsai/[email protected]" \
 && python setup.py build_ext -j${PARALLEL_LEVEL} --inplace \
 && python setup.py install --single-version-externally-managed --record=record.txt

# Build and install dask_cudf python
cd /opt/rapids/cudf/python/dask_cudf \
 && python setup.py build_ext -j${PARALLEL_LEVEL} --inplace \
 && python setup.py install --single-version-externally-managed --record=record.txt
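
Before starting the pyarrow step, it can help to confirm that the libcudf install actually produced the Arrow libraries the new options are meant to enable. The following is only a sketch under stated assumptions: `check_arrow_libs` is a hypothetical helper (not part of cuDF or Arrow), and the library names (`libarrow`, `libarrow_cuda`, `libarrow_python`, `libparquet`) and the `lib`/`lib64` layout are assumptions about what an Arrow build of this era installs, so adjust them to your setup.

```shell
# Hypothetical helper: report Arrow shared libraries missing under an install prefix.
# Assumed (not stated in the PR): the library names and the lib/ or lib64/ layout.
check_arrow_libs() {
    local prefix="${1:-/usr/local}"
    local missing=0 found name dir f
    for name in arrow arrow_cuda arrow_python parquet; do
        found=no
        for dir in "$prefix/lib" "$prefix/lib64"; do
            # Accept libNAME.so as well as versioned names such as libNAME.so.400
            for f in "$dir/lib$name.so" "$dir/lib$name.so."*; do
                if [ -e "$f" ]; then
                    found=yes
                fi
            done
        done
        if [ "$found" = no ]; then
            echo "missing: lib$name.so"
            missing=1
        fi
    done
    # Non-zero exit status when at least one library was not found
    return "$missing"
}
```

With the install prefix used above, `check_arrow_libs /usr/local` prints one line per missing library and returns a non-zero status if any are absent, so it can gate the later steps with `&&`.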

@trxcllnt trxcllnt requested a review from a team as a code owner July 7, 2021 14:16
@github-actions github-actions bot added the CMake (CMake build issue) and libcudf (Affects libcudf (C++/CUDA) code) labels Jul 7, 2021
@ajschmidt8 ajschmidt8 added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Jul 7, 2021
@codecov

codecov bot commented Jul 7, 2021

Codecov Report

Merging #8670 (f6fad95) into branch-21.08 (7721819) will not change coverage.
The diff coverage is n/a.

❗ Current head f6fad95 differs from pull request most recent head b06fbfd. Consider uploading reports for the commit b06fbfd to get more accurate results

@@              Coverage Diff              @@
##           branch-21.08    #8670   +/-   ##
=============================================
  Coverage         10.62%   10.62%           
=============================================
  Files               109      109           
  Lines             18289    18289           
=============================================
  Hits               1943     1943           
  Misses            16346    16346           


@ajschmidt8
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 73df850 into rapidsai:branch-21.08 Jul 8, 2021
rapids-bot bot pushed a commit that referenced this pull request Jul 21, 2021
#8670 added Python/Parquet support to CPM builds of libarrow. This PR fixes finding Arrow in non-CPM builds.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Paul Taylor (https://github.com/trxcllnt)

URL: #8808