apacheGH-43299: [Release][Packaging] Only include pyarrow folder when finding packages on setuptools (apache#43325)

### Rationale for this change

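Currently the wheels ship every top-level directory that setuptools discovers, not just `pyarrow/`. After installing the 17.0.0 wheel, `benchmarks/`, `cmake_modules/`, `examples/` and `scripts/` also land in `site-packages`: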
```
$ pip install pyarrow
Collecting pyarrow
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (39.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.9/39.9 MB 33.8 MB/s eta 0:00:00
Collecting numpy>=1.16.6
  Using cached numpy-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.3 MB)
Installing collected packages: numpy, pyarrow
Successfully installed numpy-2.0.0 pyarrow-17.0.0
(test-env)  $ ls test-env/lib/python3.10/site-packages/
benchmarks/                  distutils-precedence.pth     numpy-2.0.0.dist-info/       pip-22.0.2.dist-info/        pyarrow-17.0.0.dist-info/    setuptools-59.6.0.dist-info/ 
cmake_modules/               examples/                    numpy.libs/                  pkg_resources/               scripts/                     
_distutils_hack/             numpy/                       pip/                         pyarrow/                     setuptools/    
```
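
The same check can be made without installing anything, since a wheel is a zip archive. The sketch below is only illustrative: `top_level_entries` is a hypothetical helper, and the archive is a stand-in built in memory whose member names are made up to mirror the listing above rather than taken from a real wheel.

```python
import io
import zipfile
from collections import Counter


def top_level_entries(wheel_file):
    """Count archive members per top-level directory (or top-level file)."""
    with zipfile.ZipFile(wheel_file) as zf:
        return Counter(name.split("/", 1)[0] for name in zf.namelist())


# Stand-in archive built in memory; these member names are invented
# for illustration and do not come from a real pyarrow wheel.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in (
        "pyarrow/__init__.py",
        "pyarrow/lib.pyx",
        "pyarrow-17.0.0.dist-info/METADATA",
        "benchmarks/__init__.py",
        "examples/demo.py",
    ):
        zf.writestr(name, "")

buffer.seek(0)
counts = top_level_entries(buffer)
for entry, n in sorted(counts.items()):
    print(f"{entry}: {n} file(s)")
```

Any top-level entry other than `pyarrow` and the `.dist-info` directory is something the wheel would drop directly into `site-packages`.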

### What changes are included in this PR?

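Restrict setuptools package discovery to the `pyarrow` package with the `include` option, as described here: https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#finding-simple-packages

The `PYARROW_INSTALL_TESTS` build option is removed, and a new CI script checks that the built wheel contains only `pyarrow/` and its `.dist-info` directory.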
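
The effect of `include` can be illustrated with a simplified model. The sketch below is not setuptools' implementation; it assumes that include patterns are shell-style globs matched against package names, as the linked guide describes. The `select_packages` helper and the list of discovered packages are hypothetical.

```python
from fnmatch import fnmatchcase


def select_packages(discovered, include=("*",)):
    """Keep only package names matching at least one include pattern."""
    return [
        name for name in discovered
        if any(fnmatchcase(name, pattern) for pattern in include)
    ]


# Hypothetical top-level packages found next to pyproject.toml.
discovered = ["benchmarks", "examples", "pyarrow", "scripts"]

print(select_packages(discovered))                       # default: everything
print(select_packages(discovered, include=["pyarrow"]))  # only pyarrow
```

With the default pattern every discovered directory is kept, which matches the behaviour described in the rationale; naming `pyarrow` explicitly leaves only that package.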

### Are these changes tested?

Yes, through the wheel builds on CI, which now run a script that validates the wheel contents.

### Are there any user-facing changes?

No and yes :)
Unnecessary files are no longer shipped in the wheels.
* GitHub Issue: apache#43299

Lead-authored-by: Raúl Cumplido <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
raulcd and pitrou authored Sep 5, 2024
1 parent 8113594 commit f545b90
Showing 11 changed files with 66 additions and 23 deletions.
3 changes: 3 additions & 0 deletions ci/docker/python-wheel-manylinux.dockerfile
@@ -100,6 +100,9 @@ RUN vcpkg install \
--x-feature=parquet \
--x-feature=s3

+# Make sure auditwheel is up-to-date
+RUN pipx upgrade auditwheel
+
# Configure Python for applications running in the bash shell of this Dockerfile
ARG python=3.8
ENV PYTHON_VERSION=${python}
1 change: 0 additions & 1 deletion ci/scripts/python_wheel_macos_build.sh
@@ -150,7 +150,6 @@ echo "=== (${PYTHON_VERSION}) Building wheel ==="
export PYARROW_BUILD_TYPE=${CMAKE_BUILD_TYPE}
export PYARROW_BUNDLE_ARROW_CPP=1
export PYARROW_CMAKE_GENERATOR=${CMAKE_GENERATOR}
-export PYARROW_INSTALL_TESTS=1
export PYARROW_WITH_ACERO=${ARROW_ACERO}
export PYARROW_WITH_AZURE=${ARROW_AZURE}
export PYARROW_WITH_DATASET=${ARROW_DATASET}
3 changes: 1 addition & 2 deletions ci/scripts/python_wheel_manylinux_build.sh
@@ -140,7 +140,6 @@ echo "=== (${PYTHON_VERSION}) Building wheel ==="
export PYARROW_BUILD_TYPE=${CMAKE_BUILD_TYPE}
export PYARROW_BUNDLE_ARROW_CPP=1
export PYARROW_CMAKE_GENERATOR=${CMAKE_GENERATOR}
-export PYARROW_INSTALL_TESTS=1
export PYARROW_WITH_ACERO=${ARROW_ACERO}
export PYARROW_WITH_AZURE=${ARROW_AZURE}
export PYARROW_WITH_DATASET=${ARROW_DATASET}
@@ -181,5 +180,5 @@ popd
rm -rf dist/temp-fix-wheel

echo "=== (${PYTHON_VERSION}) Tag the wheel with manylinux${MANYLINUX_VERSION} ==="
-auditwheel repair -L . dist/pyarrow-*.whl -w repaired_wheels
+auditwheel repair dist/pyarrow-*.whl -w repaired_wheels
popd
6 changes: 6 additions & 0 deletions ci/scripts/python_wheel_unix_test.sh
@@ -34,6 +34,7 @@ source_dir=${1}
: ${ARROW_S3:=ON}
: ${ARROW_SUBSTRAIT:=ON}
: ${CHECK_IMPORTS:=ON}
+: ${CHECK_WHEEL_CONTENT:=ON}
: ${CHECK_UNITTESTS:=ON}
: ${INSTALL_PYARROW:=ON}

@@ -87,6 +88,11 @@ import pyarrow.parquet
fi
fi

+if [ "${CHECK_WHEEL_CONTENT}" == "ON" ]; then
+  python ${source_dir}/ci/scripts/python_wheel_validate_contents.py \
+    --path ${source_dir}/python/repaired_wheels
+fi
+
if [ "${CHECK_UNITTESTS}" == "ON" ]; then
# Install testing dependencies
pip install -U -r ${source_dir}/python/requirements-wheel-test.txt
48 changes: 48 additions & 0 deletions ci/scripts/python_wheel_validate_contents.py
@@ -0,0 +1,48 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import argparse
from pathlib import Path
import re
import zipfile


def validate_wheel(path):
    p = Path(path)
    wheels = list(p.glob('*.whl'))
    error_msg = f"{len(wheels)} wheels found but only 1 expected ({wheels})"
    assert len(wheels) == 1, error_msg
    f = zipfile.ZipFile(wheels[0])
    outliers = [
        info.filename for info in f.filelist if not re.match(
            r'(pyarrow/|pyarrow-[-.\w\d]+\.dist-info/)', info.filename
        )
    ]
    assert not outliers, f"Unexpected contents in wheel: {sorted(outliers)}"
    print(f"The wheel: {wheels[0]} seems valid.")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", type=str, required=True,
                        help="Directory where wheel is located")
    args = parser.parse_args()
    validate_wheel(args.path)


if __name__ == '__main__':
    main()
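
To illustrate what `validate_wheel` accepts and rejects, the sketch below applies the same regular expression to a few made-up member names. `find_outliers` is a hypothetical helper and is not part of this commit.

```python
import re

# Same pattern as in validate_wheel(): a member is expected only if it lives
# under pyarrow/ or under the pyarrow-<version>.dist-info/ directory.
EXPECTED = re.compile(r'(pyarrow/|pyarrow-[-.\w\d]+\.dist-info/)')


def find_outliers(filenames):
    """Return, sorted, the names that fall outside the expected layout."""
    return sorted(name for name in filenames if not EXPECTED.match(name))


# Made-up member names, for illustration only.
names = [
    "pyarrow/__init__.py",
    "pyarrow-17.0.0.dist-info/RECORD",
    "examples/demo.py",
    "benchmarks/common.py",
]
print(find_outliers(names))
# ['benchmarks/common.py', 'examples/demo.py']
```

Because `re.match` anchors at the start of the string, only the top-level directory decides whether a member is accepted.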
1 change: 0 additions & 1 deletion ci/scripts/python_wheel_windows_build.bat
@@ -106,7 +106,6 @@ echo "=== (%PYTHON_VERSION%) Building wheel ==="
set PYARROW_BUILD_TYPE=%CMAKE_BUILD_TYPE%
set PYARROW_BUNDLE_ARROW_CPP=ON
set PYARROW_CMAKE_GENERATOR=%CMAKE_GENERATOR%
-set PYARROW_INSTALL_TESTS=ON
set PYARROW_WITH_ACERO=%ARROW_ACERO%
set PYARROW_WITH_DATASET=%ARROW_DATASET%
set PYARROW_WITH_FLIGHT=%ARROW_FLIGHT%
3 changes: 3 additions & 0 deletions ci/scripts/python_wheel_windows_test.bat
@@ -64,6 +64,9 @@ set PYTHON_CMD=py -%PYTHON%
%PYTHON_CMD% -c "import pyarrow.parquet" || exit /B 1
%PYTHON_CMD% -c "import pyarrow.substrait" || exit /B 1

+@REM Validate wheel contents
+%PYTHON_CMD% C:\arrow\ci\scripts\python_wheel_validate_contents.py --path C:\arrow\python\dist || exit /B 1
+
@rem Download IANA Timezone Database for ORC C++
curl https://cygwin.osuosl.org/noarch/release/tzdata/tzdata-2024a-1.tar.xz --output tzdata.tar.xz || exit /B
mkdir %USERPROFILE%\Downloads\test\tzdata
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -1144,6 +1144,7 @@ services:
<<: *common
CHECK_IMPORTS: "ON"
CHECK_UNITTESTS: "OFF"
+CHECK_WHEEL_CONTENT: "ON"
command: /arrow/ci/scripts/python_wheel_unix_test.sh /arrow

python-wheel-manylinux-test-unittests:
@@ -1164,6 +1165,7 @@ services:
<<: *common
CHECK_IMPORTS: "OFF"
CHECK_UNITTESTS: "ON"
+CHECK_WHEEL_CONTENT: "OFF"
command: /arrow/ci/scripts/python_wheel_unix_test.sh /arrow

python-wheel-windows-vs2019:
3 changes: 0 additions & 3 deletions docs/source/developers/python.rst
@@ -632,9 +632,6 @@ PyArrow are:
* - ``PYARROW_BUNDLE_CYTHON_CPP``
- Bundle the C++ files generated by Cython
- ``0`` (``OFF``)
-* - ``PYARROW_INSTALL_TESTS``
-  - Add the test to the python package
-  - ``1`` (``ON``)
* - ``PYARROW_BUILD_VERBOSE``
- Enable verbose output from Makefile builds
- ``0`` (``OFF``)
3 changes: 2 additions & 1 deletion python/pyproject.toml
@@ -74,7 +74,8 @@ zip-safe=false
include-package-data=true

[tool.setuptools.packages.find]
-where = ["."]
+include = ["pyarrow"]
+namespaces = false

[tool.setuptools.package-data]
pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd"]
16 changes: 1 addition & 15 deletions python/setup.py
@@ -32,7 +32,7 @@
# Get correct EXT_SUFFIX on Windows (https://bugs.python.org/issue39825)
from distutils import sysconfig

-from setuptools import setup, Extension, Distribution, find_namespace_packages
+from setuptools import setup, Extension, Distribution

from Cython.Distutils import build_ext as _build_ext
import Cython
@@ -396,21 +396,7 @@ def has_ext_modules(foo):
        return True


-if strtobool(os.environ.get('PYARROW_INSTALL_TESTS', '1')):
-    packages = find_namespace_packages(include=['pyarrow*'])
-    exclude_package_data = {}
-else:
-    packages = find_namespace_packages(include=['pyarrow*'],
-                                       exclude=["pyarrow.tests*"])
-    # setuptools adds back importable packages even when excluded.
-    # https://github.com/pypa/setuptools/issues/3260
-    # https://github.com/pypa/setuptools/issues/3340#issuecomment-1219383976
-    exclude_package_data = {"pyarrow": ["tests*"]}
-
-
setup(
-    packages=packages,
-    exclude_package_data=exclude_package_data,
    distclass=BinaryDistribution,
    # Dummy extension to trigger build_ext
    ext_modules=[Extension('__dummy__', sources=[])],