Support new versions of pyarrow in apache-beam #28410

Closed
1 of 15 tasks
vinodk99 opened this issue Sep 12, 2023 · 10 comments · Fixed by #29536
Labels
bug, done & done, P1, python


@vinodk99

vinodk99 commented Sep 12, 2023

What happened?

I'm hitting a pyarrow build failure on ppc64le when running the following command:

tox -c tox.ini run -e py311

```
  -- Found Python3: /beam/sdks/python/target/.tox/py311/bin/python (found version "3.11.2") found components: Interpreter Development.Module NumPy
  -- Found Python3Alt: /beam/sdks/python/target/.tox/py311/bin/python
  CMake Error at CMakeLists.txt:268 (find_package):
    By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
    asked CMake to find a package configuration file provided by "Arrow", but
    CMake did not find one.

    Could not find a package configuration file provided by "Arrow" with any of
    the following names:

      ArrowConfig.cmake
      arrow-config.cmake

    Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
    "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
    provides a separate development package or SDK, be sure it has been
    installed.


  -- Configuring incomplete, errors occurred!
  See also "/beam/sdks/python/target/.tox/py311/tmp/pip-install-9464v7kv/pyarrow_ef7786cb8dcf4c3a84231f093d6c09c1/build/temp.linux-ppc64le-cpython-311/CMakeFiles/CMakeOutput.log".
  error: command '/usr/bin/cmake' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyarrow
ERROR: Could not build wheels for cryptography, pyarrow, which is required to install pyproject.toml-based projects

py311: exit 1 (34.57 seconds) /beam/sdks/python> target/.tox/py311/bin/python /beam/sdks/python/target/.tox/py311/bin/pip install --retries 10 --pre 'cloudpickle~=2.2.1' 'crcmod<2.0,>=1.7' 'cryptography>=41.0.2' 'dill<0.3.2,>=0.3.1.1' 'fastavro<2,>=0.23.6' 'fasteners<1.0,>=0.3' 'freezegun>=0.3.12' 'grpcio!=1.48.0,<2,>=1.33.1' 'hdfs<3.0.0,>=2.1.0' 'httplib2<0.23.0,>=0.8' 'hypothesis<=7.0.0,>5.0.0' 'joblib>=1.0.1' 'mock<6.0.0,>=1.0.1' 'numpy<1.25.0,>=1.14.3' 'objsize<0.7.0,>=0.6.1' 'orjson<3.9.3' 'packaging>=22.0' 'pandas!=1.5.0,!=1.5.1,<1.6,>=1.4.3; python_version >= "3.8"' 'pandas<2.0.0' 'parameterized<0.10.0,>=0.7.1' 'proto-plus<2,>=1.7.1' 'protobuf<4.24.0,>=3.20.3' 'psycopg2-binary<3.0.0,>=2.8.5' 'pyarrow<12.0.0,>=3.0.0' 'pydot<2,>=1.2.0' 'pyhamcrest!=1.10.0,<3.0.0,>=1.9' 'pymongo<5.0.0,>=3.8.0' 'pytest-timeout<3,>=2.1.0' 'pytest-xdist<4,>=2.5.0' 'pytest<8.0,>=7.1.2' 'python-dateutil<3,>=2.8.0' 'pytz>=2018.3' 'pyyaml<7.0.0,>=3.12' 'regex>=2020.6.8' 'requests-mock<2.0,>=1.7' 'requests<3.0.0,>=2.24.0' 'scikit-learn>=0.20.0' 'sqlalchemy<2.0,>=1.3' 'tenacity<9,>=8.0.0' 'testcontainers[mysql]<4.0.0,>=3.0.3' 'typing-extensions>=3.7.0' 'zstandard<1,>=0.18.0' pid=60536
.pkg: _exit> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
py311: FAIL code 1 (37.57 seconds)
evaluation failed :( (37.77 seconds)
```

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor

It is likely that pyarrow 11 is not available for your platform. Can you pip install pyarrow==11 in a clean environment?
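When pip finds no wheel whose platform tag matches the machine, it falls back to building from the source distribution. For pyarrow that source build needs the Arrow C++ libraries, which is what the CMake "Could not find a package configuration file provided by "Arrow"" error in the log reports. The sketch below is a rough, stdlib-only illustration (the function names are hypothetical, not part of pip or Beam) that reads the platform tags out of a wheel filename and compares only the CPU architecture suffix. pip itself applies the full tag-compatibility rules, so treat this as an approximation.

```python
import platform
from typing import List, Optional


def wheel_platform_tags(filename: str) -> List[str]:
    """Return the platform tags encoded in a wheel filename.

    The platform tag is the last dash-separated field of the name;
    compressed tag sets (several tags for one file) are joined with dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    platform_field = filename[: -len(".whl")].rsplit("-", 1)[-1]
    return platform_field.split(".")


def wheel_matches_machine(filename: str, machine: Optional[str] = None) -> bool:
    """Rough check: is this wheel pure-Python or built for `machine`?

    Simplification: only the architecture suffix (e.g. '_ppc64le') is
    compared; glibc/manylinux versions and Python/ABI tags are ignored.
    """
    machine = machine or platform.machine()
    return any(
        tag == "any" or tag.endswith("_" + machine)
        for tag in wheel_platform_tags(filename)
    )
```

For example, a hypothetical `example-1.0-cp311-cp311-manylinux_2_17_x86_64.whl` does not match `"ppc64le"`, so pip on that machine would have to look for a different wheel or build from source.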

@tvalentyn
Contributor

If that doesn't work, please try newer versions (12, 13), and let us know if those work. It may be time to add support for those versions in Beam.

@vinodk99
Author

> if that doesn't work, please try newer versions (12, 13), and let us know if those work. It may be time to add support for those versions in beam.

Versions 12 and 13 are working.
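The pip command in the first log pins `'pyarrow<12.0.0,>=3.0.0'`, so the versions reported as working here fall outside the range Beam allows at that commit. The sketch below checks a version against those two bounds with a hand-rolled comparison (the helper names are hypothetical); it deliberately ignores pre-release and local suffixes, and real tooling would normally use `packaging.specifiers.SpecifierSet` instead.

```python
from typing import Tuple

# Bounds copied from the pip command in the first log: 'pyarrow<12.0.0,>=3.0.0'.
LOWER_BOUND = "3.0.0"   # inclusive
UPPER_BOUND = "12.0.0"  # exclusive


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain release string like '12.0.0' into a comparable tuple.

    Trailing zero components are dropped so '12' and '12.0.0' compare equal.
    Pre-release or local suffixes (e.g. '12.0.0rc1') are NOT handled.
    """
    parts = [int(piece) for piece in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def allowed_by_beam_pin(version: str) -> bool:
    """True if `version` satisfies >=3.0.0,<12.0.0."""
    v = parse_release(version)
    return parse_release(LOWER_BOUND) <= v < parse_release(UPPER_BOUND)
```

With these bounds, `allowed_by_beam_pin("11.0.0")` is `True`, while `"12.0.0"` and `"13.0.0"` are rejected, which is why supporting them requires raising the upper bound in Beam's `setup.py`.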

@tvalentyn
Contributor

tvalentyn commented Sep 15, 2023

Thanks for checking, I'll repurpose this issue. I took a look at upgrading pyarrow in #28437, and it looks like some tests were failing on Windows. Would you be interested in taking a closer look and chasing down what's not working there?

@tvalentyn tvalentyn changed the title from "Facing pyarrow issue on PPC4LE." to "Support new versions of pyarrow in apache-beam" on Sep 15, 2023
@tvalentyn
Contributor

If only Windows is a problem, we could skip the updates for Windows and make sure the issues are known and tracked on the pyarrow side.
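One way to "skip the updates for Windows" without forking the dependency list is a PEP 508 environment marker; the pip command in the first log already carries one (the pandas requirement ends in `; python_version >= "3.8"`). The fragment below is a hypothetical sketch, not Beam's actual `setup.py`: the list name and both version bounds are illustrative. pip evaluates the marker after `;` at install time, so a single package can carry a different pyarrow cap per operating system.

```python
# Hypothetical setup.py fragment; the version bounds are illustrative only.
# The text after ';' is a PEP 508 environment marker that pip evaluates
# at install time on the target machine.
PYARROW_REQUIREMENTS = [
    # Keep the existing cap on Windows until the failing tests are resolved.
    'pyarrow>=3.0.0,<12.0.0; platform_system == "Windows"',
    # Allow the newer releases reported to work on other platforms.
    'pyarrow>=3.0.0,<14.0.0; platform_system != "Windows"',
]

# These strings would be passed to setuptools.setup(install_requires=...)
# alongside the other dependencies.
```

Using markers, rather than calling `platform.system()` inside `setup.py`, matters because wheel metadata is computed once at build time, while markers are re-evaluated on each machine that installs the package.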

@vinodk99
Author

vinodk99 commented Oct 6, 2023

pyarrow works fine on ppc64le after updating to pyarrow==12.0.0, but the subsequent tests fail in autocomplete_test.py:

```
tox -c tox.ini run -e py311
.pkg: _optional_hooks> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
.pkg: get_requires_for_build_sdist> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
.pkg: get_requires_for_build_wheel> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
.pkg: prepare_metadata_for_build_wheel> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
.pkg: build_sdist> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
py311: install_package> target/.tox/py311/bin/python /beam/sdks/python/target/.tox/py311/bin/pip install --retries 10 --pre --force-reinstall --no-deps /beam/sdks/python/target/.tox/.tmp/package/12/apache-beam-2.50.0.tar.gz
py311: commands_pre[0]> python --version
Python 3.11.2
py311: commands_pre[1]> pip --version
pip 23.2.1 from /beam/sdks/python/target/.tox/py311/lib/python3.11/site-packages/pip (python 3.11)
py311: commands_pre[2]> pip check
No broken requirements found.
py311: commands_pre[3]> bash /beam/sdks/python/scripts/run_tox_cleanup.sh
py311: commands[0]> python apache_beam/examples/complete/autocomplete_test.py
Traceback (most recent call last):
  File "/beam/sdks/python/apache_beam/examples/complete/autocomplete_test.py", line 26, in <module>
    import apache_beam as beam
  File "/beam/sdks/python/target/.tox/py311/lib/python3.11/site-packages/apache_beam/__init__.py", line 88, in <module>
    from apache_beam import io
  File "/beam/sdks/python/target/.tox/py311/lib/python3.11/site-packages/apache_beam/io/__init__.py", line 28, in <module>
    from apache_beam.io.parquetio import *
  File "/beam/sdks/python/target/.tox/py311/lib/python3.11/site-packages/apache_beam/io/parquetio.py", line 679, in <module>
    class _ParquetSink(filebasedsink.FileBasedSink):
  File "/beam/sdks/python/target/.tox/py311/lib/python3.11/site-packages/apache_beam/io/parquetio.py", line 733, in _ParquetSink
    def write_record(self, writer, table: pa.Table):
                                          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'Table'
py311: exit 1 (1.34 seconds) /beam/sdks/python> python apache_beam/examples/complete/autocomplete_test.py pid=39206
py311: commands_post[0]> bash /beam/sdks/python/scripts/run_tox_cleanup.sh
.pkg: _exit> python /usr/local/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta legacy
py311: FAIL code 1 (14.02=setup[10.76]+cmd[0.00,0.40,1.46,0.03,1.34,0.03] seconds)
evaluation failed :( (14.23 seconds)
[root@1aa720a07127 python]#
```
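The `AttributeError` above is consistent with `parquetio.py` wrapping `import pyarrow as pa` in a `try`/`except ImportError` and falling back to `pa = None`; that pattern is an assumption inferred from the traceback, not a quote of Beam's source. Before Python 3.14, a parameter annotation such as `table: pa.Table` is evaluated when the `def` statement runs, so a failed pyarrow import surfaces as this `AttributeError` at class-definition time instead of as the original `ImportError`. The self-contained sketch below reproduces that mechanism with a deliberately missing module (all names are hypothetical).

```python
import sys

try:
    # Stands in for `import pyarrow as pa` failing in the tox environment.
    import _hypothetical_missing_pyarrow as pa
except ImportError:
    pa = None


def build_eager_sink():
    class _Sink:
        # Before Python 3.14 this annotation is evaluated when `def` runs,
        # so with pa = None it raises AttributeError while the class is built.
        def write_record(self, writer, table: pa.Table):
            return table

    return _Sink


def build_string_annotated_sink():
    class _Sink:
        # A string annotation is not evaluated at definition time.
        def write_record(self, writer, table: "pa.Table"):
            return table

    return _Sink


def fails_at_definition(builder) -> bool:
    """Return True if defining the class raises AttributeError."""
    try:
        builder()
    except AttributeError:
        return True
    return False


if __name__ == "__main__":
    print(sys.version_info[:2], fails_at_definition(build_eager_sink))
```

If this is what is happening, the useful next step is to run `python -c "import pyarrow"` inside the tox environment, which should print the underlying import error that the `AttributeError` hides.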

@github-actions github-actions bot added this to the 2.52.0 Release milestone Oct 20, 2023
@tvalentyn tvalentyn reopened this Oct 20, 2023
@tvalentyn
Contributor

I think we still don't support new versions of pyarrow, as per https://github.com/apache/beam/blob/e7a6405800a83dd16437b8b1b372e020e010a042/sdks/python/setup.py, so I'm keeping this open.

@ff-sdesai

When is Apache Beam 2.53.0 going to be released? Do we have a fixed date?

@tvalentyn
Copy link
Contributor

We cut a release branch every 6 weeks on schedule. Releases happen a few weeks after the cut. The next cut is Dec 13, 2023.
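The stated cadence can be turned into a simple projection of future branch cuts. This is a sketch under the assumption that the 6-week interval holds exactly; the real schedule can shift, and release dates follow each cut by a variable few weeks, so they are not computed here. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Cadence stated in the comment: a release branch is cut every 6 weeks.
CUT_INTERVAL = timedelta(weeks=6)


def projected_branch_cuts(next_cut: date, count: int) -> List[date]:
    """Project `count` branch-cut dates at a fixed 6-week cadence.

    Projection only: it assumes the interval never slips.
    """
    return [next_cut + i * CUT_INTERVAL for i in range(count)]
```

Starting from the Dec 13, 2023 cut, the next two projected cuts land on Jan 24, 2024 and Mar 6, 2024, provided the cadence holds.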

@tvalentyn
Copy link
Contributor

@AnandInguva can we close this now?

@AnandInguva AnandInguva linked a pull request Dec 6, 2023 that will close this issue
@github-actions github-actions bot added this to the 2.53.0 Release milestone Dec 6, 2023
@tvalentyn tvalentyn added the done & done label Dec 19, 2023
5 participants