[Bug]: Python error just in importing apache-beam line within docker image #33321

Open
1 of 17 tasks
maneesh299 opened this issue Dec 8, 2024 · 4 comments

What happened?

I get an error just from importing apache-beam.
Below are my files under the load folder.
[Screenshot: files under root]

Under test_dp_maneesh_loader I have main.py:

from datetime import datetime

import apache_beam as beam


# pylint: disable=too-many-locals
def run(argv=None):
    print("hello world")


if __name__ == "__main__":
    run()

The Dockerfile is as follows:

FROM apache/beam_python3.9_sdk:2.61.0
ENV POETRY_VIRTUALENVS_CREATE=false \
    POETRY_CACHE_DIR=/var/cache/pypoetry

LABEL maintainer="xyz <[email protected]>"

WORKDIR /load

COPY test_dp_maneesh_loader ./test_dp_maneesh_loader
COPY pyproject.toml poetry.lock ./
RUN python3 -m pip install --upgrade pip
RUN pip3 install "poetry==1.3.2"
RUN poetry install --only main

ENTRYPOINT ["/opt/apache/beam/boot"]

The pyproject.toml:

[tool.poetry]
name = "test_issuer"
version = "0.1.0"
description = "testing a small issue"
authors = ["xyz here <[email protected]>"]

[tool.poetry.dependencies]
python = "~3.9"
apache-beam = {extras = ["gcp"], version = "^2.53.0"}
google-auth = "^2.31.0"
#numpy = ">=1.26.4, <2.0"

[tool.poetry.scripts]
load = "test_dp_maneesh_loader.main:run"


[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Now I build the Docker image with:

docker build -t myapp:1.0 .

Then I start a container from the image interactively:

docker run -it --entrypoint /bin/bash myapp:1.0

Inside the container I run:

python main.py

and I get the following error on the import statement:

File "/load/test_dp_maneesh_loader/main.py", line 3, in <module>
    import apache_beam as beam
  File "/usr/local/lib/python3.9/site-packages/apache_beam/__init__.py", line 87, in <module>
    ....
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I can resolve the issue by downgrading numpy to 1.26.4, pinning it in the toml file. I have seen several posts about this issue with numpy 2 and the packages that depend on it.

What is the reason for this error on the import statement itself? Is downgrading numpy the only way to resolve it?

Note that I removed many unnecessary lines from the Python file.
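
For readers hitting the same traceback, a fail-fast check along the following lines can make the version mismatch obvious before `apache_beam` is imported. This is only a sketch: `numpy_major`, `require_numpy_below`, and `installed_numpy_version` are hypothetical helper names, and the limit of 2 is an assumption taken from the workaround described in this report, not a documented Beam requirement.

```python
from importlib import metadata


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '2.0.2'."""
    return int(version.split(".")[0])


def require_numpy_below(limit: int, version: str) -> None:
    """Raise a readable error if `version` is at or above major version `limit`."""
    if numpy_major(version) >= limit:
        raise RuntimeError(
            f"numpy {version} is installed, but this environment is assumed "
            f"to need numpy<{limit}; pin numpy in pyproject.toml and rebuild."
        )


def installed_numpy_version():
    """Read NumPy's version from package metadata without importing NumPy."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None  # NumPy is not installed in this environment
```

Calling `require_numpy_below(2, installed_numpy_version())` at the top of main.py (after checking that the version is not `None`) would replace the low-level `ValueError` with a message that names the installed version. It does not fix the incompatibility; it only reports it earlier.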

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@liferoad
Collaborator

liferoad commented Dec 8, 2024

https://numpy.org/news/#numpy-210-released: NumPy 2.1.0 does not support Python 3.9. Which NumPy version gets installed? Can you add the output of pip list?
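
A narrower alternative to a full `pip list` is to ask the package metadata which NumPy version is installed and which installed packages declare a requirement on it. The sketch below uses only the standard library; `installed_version` and `numpy_pins` are hypothetical helper names, and the package names passed in are simply the ones that appear in this thread.

```python
import re
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def numpy_pins(dist_names):
    """Map each installed distribution to its declared requirements on numpy."""
    pins = {}
    for name in dist_names:
        try:
            requirements = metadata.requires(name) or []
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment, so nothing to report
        # Keep requirement strings whose project name is exactly "numpy"
        # (this excludes names such as "numpydoc" or "numpy-financial").
        pins[name] = [
            req for req in requirements
            if re.match(r"numpy(?![\w.-])", req, re.IGNORECASE)
        ]
    return pins


if __name__ == "__main__":
    print("numpy installed:", installed_version("numpy"))
    for dist, reqs in numpy_pins(["apache-beam", "pandas", "pyarrow"]).items():
        print(dist, installed_version(dist), reqs)
```

Because nothing here imports NumPy or Beam, the script still runs in an environment where `import apache_beam` fails.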

@maneesh299
Author

Below are the results:

`root@28522bcd545f:/load# pip list
Package Version
annotated-types 0.7.0
apache-beam 2.61.0
async-timeout 5.0.1
attrs 24.2.0
backports.tarfile 1.2.0
beautifulsoup4 4.12.3
bs4 0.0.2
build 1.2.2
CacheControl 0.12.14
cachetools 5.5.0
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.4.0
cleo 2.1.0
click 8.1.7
cloudpickle 2.2.1
cramjam 2.8.4
crashtest 0.4.1
crcmod 1.7
cryptography 43.0.3
Cython 3.0.11
Deprecated 1.2.15
deprecation 2.1.0
dill 0.3.1.1
distlib 0.3.9
dnspython 2.7.0
docker 7.1.0
docopt 0.6.2
docstring_parser 0.16
dulwich 0.20.50
exceptiongroup 1.2.2
execnet 2.1.1
fastavro 1.9.7
fasteners 0.19
filelock 3.16.1
freezegun 1.5.1
future 1.0.0
google-api-core 2.23.0
google-api-python-client 2.147.0
google-apitools 0.5.31
google-auth 2.36.0
google-auth-httplib2 0.2.0
google-cloud-aiplatform 1.74.0
google-cloud-bigquery 3.27.0
google-cloud-bigquery-storage 2.27.0
google-cloud-bigtable 2.27.0
google-cloud-core 2.4.1
google-cloud-datastore 2.20.1
google-cloud-dlp 3.25.1
google-cloud-language 2.15.1
google-cloud-profiler 4.1.0
google-cloud-pubsub 2.27.1
google-cloud-pubsublite 1.11.1
google-cloud-recommendations-ai 0.10.14
google-cloud-resource-manager 1.13.1
google-cloud-spanner 3.51.0
google-cloud-storage 2.19.0
google-cloud-videointelligence 2.14.1
google-cloud-vision 3.8.1
google-crc32c 1.6.0
google-resumable-media 2.7.2
googleapis-common-protos 1.66.0
greenlet 3.1.1
grpc-google-iam-v1 0.13.1
grpc-interceptor 0.15.4
grpcio 1.65.5
grpcio-status 1.65.5
guppy3 3.1.4.post1
hdfs 2.7.3
html5lib 1.1
httplib2 0.22.0
hypothesis 6.112.3
idna 3.10
importlib_metadata 8.5.0
iniconfig 2.0.0
jaraco.classes 3.4.0
jaraco.context 6.0.1
jaraco.functools 4.1.0
jeepney 0.8.0
Jinja2 3.1.4
joblib 1.4.2
jsonpickle 3.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
keyring 25.5.0
keyrings.google-artifactregistry-auth 1.1.2
lockfile 0.12.2
MarkupSafe 2.1.5
mmh3 5.0.1
mock 5.1.0
more-itertools 10.5.0
msgpack 1.1.0
nltk 3.9.1
nose 1.3.7
numpy 2.0.2
oauth2client 4.1.3
objsize 0.7.0
opentelemetry-api 1.28.2
opentelemetry-sdk 1.28.2
opentelemetry-semantic-conventions 0.49b2
orjson 3.10.12
overrides 7.7.0
packaging 24.1
pandas 2.1.4
parameterized 0.9.0
pexpect 4.9.0
pip 24.3.1
pkginfo 1.12.0
platformdirs 2.6.2
pluggy 1.5.0
poetry 1.3.2
poetry-core 1.4.0
poetry-plugin-export 1.3.1
proto-plus 1.25.0
protobuf 5.29.1
psycopg2-binary 2.9.9
ptyprocess 0.7.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycparser 2.22
pydantic 2.10.3
pydantic_core 2.27.1
pydot 1.4.2
PyHamcrest 2.1.0
pymongo 4.10.1
PyMySQL 1.1.1
pyparsing 3.2.0
pyproject_hooks 1.2.0
pytest 7.4.4
pytest-timeout 2.3.1
pytest-xdist 3.6.1
python-dateutil 2.9.0.post0
python-snappy 0.7.3
pytz 2024.2
PyYAML 6.0.2
RapidFuzz 3.10.1
redis 5.2.1
referencing 0.35.1
regex 2024.11.6
requests 2.32.3
requests-mock 1.12.1
requests-toolbelt 0.10.1
rpds-py 0.22.3
rsa 4.9
scikit-learn 1.5.2
scipy 1.13.1
SecretStorage 3.3.3
setuptools 75.6.0
shapely 2.0.6
shellingham 1.5.4
six 1.17.0
sortedcontainers 2.4.0
soupsieve 2.6
SQLAlchemy 2.0.35
sqlparse 0.5.2
tenacity 8.5.0
testcontainers 3.7.1
threadpoolctl 3.5.0
tomli 2.0.2
tomlkit 0.13.2
tqdm 4.66.5
trove-classifiers 2024.10.21.16
typing_extensions 4.12.2
tzdata 2024.2
uritemplate 4.1.1
urllib3 2.2.3
virtualenv 20.21.1
webencodings 0.5.1
wheel 0.45.0
wrapt 1.17.0
zipp 3.21.0
zstandard 0.23.0`

@liferoad
Collaborator

liferoad commented Dec 9, 2024

With a clean Python 3.9.13 venv, when I tried to install all your packages with pip 24.3.1, I got this:

The conflict is caused by:
    The user requested numpy==2.0.2
    apache-beam 2.61.0 depends on numpy<2.2.0 and >=1.14.3
    pandas 2.1.4 depends on numpy<2 and >=1.22.4; python_version < "3.11"

After removing the pandas version pin, I got this:

The conflict is caused by:
    The user requested urllib3==2.2.3
    docker 7.1.0 depends on urllib3>=1.26.0
    dulwich 0.20.50 depends on urllib3>=1.25
    poetry 1.3.2 depends on urllib3<2.0.0 and >=1.26.0

After removing the urllib3 version pin, I got this:

The conflict is caused by:
    The user requested importlib_metadata==8.5.0
    build 1.2.2 depends on importlib-metadata>=4.6; python_full_version < "3.10.2"
    keyring 25.5.0 depends on importlib-metadata>=4.11.4; python_version < "3.12"
    opentelemetry-api 1.28.2 depends on importlib-metadata<=8.5.0 and >=6.0
    poetry 1.3.2 depends on importlib-metadata<5.0 and >=4.4; python_version < "3.10"

I stopped here, since the importlib_metadata conflict cannot be resolved: poetry 1.3.2 requires importlib-metadata<5.0, while opentelemetry-api 1.28.2 requires >=6.0. I have no idea how you managed to install all these packages with these conflicts.
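
The three conflicts reported above can be checked by hand with a small comparison routine. The sketch below is a deliberately simplified stand-in for pip's resolver: `parse`, `satisfies`, and the `PINS` table are hypothetical names, the constraint strings are copied from the pip messages in this comment, and only plain numeric versions with the operators `<`, `<=`, `>`, `>=`, `==` are handled (not the full PEP 440 rules).

```python
import re

_CLAUSE = re.compile(r"\s*(<=|>=|==|<|>)\s*([0-9]+(?:\.[0-9]+)*)\s*$")


def parse(version):
    """Turn '2.0.2' into (2, 0, 2). Only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        v, b = parse(version), parse(bound)
        # Pad with zeros so that '2' compares like '2.0.0'.
        width = max(len(v), len(b))
        v += (0,) * (width - len(v))
        b += (0,) * (width - len(b))
        ok = {"<": v < b, "<=": v <= b, ">": v > b, ">=": v >= b, "==": v == b}[op]
        if not ok:
            return False
    return True


# (package, pinned version, {requiring package: constraint}) from the thread.
PINS = [
    ("numpy", "2.0.2", {"apache-beam 2.61.0": "<2.2.0,>=1.14.3",
                        "pandas 2.1.4": "<2,>=1.22.4"}),
    ("urllib3", "2.2.3", {"docker 7.1.0": ">=1.26.0",
                          "poetry 1.3.2": "<2.0.0,>=1.26.0"}),
    ("importlib_metadata", "8.5.0", {"opentelemetry-api 1.28.2": "<=8.5.0,>=6.0",
                                     "poetry 1.3.2": "<5.0,>=4.4"}),
]

if __name__ == "__main__":
    for package, version, constraints in PINS:
        for requirer, spec in constraints.items():
            status = "ok" if satisfies(version, spec) else "CONFLICT"
            print(f"{package}=={version} vs {requirer} ({spec}): {status}")
```

Under these simplified rules, numpy 2.0.2 fails the pandas constraint, and urllib3 2.2.3 and importlib_metadata 8.5.0 both fail the poetry constraints, which matches the order in which pip reported the conflicts.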

@liferoad
Collaborator

liferoad commented Dec 9, 2024

I removed poetry since it is not used in the production environment. This is what I have in my venv, and Beam seems to work.
my_t.txt
