[Failing Test]: Some Python integration tests runs result in environment mismatch. #28653

Closed
1 of 15 tasks
tvalentyn opened this issue Sep 25, 2023 · 9 comments
Labels
bug done & done Issue has been reviewed after it was closed for verification, followups, etc. failing test flake P1 python tests

Comments

@tvalentyn
Contributor

What happened?

See failing runs on:

https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/11337/

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor Author

Tentatively adding as a blocker until we confirm it's not affecting the release branch.

@tvalentyn
Contributor Author

09:38:53 RuntimeError: Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versions used at pipeline construction time. Submission environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.52.0.dev. Runtime environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.51.0.dev.
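To make the mismatch in the error above easier to read, here is a minimal sketch that pulls the two environment strings out of the message and reports which component differs. The helper names (`parse_sdk_base`, `describe_mismatch`) and the regular expressions are hypothetical, written only against the message format shown in this comment; they are not part of the Apache Beam API.

```python
import re
from typing import List, NamedTuple


class SdkBase(NamedTuple):
    python_version: str
    beam_version: str


# Assumption: both environments are reported in the form seen in the log line,
# "beam:version:sdk_base:apache/beam_python<X.Y>_sdk:<beam version>".
_SDK_BASE_RE = re.compile(
    r"beam:version:sdk_base:apache/beam_python(?P<py>\d+\.\d+)_sdk:(?P<beam>.+)"
)

# Assumption: each environment string is followed by a sentence-ending period.
_MESSAGE_RE = re.compile(
    r"Submission environment: (?P<submission>\S+)\. "
    r"Runtime environment: (?P<runtime>\S+)\.$"
)


def parse_sdk_base(capability: str) -> SdkBase:
    """Split one environment string into Python and Beam versions."""
    match = _SDK_BASE_RE.fullmatch(capability)
    if match is None:
        raise ValueError(f"Unrecognized environment string: {capability!r}")
    return SdkBase(match.group("py"), match.group("beam"))


def describe_mismatch(error_message: str) -> List[str]:
    """Return human-readable differences between submission and runtime."""
    match = _MESSAGE_RE.search(error_message.strip())
    if match is None:
        raise ValueError("Message does not contain both environment strings")
    submission = parse_sdk_base(match.group("submission"))
    runtime = parse_sdk_base(match.group("runtime"))

    differences = []
    if submission.python_version != runtime.python_version:
        differences.append(
            f"Python minor version: {submission.python_version} (submission) "
            f"vs {runtime.python_version} (runtime)"
        )
    if submission.beam_version != runtime.beam_version:
        differences.append(
            f"Apache Beam version: {submission.beam_version} (submission) "
            f"vs {runtime.beam_version} (runtime)"
        )
    return differences


if __name__ == "__main__":
    message = (
        "Submission environment: "
        "beam:version:sdk_base:apache/beam_python3.11_sdk:2.52.0.dev. "
        "Runtime environment: "
        "beam:version:sdk_base:apache/beam_python3.11_sdk:2.51.0.dev."
    )
    for line in describe_mismatch(message):
        print(line)
```

For the message in this comment, the Python versions agree (3.11) and only the Beam version differs (2.52.0.dev at submission, 2.51.0.dev at runtime).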

@tvalentyn
Contributor Author

This will likely not affect the release branch, but something is misconfigured.

@tvalentyn tvalentyn changed the title [Failing Test]: beam_PostCommit_Py_VR_Dataflow is permared on master [Failing Test]: Some beam integration tests runs result in environment mismatch. Sep 25, 2023
@tvalentyn tvalentyn changed the title [Failing Test]: Some beam integration tests runs result in environment mismatch. [Failing Test]: Some Python integration tests runs result in environment mismatch. Sep 25, 2023
@tvalentyn
Contributor Author

Seeing this in one job:

ERROR 2023-09-21T20:30:25.653406673Z Processing /var/opt/google/staged/apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
ERROR 2023-09-21T20:30:25.653431342Z ERROR: Wheel 'apache-beam' located at /var/opt/google/staged/apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl is invalid.
DEBUG 2023-09-21T20:30:25.653442932Z Could not install Apache Beam SDK from a wheel: exit status 1, proceeding to install SDK from source tarball.

...
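As a rough aid for triaging an "is invalid" wheel like the one in the log above, the sketch below checks two things locally: whether the Python tag in the wheel filename matches the running interpreter, and whether the file is a readable zip archive. This is an assumption-laden approximation, not pip's actual validation logic: the helper names are hypothetical, the tag comparison assumes CPython and a single (uncompressed) Python tag, and real compatibility checks consider the ABI and platform tags as well.

```python
import sys
import zipfile
from typing import List, NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(path: str) -> WheelTags:
    """Parse name-version[-build]-python-abi-platform.whl into its parts."""
    name = path.rsplit("/", 1)[-1]
    if not name.endswith(".whl"):
        raise ValueError(f"Not a wheel filename: {name!r}")
    parts = name[: -len(".whl")].split("-")
    # Five parts normally, six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"Unexpected wheel filename layout: {name!r}")
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def interpreter_tag() -> str:
    """Assumption: CPython, so the tag looks like 'cp38' or 'cp311'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def quick_wheel_check(path: str) -> List[str]:
    """Return a list of suspected problems; an empty list means none found."""
    problems = []
    tags = parse_wheel_filename(path)
    if tags.python_tag != interpreter_tag():
        problems.append(
            f"wheel is tagged {tags.python_tag}, "
            f"interpreter is {interpreter_tag()}"
        )
    # is_zipfile returns False for missing, truncated, or non-zip files.
    if not zipfile.is_zipfile(path):
        problems.append("file is missing or not a readable zip archive")
    return problems


if __name__ == "__main__":
    staged = (
        "/var/opt/google/staged/apache_beam-2.52.0.dev0-cp38-cp38-"
        "manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )
    print(parse_wheel_filename(staged))
    print(quick_wheel_check(staged))
```

Running a check like this on a worker would distinguish the two hypotheses raised in this thread (an incompatible wheel versus one corrupted during retrieval), though it cannot by itself confirm either.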

@tvalentyn
Contributor Author

tvalentyn commented Sep 25, 2023

I believe we currently don't stage tarballs in tests, and somehow the provided wheel is either not compatible or got corrupted during retrieval: #28605

@tvalentyn tvalentyn removed the flake label Sep 25, 2023
@tvalentyn
Contributor Author

I think this is caused by #28605.

@kennknowles
Member

I am not totally following whether this could impact the release. Would we expect to be seeing red tests on the release branch? We did manage to get green Python tests today.

@tvalentyn
Contributor Author

tvalentyn commented Sep 26, 2023

I don't attribute this issue to a regression in 2.51.0, but there may be flakiness in streaming test pipelines until this issue is fixed or the Dataflow runner rolls out a release (tentative ETA: end of this week).

Longer story: Python integration tests are supposed to pass --sdk_location. Due to a race during installation, some workers fail to install the SDK and become incorrectly initialized. This would not happen to workers using the so-called sibling SDK container protocol. Users on a released Beam SDK don't stage the SDK at job submission, so they wouldn't see this particular failure mode.

I will remove this issue from the 2.51.0 blocker list for now.

@tvalentyn tvalentyn removed this from the 2.51.0 Release milestone Sep 26, 2023
@github-actions github-actions bot added this to the 2.53.0 Release milestone Dec 4, 2023
@tvalentyn
Contributor Author

> This would not happen to workers using so called sibling sdk container protocol.

All Dataflow Python pipelines use the sibling protocol now.

@tvalentyn tvalentyn added the done & done Issue has been reviewed after it was closed for verification, followups, etc. label Dec 4, 2023