
Create documentation page for Python SDK unrecoverable errors #28702

Merged · 10 commits · Sep 29, 2023
@@ -0,0 +1,56 @@
---
type: languages
title: "Unrecoverable Errors in Beam Python"
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Unrecoverable Errors in Beam Python

## What is an Unrecoverable Error?

An unrecoverable error is an issue at job start-up time that will
prevent a job from ever running successfully, usually due to some kind
of misconfiguration. Solving these issues when they occur is key to
successfully running a Beam Python pipeline.

## Common Unrecoverable Errors

### Job Submission/Runtime Python Version Mismatch
> **Review comment (Contributor):** If you plan to reference these errors in logs, let's add markdown anchors for better linkability, as titles might change but we can keep the same anchors, so links will be preserved.


If the Python version used for job submission does not match the
Python version used to build the worker container, the job will not
execute. Ensure that the Python version being used for job submission
and the container Python version match.
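
As a quick sanity check before submission, you can print the interpreter version the job will be submitted with and compare it against your worker image (a minimal sketch; how you map this to a container tag depends on your setup):

```python
import sys

# The version used to submit the job must match the worker container's
# Python, e.g. a 3.11 submission environment needs a Python 3.11 worker image.
submission_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(submission_version)
```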

### PIP Dependency Resolution Failures

During worker start-up, dependencies are checked and installed in
the worker container before accepting work. If there’s an issue during
this process (e.g. a dependency version cannot be found) the worker
will restart and try again, up to four times, before outright failing.
Ensure that the dependency versions provided in your `requirements.txt` file
exist and can be installed locally before submitting jobs.
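
One way to catch these failures early is to rehearse the install in a throwaway virtual environment (a hedged sketch; the paths and the example pin below are placeholders for your pipeline's real `requirements.txt`):

```shell
# Write an example pin; substitute your pipeline's actual requirements.txt.
printf 'pip>=21.0\n' > /tmp/requirements-example.txt

# Create a clean virtual environment and attempt the same install the
# Beam worker would perform; a bad pin fails here instead of on the worker.
python3 -m venv /tmp/beam-deps-check
/tmp/beam-deps-check/bin/pip install -r /tmp/requirements-example.txt
```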

### Dependency Version Mismatches

When additional dependencies like torch, transformers, etc. are not
specified via `requirements_file` or preinstalled with a custom container,
the worker may go into a restart loop trying to install dependencies,
again up to four times, before finally failing. A debug log specifying
`ModuleNotFoundError` is emitted. A similar outcome is observed when
there is a dependency version mismatch, which often has `AttributeError`
logged in debug mode. Ensure that the required dependencies, and their
versions, are the same at runtime and in the submission environment.
For better visibility, starting in Beam 2.52.0, debug logs specify the
dependencies at both stages.

> **Review comment (Contributor):** Mismatch happens after installation, when the worker already started; at this point we won't attempt more installations.

> **Review comment (Contributor):** Possible suggestion:
>
> When additional dependencies like torch, transformers, etc. are not specified via requirements_file or preinstalled in a custom container then the worker might fail to deserialize (unpickle) the user code. This can result in ModuleNotFound errors.
>
> If dependencies are installed but their versions don't match the versions in submission environment, pipeline might have AttributeError messages.
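
To compare the two environments yourself, you can snapshot package versions in the submission environment and log the same snapshot from worker code (a minimal sketch; the package list is a hypothetical example, and this helper is not a Beam API):

```python
import importlib.metadata


def snapshot_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


# Run this in the submission environment and again on the worker; any
# differing or None entries explain AttributeError / ModuleNotFoundError.
print(snapshot_versions(["pip", "torch"]))
```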
4 changes: 4 additions & 0 deletions website/www/site/content/en/documentation/sdks/python.md
@@ -59,3 +59,7 @@ see [Machine Learning](/documentation/sdks/python-machine-learning).
## Python multi-language pipelines quickstart

Apache Beam lets you combine transforms written in any supported SDK language and use them in one multi-language pipeline. To learn how to create a multi-language pipeline using the Python SDK, see the [Python multi-language pipelines quickstart](/documentation/sdks/python-multi-language-pipelines).

## Unrecoverable Errors in Beam Python

Some common errors can occur during worker start-up and prevent jobs from starting. To learn about these errors and how to troubleshoot them in the Python SDK, see [Unrecoverable Errors in Beam Python](/documentation/sdks/python-unrecoverable-errors).
@@ -43,6 +43,7 @@
<li><a href="/documentation/sdks/python-machine-learning/">Machine Learning</a></li>
<li><a href="/documentation/sdks/python-pipeline-dependencies/">Managing pipeline dependencies</a></li>
<li><a href="/documentation/sdks/python-multi-language-pipelines/">Python multi-language pipelines quickstart</a></li>
<li><a href="/documentation/sdks/python-unrecoverable-errors/">Python Unrecoverable Errors</a></li>
</ul>
</li>
