Create documentation page for Python SDK unrecoverable errors #28702

Merged: 10 commits, Sep 29, 2023

@jrmccluskey (Contributor) commented Sep 27, 2023

Add a site page to cover what we consider unrecoverable errors as part of the effort to reduce detection times.

Staged at https://apache-beam-website-pull-requests.storage.googleapis.com/28702/documentation/sdks/python-unrecoverable-errors/index.html



@jrmccluskey (Contributor Author)

R: @tvalentyn @riteshghorse

@github-actions (Contributor)

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@jrmccluskey changed the title from "[WIP] Create documentation page for Python SDK unrecoverable errors" to "Create documentation page for Python SDK unrecoverable errors" on Sep 28, 2023

@riteshghorse (Contributor) left a comment

LGTM, thank you!


When additional dependencies like torch, transformers, etc are not
specified via requirements_file or preinstalled with a custom container
then the worker may go into a restart loop trying to install dependencies

Review comment (Contributor):

A mismatch shows up after installation, once the worker has already started; at that point we won't attempt any more installations.

Review comment (Contributor):

Possible suggestion:

When additional dependencies like torch, transformers, etc. are not
specified via requirements_file or preinstalled in a custom container
then the worker might fail to deserialize (unpickle) the user code. This can result in ModuleNotFoundError errors.

If dependencies are installed but their versions don't match the versions in the submission environment, the pipeline might fail with AttributeError messages.
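
To illustrate the two failure modes in the suggestion above, here is a minimal sketch using only the standard library. It is not Beam code: the standard `pickle` module stands in for whatever serializer the SDK uses, and `fake_dependency`, `transform`, and `simulate_worker` are hypothetical names made up for the example. The point is only that a function pickled by reference needs its module, with the same attributes, to exist wherever it is loaded.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_dependency"  # hypothetical stand-in for torch, transformers, etc.


def _make_module(with_transform):
    """Build an in-memory module, optionally exposing `transform`."""
    mod = types.ModuleType(MODULE_NAME)
    if with_transform:
        def transform(x):
            return x * 2
        # Make the function look like it lives at fake_dependency.transform,
        # so pickle stores it as a (module, name) reference.
        transform.__module__ = MODULE_NAME
        transform.__qualname__ = "transform"
        mod.transform = transform
    return mod


def simulate_worker(scenario):
    """Pickle in a 'submission' environment, then unpickle in a 'worker'."""
    # Submission environment: the dependency is installed.
    sys.modules[MODULE_NAME] = _make_module(with_transform=True)
    payload = pickle.dumps(sys.modules[MODULE_NAME].transform)
    del sys.modules[MODULE_NAME]

    # Worker environment: depends on the scenario being simulated.
    if scenario == "version_mismatch":
        # Dependency present, but this "version" lacks the attribute.
        sys.modules[MODULE_NAME] = _make_module(with_transform=False)
    elif scenario == "matching":
        sys.modules[MODULE_NAME] = _make_module(with_transform=True)
    # scenario == "missing": the dependency is not installed at all.

    try:
        pickle.loads(payload)
        return "ok"
    except (ModuleNotFoundError, AttributeError) as exc:
        return type(exc).__name__
    finally:
        sys.modules.pop(MODULE_NAME, None)


for name in ("missing", "version_mismatch", "matching"):
    print(name, "->", simulate_worker(name))
```

Neither error goes away on retry, since the worker environment is the same on every attempt, which is why they fit under "unrecoverable".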

@jrmccluskey (Contributor Author)

@tvalentyn ready for a second pass


## Common Unrecoverable Errors

### Job Submission/Runtime Python Version Mismatch

Review comment (Contributor):

If you plan to reference these errors in logs, let's add markdown anchors for better linkability, as titles might change but we can keep the same anchors, so links will be preserved.



@jrmccluskey merged commit c5b75ed into apache:master on Sep 29, 2023
6 checks passed
@jrmccluskey deleted the unrecoverableDocs branch on September 29, 2023 14:56