Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use state sampler stub to defer metrics updates when DoFn#process is executed in subprocess. #32600

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

claudevdm
Copy link
Contributor


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@claudevdm claudevdm force-pushed the exception-with-timout-stub branch from 83968c6 to 576f7f9 Compare September 30, 2024 18:51
@claudevdm claudevdm force-pushed the exception-with-timout-stub branch from 576f7f9 to 7990130 Compare October 4, 2024 19:33
@claudevdm claudevdm changed the title Use state sampler stub to defer metrics updates when DoFn#process is executed in subprocess/thread. Use state sampler stub to defer metrics updates when DoFn#process is executed in subprocess. Oct 4, 2024
@claudevdm claudevdm force-pushed the exception-with-timout-stub branch from 7990130 to 03da33d Compare October 4, 2024 20:15

class StateSamplerInterface(ABC):
@abstractmethod
def update_metric(self, typed_metric_name, value):
Copy link
Contributor

@tvalentyn tvalentyn Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks strange to a reader unfamiliar with this change why we are abstracting only one particular method of StateSampler. Furthermore, we are reasoning about the logic how to accumulate the metrics in the stub, which might go out-of sync.

I wonder whether we can replace the statesampler reference in the subprocess with a unittest.mock.MagicMock, and then replay the calls returned by the_mock.mock_calls() on the real reference without introducing another class. As long as the mock doesn't need to return any values, this should work. @robertwb do you think this would be too hacky? Are there some concerns if the replay happens only after the DoFn.process() call finishes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting idea, however I'd rather avoid dependencies on test classes like MagicMock. (It may also have issues with the pickling required here). I do think if we're going to introduce an interface here, we should list all the relevant methods and have the stub explicitly raise an error for the unsupported ones to clarify that it's a partial implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertwb do we need to apply the interface onto the cython implementation as well in this case? Does cython support inheriting from a python ABC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @robertwb offline - cdef Cython classes cannot inherit from python classes, there is a limited support for multiple inheritance but comes with a potential performance hit, and this is performance sensitive code; we can leave the cython codepath alone and add a comment about the existence of the StateSampler interfaces defined in python codebase but not use them in Cythonized codepath.

sdks/python/apache_beam/transforms/core.py Outdated Show resolved Hide resolved

class StateSamplerInterface(ABC):
@abstractmethod
def update_metric(self, typed_metric_name, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting idea, however I'd rather avoid dependencies on test classes like MagicMock. (It may also have issues with the pickling required here). I do think if we're going to introduce an interface here, we should list all the relevant methods and have the stub explicitly raise an error for the unsupported ones to clarify that it's a partial implementation.

@robertwb
Copy link
Contributor

Any updates on this?

@claudevdm
Copy link
Contributor Author

I will be able to pick this up again early next week. Is the consensus to go with the interface approach but list all the relevant method/raise an error if unimplemented?

@tvalentyn
Copy link
Contributor

Is the consensus to go with the interface approach but list all the relevant method/raise an error if unimplemented?

sgtm

@claudevdm claudevdm force-pushed the exception-with-timout-stub branch 3 times, most recently from d437e86 to 70b8066 Compare November 17, 2024 23:52
@claudevdm claudevdm force-pushed the exception-with-timout-stub branch from 70b8066 to 18a1414 Compare November 18, 2024 20:55
@claudevdm
Copy link
Contributor Author

I've updated with a more complete interface. Can you please take a look @robertwb ?

@claudevdm claudevdm marked this pull request as ready for review November 18, 2024 20:57
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link
Contributor

Reminder, please take a look at this pr: @damccorm

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @liferoad for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

Copy link
Contributor

github-actions bot commented Dec 7, 2024

Reminder, please take a look at this pr: @liferoad

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@tvalentyn
Copy link
Contributor

Overall lgtm, PTAL at lint issues. Thanks!

 class StubStateSampler(StateSamplerInterface):
ERROR: /runner/_work/beam/beam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/apache_beam/transforms/core.py Imports are incorrectly sorted.
--- /runner/_work/beam/beam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/apache_beam/transforms/core.py:before	2024-11-18 21:13:46.912794
+++ /runner/_work/beam/beam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/apache_beam/transforms/core.py:after	2024-11-18 21:23:41.718949
@@ -21,6 +21,7 @@
 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants