Revert three commits related to supporting custom coder in reshuffle #33414

Merged: 1 commit, Dec 18, 2024

Conversation

shunping
Contributor

These commits are causing some internal test failures, so we are reverting them for now.

- Fix custom coder not being used in Reshuffle (global window) (apache#33339)
- Fix custom coders not being used in Reshuffle (non global window) (apache#33363)
- Add missing to_type_hint to WindowedValueCoder (apache#33403)
@chamikaramj
Contributor

LGTM. Thanks.

Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`.

@shunping
Contributor Author

Failed tests are unrelated to the changes.

@chamikaramj chamikaramj merged commit e9424b9 into apache:master Dec 18, 2024
85 of 91 checks passed
@robertwb
Contributor

Just a thought, as this changes coders in some cases, should this be guarded by the update compatibility flag? https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L592

@shunping
Contributor Author

shunping commented Dec 19, 2024

Since the flag is defined in "StreamingOptions", was it originally designed for use in the streaming case?

@kennknowles
Member

> Since the flag is defined in "StreamingOptions", was it originally designed for use in the streaming case?

Yes, it is designed for "streaming update" where you may have an in-progress aggregation in a shuffle when you do a pipeline update. Then you need the state to be compatible.

@robertwb
Contributor

And by "state" here, this includes the in-flight encoded elements that were written by the pre-update version of the pipeline and will be read by the post-update code.

This is irrelevant for batch pipelines, but may become relevant if a runner supports some kind of resume (from pause or failure) where the code might be updated.
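To make the in-flight element problem concrete, here is a minimal sketch in plain Python. It does not use Beam: `pickle` stands in for whatever default coder the pre-update pipeline used, and a fixed-width `struct` encoding stands in for a custom coder picked up after the update. All function names are hypothetical and chosen only for illustration.

```python
import pickle
import struct


# Stand-in for the coder used by the pre-update pipeline (assumption:
# any general-purpose fallback coder behaves like this for our purposes).
def old_encode(value):
    return pickle.dumps(value)


def old_decode(data):
    return pickle.loads(data)


# Stand-in for a custom coder that the post-update pipeline starts using.
def new_encode(value):
    return struct.pack(">q", value)  # 8-byte big-endian signed integer


def new_decode(data):
    return struct.unpack(">q", data)[0]


def can_decode(decode, data):
    """Return True if `decode` accepts `data` without raising."""
    try:
        decode(data)
        return True
    except Exception:
        return False


# An element encoded before the update and still sitting in the shuffle.
in_flight = old_encode(42)

# The old code can read it back, but the updated code cannot.
print("old decoder reads in-flight bytes:", can_decode(old_decode, in_flight))
print("new decoder reads in-flight bytes:", can_decode(new_decode, in_flight))
```

Each coder round-trips its own output, but bytes written by one are not readable by the other, which is why a coder change in a shuffle step matters for streaming updates.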

@shunping
Contributor Author

I see. Thank you both for the clarification!

Regarding the potentially breaking changes that could be introduced by reverting this revert PR (that is, re-landing the original changes), shall we add a new pipeline option rather than overloading this existing flag?

Something like "use_legacy_reshuffle" would allow users to switch back to the previous reshuffle code path, where FastPrimitivesCoder is basically used internally regardless of the coders/type hints specified by users.

@robertwb
Contributor

I don't think we want to introduce a new flag. The point of the update_compatibility_version is so that we don't have to make a new option (that both we have to handle and our users have to know about) for every update-incompatible change; all you need to know is what version you used to originally launch your pipeline.
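A rough sketch of what guarding on a single version value could look like, in plain Python rather than Beam code. Everything here is an assumption for illustration: the helper names are hypothetical, and `RESHUFFLE_CODER_CHANGE_VERSION` is a placeholder, not the release in which the change actually landed.

```python
from typing import Optional, Tuple

# Placeholder version (assumption): the release in which the
# update-incompatible reshuffle change would ship.
RESHUFFLE_CODER_CHANGE_VERSION = "2.62.0"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.61.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_prior_to(update_compatibility_version: Optional[str],
                breaking_version: str) -> bool:
    """True if the pipeline was originally launched before `breaking_version`.

    An unset value means no compatibility was requested, so the newest
    behavior applies.
    """
    if update_compatibility_version is None:
        return False
    return (parse_version(update_compatibility_version)
            < parse_version(breaking_version))


def choose_reshuffle_path(update_compatibility_version: Optional[str]) -> str:
    """Pick the reshuffle code path from the one version the user supplies."""
    if is_prior_to(update_compatibility_version,
                   RESHUFFLE_CODER_CHANGE_VERSION):
        return "legacy"  # keep the old encoding so in-flight data stays readable
    return "custom_coder"  # honor the coders and type hints given by the user


print(choose_reshuffle_path(None))
print(choose_reshuffle_path("2.61.0"))
```

The same `is_prior_to` check could gate any later update-incompatible change, so users only ever supply the version they originally launched with.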

@shunping
Contributor Author

shunping commented Dec 21, 2024

> I don't think we want to introduce a new flag. The point of the update_compatibility_version is so that we don't have to make a new option (that both we have to handle and our users have to know about) for every update-incompatible change; all you need to know is what version you used to originally launch your pipeline.

I am fine with using a flag like that to avoid adding more options, as I also don't like having too many options to remember. However, I cannot deny that both the naming and the place where it is defined are a little confusing to me.

We are somehow overloading "update" to cover both streaming and batch in this context. For batch, users may only want to "create" a pipeline with existing code that works as before. There is no "update" of the pipeline from their perspective, only an update of the Beam version. :)

@robertwb
Contributor

For a batch pipeline, setting this flag is a workaround, and they should fix their type hints. (We should make that clear in the docs.)

4 participants