[BEAM-13202] Add Coder to CountIfFn.Accum #16856

Merged: 3 commits, Feb 16, 2022

@iemejia (Member) commented Feb 15, 2022

PTAL or assign to someone who can, thanks!

R: @kennknowles


@iemejia force-pushed the BEAM-13202-countif-accum-coder branch from 1d19ae7 to 1105c34 on February 15, 2022 14:45

    long countIfResult = 0L;
  @AutoValue
  public abstract static class Accum implements Serializable {
    abstract boolean isExpressionFalse();

Contributor

Hello! If I'm reading this correctly, and Accum is not used for anything else outside of the class, then isExpressionFalse is strictly tied to whether countIfResult is zero. This might be an opportunity to simplify the code!

Member Author

Oh, you have a point. CountIf is actually a specific case of Count, so I am going to reuse Count's logic. Thanks for pointing this out!
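
To illustrate the observation above, here is a minimal plain-Java sketch (not Beam code) that folds the same inputs through a flag-plus-count accumulator and through a count-only `long[]` accumulator. The class and method names are hypothetical, and the `countIfResult += 1` line is an assumption, since the excerpt quoted in this thread only shows the flag assignment.

```java
import java.util.List;

/** Plain-Java stand-in, not Beam code: compares a flag-plus-count accumulator with a count-only one. */
final class CountIfAccumSketch {
  private CountIfAccumSketch() {}

  /** Assumed shape of the old accumulator: a flag plus a running count. */
  static final class FlaggedAccum {
    boolean isExpressionFalse = true;
    long countIfResult = 0L;
  }

  /** Old style: update the flag and the count together (the increment is an assumption). */
  static FlaggedAccum addInput(FlaggedAccum accum, boolean input) {
    if (input) {
      accum.isExpressionFalse = false;
      accum.countIfResult += 1;
    }
    return accum;
  }

  /** Simplified style: a single-element long[] holds the number of true inputs. */
  static long[] addInput(long[] accum, boolean input) {
    if (input) {
      accum[0] += 1;
    }
    return accum;
  }

  /** Returns true when the flag can be derived from the count alone for these inputs. */
  static boolean flagIsRedundant(List<Boolean> inputs) {
    FlaggedAccum flagged = new FlaggedAccum();
    long[] counted = new long[] {0L};
    for (boolean input : inputs) {
      addInput(flagged, input);
      addInput(counted, input);
    }
    boolean sameCount = flagged.countIfResult == counted[0];
    boolean flagDerivable = flagged.isExpressionFalse == (counted[0] == 0L);
    return sameCount && flagDerivable;
  }
}
```

Under these assumptions the flag is true exactly when the count is zero, so the count alone carries the same information.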

      return new AutoValue_CountIf_CountIfFn_Accum(isExpressionFalse, countIfResult);
    }
  }
  public static class CountIfFn extends Combine.CombineFn<Boolean, long[], Long> {

Contributor

Nice cleanup! At this point, would it be worthwhile to let Count.CountFn be public, so we could just inherit it? I don't really have a strong opinion.

Member

IMO using it via composition like this is preferable anyhow. You can see it is more flexible, since you can achieve all the same things without needing it to be public.
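
A rough sketch of the composition approach described above, written against a trimmed-down stand-in interface rather than Beam's real `Combine.CombineFn`. All `Simple*` names are hypothetical. The point is that the wrapper only needs an instance of the counting function, not access to its class.

```java
/** Hypothetical, trimmed-down stand-in for Beam's Combine.CombineFn; not the real API. */
interface SimpleCombineFn<InputT, AccumT, OutputT> {
  AccumT createAccumulator();

  AccumT addInput(AccumT accumulator, InputT input);

  AccumT mergeAccumulators(Iterable<AccumT> accumulators);

  OutputT extractOutput(AccumT accumulator);
}

/** Stand-in for a counting function whose class the wrapper does not need to subclass. */
final class SimpleCountFn<T> implements SimpleCombineFn<T, long[], Long> {
  @Override
  public long[] createAccumulator() {
    return new long[] {0L};
  }

  @Override
  public long[] addInput(long[] accumulator, T input) {
    accumulator[0] += 1;
    return accumulator;
  }

  @Override
  public long[] mergeAccumulators(Iterable<long[]> accumulators) {
    long[] merged = createAccumulator();
    for (long[] accumulator : accumulators) {
      merged[0] += accumulator[0];
    }
    return merged;
  }

  @Override
  public Long extractOutput(long[] accumulator) {
    return accumulator[0];
  }
}

/** Composition: hold a counting delegate and forward only the true inputs to it. */
final class SimpleCountIfFn implements SimpleCombineFn<Boolean, long[], Long> {
  private final SimpleCombineFn<Boolean, long[], Long> countFn = new SimpleCountFn<>();

  @Override
  public long[] createAccumulator() {
    return countFn.createAccumulator();
  }

  @Override
  public long[] addInput(long[] accumulator, Boolean input) {
    // False (or null) inputs leave the accumulator untouched.
    return Boolean.TRUE.equals(input) ? countFn.addInput(accumulator, input) : accumulator;
  }

  @Override
  public long[] mergeAccumulators(Iterable<long[]> accumulators) {
    return countFn.mergeAccumulators(accumulators);
  }

  @Override
  public Long extractOutput(long[] accumulator) {
    return countFn.extractOutput(accumulator);
  }
}
```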

@@ -27,49 +30,41 @@
 public class CountIf {
   private CountIf() {}

-  public static CountIfFn combineFn() {
-    return new CountIf.CountIfFn();
+  public static Combine.CombineFn<Boolean, ?, Long> combineFn() {

Member

I suppose this is technically a breaking change. But of course nothing implementing SQL is intended for users. Is sql/impl marked @Internal? (This change still LGTM because it is not actually intended as a user API.)

Member Author

Yes, I hesitated, but this is internal: the class is only instantiated by SQL when it registers the built-in aggregators, so we should be good. sql/impl is not explicitly marked as @Internal, but I agree that it should be.

    }

    @Override
    public Accum addInput(Accum accum, Boolean input) {
      if (input) {
        accum.isExpressionFalse = false;

Member

I never looked at this code before. Now that I see it... why was this field ever needed? Seems like the 0L result contains all the useful info. Makes me worried there is something tricky that I am not noticing...

Member Author

Ryan and I arrived at the same conclusion, which is why I went down the simplification road. I added the tests to try to find issues and fixed the ones I saw.

-      return 0L;
+    public Coder<long[]> getAccumulatorCoder(CoderRegistry registry, Coder<Boolean> inputCoder)
+        throws CannotProvideCoderException {
+      return countFn.getAccumulatorCoder(registry, inputCoder);

Member

Technically changing coders here would break in-place update. But SQL really just cannot be relied on for that, since the optimizer might change. So I am just noting that I explicitly say it is OK to break in-place update here.

Member Author

Technically we did not have a Coder specified at all, so it was already breaking when running on a distributed system, as the JIRA ticket reported. Backwards compatibility therefore seems less of an issue ;)
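
For context on why a missing accumulator coder matters on a distributed runner: partial accumulators have to be turned into bytes to move between workers. The following is a plain-Java illustration of that round trip for a single-element `long[]` accumulator. It is a hypothetical stand-in and not Beam's `Coder` API, and the fixed 8-byte layout is an assumption chosen for simplicity, not necessarily the wire format Count uses.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for an accumulator coder; Beam's real Coder API is richer than this. */
final class LongAccumCodecSketch {
  private LongAccumCodecSketch() {}

  /** Serializes the single count held in accumulator[0] as 8 big-endian bytes. */
  static byte[] encode(long[] accumulator) {
    return ByteBuffer.allocate(Long.BYTES).putLong(accumulator[0]).array();
  }

  /** Rebuilds a single-element accumulator from bytes produced by encode. */
  static long[] decode(byte[] encoded) {
    return new long[] {ByteBuffer.wrap(encoded).getLong()};
  }
}
```

A worker could then ship `encode(acc)` and the receiving side could call `decode` before merging.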

-  public static class CountIfFn extends Combine.CombineFn<Boolean, CountIfFn.Accum, Long> {
+  public static class CountIfFn extends Combine.CombineFn<Boolean, long[], Long> {
+    private final Combine.CombineFn<Boolean, long[], Long> countFn =
+        (Combine.CombineFn<Boolean, long[], Long>) Count.<Boolean>combineFn();

Member

This cast makes me think that Count.combineFn() worked too hard to hide the accumulator type. It should just expose it.

Member Author

Agreed. I also hesitated over making it public, having CountIfFn inherit from it, and overriding just the addInput method. What left me puzzled was how to make CombineFn composable, so that I could just 'filter' and then apply CountFn.
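
One possible shape for the 'filter, then apply CountFn' idea, sketched against a trimmed-down stand-in interface. `MiniCombineFn`, `counting`, and `filtering` are hypothetical names for illustration only, and nothing here is part of Beam's API.

```java
import java.util.function.Predicate;

/** Hypothetical, trimmed-down stand-in for Beam's Combine.CombineFn; not the real API. */
interface MiniCombineFn<InputT, AccumT, OutputT> {
  AccumT createAccumulator();

  AccumT addInput(AccumT accumulator, InputT input);

  AccumT mergeAccumulators(Iterable<AccumT> accumulators);

  OutputT extractOutput(AccumT accumulator);
}

final class MiniCombineFns {
  private MiniCombineFns() {}

  /** Counts every input it sees, using a single-element long[] as the accumulator. */
  static <T> MiniCombineFn<T, long[], Long> counting() {
    return new MiniCombineFn<T, long[], Long>() {
      @Override
      public long[] createAccumulator() {
        return new long[] {0L};
      }

      @Override
      public long[] addInput(long[] accumulator, T input) {
        accumulator[0] += 1;
        return accumulator;
      }

      @Override
      public long[] mergeAccumulators(Iterable<long[]> accumulators) {
        long[] merged = createAccumulator();
        for (long[] accumulator : accumulators) {
          merged[0] += accumulator[0];
        }
        return merged;
      }

      @Override
      public Long extractOutput(long[] accumulator) {
        return accumulator[0];
      }
    };
  }

  /** Hypothetical combinator: inputs that fail the predicate never reach the delegate. */
  static <InputT, AccumT, OutputT> MiniCombineFn<InputT, AccumT, OutputT> filtering(
      Predicate<? super InputT> keep, MiniCombineFn<InputT, AccumT, OutputT> delegate) {
    return new MiniCombineFn<InputT, AccumT, OutputT>() {
      @Override
      public AccumT createAccumulator() {
        return delegate.createAccumulator();
      }

      @Override
      public AccumT addInput(AccumT accumulator, InputT input) {
        return keep.test(input) ? delegate.addInput(accumulator, input) : accumulator;
      }

      @Override
      public AccumT mergeAccumulators(Iterable<AccumT> accumulators) {
        return delegate.mergeAccumulators(accumulators);
      }

      @Override
      public OutputT extractOutput(AccumT accumulator) {
        return delegate.extractOutput(accumulator);
      }
    };
  }
}
```

With these helpers, a count-if could be written as `MiniCombineFns.filtering(Boolean.TRUE::equals, MiniCombineFns.<Boolean>counting())`, with no subclassing and no cast.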

@iemejia merged commit 6e98dd4 into apache:master on Feb 16, 2022
@iemejia deleted the BEAM-13202-countif-accum-coder branch on February 16, 2022 16:23
3 participants