-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-13202] Add Coder to CountIfFn.Accum #16856
[BEAM-13202] Add Coder to CountIfFn.Accum #16856
Conversation
1d19ae7
to
1105c34
Compare
long countIfResult = 0L; | ||
@AutoValue | ||
public abstract static class Accum implements Serializable { | ||
abstract boolean isExpressionFalse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! If I'm reading this correctly and Accum is not used for other reasons outside of the class, this is strictly tied to whether countIfResult is zero... This might be an opportunity to simplify the code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you have a point actually CountIf is a specific case of Count I am going to reuse Count's logic then. Thanks for pointing this out!
return new AutoValue_CountIf_CountIfFn_Accum(isExpressionFalse, countIfResult); | ||
} | ||
} | ||
public static class CountIfFn extends Combine.CombineFn<Boolean, long[], Long> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup! At this point, would it be worthwhile to let Count.CountFn be public, so we could just inherit it? I don't really have a strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO using it via composition like this is preferable anyhow. You can see it is more flexible, since you can achieve all the same things without needing it to be public.
@@ -27,49 +30,41 @@ | |||
public class CountIf { | |||
private CountIf() {} | |||
|
|||
public static CountIfFn combineFn() { | |||
return new CountIf.CountIfFn(); | |||
public static Combine.CombineFn<Boolean, ?, Long> combineFn() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose technically this is a breaking change. But of course everything implementing SQL is not intended for users. Is sql/impl
marked @Internal
? (this change still LGTM because it is not actually intended as a user API)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I hesitated but this is internal and this class is only instantiated by SQL when it registers the built-in aggregators so we should be good. sql/impl is not explicitly marked as Internal but I agree that it should.
return new AutoValue_CountIf_CountIfFn_Accum(isExpressionFalse, countIfResult); | ||
} | ||
} | ||
public static class CountIfFn extends Combine.CombineFn<Boolean, long[], Long> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO using it via composition like this is preferable anyhow. You can see it is more flexible, since you can achieve all the same things without needing it to be public.
} | ||
|
||
@Override | ||
public Accum addInput(Accum accum, Boolean input) { | ||
if (input) { | ||
accum.isExpressionFalse = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never looked at this code before. Now that I see it... why was this field ever needed? Seems like the 0L
result contains all the useful info. Makes me worried there is something tricky that I am not noticing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the same conclusion we arrived too with Ryan, that's why I went into the simplification road. I added the tests to try to find issues and fixed the ones I saw.
return 0L; | ||
public Coder<long[]> getAccumulatorCoder(CoderRegistry registry, Coder<Boolean> inputCoder) | ||
throws CannotProvideCoderException { | ||
return countFn.getAccumulatorCoder(registry, inputCoder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically changing coders here would break in-place update. But SQL really just cannot be relied on for that, since the optimizer might change. So I am just noting that I explicitly say it is OK to break in-place update here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we did not have a specified Coder
so it was breaking when running on a distributed system as the JIRA ticket reported so backwards compatibility seems less of an issue ;)
public static class CountIfFn extends Combine.CombineFn<Boolean, CountIfFn.Accum, Long> { | ||
public static class CountIfFn extends Combine.CombineFn<Boolean, long[], Long> { | ||
private final Combine.CombineFn<Boolean, long[], Long> countFn = | ||
(Combine.CombineFn<Boolean, long[], Long>) Count.<Boolean>combineFn(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast makes me think that Count.combineFn()
worked too hard to hide the accumulator type. It should just expose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I hesitated too make it public and make the CountIfFn
inherit it and change just to override the addInput method, what let me puzzled was how to make CompineFn
composable so I could just 'filter' and then apply CountFn
PTAL or assign to someone who can, thanks!
R: @kennknowles
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.