The PostCommit Go VR Flink job is flaky #31122
The failing tests are
These tests were added in #31046. Should we fix the tests or disable them in the Flink VR test suites? @lostluck
cc: @kennknowles, another PostCommit that is currently perma-red on the release-2.57.0 branch.
The error also sounds like #30994. The error message mentions "Python transforms", which is not accurate either.
Agreed, filtering is the right move here. Those tests use strings, which is one of the datatypes that Flink's TestStream corrupts. At some point we determined that Flink does something with TestStream that mutates those coders (adding length prefixes where there were none previously) without making the equivalent mutations to the corresponding bytes, but we weren't able to pin down where it was coming from. Thus we filtered out those tests here: https://github.com/apache/beam/blob/master/sdks/go/test/integration/integration.go#L181 Ideally we fix the Flink test stream implementation, but until then we filter, since it's not commonly used.
Actually, "TestTestStreamSimple" and "TestTestStreamToGBK" should be working, so those are new failures. "TestTestStreamTimersEventTime" I'd expect to fail based on previous behaviour.
All 4 of these tests were added in #31046 and have been failing since their first run. Or are you suggesting that the newly added tests reveal some underlying bug/gap?
Ah! Right, I recall now. Those were added because they did reveal a gap in Prism's test stream implementation. They're likely revealing one in Flink too, so agreed, they should be filtered. The simple ones are pipelines without any Impulse transform, so the runner's TestStream must be capable of kicking off the pipeline by itself.
Yeah, thanks, I am trying to do that. However, it is not obvious how to filter out a specific test for a specific runner in the VR test suite. In Java this was done with excludeCategories https://github.com/apache/beam/blob/master/runners/flink/flink_runner.gradle#L298 (as Gradle is built for Java).
We filter out those tests for Flink here: https://github.com/apache/beam/blob/master/sdks/go/test/integration/integration.go#L181 Each runner has its own list of tests it can't run. Wildcards or the raw test name can be used too.
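For anyone following along, the per-runner filtering idea can be sketched roughly as below. This is only an illustration and not the actual code in integration.go: the `runnerFilters` map, the `shouldSkip` helper, and the `TestWordCount` name are all hypothetical, and it assumes each filter entry is treated as a regular expression matched against the full test name.

```go
package main

import (
	"fmt"
	"regexp"
)

// runnerFilters maps a runner name to patterns for tests that runner
// cannot run. Hypothetical layout for illustration; the real lists live
// in sdks/go/test/integration/integration.go and may be shaped differently.
var runnerFilters = map[string][]string{
	"flink": {
		"TestTestStreamSimple",   // raw test name
		"TestTestStreamToGBK",    // raw test name
		"TestTestStreamTimers.*", // wildcard covering the timer tests
	},
}

// shouldSkip reports whether testName matches any filter for the runner.
// Each pattern is anchored so a raw name only matches that exact test.
func shouldSkip(runner, testName string) bool {
	for _, pat := range runnerFilters[runner] {
		matched, err := regexp.MatchString("^(?:"+pat+")$", testName)
		if err == nil && matched {
			return true
		}
	}
	return false
}

func main() {
	// TestWordCount is a made-up name standing in for an unfiltered test.
	names := []string{"TestTestStreamSimple", "TestTestStreamTimersEventTime", "TestWordCount"}
	for _, name := range names {
		fmt.Printf("%s skipped on flink: %v\n", name, shouldSkip("flink", name))
	}
}
```

In a real suite, a check like this would presumably run at the start of each integration test and call `t.Skip` when it returns true, so the filtered test is reported as skipped rather than failed.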
Ah, this is great news. I had not dug very deeply into the failures. The whole branch looked like a mess, so I assumed that our GHA was just broken in a big way.
The PostCommit Go VR Flink is failing over 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Go_VR_Flink.yml?query=is%3Afailure+branch%3Amaster to see all failed workflow runs.
See also Grafana statistics: http://metrics.beam.apache.org/d/CTYdoxP4z/ga-post-commits-status?orgId=1&viewPanel=10&var-Workflow=PostCommit%20Go%20VR%20Flink