flakey OS X integration test #2428
Comments
Tracking more data here (@envoyproxy/maintainers, please keep adding to this as you see more flakes). https://pastebin.com/NJu7N5ri shows Http2IntegrationTest taking ~40s and the test timing out after 315 seconds, so it's either some generic "bad expectation" or a CookieRoutingNoCookieNoTtl failure. I recall other tests flaking too, so I suspect a bad expectation, but it'd be nice to confirm.
got that one to fail once (after a few thousand repetitions). It's stuck waiting in
|
Internally we have an injector for trickle-write tests that one could (fairly) easily plug into the general integration tests. I'm thinking of adding one to Envoy, since I think packetization has caused most of the macOS behavioral differences, and I'd be happy to work on debugging these if only I could repro. Semi-related: we were discussing internally whether there is any way to set things up so other folks can repro, for macOS and/or as we add other build systems. Currently the only way is to send out a PR with debug logging which you tell folks to ignore, and open and close the issue a whole lot, which is clearly suboptimal :-P
Well, interesting. I decided to take the hacky approach and overwrote: int OwnedImpl::write(int fd) { return evbuffer_write(buffer_.get(), fd); } This causes a LOT of tests to fail. Sadly, http2_integration_test is still pretty solid for me. How many runs does it take on average to fail for you? I can up my --runs_per_test if I'm not being aggressive enough.
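For anyone following along who hasn't seen a trickle write before, here is a minimal sketch of the idea. It is not the internal injector mentioned above and not the change made to OwnedImpl::write; trickleWrite is a hypothetical helper that assumes a blocking POSIX file descriptor and simply caps each write() at a few bytes, so the peer may see the payload arrive in many small pieces instead of one large one.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Hypothetical helper for illustration only (not part of Envoy or libevent).
// Writes `data` to `fd` in pieces of at most `chunk` bytes.
// Assumes `fd` is a blocking descriptor; a non-blocking one would need
// EAGAIN handling, which this sketch treats as an error.
// Returns the number of write() calls that made progress, or -1 on error.
int trickleWrite(int fd, const std::string& data, std::size_t chunk) {
  assert(chunk > 0);
  int calls = 0;
  std::size_t offset = 0;
  while (offset < data.size()) {
    // Never hand the kernel more than `chunk` bytes at a time.
    const std::size_t want = std::min(chunk, data.size() - offset);
    const ssize_t n = ::write(fd, data.data() + offset, want);
    if (n < 0) {
      if (errno == EINTR) {
        continue; // Interrupted before writing anything; retry this piece.
      }
      return -1;
    }
    offset += static_cast<std::size_t>(n);
    ++calls;
  }
  return calls;
}
```

With chunk set to 1, a reader that wrongly assumes a whole frame or header block arrives in a single read is much more likely to trip, which is the kind of packetization difference suspected here.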
Another one with |
Another one in |
@alyssawilk to get the stack traces I ran the test directly from the command line in a loop, so I wasn't using --runs_per_test. I tried to get it to reproduce today with bazel and --runs_per_test, and it's not as reliable as it was before.
Every time I try to add extra logging to help me understand the problem, it disappears. I can try throwing dtrace at it. Maybe that will turn up something. |
@zuercher I hate to propose this, but what if on OSX we retry the bazel test command once if it fails? (Or maybe there is something built-in to bazel to retest). This is obviously not great, but would probably cause almost all of the flakes to go away. |
I think the |
Adds the --flaky_test_attempts flag for OS X CI builds to paper over the flaky tests noted in #2428. Tests with "integration" in their names will be retried on failure, and the CI will succeed if the retry succeeds.
Risk Level: Low
Testing: n/a
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Stephan Zuercher <[email protected]>
Per @mattklein123's suggestion, we now retry integration tests on OS X once if they fail, to make the flakiness less disruptive.
Adds the --flaky_test_attempts flag for OS X CI builds to paper over the flaky tests noted in envoyproxy#2428. Tests with "integration" in their names will be retried on failure, and the CI will succeed if the retry succeeds.
Risk Level: Low
Testing: n/a
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Stephan Zuercher <[email protected]>
Signed-off-by: Rama <[email protected]>
I'm closing this as it seems pretty stable w/ the retry. |
…nvoyproxy#2428) Signed-off-by: Douglas Reid <[email protected]>
Signed-off-by: GitHub Action <[email protected]> Signed-off-by: JP Simard <[email protected]>
Description:
test/integration:http2_integration_test is flakey on OS X.
I believe the exact test that hangs varies, so it's probably some timing issue in the test harness that OS X exposes as compared to Linux. I've not had any luck reproducing this outside of CI.