-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky tests #439
Comments
After some investigation, I discovered a peculiar issue in the tests, and I believe it could be the cause of the failures in several flaky tests we've been experiencing. I'll try to explain it as clearly as possible. Given that the initial error was in testAlwaysServeStaticContent test:
To try to resolve this issue, I attempted to set a fixed port for WireMock so that all tests would use the same port. In reality, the dynamics I will explain later are most likely the same thing that happened here. When the test in testAlwaysServeStaticContent starts, I see from the logs that the listener has been configured as follows: However, from the error, we see that Carapace is trying to call a port we did not expect. Looking at the preceding test, I see that it configured the port that the test testAlwaysServeStaticContent is launching.
To better understand and resolve this issue, I tried to choose a free port and assign it to WireMock. After that, the issue described below occurred. But I would say it is the same problem.
I identified the problem by simultaneously running all the tests in the BasicStandardEndpointMapperTest class and analyzing the two tests, testServeACMEChallengeToken and testAlwaysServeStaticContent, where the latter is failing. As you can see from my screenshot, the testServeACMEChallengeToken is executed before the testAlwaysServeStaticContent. Quite often, the testAlwaysServeStaticContent fails with the following error:
In theory, this should not be possible since the route configuration specifies that requests matching .not-exists. should be directed to an action serving a static page.
To understand the flow, I changed the names of the routes so that they were different between the two tests. From the logs, it is evident that the requests originating from the testAlwaysServeStaticContent method reach the server configured in the testServeACMEChallengeToken class. In test testServeACMEChallengeToken i set:
and i added some logs in ProxyRequestsManager (method processRequest):
And when i run testAlwaysServeStaticContent the result is:
I cannot comprehend why this happens. Perhaps the HTTP server in testServeACMEChallengeToken has not closed properly when the testAlwaysServeStaticContent starts? However, theoretically, in that case, testAlwaysServeStaticContent should not initiate since the port is still occupied. Or maybe there is another reason that I am not understanding? |
For the issue encountered above, I found some information here reactor/reactor-netty#495 (comment) and reactor/reactor-netty@152476f. It seems that when disposeNow() is called, only inactive connections are closed, so active ones can keep the channel open. To handle this, you can follow the workaround suggested in the comments. In essence, it's necessary to ensure that all tasks are completed. I tried the fix in this PR #457, and it seems to be working. |
@NiccoMlt For the issue of flaky cache tests, I found something interesting here ben-manes/caffeine#568. It seems to be our issue. However, I'm having difficulty reproducing it; it's very random. |
Great find @hamadodene, we are getting closer to the solution! |
@NiccoMlt Occasionally, the test fails with the following error:
I added some logs to understand the dynamics. The final code with logs is as follows:
The result of the test is as follows:
As you can see, in the last iteration of the while loop, we have this log: Cache: FINE content res_3: removed due to max size exceeded The timing of the removal may vary. Therefore, depending on when the removal is done for size exceeded, the test may pass or fail. When the test passes, we observe something like the following, where you can see that the value is removed from the cache after the assertThat has been executed.
|
Again, great write-up! Inspecting the provided block, it seems interesting the last part of the CachedContent result = cache.get(e.key);
System.out.println("Result at this point is for " + e.key + " is_ "+ result);
assertThat(result, is(e.payload));
System.out.println(" > cache.memSize=" + cache.getMemSize());
totSize += e.getMemUsage();
// In the last insertion cache.getMemSize() will exceed the maximum
if (totSize <= maxSize) {
assertThat(cache.getMemSize(), equalTo(totSize));
} else {
ok = true;
} You say that in the last loop iteration, when we reach max size, sometimes the cache is cleaned up and sometimes it doesn't, making the assertion Couldn't we just move this assertion inside the if (totSize <= maxSize) {
assertThat(result, is(e.payload));
assertThat(cache.getMemSize(), equalTo(totSize));
} ... |
@NiccoMlt What you're saying makes sense. I will create a dedicated branch for fixing these flaky tests directly in the main repository so that you can contribute without having to use my fork. |
New flaky test discovered in CI run https://github.com/diennea/carapaceproxy/runs/25798362438 for #475 org.carapaceproxy.server.cache.CacheContentLengthLimitTest ► testWithContentLengthHeader:
|
Also org.carapaceproxy.server.cache.CacheExpireTest seems to be flaky:
|
Some flaky tests were found while implementing #437:
See https://github.com/diennea/carapaceproxy/actions/runs/5005242440/jobs/8969440561
But for example here https://github.com/NiccoMlt/carapaceproxy/actions/runs/5005216296/jobs/8969398591 I see different tests failing:
The text was updated successfully, but these errors were encountered: