Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky tests #439

Open
NiccoMlt opened this issue May 17, 2023 · 10 comments
Open

Flaky tests #439

NiccoMlt opened this issue May 17, 2023 · 10 comments

Comments

@NiccoMlt
Copy link
Contributor

NiccoMlt commented May 17, 2023

Some flaky tests were found while implementing #437:

Report Passed Failed Skipped Time
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheContentLengthLimitTest.xml 1✅ 1❌   22s
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheExpireTest.xml 2✅ 3❌   33s
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.NotModifiedTest.xml   1❌   7s
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.mapper.BasicStandardEndpointMapperTest.xml 6✅ 1❌   48s

See https://github.com/diennea/carapaceproxy/actions/runs/5005242440/jobs/8969440561

But for example here https://github.com/NiccoMlt/carapaceproxy/actions/runs/5005216296/jobs/8969398591 I see different tests failing:

Report Passed Failed Skipped Time
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheContentLengthLimitTest.xml 1✅ 1❌   22s
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheExpireTest.xml 4✅ 1❌   32s
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CaffeineCacheImplTest.xml 2✅ 1❌   3s
carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.NotModifiedTest.xml   1❌   7s
@hamadodene
Copy link
Contributor

hamadodene commented Feb 16, 2024

After some investigation, I discovered a peculiar issue in the tests, and I believe it could be the cause of the failures in several flaky tests we've been experiencing. I'll try to explain it as clearly as possible.

Given that the initial error was in testAlwaysServeStaticContent test:

AVVERTENZA: [aed4c0cc] Unknown channel option 'TCP_KEEPCOUNT' for channel '[id: 0xaed4c0cc]'
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:56379
Caused by: java.net.ConnectException: Connection refused: no further information
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)

To try to resolve this issue, I attempted to set a fixed port for WireMock so that all tests would use the same port. In reality, the dynamics I will explain later are most likely the same thing that happened here.

When the test in testAlwaysServeStaticContent starts, I see from the logs that the listener has been configured as follows:
INFO: configured backend foo localhost:56.385 enabled:true

However, from the error, we see that Carapace is trying to call a port we did not expect. Looking at the preceding test, I see that it configured the port that the test testAlwaysServeStaticContent is launching.

INFORMAZIONI: configured backend foo localhost:56.379 enabled:true

To better understand and resolve this issue, I tried to choose a free port and assign it to WireMock. After that, the issue described below occurred. But I would say it is the same problem.


public class BasicStandardEndpointMapperTest {

    static int port;

    static {
        try {
            // Get a free port
            ServerSocket s = new ServerSocket(0);
            port = s.getLocalPort();
            s.close();

        } catch (IOException e) {
            // No OPS
        }
    }
    @Rule
    public WireMockRule backend1 = new WireMockRule(port);

I identified the problem by simultaneously running all the tests in the BasicStandardEndpointMapperTest class and analyzing the two tests, testServeACMEChallengeToken and testAlwaysServeStaticContent, where the latter is failing.

As you can see from my screenshot, the testServeACMEChallengeToken is executed before the testAlwaysServeStaticContent. Quite often, the testAlwaysServeStaticContent fails with the following error:

image


feb 16, 2024 3:51:54 PM org.carapaceproxy.core.Listeners bootListener
INFORMAZIONI: started listener at HostPort[host=0.0.0.0, port=1425]: reactor.netty.transport.ServerTransport$InetDisposableBind@402f8592
feb 16, 2024 3:51:54 PM org.carapaceproxy.server.cache.CacheByteBufMemoryUsageMetric start
INFORMAZIONI: Starting cache ByteBufAllocator usage, period: 5 seconds
request /not-exists.html
action match default-testServeAcme
feb 16, 2024 3:51:54 PM org.carapaceproxy.core.RequestsLogger logRequest
GRAVE: Request [2024-02-16 15:51:54.430] [GET localhost:1425 /not-exists.html] [uid:, sid:, ip:127.0.0.1] server=127.0.0.1, act=PROXY, route=default-testServeAcme, backend=localhost:49201. time t=0ms b=-1708095114430ms, protocol=HTTP/1.1 not logged to access log because RequestsLogger is closed
feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn
AVVERTENZA: [d555f853] Unknown channel option 'TCP_KEEPCOUNT' for channel '[id: 0xd555f853]'
feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn
AVVERTENZA: [d555f853] Unknown channel option 'TCP_KEEPINTERVAL' for channel '[id: 0xd555f853]'
feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn
AVVERTENZA: [d555f853] Unknown channel option 'TCP_KEEPIDLE' for channel '[id: 0xd555f853]'
feb 16, 2024 3:51:54 PM com.github.tomakehurst.wiremock.common.Slf4jNotifier error
GRAVE: 
                                               Request was not matched
                                               =======================

-----------------------------------------------------------------------------------------------------------------------
| Closest stub                                             | Request                                                  |
-----------------------------------------------------------------------------------------------------------------------
                                                           |
GET                                                        | GET
/seconda.html                                              | /not-exists.html                                    <<<<< URL does not match
                                                           |
                                                           |
-----------------------------------------------------------------------------------------------------------------------

feb 16, 2024 3:51:59 PM org.eclipse.jetty.server.AbstractConnector doStop

In theory, this should not be possible since the route configuration specifies that requests matching .not-exists. should be directed to an action serving a static page.


public void testAlwaysServeStaticContent() throws Exception {

        stubFor(get(urlEqualTo("/seconda.html"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "text/html")
                        .withBody("it <b>works</b> !!")));

        try (HttpProxyServer server = new HttpProxyServer(null, tmpDir.newFolder())) {

            {
                Properties configuration = new Properties();
                configuration.put("backend.1.id", "foo");
                configuration.put("backend.1.host", "localhost");
                configuration.put("backend.1.port", String.valueOf(port));
                configuration.put("backend.1.enabled", "true");

                configuration.put("director.1.id", "*");
                configuration.put("director.1.backends", "*");
                configuration.put("director.1.enabled", "true");

                configuration.put("listener.1.host", "0.0.0.0");
                configuration.put("listener.1.port", "1425");
                configuration.put("listener.1.ssl", "false");
                configuration.put("listener.1.enabled", "true");

                configuration.put("route.10.id", "default");
                configuration.put("route.10.enabled", "true");
                configuration.put("route.10.match", "all");
                configuration.put("route.10.action", "proxy-all");

                configuration.put("action.1.id", "serve-static");
                configuration.put("action.1.enabled", "true");
                configuration.put("action.1.type", "static");
                configuration.put("action.1.file", CLASSPATH_RESOURCE + "/test-static-page.html");
                configuration.put("action.1.code", "200");

                configuration.put("route.1.id", "static-page");
                configuration.put("route.1.enabled", "true");
                configuration.put("route.1.match", "request.uri ~ \".*index.*\"");
                configuration.put("route.1.action", "serve-static");

                configuration.put("action.2.id", "static-not-exists"); // file not exists
                configuration.put("action.2.enabled", "true");
                configuration.put("action.2.type", "static");
                configuration.put("action.2.file", CLASSPATH_RESOURCE + "/not-exists.html");
                configuration.put("action.2.code", "200");

                configuration.put("route.2.id", "static-not-exists");
                configuration.put("route.2.enabled", "true");
                configuration.put("route.2.match", "request.uri ~ \".*not-exists.*\"");
                configuration.put("route.2.action", "static-not-exists");

                PropertiesConfigurationStore config = new PropertiesConfigurationStore(configuration);
                server.configureAtBoot(config);
            }

            server.start();

To understand the flow, I changed the names of the routes so that they were different between the two tests. From the logs, it is evident that the requests originating from the testAlwaysServeStaticContent method reach the server configured in the testServeACMEChallengeToken class.

In test testServeACMEChallengeToken i set:

  configuration.put("route.10.id", "default-testServeAcme");
  configuration.put("route.10.enabled", "true");
  configuration.put("route.10.match", "all");
  configuration.put("route.10.action", "proxy-all");

and i added some logs in ProxyRequestsManager (method processRequest):

  public Publisher<Void> processRequest(ProxyRequest request) {
       request.setStartTs(System.currentTimeMillis());
       request.setLastActivity(request.getStartTs());

       parent.getFilters().forEach(filter -> filter.apply(request));

       System.out.println("request " + request.getUri());
       MapResult action = parent.getMapper().map(request);
       System.out.println("action match " + action.getRouteId());
       if (action == null) {
           LOGGER.log(Level.INFO, "Mapper returned NULL action for {0}", this);
           action = MapResult.internalError(MapResult.NO_ROUTE);
       }

And when i run testAlwaysServeStaticContent the result is:

NFORMAZIONI: Starting cache ByteBufAllocator usage, period: 5 seconds
request /not-exists.html
action match default-testServeAcme
feb 16, 2024 3:51:54 PM org.carapaceproxy.core.RequestsLogger logRequest
GRAVE: Request [2024-02-16 15:51:54.430] [GET localhost:1425 /not-exists.html] [uid:, sid:, ip:127.0.0.1] server=127.0.0.1, act=PROXY, route=default-testServeAcme, backend=localhost:49201. time t=0ms b=-1708095114430ms, protocol=HTTP/1.1 not logged to access log because RequestsLogger is closed
feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn

I cannot comprehend why this happens. Perhaps the HTTP server in testServeACMEChallengeToken has not closed properly when the testAlwaysServeStaticContent starts? However, theoretically, in that case, testAlwaysServeStaticContent should not initiate since the port is still occupied.

Or maybe there is another reason that I am not understanding?

@hamadodene
Copy link
Contributor

hamadodene commented Feb 21, 2024

For the issue encountered above, I found some information here reactor/reactor-netty#495 (comment) and reactor/reactor-netty@152476f.

It seems that when disposeNow() is called, only inactive connections are closed, so active ones can keep the channel open. To handle this, you can follow the workaround suggested in the comments. In essence, it's necessary to ensure that all tasks are completed.

I tried the fix in this PR #457, and it seems to be working.

@hamadodene
Copy link
Contributor

@NiccoMlt For the issue of flaky cache tests, I found something interesting here ben-manes/caffeine#568.

It seems to be our issue. However, I'm having difficulty reproducing it; it's very random.

@NiccoMlt
Copy link
Contributor Author

Great find @hamadodene, we are getting closer to the solution!

@hamadodene
Copy link
Contributor

@NiccoMlt
I analyzed one of the flaky cache tests and found something interesting.
The test I examined is: org.carapaceproxy.server.cache.CaffeineCacheImplTest --> testMaxSize()

Occasionally, the test fails with the following error:

java.lang.AssertionError:
Expected: is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
     but: was null
Expected: is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
Actual: null

I added some logs to understand the dynamics. The final code with logs is as follows:


 @Test
    public void testMaxSize() throws Exception {
        final int maxSize = 1000;
        initializeCache(maxSize);

        long sleepBetweenPuts = 100;

        List<CacheEntry> entries = new ArrayList<>();
        int c = 0;
        long totSize = 0;
        boolean ok = false;
        while (!ok) {
            CacheEntry e = genCacheEntry("res_" + (c++), 100, 0);
            System.out.println("Adding element " + e.key.uri + " of size=" + e.getMemUsage());
            entries.add(e);
            System.out.println("Payload to add  for key " + e.key  + " is: " + e.payload);
            cache.put(e.key, e.payload);
            System.out.println("payload at this point is " + e.payload);
            CachedContent result = cache.get(e.key);
            System.out.println("Result at this point is for " + e.key + " is_  "+ result);
            assertThat(result, is(e.payload));
            System.out.println(" > cache.memSize=" + cache.getMemSize());
            totSize += e.getMemUsage();
            // In the last insertion cache.getMemSize() will exceed the maximum
            if (totSize <= maxSize) {
                assertThat(cache.getMemSize(), equalTo(totSize));
            } else {
                ok = true;
            }
            runEviction(cache, null);
            Thread.sleep(sleepBetweenPuts);
        }

        // Now we should have put in our cache much elements as it can contain + 1. The first one should be evicted.
        runEviction(cache, 1);
        assertThat(evictedResources.size(), is(1));

        // Doesn't work. It currently seems that the eviction process always evitcts the last entered key, which is
        // undoubtedly a serious issue
//        for (int i=0; i<entries.size(); i++) {
//            if (i == 0) {
//                assertThat(evictedResources.get(0), equalTo(entries.get(i)));
//                assertThat(evictedResources.get(0).removalCause, equalTo(RemovalCause.SIZE));
//            } else {
//                assertThat(cache.get(entries.get(i).key), equalTo(entries.get(i).payload));
//            }
//        }
//        entries.remove(0);
        CacheEntry evictedEntry = evictedResources.get(0);
        assertTrue(entries.remove(evictedEntry));

        assertThat((int) cache.getMemSize(), equalTo(entries.stream().mapToInt(e -> (int) e.getMemUsage()).sum()));
        assertThat((int) cache.getSize(), is(entries.size()));

        assertThat((int) stats.getDirectMemoryUsed(), is(entries.stream().mapToInt(e -> (int) e.payload.directSize).sum()));
        assertThat((int) stats.getHeapMemoryUsed(), is(entries.stream().mapToInt(e -> (int) e.payload.heapSize).sum()));
        assertThat((int) stats.getTotalMemoryUsed(), is(entries.stream().mapToInt(e -> (int) e.getMemUsage()).sum()));
    }

The result of the test is as follows:

Adding element res_0 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_0} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_0 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_0} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
 > cache.memSize=314
Adding element res_1 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_1} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_1 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_1} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
 > cache.memSize=628
Adding element res_2 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_2} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_2 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_2} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
 > cache.memSize=942
Adding element res_3 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_3} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_3 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}
Cache: FINE content res_3: removed due to max size exceeded
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_3} is_  null

java.lang.AssertionError: 
Expected: is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
     but: was null
Expected :is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
Actual   :null

As you can see, in the last iteration of the while loop, we have this log: Cache: FINE content res_3: removed due to max size exceeded
In essence, the problem is that the content is removed from the cache before the assertThat is executed. This is because the maximum size of the cache has been exceeded.

The timing of the removal may vary. Therefore, depending on when the removal is done for size exceeded, the test may pass or fail.

When the test passes, we observe something like the following, where you can see that the value is removed from the cache after the assertThat has been executed.


Adding element res_0 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_0} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_0 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_0} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
 > cache.memSize=314
Adding element res_1 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_1} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_1 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_1} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
 > cache.memSize=628
Adding element res_2 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_2} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_2 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_2} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
 > cache.memSize=942
Adding element res_3 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_3} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_3 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_3} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
 > cache.memSize=1256
Cache: FINE content res_3: removed due to max size exceeded

@NiccoMlt
Copy link
Contributor Author

Again, great write-up!

Inspecting the provided block, it seems interesting the last part of the while loop:

CachedContent result = cache.get(e.key);
System.out.println("Result at this point is for " + e.key + " is_  "+ result);
assertThat(result, is(e.payload));
System.out.println(" > cache.memSize=" + cache.getMemSize());
totSize += e.getMemUsage();
// In the last insertion cache.getMemSize() will exceed the maximum
if (totSize <= maxSize) {
    assertThat(cache.getMemSize(), equalTo(totSize));
} else {
    ok = true;
}

You say that in the last loop iteration, when we reach max size, sometimes the cache is cleaned up and sometimes it doesn't, making the assertion assertThat(result, is(e.payload)) fail because there is actually no payload resulting from the get, because the cache was cleaned.

Couldn't we just move this assertion inside the if branch? Something like:

if (totSize <= maxSize) {
    assertThat(result, is(e.payload));
    assertThat(cache.getMemSize(), equalTo(totSize));
} ...

@hamadodene
Copy link
Contributor

@NiccoMlt What you're saying makes sense. I will create a dedicated branch for fixing these flaky tests directly in the main repository so that you can contribute without having to use my fork.

@NiccoMlt
Copy link
Contributor Author

NiccoMlt commented Jun 5, 2024

New flaky test discovered in CI run https://github.com/diennea/carapaceproxy/runs/25798362438 for #475

org.carapaceproxy.server.cache.CacheContentLengthLimitTest ► testWithContentLengthHeader:

java.lang.AssertionError: 

Expected: is <1>
     but: was <0>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
	at org.carapaceproxy.server.cache.CacheContentLengthLimitTest.requestAndTestCached(CacheContentLengthLimitTest.java:156)
	at org.carapaceproxy.server.cache.CacheContentLengthLimitTest.testFileSizeCache(CacheContentLengthLimitTest.java:112)
	at org.carapaceproxy.server.cache.CacheContentLengthLimitTest.testWithContentLengthHeader(CacheContentLengthLimitTest.java:65)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
	at com.github.tomakehurst.wiremock.junit.WireMockRule$1.evaluate(WireMockRule.java:67)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:316)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:240)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:214)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:155)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)

@NiccoMlt
Copy link
Contributor Author

Also org.carapaceproxy.server.cache.CacheExpireTest seems to be flaky:

✅ testHandleExpiresMissingFromServer
❌ testHandleExpiresFromServer
	java.lang.AssertionError
❌ testExpireContentWithoutGet
	java.lang.AssertionError
❌ testDoNotCacheExpiredContent
	java.lang.AssertionError
❌ testExpireContentOnGet
	java.lang.AssertionError

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants