Flaky integration tests fix #343

Merged
merged 1 commit into from
Nov 14, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main_push_workflow.yml
@@ -34,4 +34,4 @@ jobs:
         run: ./gradlew build test
       - name: Build in Linux
         if: runner.os == 'Linux'
-        run: ./gradlew build check test integrationTest
+        run: ./gradlew build check test integrationTest -i
Contributor

Is this to improve the readout on the build?

Contributor Author

Exactly. When the test failed, I noticed there were almost no details about the failure.

Contributor

ok cool, just wanted to make sure it was on purpose

@@ -26,6 +26,7 @@
 import java.nio.file.Files;
 import java.time.Duration;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
@@ -118,9 +119,16 @@ static LocalStackContainer createS3Container() {
                 .withServices(LocalStackContainer.Service.S3);
     }

-    static int getRandomPort() throws IOException {
-        try (ServerSocket socket = new ServerSocket(0)) {
-            return socket.getLocalPort();
+    /**
+     * Finds 2 simultaneously free port for Kafka listeners
+     *
+     * @return list of 2 ports
+     * @throws IOException
+     *             when port allocation failure happens
+     */
+    static List<Integer> getKafkaListenerPorts() throws IOException {
+        try (ServerSocket socket = new ServerSocket(0); ServerSocket socket2 = new ServerSocket(0)) {
+            return Arrays.asList(socket.getLocalPort(), socket2.getLocalPort());
         } catch (IOException e) {
             throw new IOException("Failed to allocate port for test", e);
         }
Comment on lines +129 to 134
Contributor

Won't this code lead to hard-to-detect collisions and flaky tests?

Reasoning:
1. The socket is opened and a port is assigned.
2. We save the port number.
3. We close the socket as we exit the try block. (TIMEOUT LINGER now applies to the port number.)
4. The calling code then uses the port numbers.

If two tests are running and the timing is just right, the second test can retrieve the same port numbers that the first test retrieved.

Solution:
Return the ServerSocket objects rather than port numbers. Then you can get as many sockets as needed and hold them until the code is finished.
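
For illustration only, the suggestion above could look roughly like the following. This is a minimal sketch, not code from this PR: the class name ReservedPorts and its methods are hypothetical. It keeps the ServerSocket instances open so the operating system cannot hand the same ports to anyone else while they are held. Note that the sockets still have to be closed before another process (for example a Kafka broker) can bind those ports, so holding them narrows the race window rather than removing it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of the PR): keeps bound sockets open so their ports stay reserved. */
final class ReservedPorts implements AutoCloseable {
    private final List<ServerSocket> sockets = new ArrayList<>();

    /** Binds {@code count} sockets to OS-assigned ports and keeps them open. */
    static ReservedPorts reserve(final int count) throws IOException {
        final ReservedPorts reserved = new ReservedPorts();
        try {
            for (int i = 0; i < count; i++) {
                reserved.sockets.add(new ServerSocket(0)); // port 0 asks the OS for any free port
            }
            return reserved;
        } catch (IOException e) {
            reserved.close(); // do not leak sockets opened before the failure
            throw e;
        }
    }

    /** Returns the ports of the sockets currently held open. */
    List<Integer> ports() {
        final List<Integer> result = new ArrayList<>();
        for (final ServerSocket socket : sockets) {
            result.add(socket.getLocalPort());
        }
        return Collections.unmodifiableList(result);
    }

    /** Releases every held port; the first close failure, if any, is rethrown. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (final ServerSocket socket : sockets) {
            try {
                socket.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        sockets.clear();
        if (first != null) {
            throw first;
        }
    }

    public static void main(final String[] args) throws IOException {
        try (ReservedPorts reserved = ReservedPorts.reserve(2)) {
            System.out.println("Held while open: " + reserved.ports());
        } // ports are released here, immediately before they would be handed to the service
    }
}
```

The try-with-resources block in main marks the span during which the ports are guaranteed to be unavailable to other callers of new ServerSocket(0).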

Contributor

Reserving a free port for future use is pretty tricky to do: even if you return the ServerSocket, you still have to close it before the port can be reused, and at that moment any process on the host machine (including parallel tests) could sneak right in and occupy it.

The right thing to do might be to detect whether that specific failure occurred (the supposedly free port is no longer free) and retry, but I'd take the view that this change makes the test less flaky than it was, which is enough for now. The two free ports can no longer be assigned the same value within the same test.
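
A detect-and-retry approach along those lines might be sketched as follows. Everything here is hypothetical and not part of this PR: the PortRetry class, the PortConsumer callback and the attempt limit are illustrative names. The sketch also assumes the startup code surfaces a taken port as a java.net.BindException; a real Kafka or Connect startup may report the failure differently (for example wrapped in another exception), in which case the catch clause would need adapting.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: probe for two free ports and retry if one is taken before it can be used. */
final class PortRetry {

    /** Stand-in for whatever starts the service on the given ports (an assumption, not a real API). */
    interface PortConsumer {
        void start(List<Integer> ports) throws IOException;
    }

    /** Same probing technique as in the diff: both sockets are open at once, so the ports differ. */
    static List<Integer> findTwoFreePorts() throws IOException {
        try (ServerSocket first = new ServerSocket(0); ServerSocket second = new ServerSocket(0)) {
            return Arrays.asList(first.getLocalPort(), second.getLocalPort());
        }
    }

    /** Picks fresh ports and retries whenever the consumer reports that a port is already bound. */
    static List<Integer> startWithRetry(final int maxAttempts, final PortConsumer consumer) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            final List<Integer> ports = findTwoFreePorts();
            try {
                consumer.start(ports);
                return ports;
            } catch (BindException e) {
                last = e; // the port was taken between probing and use; probe again
            }
        }
        throw new IOException("No usable ports after " + maxAttempts + " attempts", last);
    }

    public static void main(final String[] args) throws IOException {
        final List<Integer> ports = startWithRetry(3, p -> System.out.println("Starting on " + p));
        System.out.println("Started on " + ports);
    }
}
```

Only BindException triggers a retry; any other IOException from the consumer propagates immediately, so unrelated startup failures are not hidden behind repeated attempts.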

Contributor Author

@Claudenw AFAIU, you are talking about parallel test execution, which we are not doing as of now.

@@ -123,8 +123,9 @@ void setUp(final TestInfo testInfo) throws Exception {
         testBucketAccessor.createBucket();

         connectRunner = new ConnectRunner(OFFSET_FLUSH_INTERVAL_MS);
-        final int localListenerPort = IntegrationBase.getRandomPort();
-        final int containerListenerPort = IntegrationBase.getRandomPort();
+        final List<Integer> ports = IntegrationBase.getKafkaListenerPorts();
+        final int localListenerPort = ports.get(0);
+        final int containerListenerPort = ports.get(1);
         connectRunner.startConnectCluster(CONNECTOR_NAME, localListenerPort, containerListenerPort);

         adminClient = newAdminClient(connectRunner.getBootstrapServers());