Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close SocketChannel in TcpNioClientConnectionFactory.buildNewConnection() #9694

Closed
anthologia opened this issue Dec 5, 2024 · 6 comments · Fixed by #9696
Closed

Close SocketChannel in TcpNioClientConnectionFactory.buildNewConnection() #9694

anthologia opened this issue Dec 5, 2024 · 6 comments · Fixed by #9696

Comments

@anthologia
Copy link
Contributor

Expected Behavior

  • When a connection attempt fails, the SocketChannel should be properly closed, and resources should be released.
  • Socket management should prevent the number of open sockets from increasing indefinitely during retries.

Current Behavior

  • When the target server is down and TcpNioClientConnectionFactory.buildNewConnection() attempts to connect, java.net.ConnectException: Connection refused: no further information is thrown. However, the socket connection is not closed, and the connection retry interval causes the number of sockets to increase indefinitely.
  • When using singleUse=false, clientMode=true, and retryInterval, the number of open sockets increases indefinitely until the target server is up.
// TcpNioClientConnectionFactory class

@Override
protected TcpConnectionSupport buildNewConnection() {
	try {
		SocketChannel socketChannel = SocketChannel.open();
		setSocketAttributes(socketChannel.socket());
		connect(socketChannel);
		TcpNioConnection connection =
				this.tcpNioConnectionSupport.createNewConnection(socketChannel, false, isLookupHost(),
						getApplicationEventPublisher(), getComponentName());
		connection.setUsingDirectBuffers(this.usingDirectBuffers);
		connection.setTaskExecutor(getTaskExecutor());
		Integer sslHandshakeTimeout = getSslHandshakeTimeout();
		if (sslHandshakeTimeout != null && connection instanceof TcpNioSSLConnection) {
			((TcpNioSSLConnection) connection).setHandshakeTimeout(sslHandshakeTimeout);
		}
		TcpConnectionSupport wrappedConnection = wrapConnection(connection);
		if (!wrappedConnection.equals(connection)) {
			connection.setSenders(getSenders());
		}
		initializeConnection(wrappedConnection, socketChannel.socket());
		if (getSoTimeout() > 0) {
			connection.setLastRead(System.currentTimeMillis());
		}
		this.channelMap.put(socketChannel, connection);
		wrappedConnection.publishConnectionOpenEvent();
		this.newChannels.add(socketChannel);
		this.selector.wakeup();
		return wrappedConnection;
	}
	catch (IOException e) {
		throw new UncheckedIOException(e);
	}
	catch (InterruptedException e) {
		Thread.currentThread().interrupt();
		throw new UncheckedIOException(new IOException(e));
	}
}

private void connect(SocketChannel socketChannel) throws IOException, InterruptedException {
	socketChannel.configureBlocking(false);
	socketChannel.connect(new InetSocketAddress(getHost(), getPort()));
	boolean connected = socketChannel.finishConnect();
	long timeLeft = getConnectTimeout().toMillis();
	while (!connected && timeLeft > 0) {
		Thread.sleep(5); // NOSONAR Magic #
		connected = socketChannel.finishConnect();
		timeLeft -= 5; // NOSONAR Magic #
	}
	if (!connected) {
	throw new IOException("Not connected after connectTimeout");
		}
}

Context

  • I believe this may cause resource leakage, as each failed connection attempt opens a new socket that is never closed.
  • I think that when the server is down and the connection fails, the socket should be properly closed, and resources should be released.
  • I'm wondering if adding socketChannel.close() in the event of a connection failure would be a good solution, or if I might be misunderstanding the intended behavior of TcpNioClientConnectionFactory.buildNewConnection().
@anthologia anthologia added status: waiting-for-triage The issue need to be evaluated and its future decided type: enhancement labels Dec 5, 2024
@artembilan artembilan added this to the 6.4.1 milestone Dec 5, 2024
@artembilan
Copy link
Member

I'm not sure how to test it, by I believe that your observation is correct.
Feel free to contribute the fix: https://github.com/spring-projects/spring-integration/blob/main/CONTRIBUTING.adoc.
I'm curious what kind of test you see to verify this logic.
See TcpNioConnectionTests for more info.
Thanks

anthologia pushed a commit to anthologia/spring-integration that referenced this issue Dec 5, 2024
@anthologia
Copy link
Contributor Author

This is what i found a socket leak issue.
Please check with netstat or lsof while running this test:

@Test
public void test() throws Exception {
	final CountDownLatch latch = new CountDownLatch(1);

	TcpNioClientConnectionFactory factory =
			new TcpNioClientConnectionFactory("1.2.3.4", 11111);
	factory.setSingleUse(false);
	factory.setConnectTimeout(0);
	factory.start();

	ThreadPoolTaskScheduler taskScheduler = new ThreadPoolTaskScheduler();
	taskScheduler.initialize();

	TcpSendingMessageHandler handler = new TcpSendingMessageHandler();
	handler.setConnectionFactory(factory);
	handler.setClientMode(true);
	handler.setRetryInterval(1000);
	handler.setTaskScheduler(taskScheduler);
	handler.start();

	latch.await(5, TimeUnit.SECONDS);
}

Each retry attempt creates a new socket as exceptions occur, leading to socket resource leakage.

before

@artembilan
Copy link
Member

That's good!
Any idea how would we verify that from unit test?
Isn't there some tool in Java which could that something similar for us?
Thanks

@anthologia
Copy link
Contributor Author

It seems we can use Runtime to execute OS commands to check this. However, this approach would require different commands depending on the OS.

I believe tests should succeed in an OS-independent manner. However, I found that one of the classes in Spring Integration uses tail, which is a Unix-specific command.

Given that, should I assume that only Unix environments need to be supported?

@artembilan
Copy link
Member

Thanks for the explanation !
I don’t think that we can use tail because that one is there in the framework only for files. Plus indeed OS-specific . However it works in some way on my Windows with WSL.
Either way, we don’t want to be tied to some not stable tool just to be sure that the fix we did is covered.
I believe this is a situation when your analysis is fully enough to justify the change.
I’ll review it tomorrow again and merge.
Thank you again !

@anthologia
Copy link
Contributor Author

I'd like to add a comment as there seems to be a misunderstanding !
The mention of tail wasn't about me wanting to use it for verification. I was actually planning to use netstat or lsof. However, I wanted to get your advice on whether taking an OS-specific verification approach was the right direction. (To be honest, I hadn't thought about WSL.)
Anyway, I agree with your opinion ! I'll revise according to your review and submit it again soon.
Thank you for your response !

anthologia added a commit to anthologia/spring-integration that referenced this issue Dec 6, 2024
Fixes: spring-projects#9694
Issue link: spring-projects#9694

- Add null check before closing socketChannel in catch block
- Prevents potential NullPointerException during error handling
artembilan pushed a commit that referenced this issue Dec 6, 2024
Fixes: #9694
Issue link: #9694

- Add null check before closing `socketChannel` in catch block
- Prevents potential `NullPointerException` during error handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants