Driver version or jar name
6.1.5 or greater; we have only tested up to 6.2.2.1 though.
SQL Server version
SQL Server 2012
Client operating system
Windows Server 2012 R2
Java/JVM version
1.8.0_u152 Hotspot
Problem description
It looks like #160 (introduced in 6.1.5) causes some highly undesirable CPU-affinity behaviour on servers with two NUMA nodes. With that change, the driver opens two connections in sequence and then closes one of them. The server pins each incoming connection to a NUMA node in round-robin fashion, so when every second connection is closed, the remaining connections end up badly imbalanced across nodes.
Our Java application (JBoss EAP 7) maintains up to 300 connections in a heavily used connection pool. We have observed all, or nearly all, of those connections allocated to a single node. Occasionally a stray connection opened while the pool is scaling up (for example, a DBA trying to work out what on earth is going on) shifts the pattern slightly, and all new connections then flip to the other node.
The server is then effectively constrained to half capacity, because half of its CPUs sit idle.
We have reproduced this in JMeter as well, across different driver versions, using 'multiSubnetFailover=true'. The problem does not occur on versions 6.1.4 and below. Our assumption is that #160 is the cause, but perhaps that is incorrect.
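The imbalance follows directly from the open-two-close-one pattern. A toy simulation (an illustration of our hypothesis, not the driver's actual code) shows what happens if the server assigns nodes round-robin and the driver always closes the first of each pair of connections:

```java
import java.util.ArrayList;
import java.util.List;

public class AffinitySim {
    // Simulates round-robin NUMA node assignment where, for each pooled
    // connection, the driver opens two server connections and closes the first.
    static List<Integer> simulate(int poolSize, int numaNodes) {
        List<Integer> keptNodes = new ArrayList<>();
        int serverCounter = 0; // server's round-robin counter over accepted connections
        for (int i = 0; i < poolSize; i++) {
            serverCounter++;                            // probe connection: assigned a node, then closed
            keptNodes.add(serverCounter++ % numaNodes); // second connection: kept in the pool
        }
        return keptNodes;
    }

    public static void main(String[] args) {
        List<Integer> kept = simulate(300, 2);
        long onNode0 = kept.stream().filter(n -> n == 0).count();
        System.out.println("node 0: " + onNode0 + ", node 1: " + (kept.size() - onNode0));
        // prints: node 0: 0, node 1: 300 — every kept connection lands on the same node
    }
}
```

With two nodes, the closed connection always absorbs one parity of the round-robin sequence, so every surviving connection lands on the other node, matching what we see in production.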
Expected behavior and actual behavior
The connections should have been distributed evenly across the NUMA nodes; instead they were mostly on a single node. Perfmon shows that when nobody is actively investigating by establishing their own connections, all 300 connections are pinned perfectly to a single NUMA node.
Repro code
1. Use 'multiSubnetFailover=true' in your JDBC connection string.
2. Connect to a server with two NUMA nodes.
3. Maintain all open connections for the duration of the test.
4. Run some needlessly expensive queries to exercise the CPUs.
5. Observe half of the CPUs on the machine saturate and half remain idle.
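A minimal harness along these lines reproduces the connection side of the problem. The host, database, and pool size below are placeholders; credentials are omitted and must be supplied (user/password properties or integratedSecurity) for your environment:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.List;

public class NumaRepro {
    // Builds a JDBC URL with multiSubnetFailover enabled.
    // Host and database here are placeholders, not a real environment.
    static String buildUrl(String host) {
        return "jdbc:sqlserver://" + host + ";databaseName=master;multiSubnetFailover=true";
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl(args.length > 0 ? args[0] : "myserver:1433");
        System.out.println("Connecting with: " + url);
        if (args.length > 0) { // only connect when a real host is supplied
            List<Connection> pool = new ArrayList<>();
            for (int i = 0; i < 300; i++) {
                pool.add(DriverManager.getConnection(url)); // hold every connection open
            }
            Thread.sleep(Long.MAX_VALUE); // keep the pool alive while monitoring node_affinity
        }
    }
}
```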
DBAs monitor affinity on the server like so:
--Displays connections per NUMA node for 'Microsoft JDBC Driver 6.1' client interfaces
select c.node_affinity, s.client_interface_name, count(*) as connections
from sys.dm_exec_connections c
join sys.dm_exec_sessions s on s.session_id = c.session_id
where s.client_interface_name = 'Microsoft JDBC Driver 6.1'
group by c.node_affinity, s.client_interface_name
Thank you for creating the issue. I was able to reproduce it; #160 does appear to be the cause. I also noticed that the connections are evenly distributed across NUMA nodes when multiSubnetFailover=false.
We will continue the investigation, please stay tuned :)