Server-side CPU Affinity problem caused by driver #645

Closed

greg-pendlebury opened this issue Mar 7, 2018 · 2 comments
greg-pendlebury commented Mar 7, 2018

Driver version or jar name

6.1.5 or greater; we have only tested up to 6.2.2.1, though.

SQL Server version

SQL Server 2012

Client operating system

Windows Server 2012 R2

Java/JVM version

1.8.0_152 (HotSpot)

Problem description

It looks like #160 (shipped from 6.1.5 onwards) introduces some highly undesirable behaviour regarding CPU affinity on servers with two NUMA nodes. That change opens two connections in sequence and then closes one of them. The server pins each incoming connection to a NUMA node in round-robin fashion, but because every second connection is closed, the remaining connections end up with a horribly imbalanced resource allocation.
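For illustration only, here is a minimal, self-contained Java sketch (not driver or server code) of the suspected interaction, assuming the server assigns incoming connections to NUMA nodes round-robin and the driver always keeps the first of each pair of parallel attempts:

public class AffinityImbalanceSketch {
    public static void main(String[] args) {
        final int numaNodes = 2;
        final int poolSize = 300;
        int nextNode = 0;                  // server-side round-robin counter
        int[] kept = new int[numaNodes];   // surviving connections per node

        for (int i = 0; i < poolSize; i++) {
            int first = nextNode++ % numaNodes;  // first attempt succeeds and is kept
            nextNode++;                          // second attempt is opened, then closed
            kept[first]++;
        }

        for (int node = 0; node < numaNodes; node++) {
            System.out.println("node " + node + ": " + kept[node] + " connections");
        }
        // Prints 300 connections on node 0 and 0 on node 1: the discarded
        // second attempt always absorbs the other node's round-robin turn.
    }
}

Under these assumptions the imbalance is total; in practice any unrelated connection arriving between the two attempts shifts the parity, which would explain the "flip" to the other node described below.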

Our Java application (JBoss EAP 7) opens up to 300 connections and maintains them under heavy use in a connection pool. We have observed all, or nearly all, connections allocated to a single node... sometimes a stray connection arriving while the pool is scaling up (like a DBA trying to work out what the hell is going on) will alter the pattern slightly, and all new connections then flip to the other node.

The server is then essentially constrained to operate at half capacity because half of the CPUs sit idle.

We have reproduced the issue in JMeter as well, using different driver versions and 'multiSubnetFailover=true'. The problem does not occur on versions 6.1.4 and below. Our assumption is that #160 is the cause, but perhaps that is incorrect.

Expected behavior and actual behavior

The connections should have been distributed evenly across NUMA nodes; instead they were mostly on a single node. Perfmon shows us that when people are not actively investigating by establishing their own connections, all 300 connections are pinned perfectly to a single NUMA node.

Repro code

  1. Use 'multiSubnetFailover=true' in your JDBC connection string.
  2. Connect to a server with two NUMA nodes.
  3. Maintain all open connections for the duration of the test.
  4. Run some needlessly expensive queries to exercise the CPUs (a sketch of such a harness follows the monitoring query below).
  5. Observe half of the CPUs on the machine saturate while the other half remain idle.
  6. DBAs can monitor affinity on the server like so:
--Displays connections per NUMA node for 'Microsoft JDBC Driver 6.1' client interfaces
select c.node_affinity, s.client_interface_name, count(*) as connections
from sys.dm_exec_connections c
join sys.dm_exec_sessions s on s.session_id = c.session_id
where s.client_interface_name = 'Microsoft JDBC Driver 6.1'
group by c.node_affinity, s.client_interface_name
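A minimal client-side sketch of steps 1-4 (in place of JMeter), assuming a hypothetical host, database, and credentials; it holds 300 connections open so the monitoring query above can be run while the load is applied:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class AffinityRepro {
    public static void main(String[] args) throws Exception {
        // Step 1: 'multiSubnetFailover=true' in the connection string.
        // Host, database, and credentials below are placeholders.
        String url = "jdbc:sqlserver://myserver:1433;databaseName=mydb;"
                   + "multiSubnetFailover=true;user=myuser;password=secret;";

        // Steps 2-3: open the connections up front and hold them all open.
        List<Connection> pool = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            pool.add(DriverManager.getConnection(url));
        }

        // Step 4: run a needlessly expensive query on each connection.
        for (Connection conn : pool) {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("select count(*) from sys.objects a cross join sys.objects b");
            }
        }

        // Steps 5-6: keep the pool open while affinity is inspected server-side.
        System.in.read();
        for (Connection conn : pool) {
            conn.close();
        }
    }
}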
ulvii (Contributor) commented Mar 16, 2018

Hi @greg-pendlebury ,

Thank you for creating the issue. I was able to reproduce it, and #160 does seem to be the cause. I also noticed that the connections are evenly distributed across NUMA nodes when multiSubnetFailover=false.

We will continue the investigation, please stay tuned :)

cheenamalhotra (Member) commented
Closing the issue since the PR has been merged in; the fix will be part of the 6.5.1 preview release.
