Server-side CPU Affinity problem caused by driver #645

Closed

greg-pendlebury opened this issue Mar 7, 2018 · 2 comments
greg-pendlebury commented Mar 7, 2018

Driver version or jar name

6.1.5 or greater; we have only tested up to 6.2.2.1, though.

SQL Server version

SQL Server 2012

Client operating system

Windows Server 2012 R2

Java/JVM version

1.8.0_152 (HotSpot)

Problem description

It looks like #160 (shipped from 6.1.5 onwards) introduces some highly undesirable behaviour regarding CPU affinity on servers with two NUMA nodes. That change opens two connections in sequence and then closes one of them. The server pins each incoming connection to a NUMA node in round-robin fashion, but because every second connection is closed, the remaining connections end up with a horribly imbalanced resource allocation.
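For illustration only, here is a minimal, self-contained Java sketch (not driver or server code) of the suspected interaction, assuming the server assigns incoming connections to NUMA nodes round-robin and the driver always keeps the first of each pair of parallel attempts:

public class AffinityImbalanceSketch {
    public static void main(String[] args) {
        final int numaNodes = 2;
        final int poolSize = 300;
        int nextNode = 0;                  // server-side round-robin counter
        int[] kept = new int[numaNodes];   // surviving connections per node

        for (int i = 0; i < poolSize; i++) {
            int first = nextNode++ % numaNodes;  // first attempt succeeds and is kept
            nextNode++;                          // second attempt is opened, then closed
            kept[first]++;
        }

        for (int node = 0; node < numaNodes; node++) {
            System.out.println("node " + node + ": " + kept[node] + " connections");
        }
        // Prints 300 connections on node 0 and 0 on node 1: the discarded
        // second attempt always absorbs the other node's round-robin turn.
    }
}

Under these assumptions the imbalance is total; in practice any unrelated connection arriving between the two attempts shifts the parity, which would explain the "flip" to the other node described below.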

Our Java application (JBoss EAP 7) opens up to 300 connections and maintains them under heavy use in a connection pool. We have observed all, or nearly all, connections allocated to a single node... sometimes a stray connection arriving while the pool is scaling up (like a DBA trying to work out what the hell is going on) will alter the pattern slightly, and all new connections then flip to the other node.

The server is then essentially constrained to operate at half capacity because half of the CPUs sit idle.

We have reproduced the issue in JMeter as well, using different driver versions and 'multiSubnetFailover=true'. The problem does not occur on versions 6.1.4 and below. Our assumption is that #160 is the cause, but perhaps that is incorrect.

Expected behavior and actual behavior

The connections should have been distributed evenly across NUMA nodes; instead they were mostly on a single node. Perfmon shows us that when people are not actively investigating by establishing their own connections, all 300 connections are pinned perfectly to a single NUMA node.

Repro code

  1. Use 'multiSubnetFailover=true' in your JDBC connection string.
  2. Connect to a server with two NUMA nodes.
  3. Maintain all open connections for the duration of the test.
  4. Run some needlessly expensive queries to exercise the CPUs (a sketch of such a harness follows the monitoring query below).
  5. Observe half of the CPUs on the machine saturate while the other half remain idle.
  6. DBAs can monitor affinity on the server like so:
--Displays connections per NUMA node for 'Microsoft JDBC Driver 6.1' client interfaces
select c.node_affinity, s.client_interface_name, count(*) as connections
from sys.dm_exec_connections c
join sys.dm_exec_sessions s on s.session_id = c.session_id
where s.client_interface_name = 'Microsoft JDBC Driver 6.1'
group by c.node_affinity, s.client_interface_name
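A minimal client-side sketch of steps 1-4 (in place of JMeter), assuming a hypothetical host, database, and credentials; it holds 300 connections open so the monitoring query above can be run while the load is applied:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class AffinityRepro {
    public static void main(String[] args) throws Exception {
        // Step 1: 'multiSubnetFailover=true' in the connection string.
        // Host, database, and credentials below are placeholders.
        String url = "jdbc:sqlserver://myserver:1433;databaseName=mydb;"
                   + "multiSubnetFailover=true;user=myuser;password=secret;";

        // Steps 2-3: open the connections up front and hold them all open.
        List<Connection> pool = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            pool.add(DriverManager.getConnection(url));
        }

        // Step 4: run a needlessly expensive query on each connection.
        for (Connection conn : pool) {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("select count(*) from sys.objects a cross join sys.objects b");
            }
        }

        // Steps 5-6: keep the pool open while affinity is inspected server-side.
        System.in.read();
        for (Connection conn : pool) {
            conn.close();
        }
    }
}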
ulvii (Contributor) commented Mar 16, 2018

Hi @greg-pendlebury ,

Thank you for creating the issue. I was able to reproduce it, and #160 does seem to be the cause. I also noticed that the connections are evenly distributed across NUMA nodes when multiSubnetFailover=false.

We will continue the investigation, please stay tuned :)

cheenamalhotra (Member) commented
Closing the issue since the PR has been merged in; the fix will be part of the 6.5.1 preview release.
