Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscription threads not being closed #153

Open
zarqman opened this issue Dec 6, 2024 · 2 comments
Open

Subscription threads not being closed #153

zarqman opened this issue Dec 6, 2024 · 2 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@zarqman
Copy link

zarqman commented Dec 6, 2024

Observed behavior

It appears that the thread pool that manages a connection's subscriptions is not being closed/shutdown when the connection is closed. As a result, threads are endlessly added without being cleaned up.

In real world usage, we've ended up with a Ruby process having over 19000 threads before finally locking up due to system limits on creating new threads.

If I'm reading things right, I think the growing threads are connected to subscription_executor. I see that subscription_executor is shutdown inside do_drain at client.rb:1190, but I don't see a matching shutdown inside close_connection.

The subscription_executor was added in #117 and it looks like close_connection previously did call each subscription's Thread#exit, so perhaps this got lost when moving to the thread pool?

@Envek, any insight into this?

Expected behavior

When Client#close is called, any/all subscription threads should be exited.

Server and client version

nats-server 2.10.18
nats-pure.rb 2.4.0
ruby 3.3.6

Host environment

In production, this is a Rails app inside a docker container on amd64. Repro script below is plain ruby on arm64 (no docker).

Steps to reproduce

A simple reproduction script:

# endless_nats_threads.rb
require 'nats/client'

ADMIN_USER     = 'nats'
ADMIN_PASSWORD = 'nats'

def nats_client
  opts = {
    servers: ["nats://localhost:4222"],
    reconnect: false,
    drain_timeout: 2,
    user: ADMIN_USER,
    pass: ADMIN_PASSWORD,
  }
  NATS.connect(opts).tap do |n|
    yield n
  ensure
    n.close
      # adds 1 thread per loop, no cap
    # n.drain
      # climbs to 46 threads
      #   is 5 each for 9 conns before drain_timeout kicks in
  end
end

def run
  loop do
    pp now: Time.now, threads: Thread.list.count
    nats_client do |n|
      n.request "$SYS.REQ.USER.INFO"
    end
    sleep 0.2
  end
end

run
@zarqman zarqman added the defect Suspected defect such as a bug or regression label Dec 6, 2024
@zarqman
Copy link
Author

zarqman commented Dec 7, 2024

The reconnect thread in process_op_error also kills off other threads, but doesn't shutdown the subscription_executor. This then calls attempt_reconnect -> start_threads!, which creates a new subscription_executor (overwriting the existing one).

I think this may leak a subscription_executor on each reconnect. Not sure if the old one should be shutdown or if the existing one should be memoized/reused.

@Envek
Copy link
Contributor

Envek commented Dec 9, 2024

@Envek, any insight into this?

No insights! Most probably I just missed this and it just need to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

No branches or pull requests

2 participants