Describe the bug
When running a Machine Tracker IP search for a subnet with the DNS checkbox checked, Machine Tracker performs a bunch of DNS lookups in parallel, in order to return the search results as fast as possible.
However, Machine Tracker seems to parallelize too much: each resolver consumes a UDP socket, and the number of sockets used appears to grow at least linearly with the size of the searched subnet.
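For illustration only (this is not NAV's actual asyncdns code), the sketch below shows why that matters with Twisted's DNS client: each query issued through the default resolver ends up in _connectedProtocol(), which listens on a fresh UDP port (as the traceback further down also shows), so firing one lookup per address in a /20 means roughly 4000 sockets open at once.

from ipaddress import ip_network
from twisted.internet import defer, reactor
from twisted.names import client

def reverse_name(addr):
    # 10.0.0.1 -> '1.0.0.10.in-addr.arpa'
    return '.'.join(reversed(str(addr).split('.'))) + '.in-addr.arpa'

def lookup_all(cidr):
    # One lookup per host, all fired at once; each query listens on its own UDP port.
    deferreds = [client.lookupPointer(reverse_name(addr))
                 for addr in ip_network(cidr).hosts()]
    return defer.DeferredList(deferreds, consumeErrors=True)

d = lookup_all('10.0.0.0/20')
d.addBoth(lambda _: reactor.stop())
reactor.run()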
Once the number of simultaneously open file descriptors reaches the limit imposed by the OS (typically 1024), further attempts to open a file descriptor crash with an unhandled OSError. This in turn results in a series of chained errors: Django tries to produce a 500 response, but is unable to open the 500.html template because there are no more available file descriptors - so Django tries to produce a 500 error for that error, and so on.
The end result (at least when running NAV under uwsgi) is that none of the opened file descriptors get closed, and the worker process stays around with too many open file descriptors. The worker process is then unable to respond properly to further requests, and every request it handles results in a 500 Internal Server Error (until, in the case of our uwsgi setup, the worker has handled its configured maximum number of requests and the uwsgi master process kills it and starts a new worker).
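For reference, the descriptor ceiling mentioned above can be checked from Python itself; this is just a quick query of the soft and hard RLIMIT_NOFILE values, where the soft limit commonly comes out as 1024:

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft fd limit: {soft}, hard fd limit: {hard}")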
To Reproduce
Steps to reproduce the behavior:
1. Go to Machine Tracker in the NAV toolbox.
2. Input a value like 10.0.0.0/20 in the IP address search field.
3. Select the Both filter radio button.
4. Check the Dns checkbox in the Columns section.
5. Click the Search button.
6. See error.
Expected behavior
Machine Tracker should respond in a timely manner for large subnet searches, just as with small ones, without crashing or leaking a bunch of resources.
Tracebacks
A typical traceback logged by a worker process that crashes may look like this (the exception chain is very long, so this is cut off as soon as it starts repeating itself):
Traceback (most recent call last):
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/udp.py", line 195, in _bindSocket
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/base.py", line 1199, in createInternetSocket
File "/usr/lib/python3.7/socket.py", line 151, in __init__OSError: [Errno 24] Too many open files
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/venvs/nav/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
File "/opt/venvs/nav/lib/python3.7/site-packages/django/core/handlers/base.py", line 181, in _get_response
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/web/machinetracker/views.py", line 244, in mac_search
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/web/machinetracker/views.py", line 317, in mac_do_search
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/web/machinetracker/utils.py", line 162, in track_mac
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/asyncdns.py", line 56, in reverse_lookup
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/asyncdns.py", line 86, in resolve
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/asyncdns.py", line 170, in lookup
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/names/common.py", line 122, in lookupPointer
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/names/client.py", line 416, in _lookup
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/names/client.py", line 317, in queryUDP
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/names/client.py", line 281, in _query
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/names/client.py", line 226, in _connectedProtocol
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/posixbase.py", line 369, in listenUDP
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/udp.py", line 178, in startListening
File "/opt/venvs/nav/lib/python3.7/site-packages/twisted/internet/udp.py", line 198, in _bindSockettwisted.internet.error.CannotListenError: Couldn't listen on any:39735: [Errno 24] Too many open files.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/venvs/nav/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
File "/opt/venvs/nav/lib/python3.7/site-packages/django/utils/deprecation.py", line 117, in __call__
File "/opt/venvs/nav/lib/python3.7/site-packages/django/core/handlers/exception.py", line 49, in inner
File "/opt/venvs/nav/lib/python3.7/site-packages/django/core/handlers/exception.py", line 114, in response_for_exception
File "/opt/venvs/nav/lib/python3.7/site-packages/django/core/handlers/exception.py", line 153, in handle_uncaught_exception
File "/opt/venvs/nav/lib/python3.7/site-packages/nav/django/views.py", line 28, in custom_500
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/loader.py", line 15, in get_template
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/backends/django.py", line 34, in get_template
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/engine.py", line 143, in get_template
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/engine.py", line 125, in find_template
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/loaders/cached.py", line 58, in get_template
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/loaders/base.py", line 24, in get_template
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/loaders/cached.py", line 27, in get_contents
File "/opt/venvs/nav/lib/python3.7/site-packages/django/template/loaders/filesystem.py", line 23, in get_contentsOSError: [Errno 24] Too many open files: '/etc/nav/templates/500.html'
During handling of the above exception, another exception occurred:
…
Environment (please complete the following information):
NAV version installed: 5.6.0
Additional context
The open file descriptors of a uwsgi process can be inspected using the lsof command line tool:
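For example, something like lsof -p <PID of the uwsgi worker> (the PID placeholder is illustrative) lists every descriptor the worker currently holds; repeating it while a large search is running should show the count climbing along with the DNS lookups.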
There may be problems reproducing this on systems that allow more than 1024 file descriptors per process. When running this in the docker-compose based development environment, I'm unable to reproduce it, as the container (at least on my machine) seems to allow far more than 1024 file descriptors.
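One way around that when trying to reproduce (a sketch, under the assumption that the limit can be lowered from inside the worker process, e.g. early in the WSGI module) is to drop the soft RLIMIT_NOFILE back down to the conventional 1024 so the descriptor ceiling is hit again:

import resource

# Lower only the soft limit; the hard limit stays as granted by the container.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (1024, hard))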
I don't think I put this in writing anywhere, but the most likely course of action to resolve this is to impose a limit on how many simultaneous DNS requests are in flight at any given time. An unlimited amount is untenable, and 1024 would also be too high; 100 simultaneous requests might be a good (although arbitrary) number to ensure we don't bump against the usual limits.
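A minimal sketch of that capping idea, making no assumptions about how nav.asyncdns is actually structured: a twisted.internet.defer.DeferredSemaphore keeps at most 100 queries (and hence UDP sockets) in flight, while every address in the subnet still gets queried eventually.

from twisted.internet import defer
from twisted.names import client

MAX_CONCURRENT_LOOKUPS = 100  # arbitrary cap, as discussed above

def bounded_reverse_lookups(ptr_names):
    # Each lookup only starts once the semaphore grants it a slot, and the
    # slot is released again when that query's Deferred fires.
    semaphore = defer.DeferredSemaphore(MAX_CONCURRENT_LOOKUPS)
    deferreds = [semaphore.run(client.lookupPointer, name) for name in ptr_names]
    return defer.DeferredList(deferreds, consumeErrors=True)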