[BUG] IPAM front-end enters tight loop calling a failing back-end API endpoint #2989

lunkwill42 · 2024-09-19T11:11:23Z

Describe the bug

A customer experienced a massive amount of e-mail being dispatched from their NAV server through the night. On our end, we received some 78,000 Django error report e-mails over the course of about 14 hours.

It appears that what happened was that a user had the IPAM web tool open, and it kept on hammering (in a tight loop) the back-end API endpoint that retrieves information about a scope-level prefix for display in IPAM. The end-point failed with an unhandled error (500 response code), because the prefix address in question could not be found in the NAV database (it may have been deleted after the IPAM UI loaded its initial data set). The production server is set up to e-mail Django error reports to the developer team when a Django view hits an unhandled exception, which caused every one of these hammer-hits to dispatch a full e-mail debug report.

To Reproduce

Reproducing the failing back-end API call is fairly easy, even from the IPAM UI. However, we've not been able to reproduce the tight loop observed in production.

A failing API call can be produced by visiting /ipam/api/?net_type=all&within=A.B.C.D/XX&show_all=True, where A.B.C.D/XX represents a network prefix that is not registered anywhere in NAV (see traceback below).
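For convenience, the request URL above can be assembled with a small helper. This is only a sketch: the function name `ipam_api_url` is hypothetical, and 192.168.42.0/24 is used purely as an example of a prefix that is presumably not registered in your installation.

```python
from urllib.parse import urlencode


def ipam_api_url(prefix: str, base: str = "/ipam/api/") -> str:
    """Build the IPAM API request path from this report for a given prefix.

    `ipam_api_url` is a hypothetical helper, not part of NAV.
    """
    query = urlencode(
        {"net_type": "all", "within": prefix, "show_all": "True"},
        safe="/",  # keep the slash in A.B.C.D/XX unescaped, as written above
    )
    return f"{base}?{query}"


# Example prefix; substitute any prefix that is not registered in NAV.
print(ipam_api_url("192.168.42.0/24"))
```

Prepend the scheme and host name of your NAV server to the printed path and open it in a logged-in browser session.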

The same call is most easily reproduced through the front-end thus:

Steps to reproduce the behavior:

1. Go to SeedDB
2. Click on the Prefix tab
3. Click on the button Add new prefix
4. Add a new scope prefix (suggest using some RFC 1918 address unused in your installation)
5. Open a new browser tab, browse /ipam to populate the initial UI list of scope prefixes. Verify that the new prefix is in the list.
6. Return to the SeedDB tab and click the Delete button. Confirm the deletion of the prefix you added in step 4.
7. Return to the IPAM tab. Click on the deleted prefix address to expand it.
8. IPAM performs the failing API request.

It's at this point we are unable to reproduce a tight reloading loop. What we observe is the API endpoint failing with a 500 error, which the IPAM UI seems to be oblivious to: It expands the selected prefix, and displays a spinner icon with the text Getting data for allocation tree. This spinner never stops.

Expected behavior

1. The back-end API should not crash if the requested prefix address is unknown. In fact, the end-point seems to be coded to produce an alternative response if the prefix isn't found, but this code is faulty.
2. The front-end should not hammer the back-end API. However, since we're unable to reproduce this behavior, it's not clear what we can do about it.
3. The front-end should take appropriate measures to report API errors to the user, rather than showing a spinner indefinitely.

Number 3 could be implemented as a separate issue, since it is entirely in the front-end. The most important issue right now is to fix number 1, as there is no need to mail the Django site admins when the API cannot find the requested address.
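One way a "not found" fallback can end up unreachable is when a lookup that raises on a miss is guarded only by a `None` check. The sketch below illustrates that general pattern with stand-in code. It is an assumption about the kind of fault involved, not a quote of NAV's `make_tree`, and every name in it (`DoesNotExist`, `KNOWN_PREFIXES`, `get_prefix`, `faulty_root`, `safe_root`) is hypothetical.

```python
class DoesNotExist(Exception):
    """Stand-in for Django's Model.DoesNotExist."""


# Stand-in for the prefix table: net_address -> description (made-up data)
KNOWN_PREFIXES = {"10.0.0.0/8": "campus scope"}


def get_prefix(net_address):
    """Mimic QuerySet.get(): raise on a miss rather than return None."""
    try:
        return KNOWN_PREFIXES[net_address]
    except KeyError:
        raise DoesNotExist("Prefix matching query does not exist.") from None


def faulty_root(root_ip):
    """The None check never helps, because get_prefix() raises first."""
    scope = get_prefix(root_ip)
    if scope is not None:
        return ("real", scope)
    return ("placeholder", root_ip)  # unreachable on a miss


def safe_root(root_ip):
    """Catch the exception and fall back to a placeholder root instead."""
    try:
        return ("real", get_prefix(root_ip))
    except DoesNotExist:
        return ("placeholder", root_ip)


print(safe_root("192.168.42.0/24"))
```

With the exception handled, an unknown prefix yields an ordinary response rather than an unhandled exception, so no error report would be mailed for it.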

Tracebacks

Recreated in development:

Traceback (most recent call last):
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/web/ipam/api.py", line 157, in list
    result = make_tree(prefixes, root_ip=within, family=family, show_all=show_all)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/web/ipam/prefix_tree.py", line 471, in make_tree
    scope = Prefix.objects.get(net_address=root_ip)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 435, in get
    raise self.model.DoesNotExist(
nav.models.manage.Prefix.DoesNotExist: Prefix matching query does not exist.

Environment (please complete the following information):

NAV version installed: 5.10.2

NAV test data does not include a prefix 192.168.42.0/24. This should ensure the IPAM main API endpoint does not crash when asked to build a tree for a prefix not known to NAV.

lunkwill42 added the bug label Sep 19, 2024

lunkwill42 self-assigned this Sep 19, 2024

lunkwill42 mentioned this issue Sep 19, 2024

Fix incorrect handling of non-existant scopes #2990

Merged

lunkwill42 closed this as completed in #2990 Sep 19, 2024
