-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Performance of Generating Config Contexts #4559
Comments
@lampwins do you have any ideas / advice on how we could make these queries more performant? Sadly this is slightly out of my area of expertise. |
This affects my use of Netbox (via the If nobody else gets to it first, I might have a go at digging into profiling what's going on here at some point in the next few months. |
As @lampwins noted in the previous issue:
|
I dove into this a bit over the weekend and I think there are two primary ways we can approach this. The first is coming up with a reasonable way to do the deep merge of context data in psql and the second is doing some sort of annotation on the config context object query to have psql do the mapping for us in bulk. |
Thanks to @tyler-8 for pointing me to this issue. I've also noticed that redis performance is significantly impacted by this. We have a number of API clients regularly calling the Might I suggest that this functionality gets disabled by default for now, and that the API docs mention the very significant performance impact of enabling it? Aside from that, couldn't this problem be solved by regenerating the config_contexts in a background task when an action happens that requires it to change, rather than generating it on-demand for every API call? |
Might be something to discuss in the Slack channel or Google Group before opening an issue to improve docs.
That was the initial topic that led to this issue. #4544 - the decision was that the logic to generate config contexts needs to be "fixed" first as the way it's currently written invokes multiple DB queries PER device in a query of N-devices, rather than a handful of queries. |
What would be the best way to remove these queries? I don't mind working on it a bit, I just don't want to see this issue fall by the wayside as its pretty important to my team to fix. |
Maybe making this |
Quick fix to implement the Rewrites all filter to include the exclude (or append it), and the all() method to a filter() aswell.
|
Tagging this as |
I found this snippet that could potentially be molded for this use case - however this isn't doing a deep merge. |
This is the explain I get on v2.8.8 for the deep join query mentioned for a single device, fwiw. It doesn't seem particularly bad in isolation, but I guess it's probably executing once per item in the result set.
|
When I profiled this I found the query and talking to the cache were pretty quick and all the slowness was in the serialization. It'd spend ~1-2s getting the data and then 20-30s serializing. |
I think we have a good understanding of the problem already - the question now is how to change it to a more efficient method. That "20-30s serializing" is actually just performing even more queries for each device. Executing a separate query (and then calculating the context merger in Python, separately for each device) is what is chewing up so much time. One of the most common API issues is around users pulling the device list for hundreds/thousands of devices at once and having web server timeouts because it takes upwards of 1 minute to return a response - which is an eternity in web terms. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our contributing guide. |
I have a solution going in the |
Here is some analysis of my solution. There are 5000 devices in my test database and 10 config context objects randomly assigned in various objects in the hierarchy. For anyone interested, I have created two NetBox custom scripts to create the devices and config context objects here. Also, I have set Baseline, on develop branch with % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0&exclude=config_context" | python -m json.tool > develop-exclude.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5530k 100 5530k 0 0 587k 0 0:00:09 0:00:09 --:--:-- 1317k On develop branch with % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0" | python -m json.tool > develop.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5770k 100 5770k 0 0 61767 0 0:01:35 0:01:35 --:--:-- 1378k Finally, on 4559-config-context-rendering branch with % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0" | python -m json.tool > new-4559.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5770k 100 5770k 0 0 695k 0 0:00:08 0:00:08 --:--:-- 1413k Keep in mind these results are only with the development server. While not entirely scientific, I'll take the 1 sec gain over baseline as a big win ;) A simple test to ensure we still render the same way: % diff develop.json new-4559.json
% I have also added a handful of tests to more thoroughly cover the integrity of the rendered data. |
Environment
Proposed Functionality
Improve the
get_config_context
serializer method that is observed to have slow performance that builds up as the number of fetched devices at a time increases.I'm not sure exactly where the bottleneck is, but it does perform a very complex query that may be the cause of the overhead. I performed some profiling awhile back and noticed that the SQL queries used an acceptable amount of request time, but that it was mostly using CPU execution time. I'm not sure if this is still the case across the board, but it may be worthwhile to profile and optimize whatever calls are made under this
get_config_context
serializer method.Use Case
Fetching 1000 devices by default takes ~60s. Without config context - using
?excludes=config_context
- the response takes ~5 seconds. This large overhead was determined to be from the generation & merging of the config contexts that exist on devices / virtual machines. Reducing this performance overhead would make NetBox's usage much more friendly and acceptable.Database Changes
This context is regenerated on every request on via this serialize. This then calls this complex query which either in the query, object generation, or serialization takes an enormous amount of time.
We would need to evaluate the performance of this query / method and determine where the bottleneck is occurring, and how to make it faster.
External Dependencies
N/A
Note that this is a revised version of #4544
The text was updated successfully, but these errors were encountered: