Cache the Config Context in the Database #4544
Comments
This proposal needs to be fleshed out more. What is the specific workflow being proposed? E.g. when do config contexts get calculated if not on-demand? How do we detect and resolve discrepancies? Is the calculation being performed in the background? What are the performance implications? Please be specific in your proposal.
I'll post a small design when I get a moment. If anyone sees this and wants to post their thoughts / comments before I flesh it out, feel free to.
This sounds great. We are starting to use config contexts rather extensively, and have a few external services which relentlessly hammer the NetBox API. Eliminating the overhead of rendering the config context on demand would help scalability quite a bit.
I imagine this will look something like: when a ConfigContext changes, a signal fires and an RQ worker recomputes and stores the merged context in the background. Serializers and views would be updated to show the contents of the stored field.
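For illustration, the serializer side of that idea might look roughly like the sketch below. The field name computed_config_context and the fallback behaviour are assumptions, not part of the proposal text:

```python
# Sketch only: serve the stored value instead of rendering per request.
# Assumes a hypothetical JSON field `computed_config_context` on the model.
from rest_framework import serializers
from dcim.models import Device


class DeviceWithConfigContextSerializer(serializers.ModelSerializer):
    config_context = serializers.SerializerMethodField()

    class Meta:
        model = Device
        fields = ["id", "name", "config_context"]

    def get_config_context(self, obj):
        # Prefer the pre-computed value; fall back to on-demand rendering
        # if the cache has not been populated yet.
        if obj.computed_config_context is not None:
            return obj.computed_config_context
        return obj.get_config_context()
```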
Would the asynchronous nature of the RQ worker approach cause issues? That is, if I update a few attributes of a device, I'd expect to be able to click save, flip over to the config context tab, and see the updated configuration. Using just signals might be a better approach.
I implemented very similar functionality in one of our NetBox plugins, which allows you to group devices (and virtual machines) into services, such as "all cron servers owned by Unix Support" and "all SaltStack masters owned by Network Engineering". Since manually adding and removing systems to and from services is tedious, it supports contexts, where you can specify criteria that are virtually the same as a configuration context, with the addition of hostname patterns. A system that no longer meets the criteria is automatically removed from the service, and a system that was created or updated and now meets the criteria is automatically added to the service.

To keep performance acceptable, I took the approach of building a query set representing the set of devices that need to be evaluated, and then evaluating each service against that query set, moving all of the work to the database. When a ServiceContext is created, updated, or deleted, the query set is simply all devices. For devices that are created or updated, a signal receiver tracks the PK, and constructs the query set from all PKs once the transaction is committed. We often perform bulk updates of hundreds of devices at a time, so this query set can represent a large number of devices.

Our NetBox instance has 50k devices and 10k VMs, and with this approach the overhead is negligible.
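A minimal sketch of that pattern, with illustrative names only (the Service model, update_membership(), and the module-level PK set are placeholders, not the plugin's actual code):

```python
# Collect PKs from signals, then evaluate everything in one pass after commit.
from django.db import transaction
from django.db.models.signals import post_save
from django.dispatch import receiver

from dcim.models import Device
from my_plugin.models import Service  # hypothetical plugin model

_pending_pks = set()  # not thread-safe; illustration only


@receiver(post_save, sender=Device)
def track_changed_device(sender, instance, **kwargs):
    if not _pending_pks:
        # Schedule a single evaluation per transaction, so a bulk update of
        # hundreds of devices still costs one pass over one queryset.
        transaction.on_commit(_evaluate_pending)
    _pending_pks.add(instance.pk)


def _evaluate_pending():
    pks = list(_pending_pks)
    _pending_pks.clear()
    queryset = Device.objects.filter(pk__in=pks)
    for service in Service.objects.all():
        # Hypothetical method: adds/removes members by querying `queryset` in the DB.
        service.update_membership(queryset)
```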
Agree on that. We can't trust the async update process to be ready before the next consumer requests the config_context and needs this update to be reflected there. But I also agree that we can't wait on a recompute of all affected data on a save. The update signal on config_context (but also on a lot of other changes like site/region/device that can affect memberships!) should, imo, both invalidate the stored computed contexts immediately and trigger an asynchronous recompute.
If the config_context of a device is requested, there should be a check on whether a valid computed config context is available. If not, we should generate it on request (as current behavior). I hope we will in the future have a way of getting config_context merging on other entities than devices/VMs (e.g. site level). What should be the role of the redis cache vs. the database for storing each of these properties (computed config context, (in)valid querysets, etc.)? The computed config context is in fact a cached/optimized copy of data, which might make it more suitable for storage in cache. On the other hand, storing it as a JSONB datatype in Postgres would open up some really cool possibilities for indexing/searching/querying within the context.
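To make that read path concrete, here is a rough sketch of the "use it if valid, otherwise render on request" idea. The fields computed_config_context and config_context_is_stale, and the render_config_context() method standing in for today's merge logic, are assumptions:

```python
# Sketch only; field and method names are assumptions, not an agreed design.
from django.db import models


class ConfigContextModel(models.Model):
    # Stored as jsonb on PostgreSQL, which also enables indexing/querying the contents.
    computed_config_context = models.JSONField(null=True, blank=True)
    config_context_is_stale = models.BooleanField(default=True)

    class Meta:
        abstract = True

    def get_config_context(self):
        # Serve the stored copy only if it has not been invalidated.
        if self.computed_config_context is not None and not self.config_context_is_stale:
            return self.computed_config_context
        # Current behaviour as the fallback: merge applicable ConfigContexts now,
        # then persist the result for the next consumer.
        data = self.render_config_context()  # hypothetical stand-in for the existing merge logic
        self.computed_config_context = data
        self.config_context_is_stale = False
        self.save(update_fields=["computed_config_context", "config_context_is_stale"])
        return data
```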
I think this is an overcomplicated solution to the root problem, which is that we are not efficiently querying the ConfigContext model instances. The reason this is so inefficient today is that this method gets called on each serialized device instance, which in turn runs this complex query logic for every device. We should look to optimize this before going down the denormalization route.
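As one sketch of what "optimize the query" could mean (this is not NetBox's actual implementation), the contexts could be fetched once per request and matched against each device in the list, rather than running the complex query per serialized instance. context_applies() below is a hypothetical matching helper:

```python
# Illustrative only: one query for all ConfigContexts, then in-memory matching.
from extras.models import ConfigContext


def attach_config_contexts(devices):
    """Annotate each device in `devices` with its merged config context."""
    contexts = list(
        ConfigContext.objects.filter(is_active=True).order_by("weight", "name")
    )
    for device in devices:
        data = {}
        for ctx in contexts:
            if context_applies(ctx, device):  # hypothetical: region/site/role/platform/tag checks
                data.update(ctx.data)         # shallow merge for illustration only
        if device.local_context_data:
            data.update(device.local_context_data)
        device._rendered_config_context = data
    return devices
```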
@lampwins Do you want to close this issue in favor of a new one then? I want to make sure we keep the issue proposal consistent with the eventual solution to avoid confusion.
@jeremystretch I do not mind creating a new issue that is tied more closely to the query optimization. Feel free to close this one in the meantime.
@lampwins Interestingly enough, when I played around with this, Django debug-toolbar noted that my SQL performance was generally okay, but that it was actually CPU time that was causing the slowness. I'm not sure at what level this CPU slowness is occurring.
Environment
Proposed Functionality
As discussed in Slack, it may be beneficial to cache the generated/merged config context in the database to reduce load times on bulk device list API requests.
Use Case
Fetching 1000 devices with config context returned takes ~60 seconds. Without config context, the response takes ~5 seconds. This large overhead was determined to come from the generation & merging of the config contexts that exist on devices / virtual machines, which is regenerated on every request. For performance, it would be practical to pre-compute and cache this generated data in the database.
Database Changes
This would require adding a computed_config_context field to the ConfigContextModel that would contain the computed config context. The naming of this field is up for debate if something better is thought of. It would contain the pre-cached computed config context, and it would be rebuilt whenever an associated ConfigContext is updated or during the save() / post_save method of a ConfigContextModel.
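A rough sketch of those rebuild triggers is below. The signal receivers, the devices_matching() helper, and render_config_context() are illustrative placeholders, and recursion/error handling is simplified:

```python
# Sketch only: rebuild the cached value when a ConfigContext or a device changes.
from django.db.models.signals import post_save
from django.dispatch import receiver

from dcim.models import Device
from extras.models import ConfigContext


def _rebuild(device):
    device.computed_config_context = device.render_config_context()  # hypothetical merge logic
    device.save(update_fields=["computed_config_context"])


@receiver(post_save, sender=ConfigContext)
def on_config_context_saved(sender, instance, **kwargs):
    # Rebuild every device the updated ConfigContext could apply to.
    for device in devices_matching(instance):  # hypothetical helper resolving assignments
        _rebuild(device)


@receiver(post_save, sender=Device)
def on_device_saved(sender, instance, update_fields=None, **kwargs):
    # Skip the save() triggered by _rebuild() itself to avoid recursion.
    if update_fields and "computed_config_context" in update_fields:
        return
    _rebuild(instance)
```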
.External Dependencies
NA