-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
address issues with NetBox database and Logstash's NetBox cache #259
Comments
After looking a little closer at this, I think that's the approach I'm going to take. Since netbox is already caching things I think we're adding complication without a lot of gain by doing our own caching on top of that. |
…olab#259); work in progress, almost certainly broken in this state
Working on it, but as of mmguero-dev/Malcolm@cab66bd it's broken (seems to be a concurrency issue). Just as a reminder to myself... |
…s NetBox cache, should fix locking issues (idaholab#259)
The good news is I've removed our extra layer of caching and it does seem to be more consistent now. The bad news is the CPU load increases quite a bit when, especially when doing autopopulation due to the increased API load. I need to do some benchmarks to compare, and also compare when autopopulation is turned on vs. not. |
Okay, on further analysis I think actually we're going to have to come to a middle ground: still do the caching, but with two changes: 1) decrease the TTL significantly (from 600 seconds down to maybe like 30 or 60 seconds) and 2) have ALL of the caches in the file be a TTL cache, rather than having the site/role/device type just being LRU. I think removing the caching completely will just make it too slow. |
…olab#259); restore caching for performance reasons, but decrease TTL significantly and allow it to be specified via environment variable
Closing as per my last comment and commit. Some rough benchmarks (the last number is milliseconds in the filter):
Although this is rough (the numbers of the events aren't exactly the same), you can see that without the caching it's like 10x+ worse. |
I'm not going to hold up the v23.09.0 release for this, but this is an issue I just discovered that needs to be address:
The Logstash Ruby script in charge of enrichment from netbox uses some LRU caches to avoid frequent repetitive API lookups of items.
However, the netbox database is not static: things can be added (not so much of an issue, since a cache miss will just cause the thing to be looked up) but also deleted and changed.
So Logstash could be going along and have information in its cache that is now invalid because the underlying database has been changed.
I don't think there's a good way to "trigger" Logstash when changes are written to NetBox (externally to auto-population, like through the UI or through ./scripts/netbox-restore). But we need to consider how to handle this.
At the very least, a complete wipe/restore of the database via
./scripts/netbox-restore
needs to trigger a complete Logstash restart (or, if we want to be more subtle about it, somehow notify logstash to clear all of its caches).This needs some careful thought to figure out how to deal with it. On one hand, we know NetBox is using postgresql and redis to do its own database and caching: maybe we simply don't need to cache in LogStash at all and it won't be too expensive to just call the API every time? I don't know.
The text was updated successfully, but these errors were encountered: