Proposal: Cache metadata to local disk/db #19511
Comments
Pinging @elastic/integrations (Team:Integrations)
A persistent cache with entry TTL might be interesting for some use cases, indeed. Although not optimal, I would drop the requirement to share data between 2 active Beats instances for now. Sharing might complicate the implementation, plus this might be a feature we don't need anymore in the future (e.g. if we merge Beats into one). In Beats we just introduced interfaces for a persistent key/value store (the implementation is still under review). See the package libbeat/statestore. This might be a good base for the persistence layer. Possible implementations I've tried are our memlog (a mostly in-memory k/v store writing updates to an append-only log), badger, and sqlite. Badger seems to scale worst with increased concurrency. sqlite actually supports concurrent access from multiple processes and scales quite well (although the CGO layer and full transactions for every operation slow it down a little).
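To make the idea concrete, here is a minimal sketch of the kind of key/value interface such a persistence layer could expose. The method names and signatures are illustrative assumptions for this sketch, not the actual libbeat/statestore API:

```go
package cache

// Store is an illustrative key/value interface in the spirit of a
// persistent store such as libbeat/statestore. The method names and
// signatures are assumptions for the sketch, not the real API.
type Store interface {
	// Set serializes and stores (or replaces) the value for key.
	Set(key string, value interface{}) error
	// Get decodes the value stored for key into to; it returns an
	// error if the key is unknown.
	Get(key string, to interface{}) error
	// Remove deletes the entry for key, if present.
	Remove(key string) error
	// Close flushes pending writes and releases resources.
	Close() error
}
```

A metadata cache could then be layered on top of any backend (memlog, badger, sqlite) that satisfies an interface like this.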
This sounds fair. @jsoriano do you have any idea of the cost of retrieving this metadata? Did we see it causing any issue? From what I understand, the client will request app metadata as apps appear in the stream. For huge deployments this will be done by several Beats (sharding), which distributes the load.
Thanks for your comments. It looks like we agree that it may be interesting to implement something like this. We could do it as a new API-compatible alternative to
Agreed, I see this as a nice-to-have if the implementation can easily offer concurrent reads and writes from different processes, but not as a requirement.
Yeah, statestore looks good. I wonder if we could also add an interface for backends that natively support entries with TTLs, like Badger. Though maybe it is not needed if we keep the cache in memory and only dump to disk from time to time; the strategy to follow will be something to think about.
It seems to be a potential issue on very big deployments, especially on startup when collecting metrics. With logs maybe this is not such an issue, because not everything is going to be logging continuously, but metrics arrive as a continuous stream, so it is quite likely that the metadata of many apps will need to be requested already during the first seconds.
Yes, but there is actually little control over how this sharding is done; it is possible to receive events from the same app on several Beats, so in the end every Beat still needs to know about lots of apps. In that sense it is different from our usual deployment model in Kubernetes, where each Beat is only interested in pods running on one node.
By the way, auditbeat also uses a datastore on disk, based on bbolt: https://github.com/elastic/beats/blob/v7.8.0/auditbeat/datastore/
This is a more general problem, not only with metadata. My proposal would be to introduce support for randomizing the initial run of a module: having a time period T and uniformly randomizing the first collection of a module within the range [0, T].
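A minimal sketch of that randomization, using a hypothetical startWithJitter helper around a module's collection function:

```go
package collect

import (
	"math/rand"
	"time"
)

// startWithJitter delays the first run of collect by a uniformly random
// amount in [0, period), then keeps collecting every period. The helper
// name is hypothetical, for illustration only.
func startWithJitter(period time.Duration, collect func()) {
	initialDelay := time.Duration(rand.Int63n(int64(period)))
	time.AfterFunc(initialDelay, func() {
		collect()
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		for range ticker.C {
			collect()
		}
	})
}
```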
I originally had TTL in the interface, but decided on a more minimal interface at some point. The reason for removing TTL was (besides simplifying the interface) also that it was not what I wanted. We have a TTL implemented on top, but for removal I figured I need a more sophisticated predicate, like (TTL + resource actually not in use). Same for caching: I would consider not having the TTL provided by the store, but allowing for a custom cache that uses a key-value store as a backend only. E.g. do you want to be able to 'pin' some state in the cache? When to renew? Do we need a hard TTL only, or an LRU cache? You know, naming and caching are hard... and time... OMG is time hard :P If we could spread the load better, would we even need a cache? The problem seems to be resource usage/contention. Randomization is one of the simplest mitigation strategies.
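A minimal sketch of this "custom cache over a plain key/value backend" idea, reusing the illustrative Store interface from the earlier sketch and keeping the expiration timestamp next to the value:

```go
package cache

import "time"

// entry wraps a cached value with its expiration time, so a backend
// without native TTL support can still be used.
type entry struct {
	Value   []byte    `json:"value"`
	Expires time.Time `json:"expires"`
}

// TTLCache layers expiration on top of the illustrative Store interface.
type TTLCache struct {
	store Store
	ttl   time.Duration
}

// Put stores the value together with its expiration time.
func (c *TTLCache) Put(key string, value []byte) error {
	return c.store.Set(key, entry{Value: value, Expires: time.Now().Add(c.ttl)})
}

// Fetch returns the value if present and not expired; expired entries
// are removed lazily on access.
func (c *TTLCache) Fetch(key string) ([]byte, bool) {
	var e entry
	if err := c.store.Get(key, &e); err != nil {
		return nil, false
	}
	if time.Now().After(e.Expires) {
		_ = c.store.Remove(key)
		return nil, false
	}
	return e.Value, true
}
```

Pinning or LRU eviction would then be policies of this layer, while the store stays a plain key/value backend.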
Metricbeat modules already have a random delay for initialization. The cloudfoundry module is a push reporter: it doesn't have control over when metrics are received, it keeps a connection to the Cloud Foundry endpoint, and Cloud Foundry pushes the metrics in a continuous stream. We could do something like dropping metrics that are received more frequently than some period, but this would force us to keep track of all the metrics we are receiving (which could require a cache of its own 🙂). We could also introduce some throttling to avoid hitting the metadata APIs too much. Then initial events wouldn't be enriched (or they would be delayed or dropped) until the cache warms up.
An LRU cache with enough capacity could cover many cases where we are only using metadata that doesn't change over time, but for other cases we need to renew it at some moment, and in general we are using the TTL for that. We also rely on the TTL to clean up entries that are not needed anymore (but there could be other strategies for GC). If we keep needing caches with expiration, and the backend we use provides it, it would be nice to use it; otherwise we need to maintain our own implementation on top.
At least with Cloud Foundry metrics (and it can be the same with the collection of any other events, or logs in general) the fact is that we don't have control over when messages are received. We could add some additional logic to retain events so we spread the load caused by metadata collection better, but I am not sure the potential complexity of this logic would be worth it.
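If throttling the metadata lookups were the route taken, one straightforward option is a token-bucket limiter, e.g. with golang.org/x/time/rate. This is only an illustrative sketch with made-up names, not what the processor currently does:

```go
package enrich

import "golang.org/x/time/rate"

// metadataClient wraps an apps API lookup with a token-bucket rate
// limiter, so lookups beyond the budget are skipped and events are
// emitted without enrichment until the cache warms up.
type metadataClient struct {
	limiter *rate.Limiter
	lookup  func(appID string) (map[string]string, error)
}

func newMetadataClient(perSecond float64, burst int, lookup func(string) (map[string]string, error)) *metadataClient {
	return &metadataClient{
		limiter: rate.NewLimiter(rate.Limit(perSecond), burst),
		lookup:  lookup,
	}
}

// Get returns metadata for the app, or (nil, false) if the rate budget
// is exhausted or the lookup fails.
func (c *metadataClient) Get(appID string) (map[string]string, bool) {
	if !c.limiter.Allow() {
		return nil, false // over budget: leave the event unenriched
	}
	meta, err := c.lookup(appID)
	if err != nil {
		return nil, false
	}
	return meta, true
}
```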
For the case of PCF in particular, it seems that most deployments also contain Redis; operators are familiar with it and we already have the client library in. Could this be an option instead of taking care of the cache ourselves?
I think so, this is the kind of implementation I was thinking of when I mentioned that the cache could be shared between Beats, but I didn't want to get into the specific implementation to use at this point. I would still think about having caches that don't require an external service, for scenarios where none is available. Take also into account that if we use something like
Pinging @elastic/integrations-platforms (Team:Platforms)
@urso @exekias an update on this. I tried some options in #20707, #20739 and #20775, which more or less covered these approaches:
Summarizing, the option that I found best is the last one, a new cache implementation based on badger; I will clean up the PR for review (#20775). Let me know if you see any blocker for this approach. Some observations during my tests:
More general thoughts affecting all options:
// cc @masci
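For reference, here is a minimal sketch of what a badger-backed cache with per-entry TTL can look like. It assumes the github.com/dgraph-io/badger/v2 API and is not the code from #20775:

```go
package persistentcache

import (
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

// BadgerCache is an illustrative persistent cache; badger tracks
// expiration of entries natively via per-entry TTLs.
type BadgerCache struct {
	db *badger.DB
}

func Open(path string) (*BadgerCache, error) {
	db, err := badger.Open(badger.DefaultOptions(path))
	if err != nil {
		return nil, err
	}
	return &BadgerCache{db: db}, nil
}

// Put stores an already-serialized value with the given TTL.
func (c *BadgerCache) Put(key, value []byte, ttl time.Duration) error {
	return c.db.Update(func(txn *badger.Txn) error {
		return txn.SetEntry(badger.NewEntry(key, value).WithTTL(ttl))
	})
}

// Get returns a copy of the stored value, or an error if the key is
// missing or has expired.
func (c *BadgerCache) Get(key []byte) ([]byte, error) {
	var value []byte
	err := c.db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(key)
		if err != nil {
			return err
		}
		value, err = item.ValueCopy(nil)
		return err
	})
	return value, err
}

func (c *BadgerCache) Close() error { return c.db.Close() }
```

Expired entries are simply not returned on reads, so the cache itself does not need its own cleanup loop in this sketch.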
Thank you for such detailed research! Going with badger sounds good to me. Being able to close processors sounds like a requirement (?); let's keep discussing in #16349.
Correct, statestore is an in-memory kv-store; the disk is only used to guarantee we can start from the old state. The cursor input manager in filebeat has a collector with TTL built in. We could extract the GC implementation. Just mentioning :). Badger or sqlite would be fine with me.
I did test the filebeat registry with statestore, badger, and sqlite as backends. Single-writer throughput was pretty good for all of them (sqlite had a slight overhead). Unfortunately badger seems to scale pretty badly: with 100 concurrent files being collected, badger had a clear contention problem. If you keep the number of routines and updates small, all are fine, but sqlite seems to scale much better under concurrent load.
We've got CBOR support in Beats. CBOR encoding/decoding is much faster and requires less CPU, especially if many strings are involved (JSON requires strings to be scanned for UTF-8 compliance). Memlog unfortunately uses JSON to stay somewhat backwards compatible with the old filebeat registry format.
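As an illustration of swapping JSON for CBOR in such a store, here is a small sketch using github.com/fxamacker/cbor/v2 as an example CBOR library (Beats has its own codecs, see the spool codec mentioned below); the struct is made up:

```go
package serialize

import "github.com/fxamacker/cbor/v2"

// appMetadata is a made-up example of the kind of record a metadata
// cache would persist.
type appMetadata struct {
	AppID  string            `cbor:"app_id"`
	Name   string            `cbor:"name"`
	Labels map[string]string `cbor:"labels"`
}

// encode and decode mirror encoding/json's Marshal/Unmarshal, so moving
// a cache backend from JSON to CBOR is mostly a codec swap.
func encode(m appMetadata) ([]byte, error) {
	return cbor.Marshal(m)
}

func decode(data []byte) (appMetadata, error) {
	var m appMetadata
	err := cbor.Unmarshal(data, &m)
	return m, err
}
```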
Good point, I didn't test with high concurrency. But I also wouldn't expect very high concurrency on metadata enrichment, at least for the cloudfoundry case, where in general there is only going to be one events consumer.
I will give CBOR a try 👍 thanks for the suggestion.
Maybe. But keep in mind that processors are normally run in the goroutine of the input. On the other hand, it is just a cache; if we have to change the backend in the future because requirements change, it is a no-brainer.
Check out codec.go from the spool file support. It supports JSON, CBOR, and UBJSON. For write loads CBOR had a small advantage over UBJSON; UBJSON is slightly faster for reads though.
To enrich events collected from Cloud Foundry, Beats need to query the apps API. This is done now by `add_cloudfoundry_metadata`, which uses an in-memory cache to avoid querying again for the metadata of apps already seen. Metadata is cached by default for 2 minutes, but this time can be increased with the `cache_duration` setting.
Big Cloud Foundry deployments can have several thousands of applications running at the same time; in these cases `add_cloudfoundry_metadata` needs to cache a lot of data. Increasing `cache_duration` can help in these cases to reduce the number of requests made, but on restarts this data is lost and needs to be requested again on startup, provoking thousands of requests that could be avoided if this data were persisted somewhere else. Other implementations of Cloud Foundry event consumers (nozzles) persist the app metadata on disk to prevent problems with this.
Beats have several features that use internal in-memory caches. In general they don't contain that much data, but it could also happen, for example, with Kubernetes, since we support monitoring at the cluster scope (see #14738). So other features could also benefit from having some kind of persistence between restarts.
If persisted to disk, we would need to make sure that the data directory also persists between restarts. This is relevant when running Beats on containers/Kubernetes.
Depending on how this is implemented, a persistent cache could be shared between Beats, so for example Filebeat and Metricbeat monitoring the same Cloud Foundry deployment from the same host could share their local caches if they are in some local database, or if they have access to some common data directory.
The enrich processor of Elasticsearch could help here, but at the moment only Filebeat supports pipelines, and even with support for pipelines it wouldn't be so clear how to add this step to existing pipelines.
@exekias @urso I would like to have your thoughts on this.