
[FEATURE] Metadata-less entries in distributed cache #337

Closed
jarodriguez-itsoft opened this issue Nov 29, 2024 · 13 comments

@jarodriguez-itsoft

Problem

I’m using this amazing library to manage both combined (in-memory and distributed) caches and distributed-only caches.

Some of these caches hold millions of entries, accessed concurrently by dozens of app nodes (following a producer/consumer pattern). Thanks to a new feature introduced in recent versions of the library, we can now use distributed-only caches, which has spared us from having nodes consume massive amounts of memory unnecessarily.

So far, so good, but KeyDB is also consuming a lot of memory, and we aim to reduce its usage as well.

The issue is that FusionCache stores a lot of metadata in Redis, which is essentially useless since Redis already handles expiration, and the in-memory part of the cache is disabled.

For example, to simply store the string value "bar", we get something like this:

127.0.0.1:6379> HGETALL "v0:foo"

  1. "absexp"
  2. "638710876575684414"
  3. "data"
  4. "{"$id":"1","v":"bar","m":{"$id":"2","e":"2024-12-29T16:47:37.5660416+00:00"},"t":638684956575645632}"
  5. "sldexp"
  6. "-1"

That's a lot of memory overhead when the caches hold millions of entries.

Solution

Add a cache-level (or entry-level) option to control, at the distributed level, whether the value should be serialized/deserialized as usual or simply stored/retrieved in its raw form.

127.0.0.1:6379> GET "foo"
"bar"

@jodydonetti
Collaborator

jodydonetti commented Dec 2, 2024

Hi @jarodriguez-itsoft ,
I'm doing some work on this in the new v2.

The idea is not to require a manual setting, but to make it automatic based on concrete usage: I'm already doing this for memory entries (e.g. cache entries in L1, the memory level), whereby I only populate the metadata if it's needed, based on the features you are using, like eager refresh & co.
This means that if you are not using any feature that requires metadata, metadata will not be there.

I've never done this for the L2 (distributed level) for... reasons 🤔 ... but I'm getting back to this again.

The short version is that in v2, metadata will be there only if needed.

Will update on this.

@jodydonetti
Collaborator

I've never done this for the L2 (distributed level) for... reasons 🤔 ... but I'm getting back to this again.

I figured out why I did that: it was to make sure that when an entry comes in from L2 (distributed) to L1 (memory), the actual expiration would be "aligned".

So anyway, now I'm trying to find a way to do the same while avoiding the metadata.

Will update.

@jodydonetti
Collaborator

jodydonetti commented Dec 29, 2024

Hi @jarodriguez-itsoft , since most of the work is done for v2, I'm now working on this.

Upon further inspection, I have a question regarding this:

{
  "$id": "1",
  "v": "bar",
  "m": {
    "$id": "2",
    "e": "2024-12-29T16:47:37.5660416+00:00"
  },
  "t": 638684956575645632
}

As I explained, FusionCache can avoid storing the metadata part in case it's not needed: for the memory level it's already doing this, but for the distributed level it is not.

One thing that pops out from your example though is that "e" field with a value in it: I assume the field has a value because you are using eager refresh, right?

Then, even by doing what I said above, you would still have the metadata field because it would be needed for eager refresh.

So my questions are:
1. are you in fact using eager refresh?
2. can you try to disable it and let me know what you see in Redis?

Correction: see the update below.

UPDATE 2024-12-31
Sorry, my bad: in the current production version of FusionCache (up to v1.4.1) "e" is for the logical expiration, not the eager one.
The logical expiration is needed all the time, to allow for "aligned" expirations between nodes, and it cannot be read from the distributed cache itself since it's not exposed, so it must be there.
What does "aligned" mean? Think of it like this: if you cache something for 10 min in node 1's memory + the shared distributed cache, and after 9 min the data is read from the distributed cache via node 2, it should be cached in node 2 for only the remaining 1 min, and not for another 10.
Because of this, in the new V2 I'm moving it outside of the metadata and into the entry itself, so that unless you are using some extra features you will not have the metadata entirely.
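
A minimal sketch of the "aligned expiration" math described above (illustrative only, not FusionCache's actual code):

    using System;

    class AlignedExpirationDemo
    {
        static void Main()
        {
            // logical expiration written by node 1, which cached the value for 10 min;
            // here we pretend 9 min have already elapsed, so 1 min is left
            var logicalExpiration = DateTimeOffset.UtcNow.AddMinutes(1);

            // node 2 clamps its L1 duration to whatever time is left, not another 10 min
            var remaining = logicalExpiration - DateTimeOffset.UtcNow;
            var l1Duration = remaining > TimeSpan.Zero ? remaining : TimeSpan.Zero;

            Console.WriteLine($"L1 duration on node 2: {l1Duration}");
        }
    }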

Hope this helps.

@jodydonetti jodydonetti self-assigned this Dec 30, 2024
@jodydonetti jodydonetti added the enhancement New feature or request label Dec 30, 2024
@jodydonetti
Collaborator

Hi @jarodriguez-itsoft , I just released preview-4, which most probably will be the last preview before going GA with FusionCache V2 🥳

If you can play with it and let me know, that would be great, thanks!

@jarodriguez-itsoft
Author

jarodriguez-itsoft commented Jan 2, 2025

Hi @jodydonetti, sorry for the late response, just returning from holidays!

I have set up a test project which basically adds a string, with the key including the library version so we can compare.

    var fusionCacheAssembly = typeof(ZiggyCreatures.Caching.Fusion.FusionCache).Assembly;
    var version = fusionCacheAssembly.GetName().Version;

    string cacheKey = "STRING-TEST-" + version;
    string cacheValue = "STRING-TEST-" + version;

    await fusionCache.SetAsync<string>(cacheKey, cacheValue, _fusionCacheEntryOptions, CancellationToken.None);

I have tested with 1.4.1 and with preview-4.

127.0.0.1:6379> KEYS *
1) "RedisTESTv2p4:__fc:t:*"
2) "RedisTESTv2p4:STRING-TEST-2.0.0.0"
3) "RedisTESTv0:STRING-TEST-1.4.1.0"
4) "RedisTESTv2p4:__fc:t:**"

The first thing I noticed was a couple of keys, "RedisTESTv2p4:__fc:t:*" and "RedisTESTv2p4:__fc:t:**", which I don't know where they come from.

The cached entry from 1.4.1:

127.0.0.1:6379> HGETALL "RedisTESTv0:STRING-TEST-1.4.1.0"
1) "data"
2) "{\"$id\":\"1\",\"v\":\"STRING-TEST-1.4.1.0\",\"m\":{\"$id\":\"2\",\"e\":\"2025-01-02T10:44:02.4249031+00:00\"},\"t\":638714078424203038}"
3) "sldexp"
4) "-1"
5) "absexp"
6) "638714114425692590"

The cached entry from preview-4:

127.0.0.1:6379> HGETALL "RedisTESTv2p4:STRING-TEST-2.0.0.0"
1) "data"
2) "{\"$id\":\"1\",\"v\":\"STRING-TEST-2.0.0.0\",\"t\":638714090900474700,\"l\":638714126900474700}"
3) "sldexp"
4) "-1"
5) "absexp"
6) "638714126902732762

The metadata part seems to have disappeared. That's cool!

To reduce it further, we created a custom serializer

    using System;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using Newtonsoft.Json;
    using ZiggyCreatures.Caching.Fusion;
    using ZiggyCreatures.Caching.Fusion.Internals;
    using ZiggyCreatures.Caching.Fusion.Internals.Distributed;

    public class FusionCacheRawNewtonsoftJsonSerializer : IFusionCacheSerializer
    {
        public class Options
        {
            public JsonSerializerSettings? SerializerSettings { get; set; }
        }

        private static readonly Encoding _encoding = Encoding.UTF8;

        private readonly JsonSerializerSettings? _serializerSettings;

        public FusionCacheRawNewtonsoftJsonSerializer(JsonSerializerSettings? settings = null)
        {
            _serializerSettings = settings;
        }

        public FusionCacheRawNewtonsoftJsonSerializer(Options? options)
            : this(options?.SerializerSettings)
        {
            // EMPTY
        }

        /// <inheritdoc />
        public byte[] Serialize<T>(T? obj)
        {
            // for entries wrapping a base type, serialize only the raw value and drop the envelope
            if (obj is FusionCacheDistributedEntry<string> stringEntry)
                return _encoding.GetBytes(JsonConvert.SerializeObject(stringEntry.Value, _serializerSettings));
            if (obj is FusionCacheDistributedEntry<bool> boolEntry)
                return _encoding.GetBytes(JsonConvert.SerializeObject(boolEntry.Value, _serializerSettings));
            if (obj is FusionCacheDistributedEntry<int> intEntry)
                return _encoding.GetBytes(JsonConvert.SerializeObject(intEntry.Value, _serializerSettings));
            if (obj is FusionCacheDistributedEntry<long> longEntry)
                return _encoding.GetBytes(JsonConvert.SerializeObject(longEntry.Value, _serializerSettings));

            return _encoding.GetBytes(JsonConvert.SerializeObject(obj, _serializerSettings));
        }

        // rebuilds a serialized FusionCacheDistributedEntry<T> around a raw value read from the cache
        public byte[] ConvertToDistributedEntryBuffer<T>(byte[] data)
        {
            var metadata = new FusionCacheEntryMetadata(false, null, null, null, data.Length, null);

            T? value = JsonConvert.DeserializeObject<T>(_encoding.GetString(data), _serializerSettings);
            var entry = new FusionCacheDistributedEntry<T>(
                value, DateTimeOffset.Now.Ticks, DateTimeOffset.Now.Ticks,
                (string[]?)null,
                metadata);
            return _encoding.GetBytes(JsonConvert.SerializeObject(entry, _serializerSettings));
        }

        /// <inheritdoc />
        public T? Deserialize<T>(byte[] data)
        {
            if (typeof(T) == typeof(FusionCacheDistributedEntry<string>))
                return JsonConvert.DeserializeObject<T>(_encoding.GetString(ConvertToDistributedEntryBuffer<string>(data)), _serializerSettings);
            if (typeof(T) == typeof(FusionCacheDistributedEntry<bool>))
                return JsonConvert.DeserializeObject<T>(_encoding.GetString(ConvertToDistributedEntryBuffer<bool>(data)), _serializerSettings);
            if (typeof(T) == typeof(FusionCacheDistributedEntry<int>))
                return JsonConvert.DeserializeObject<T>(_encoding.GetString(ConvertToDistributedEntryBuffer<int>(data)), _serializerSettings);
            // NOTE: this branch must test for <long>, not <int> a second time
            if (typeof(T) == typeof(FusionCacheDistributedEntry<long>))
                return JsonConvert.DeserializeObject<T>(_encoding.GetString(ConvertToDistributedEntryBuffer<long>(data)), _serializerSettings);

            return JsonConvert.DeserializeObject<T>(_encoding.GetString(data), _serializerSettings);
        }

        /// <inheritdoc />
        public ValueTask<byte[]> SerializeAsync<T>(T? obj, CancellationToken token = default)
            => new ValueTask<byte[]>(Serialize<T>(obj));

        /// <inheritdoc />
        public ValueTask<T?> DeserializeAsync<T>(byte[] data, CancellationToken token = default)
            => new ValueTask<T?>(Deserialize<T>(data));
    }

to skip JSON when using base types; so, when using this FusionCacheRawNewtonsoftJsonSerializer, the result is:

127.0.0.1:6379> HGETALL "RedisTESTv2p4:STRING-TEST-2.0.0.0"
1) "data"
2) "\"STRING-TEST-2.0.0.0\""
3) "sldexp"
4) "-1"
5) "absexp"
6) "638714143828863737"

The thing is that, although it is working, I feel it is quite inefficient due to all the unneeded boxing/unboxing required.
We are handling hundreds of operations per second on each node, and any efficiency improvement makes a real difference in CPU usage.

Any chance of having a built-in mechanism to skip working with FusionCacheDistributedEntry and directly work with values?

Regards

@jarodriguez-itsoft
Author

Forgot to check the content of those weird wildcard entries created by preview-4:

127.0.0.1:6379> HGETALL "RedisTESTv2p4:__fc:t:**"
1) "data"
2) "{\"$id\":\"1\",\"t\":638714109287247300,\"l\":638714973287247300,\"m\":{\"$id\":\"2\",\"z\":1,\"p\":3}}"
3) "sldexp"
4) "-1"
5) "absexp"
6) "638714973297694626"
127.0.0.1:6379> HGETALL "RedisTESTv2p4:__fc:t:*"
1) "data"
2) "{\"$id\":\"1\",\"t\":638714109319894770,\"l\":638714973319894770,\"m\":{\"$id\":\"2\",\"z\":1,\"p\":3}}"
3) "sldexp"
4) "-1"
5) "absexp"
6) "638714973329316423"

Seems those are metadata entries with no value.

@jodydonetti
Collaborator

Hi @jarodriguez-itsoft, thanks for the update!

The new metadata-less mode in preview-4 is looking good, happy about it 👍

The 2 extra entries are there to support the new Clear mechanism: they are 2 instead of 1 to support both a "remove all" and an "expire all" behavior.

If you really don't want any extra cache entries you can disable Tagging (which is the underlying mechanism for Clear, too) by setting DisableTagging to true in the FusionCacheOptions, but if you then use Tagging or Clear an exception will be thrown.
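
A minimal sketch of that opt-out (assuming direct instantiation; the cache name is illustrative):

    using ZiggyCreatures.Caching.Fusion;

    var cache = new FusionCache(new FusionCacheOptions
    {
        CacheName = "MyCache",
        DisableTagging = true // any later Tagging or Clear call will throw
    });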

Hope this helps.

@jodydonetti
Collaborator

Oh, regarding the custom serializer you implemented: I'll have a better look at it later but keep in mind that the other values being serialized are there so FusionCache can work as expected. By removing them you'll probably end up with some nasty surprises, like stuff lasting in the cache (particularly in L1) longer or shorter than expected, fail-safe not working correctly and so on.

Something you can do to have a taste of these potential issues is to temporarily add your own serializer to the ones currently available in the test suite (there's an enum where you add a value and a method that instantiates it, that's it), then run all the tests.

Hope this helps, let me know!

@jodydonetti
Collaborator

jodydonetti commented Jan 2, 2025

Forgot to check the content of those weird wildcard entries created by preview-4:
[...]
Seems those are metadata entries with no value.

It seems so, but no: the value is the timestamp of when the last Clear happened, and when nothing is in the cache the default value 0 (zero) is used.

Then, when serializing, there's a setting with which default values are not emitted (to save bandwidth etc.), and since 0 is the default value, it is not emitted.

It looks like it is useless, but it's actually not ;-)
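
For illustration, with Newtonsoft.Json (which the custom serializer above is based on), the equivalent of this kind of setting is DefaultValueHandling.Ignore, which skips members holding their type's default value:

    using System;
    using Newtonsoft.Json;

    var settings = new JsonSerializerSettings
    {
        // members whose value equals the type default (0 for a long) are not written
        DefaultValueHandling = DefaultValueHandling.Ignore
    };

    // a timestamp of 0 is simply omitted from the payload: prints {}
    Console.WriteLine(JsonConvert.SerializeObject(new { t = 0L }, settings));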

PS: anyway, you just gave me an idea for how to optimize it even more, by skipping the write to distributed when the value is 0, thanks 😬 ! Will update you after I've tried it.

@jarodriguez-itsoft
Author

Hi Jody:

The new metadata-less mode in preview-4 is looking good, happy about it 👍

The 2 extra entries are there to support the new Clear mechanism (#331): they are 2 instead of 1 to support both a "remove all" and an "expire all" behavior.

If you really don't want any extra cache entries you can disable Tagging (which is the underlying mechanism for Clear, too) by setting DisableTagging to true in the FusionCacheOptions, but if you then use Tagging or Clear an exception will be thrown.

I have made a series of tests using different combinations of the DisableTagging property and the ClearCache methods, on a distributed-only cache.

Here are some results:

  • As expected, DisableTagging = true no longer generates the extra entries in the Redis cache
  • As expected, when disabling tagging and trying to clear the cache, an exception is thrown
  • With DisableTagging = false, calling ClearCache(true) or ClearCache(false) does not clear any entry. I guess this makes sense, as there's no in-memory item metadata.

The only available batch-cleaning command in Redis is FLUSHALL, which clears all the entries.
Selectively clearing cache entries (e.g. by a prefix pattern) is not straightforward, and I think it is out of the scope of this library.
We currently do so using a Lua script (sketched below), so you can guess the complexity.
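
For reference, a sketch of the same prefix-based cleanup done from C# with StackExchange.Redis instead of Lua (the prefix and connection string are illustrative):

    using StackExchange.Redis;

    var muxer = await ConnectionMultiplexer.ConnectAsync("127.0.0.1:6379");
    var server = muxer.GetServer(muxer.GetEndPoints()[0]);
    var db = muxer.GetDatabase();

    // KeysAsync uses SCAN under the hood, so it doesn't block the server on big keyspaces
    await foreach (var key in server.KeysAsync(pattern: "RedisTESTv2p4:*", pageSize: 500))
    {
        await db.KeyDeleteAsync(key);
    }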

Regarding the tagging overhead, I tested adding some keys and no other tagging keys were created apart from those 2.

127.0.0.1:6379> KEYS *
1) "RedisTESTv2p4:STRING-TEST-2.0.0.0"
2) "RedisTESTv2p4:STRING-TEST-2.0.0.0-ITEM-3"
3) "RedisTESTv2p4:__fc:t:**"
4) "RedisTESTv2p4:__fc:t:*"
5) "RedisTESTv2p4:STRING-TEST-2.0.0.0-ITEM-2"
6) "RedisTESTv2p4:STRING-TEST-2.0.0.0-ITEM-1"

IMHO it is completely OK, as long as tagging keys do not increase linearly with the real keys.

Oh, regarding the custom serializer you implemented: I'll have a better look at it later but keep in mind that the other values being serialized are there so FusionCache can work as expected. By removing them you'll probably end up with some nasty surprises, like stuff lasting in the cache (particularly in L1) longer or shorter than expected, fail-safe not working correctly and so on.

Something you can do to have a taste of these potential issues is to temporarily add your own serializer to the ones currently available in the test suite (there's an enum where you add a value and a method that instantiates it, that's it), then run all the tests.

Well, as I commented before, our use case is basically L1-only caches + L2-only caches.
L1 in-memory caches are meant to cache relational database data (e.g. API keys, data specific to the systems calling the APIs, ...) and usually have few keys (compared to the L2-only ones).
L2 redis caches are meant to store message-specific information that needs to be available to all nodes; these caches can have millions of entries.
So any byte saved in each Redis entry really makes quite a difference in Redis RAM usage; that's why we had to work on a custom serializer. No problems so far because we only use the serializer in distributed-only caches.

Your library does a great job as an abstraction over both use cases, and by helping with the cache-stampede problem ;)
Regarding the cache stampede, I have seen some contention exceptions raised from the library when doing stress tests; I will open a new issue when we run another stress test and can dig up some debug information to provide.

Thanks!

@jodydonetti
Collaborator

jodydonetti commented Jan 5, 2025

Hi @jarodriguez-itsoft

  • With DisableTagging = false, calling ClearCache(true) or ClearCache(false) does not clear any entry. I guess this makes sense, as there's no in-memory item metadata.

The design is such that a clear method call will not instantly clear the cache, although from "the outside" the end result is the same.

You can read more on the design in the original proposal.

The only available batch-cleaning command in Redis is FLUSHALL, which clears all the entries.

To be more precise, the command would be FLUSHDB (although in cluster mode there's only 1 database, so it's the same).

Anyway, I can't use that, for 3 reasons:

  • it's only available on Redis
  • I want to keep supporting any IDistributedCache implementation, without changing the API surface area
  • it would not take into account one Redis instance being used by multiple FusionCache instances (eg: via a CacheKeyPrefix)

So an actual real "remove all" operation is not suitable, on top of being quite heavy in caches with a ton of entries.

We currently do so using a Lua script, so you can guess the complexity.

Eheh yep, been there done that 😅

Regarding the tagging overhead, I tested adding some keys and no other tagging keys were created apart from those 2.
[...]
IMHO it is completely OK, as long as tagging keys do not increase linearly with the real keys.

Exactly: only 2 extra entries globally per cache + 1 for each tag used (more on this right below, it's now even better!).

Oh, and regarding this last passage: remember the idea you gave me last time, when I said this?

PS: anyway, you just gave me an idea for how to optimize it even more, by skipping the write to distributed when the value is 0, thanks 😬 ! Will update you after I've tried it.

I just implemented it (to be precise: skip the distributed write + skip the backplane when the value is zero), and I can confirm it works beautifully!

So, to recap, in preview-4 it was:

"1 extra entry per each tag ASSOCIATED to any entry + 2 extra entries globally ALWAYS"

while now it has become:

"1 extra entry per each tag for which a RemoveByTag has been actually CALLED + 2 extra entries globally but ONLY IF Clear(true/false) has been called"

Another nice optimization, thanks for the inspiration!

No problems so far because we only use the serializer in distributed-only caches.

Good 👍

Your library does a great job as an abstraction over both use cases

Love to hear that 😬

and by helping with the cache-stampede problem ;)

One thing about the stampede protection when using only L2: instead of skipping L1 completely, I'd suggest specifying a super low duration locally (like 100ms or similar) and using the "real" one for L2.
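
A minimal sketch of that setup (the durations are illustrative):

    using System;
    using ZiggyCreatures.Caching.Fusion;

    var options = new FusionCacheEntryOptions
    {
        Duration = TimeSpan.FromMilliseconds(100),          // L1: super low, just for stampede protection
        DistributedCacheDuration = TimeSpan.FromMinutes(30) // L2: the "real" duration
    };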

Thanks!

@jodydonetti jodydonetti added this to the v2.0.0 milestone Jan 19, 2025
@jodydonetti
Collaborator

Hi all, I still can't believe it but v2.0.0 is finally out 🎉
Now I can rest.

@jarodriguez-itsoft
Author

That's amazing Jody!
I can only thank you for all the effort invested so far in this amazing project!

Been quite busy, but as soon as I have some spare time I will test the final release and (hopefully not) report back if I find something ;)
