Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request provide Flush/Clear operation for entire cache #67

Open
NeilMacMullen opened this issue Mar 31, 2019 · 15 comments · May be fixed by #91
Open

Feature Request provide Flush/Clear operation for entire cache #67

NeilMacMullen opened this issue Mar 31, 2019 · 15 comments · May be fixed by #91

Comments

@NeilMacMullen
Copy link

I'm using LazyCache in a web role to cache access to some Azure tables. Occasionally (every few days) we get a request from the customer to change some of the data in the source tables and of course they would like to see this reflected "as soon as" in the web UI. At the moment the "solution" to this is just to bounce the web role but it would be a convenience if there were a Clear operation on IAppCache that would enable us to reset it to the empty state. We could then trigger this from a web-api call without having to cycle the whole web-role.

@alastairtree
Copy link
Owner

alastairtree commented May 22, 2019

This can be achieved by disposing the implementation of ICacheProvider which is typically the MemoryCacheProvider and then creating a new instance of CachingService with a new provider instance.

// Say I already have a cache that I want to dispose:
IAppCache cache = new CachingService();
 
// now I dispose the provider to clear the cache:
cache.CacheProvider.Dispose()

// Need to create a new provider to replace the disposed one
var provider = new MemoryCacheProvider(
                            new MemoryCache(
                               new MemoryCacheOptions()));

// create a new cache instance with the new provider
cache = new CachingService(provider);

(Edit - see also https://github.com/alastairtree/LazyCache/wiki/API-documentation-(v-2.x)#empty-the-entire-cache)

@NeilMacMullen
Copy link
Author

NeilMacMullen commented Jun 19, 2019

Sorry for the delayed response.... I was hoping to be able to hang onto the current IAppCache. In the context where I want to Flush it, the reference was passed in long ago via dependency-injection. Manually newing it up means I get the usual warning about "don't use this constructor - LazyCache is meant to be used with DI!" and also stops me being able to clear the whole "application cache" from the called method unless I find a way to propagate the new cache out again.

Edit To be fair I could just wrap IAppCache inside a customer class and proxy all accesses to it to accomplish this but it seems like it ought to be simpler :-)

@lunar-safari
Copy link

If there was some explanation of how to dispose of the cache entirely when used via Dependency Injection that would be great. Adding and reading to and from the cache is a piece of cake, it's the deleting and disposing that is proving to be a headache.

image

The _cache is readonly. How do we recreate it?

@migr1 migr1 linked a pull request Oct 1, 2019 that will close this issue
@ArnaudB88
Copy link

ArnaudB88 commented Mar 31, 2020

@alastairtree Can you please take a look at the pull request?
Thanks!

Edit: the code in pull request #91 does not work for me.
I currently use the following (as alastairtree suggested):

AppCache.CacheProvider.Dispose();
var provider = new MemoryCacheProvider(new MemoryCache(new MemoryCacheOptions()));
AppCache = new CachingService(provider);

To write my current solution in a LazyCache.RemoveAll() method, see #113

@alastairtree
Copy link
Owner

I have been a little hesitant about including this feature, for the same reason that it is not already implemented in MemoryCache, which is that it is both non trivial and can often indicates a code smell/be abused.

It is hard to get right because typically the cache is being accessed by a large number of threads at the same time across a large number of keys, all of which may need to be blocked from entering the cache while the clear all request is actioned. Access to the cache is only atomic by key but there is no built in atomic way to control the entire cache at once - this is deliberate to maximise cache performance.

A few cases to consider:

  • If something is mid generation when the clear all request arrives should every thread have to wait until it is finished generating so it can be removed from the cache and thrown away before the cache accepts new items. Or should we just move the new cache item into the newly empty cache? Or throw it away? Or start again and put it that version in the new cache?
  • If creating a new cache, accessing the old cache after disposal would cause exceptions so that must be avoided.
  • What should happen to a new add cache item request that arrive after the start of the clear all request, but before the clear has completed? Wait? Add to the new cache when it's ready?
  • What if you need to wait for the clear all to complete before allowing new cache entries to be generated, like the garbage collector behaviour? This might be required in some use cases because you are carefully managing memory and using the cache size feature to ensure all spare RAM is used for cache. You cannot allow a new cache to populate until the old one is empty because the new cache may fill up faster than the old one can empty, which would result in an out of memory exception.

Also, enabling a safe remove all feature should not degrade performance for normal use, so global locks are not an option.

You could do a swap to a new empty cache and redirect new requests to the new cache while you dispose of the old one. However this means for a short while you have 2 caches. Then you have more problems:

  • The performance counters are lost on the new cache
  • You have 2 sets of performance counters for a while, which ones are correct?b the results probably cannot be trusted.
  • The memory issue above may still be an issue

Often the reason people want a clear all feature is because it's easier than doing a proper job and removing only the data that needs to be refreshed by removing just a subset of known keys. It would be better if the developer were encouraged to remove items by key instead.

Often there is a trigger for the need to expire something, such as underlying data being updated and it would be better for that trigger to remove the correct data itself than just wiping out everything. Also if there is a proper known trigger then probably a cancellation token wired up in advance would be a better fit than a clear all, because that is designed to remove all those instances of cache items based on a known reference to the instance and the effort has already been arranged upfront making the removal easier.

Before adding this feature, answers to the above would need to be identified, as well as some thorough performance testing to ensure no deadlocks, exceptions or performance degradation is introduced as a result, as well as good automated test coverage of some of these scenarios.

So in summary, I'm not against it, just flagging up it is non trivial to get perfect, hence the long pause. In addition I have provided a couple of work arounds which should support the subset of users who don't care about any of the above and just want to zap their cache and either don't care about the consequences or don't have enough data/concurrency for it to matter.

@migr1
Copy link

migr1 commented Sep 22, 2020

@alastairtree Removal of all keys should not be labeled "a code smell in general". There is the perfectly valid case in which the application receives a trigger that indicates that all cached values should be reloaded. For example when there is a daily batch process that recalculates all data of a certain kind, publishing an "invalidate cache" message when recalculation is done.

Your points are valid of course, in applications where they are a concern. However, if the application requires such a tight control of the entries in the cache, IMemoryCache should probably not be used as it does not provide such guarantees on eviction of entries. I don't think there is a silver bullet here; developers need to understand and select the mechanism which is appropriate in their application.

So, regardless of whether this library has removeall/clear helper functionality, documenting the trade-offs and suggesting other methods of clearing the cache, is vital to make all developers aware of the issues and selecting the correct method for their application.

@alastairtree
Copy link
Owner

alastairtree commented Sep 22, 2020

Removal of all keys should not be labeled "a code smell in general". There is the perfectly valid case ...

No not always I agree, but often I would argue. Sometimes it is easier to clear everything than do the work that is required to clear some subset of items, but it might be better to encourage devs to implement the better solution than give them a quick and dirty solution that might cause them other issues.

I give another example that is probably a typical use case that shows another issue with a flush/clear all type feature. If using Microsoft.Extensions.DependencyInjection then the underlying MemoryCache is a singleton across the application and say I have an aspnet core web api with entity framework for data access and I want to cache some data using LazyCache. I decide to do a clear all. If I remove every single item in the cache by iterating the keys I unknowingly just removed a bunch of cached query plans generated by entity framework because they also use the default shared instance of MemoryCache, and they were removed for no good reason. If instead I dispose the memory cache and generate a new one then entity framework will still hold references to the old one and so I will likely get a bunch of unexpected exceptions for unrelated code accessing a disposed object. In both cases I would have been better removing my data from the cache than doing a clear all.

There are likely to be many other libraries like EF that also depend on the shared global singleton MemoryCache. LazyCache depending on the shared singleton by default means that performance counters just work (only one cache instance), which is a nice feature I would prefer not to lose just to add cache.RemoveAll().

@patrickklaeren
Copy link

I honestly think a lot of cons here are edge cases - it can be made clear to the developers clearing all keys is of such high concern, but the API should definitely be available. It seems too trivial to not be implemented, in all honesty. It's not a code smell.

Given the amount of comments on the PR that say otherwise, I should think there's enough compelling reason to back down and allow the users of your framework to do what they need to do @alastairtree.

@jnyrup
Copy link
Contributor

jnyrup commented Dec 27, 2020

Related issue dotnet/runtime#45593

@alastairtree
Copy link
Owner

Given the amount of comments on the PR that say otherwise, I should think there's enough compelling reason to back down and allow the users of your framework to do what they need to do @alastairtree.

The LazyCache docs list two separate ways that users of LazyCache can do a Flush/Clear all style operation without adding an extra API. Do you have a use case where these don't work?

@patrickklaeren
Copy link

Given the amount of comments on the PR that say otherwise, I should think there's enough compelling reason to back down and allow the users of your framework to do what they need to do @alastairtree.

The LazyCache docs list two separate ways that users of LazyCache can do a Flush/Clear all style operation without adding an extra API. Do you have a use case where these don't work?

They don't work the same way Clear() has been called a code smell by yourself - disposing of a cache has semantics other than just for the usage of clearing an entire cache and cancelling seems like an abuse of cancellation tokens and it's also not that trivial to implement. The listed ways seem like workarounds.

@ArnaudB88
Copy link

@alastairtree Can you please review the open pull request, and if ok complete it?
The pull request is based on your first method (from the docs) to empty the entire cache. In my opinion, a clean and more easy to understand way than method 2.

As you can see, a lot of people need this functionality, including myself. I understand you don't want to 'help the user writing bad code', but I think that's up to the user.
The example you mentioned above, with the shared MemoryCache instance, that's also the users choice. But as you can see in the pull request, no singleton MemoryCache is used anymore in the default dependency injection code (both netcore and ninject).

I hope you can help us all by adding this functionality.
Thanks.

@npiasecki
Copy link

FWIW, when I have encountered this problem in the past with caches that either don't support it or I don't actually know what all of the exact keys are, I'll use a version number in the key itself, then store the current version number in another key. When I want to invalidate a range of keys, I increment its version number, which means that (until the old cache entries expire) the cache is sometimes larger than it needs to be because it has stale data, but it has the downstream effect of now causing cache misses as if it were cleared.

Something like

var accountCacheKey = $"key_accounts_{version}_{accountId}";

and I'd pass in the accountId to my "cache this please" function.

The "cache this please" function would get the {version} by itself looking up a cache key named something like "VERSION_key_accounts" and return 0 if it wasn't there.

So then it queries the cache for key_accounts_0_bob or key_accounts_0_fred or whatever and returns that result.

So say I did some huge update and I know I want to invalidate all the keys starting with key_accounts, then I can just increment the VERSION_key_accounts entry to some sliding expiration that is longer than what I'm using on the real keys. I don't much care about a potential for a race condition of two threads trying to increment the same version prefix at the same time (in my apps, this would be pretty unusual anyway) because it's a cache and I don't expect it to be perfect.

Now the cache is getting queried for key_accounts_1_bob or key_accounts_1_fred and those will be cache misses. Sure key_accounts_0_bob or key_accounts_0_fred will hang around until they expire, but nobody is asking for them anymore.

While I agree the feature would be nice to have, I hope this workaround helps someone.

@alastairtree
Copy link
Owner

I have not seen that pattern of a version in the key before, but I like it so thanks for sharing it. A slight variation that would also work and avoids the need for a double cache lookup would be to keep the version number in a static, and to use interlocked.increment() when you want to increment the version (i.e. clear all). That way the clear all operation is atomic and fast as well.

@ArnaudB88
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants