
Add the ability to asynchronously/proactively refresh recently used keys that are about to be expired #44

Closed
abatkin opened this issue Jan 1, 2016 · 4 comments

Comments

@abatkin

abatkin commented Jan 1, 2016

At a high-level, I want to cache a set of expensive computations for a set period of time (time-based, expireAfterWrite). I would also like to be able to proactively refresh keys (well, their values) that have been used "recently" (whatever that means) but are about to be expired/evicted. I don't want to refresh everything in the cache, since then the cache will grow forever.

For example: If I'm happy with ~1hr stale values, I can use LoadingCache and expireAfterWrite(60, TimeUnit.MINUTES). Everything works well, except that every hour-or-so, all of the "hot" keys suddenly take a long time to return (since they have been expired and now need to be recomputed the next time they are accessed).
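The setup described above can be sketched as follows; `expensiveComputation` is a hypothetical stand-in for the slow work, not part of Caffeine's API:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

class HotKeyExample {
  // Stand-in for the expensive computation (hypothetical).
  static String expensiveComputation(String key) {
    return "value-for-" + key;
  }

  // Time-based cache: every entry is evicted one hour after it was written,
  // so a "hot" key pays the full recomputation cost on its next access.
  static final LoadingCache<String, String> cache = Caffeine.newBuilder()
      .expireAfterWrite(60, TimeUnit.MINUTES)
      .build(key -> expensiveComputation(key));
}
```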

This type of setup would be valuable when, for example, some of your keys are accessed frequently and repeatedly while others are only rarely used, and you want to avoid a short period of slow access to those "hot" values when they expire. In my case this is particularly bad: the common access pattern is such that most of the keys get loaded when the system initially starts up (i.e. all at the same time), so they all expire at the same time and then need to be reloaded at the same time. The initial population already makes startup slow; without being proactive about it, I have to pay that same cost every N time units, where N is my expireAfterWrite() duration.

One potential generic solution would be to add a pre-expire event which would run at a configurable time offset before any item is to be expired/evicted. The event would need access to the "last read timestamp" (and possibly other metadata?) of the key (in my case, so it could decide if it wanted to refresh the value). You could do anything you wanted. In my case, I would just call refresh(key) whenever the last read timestamp was recent enough (i.e. this is up to my own business logic).

There are a bunch of other potential solutions to this problem that I can think of, but this one is the most generic that I have found (so hopefully other people can find other good uses).

I think there would need to be a few rules around how this API would work:

  • Events are delivered on a best-effort basis (you might want to be notified 60 seconds before something is going to expire, but your callback gets run 53 seconds before)
  • Events happen asynchronously, possibly with their own Executor (they don't block anything else from happening)

The next logical step would be to make this a "bulk" API - then you would need to specify some granularity for how often to sweep for things that might expire. You might want to only check every X time units, which means your callback gets a collection of all keys/values that will expire during that X-time-unit period. For my use case, I'd also need a bulk refresh() API (similar to the way you can override loadAll() to load values in bulk), which doesn't currently exist either.

Thoughts?

@ben-manes
Owner

Thanks for the detailed and thoughtful request!

Please correct me, but I think that refreshAfterWrite provides this capability in a minimal form. An entry that is stale (but not expired) is automatically refreshed if accessed. This means that entries that are stale but not accessed will be eligible for eviction by the size/expiration constraints. The user guide includes a little more depth than the JavaDoc linked above.
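The combination described here can be sketched as a builder configuration; the durations and the `expensiveComputation` loader are illustrative assumptions, not values from the thread:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

class RefreshExample {
  // Hypothetical loader standing in for the expensive work.
  static String expensiveComputation(String key) {
    return "value-for-" + key;
  }

  // An entry becomes eligible for an asynchronous refresh 50 minutes after
  // it was written; if the next access happens before the 60-minute expiry,
  // the caller receives the stale value immediately while the reload runs
  // in the background. Entries that are never touched simply expire.
  static final LoadingCache<String, String> cache = Caffeine.newBuilder()
      .refreshAfterWrite(50, TimeUnit.MINUTES)
      .expireAfterWrite(60, TimeUnit.MINUTES)
      .build(key -> expensiveComputation(key));
}
```

This keeps hot entries perpetually warm (each access past the refresh threshold triggers a background reload) while cold entries fall out of the cache on schedule.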

A limitation is that the refresh is not performed in bulk (#7) during a batch get. I believe this could be done without significant effort, but I have not worked on that task yet. A LoadingCache would then have a refreshAll(keys) method to provide the optimal entry point for explicit refreshes. If coalescing one-off accesses into a batch refresh was desired, this could be done as a user hack by having the refresh simply enqueue the operation for a background thread to refresh every so often.

As with Guava and CLHM, I've tried to stick with the idea of not requiring my own threads, using O(1) data structures, and being library centric (to be customizable, rather than a dictatorial framework). That means preemptive notification without entry access is not directly available. However, hooks are provided to let users peek in to solve their custom needs. The Policy exposes enough information for your own periodic refresh thread (with a smell of reusing the Policy.Expiration interface). That would provide the ability to refresh entries regardless of being "hot", which would otherwise be done on-demand.
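A periodic refresh thread built on the Policy hooks might look like the sketch below. The `isHot` recency check is an application-supplied assumption, and the sweep interval and batch size are arbitrary:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PolicySweepExample {
  static String load(String key) { return "value-for-" + key; }  // hypothetical loader

  static final LoadingCache<String, String> cache = Caffeine.newBuilder()
      .expireAfterWrite(60, TimeUnit.MINUTES)
      .build(PolicySweepExample::load);

  // Hypothetical recency check supplied by the application's business logic.
  static boolean isHot(String key) { return true; }

  static void startSweeper() {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
      // The expiration policy view exposes the entries closest to expiring,
      // in eviction order, via oldest(n).
      cache.policy().expireAfterWrite().ifPresent(expiration -> {
        for (String key : expiration.oldest(100).keySet()) {
          if (isHot(key)) {
            cache.refresh(key);  // asynchronous reload; old value stays visible
          }
        }
      });
    }, 1, 1, TimeUnit.MINUTES);
  }
}
```

Because the thread is owned by the application rather than the library, this stays within the "no own threads" design constraint.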

I think that your needs are covered, but please poke holes in my argument.

@abatkin
Author

abatkin commented Jan 3, 2016

Very interesting. I agree that refreshAfterWrite does 90% of what I want, with slightly different semantics (and given the design constraints for Caffeine, they are entirely reasonable). I'm not sure why I didn't fully grasp that when reading the documentation, but after reading through this conversation and re-reading the docs, it is really quite clear that's exactly what I want.

The reload() method doesn't make it easy to batch up refreshes for later, but that's covered in #7. At least from my perspective, the easiest way to implement batch-reloading would be to somehow use the refreshAfterWrite logic but turn that into more of a notification, at which point I'm free to do whatever I want to batch up reload requests and manually putAll() them back. Of course that opens its own world of complexity too, but at least that's on the client side (like what to do if an item is actually evicted while my reloader is still processing, since if I'm not careful there could then be two in-flight requests for the same key).
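The client-side batching idea sketched above might look like this; the class and its callbacks are hypothetical, with `writeBack` standing in for something like `cache::putAll`:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch of client-side coalescing: refresh requests are enqueued, drained
// on a timer, bulk-loaded, then written back to the cache in one shot.
final class BatchReloader<K, V> {
  private final Set<K> pending = ConcurrentHashMap.newKeySet();
  private final Function<Set<K>, Map<K, V>> loadAll;  // bulk loader
  private final Consumer<Map<K, V>> writeBack;        // e.g. cache::putAll

  BatchReloader(Function<Set<K>, Map<K, V>> loadAll, Consumer<Map<K, V>> writeBack) {
    this.loadAll = loadAll;
    this.writeBack = writeBack;
  }

  void requestRefresh(K key) {  // called from the per-key reload hook
    pending.add(key);           // duplicate requests coalesce naturally
  }

  void drain() {                // called from a scheduled background task
    Set<K> batch = new HashSet<>(pending);
    pending.removeAll(batch);
    if (!batch.isEmpty()) {
      writeBack.accept(loadAll.apply(batch));
    }
  }
}
```

The set-based queue means a key requested twice between drains is only loaded once, though it does not by itself solve the eviction-during-reload race mentioned above.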

@abatkin
Author

abatkin commented Jan 3, 2016

Thanks again for your help - unless you see this is a useful placeholder for anything, I think it's safe to close this issue.

@ben-manes
Owner

Great! =)

As a notification, you might use replace to prevent a refresh from stampeding over an explicit write (if that's used). The races on the client would be the same internally, unfortunately. We would need to lock all entries, batch load, populate, and unlock. We can't do that with synchronized except in raw bytecode, so a getAll is racy. The AsyncLoadingCache is able to perform a blocking getAll by inserting futures that proxy to the batch operation. I can't see how to reuse that trick for a batch refresh, though, so I think it would always be racy by allowing duplicate loads.
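The replace-based write-back suggested here can be sketched as below; `recompute` is a hypothetical reloader, and the conditional `replace` is the standard ConcurrentMap operation exposed through `asMap()`:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;

class WriteBackExample {
  static final Cache<String, String> cache = Caffeine.newBuilder()
      .expireAfterWrite(60, TimeUnit.MINUTES)
      .build();

  static String recompute(String key) { return "fresh-" + key; }  // hypothetical

  // Write a refreshed value back only if the entry is still present and
  // unchanged, so a slow batch reload neither resurrects an evicted key
  // nor clobbers a newer explicit write that raced with it.
  static boolean writeBack(String key) {
    String old = cache.getIfPresent(key);
    if (old == null) {
      return false;  // evicted while we were reloading; drop the result
    }
    String fresh = recompute(key);
    return cache.asMap().replace(key, old, fresh);  // no-op if raced
  }
}
```

A lost race here simply means the refreshed value is discarded, which is the safe outcome: the concurrent writer's value wins.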

If you come across ways to improve the documentation please let me know.
