-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for limiting expiry of cached objects #75
Comments
If (2) is acceptable, then you can use the The cache uses only O(1) algorithms to provide consistent performance regardless of size and activity. As fixed expiration is often the most useful, it benefits by also being easily modeled by shuffling entries in a linked list. The proposal in (1) could suffer O(n) scans. A common approach to expiration (e.g. memcached) is to perform it lazily and rely on a capacity constraint. If the fetched entry is expired then emulate a miss (e.g. get/load, remove(k,v) if expired, get/load). This pollutes the cache with expired entries, but the size eviction ensures that they are removed eventually. If applicable you could implement that strategy by evaluating your custom criteria. Otherwise you're left with potentially O(lg n) and similar approaches. That means there isn't anything more intelligent that we can do, or that feels less hacky, then what you would have to do in your code. The API tries to be malleable enough to help you get your job done, but sometimes its better to use a different tool if its not the right fit. Usually domain knowledge in the eviction policy either has a huge gain or none whatsoever, with the latter being more likely. So if you go down that path you might want to measure to see if your optimization is helpful in practice. |
If the window in (2) is not acceptable a very hacky but similar idea is to use a Basically you want to disable expiration for certain entries. That's kind of like how an entry can have a weight of 0 indicating it won't be evicted by a size constraint. In that case the policy skips over those entries, but a pending task is to move them into a dedicated queue to avoid the scan. It hasn't been a problem since the feature is rarely used, except for The request seems to be that a predicate evaluates an entry to determine if it is eligible for expiration and place it on the proper internal queue. That would be doable, but leads to a clumsy API if an ad hoc predicate. If like JCache requires per-entry expiration and, like the vendors, ours is currently performed lazily by relying on a maximum size constraint. I have been planning on playing with hashed timer wheel as an alternative, since JCache doesn't enforce that a size constraint must be applied. At best that might be promoted into the core library with a warning regarding the impact on the time complexity of maintenance operations. |
Thanks for the detailed replies Ben. I ended up using your CacheWriter recommendation to implement what I need. I think you have captured what I am trying to achieve, with the clarification that the predicate would only be evaluated if the time based expiry evaluation indicated that the entry was expirable. My way of looking at it is as an override that might cause a "reset" on the expiry timer of a cached object when the timer expires and the predicate matches. |
Sorry if I'm overly detailed in my responses. Its a habit as I think through everything, perhaps talking to myself half the time :-) From an API standpoint the predicate to opt out of expiration is too ad hoc. Too many of those in a library (or language) results in something clumsy and hard to maintain. So short term its not the right solution due to Bloch's conceptual weight rule and your current workaround is an acceptable loss. But iterating towards an efficient variable expiration scheme will probably be a good resolution. The cost then would be losing O(1) behavior for expirable entries, but timer wheels have been shown to scale well in real systems. But they can have poor cascading effects without small adaptions like adding random jitter. I need to follow up on the latest techniques, such as being tickless, so that we have a good implementation to support the resulting API. |
No need to apologise. I appreciate the thought put into the answers. I agree, the current workaround is ugly but works fine, and my proposal is rather specific. I look forward to a general approach evolving. Happy to keep the dialog moving forward as your design crystalises. |
In case you weren't aware, Agrona has adopted Netty's hashed timer wheel. https://github.com/real-logic/Agrona |
You might be interested in JCTools #109. Since @mjpt777 subsequently tweeted a link I posted and that it is deprecated, there's a good chance there's doesn't include many additional optimizations. But to be honest I haven't spent any time on evaluating this approach. I think it would work out well, but I'm hesitant to add it before doing a deep dive into the techniques to improve the worst-case scenarios. Maybe the holidays will be a good time to shift focus, as it would be a fun experiment. |
I've started to think about this again. As discussed, the implementation would be based on a hierarchical timer wheel (see slide 10). The next challenge is to design an API that feels natural but also performs well. I'd appreciate help on this part as I am unhappy with the stub described below. Expiration response:
Constraints:
public interface Expirer<K, V> {
/**
* @param now current time, in nanos
* @return
* r < 0 : don't expire (eternal)
* r == 0: no change
* r > 0 : the amount of time before expiring in nanos
*/
long create(K key, V value, long now);
long update(K key, V value, long now);
long read(K key, V value, long now);
default long expiresIn(long expirationTime, @Nonnull TimeUnit unit, long now) {
return Math.max(1L, unit.toNanos(expirationTime) - now);
}
default long noChange() {
return 0L;
}
default long neverExpires() {
return -1L;
}
} I don't think I'm also unhappy with the interface name: Would might also want to consider a @blschatz @oakad @jgoldhammer @yrfselrahc @Maaartinus @lowasser |
To me, this feels wrong. Eternal and in 2**63 nanoseconds (292 years) is about the same, so I'd go for No idea how to handle "Don't change current expiration" properly. |
You're right. I think I confused myself and wrote this up too quickly during my morning commute. Already expired is a quirky concept. It would mean that a newly loaded value could have been expired, we should not return it to the user, and notify the removal listener. It might make sense if one we wanted to let the user hold the timestamp instead, but then we'd have to query it when shuffling the entry in the timer wheel or allocate a wrapper to publish into the read buffer. Any foreign interface can throw exceptions and we lose ability to dictate the memory visibility constraints. Perhaps to handle "no change", we pass in the current expiration time as well. If no change is desired then the user should return that duration. So we could reduce this down to a single answer of "expire in ..." which has to be positive. |
public interface Expiry<K, V> {
/** Returns the duration in nanoseconds until the entry has expired. */
long expireAfterCreate(K key, V value, long currentTimeNanos) {
return TimeUnit.MINUTES.toNanos(5);
}
long expireAfterUpdate(K key, V value, long currentTimeNanos, long expirationTimeNanos) {
return expirationTimeNanos - currentTimeNanos;
}
long expireAfterRead(K key, V value, long currentTimeNanos, long expirationTimeNanos) {
return expirationTimeNanos - currentTimeNanos;
}
} The above example would satisfy the requests for I don't know of a use-case for I still dislike all of the interface names that I can think of. Thoughts? |
I did think about that quite a lot. Actually, the current interfaces in cache2k for expiry went through several redesigns in the last years after we found some caveats in production. The interface is called ExpiryPolicy there are five return values with a special meaning as described in ExpiryTimeValues. Besides that, there is also an interface that can be implemented by the value class that provides the expiry time, ValueWithExpiryTime. It also needs to be noted, that the ExpiryPolicy is used for the refresh ahead timing as well and there is a special policy that controls expiry times for caching or suppression of exceptions, called ResiliencePolicy. Seriously speaking, the whole thing is quite over engineered, but it proofs very useful. (Sorry, I am aware this has an evangelical touch. However, we already have some practical experience on exactly that topic.) Some comments on the actual proposal: Expiry interface with three methods create(), update() and read() Looks similar to JCache. So far, I never had a use case for a different expiry on duration With a time reference in the parameter returning a duration or point in time can do the same things. I decided for point in time for two reasons:
resolution A millisecond resolution seems to be enough and more practical. OTOH the nano resolution better fits the design philosophy of Caffeine. See below. randomization, sharp or lagging expiry cache2k doesn't do expiry time randomization currently, however we experience some spikes in production and see a need for it. The interface already has a distinction between expiry at an exact point in time (called sharp expiry) and expiry that should happen approximately at a time, so the cache can optimize and expire, which is in reality mostly a refresh, when there are sufficient resources. Talking about randomization as example, cache2k would typically have a configuration switch, while Caffeine, with the library philosophy, would probably go with a randomization adapter on the user expiry policy. In that way using nanosecond resolution makes sense. However, with the current proposal the cache implementation needs to decide for one semantic:
Typically, it would be the first one. But, OTOH, if
We call the use case "dynamic expiry". The duration can be adjusted, depending on how aggressively the data is changing. For example if the available seats in a flight change from 10 to 3 you might do a shorter duration for refresh, then when they change from 180 to 179. You need also the load time of the previous entry, that's why we put in a complete Side note: Providing the previous value is useful for the loader, too. See the comments in AdvancedCacheLoader Just shoot, if there are any questions. For the next release of cache2k I am planing a reimplementation on the actual expiry process, to improve resource usage, so I am looking forward to see some more thoughts on how to implement it efficiently. |
I'm not sure I follow. Is the proposal that at expiry time, the Expirer.update() method would be called, wherein the specific logic (where we want to keep the entry alive) could return a time in the future, thus keeping the entry alive? |
Thanks Jens. That practical, well thought out, and pragmatic feedback that I appreciate. In regards to your points, interface For most usages, the The 3 method interface could be reduced via a functional builder, e.g. duration When I originally tried to model it as point in time it resulted in an ugly interface similar to others. The duration model appears to result in a cleaner API and adds only a two cycle penalty (one subtraction in user code, one in cache). I think we can iterate and, I think, the duration style is likely to come out on top. resolution Nano matches randomization You're right, I would leave approximations to a decorator or custom variable policy. It is nice to provide utility methods or documentation to assist, though. The stampeding effect is something users have to consider for all resources, but this would allow for that compared to the existing API. dynamic expiry I'd probably lean towards asking users to store the timestamps in their value, as you suggest. I also think developers over estimate the value of domain-specific policies. Most often more general strategies are better and easier to reason about, e.g. using pub-sub instead of trying to dynamically tune the refresh time. _ efficiency_ See slide 10 for a very short overview of timer wheels. They can either rely on a ticking thread or be tickless, the latter being what Caffeine would use. There are many good papers, articles, and code examples that discuss in depth. While that is amortized O(1), for fixed expiration we use LRU/FIFO time bounded queues which are strict O(1). |
This would not cover the scenario of resurrection. In your approach of an expired entry querying a predicate, this could have a storm of calls on near expiration. That's because reads would have to call before faking a miss. If the predicate was implemented as a synchronized or I/O operation (e.g. check database) then this would be harmful. I don't think that approach would be supported. Instead your 2nd approach would be doable with a The proposal is for expiration to be configured per-entry. Whenever a read or write operation is performed the user can determine when the entry will expire. They can use fields of the value to determine what makes sense, e.g. if the duration should be based on the entry's database timestamp. Then one entry might have a ten minute lifetime and another an hour lifetime, without requiring O(lg n) operations to sort them. This proposal would still require periodic operations on the cache entry to revisit its expiration time. Sorry if that proposal doesn't fit your use-case (there were a few open issues, and yours was the most technical discussion). It would give you a little more flexibility in the re-evaluation during the writer/loader shuffling if you wanted to promote an entry to eternal or use a non-fixed expiration interval. |
Cannot agree more, but so far no system we had to integrate provided pub-sub :( |
The expiration time can now be customized on a per entry basis to allow them to expire at different rates. This is acheived in O(1) time using a timer wheel and evaluating using the new Expiry interface. This setting can be combined with refreshAfterWrite, but is incompatible with the fixed expiration types (expireAfterAccess, expireAfterWrite). While the test suite was updated to incorporate this new configuration option, there is still remaining work before this should be released. - New tests specific to this feature (such as exceptional conditions) have not yet been written - Incorporate a data integrity check for the timer wheel into the validation listener - Inspection through cache.policy() - JCache integration - Documentation
This feature is implemented, but additional tests are needed before release. |
The expiration time can now be customized on a per entry basis to allow them to expire at different rates. This is acheived in O(1) time using a timer wheel and evaluating using the new Expiry interface. This setting can be combined with refreshAfterWrite, but is incompatible with the fixed expiration types (expireAfterAccess, expireAfterWrite). While the test suite was updated to incorporate this new configuration option, there is still remaining work before this should be released. - New tests specific to this feature (such as exceptional conditions) have not yet been written - Incorporate a data integrity check for the timer wheel into the validation listener - Inspection through cache.policy() - JCache integration - Documentation
The expiration time can now be customized on a per entry basis to allow them to expire at different rates. This is acheived in O(1) time using a timer wheel and evaluating using the new Expiry interface. This setting can be combined with refreshAfterWrite, but is incompatible with the fixed expiration types (expireAfterAccess, expireAfterWrite). While the test suite was updated to incorporate this new configuration option, there is still remaining work before this should be released. - New tests specific to this feature (such as exceptional conditions) have not yet been written - Incorporate a data integrity check for the timer wheel into the validation listener - Inspection through cache.policy() - JCache integration - Documentation
The expiration time can now be customized on a per entry basis to allow them to expire at different rates. This is acheived in O(1) time using a timer wheel and evaluating using the new Expiry interface. This setting can be combined with refreshAfterWrite, but is incompatible with the fixed expiration types (expireAfterAccess, expireAfterWrite). While the test suite was updated to incorporate this new configuration option, there is still remaining work before this should be released. - New tests specific to this feature (such as exceptional conditions) have not yet been written - Incorporate a data integrity check for the timer wheel into the validation listener - Inspection through cache.policy() - JCache integration - Documentation
Released 2.5.0 |
I would like to limit the expiry process of cached values, based not only on time, but also on properties of the object. I can see two ways to achieve this, both of which would require API changes:
The latter approach leaves a window of time where a new value could be created, then replaced by the put() (a putIfAbsent() would appear to solve this).
Is there a way to achieve the above use case using the current API?
The text was updated successfully, but these errors were encountered: