Refresh as a non-blocking write #56
This is progressing on the refresh branch. The changes closely match the above, but further work is needed to complete it. The existing test cases pass. The pending tasks include,
This all sounds sensible to me, though I'd workshop the proposed default
method name for simplicity.
I agree with your hesitation to add another per-entry field for
expireAfterWrite.
Charles
I'm not sold on the method name either and am hoping to hear suggestions. I figured that I should try to get the rest of the code ready in the meantime, though.
One idea would be resolveRefreshConflict, and then have it return V instead.

Charles
The concern with that approach is if the resolved value is neither the current value nor the refreshed value (
Also, if the user decides to return a new object, it means he would probably be blocking the thread, because he might need to load that object from somewhere. So we possibly want to discourage that...
Alternatively the method could be called resolveRefreshConflict and it

Charles
An enum indicates that the options might expand, which could only be to reload yet again. It also carries the weight of increasing the API further and dealing with

Perhaps this method doesn't carry its conceptual weight? I'm sure very few users will utilize the conflict handling, and some resolution is needed. Guava's clobber/insert is simple and correct in most cases, though arguably could have (hopefully benign) races. Dropping would be safest but less efficient, and requires a new removal cause.

If this method shouldn't be included, at least yet, then I'd be inclined to follow Guava's lead here. I don't recall the internal discussion for refresh and why conflicts were resolved that way.
It occurred to me that this same race occurs on a
That is slightly too narrowly worded, because the last load will clobber an existing entry for any reason, including an explicit insert. This allows clobbering to insert stale data. It seems better to generalize this enhancement to resolve conflicts for any type of load. That means the method would be called
I plan on cutting a minor release (2.3) soon and, given how close this is, it bugs me not to finish it off. I think my only mental blocker is the name.
There are probably a few other good options in a better thesaurus. Ideally the method chosen sounds like a question, and it is obvious what the reaction will be to the answer. I think "preserve" is the closest word I can find to describe intent and minimize confusion.
If I had to vote, I'd say that you can intuitively understand what "retain" and "preserve" will do without having to think, whereas with "resolve" it is not clear exactly what that means.
The good thing about "resolve" is that since you can't guess, you have to look it up :-)

For "retain" I don't know if I'm confusing myself and no one else would be. The caller just loaded a value and sees that a different one exists in the cache. From its perspective "retain" might be misread as not discarding it. By its definition, "continue to have; keep possession of", I think it would mean that the current value is kept and the refreshed value is discarded. I don't know if I'm confusing myself with all of the other details, or if others wouldn't immediately know what "retain" meant.

"preserve" is a very strong word: "maintain (something) in its original or existing state." It is extremely clear to me that the current value should be kept and a newly loaded value discarded. So I'm leaning towards that name, which I hadn't thought of using until today. Do you have a preference?
I see what you are saying. And names are important. I have no preference between "retain" and "preserve", but vote against "resolve" (even if it might drive more folks to read the documentation).
Thanks. I'm going to see if I can finish this task up in the next couple of days. I think that's mostly tests and documentation. Then I shouldn't have a good excuse for punting on #7, other than finding the time to focus on it.
My preference is for retainOnConflict or resolveOnConflict, mainly since I continue to feel that returning an enum, while strictly unnecessary, goes

Looking back at this after previous discussions I had to reread your text

Charles
I think with retainOnConflict it's not clear whether it means retain the modified value or the refreshed one. I think "preserve" is clearer that it should keep the one already there (the modified one). I also like "resolve" with an enum. Alternatively we could invert the query, e.g. clobberOnConflict (or any synonym of clobber).
I agree that "retain" and "resolve" feel more natural in the context of a cache. I think "preserve" is less ambiguous. I'm a little wary of an enum, but could be convinced. I'm unsure if I have created a more confusing mess by deviating from Guava's clobbering behavior, or at least not making it the default. Take for example the timeline,
In Guava, Charles had this insert the new value. But we don't know why it changed underneath us - it could be benign like expiration, or we could be loading old data if explicitly invalidated. If we reject conflicts by default then the refresh drops the new value. I suspect Charles would argue that the consistency around caches is murky and users should source from the system of record if they need strict accuracy. That then leads to clobbering being a natural choice, because inconsistency may be unavoidable. If so, then exposing some conflict handling to the user isn't too important. And if exposed but conflicts are generally unavoidable, then clobbering as the default highlights that. What would you expect / want?
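The two resolutions being debated can be illustrated with a plain ConcurrentHashMap. This is an illustrative sketch of the idea only, not Caffeine's implementation: dropping on conflict amounts to a conditional install via replace(key, expected, newValue), while clobbering is an unconditional put.

```java
import java.util.concurrent.ConcurrentHashMap;

public class RefreshConflictDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("key", "v1");

        // A refresh starts and snapshots the value it will reload from.
        String observedAtRefreshStart = cache.get("key");
        String refreshedValue = "v1-refreshed";

        // Meanwhile another writer changes the mapping.
        cache.put("key", "v2");

        // Clobbering (Guava-style) would blindly install the result,
        // overwriting v2 even though the reload was derived from v1:
        //   cache.put("key", refreshedValue);

        // Dropping on conflict: install only if the mapping still holds
        // the value the refresh was computed from.
        boolean installed = cache.replace("key", observedAtRefreshStart, refreshedValue);
        System.out.println(installed);         // false: conflict detected
        System.out.println(cache.get("key"));  // v2: the concurrent write wins
    }
}
```

Note that replace compares by equals, so this sketch is not ABA safe either, as the later discussion points out.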
When you phrase it that way I actually prefer the clobbering. It sounds like the refresh behavior simply needs to be documented.

Charles
Do you prefer clobbering with documentation (e.g. Guava), or clobbering by default with documentation? My concern with blindly clobbering is that it hides races that could cause inconsistency. Yet you're right that it's usually benign and not too worrisome. Is there tangible value in letting the user opt in to deciding the conflict resolution, or do you view that as adding unnecessary conceptual weight?
Well, I favor a well-defined clobbering that does not expose a knob to

Doesn't a similar problem exist when an explicit write occurs while a

Charles
Do you mean a
A change like a
I read through this again and now I see your point.

In ConcurrentHashMap it locks the bin (ideally one per entry). Any subsequent write blocks on the bin waiting for the preceding one to complete. I think that's the only reasonable approach for the map because it doesn't have a removal listener, so clobbering could have surprising side effects like resource leaks. Since we don't clobber on computations, it doesn't give us enough of a pattern to emulate.
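The bin-level blocking described here can be observed directly. A minimal sketch (not Caffeine code) where a slow computeIfPresent on a ConcurrentHashMap delays a concurrent put to the same key:

```java
import java.util.concurrent.ConcurrentHashMap;

public class BinLockDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        map.put("key", "v1");

        // Thread A: a slow computation on the entry; it holds the bin lock.
        Thread slowCompute = new Thread(() ->
            map.computeIfPresent("key", (k, v) -> {
                try {
                    Thread.sleep(300);  // simulate a slow reload
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return v + "-reloaded";
            }));
        slowCompute.start();
        Thread.sleep(50);  // let the computation begin

        // Thread B (here, main): a plain write to the same key must wait
        // for the bin lock held by the computation.
        long start = System.nanoTime();
        map.put("key", "v2");
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;
        slowCompute.join();

        System.out.println("put waited ~" + waitedMillis + " ms");
        System.out.println(map.get("key"));  // v2 (the put ran after the compute)
    }
}
```

This is the cost the non-blocking refresh is trying to avoid: any update, invalidation, or eviction touching that bin stalls behind the in-flight load.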
Maybe clobbering is the only consistent option? If we think of this in terms of the ABA problem there might not be any other correct way. I was assuming that if
That is convoluted, but technically possible. The tag to compare would be the write time, which makes sense for

So either we block other writes (current), clobber, or have ABA-unsafe conflict handling. We don't want the last one, so the whole method nonsense is out. We want to avoid the blocking writes and the potential of wasting an async thread, and precedent is set in

I guess I went the long way around to arrive back at what you did in Guava :-)
Looking forward to this :) (Had to temporarily roll back using Caffeine, which was causing this deadlock in our asynchronous server; not sure why exactly it happened, but pretty sure full asynchrony would cause it to go away - https://gist.github.com/anuraaga/49eb6d92a9b4c656126743314599a56c)

As a user of the cache the clobbering sounds fine - as you say, I wouldn't expect perfect consistency from a LoadingCache and generally would expect an extra refresh cycle to catch updates in cases like this one. A part of me doesn't like how it means evicted entries would come back to life thanks to the refresh, but I guess some level of thrashing at the low end of the LRU scale is expected.

Also FWIW, I guess in practice most use cases of a LoadingCache probably only ever call get(), and some may call refresh() (critical read). IMO, as long as puts aren't completely broken, it's probably not worth worrying about their consistency too much.
Writing tests shows that I'm bad at keeping all the details of Guava, Caffeine, and ConcurrentHashMap straight. Guava's

There's not much I can do here, unfortunately. The options appear to be,
I think (3) is possible and could be done without being excessively invasive. It requires care, but I think is doable. However I'd probably also define the behavior as (2) being acceptable. That allows integrating the current patch and, if (3) is done, not needing to do similar work for the fully unbounded cache (a simple decorator for CHM). The latter would avoid adding a value wrapper for this case.

This seems reasonable. The worst cases of (2) failing are similar to issues Guava has, e.g. that an
Thanks @anuraaga. I think with the above mentioned comments it would guard against evicted entries being resurrected, to some degree.

Thanks for the stack trace and sorry to hear there's an existing problem. It would be nice to figure out how to reproduce that in a unit test to validate that my patch won't suffer that effect. I hope to finish my patch tonight or tomorrow, so a release Friday or Saturday is likely.
Thanks - I've looked into our thread dump in more detail and put what details seemed relevant in #69, maybe it provides clues on a repro. |
Guava's performs a
This indicates the previous value is the instance. If updated, Guava discards the refresh and explicitly calls out eviction. The statement of

What is the desired behavior here?
My personal preference with cache loading is to think of it in terms of

I've thought less about refresh. It seems fine to me if a refresh begins

In other words, I argue for all manual user additions to the cache (as well

Charles
I think what kept causing me confusion is that if a user's mutation should cancel a refresh, then arguably an

My mistake was to try to provide a predictable sequence that properly handled the various cases. In retrospect it was unnecessary, and following Guava's lead here is preferred. That is simple and consistent. There are other places where linearizability would be broken as well, in both Guava and Caffeine, and that's acceptable. If a user wants much stricter behavior then

Despite going around in circles for too long, I think we have a conclusion. At the very least I understand the rationale and can have tests for it.
Previously refresh was implemented as a computation performed on the executor. Like all writes it allowed concurrent reads, but blocked other writes like updating, invalidating, or evicting. This provided strong consistency at the cost of lock granularity (hash bin) and potentially wasting a thread in an asynchronous cache. This simple model was also shown to be broken, as per the deadlock reported.

A refresh is now performed without blocking and better matches Guava's. The newly loaded entry is dropped if the mapping now points to a different value. Like Guava, if the entry disappears (e.g. eviction) the loaded value is inserted. Usage through refreshAfterWrite is preferred, and it will try to avoid redundant in-flight loads. Unlike Guava, a LoadingCache#refresh() cannot detect and ignore redundant loads. It may be possible to strengthen the implementation, but explicit refreshes are rare. Similar to Guava, the approach is not ABA safe but is best effort and does what users would likely prefer. For stricter reload behavior, users should perform a Map#compute instead.

Load testing uncovered a weighted eviction bug with a cache heavily dominated by zero-weight entries (e.g. incomplete futures). The main space eviction would find no victims and needed to fall back to scanning the admission window.

Thanks to everyone who helped in the discussions to wrap my head around how to properly implement this.
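As a sketch of the stricter alternative the commit message mentions, a Map#compute performs the reload atomically with respect to other writes on that entry. This uses a plain ConcurrentHashMap as a stand-in for a cache's asMap() view; the helper name is illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class StrictReloadDemo {
    // A reload performed inside compute: the mapping cannot change underneath
    // the reload because the bin lock is held, at the cost of blocking other
    // writers to that entry (the behavior the patch moves away from by default).
    static <K, V> V strictReload(ConcurrentHashMap<K, V> map, K key, Function<V, V> reloader) {
        return map.computeIfPresent(key, (k, oldValue) -> reloader.apply(oldValue));
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("config", "v1");
        String result = strictReload(cache, "config", old -> old + "-reloaded");
        System.out.println(result);  // v1-reloaded
    }
}
```

This trades throughput for linearizability, which is the opposite of the default refresh behavior and why it is left as an opt-in pattern for the user.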
Released
Currently refreshAfterWrite and LoadingCache.refresh(key) are performed as blocking writes (computations). This was a simple to implement approach that does not obstruct reads, avoids duplicate in-flight calls, and postpones write expiration. However it has a few drawbacks that a fully non-blocking version would not:

- ConcurrentHashMap locking at the bin level
- asyncReload(key, executor) does not use the provided executor

A fully asynchronous refresh must handle conflicts and notify the removal listener appropriately. A conflict occurs when the refresh completes and the mapping has changed (different or absent value). This is detectable because a reload computes a new value based on the key and old value. The refresh cannot know whether the change is benign or whether it would insert stale data. In Guava the refresh clobbers or inserts the entry, but due to races dropping is the only safe default choice. That default is potentially wasteful, so the user should have the option to resolve the conflict instead. This can be done by adding a new default method to AsyncCacheLoader.

The removal listener must be notified to avoid resource leaks. In Guava the removal cause is replaced, but that is intended as a user action rather than an automatic one. As we need to signal when a reload replaced or dropped the value, to avoid confusion a refreshed cause should be added to the enum.

The post-processing of a reload will be implemented by attaching a whenComplete to the future returned by asyncReload. This avoids the unnecessary thread case described above for AsyncLoadingCache. It is worth noting that cancelling the refreshing future if the entry is removed was rejected. This is because, unlike Future, in CompletableFuture the task is not interrupted to hint that it may stop, and the removal listener would still need to be notified if the value materialized.

The final detail is whether an in-flight refresh should postpone expiration. This is currently the case for expireAfterWrite (but not expireAfterAccess) due to reusing the entry's writeTime to avoid an extra field indicating that a refresh was triggered. A stricter expireAfterWrite would require an additional per-entry field and is more prone to a redundant load. For now we'll keep it as is, but be willing to change if requested.

The above is from brainstorming with Etienne Houle @ Stingray (@ehoule).

@abatkin, @yrfselrahc, @lowasser might have opinions on this design
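The proposed default method itself was elided above. As a purely hypothetical sketch (resolveRefreshConflict was only one candidate name discussed in this thread, and the signature is an assumption), it might look like the following, together with the whenComplete post-processing described:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical sketch of the proposed hook; the real method name and
// signature were still under discussion in this thread.
interface AsyncCacheLoaderSketch<K, V> {
    CompletableFuture<V> asyncReload(K key, V oldValue, Executor executor);

    // Called when a reload completes but the mapping has since changed.
    // Returning the current value keeps it (drops the reload); returning
    // the loaded value clobbers.
    default V resolveRefreshConflict(K key, V currentValue, V loadedValue) {
        return currentValue;  // drop the refresh by default
    }
}

public class RefreshPostProcessing {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("key", "v1");
        AsyncCacheLoaderSketch<String, String> loader = (key, oldValue, executor) ->
            CompletableFuture.supplyAsync(() -> oldValue + "-reloaded", executor);

        // Post-process on the reloading future rather than on a dedicated
        // thread (a same-thread executor is used here for determinism).
        String oldValue = cache.get("key");
        loader.asyncReload("key", oldValue, Runnable::run).whenComplete((loaded, error) -> {
            if (error != null) {
                return;  // a failed reload leaves the mapping untouched
            }
            cache.compute("key", (k, current) -> {
                if (current == null) {
                    return loaded;  // entry vanished (e.g. eviction): insert
                } else if (current.equals(oldValue)) {
                    return loaded;  // no conflict: install the refreshed value
                }
                return loader.resolveRefreshConflict(k, current, loaded);
            });
        });
        System.out.println(cache.get("key"));  // v1-reloaded
    }
}
```

The compute in the completion handler is only for the sketch's own bookkeeping; the thread's eventual conclusion was to avoid per-entry locking on the refresh path entirely.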