Bulk refresh #7
The API for this improvement is not very clear. Even if a batch refresh may predominantly delegate to the single-key reload, it is not obvious what a batch refresh using the bulk loader should look like. I'm still weighing which shape to favor.
On reflection, I like that approach.
To clarify: would the solution introduce a new bulk-reload method on the loader? Or are you saying that the default implementation (in the interface) of that method would simply delegate to the single-key variant?
The default interface implementation would delegate to the single-key variant, yes. I think it's best to lay out each scenario separately, based on which methods are implemented.
The assumption is that most users only ever implement the single-key load. Does that sound right?
The table looks entirely reasonable. The only thing of note is how existing code that has already implemented the bulk load would behave.
Only if the bulk variant is also overridden.
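For concreteness, here is a hypothetical shape for that default. No such method was ever added to Caffeine's CacheLoader; this sketch only illustrates the delegation being discussed, with the single-key reload mirrored as an abstract method:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: CacheLoader has no reloadAll. The imagined default
// fans out to the single-key reload, so true batching happens only when a
// user overrides this method.
interface BulkReloadSketch<K, V> {
  V reload(K key, V oldValue) throws Exception; // mirrors CacheLoader.reload

  default Map<K, V> reloadAll(Map<K, V> oldEntries) throws Exception {
    var result = new LinkedHashMap<K, V>();
    for (var entry : oldEntries.entrySet()) {
      result.put(entry.getKey(), reload(entry.getKey(), entry.getValue()));
    }
    return result;
  }
}
```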
I have run into this issue too. Since the overhead for loading a single item is huge, I just use an async loader and queue the reload requests until I have enough to run a batch. Not the cleanest solution, but a workable one.
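A rough sketch of that workaround, under stated assumptions: `fetchAll` stands in for whatever bulk backend call is available, `BATCH_SIZE` is an arbitrary threshold, and, as described above, reloads below the threshold simply wait for more requests to arrive:

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

import com.github.benmanes.caffeine.cache.AsyncCacheLoader;

// Sketch of "queue reloads until there are enough for a batch". The bulk
// backend call (fetchAll) and the threshold are assumptions, not from the thread.
final class BatchingAsyncLoader<K, V> implements AsyncCacheLoader<K, V> {
  private static final int BATCH_SIZE = 16; // assumed threshold

  private final Function<Set<K>, Map<K, V>> fetchAll;
  private final Queue<Pending<K, V>> queue = new ConcurrentLinkedQueue<>();
  private final AtomicInteger size = new AtomicInteger();

  BatchingAsyncLoader(Function<Set<K>, Map<K, V>> fetchAll) {
    this.fetchAll = fetchAll;
  }

  // Initial loads go straight to the backend; only refreshes are queued.
  @Override public CompletableFuture<V> asyncLoad(K key, Executor executor) {
    return CompletableFuture.supplyAsync(() -> fetchAll.apply(Set.of(key)).get(key), executor);
  }

  @Override public CompletableFuture<V> asyncReload(K key, V oldValue, Executor executor) {
    var pending = new Pending<K, V>(key, new CompletableFuture<>());
    queue.add(pending);
    if (size.incrementAndGet() >= BATCH_SIZE) {
      executor.execute(this::drain); // enough queued; run one batch
    }
    return pending.future();
  }

  private void drain() {
    var batch = new ArrayList<Pending<K, V>>(BATCH_SIZE);
    for (Pending<K, V> p; (batch.size() < BATCH_SIZE) && ((p = queue.poll()) != null); ) {
      batch.add(p);
    }
    size.addAndGet(-batch.size());
    if (batch.isEmpty()) {
      return;
    }
    try {
      var keys = batch.stream().map(Pending::key).collect(Collectors.toSet());
      var results = fetchAll.apply(keys);
      batch.forEach(p -> p.future().complete(results.get(p.key())));
    } catch (Throwable t) {
      batch.forEach(p -> p.future().completeExceptionally(t));
    }
  }

  private record Pending<K, V>(K key, CompletableFuture<V> future) {}
}
```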
@ben-manes i'm not sure if i'm understanding this feature or if there's a real issue. I'm using the Scaffeine wrapper; here's my simple code to demonstrate this:

```scala
val cache = Scaffeine()
  .refreshAfterWrite(100.millis)
  .build[String, String](loader = (key: String) => {
    println(s"key: ${key}")
    key
  }, allLoader = Some((keys: Iterable[String]) => {
    println(s"keys: ${keys.mkString(",")}")
    keys.map(k => k -> k).toMap
  }))

val values = cache.getAll(Seq("1", "2", "3")).toList
Thread.sleep(200)
cache.getAll(Seq("1", "2", "3")).toList
```

Normally, since I have allLoader defined, I'd expect the refresh triggered by the second getAll to go through it as one batch, but each key is printed individually by loader. Thoughts?
Currently refreshing is performed on individual keys and calls CacheLoader.reload(key, oldValue), so the bulk loader is never consulted.
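For context, the single-key path is roughly the following. This paraphrases the documented default behavior rather than Caffeine's exact source: a refresh invokes reload, whose default falls back to load.

```java
// Paraphrase of the documented default (not Caffeine's exact source).
interface SingleKeyReloadSketch<K, V> {
  V load(K key) throws Exception; // mirrors CacheLoader.load

  default V reload(K key, V oldValue) throws Exception {
    return load(key); // one backend call per refreshed key; loadAll is unused
  }
}
```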
Ah...thanks for the quick answer @ben-manes, i shouldn't have assumed that this was implemented since the ticket is still open 😊. I guess for now i'm gonna see what i can do with what we have. @Dirk-c-Walter's solution is interesting but it wouldn't work for me.
This interface method provides a placeholder for future support of bulk reloading (see #7). That is not implemented. Therefore this method merely calls refresh(key) for each key and composes the result. If and when bulk reloading is supported then this method may be optimized.
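A minimal sketch of the composition that javadoc describes, assuming Caffeine 3.x where LoadingCache.refresh returns a CompletableFuture. This is illustrative, not the library's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

import com.github.benmanes.caffeine.cache.LoadingCache;

final class RefreshAllSketch {
  // Fans out refresh(key) per key and composes the futures into one map,
  // mirroring the fallback behavior the javadoc above describes.
  static <K, V> CompletableFuture<Map<K, V>> refreshAll(
      LoadingCache<K, V> cache, Iterable<? extends K> keys) {
    var futures = new LinkedHashMap<K, CompletableFuture<V>>();
    for (K key : keys) {
      futures.putIfAbsent(key, cache.refresh(key));
    }
    return CompletableFuture.allOf(futures.values().toArray(CompletableFuture[]::new))
        .thenApply(ignored -> {
          var result = new LinkedHashMap<K, V>();
          futures.forEach((key, future) -> result.put(key, future.join()));
          return result;
        });
  }
}
```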
Just posting this here for reference. The issue I described in #323 (comment) and #323 (comment) would most likely not have occurred at all with bulk refresh, because we implement the bulk load.
There is a CoalescingBulkLoader in the example section that joins multiple load requests into a bulk request by issuing a delay. When reviewing, I think I spotted at least one concurrency issue, so I would not recommend using it without prior vetting.

In cache2k I recently added bulk support and also provided a CoalescingBulkLoader. See the issue comment on how to configure it. I did some heavy concurrent testing on it, so I am quite confident it is production ready. The coalescing can happen for every load, which would always introduce a delay and therefore additional latency on user requests, or it can work within the refresh path only, so initial user requests are not delayed. The solution in Caffeine could be similar.

I was considering bulk support in the cache core, which would mean that timer events would be managed for a bunch of keys. However, timing is a cross-cutting concern through the whole cache implementation; OTOH "bulk" operations are very useful and widespread, but not a common mode of operation. Adding bulk support to the timing code of the core would add a lot of complexity. Also, the majority of the cache API is still on individual keys, so entries initially loaded in a bulk request might not be refreshed together. The coalescing approach allows adding it as an extension without altering the core cache.

At the moment the implementation uses a separate thread scheduler, which is not optimal as a general solution. It should be enhanced to use a timer from the cache infrastructure without needing an additional thread. Since coalescing introduces a tiny delay to bundle the requests, a more efficient solution could adjust the timer events by "rounding up", and the timer code could be extended to process bulks of events that happen at the same time.
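To make the "refresh path only" variant concrete, here is a small wiring sketch for Caffeine. The names are illustrative, and `coalescer` is assumed to be any bulk-batching loader such as the ones discussed in this thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

import com.github.benmanes.caffeine.cache.AsyncCacheLoader;

// Illustrative wiring: initial loads hit the backend directly, while
// refreshes are funneled through a coalescing loader, so the bundling
// delay is never visible on a user-facing request.
final class RefreshOnlyCoalescing<K, V> implements AsyncCacheLoader<K, V> {
  private final Function<K, V> loadOne;           // assumed single-key backend call
  private final AsyncCacheLoader<K, V> coalescer; // batches with a small delay

  RefreshOnlyCoalescing(Function<K, V> loadOne, AsyncCacheLoader<K, V> coalescer) {
    this.loadOne = loadOne;
    this.coalescer = coalescer;
  }

  @Override public CompletableFuture<? extends V> asyncLoad(K key, Executor executor) {
    return CompletableFuture.supplyAsync(() -> loadOne.apply(key), executor); // no delay
  }

  @Override public CompletableFuture<? extends V> asyncReload(
      K key, V oldValue, Executor executor) throws Exception {
    return coalescer.asyncLoad(key, executor); // only refreshes pay the bundling delay
  }
}
```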
@cruftex wow i did not even think this existed...but would you mind pointing out the concurrency issues (at least one or two of them)?
@dalegaspi here you are:
This has the chance that multiple timers are scheduled. Probably that causes no serious harm, just smaller bulk batches when there is contention. However, the overall design implies that there is only one timer scheduled, which is not guaranteed; OTOH it will also start multiple loads.

Looking at the fields, for example, several of the ones shared between threads are neither final nor volatile. That said, I was happy for the idea, and I think the code is basically working. However, it needs a proper review. I am very suspicious when concurrent code is not using final and volatile in a sensible way.
@cruftex thanks for pointing those out...now, i'm not going to pretend i fully understand this code...i'm sure @sheepdreamofandroids and/or @ben-manes can explain better, but since the method is synchronized:

```java
synchronized private void startWaiting() {
  schedule = timer.schedule(this::doLoad, maxDelay, MILLISECONDS);
}
```

a later call will just replace the previous schedule. as for the use of final and volatile, i'll defer to the author. as for the last point, i thought it did honor the max parameter in line 167, but maybe you are referring to an entirely different thing that i don't quite see. kindly take these comments for what they're worth. i'm not starting a debate. i'm just a village idiot. 😊

Yes, I agree, this is a great idea and should be sufficient for most use cases. I will actually try this when i get the chance.

ninja edit: @Stephan202 pointed out that @sheepdreamofandroids wrote the class.
I'm sure he can explain it (:smile:), but he's not the author; @sheepdreamofandroids is. See #336.
@Stephan202 thanks, wow, that was an oversight. i updated my previous comment.
Wow, this is a while ago :-) Indeed I'm pretty sure that the counting-down logic ensures the max parameter is honored, doesn't it?

Yes it does! Sorry. I missed the counting down logic.
Hello. Is bulk refresh supported now? |
Not yet; it's been in the backlog, with contributed alternatives in the examples.
I'm sorry to ask a question here. I override the bulk load method; will a refresh use it?
This issue will be closed as won't do after the documentation is updated to guide users.

The original ask was an observation that since the provided CacheLoader supports bulk loads, a getAll whose entries are due for refreshing could batch those reloads. For bulk refreshes the coalescing examples show a better approach, because they can capture reloads triggered by independent cache operations, buffer by a max delay or key count, and throttle the total number of reloads in-flight. This can all be done by a more intelligent CacheLoader.

@sheepdreamofandroids' example is nice and straightforward. It can be further simplified by leveraging a reactive library, such as by using RxJava's buffer(timespan, timeUnit, count) or Reactor's bufferTimeout(maxSize, maxTime). It is then only ~30 LOC to implement and customize to fit the problem at hand. Here is a short example using Reactor demonstrating this.

CoalescingBulkLoader

```java
public final class CoalescingBulkLoader<K, V> implements AsyncCacheLoader<K, V> {
  private final Function<Set<K>, Map<K, V>> mappingFunction;
  private final Sinks.Many<Request<K, V>> sink;

  public CoalescingBulkLoader(int maxSize, Duration maxTime,
      int parallelism, Function<Set<K>, Map<K, V>> mappingFunction) {
    this.mappingFunction = requireNonNull(mappingFunction);
    sink = Sinks.many().unicast().onBackpressureBuffer();
    sink.asFlux()
        .bufferTimeout(maxSize, maxTime)
        .parallel(parallelism)
        .runOn(Schedulers.boundedElastic())
        .subscribe(this::handle);
  }

  @Override public CompletableFuture<V> asyncLoad(K key, Executor executor) {
    var result = new CompletableFuture<V>();
    sink.tryEmitNext(new Request<>(key, result)).orThrow();
    return result;
  }

  private void handle(List<Request<K, V>> requests) {
    try {
      var results = mappingFunction.apply(requests.stream().map(Request::key).collect(toSet()));
      requests.forEach(request -> request.result().complete(results.get(request.key())));
    } catch (Throwable t) {
      requests.forEach(request -> request.result().completeExceptionally(t));
    }
  }

  private record Request<K, V>(K key, CompletableFuture<V> result) {}
}
```

Sample test

```java
@Test
public void coalesce() {
  AsyncLoadingCache<Integer, Integer> cache = Caffeine.newBuilder()
      .buildAsync(new CoalescingBulkLoader<>(
          /* maxSize */ 5, /* maxTime */ Duration.ofMillis(50), /* parallelism */ 5,
          keys -> keys.stream().collect(toMap(key -> key, key -> -key))));

  var results = new HashMap<Integer, CompletableFuture<Integer>>();
  for (int i = 0; i < 82; i++) {
    results.put(i, cache.get(i));
  }
  for (var entry : results.entrySet()) {
    assertThat(entry.getValue().join()).isEqualTo(-entry.getKey());
  }
}
```
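One design note on the snippet above: Reactor's bufferTimeout flushes a buffer as soon as either maxSize items have arrived or maxTime has elapsed, so a full batch is dispatched immediately and only a partially filled one waits out the delay. In the test, the 82 requests are therefore dispatched as batches of up to five keys each.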
This Guava issue identified an expected optimization not being implemented. A getAll where some of the entries should be refreshed due to refreshAfterWrite schedules each key as an independent asynchronous operation. Due to the provided CacheLoader supporting bulk loads, it is reasonable to expect that the refresh is performed as a single batch operation.

This optimization may be invasive and deals with complex interactions for both the synchronous and asynchronous cache implementations. Or, it could be as simple as using getAllPresent and enhancing it to support batch read post-processing.
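To illustrate the ask, here is a hypothetical sketch only; this is not how Caffeine behaves. A bulk read could partition out the keys that are missing or due for refresh and serve them with one loadAll call. The `needsRefresh` predicate stands in for the cache's internal refreshAfterWrite bookkeeping, and unlike refreshAfterWrite's serve-stale semantics, this sketch blocks on the batch for simplicity:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.function.Predicate;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.CacheLoader;

final class BatchedGetAllSketch {
  // Hypothetical sketch of the requested optimization; Caffeine instead
  // schedules an independent reload per stale key.
  static <K, V> Map<K, V> getAll(Cache<K, V> cache, CacheLoader<K, V> loader,
      Iterable<? extends K> keys, Predicate<K> needsRefresh) throws Exception {
    var result = new LinkedHashMap<K, V>();
    var stale = new LinkedHashSet<K>();
    for (K key : keys) {
      V value = cache.getIfPresent(key);
      if ((value == null) || needsRefresh.test(key)) {
        stale.add(key); // miss or due for refresh: serve as part of one batch
      } else {
        result.put(key, value);
      }
    }
    if (!stale.isEmpty()) {
      Map<? extends K, ? extends V> loaded = loader.loadAll(stale); // single batched call
      loaded.forEach(cache::put);
      result.putAll(loaded);
    }
    return result;
  }
}
```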