
'Synchronous listener' isn't synchronous on expirations? #195

Closed
aukevanleeuwen opened this issue Oct 28, 2017 · 9 comments
@aukevanleeuwen

Hello,

I wanted to leverage caffeine in a scenario as follows:

[diagram: Component A and Component B exchanging messages via request and response queues, with the request cache in between]

Component A puts something on a request queue and stores the request in a (Caffeine) cache as well. When Component B is answering by means of putting something on a response queue the original request is retrieved from the cache and processed further.

However I also have the idea of a timeout on the amount of time waiting for a response. I've tried to implement that as follows:

Creating a cache in Component A as follows:

    this.requestCache = Caffeine.newBuilder()
            .expireAfterWrite(timeoutInMs, TimeUnit.MILLISECONDS)
            .writer(new CompleteExceptionallyOnExpirationWriter())
            .build();
    public class CompleteExceptionallyOnExpirationWriter implements CacheWriter<String, CompletableFuture<PaymentResult>> {

        // <snip>

        @Override
        public void delete(String key, CompletableFuture<PaymentResult> value, RemovalCause cause) {
            if (cause != RemovalCause.EXPLICIT) {
                log.warn("Didn't receive a response on the topic {} within {}ms for payment {}.", topic, timeoutInMs, key);
                value.completeExceptionally(new TimeoutException("Didn't get a response within " + timeoutInMs + "ms."));
            }
        }
    }

I started out by implementing a RemovalListener, but soon noticed that this is an asynchronous process. However, I also read that a synchronous 'listener' is possible using a CacheWriter.

When the operation must be performed synchronously with the removal, use CacheWriter instead.

However this still doesn't seem to be working as I expect. The behaviour that I see now is as follows:

  1. Request 1 arrives and is put in the cache
  2. After 'timeout' nothing happens
  3. Some time later Request 2 arrives and is put in the cache
  4. The moment (3) happens, the CacheWriter.delete() for request 1 is triggered.

Now, my cache sees rather slow-moving traffic, so I can't rely on the cache always being busy enough to trigger the cleanup for a 'previous' request's timeout.

Is this expected behaviour? It seems kind of odd to me, but I don't see anything wrong with what I'm doing.

Regards,

Auke

@ben-manes
Owner

Yes, I think it is a disconnect of expectations. I am open to changing this in Java 9, if we can work out the details.

Passive expiration

Currently expiration is not viewed as a scheduling service, where an event would be fired when the timer elapses. This would require a dedicated thread to actively perform the task, but the cache itself does not create any threads. Your use-case is reasonable, but falls into the gray area of what should be expected from expiration.

As of now, expiration is performed passively. This is because the space was already taken and a time constraint is about freshness (not capacity). When an entry expires a miss is emulated if present, and it is discarded when the amortized maintenance cycle is triggered. Note that removing expired entries is O(1) for a fixed policy (LRU queue) and variable policy (timer wheel). The concept of a cache is to provide fast access to data within some policy constraints, so the passive behavior fits this purpose.

The maintenance work can be triggered explicitly using Cache.cleanUp(). That could be called by a ScheduledExecutorService on a fixed schedule, thereby causing your timeouts to be handled.
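As a rough sketch of that approach (the wiring below is illustrative; only `Cache.cleanUp()` itself is Caffeine API, passed in here as a plain `Runnable` to keep the example self-contained):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PeriodicCleanup {
  /**
   * Runs the given maintenance task (e.g. {@code cache::cleanUp}) on a fixed
   * delay, so that expired entries are noticed even without cache activity.
   */
  public static ScheduledFuture<?> start(
      ScheduledExecutorService scheduler, Runnable cleanUp, long periodMillis) {
    return scheduler.scheduleWithFixedDelay(
        cleanUp, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }
}
```

The period bounds how stale an expired-but-unreported entry can be, trading a little idle work for prompter timeouts.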

Interceptors

The CacheWriter and RemovalListener are interceptors for when the hash table entry is manipulated. The writer is called within a computation method (see ConcurrentHashMap), thereby blocking other writes to that entry. As you observed, that hash table operation will not occur immediately upon expiration but be delayed until deemed necessary. The RemovalListener differs by being called outside of a computation after the write, so as to not block writes or cancel the write if an exception is thrown.

Active expiration?

Java 9 introduces a shared scheduler via CompletableFuture.delayedExecutor. This could be used to trigger a maintenance cycle when the oldest entry is set to expire. That scheduling would be O(lg n) due to the heap, while the per-entry operations would remain O(1). Assuming the shared scheduler isn't being abused by others, its heap would remain small despite very large caches.

The problem is that active expiration assumes a strong guarantee of event order. If the timer does not fire because the order was imperfect, that would be viewed as a bug. This can occur for read-based expiration policies (expireAfterAccess, Expiry's expireAfterRead) because the cache favors dropping reorder events rather than inducing latency under high throughput demands. In those cases the timestamps are still updated, but a weaker ordering is assumed acceptable since the external behavior is correct.

If active expiration were enabled, we could provide only best-effort guarantees for read-based policies but strong guarantees for writes. That fits your expectations, but does add conceptual weight for anyone interested in this feature.

Alternatives

Active expiration might be outside the scope of a cache. It's a fair debate. There are multiple ways you could perform the timing logic yourself.

The simplest is if you are on Java 9 and can use CompletableFuture.orTimeout. Assuming the future is the request, on timeout it could discard the cache entry. If the request queue is combined with the request cache by using AsyncLoadingCache, then the failed future would be removed automatically.
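A plain-JDK sketch of that idea: orTimeout (Java 9+) completes the future exceptionally with a TimeoutException if it isn't done in time, and the discard callback stands in for an invalidation such as cache.invalidate(key) (the names here are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OrTimeoutExample {
  /** Fails the request after the timeout, discarding the (hypothetical) cache entry. */
  public static CompletableFuture<String> awaitResponse(
      CompletableFuture<String> request, long timeoutMillis, Runnable discardEntry) {
    return request.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
        .whenComplete((response, error) -> {
          if (error instanceof TimeoutException) {
            discardEntry.run(); // stand-in for cache.invalidate(key)
          }
        });
  }
}
```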

Another possibility would be to schedule each entry in a ScheduledExecutorService. That is inefficient for large caches (e.g. millions of entries), but reasonable for in-flight requests. The cost is lower if using a timer wheel variant, such as this non-hierarchical scheduler.

Or, as mentioned above, if scheduled passively then you could have a periodic task that checks if there are any expired entries. Then you could still use the cache by calling cleanUp.

@aukevanleeuwen
Author

Thanks for your elaborate response. I can follow most of it, and it makes sense that for a general-purpose cache there are bound to be some things falling outside its scope. I don't have a strong opinion on whether this definitely belongs in the scope of a cache, since I can't oversee all of the drawbacks it might cause for the cache as a whole.

Having said that I do think the documentation in that sense is a bit misleading:

Synchronous Listeners
A CacheWriter may be used to publish to synchronous listeners.
A synchronous listener receives event notifications in the order that the operations occur on the cache for a given key. The listener may either block the cache operation or queue the event to be performed asynchronously. This type of listener is most often used for replication or constructing a distributed cache.

I think especially that last sentence made me pretty sure that both writes and deletes would be synchronous. How else would a distributed cache know for example about the explicit eviction from another (cache) node? Maybe I'm not seeing this through though.

I didn't really like the scheduled Cache.cleanUp(); that seems a bit too hack-ish to me. Also, I'd like to know what it would impose on the cache if I scheduled this every 100ms or so.

In the end I now schedule the task in a ScheduledExecutorService after adding it to the cache to check explicitly if the CompletableFuture was already handled or not:

    this.timeoutExecutorService.schedule(() -> {
        if (!future.isDone()) {
            log.warn("Didn't receive a response on the topic {} within {}ms for payment {}.", topic, timeoutInMs, key);
            future.completeExceptionally(new TimeoutException("Didn't get a response within " + timeoutInMs + "ms."));
        }
    }, timeoutInMs, TimeUnit.MILLISECONDS);

I'm kind of disliking the fact that I'm running this task for every request (and not only the ones that actually time out), but the intent is pretty clear at least and I don't expect thousands of requests per second yet. Thanks for the timer-wheel read; interesting, and I'll keep it in mind if I need the performance later on.

@ben-manes
Owner

I think especially that last sentence made me pretty sure that both writes and deletes would be synchronous. How else would a distributed cache know for example about the explicit eviction from another (cache) node? Maybe I'm not seeing this through though.

In passive expiration, there is no write until the cache notices the entry expired and cleans up. The actual mutation on the hash table is communicated to the CacheWriter. This will be delayed some time after the timestamp. Other eviction types (size, explicit) are active because there is an event, such as an insert, that can immediately trigger the work. For expiration, a scheduler thread is needed to trigger an event at some point in the future. The difference here is our expectations of how promptly the cache should detect that an entry has expired.

I didn't really like the scheduled Cache.cleanUp(), that seems a bit too hack-ish to me. Also I would know what it would impose on the cache if I would schedule this every 100ms or so?

I agree, 100ms is quite frequent and may seem wasteful when no work is needed. The overhead should be small due to using O(1) algorithms. This solution isn't great, but at least provides some support.

In the end I now schedule the task in a ScheduledExecutorService after adding it to the cache to check explicitly if the CompletableFuture was already handled or not.

You could have the request future cancel the scheduled task using whenComplete. That would avoid running it for requests that completed successfully.
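That refinement could be sketched as follows (a plain-JDK sketch, not tied to the cache; the names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancellableTimeout {
  /** Fails the future after the timeout, and cancels the timer if it completes first. */
  public static <T> void addTimeout(
      CompletableFuture<T> future, ScheduledExecutorService scheduler, long timeoutMillis) {
    ScheduledFuture<?> timer = scheduler.schedule(
        () -> future.completeExceptionally(
            new TimeoutException("Didn't get a response within " + timeoutMillis + "ms.")),
        timeoutMillis, TimeUnit.MILLISECONDS);
    // Cancel the pending timer once the request finishes, so the task does
    // not linger in the executor's queue for successful requests.
    future.whenComplete((value, error) -> timer.cancel(false));
  }
}
```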

@ben-manes
Owner

ben-manes commented Aug 1, 2019

@aukevanleeuwen,

Sorry for not getting onto this enhancement earlier. I have it implemented and half of the tests done, so now onto the hardest part - finding the time to wrap it up. I am curious if you think this would have been acceptable for your use-case, had it been available.

The Caffeine builder will now accept a Scheduler, defaulting to the disabled instance. If enabled then the cache will compute the delay until the next expiration event during its routine maintenance and schedule a future maintenance cycle (which then performs any eviction, etc). Since the expiration policies maintain order in O(1) time, we can schedule only the next fire time instead of each entry's time individually. That of course greatly reduces the costs imposed by the scheduler implementation (time/space of a heap).

To avoid thrashing the scheduler there are a few minor caveats to pace the executions.

  1. It will not reschedule if the next expiration event is later than the already scheduled fire time. For example with expireAfterWrite of 10 minutes you could have a scenario like,
    • T(0 min): Insert entry E1, scheduled for maintenance at T(10 min)
    • T(7 min): Clear cache => schedule T(10) remains
    • T(8 min): Insert entry E2 => schedule T(10) remains
    • T(10 min): Run maintenance, schedule for E2 at T(18)
  2. It will not reschedule if the next expiration event is earlier than the scheduled time by a tolerance limit. This tolerance is not configurable and currently set to ~1 second, which I think is tolerable (if not, use a timer subsystem). By doing this we avoid thrashing due to minor differences, e.g. by nanoseconds.
  3. If the next expiration event is within a tolerance limit of the current time, it will schedule at that tolerance limit (same ~1 second). By doing this we avoid excessive scheduling, e.g. running again in a few milliseconds, and can batch up more of the expiration work.

The maintenance work is cheap and fast, but pacing the executions seems appropriate. The tolerance to delay is necessary regardless, e.g. as ScheduledThreadPoolExecutor warns that there are no real-time guarantees. In addition the TimerWheel obtains its O(1) efficiencies by batching and hashing, so that it can use hashing as a pseudo-sort rather than offer a strict ordering. The tolerance of ~1s matches the smallest bucket size (2^30ns ~= 1.07s).
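Those three caveats can be illustrated with a small pure function (an illustration of the stated rules, not Caffeine's actual pacing code):

```java
import java.util.concurrent.TimeUnit;

public class PacingSketch {
  static final long TOLERANCE_NANOS = TimeUnit.SECONDS.toNanos(1); // ~2^30ns in the cache

  /**
   * Returns the nanosecond time at which to fire the next maintenance run,
   * given the currently scheduled fire time (Long.MAX_VALUE if none) and the
   * next expiration event's time.
   */
  static long nextFireTime(long now, long scheduledAt, long nextExpiry) {
    // (1) Never push the fire time later: keep the earlier existing schedule.
    if (nextExpiry >= scheduledAt) {
      return scheduledAt;
    }
    // (2) Ignore improvements smaller than the tolerance to avoid thrashing.
    if ((scheduledAt != Long.MAX_VALUE) && (scheduledAt - nextExpiry < TOLERANCE_NANOS)) {
      return scheduledAt;
    }
    // (3) Never fire sooner than the tolerance from now, batching near-term work.
    return Math.max(nextExpiry, now + TOLERANCE_NANOS);
  }
}
```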

For Java 9+ users, no additional threads are required, as it leverages the system-wide scheduling thread hidden within CompletableFuture.delayedExecutor(duration). For JDK 8 compatibility, Scheduler.systemScheduler() will provide it reflectively if available, else return the disabled instance. For now I decided against wrangling a multi-release jar for static calls and instead cache the method for reflective use, which should be okay as calls are at least 1 second apart.

The interface is below and is nothing special. The given task is run by the given executor, i.e. Caffeine.executor, which defaults to ForkJoinPool.commonPool(). That task holds a weak reference to the cache, so it will no-op once the cache is collected, allowing the cache to be GC'd.

@FunctionalInterface
public interface Scheduler {

  /**
   * Returns a future that will submit the task to the given executor after the given delay.
   *
   * @param executor the executor to run the task
   * @param command the runnable task to schedule
   * @param delay how long to delay, in units of {@code unit}
   * @param unit a {@code TimeUnit} determining how to interpret the {@code delay} parameter
   * @return a future representing the pending submission of the task
   */
  Future<?> schedule(Executor executor, Runnable command, long delay, TimeUnit unit);

  /**
   * Returns a scheduler that always returns a successfully completed future.
   *
   * @return a scheduler that always returns a successfully completed future
   */
  static Scheduler disabledScheduler() {
    return DisabledScheduler.INSTANCE;
  }

  /**
   * Returns a scheduler that uses the system-wide scheduling thread if available, or else returns
   * {@link #disabledScheduler()} if not present. This scheduler is provided in Java 9 or above
   * through {@link CompletableFuture#delayedExecutor}.
   *
   * @return a scheduler that uses the system-wide scheduling thread if available, or else a
   *         disabled scheduler
   */
  static Scheduler systemScheduler() {
    return SystemScheduler.isPresent() ? SystemScheduler.INSTANCE : disabledScheduler();
  }

  /**
   * Returns a scheduler that delegates to the given {@link ScheduledExecutorService}.
   *
   * @param scheduledExecutorService the executor to schedule on
   * @return a scheduler that delegates to the given {@link ScheduledExecutorService}
   */
   */
  static Scheduler forScheduledExecutorService(ScheduledExecutorService scheduledExecutorService) {
    return new ExecutorServiceScheduler(scheduledExecutorService);
  }
}

ben-manes added a commit that referenced this issue Aug 2, 2019
Typically the cache uses an amortized maintenance routine to periodically
keep the eviction policies up to date. For size this is to replay hits or
evict, for expiration this is to discover stale entries, and for reference
caching this is to find GC'd keys or values. Each policy uses O(1)
algorithms to be cheap, batches the work to avoid lock contention, and
performs this routine maintenance as a side effect of normal cache
operations.

If there is no activity on the cache then the routine maintenance is not
triggered, as that requires a background scheduling thread. Generally this
is okay as the memory was already consumed, no activity may indicate low
system usage, and the cache is a transient data store. However, some do want
to trigger business logic and leverage the cache as a timing subsystem, e.g.
using the `RemovalListener` to notify another component.

The cache itself does not create or manage threads, but can defer work to
a configured `Executor`. Similarly, it can now schedule the maintenance
routine on a configured `Scheduler`, disabled by default. The cache will still
perform maintenance periodically and this scheduler only augments it to add
additional liveliness. The scheduling is based on the expiration time only;
users may use `Cache.cleanUp()` for fixed periods to assist reference
caching, etc as desired.

The cache will rate limit the scheduler so that it runs at a reasonable pace
and never thrashes. For example, if the next expiration times are tiny
durations apart (e.g. 200ms), the cache will not run the maintenance task
5x/s. It will instead ensure a 1 second delay (2^30ns) between scheduled runs
(not accounting for the lack of real-time guarantees by a Scheduler impl).
Similarly an additional delay may occur due to the priority queues being held
in best-effort order.

The scheduling for variable expiration is based on the TimerWheel's bucket
expiration, which cascades events. For example a timer of 1h 13m 5s would
schedule to run in 1h, which would cascade the 1hr bucket into the minute
buckets. The next schedule would be at 13m, followed by 5s.

Java 9+ users should prefer using `Scheduler.systemScheduler()` rather than
creating their own dedicated threads. This leverages the JVM-wide scheduling
thread, primarily intended for `CompletableFuture.orTimeout(duration)`. In
Java 8 this method returns `Scheduler.disabledScheduler` instead.
@ben-manes
Owner

Released in v2.8

@Cryptite

This is super handy! Probably not the right place for this, but is there a cleaner, one-liner-type solution for just triggering a cleanup at builder-construction-time for the scheduler?
[screenshot: a cache builder configured with a custom scheduler]

@ben-manes
Owner

ben-manes commented Aug 24, 2019

I'm not entirely sure what your intent is with that scheduler logic. The general usage would be to define it like,

newOrderCache = Caffeine.newBuilder()
    .scheduler(Scheduler.systemScheduler())
    ...
    .build();

and the cache will schedule based on the next expiration event.

The systemScheduler is using a trick in CompletableFuture which has its own scheduler thread and delegates to ForkJoinPool.commonPool() by default to run the task.

CompletableFuture.runAsync(task,
    CompletableFuture.delayedExecutor(delay, unit, executor));

What are you trying to achieve with that custom scheduler statement?

@Cryptite

Perhaps I was overthinking it; I was merely wanting to ensure that expiry happened after the given interval, rather than wait for any operations that trigger its own cleanup. It's admittedly slightly against the purpose of a cache but in this case, I'm wanting things added to the cache to be evicted 30s later guaranteed. My cache in this instance isn't high throughput so something may go in 6 hours later, and I don't want to wait 6 hours for eviction.

That may've been overly verbose for my use case, so apologies if so heh.

If merely providing the systemScheduler actually triggers forced eviction after the given expiry interval then that's all I need.

@ben-manes
Owner

yeah, you probably overthought it :)

If you provide a scheduler, without any fanciness, then the cache will schedule accordingly. If you are on Java 9+ then the systemScheduler is available and you don't need your own threads. Otherwise you can use forScheduledExecutorService to adapt to your own executor.

There is some minor fuzziness, so eviction is not strictly guaranteed to happen instantly, but within a small margin. That should be good enough for most cases.

You can try the feature in a test, e.g.

public static void main(String[] args) {
  Stopwatch stopwatch = Stopwatch.createUnstarted();
  Cache<Integer, Integer> cache = Caffeine.newBuilder()
      .expireAfterWrite(1, TimeUnit.SECONDS)
      .scheduler(Scheduler.systemScheduler())
      .removalListener((key, value, cause) -> {
        System.out.printf("%s: %s at time %s%n", cause, key, stopwatch);
      }).build();
  stopwatch.start();
  for (int i = 0; i < 3; i++) {
    cache.put(i, i);
  }
  System.out.printf("Sleeping... %d entries%n", cache.estimatedSize());
  Uninterruptibles.sleepUninterruptibly(2, TimeUnit.SECONDS);
  System.out.println("Done... " + cache.asMap());
}
Sleeping... 3 entries
EXPIRED: 0 at time 1.099 s
EXPIRED: 2 at time 1.119 s
EXPIRED: 1 at time 1.119 s
Done... {}
