Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference in iteration order between caffeine and guava in LoadingCache.getAll #220

Closed
slovdahl opened this issue Feb 8, 2018 · 6 comments

Comments

@slovdahl
Copy link

slovdahl commented Feb 8, 2018

The iteration order of the Map<K, V> that Guava's LoadingCache.getAll( Iterable<? extends K> keys ) returns is the same as the iteration order of the input keys, due to Guava using a LinkedHashMap under the hood. Caffeine, instead, uses a HashMap, and therefore the iteration is highly unlikely to be the same. I noticed this while migrating from Guava to Caffeine caches, as we have unfortunately relied on the iteration order being consistent. The getAll javadoc doesn't mention anything about the iteration order, and that should probably have told me to consider it as unspecified iteration order.

I'm creating this issue mainly as a way of documenting the difference in behaviour for others who migrate from Guava to Caffeine. Maybe an entry about this on the Guava wiki page would be suitable? Because I suppose you don't want to replace the internal HashMap with a LinkedHashMap.

@ben-manes
Copy link
Owner

Good point. I didn't see the benefit of providing stable iteration order based on the input. It would be interesting to know if specifying this would be a nice-to-have feature or unimportant. I don't see the harm, but also don't see the benefits.

The main reason why I decided not to replicate this was another possible bug. Guava and Caffeine do not consider weak keys in their result mapping. The weak key configuration uses reference equality and identity hash code for lookup. If k1 and k2 are logically equal but different instances, then there may be two entries in the cache. When the result of getAll is computed, the keys are deduplicated and only one of them is returned. This is undocumented behavior in Guava, though noted in Caffeine. If anyone raises an issue about it then I'd want the flexibility to switch to IdentityHashMap, which is unordered.

@slovdahl
Copy link
Author

slovdahl commented Feb 9, 2018

One of our use-cases is the following:

  • a cache for domain objects that are fairly expensive to create, where the primary key from the RDBMS is the cache key
  • when doing arbitrary queries (filtering, sorting, etc), we just fetch the primary key and get the objects from the cache in the order the primary keys were returned by the query

However, resorting the result after getting it from the cache isn't a showstopper at the moment at least.

@ben-manes
Copy link
Owner

Released. Thanks! (now has Guava ordering, but similarly not doc'd)

@slovdahl
Copy link
Author

Oh joy, undefined behaviour. 😄 I just stumbled upon a strange ordering issue related to this.

Consider the following sequence in pseudo-code for a LoadingCache<Integer, String> that describes the expected undefined behaviour:

  1. cache.get(12) loads and returns "12"
  2. cache.getAll(4, 12, 15) -> loads "4", gets "12" from the cache, loads "15", returns "4", "12", "15"

This is how Guava's cache behaves. Caffeine gets any existing cached values first, and after that uses the loader to load the missing ones, causing the following behaviour:

  1. cache.get(12) loads and returns "12"
  2. cache.getAll(4, 12, 15) -> gets "12" from the cache, loads "4", "15", returns "12, "4", "15"

@ben-manes
Copy link
Owner

I guess we should write some tests for these and try to resolve them. From my reading of Guava's code, this was the behavior and so I followed it (even though I thought it was odd as not fully ordered). In Guava's LocalCache#getAll(keys), a simplified version is below. You'll see a peek into the cache (Map.get(key)), a bulk load, and appending to the earlier results. I don't see a point where the results are reordered by the input keys order.

ImmutableMap<K, V> getAll(Iterable<? extends K> keys) {
  Map<K, V> result = Maps.newLinkedHashMap();
  Set<K> keysToLoad = Sets.newLinkedHashSet();
  for (K key : keys) {
    V value = get(key);
    if (!result.containsKey(key)) {
      result.put(key, value);
      if (value == null) {
        keysToLoad.add(key);
      }
    }
  }

  Map<K, V> newEntries = loadAll(keysToLoad, defaultLoader);
  for (K key : keysToLoad) {
    V value = newEntries.get(key);
    result.put(key, value);
  }
  return ImmutableMap.copyOf(result);
}

@ben-manes
Copy link
Owner

Oh! It clicked what Guava was doing and I wrote a quick test to verify. Sorry that I misunderstood and implemented it wrong.

The first result.put(key, value) will insert a null value when the entry was absent. It will then perform the loadAll and finish populating the map, causing these entries to be fully present. If a key was not found by loadAll then an exception is thrown. Since its a LinkedHashMap the order is preserved.

Since I didn't notice the use of null values, I naively assumed it out of order. Sorry that I didn't write tests to verify that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants