Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalidation race in CachingHiveMetastore #10646

Merged
merged 9 commits into from
Jan 18, 2022

Conversation

findepi
Copy link
Member

@findepi findepi commented Jan 17, 2022

The LoadingCache used in CachingHiveMetastore was susceptible to a race
between load-in-flight and invalidation. The invalidation of a key would
not prevent load in flight from being inserted into the cache.

Fixes #10512.

The changed caches do not set `cacheBuilder.refreshAfterWrite` (since
3a1bf59), so the `asyncReloading`,
which changes `CacheLoader.reload`, and delegates the reset), was
redundant.
@cla-bot cla-bot bot added the cla-signed label Jan 17, 2022
@findepi findepi force-pushed the findepi/hive-cache-soundly branch from 655fb71 to a0a9d14 Compare January 17, 2022 14:01
@findepi findepi force-pushed the findepi/hive-cache-soundly branch from a0a9d14 to 6d7de31 Compare January 17, 2022 15:36
@findepi findepi marked this pull request as ready for review January 17, 2022 15:36
@findepi findepi requested review from losipiuk and kokosing January 17, 2022 15:36
@findepi
Copy link
Member Author

findepi commented Jan 17, 2022

cc @alexjo2144 @homar

Capture more commonalities that before.

- `asyncReloading` is used for all caches with `refreshMillis` set
- `memoizeMetastore` no longer provide an (unused) Executor
- individual cache instantiation is now single line, so all the code
  reads better
- separate bulk-loading cache building from non-bulks loading, since we
  do not want to do refreshes for bulk-loading caches
- remove redundant `cacheBuilder = cacheBuilder.xxx` assignments
@findepi findepi force-pushed the findepi/hive-cache-soundly branch from 6d7de31 to ec2ffe3 Compare January 17, 2022 17:03
@findepi findepi force-pushed the findepi/hive-cache-soundly branch 2 times, most recently from c96e83f to b6ebad4 Compare January 17, 2022 18:07
They have same names as test resources in trino-main. This allows
toolkit's and main's test jars to be used together on a plugin's
classpath. The chosen prefix, `file-based-system-`, is already used for
a few other test resources used in the same test class.
Extract `CacheStatsAssertions` from `TestCachingJdbcClient`.
@findepi findepi force-pushed the findepi/hive-cache-soundly branch from b6ebad4 to f3dca68 Compare January 17, 2022 21:45
@findepi
Copy link
Member Author

findepi commented Jan 18, 2022

50 successful and 2 failing checks

public ImmutableMap<K, V> getAll(Iterable<? extends K> keys)
throws ExecutionException
{
BiMap<K, Token<K>> keyToToken = HashBiMap.create();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableBiMap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect ImmutableBiMap to be less performant, since it tries to create a compacted version at the build() time.
It's a method local state, so immutability doesn't buy me anything


BiMap<Token<K>, K> tokenToKey = keyToToken.inverse();
ImmutableMap.Builder<K, V> result = ImmutableMap.builder();
for (Map.Entry<Token<K>, V> entry : values.entrySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values.entrySet().stream()...collect()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am implementing a generic-purpose class and i don't know where it's going to be used.
I avoid Streams in this code for performance reasons. We know that Streams are not suitable for every occasion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is a case then maybe a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where would you see that?

}

@SuppressModernizer
private static Integer newInteger(int value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a dedicated class. It would be known what is the goal of it. No need to copy paste, no need for SuppressWarnings. Class could have a javadoc etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just boiler plate. Suppressing something is cheaper that writing a class.

Fix grammar. Improve wording.
@findepi findepi force-pushed the findepi/hive-cache-soundly branch from d7b33c8 to 3e8f373 Compare January 18, 2022 09:54
@findepi findepi requested a review from kokosing January 18, 2022 09:54
Comment on lines +353 to +351
List<K> keys = new ArrayList<>();
for (Token<K> token : tokenList) {
keys.add(token.getKey());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

            List<K> keys = tokenList.stream()
                    .map(Token::getKey)
                    .collect(toCollection(ArrayList::new));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% I haven't reviewed testLoadAfterInvalidate.

@Override
public void invalidateAll()
{
Set<K> keys = ImmutableSet.copyOf(tokensCache.asMap().keySet());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we invalidate all entries from dataCache that do not are not mentioned in tokens? So we release stale entries of dataCache in the opportunistic way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, changing.

The `LoadingCache` used in `CachingHiveMetastore` was susceptible to a race
between load-in-flight and invalidation. The invalidation of a key would
not prevent load in flight from being inserted into the cache.
@findepi findepi force-pushed the findepi/hive-cache-soundly branch from 3e8f373 to 0e42ed3 Compare January 18, 2022 11:41
@findepi findepi merged commit 4c99cff into trinodb:master Jan 18, 2022
@findepi findepi deleted the findepi/hive-cache-soundly branch January 18, 2022 21:36
@findepi findepi mentioned this pull request Jan 18, 2022
@github-actions github-actions bot added this to the 369 milestone Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Metadata caching may return incorrect data
3 participants