Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate CachingDirectoryLister when writing through Trino #10915

Merged

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Feb 2, 2022

Description

Notify the CachingDirectoryLister about changes that happen
on the table / partition(s) in order to be able to invalidate
the cached directory listings for Hive tables or
partitions of the Hive table.

General information

Is this change a fix, improvement, new feature, refactoring, or other?

This is a cache invalidation fix.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

This change affects the Hive connector.

How would you describe this change to a non-technical end user or system administrator?

When the content of a table changes,
all the cached paths for under belonging to the table will be
invalidated from the cache.

Please do note that the current implementation does not
make the CachingDirectoryLister aware of changes
performed on the table content by external processes
(e.g. : Hive, Spark).

Related issues, pull requests, and links

Documentation

( x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( x) Release notes entries required with the following suggested text:

# Hive
* File status cache content is invalidated when writing through Trino. ({issue}`10621`)

Fixes #10621

@cla-bot cla-bot bot added the cla-signed label Feb 2, 2022
@findinpath findinpath marked this pull request as draft February 2, 2022 20:27
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 2 times, most recently from 647688c to 973a952 Compare February 7, 2022 21:09
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 3 times, most recently from 42c8263 to 8e5c6a0 Compare February 8, 2022 10:31
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch from 8e5c6a0 to e144cf4 Compare February 8, 2022 11:44
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 4 times, most recently from 0d560c0 to 96fddfd Compare February 9, 2022 08:33
@findinpath
Copy link
Contributor Author

Rebasing on upstream/master to make use of the EvictableCache improvements (support for weigher) #10949

@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 2 times, most recently from bcca8eb to f20dd3d Compare February 10, 2022 12:33
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch from 120171e to 7c47b88 Compare February 10, 2022 18:12
@findinpath findinpath marked this pull request as ready for review February 10, 2022 18:27
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch from 7c47b88 to ed4dc20 Compare February 11, 2022 06:11
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 2 times, most recently from d5fb37e to 41f2272 Compare February 14, 2022 06:01
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 2 times, most recently from 5be4f4d to 8a0d93d Compare February 14, 2022 20:26
@findinpath findinpath requested a review from findepi February 15, 2022 05:18
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 5 times, most recently from 94ced51 to bc53cc2 Compare February 15, 2022 05:48
@findepi
Copy link
Member

findepi commented Feb 15, 2022

( currently, depends on #11047 )

@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch 2 times, most recently from a1c823b to 539bd8a Compare February 16, 2022 19:43
@findinpath
Copy link
Contributor Author

Rebased on upstream/master to make use of the EvictableCache improvements #11047

@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch from 539bd8a to 17178c2 Compare February 17, 2022 14:36
@findinpath findinpath requested a review from findepi February 17, 2022 14:37
{
if (isCacheEnabledFor(table.getSchemaTableName()) && isLocationPresent(table.getStorage())) {
if (table.getPartitionColumns().isEmpty()) {
cache.invalidate(new Path(table.getStorage().getLocation()));
Copy link
Contributor Author

@findinpath findinpath Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the usage of the constructor with an unexpected path cause us problems?
I could use String instead of Path for the cache keys to be on the safe side.

@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch from 17178c2 to 0428a10 Compare February 18, 2022 15:26
Notify the CachingDirectoryLister about changes that happen
on the table in order to be able to invalidate
the cached directory listings for Hive tables or
partitions of the Hive tables.

When the content of a table/partition changes,
all the cached paths for under the corresponding
table/partition directory will be
invalidated from the cache.

Co-authored-by: Piotr Findeisen <[email protected]>
@findinpath findinpath force-pushed the hive-invalidate-caching-directory-lister branch from 0428a10 to ba52cee Compare February 18, 2022 16:57
@findepi findepi merged commit 559d443 into trinodb:master Feb 21, 2022
@findepi findepi mentioned this pull request Feb 21, 2022
@github-actions github-actions bot added this to the 372 milestone Feb 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Incorrect query results when caching directory lister is used
3 participants