Extend metadata cache flush procedure to flush specific caches #10385
@aczajkowski Could you provide a rationale for this feature? Caches are usually not something that one should micromanage. Why is flushing the entire cache not sufficient?
@sopel39 This is mostly for large data sets where deltas from a recent period are updated (overwritten) until the next period starts. E.g. load events from the current day (0:00 - 6:00], then after six hours export and load events from (0:00 - 12:00]. Currently, each time a delta is written we:
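The use case above (one partition is overwritten repeatedly while the rest of the table stays untouched) can be illustrated with a minimal sketch. This is not the code of `CachingHiveMetastore`; the class `PartitionCacheSketch`, its methods, and the sample schema, table, and partition names are all hypothetical and exist only to show why a targeted flush keeps the remaining cache entries warm.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a metadata cache holding partition information.
// It is NOT CachingHiveMetastore; all names here are illustrative assumptions.
final class PartitionCacheSketch
{
    // Key is the triple (schema, table, partition name), e.g. ("web", "events", "day=2022-01-02").
    private final Map<List<String>, String> partitions = new ConcurrentHashMap<>();

    void put(String schema, String table, String partition, String metadata)
    {
        partitions.put(List.of(schema, table, partition), metadata);
    }

    Optional<String> get(String schema, String table, String partition)
    {
        return Optional.ofNullable(partitions.get(List.of(schema, table, partition)));
    }

    int size()
    {
        return partitions.size();
    }

    // Coarse flush: every cached partition is dropped.
    void flushAll()
    {
        partitions.clear();
    }

    // Drops all cached partitions of one table.
    void flushTable(String schema, String table)
    {
        partitions.keySet().removeIf(key -> key.get(0).equals(schema) && key.get(1).equals(table));
    }

    // Drops a single cached partition and leaves everything else in place.
    void flushPartition(String schema, String table, String partition)
    {
        partitions.remove(List.of(schema, table, partition));
    }

    public static void main(String[] args)
    {
        PartitionCacheSketch cache = new PartitionCacheSketch();
        cache.put("web", "events", "day=2022-01-01", "loaded through 24:00");
        cache.put("web", "events", "day=2022-01-02", "loaded through 06:00");

        // The current day was overwritten by a new delta, so only that entry is evicted.
        cache.flushPartition("web", "events", "day=2022-01-02");

        System.out.println(cache.get("web", "events", "day=2022-01-01").isPresent());
        System.out.println(cache.get("web", "events", "day=2022-01-02").isPresent());
    }
}
```

With a flush-everything procedure, the entry for the previous day would also be evicted and reloaded on the next query, even though nothing about it changed.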
I would focus on that particular partition cache use case then and avoid other complexity for now. We could have
I agree we should expose as little functionality as is sufficient. Every piece of functionality adds to the maintenance cost.
There is no "partition cache". There is "metadata cache containing partition information", and there also is "file listing cache containing partitions' data files". I like the idea of "overloading". Since we may want to be future-proof here and keep the ability to add more options to the procedure (hopefully this never happens, but better be prepared), we could allow syntaxes
while disallowing syntax
To do this, we could add a fake parameter as the first one:
return new Procedure(
        "system",
        "flush_metadata_cache",
        ImmutableList.of(
                new Procedure.Argument(
                        "$fake_first_parameter",
                        VARCHAR,
                        false,
                        "procedure should only be invoked with named parameters"),
                new Procedure.Argument(
                        "schema_name",
                        VARCHAR,
                        false,
                        ...
        ),
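Why a fake first parameter blocks positional calls can be shown with a small, self-contained model of argument binding. This is a sketch under stated assumptions, not Trino's `Procedure` SPI: the class `NamedOnlyProcedureSketch`, the binding helpers, the `table_name` parameter, and the returned strings are hypothetical. Only `$fake_first_parameter` and the schema parameter are taken from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of procedure argument binding.
// It is NOT Trino's Procedure SPI; helper names and results are assumptions.
final class NamedOnlyProcedureSketch
{
    static final String FAKE = "$fake_first_parameter";
    static final List<String> PARAMETERS = List.of(FAKE, "schema_name", "table_name");

    private NamedOnlyProcedureSketch() {}

    // Positional call: the i-th value binds to the i-th declared parameter,
    // so the first positional value always lands in the fake slot.
    static Map<String, String> bindPositional(List<String> values)
    {
        if (values.size() > PARAMETERS.size()) {
            throw new IllegalArgumentException("too many arguments");
        }
        Map<String, String> bound = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            bound.put(PARAMETERS.get(i), values.get(i));
        }
        return bound;
    }

    // Named call: values bind by name, so the fake slot stays unset
    // unless the caller names it explicitly.
    static Map<String, String> bindNamed(Map<String, String> named)
    {
        for (String name : named.keySet()) {
            if (!PARAMETERS.contains(name)) {
                throw new IllegalArgumentException("unknown parameter: " + name);
            }
        }
        return new LinkedHashMap<>(named);
    }

    // The procedure body rejects any call that supplied the fake parameter.
    static String flush(Map<String, String> bound)
    {
        if (bound.get(FAKE) != null) {
            throw new IllegalArgumentException("procedure should only be invoked with named parameters");
        }
        String schema = bound.get("schema_name");
        String table = bound.get("table_name");
        if (schema == null && table == null) {
            return "flush everything";
        }
        if (schema == null) {
            throw new IllegalArgumentException("table_name requires schema_name");
        }
        return table == null ? "flush schema " + schema : "flush table " + schema + "." + table;
    }

    public static void main(String[] args)
    {
        // Named arguments skip the fake slot and are accepted.
        System.out.println(flush(bindNamed(Map.of("schema_name", "web", "table_name", "clicks"))));
        // A call without arguments still works and flushes everything.
        System.out.println(flush(bindPositional(List.of())));
        // Positional arguments fill the fake slot first and are rejected.
        try {
            flush(bindPositional(List.of("web", "clicks")));
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design keeps the zero-argument call backward compatible while leaving room to add or reorder optional parameters later, since no caller can depend on their positions.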
Do the details matter for the end user? I don't think the user cares what the internal cache layout is. All they want is to evict partition information from the metadata cache. Thus
As for the user interface, I am on @findepi's side. Having a single
.../test/java/io/trino/plugin/hive/metastore/cache/TestCachingHiveMetastoreWithQueryRunner.java
@@ -136,6 +136,26 @@ public void testFlushHiveMetastoreCacheProcedureCallable()
        queryRunner.execute(renamedColumnQuery);
    }

    @Test
TestCachingHiveMetastoreWithQueryRunner -- this is an odd name for a test class.
It's named like this to differentiate from a unit test TestCachingHiveMetastore, and so it should be called TestCachingHiveMetastoreQueries (alas, this already exists! let's merge the classes -- as a follow-up)
Yes, we could merge them. There are some differences in the QueryRunner setup, but I think we can manage to adjust.
@findepi Thanks for the review and approval. Tests are green. Do you want some additional reviewers to approve, or can we merge?
LGTM. Some comments to test code.
@losipiuk I applied your comments. Please let me know if the updated integration test is OK now.
This PR extends the existing Hive connector procedure system.flush_metadata_cache() with optional parameters that narrow the invocation to a specific schema, table, or even partition. E.g.