Do not cache Hadoop LocatedFileStatus objects #14408
Helps with #14313
Helps or fixes? the Line 119 in 9859b91
plugin/trino-hive/src/main/java/io/trino/plugin/hive/fs/TrinoFileStatus.java
Let's maybe call it
It's OK to address memory-aware caching configuration (#14408 (comment)) as a follow-up.
Instead of caching Hadoop LocatedFileStatus objects which contain many fields we don't need, store only the information we need.
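As a rough illustration of the idea in this commit message, a slim value type might look like the sketch below. The names `SlimFileStatus` and `BlockInfo` are hypothetical, and the choice of fields (path, directory flag, length, modification time, block locations) is an assumption; this is not the actual `TrinoFileStatus` from the PR. Hadoop's `FileStatus` additionally carries fields such as owner, group and permission, which the sketch simply does not retain.

```java
import java.util.List;

// Hypothetical stand-in for the block location data a split generator might need.
record BlockInfo(List<String> hosts, long offset, long length) {
    BlockInfo {
        hosts = List.copyOf(hosts); // immutable copy, so cached entries cannot be mutated
    }
}

// Hypothetical slim status: keeps only the assumed-needed fields and drops the rest.
record SlimFileStatus(
        String path,
        boolean directory,
        long length,
        long modificationTime,
        List<BlockInfo> blocks) {
    SlimFileStatus {
        blocks = List.copyOf(blocks);
    }
}
```

A cache holding such values retains a few primitives and strings per file rather than the full Hadoop object graph.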
22f7747 to 16701ff
I was actually thinking about reducing limits (if needed) and not too much complexity.
It's per transaction, and the Hive connector supports transactions spanning multiple queries. A single table can comprise a large number of partitions (even more so after @arhimondr's #14225), and each of these a large number of files, so maybe the cache needs to be size-limited? For context, the Delta connector has an active files cache (getting the list of files is more expensive there), and it has been shown to raise connector memory requirements significantly when that connector is in use.
I mean concurrent queries or long running queries (file listing will still be cached during transaction).
I'm not sure the extra complexity of counting bytes is worth it compared to just dropping the listing limit to 10_000 (although I'm not sure how effective the cache is then). An alternative is having some global size limit for the transactional cache (but then one has to track object lifecycle).
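For reference, the "counting bytes" option discussed here could be sketched roughly as follows. This is a minimal JDK-only illustration, not code from the PR: the class `WeightBoundedListingCache`, its `weigh` estimate (character counts of the paths) and the LRU eviction policy are all assumptions made for the example. A production version would more likely rely on an existing cache library with weight-based limits, such as Guava's `CacheBuilder.maximumWeight` with a `Weigher`.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listing cache bounded by an estimated weight instead of an entry count.
final class WeightBoundedListingCache {
    private final long maxWeight;
    private long currentWeight;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, List<String>> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    WeightBoundedListingCache(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    // Crude size estimate (an assumption): total characters in the key and all file paths.
    static long weigh(String directory, List<String> files) {
        long weight = directory.length();
        for (String file : files) {
            weight += file.length();
        }
        return weight;
    }

    synchronized void put(String directory, List<String> files) {
        List<String> previous = entries.remove(directory);
        if (previous != null) {
            currentWeight -= weigh(directory, previous);
        }
        long weight = weigh(directory, files);
        if (weight > maxWeight) {
            return; // a single listing larger than the budget is not cached
        }
        entries.put(directory, List.copyOf(files));
        currentWeight += weight;
        evictUntilWithinBudget();
    }

    synchronized List<String> get(String directory) {
        return entries.get(directory); // null when absent or evicted
    }

    synchronized long currentWeight() {
        return currentWeight;
    }

    // Drops least recently used listings until the total weight fits the budget.
    private void evictUntilWithinBudget() {
        Iterator<Map.Entry<String, List<String>>> iterator = entries.entrySet().iterator();
        while (currentWeight > maxWeight && iterator.hasNext()) {
            Map.Entry<String, List<String>> eldest = iterator.next();
            currentWeight -= weigh(eldest.getKey(), eldest.getValue());
            iterator.remove();
        }
    }
}
```

The trade-off raised in the comment is visible here: the weight bookkeeping adds state that must stay consistent on every insert and eviction, whereas a lower entry-count limit needs none of it.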
LocatedFileStatus objects contain many more fields than Trino requires. It doesn't make sense
to cache them.
Description
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: