Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filesystem caching on coordinator unconditionally #21987

Merged
merged 1 commit into from
May 17, 2024

Conversation

raunaqmorarka
Copy link
Member

Description

Earlier this was used to work around bugs which were encountered when filesystem caching was enabled on coordinator, it's not used for that anymore.
Removing this avoids confusion about caching not working on hive connector in a single node setup

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive
* Allow file system caching on coordinator. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 16, 2024
@raunaqmorarka raunaqmorarka requested a review from jkylling May 16, 2024 06:21
@github-actions github-actions bot added iceberg Iceberg connector delta-lake Delta Lake connector labels May 16, 2024
Earlier this was used to work around bugs which were encountered
when filesystem caching was enabled on coordinator, it's not used for that anymore.
Removing this avoids confusion about caching not working on hive connector
in a single node setup
@osscm
Copy link
Contributor

osscm commented May 16, 2024

@raunaqmorarka not related to this PR.

I was thinking about one question, that user might want to configure cache with differently at coordinator vs worker.
we have this config available, will be applicable to coordinator and worker both.

https://trino.io/docs/current/object-storage/file-system-cache.html#configuration

Any thoughts?

@raunaqmorarka
Copy link
Member Author

@raunaqmorarka not related to this PR.

I was thinking about one question, that user might want to configure cache with differently at coordinator vs worker. we have this config available, will be applicable to coordinator and worker both.

https://trino.io/docs/current/object-storage/file-system-cache.html#configuration

Any thoughts?

Could you elaborate what this difference in configuring caching on the coordinator vs worker would be and why ?

@findepi
Copy link
Member

findepi commented May 16, 2024

Earlier this was used to work around bugs which were encountered when filesystem caching was enabled on coordinator, it's not used for that anymore.

do you maybe remember what these were?

@raunaqmorarka
Copy link
Member Author

Earlier this was used to work around bugs which were encountered when filesystem caching was enabled on coordinator, it's not used for that anymore.

do you maybe remember what these were?

I forget which one exactly, but there were issues initially when we tried to enabled caching on coordinator in delta and iceberg
91b22ca
#20696
There was also some ongoing discussion about what is safe to cache in delta lake metadata.
So we ended up enabling it coordinator in delta and iceberg in follow-ups as these were resolved and this mechanism was useful in doing that in steps. Now we no longer need it, and it causes some confusion to new users when filesystem caching doesn't work in single node setups in hive.

@findepi
Copy link
Member

findepi commented May 16, 2024

91b22ca
#20696

these do not look coordinator-related

@raunaqmorarka
Copy link
Member Author

91b22ca
#20696

these do not look coordinator-related

I think the relation with coordinator was that only coordinator would load metadata files and that metadata would be non-orc/parquet formats which use streaming read APIs and the bugs were in those implementations.

@raunaqmorarka raunaqmorarka merged commit 42c27a2 into trinodb:master May 17, 2024
64 checks passed
@raunaqmorarka raunaqmorarka deleted the alluxio-coord branch May 17, 2024 03:01
@github-actions github-actions bot added this to the 449 milestone May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

5 participants