Manifest file caching support for Iceberg Hive Catalog #21862

Status: Open (2 commits, base: master)

agrawalreetika (Member)

Description

Add Manifest file caching support for Iceberg Hive Catalog

Motivation and Context

Starting with version 1.1.0, Apache Iceberg provides a mechanism to cache the contents of manifest files in memory. Manifest caching reduces repeated reads of small manifest files from remote storage.
Reference: apache/iceberg#4518
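
For orientation, here is a minimal sketch of the Iceberg-side catalog properties that control manifest caching. The property keys are the ones Iceberg defines in `CatalogProperties`; the class name, the helper method, and the numeric values are hypothetical and only illustrative. The Presto catalog property names added by this PR are not shown in this conversation, so they are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: assembles the Iceberg catalog properties that govern manifest caching.
final class ManifestCacheSketch {
    // Keys defined by Iceberg's CatalogProperties (available from 1.1.0).
    static final String CACHE_ENABLED = "io.manifest.cache-enabled";
    static final String CACHE_EXPIRATION_MS = "io.manifest.cache.expiration-interval-ms";
    static final String CACHE_MAX_TOTAL_BYTES = "io.manifest.cache.max-total-bytes";
    static final String CACHE_MAX_CONTENT_LENGTH = "io.manifest.cache.max-content-length";

    static Map<String, String> manifestCacheProperties(boolean enabled) {
        Map<String, String> props = new HashMap<>();
        props.put(CACHE_ENABLED, Boolean.toString(enabled));
        if (enabled) {
            // Illustrative values only (assumption); tune them for the deployment.
            props.put(CACHE_EXPIRATION_MS, "60000");         // entries expire after 60 s
            props.put(CACHE_MAX_TOTAL_BYTES, "104857600");   // 100 MiB cache budget
            props.put(CACHE_MAX_CONTENT_LENGTH, "8388608");  // skip manifests above 8 MiB
        }
        return props;
    }
}
```

A map like this would typically be passed to the catalog at initialization. With the flag left at `false`, only the enable key is set and caching stays off, which matches the default-disabled behaviour described under Impact.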

Impact

No impact. The feature is disabled by default and can be enabled through catalog configuration.

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Iceberg Changes
* Add Manifest file caching support for Iceberg Hive Catalog

linux-foundation-easycla bot commented Feb 5, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@agrawalreetika force-pushed the manifest-caching-hive branch 2 times, most recently from 162c69a to 27dee87 (February 5, 2024 16:05)
@agrawalreetika marked this pull request as ready for review (February 6, 2024 04:23)
@agrawalreetika requested a review from a team as a code owner (February 6, 2024 04:23)
github-actions bot commented Feb 6, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff f393c60...9a0369c.

No notifications.

@imjalpreet (Member) left a comment

I see some test failures in the Iceberg product tests (with Kerberos and impersonation enabled). I haven't had a chance to look at all the changes yet, but I suspect the Iceberg library is not able to find the relevant Hadoop configurations. Let's verify whether the config is still being passed as expected after the removal of the custom FileIO in this PR.

@steveburnett (Contributor) previously approved these changes Feb 7, 2024 and left a comment

LGTM! (docs)

@tdcmeehan self-assigned this Feb 17, 2024

@tdcmeehan (Contributor) previously approved these changes Feb 17, 2024 and left a comment

Changes LGTM. Please look into the test failures and fix the merge conflicts.

@tdcmeehan previously approved these changes Mar 6, 2024

@tdcmeehan (Contributor) commented

Please merge when you get a clean test run

@tdcmeehan (Contributor) commented

Blocked on apache/iceberg#9991

4 participants