Performance improvements for the Delta Lake history system table #18427

Merged: 3 commits, Aug 8, 2023

alexjo2144
Member

Description

The $history table does not need to load the entire transaction log when a version filter is supplied.
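
To illustrate the idea, here is a minimal sketch of how a version filter can narrow the set of commit files to read. It is not the code from this PR: `HistoryVersionRange`, `commitFilesToRead`, and the `latestVersion` parameter are hypothetical names used only for illustration. The one thing taken from Delta Lake itself is the commit file naming (a 20-digit zero-padded version followed by `.json`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not the Trino implementation.
final class HistoryVersionRange
{
    private HistoryVersionRange() {}

    // Delta Lake names commit files as a 20-digit zero-padded version plus ".json".
    static String commitFileName(long version)
    {
        return String.format("%020d.json", version);
    }

    // Returns the commit files for versions in [lower, upper],
    // clamped to [0, latestVersion]. Missing bounds mean "unbounded".
    static List<String> commitFilesToRead(OptionalLong lower, OptionalLong upper, long latestVersion)
    {
        long start = Math.max(0, lower.orElse(0));
        long end = Math.min(latestVersion, upper.orElse(latestVersion));
        List<String> files = new ArrayList<>();
        for (long version = start; version <= end; version++) {
            files.add(commitFileName(version));
        }
        return files;
    }
}
```

With bounds of 3 and 4 on a table whose latest version is 10, this sketch returns two file names instead of eleven. How the latest version is obtained is left out here and is assumed to be known by the caller.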

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Delta Lake
* Improve performance of the $history system table.

@cla-bot cla-bot bot added the cla-signed label Jul 26, 2023
@alexjo2144 alexjo2144 requested review from findinpath and ebyhr July 26, 2023 16:40
@alexjo2144 alexjo2144 force-pushed the delta/history-table branch from 30e3212 to df1059c Compare July 26, 2023 16:43
@alexjo2144 alexjo2144 changed the title Performance improvements for Delta Lake system tables Performance improvements for the Delta Lake history system table Jul 26, 2023
@github-actions github-actions bot added the delta-lake Delta Lake connector label Jul 26, 2023
@alexjo2144 alexjo2144 force-pushed the delta/history-table branch from df1059c to a7d2ed5 Compare July 27, 2023 15:42
@findinpath
Contributor

FYI I have a PR opened a few months ago on a related topic - #16192

@alexjo2144
Member Author

> FYI I have a PR opened a few months ago on a related topic - #16192

Which I did not review.... sorry for missing it. Should I close this?

@findinpath
Contributor

> > FYI I have a PR opened a few months ago on a related topic - #16192
>
> Which I did not review.... sorry for missing it. Should I close this?

My PR is rather outdated.

I'm thinking that the idea of an iterator over the transaction log entries is worth pursuing.

https://github.com/trinodb/trino/pull/16192/files#diff-9d037e6fdb6afb57337f6259ef3fc8c33f5b1c8e23b491070306ab1e9c08e902

@alexjo2144
Member Author

Pinned this issue down today. It looks like getSystemTable gets called a whole bunch of times for a single query, so any file system operations we do there get done ~22 times. For now I'm moving most of them to be lazy, but they shouldn't need to be.
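
As a rough illustration of the "lazy" approach mentioned above, the sketch below defers an expensive lookup until a value is actually requested, so constructing the table object many times costs nothing on its own. All names here (`LazyValue`, `LazySystemTableDemo`, `simulateLookups`) are hypothetical and the file system call is simulated with a counter; this is not the code from the PR.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical memoizing supplier: runs the delegate at most once, on first use.
final class LazyValue<T> implements Supplier<T>
{
    private final Supplier<T> delegate;
    private T value;
    private boolean loaded;

    LazyValue(Supplier<T> delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public synchronized T get()
    {
        if (!loaded) {
            value = delegate.get();
            loaded = true;
        }
        return value;
    }
}

final class LazySystemTableDemo
{
    private LazySystemTableDemo() {}

    // Simulates a metadata method being invoked `lookups` times for one query.
    // Each invocation only wraps the expensive call; nothing touches the
    // (simulated) file system until the table that is actually read calls get().
    static int simulateLookups(int lookups)
    {
        AtomicInteger fileSystemCalls = new AtomicInteger();
        Supplier<Long> latestVersion = null;
        for (int i = 0; i < lookups; i++) {
            latestVersion = new LazyValue<>(() -> {
                fileSystemCalls.incrementAndGet(); // stand-in for a file system operation
                return 42L;
            });
        }
        if (latestVersion != null) {
            // Only the table instance that is finally scanned resolves its value,
            // and repeated reads reuse the cached result.
            latestVersion.get();
            latestVersion.get();
        }
        return fileSystemCalls.get();
    }
}
```

In this sketch, `simulateLookups(22)` performs one simulated file system call rather than 22, because only the instance that is eventually read resolves its value.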

@alexjo2144 alexjo2144 force-pushed the delta/history-table branch from 643540b to 071ed73 Compare August 7, 2023 15:40
@alexjo2144 alexjo2144 force-pushed the delta/history-table branch from 071ed73 to ddbc7ce Compare August 7, 2023 15:46
The $history table does not need to load the entire transaction
log when a version filter is supplied.
@alexjo2144 alexjo2144 force-pushed the delta/history-table branch from ddbc7ce to bdbbefc Compare August 7, 2023 20:00
@findepi findepi merged commit 9d2e3e3 into trinodb:master Aug 8, 2023
@github-actions github-actions bot added this to the 423 milestone Aug 8, 2023
Labels
cla-signed delta-lake Delta Lake connector
3 participants