Prefetch page index (#4090) #4216

Merged: 5 commits, May 17, 2023

Conversation

tustvold
Contributor

Which issue does this PR close?

Closes #4090

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

This adds `Send` constraints to `fetch_parquet_metadata`; this is unlikely to trip people up in practice.

@tustvold tustvold added the api-change Changes to the arrow API label May 15, 2023
@github-actions github-actions bot added the parquet Changes to the parquet crate label May 15, 2023
Contributor

@alamb alamb left a comment

Thank you @tustvold -- I went through this PR carefully and it looks really nice 👌

I wonder if the same basic pattern could be applied to the Bloom filters as well, or if they suffer from the issue that they don't actually appear in the footer 🤔

cc @thinkharderdev and @Ted-Jiang

/// the last 8 bytes to determine the footer's precise length, before
/// issuing a second request to fetch the metadata bytes
///
/// If a `prefetch` is `Some`, this will read the specified number of bytes
Contributor

👍
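
For readers following the doc comment above, here is a minimal sketch of the footer arithmetic it describes. It assumes an unencrypted Parquet file, whose last 8 bytes are a 4-byte little-endian metadata length followed by the magic `PAR1`. The names `metadata_len` and `suffix_covers_metadata` are hypothetical helpers for illustration and are not part of the parquet crate's API.

```rust
/// Size of the fixed trailer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
/// Magic bytes that end an unencrypted Parquet file.
const MAGIC: [u8; 4] = *b"PAR1";

/// Decode the metadata length from the last 8 bytes of a file.
/// Hypothetical helper, not the crate's implementation.
fn metadata_len(footer: &[u8; FOOTER_SIZE]) -> Result<usize, String> {
    // The final four bytes must be the magic, otherwise this is not a footer.
    if footer[4..] != MAGIC[..] {
        return Err("missing PAR1 magic".to_string());
    }
    // The first four bytes hold the metadata length, little-endian.
    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    Ok(len as usize)
}

/// Returns true when a prefetched suffix of `prefetch` bytes already holds the
/// whole metadata block plus the 8-byte footer, so no second request is needed.
/// Hypothetical helper, not the crate's implementation.
fn suffix_covers_metadata(prefetch: usize, metadata_len: usize) -> bool {
    prefetch >= metadata_len + FOOTER_SIZE
}
```

Without a prefetch hint, a reader has to fetch the 8 footer bytes first and then issue a second request for the metadata; with a large enough hint, both arrive in the same request.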

let mut loader = MetadataLoader::load(f, len, Some(130650)).await.unwrap();
assert_eq!(fetch_count.load(Ordering::SeqCst), 1);
loader.load_page_index(true, true).await.unwrap();
assert_eq!(fetch_count.load(Ordering::SeqCst), 1);
Contributor

🎉
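
The assertion pattern in the test above (the fetch count stays at 1 after the page index is loaded) can be illustrated with a small synchronous sketch. This is not the PR's `MetadataLoader`; `CountingStore` and `SuffixCache` are hypothetical types that assume the loader keeps the prefetched suffix in memory and serves later reads from it when the requested range falls inside it.

```rust
use std::ops::Range;

/// Hypothetical in-memory stand-in for a remote object that counts fetches.
struct CountingStore {
    data: Vec<u8>,
    fetches: usize,
}

impl CountingStore {
    /// Simulate one remote request for `range`.
    fn fetch(&mut self, range: Range<usize>) -> Vec<u8> {
        self.fetches += 1;
        self.data[range].to_vec()
    }
}

/// Hypothetical cache holding the prefetched tail of the object.
struct SuffixCache {
    start: usize, // file offset of the first cached byte
    bytes: Vec<u8>,
}

impl SuffixCache {
    /// Fetch the last `prefetch` bytes (or the whole object if it is smaller).
    fn load(store: &mut CountingStore, prefetch: usize) -> Self {
        let len = store.data.len();
        let start = len.saturating_sub(prefetch);
        let bytes = store.fetch(start..len);
        Self { start, bytes }
    }

    /// Serve `range` from the cached suffix when it is fully covered,
    /// otherwise fall back to another fetch.
    fn read(&self, store: &mut CountingStore, range: Range<usize>) -> Vec<u8> {
        let end = self.start + self.bytes.len();
        if range.start >= self.start && range.end <= end {
            self.bytes[range.start - self.start..range.end - self.start].to_vec()
        } else {
            store.fetch(range)
        }
    }
}
```

Under these assumptions, a read that lies inside the prefetched suffix leaves the fetch counter unchanged, while a read that starts before it costs one more request.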

@Ted-Jiang
Member

> Thank you @tustvold -- I went through this PR carefully and it looks really nice 👌
>
> I wonder if the same basic pattern could be applied to the Bloom filters as well, or if they suffer from the issue that they don't actually appear in the footer 🤔
>
> cc @thinkharderdev and @Ted-Jiang

Thanks for the ping; I will review this carefully today.

Member

@Ted-Jiang Ted-Jiang left a comment

👍

@tustvold
Contributor Author

> I wonder if the same basic pattern could be applied to the Bloom filters as well

Yes, this was designed with them in mind. Bloom filters can be stored at the end of the file, which would allow prefetching to help, though I'm not sure the writer currently does this.

@tustvold tustvold merged commit 8580e85 into apache:master May 17, 2023
Development

Successfully merging this pull request may close these issues.

Preload page index for async ParquetObjectReader
3 participants