-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async ParquetExec #1617
Async ParquetExec #1617
Conversation
afbc9df
to
0617169
Compare
0617169
to
76d5fa0
Compare
This can probably be refreshed after #1775 is merged |
I'm going to wait until after #1738 is merged and I have time to do some comparative investigation. Async is certainly not free as the block structure of parquet is not particularly amenable to streaming. Prior to merge I'd like a better handle on what the trade-offs here are, I'd expect memory usage to be higher, but performance may be better, not sure 😄 |
#1738 |
Running the benchmarks against a backported apache/arrow-rs#1418 show the performance of this to be significantly worse than the current Sync reader (~2x). It is possible this is just a simple oversight with a simple fix, but I'm unlikely to have time to look into it this week. Async is far from free and so I'd expected the performance to be somewhat worse, but not that much worse... Hopefully I will have some time next week, otherwise I'll keep this pootling along on the backburner until it is ready for prime-time 😄 |
Marking PR as stale - will close in a while to clean up the PR list. Feel free to reopen if more work is planned |
I've not thoroughly tested this yet, it may represent a performance regression, and will likely need more memory
Which issue does this PR close?
Closes #TBD.
Rationale for this change
Avoids needing to use a blocking threadpool for reading Parquet data, and opens the door for fetching data from object storage.
What changes are included in this PR?
Updates ParquetExec to use the POC async ParquetRecordBatchStream in apache/arrow-rs#1154
Are there any user-facing changes?
Yes, I had to alter the ObjectReader trait