Is your feature request related to a problem or challenge?
This issue tracks the remaining tasks from the initial parallel CSV scan PR, #6801.
The remaining tasks:
1. Use get_opts() for range reads on local FS. get_opts() is the ObjectStore interface for streaming range reads (local FS / cloud storage), but it does not currently support range reads on local FS: https://github.com/apache/arrow-rs/blob/0d4e6a727f113f42d58650d2dbecab89b22d4e28/object_store/src/lib.rs#L355
   Once this is implemented in arrow-rs, we can use it in the parallel CSV scan implementation and possibly get some performance improvement (the current implementation copies the whole CSV file range into memory at once instead of reading it in a streaming fashion).
2. Use only 1 get operation from ObjectStore for each partition instead of 3 (see the original PR discussion).

It's easier to do task 2 after task 1 is done, because it can then be tested on the local filesystem.
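
To make the goal of the two tasks concrete, here is a minimal sketch of reading one partition in a single streaming pass. It uses only the Rust standard library on a seekable reader. It is not the DataFusion implementation and does not call the object_store API. The function name read_partition_lines and the ownership rule (a partition owns every line that begins inside its byte range) are assumptions made for illustration. Newlines inside quoted CSV fields are not handled.

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// Hypothetical helper: returns the lines that begin inside the byte
/// range [start, end), using one seek followed by one forward pass.
pub fn read_partition_lines<R: Read + Seek>(
    src: R,
    start: u64,
    end: u64,
) -> io::Result<Vec<String>> {
    let mut reader = BufReader::new(src);
    let mut buf = Vec::new();
    let mut pos = start;

    if start > 0 {
        // Begin one byte early and discard everything through the first
        // newline. If `start` is exactly a line start, only the previous
        // line's '\n' is discarded, so the line at `start` is kept.
        reader.seek(SeekFrom::Start(start - 1))?;
        let n = reader.read_until(b'\n', &mut buf)?;
        pos = start - 1 + n as u64;
    }

    let mut lines = Vec::new();
    // A line that begins before `end` is read to completion even if it
    // runs past `end`; the next partition skips it in the step above.
    while pos < end {
        buf.clear();
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        pos += n as u64;
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        lines.push(String::from_utf8_lossy(&buf).into_owned());
    }
    Ok(lines)
}
```

With this rule, adjacent ranges such as [0, 6), [6, 12) and [12, 16) over the same file yield each line exactly once, and each partition needs only one sequential read rather than separate reads to locate its start and end line boundaries.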
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response