Improve scan perf by re-enable prefetching in ScanNode #473
Curious what the perf impact is. Also CI
rust/benches/scan.rs
Outdated
@@ -15,6 +15,17 @@
// specific language governing permissions and limitations
// under the License.

//! Before running the dataset, prepare a "test.lance" dataset, in the
//! `lance/rust` directly. There is no limitation in the dataset size,
directory?
fixed.
@@ -15,6 +15,17 @@
// specific language governing permissions and limitations
// under the License.

//! Before running the dataset, prepare a "test.lance" dataset, in the
how to prepare though? python? rust script?
You can use either Python or Rust to generate the dataset. Doing so is also a good compatibility test between the implementations.
A full scan of the highly nested dataset is now 3-4x faster than before, and about 2x relative to the C++ implementation.
It now returns batches out of order because of prefetching. The test needs to change; working on it.
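For illustration only, here is a minimal sketch of why prefetching can reorder batches and how a test can tolerate it. It is not the actual `ScanNode` code: `Batch`, `read_batch`, `prefetch_scan`, and `sorted_by_id` are hypothetical names, plain threads stand in for whatever concurrency the scanner really uses, and the simulated latency is made up. It also spawns one thread per batch, which a real prefetcher would presumably bound.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a record batch: an id plus some rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub id: usize,
    pub rows: Vec<i32>,
}

/// Hypothetical reader: pretends to load batch `id` from storage.
pub fn read_batch(id: usize) -> Batch {
    Batch { id, rows: vec![id as i32; 4] }
}

/// Reads all batches concurrently and returns them in completion order,
/// which is not guaranteed to match the order of their ids.
pub fn prefetch_scan(num_batches: usize) -> Vec<Batch> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for id in 0..num_batches {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Simulated I/O latency (an assumption): earlier batches take
            // longer, so they tend to finish last.
            let delay_ms = ((num_batches - id) * 5) as u64;
            thread::sleep(Duration::from_millis(delay_ms));
            tx.send(read_batch(id)).expect("receiver dropped");
        }));
    }
    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);
    let batches: Vec<Batch> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    batches
}

/// Restores a deterministic order so a test can compare against expectations.
pub fn sorted_by_id(mut batches: Vec<Batch>) -> Vec<Batch> {
    batches.sort_by_key(|b| b.id);
    batches
}
```

A test written against this sketch would compare `sorted_by_id(prefetch_scan(n))` with the expected batches instead of asserting on arrival order. The alternative is to make the scanner itself preserve order, which trades some throughput for determinism.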