Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve scan perf by re-enable prefetching in ScanNode #473

Merged
merged 5 commits into from
Jan 27, 2023
Merged

Conversation

eddyxu
Copy link
Contributor

@eddyxu eddyxu commented Jan 27, 2023

No description provided.

@eddyxu eddyxu requested a review from changhiskhan January 27, 2023 20:31
@eddyxu eddyxu self-assigned this Jan 27, 2023
@eddyxu eddyxu added benchmark rust Rust related tasks labels Jan 27, 2023
@eddyxu
Copy link
Contributor Author

eddyxu commented Jan 27, 2023

Screenshot 2023-01-27 at 12 32 48 PM

Before vs after

Copy link
Contributor

@changhiskhan changhiskhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what the perf impact is. Also CI

@@ -15,6 +15,17 @@
// specific language governing permissions and limitations
// under the License.

//! Before running the dataset, prepare a "test.lance" dataset, in the
//! `lance/rust` directly. There is no limitation in the dataset size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -15,6 +15,17 @@
// specific language governing permissions and limitations
// under the License.

//! Before running the dataset, prepare a "test.lance" dataset, in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to prepare though? python? rust script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use either python/rust to generate the dataset. It is also a good test for guaranteeing compatibility between implementations.

@eddyxu
Copy link
Contributor Author

eddyxu commented Jan 27, 2023

Curious what the perf impact is. Also CI

So full scan of the highly nested dataset, now it is 3-4x faster than previous, and it is about 2x than the C++ implementation.

@eddyxu
Copy link
Contributor Author

eddyxu commented Jan 27, 2023

Also CI

It starts to return batch unordered due to prefetch. need to change the test , working on it.

@eddyxu eddyxu merged commit 5ca735e into main Jan 27, 2023
@eddyxu eddyxu deleted the lei/scan_perf branch January 27, 2023 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark rust Rust related tasks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants