feat: add configuration to create v3 ivf_pq indices via python #2941

Open
wants to merge 4 commits into
base: main

ankitvij-db
Contributor

This PR adds the ability to create v3 IVF_PQ indices from Python by passing a bool parameter through the Python VectorIndexParams. Right now it only sets the value to true for IVF_PQ, but it can be used for other index types as well. It does not change the JNI side for IVF_PQ for now; that can be added in a follow-up PR.

Added tests in Python and Rust to cover this.
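
To illustrate how an optional bool flag like the one described above could pick a builder, here is a minimal sketch. `IndexKind`, `Builder`, and `choose_builder` are hypothetical names used only for illustration and are not part of this PR. It also assumes that index types other than IVF_PQ already use the new format, which is inferred from the field's doc comment rather than confirmed.

```rust
/// Hypothetical index kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexKind {
    IvfPq,
    Other,
}

/// Hypothetical builder choice, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Builder {
    Legacy,
    V3,
}

/// Resolve the optional flag to a builder.
/// Assumption: only IVF_PQ falls back to the legacy builder, and only
/// when the flag is unset or false.
pub fn choose_builder(kind: IndexKind, force_use_new_index_format: Option<bool>) -> Builder {
    match kind {
        IndexKind::IvfPq if !force_use_new_index_format.unwrap_or(false) => Builder::Legacy,
        _ => Builder::V3,
    }
}
```

With this shape, leaving the flag unset keeps existing IVF_PQ behavior, and `Some(true)` opts in to the v3 builder.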


ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions github-actions bot added the enhancement, python, and java labels Sep 27, 2024
@ankitvij-db ankitvij-db changed the title from "feat: Add configuration to create v3 ivf_pq indices via python" to "feat: add configuration to create v3 ivf_pq indices via python" Sep 27, 2024
@codecov-commenter

codecov-commenter commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 98.63014% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.99%. Comparing base (f98ffdd) to head (e598b0e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| java/core/lance-jni/src/utils.rs | 0.00% | 1 Missing ⚠️ |
| rust/lance/src/index/vector/ivf.rs | 99.23% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2941      +/-   ##
==========================================
+ Coverage   78.95%   78.99%   +0.04%     
==========================================
  Files         238      238              
  Lines       75577    75617      +40     
  Branches    75577    75617      +40     
==========================================
+ Hits        59674    59737      +63     
+ Misses      12874    12847      -27     
- Partials     3029     3033       +4     
| Flag | Coverage Δ |
|---|---|
| unittests | 78.99% <98.63%> (+0.04%) ⬆️ |



/// Use V3 Index builder.
/// Only used by IVF_PQ index since IVF_PQ still creates old index format by default.
pub force_use_new_index_format: Option<bool>,
Contributor

Can we use an enum to specify which version to use? We will likely have new index formats down the road.

Contributor Author

@ankitvij-db ankitvij-db Sep 27, 2024

I initially thought of doing that and just passing the LanceVersion. However, I was not sure how it ties to the index builder. I can change it to the Lance version if that makes it clearer.

Contributor

+1. I would prefer an index_file_version parameter over a use_new_index_format flag.

Contributor

There are V1 and V3 if you are going to create that enum.
V2 has been removed; no index is in V2 format now.
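
As a rough sketch of the enum suggested in this thread, the version could be modeled with only the two formats mentioned above. `IndexFormatVersion` and `ParamsSketch` are hypothetical names for illustration, not types from Lance. Defaulting to V1 is an assumption based on the doc comment saying IVF_PQ still creates the old format by default.

```rust
use std::str::FromStr;

/// Hypothetical index file version selector (not Lance's actual type).
/// Only V1 and V3 are listed because V2 is described as removed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum IndexFormatVersion {
    /// Legacy format; assumed default to match current IVF_PQ behavior.
    #[default]
    V1,
    /// Format produced by the V3 index builder.
    V3,
}

impl FromStr for IndexFormatVersion {
    type Err = String;

    /// Parse a case-insensitive string such as "v1" or "V3", which is how
    /// a Python binding might pass the value through.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "v1" => Ok(Self::V1),
            "v3" => Ok(Self::V3),
            other => Err(format!("unsupported index file version: {other}")),
        }
    }
}

/// Hypothetical stand-in for the params struct touched by the diff.
#[derive(Debug, Clone, Default)]
pub struct ParamsSketch {
    pub index_file_version: IndexFormatVersion,
}
```

Compared with a bool flag, a later format would become one more variant and one more match arm, with no signature change.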

@eddyxu eddyxu requested a review from BubbleCal September 27, 2024 16:55
@ankitvij-db
Contributor Author

@BubbleCal @westonpace I wanted to check whether there are any other review comments I need to address apart from the enum change that @eddyxu suggested.

Comment on lines +2588 to +2596
#[tokio::test]
async fn test_create_ivf_pq_dot() {
    run_ivf_pq_dot_test(false).await;
}

#[tokio::test]
async fn test_create_ivf_pq_v3_dot() {
    run_ivf_pq_dot_test(true).await;
}
Contributor

Instead of making separate test functions, could we parametrize the test functions with rstest?


@BubbleCal
Contributor

BubbleCal commented Nov 20, 2024

I just added IndexFileVersion to lance. @ankitvij-db, you can try reusing that.

Labels
enhancement, java, python
5 participants