Add SKLearnVectorStore #5305

Merged
merged 9 commits into from
May 28, 2023

Conversation

mrtj
Contributor

@mrtj mrtj commented May 26, 2023

Add SKLearnVectorStore

This PR adds SKLearnVectorStore, a simple vector store based on the NearestNeighbors implementations in the scikit-learn package. It provides a drop-in vector store with minimal dependencies (scikit-learn is typically already installed in a data scientist / ML engineer environment). The vector store can be persisted to and loaded from JSON, BSON and Parquet formats.
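
To make the idea of a persistable nearest-neighbour store concrete, here is a minimal sketch. It is not the code from this PR: it swaps scikit-learn's NearestNeighbors for a brute-force cosine search in plain Python so that it runs without any dependencies, and the class name `TinyVectorStore` and its methods are hypothetical. It also assumes every vector is non-zero.

```python
import json
import math


class TinyVectorStore:
    """Hypothetical stand-in: brute-force cosine search with JSON persistence."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text, vector):
        # Store the text and its embedding side by side.
        self.texts.append(text)
        self.vectors.append([float(x) for x in vector])

    @staticmethod
    def _cosine_distance(a, b):
        # Assumes both vectors are non-zero.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm

    def search(self, query, k=1):
        # Rank all stored vectors by distance to the query and keep the k closest.
        ranked = sorted(
            range(len(self.vectors)),
            key=lambda i: self._cosine_distance(query, self.vectors[i]),
        )
        return [self.texts[i] for i in ranked[:k]]

    def persist(self, path):
        # JSON is the simplest of the formats mentioned above.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"texts": self.texts, "vectors": self.vectors}, fh)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        store = cls()
        store.texts = data["texts"]
        store.vectors = data["vectors"]
        return store
```

A store built this way can be written with `persist(path)` and restored with `TinyVectorStore.load(path)`; a real implementation would delegate the search to an indexed nearest-neighbour structure rather than sorting every vector.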

SKLearnVectorStore has a soft (dynamic) dependency on the scikit-learn, numpy and pandas packages. Persisting to BSON requires the bson package; persisting to Parquet requires the pyarrow package.
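
A soft dependency usually means the package is imported only when it is first needed, with a helpful error if it is missing. The sketch below shows one common way to do that with the standard library; the helper name `guarded_import` and its parameters are hypothetical and are not taken from this PR.

```python
import importlib


def guarded_import(module_name, pip_name=None):
    """Import a module lazily, raising an actionable error if it is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = pip_name or module_name
        raise ImportError(
            f"Could not import {module_name}. "
            f"Install it with `pip install {package}`."
        ) from exc
```

Called as `guarded_import("sklearn", pip_name="scikit-learn")` inside a constructor or method, this keeps the package out of the hard requirements while still telling the user what to install.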

Before submitting

Integration tests are provided under tests/integration_tests/vectorstores/test_sklearn.py

Sample usage notebook is provided under docs/modules/indexes/vectorstores/examples/sklear.ipynb

Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

@hwchase17 - project lead

VectorStores / Retrievers / Memory
@dev2049

@eyurtsev
Collaborator

Thank you for the contribution @mrtj !

The code is looking great! One request: could you move the test from the integration tests to the unit tests?

Follow these guidelines -- https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md#working-with-optional-dependencies

This will allow the tests to be picked up on CI as tests with an optional dependency.
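
As a generic illustration of a unit test that depends on an optional package, the sketch below skips itself when scikit-learn is not installed. It uses only the standard library's `unittest` and `importlib.util`; the linked guidelines describe the project's own mechanism, which may differ, and the test class here is hypothetical.

```python
import importlib.util
import unittest

# True only when scikit-learn can be imported in this environment.
HAS_SKLEARN = importlib.util.find_spec("sklearn") is not None


@unittest.skipUnless(HAS_SKLEARN, "scikit-learn is not installed")
class TestWithSklearn(unittest.TestCase):
    def test_nearest_neighbor(self):
        # Import inside the test so collecting the module never fails.
        from sklearn.neighbors import NearestNeighbors

        nn = NearestNeighbors(n_neighbors=1, metric="cosine")
        nn.fit([[1.0, 0.0], [0.0, 1.0]])
        _, indices = nn.kneighbors([[0.9, 0.1]])
        self.assertEqual(int(indices[0][0]), 0)
```

The test runs wherever the optional dependency is present and is reported as skipped elsewhere, so the suite stays green in both environments.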

@dev2049 added the "03 enhancement" and "Ɑ: vector store" labels on May 26, 2023
@dev2049 merged commit 5f45523 into langchain-ai:master on May 28, 2023
@danielchalef mentioned this pull request on Jun 5, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
# Add SKLearnVectorStore

This PR adds SKLearnVectorStore, a simple vector store based on
NearestNeighbors implementations in the scikit-learn package. This
provides a simple drop-in vector store implementation with minimal
dependencies (scikit-learn is typically installed in a data scientist /
ml engineer environment). The vector store can be persisted and loaded
from json, bson and parquet format.

SKLearnVectorStore has a soft (dynamic) dependency on the scikit-learn,
numpy and pandas packages. Persisting to bson requires the bson package,
persisting to parquet requires the pyarrow package.

## Before submitting

Integration tests are provided under
`tests/integration_tests/vectorstores/test_sklearn.py`

Sample usage notebook is provided under
`docs/modules/indexes/vectorstores/examples/sklear.ipynb`

Co-authored-by: Dev 2049 <[email protected]>
This was referenced Jun 25, 2023