-
Notifications
You must be signed in to change notification settings - Fork 16.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from jupyterjazz/docarray-vectorstore
Docarray vectorstore
- Loading branch information
Showing
10 changed files
with
1,381 additions
and
199 deletions.
There are no files selected for viewing
236 changes: 236 additions & 0 deletions
236
docs/modules/indexes/vectorstores/examples/docarray_hnsw_search.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,236 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "2ce41f46-5711-4311-b04d-2fe233ac5b1b", | ||
"metadata": {}, | ||
"source": [ | ||
"# DocArrayHnswSearch\n", | ||
"\n", | ||
">[DocArrayHnswSearch](https://docs.docarray.org/user_guide/storing/index_hnswlib/) is a lightweight Document Index implementation provided by [Docarray](https://docs.docarray.org/) that runs fully locally and is best suited for small- to medium-sized datasets. It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and stores all other data in [SQLite](https://www.sqlite.org/index.html).\n", | ||
"\n", | ||
"This notebook shows how to use functionality related to the `DocArrayHnswSearch`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8ce1b8cb-dbf0-40c3-99ee-04f28143331b", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install \"docarray[hnswlib]\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "878f17df-100f-4854-9e87-472cf36d51f3", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdin", | ||
"output_type": "stream", | ||
"text": [ | ||
" ········\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# get a token: https://platform.openai.com/account/api-keys\n", | ||
"\n", | ||
"from getpass import getpass\n", | ||
"\n", | ||
"OPENAI_API_KEY = getpass()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "82d9984a-6031-403d-a977-6bc98d6be23a", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"\n", | ||
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "b757afef-ef0a-465d-8e8a-9aadb9c32b88", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"/Users/jinaai/Desktop/langchain/venv/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
" from .autonotebook import tqdm as notebook_tqdm\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from langchain.embeddings.openai import OpenAIEmbeddings\n", | ||
"from langchain.text_splitter import CharacterTextSplitter\n", | ||
"from langchain.vectorstores import DocArrayHnswSearch\n", | ||
"from langchain.document_loaders import TextLoader" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "605e200e-e711-486b-b36e-cbe5dd2512d7", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.document_loaders import TextLoader\n", | ||
"loader = TextLoader('../../../state_of_the_union.txt')\n", | ||
"documents = loader.load()\n", | ||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", | ||
"docs = text_splitter.split_documents(documents)\n", | ||
"\n", | ||
"embeddings = OpenAIEmbeddings()\n", | ||
"\n", | ||
"db = DocArrayHnswSearch.from_documents(docs, embeddings, work_dir='hnswlib_store/', n_dim=1536)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ed6f905b-4853-4a44-9730-614aa8e22b78", | ||
"metadata": {}, | ||
"source": [ | ||
"## Similarity search" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "4d7e742f-2002-449d-a10e-16046890906c", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"query = \"What did the president say about Ketanji Brown Jackson\"\n", | ||
"docs = db.similarity_search(query)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "0da9e26f-1fc2-48e6-95a7-f692c853bbd3", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", | ||
"\n", | ||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", | ||
"\n", | ||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", | ||
"\n", | ||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(docs[0].page_content)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3febb987-e903-416f-af26-6897d84c8d61", | ||
"metadata": {}, | ||
"source": [ | ||
"## Similarity search with score" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"id": "40764fdd-357d-475a-8152-5f1979d61a45", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"docs = db.similarity_search_with_score(query)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"id": "a479fc46-b299-4330-89b9-e9b5a218ea03", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={}),\n", | ||
" 0.36962226)" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"docs[0]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"id": "4d3d4e97-5d2b-4571-8ff9-e3f6b6778714", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import shutil\n", | ||
"# delete the dir\n", | ||
"shutil.rmtree('hnswlib_store')" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.16" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.