langchain-ai · dev2049 · May 30, 2023 · May 27, 2023 · May 28, 2023 · May 28, 2023
diff --git a/docs/modules/indexes/vectorstores/examples/mongodb_atlas_vector_search.ipynb b/docs/modules/indexes/vectorstores/examples/mongodb_atlas_vector_search.ipynb
@@ -0,0 +1,170 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "683953b3",
+   "metadata": {},
+   "source": [
+    "# MongoDB Atlas Vector Search\n",
+    "\n",
+    ">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a document database managed in the cloud. It also enables Lucene and its vector search feature.\n",
+    "\n",
+    "This notebook shows how to use the functionality related to the `MongoDB Atlas Vector Search` feature where you can store your embeddings in MongoDB documents and create a Lucene vector index to perform a KNN search.\n",
+    "\n",
+    "It uses the [knnBeta Operator](https://www.mongodb.com/docs/atlas/atlas-search/knn-beta) available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments as we may introduce breaking changes.\n",
+    "\n",
+    "To use MongoDB Atlas, you must have first deployed a cluster. Free clusters are available. \n",
+    "Here is the MongoDB Atlas [quick start](https://www.mongodb.com/docs/atlas/getting-started/)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b4c41cad-08ef-4f72-a545-2151e4598efe",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!pip install pymongo"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "MONGODB_ATLAS_URI = os.environ['MONGODB_ATLAS_URI']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "320af802-9271-46ee-948f-d2453933d44b",
+   "metadata": {},
+   "source": [
+    "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key. Make sure the environment variable `OPENAI_API_KEY` is set up before proceeding."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1f3ecc42",
+   "metadata": {},
+   "source": [
+    "Now, let's create a Lucene vector index on your cluster. In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings-for-vector-search) to get more details on how to define an Atlas Search index.\n",
+    "You can name the index `langchain_demo` and create the index on the namespace `lanchain_db.langchain_col`. Finally, write the following definition in the JSON editor:\n",
+    "\n",
+    "```json\n",
+    "{\n",
+    "  \"mappings\": {\n",
+    "    \"dynamic\": true,\n",
+    "    \"fields\": {\n",
+    "      \"embedding\": {\n",
+    "        \"dimensions\": 1536,\n",
+    "        \"similarity\": \"cosine\",\n",
+    "        \"type\": \"knnVector\"\n",
+    "      }\n",
+    "    }\n",
+    "  }\n",
+    "}\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "aac9563e",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
+    "from langchain.text_splitter import CharacterTextSplitter\n",
+    "from langchain.vectorstores import MongoDBAtlasVectorSearch\n",
+    "from langchain.document_loaders import TextLoader"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "a3c3999a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders import TextLoader\n",
+    "loader = TextLoader('../../../state_of_the_union.txt')\n",
+    "documents = loader.load()\n",
+    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
+    "docs = text_splitter.split_documents(documents)\n",
+    "\n",
+    "embeddings = OpenAIEmbeddings()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e104aee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pymongo import MongoClient\n",
+    "\n",
+    "# initialize MongoDB python client\n",
+    "client = MongoClient(MONGODB_ATLAS_CONNECTION_STRING)\n",
+    "\n",
+    "db_name = \"lanchain_db\"\n",
+    "collection_name = \"langchain_col\"\n",
+    "namespace = f\"{db_name}.{collection_name}\"\n",
+    "index_name = \"langchain_demo\"\n",
+    "\n",
+    "# insert the documents in MongoDB Atlas with their embedding\n",
+    "docsearch = MongoDBAtlasVectorSearch.from_documents(\n",
+    "    docs,\n",
+    "    embeddings,\n",
+    "    client=client,\n",
+    "    namespace=namespace,\n",
+    "    index_name=index_name\n",
+    ")\n",
+    "\n",
+    "# perform a similarity search between the embedding of the query and the embeddings of the documents\n",
+    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
+    "docs = docsearch.similarity_search(query)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9c608226",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(docs[0].page_content)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}