async mongo document loader #4285

Closed
Changes from 4 commits
357 commits
d6e0b9a
fix homepage typo (#4883)
cjcjameson May 17, 2023
ef8b5f6
Tiny code review and docs fix for Docugami DataLoader (#4877)
tjaffri May 17, 2023
4c3ab55
feat(Add FastAPI + Vercel deployment option): (#4520)
msoedov May 17, 2023
1ff7c95
Bold Crumbs (#4876)
vowelparrot May 17, 2023
5c9205d
ConversationalChatAgent: Allow customizing `TEMPLATE_TOOL_RESPONSE` (…
FOLLGAD May 18, 2023
df0c33a
Faiss no avx2 (#4895)
dev2049 May 18, 2023
8e41143
Add a generic document loader (#4875)
eyurtsev May 18, 2023
0dc304c
Add html parsers (#4874)
eyurtsev May 18, 2023
e28bdf4
Cadlabs/python tool sanitization (#4754)
dev2049 May 18, 2023
8966f61
Zep memory (#4898)
dev2049 May 18, 2023
a4ac006
Update gallery (#4873)
dev2049 May 18, 2023
41e2394
Fix AzureOpenAI embeddings documentation example. model -> deployment…
IsmaelGSerrano May 18, 2023
613bf9b
Update getting_started.md (#4482)
bongsang May 18, 2023
c998569
docs: text splitters improvements (#4490)
leo-gan May 18, 2023
9e2227b
Harrison/serper api bug (#4902)
hwchase17 May 18, 2023
ba023d5
Harrison/faiss norm (#4903)
hwchase17 May 18, 2023
9165267
Harrison/improved retry tool (#4842)
hwchase17 May 18, 2023
b8d4893
Harrison/unified objectives (#4905)
hwchase17 May 18, 2023
dfbf45f
bump version to 173 (#4910)
hwchase17 May 18, 2023
c06a47a
Load specific file types from Google Drive (issue #4878) (#4926)
eyurtsev May 18, 2023
8c28ad6
API update: Engines -> Models (#4915)
assert6 May 18, 2023
e462028
feat #4479: TextLoader auto detect encoding and improved exceptions (…
eyurtsev May 18, 2023
1ed4228
Fix bilibili (#4860)
yuekaizhang May 18, 2023
7642f21
Add human message as input variable to chat agent prompt creation (#4…
richardyc May 18, 2023
c9a362e
add alias for model (#4553)
hwchase17 May 18, 2023
d5a0704
dont error on sql import (#4647)
hwchase17 May 18, 2023
e2d7677
docs: compound ecosystem and integrations (#4870)
leo-gan May 18, 2023
c9e2a01
Update GPT4ALL integration (#4567)
Chae4ek May 18, 2023
a8ded21
FIX: GPTCache cache_obj creation loop (#4827)
elBarkey May 18, 2023
440b876
Redis kwargs fix (#4936)
dev2049 May 18, 2023
55baa0d
Update redis integration tests (#4937)
dev2049 May 18, 2023
c75c077
docs supabase update (#4935)
leo-gan May 18, 2023
7e8e21c
Correct typo in APIChain example notebook (Farenheit -> Fahrenheit) (…
verygoodsoftwarenotvirus May 18, 2023
3002c1d
fix: error in gptcache example nb (#4930)
so2liu May 18, 2023
c9f963e
Update custom_multi_action_agent.ipynb (#4931)
vishwa-rn May 18, 2023
8f8593a
docs: added `ecosystem/dependents` page (#4941)
leo-gan May 18, 2023
a9bb314
docs: vectorstores, different updates and fixes (#4939)
leo-gan May 18, 2023
5525b70
Chatconv agent: output parser exception (#4923)
blob42 May 18, 2023
c8c2276
Zep Retriever - Vector Search Over Chat History (#4533)
danielchalef May 18, 2023
3df2d83
Fix get_num_tokens for Anthropic models (#4911)
jarib May 18, 2023
e027a38
NIT: Instead of hardcoding k in each definition, define it as a param…
scafati98 May 19, 2023
db6f7ed
[nit] Simplify Spark Creation Validation Check A Little Bit (#4761)
skcoirz May 19, 2023
c069732
Fix for syntax when setting search_path for Snowflake database (#4747)
aboland May 19, 2023
5feb60f
Harrison/spell executor (#4914)
hwchase17 May 19, 2023
88a3a56
Add Spark SQL support (#4602) (#4956)
hwchase17 May 19, 2023
bf5a3c6
Support Databricks in SQLDatabase (#4702)
gengliangwang May 19, 2023
13c3763
Fixed assumptions misspelling (#4961)
rahulraocoder May 19, 2023
e80585b
Update tutorials.md (#4960)
edrickdch May 19, 2023
e68dfa7
Update planner_prompt.py (#4967)
vishwa-rn May 19, 2023
06e5244
power bi api wrapper integration tests & bug fix (#4983)
eyurtsev May 19, 2023
2abf6b9
bump v0.0.174 (#4988)
dev2049 May 19, 2023
a87a252
Remove autoreload in examples (#4994)
gengliangwang May 19, 2023
616e9a9
Bug fixes and error handling in Redis - Vectorstore (#4932)
iamadhee May 19, 2023
22d844d
Add async search with relevance score (#4558)
jpzhangvincent May 19, 2023
56cb77a
Make test gha workflow manually runnable (#4998)
dev2049 May 19, 2023
0ff5956
Adds 'IN' metadata filter for pgvector for checking set presence (#4982)
eyurtsev May 19, 2023
62d0a01
Update python.py (#4971)
pengwork May 19, 2023
729e935
PGVector logger message level (#4920)
jmtristancho May 19, 2023
ddd595f
feature/4493 Improve Evernote Document Loader (#4577)
MikeMcGarry May 19, 2023
080eb1b
Fix graphql tool (#4984)
dev2049 May 19, 2023
2ab0e1d
changed ValueError to ImportError (#5006)
leo-gan May 19, 2023
02632d5
docs: Big Mendable Improvements (#4964)
nickscamara May 19, 2023
ddc2d4c
added instruction about pip install google-gerativeai (#5004)
leo-gan May 19, 2023
f07b9fd
Update the GPTCache example (#4985)
SimFG May 19, 2023
9928fb2
Revert "API update: Engines -> Models (#4915)" (#5008)
dev2049 May 19, 2023
6c60251
Add self query translator for weaviate vectorstore (#4804)
domchan May 19, 2023
2aa3754
Check for single prompt in __call__ method of the BaseLLM class (#4892)
mwinterde May 19, 2023
27e63b9
Add logs command (#5007)
vowelparrot May 20, 2023
3bc0bf0
fix prompt saving (#4987)
dev2049 May 20, 2023
7388248
Streaming only final output of agent (#2483) (#4630)
UmerHA May 20, 2023
9d1280d
bump v175 (#5041)
dev2049 May 20, 2023
a6ef20d
Fix annoying typo in docs (#5029)
tornikeo May 21, 2023
f9f08c4
Add documentation for Databricks integration (#5013)
gengliangwang May 21, 2023
424a573
DOC: Misspelling in agents.rst documentation (#5038)
jeffzheng13 May 21, 2023
8c661ba
change to type checking (#5062)
hwchase17 May 21, 2023
b0431c6
Harrison/psychic (#5063)
hwchase17 May 21, 2023
6c25f86
bump to 176 (#5064)
hwchase17 May 21, 2023
224f73e
move docs
hwchase17 May 21, 2023
0c3de0a
Merge branch 'master' of github.com:hwchase17/langchain
hwchase17 May 21, 2023
bf3f554
feat: batch multiple files in a single Unstructured API request (#4525)
MthwRobinson May 22, 2023
a395ff7
preserve language in conversation retrieval (#4969)
hansvdam May 22, 2023
443ebe2
docs: `Deployments` page moved into `Ecosystem/` (#4949)
leo-gan May 22, 2023
ef7d015
Separate Runner Functions from Client (#5079)
vowelparrot May 22, 2023
785502e
Add 'get_token_ids' method (#4784)
vowelparrot May 22, 2023
49ca027
Improved query, print & exception handling in REPL Tool (#4997)
svdeepak99 May 22, 2023
10ba201
Harrison/neo4j (#5078)
hwchase17 May 22, 2023
fcd88bc
Bump 177 (#5095)
dev2049 May 22, 2023
6eacd88
fix: revert docarray explicit transitive dependencies and use extras …
malandis May 22, 2023
5cd1210
Improving Resilience of MRKL Agent (#5014)
svdeepak99 May 22, 2023
44dc959
Improve pinecone hybrid search retriever adding metadata support (#5…
lbsnrs May 22, 2023
039f8f1
Add the usage of SSL certificates for Elasticsearch and user password…
May 22, 2023
e57ebf3
add get_top_k_cosine_similarity method to get max top k score and ind…
hwaking May 22, 2023
1cb04f2
PowerBI major refinement in working of tool and tweaks in the rest (#…
eavanvalkenburg May 22, 2023
9e64946
fix: add_texts method of Weaviate vector store creats wrong embedding…
Shawn91 May 22, 2023
467ca6f
update langchainplus client and docker file to reflect port changes (…
agola11 May 22, 2023
5b2b436
Fixed import error for AutoGPT e.g. from langchain.experimental.auton…
ankitarya1019 May 22, 2023
5e47c64
Update serpapi.py (#4947)
venetisgr May 22, 2023
c28cc0f
changed ValueError to ImportError (#5103)
leo-gan May 22, 2023
e173e03
fix: assign current_time to datetime.now() if current_time is None (#…
mbchang May 22, 2023
69de33e
Add Mastodon toots loader (#5036)
imrehg May 22, 2023
de6a401
Add OpenLM LLM multi-provider (#4993)
r2d4 May 23, 2023
87bba2e
Pass Dataset Name by Name not Position (#5108)
vowelparrot May 23, 2023
b950022
Fixes issue #5072 - adds additional support to Weaviate (#5085)
jettro May 23, 2023
d56313a
Improve effeciency of TextSplitter.split_documents, iterate once (#5111)
eyurtsev May 23, 2023
d4fd589
WhyLabs callback (#4906)
jamie256 May 23, 2023
d7f807b
Add AzureCognitiveServicesToolkit to call Azure Cognitive Services AP…
whiskyboy May 23, 2023
5c87dbf
Add link to Psychic from document loaders documentation page (#5115)
Ayan-Bandyopadhyay May 23, 2023
753f4cf
bump 178 (#5130)
dev2049 May 23, 2023
7a75bb2
docs: fix minor typo + add wikipedia package installation part in hum…
amicus-veritatis May 23, 2023
5002f3a
solving #2887 (#5127)
tommasodelorenzo May 23, 2023
754b513
Improve PlanningOutputParser whitespace handling (#5143)
TMRolle May 23, 2023
0b542a9
Add ElasticsearchEmbeddings class for generating embeddings using Ela…
jeffvestal May 23, 2023
68f0d45
Adding Weather Loader (#5056)
iamadhee May 23, 2023
de6e6c7
Add MosaicML inference endpoints (#4607)
dakinggg May 23, 2023
9242998
Empty check before pop (#4929)
edwardzjl May 23, 2023
925dd3e
Add async versions of predict() and predict_messages() (#4867)
jlowin May 24, 2023
b1b7f35
fix: fix current_time=Now bug for aadd_documents in TimeWeightedRetri…
mbchang May 24, 2023
de4ef24
Docs: updated getting_started.md (#5151)
DanQuin May 24, 2023
c111134
Clarification of the reference to the "get_text_legth" function in ge…
DanQuin May 24, 2023
3392948
docs: added missed `document_loaders` examples (#5150)
leo-gan May 24, 2023
9c4b43b
Add Typesense vector store (#1674)
jasonbosco May 24, 2023
c81fb88
Vectara (#5069)
ofermend May 24, 2023
faa2665
Beam (#4996)
NolanTrem May 24, 2023
fff21a0
Update rellm_experimental.ipynb (#5189)
eltociear May 24, 2023
cf19a2a
example usage (#5182)
jeffvestal May 24, 2023
47e4ee4
adjust docarray docstrings (#5185)
jupyterjazz May 24, 2023
2d5588c
bump 179 (#5200)
dev2049 May 24, 2023
11c26eb
Harrison/modelscope (#5156)
hwchase17 May 24, 2023
aa14e22
Reuse `length_func` in `MapReduceDocumentsChain` (#5181)
zachschillaci27 May 24, 2023
fd866d1
Update Cypher QA prompt (#5173)
tomasonjo May 24, 2023
b00c77d
Improve weaviate vectorstore docs (#5201)
hsm207 May 24, 2023
2b2176a
tfidf retriever (#5114)
dev2049 May 24, 2023
94cf391
standardize json parsing (#5168)
hwchase17 May 24, 2023
52714ce
fixing total cost finetuned model giving zero (#5144)
tommasodelorenzo May 24, 2023
c173bf1
Fixes scope of query Session in PGVector (#5194)
May 24, 2023
d8eed60
Output parsing variation allowance (#5178)
dibrale May 24, 2023
f0730c6
Allow readthedoc loader to pass custom html tag (#5175)
ByronHsu May 24, 2023
f10be07
Add Iugu document loader (#5162)
rasiqueira May 24, 2023
44abe92
Add Joplin document loader (#5153)
alondmnt May 24, 2023
dcee893
nit (#5208)
dev2049 May 24, 2023
b7fcb35
add option to pass openai key to langchain plus command (#5213)
agola11 May 24, 2023
66113c2
Log warning (#5192)
vowelparrot May 24, 2023
e76e68b
Add Delete Session Method (#5193)
vowelparrot May 24, 2023
e6c4571
Add 'status' command to get server status (#5197)
vowelparrot May 24, 2023
a775aa6
Harrison/vertex (#5049)
hwchase17 May 24, 2023
2ad29f4
fix a mistake in concepts.md (#5222)
leo-gan May 25, 2023
95c9aa1
Create async copy of from_text() inside GraphIndexCreator. (#5214)
maspotts May 25, 2023
eff31a3
Remove API key from docs (#5223)
kbressem May 25, 2023
f0ea093
Change Default GoogleDriveLoader Behavior to not Load Trashed Files (…
NickL77 May 25, 2023
40b086d
Allow to specify ID when adding to the FAISS vectorstore. (#5190)
atisharma May 25, 2023
5cfa72a
Bibtex integration for document loader and retriever (#5137)
eyurtsev May 25, 2023
5cdd9ab
Add MiniMax embeddings (#5174)
archongum May 25, 2023
09e246f
Weaviate: Add QnA with sources example (#5247)
hsm207 May 25, 2023
9e57be4
Fix typo in docstring of RetryWithErrorOutputParser (#5244)
mwinterde May 25, 2023
15b17f9
bump 180 (#5248)
dev2049 May 25, 2023
c7e2151
remove extra "\n" to ensure that the format of the description, examp…
pengqu123 May 25, 2023
9c0cb90
Resolve error in StructuredOutputParser docs (#5240)
mwinterde May 25, 2023
88ed8e1
Added the option of specifying a proxy for the OpenAI API (#5246)
ymaurer May 25, 2023
3be9ba1
OpenSearch top k parameter fix (#5216)
dev2049 May 25, 2023
d3cd21c
Fixed regression in JoplinLoader's get note url (#5265)
alondmnt May 25, 2023
5525602
Docs link custom agent page in getting started (#5250)
JanilsWoerst May 25, 2023
ca88b25
Zep sdk version (#5267)
dev2049 May 25, 2023
b398862
Add C Transformers for GGML Models (#5218)
marella May 25, 2023
3223a97
Add visible_only and strict_mode options to ClickTool (#4088)
cancan101 May 25, 2023
7652d2a
Add Multi-CSV/DF support in CSV and DataFrame Toolkits (#5009)
NickL77 May 25, 2023
f01dfe8
OpenAI lint (#5273)
dev2049 May 25, 2023
2ef5579
Added pipline args to `HuggingFacePipeline.from_model_id` (#5268)
solomspd May 26, 2023
56ad56c
Support bigquery dialect - SQL (#5261)
HassanOuda May 26, 2023
7047a2c
feat: add Momento as a standard cache and chat message history provid…
malandis May 26, 2023
a0281f5
Fixed typo: 'ouput' to 'output' in all documentation (#5272)
deepblue May 26, 2023
1cb6498
Tedma4/twilio tool (#5136)
tedma4 May 26, 2023
aec642f
LLM wrapper for Databricks (#5142)
mengxr May 26, 2023
d481d88
Add an example to make the prompt more robust (#5291)
pengqu123 May 26, 2023
a669abf
Update CONTRIBUTION guidelines and PR Template (#5140)
eyurtsev May 26, 2023
aa3c7b3
Fixed passing creds to VertexAI LLM (#5297)
lkuligin May 26, 2023
641303a
bump 181 (#5302)
dev2049 May 26, 2023
58e95cd
Better docs for weaviate hybrid search (#5290)
hsm207 May 26, 2023
0a8d6bc
Add instructions to pyproject.toml (#5138)
eyurtsev May 26, 2023
f75f0db
docs: improve flow of llm caching notebook (#5309)
malandis May 26, 2023
6e974b5
Fix typos (#5323)
russellpwirtz May 27, 2023
465a970
docs: added link to LangChain Handbook (#5311)
leo-gan May 28, 2023
179ddbe
add enum output parser (#5165)
hwchase17 May 28, 2023
5292e85
add enum output parser (#5165)
hwchase17 May 28, 2023
c49c6ac
Add Chainlit to deployment options (#5314)
constantinidan May 28, 2023
c6e5d90
Fixing blank thoughts in verbose for "_Exception" Action (#5331)
svdeepak99 May 28, 2023
f079cdf
fix: remove empty lines that cause InvalidRequestError (#5320)
mbchang May 28, 2023
881dfe8
Sample Notebook for DynamoDB Chat Message History (#5351)
KBB99 May 28, 2023
1daa706
added cosmos kwargs option (#5292)
eavanvalkenburg May 28, 2023
e274295
feat: support for shopping search in SerpApi (#5259)
aymenfurter May 28, 2023
5f45523
Add SKLearnVectorStore (#5305)
mrtj May 28, 2023
b705f26
bump 182 (#5364)
dev2049 May 28, 2023
9a5c9df
Fixes iter error in FAISS add_embeddings call (#5367)
May 28, 2023
b692797
revert bad json (#5370)
hwchase17 May 28, 2023
ad7f4c0
bump to 183 (#5372)
hwchase17 May 28, 2023
1366d07
Add path validation to DirectoryLoader (#5327)
os1ma May 28, 2023
99a1e3f
Fix: Handle empty documents in ContextualCompressionRetriever (Issue …
hanguofeng May 28, 2023
6df90ad
handle json parsing errors (#5371)
hwchase17 May 29, 2023
14099f1
Use Default Factory (#5380)
vowelparrot May 29, 2023
f77f271
Update PR template with Twitter handle request (#5382)
jacoblee93 May 29, 2023
8b7721e
fix: Blob.from_data mimetype is lost (#5395)
Digma May 29, 2023
e455ba4
Add async support to routing chains (#5373)
amaudruz May 29, 2023
44b48d9
Fix update_document function, add test and documentation. (#5359)
martinholecekmax May 29, 2023
f6615ca
Update llamacpp demonstration notebook (#5344)
sadaisystems May 29, 2023
642ae83
Removed deprecated llm attribute for load_chain (#5343)
imeckr May 29, 2023
3e16468
Harrison/llamacpp (#5402)
hwchase17 May 29, 2023
c09f8e4
Add pagination for Vertex AI embeddings (#5325)
Jflick58 May 29, 2023
100d665
Reformat openai proxy setting as code (#5330)
sevendark May 29, 2023
416c8b1
Harrison/deep infra (#5403)
hwchase17 May 29, 2023
d6fb25c
Harrison/prediction guard update (#5404)
hwchase17 May 29, 2023
ccb6238
Implemented appending arbitrary messages (#5293)
eavanvalkenburg May 29, 2023
a359819
docs: `ecosystem/integrations` update 2 (#5282)
leo-gan May 29, 2023
1837caa
docs: `ecosystem/integrations` update 1 (#5219)
leo-gan May 29, 2023
2da8c48
Harrison/datetime parser (#4693)
hwchase17 May 29, 2023
cce731c
bump version 184 (#5407)
hwchase17 May 29, 2023
cf5803e
Add ToolException that a tool can throw. (#5050)
xming521 May 29, 2023
72f99ff
Harrison/text splitter (#5417)
hwchase17 May 29, 2023
0b3e0dd
New Trello document loader (#4767)
GMartin-dev May 30, 2023
8259f9b
DocumentLoader for GitHub (#5408)
UmerHA May 30, 2023
760632b
Harrison/spark reader (#5405)
hwchase17 May 30, 2023
26ff185
Set old LCTracer to default to port 8000 (#5381)
vowelparrot May 30, 2023
ee57054
Rename and fix typo in lancedb (#5425)
eddyxu May 30, 2023
c4b502a
Harrison/condense q llm (#5438)
hwchase17 May 30, 2023
a61b7f7
adding MongoDBAtlasVectorSearch (#5338)
P-E-B May 30, 2023
9d658aa
Add more code splitters (go, rst, js, java, cpp, scala, ruby, php, sw…
ByronHsu May 30, 2023
64b4165
bump 185 (#5442)
dev2049 May 30, 2023
2649b63
fix (#5457)
dev2049 May 30, 2023
4379bd4
bump 186 (#5459)
dev2049 May 30, 2023
0d3a9d4
Fixed docstring in faiss.py for load_local (#5440)
luckyduck May 30, 2023
e09afb4
Removes duplicated call from langchain/client/langchain.py (#5449)
patrickkeane May 30, 2023
c1807d8
`encoding_kwargs` for InstructEmbeddings (#5450)
Xmaster6y May 30, 2023
1d861dc
MRKL output parser no longer breaks well formed queries (#5432)
May 30, 2023
1f11f80
docs: cleaning (#5413)
leo-gan May 30, 2023
80e133f
Added async _acall to FakeListLLM (#5439)
camille-vanhoffelen May 30, 2023
f93d256
Feat: Add batching to Qdrant (#5443)
kacperlukawski May 30, 2023
8181f9e
Update psychicapi version (#5471)
Ayan-Bandyopadhyay May 30, 2023
1111f18
Add maximal relevance search to SKLearnVectorStore (#5430)
mrtj May 30, 2023
eab4b4c
add simple test for imports (#5461)
hwchase17 May 30, 2023
199cc70
Ability to specify credentials wihen using Google BigQuery as a data …
nsheils May 30, 2023
e31705b
convert the parameter 'text' to uppercase in the function 'parse' of …
ARSblithe212 May 30, 2023
8121e04
added n_threads functionality for gpt4all (#5427)
Vokturz May 30, 2023
0a44bfd
Allow for async use of SelfAskWithSearchChain (#5394)
pors May 31, 2023
46e181a
Allow ElasticsearchEmbeddings to create a connection with ES Client o…
jeffvestal May 31, 2023
ce8b7a2
SQLite-backed Entity Memory (#5129)
JoseHervas May 31, 2023
1671c2a
py tracer fixes (#5377)
agola11 May 31, 2023
f72bb96
Harrison/html splitter (#5468)
hwchase17 May 31, 2023
8bcaca4
Feature: Qdrant filters supports (#5446)
kacperlukawski May 31, 2023
470b282
Add matching engine vectorstore (#3350)
hwchase17 May 31, 2023
b39c069
Merge branch 'master' into mongo-document-loader
saginawj May 31, 2023
272c63c
Merge branch 'harrison/mongo-loader' into mongo-document-loader
saginawj Sep 13, 2023
922e147
Merge branch 'mongo-document-loader' of https://github.com/saginawj/l…
saginawj Sep 14, 2023
148 changes: 148 additions & 0 deletions docs/modules/indexes/document_loaders/examples/mongodb.ipynb
@@ -0,0 +1,148 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "vm8vn9t8DvC_"
},
"source": [
"# MongoDB"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "5WjXERXzFEhg"
},
"source": [
"## Overview"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "juAmbgoWD17u"
},
"source": [
"The MongoDB Document Loader returns a list of LangChain Documents from a MongoDB database.\n",
"\n",
"The Loader requires the following parameters:\n",
"\n",
"* MongoDB connection string\n",
"* MongoDB database name\n",
"* MongoDB collection name\n",
"\n",
"The output takes the following format:\n",
"\n",
"- page_content = the MongoDB document, converted to a string\n",
"- metadata = {'database': '[database_name]', 'collection': '[collection_name]'}"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the Document Loader"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# for running async code in jupyter notebook\n",
"import nest_asyncio\n",
"nest_asyncio.apply()\n",
"\n",
"\n",
"from langchain.document_loaders.mongodb import MongodbLoader"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"loader = MongodbLoader(connection_string=\"mongodb://localhost:27017/\",\n",
" db_name=\"sample_restaurants\", \n",
" collection_name=\"restaurants\"\n",
" ) "
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"25359"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content=\"{'_id': ObjectId('5eb3d668b31de5d588f4292a'), 'address': {'building': '2780', 'coord': [-73.98241999999999, 40.579505], 'street': 'Stillwell Avenue', 'zipcode': '11224'}, 'borough': 'Brooklyn', 'cuisine': 'American', 'grades': [{'date': datetime.datetime(2014, 6, 10, 0, 0), 'grade': 'A', 'score': 5}, {'date': datetime.datetime(2013, 6, 5, 0, 0), 'grade': 'A', 'score': 7}, {'date': datetime.datetime(2012, 4, 13, 0, 0), 'grade': 'A', 'score': 12}, {'date': datetime.datetime(2011, 10, 12, 0, 0), 'grade': 'A', 'score': 12}], 'name': 'Riviera Caterer', 'restaurant_id': '40356018'}\", metadata={'database': 'sample_restaurants', 'collection': 'restaurants'})"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"5WjXERXzFEhg"
],
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
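The notebook applies `nest_asyncio` before calling the loader. A likely reason, given that `load()` in this PR wraps `asyncio.run(...)`, is that Jupyter already runs an event loop and `asyncio.run` refuses to start inside one. The sketch below reproduces that behaviour with the standard library only; `_async_load`, `load`, and `call_load_inside_running_loop` are hypothetical stand-ins, not the loader's real code.

```python
import asyncio
from typing import List


async def _async_load() -> List[str]:
    # Stand-in for the loader's async query; returns a fixed value.
    return ["doc"]


def load() -> List[str]:
    # Synchronous wrapper in the same style as the loader's load().
    coro = _async_load()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def call_load_inside_running_loop() -> str:
    # Simulates calling load() from code that already runs in an event loop,
    # which is the situation inside a Jupyter cell.
    try:
        load()
    except RuntimeError as exc:
        return str(exc)
    return ""


plain_result = load()  # works: no event loop is running yet
error_message = asyncio.run(call_load_inside_running_loop())
print(plain_result)
print(error_message)
```

Outside a notebook the plain call succeeds; inside a running loop it raises `RuntimeError`, which is the case `nest_asyncio.apply()` is meant to work around.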
2 changes: 2 additions & 0 deletions langchain/document_loaders/__init__.py
@@ -49,6 +49,7 @@
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
from langchain.document_loaders.mediawikidump import MWDumpLoader
from langchain.document_loaders.modern_treasury import ModernTreasuryLoader
from langchain.document_loaders.mongodb import MongodbLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.notiondb import NotionDBLoader
@@ -150,6 +151,7 @@
"MWDumpLoader",
"MathpixPDFLoader",
"ModernTreasuryLoader",
"MongodbLoader",
"NotebookLoader",
"NotionDBLoader",
"NotionDirectoryLoader",
61 changes: 61 additions & 0 deletions langchain/document_loaders/mongodb.py
@@ -0,0 +1,61 @@
import asyncio
from typing import List

from motor.motor_asyncio import AsyncIOMotorClient

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class MongodbLoader(BaseLoader):
    """Load MongoDB documents."""

    def __init__(
        self,
        connection_string: str,
        db_name: str,
        collection_name: str,
    ):
        if not connection_string:
            raise ValueError("connection_string must be provided.")

        if not db_name:
            raise ValueError("db_name must be provided.")

        if not collection_name:
            raise ValueError("collection_name must be provided.")

        self.client = AsyncIOMotorClient(connection_string)
        self.db_name = db_name
        self.collection_name = collection_name
        self.db = self.client.get_database(db_name)
        self.collection = self.db.get_collection(collection_name)

    def load(self) -> List[Document]:
        result = asyncio.run(self._async_load())
        return result

    async def _async_load(self) -> List[Document]:
        result = []

        try:
            cursor = self.collection.find()

            # Motor's count_documents is a coroutine, so it must be awaited.
            total_docs = await self.collection.count_documents({})

            async for doc in cursor:
                content = str(doc)
                metadata = {
                    "database": self.db_name,
                    "collection": self.collection_name,
                }
                result.append(Document(page_content=content, metadata=metadata))

            if len(result) != total_docs:
                raise Exception("Only partial collection of documents returned.")

        except Exception as e:
            print(f"Error: {e}")

        return result
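A minimal sketch of the partial-load check, assuming the document count is awaited and a mismatch is raised to the caller rather than printed. `FakeCollection` and `load_all` are hypothetical names used only for illustration; they are not part of Motor or LangChain, and the fake only imitates the two collection methods used above.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


class FakeCollection:
    """Hypothetical stand-in for a Motor collection (illustration only)."""

    def __init__(self, docs: List[Dict[str, Any]], reported_count: int) -> None:
        self._docs = docs
        self._reported_count = reported_count

    def find(self) -> AsyncIterator[Dict[str, Any]]:
        # Returns an async iterator, like a cursor consumed with `async for`.
        async def _iterate() -> AsyncIterator[Dict[str, Any]]:
            for doc in self._docs:
                yield doc

        return _iterate()

    async def count_documents(self, query: Dict[str, Any]) -> int:
        # A coroutine, so callers have to await it to get the integer.
        return self._reported_count


async def load_all(collection: FakeCollection) -> List[str]:
    contents = [str(doc) async for doc in collection.find()]
    total = await collection.count_documents({})  # awaited: `total` is an int
    if len(contents) != total:
        raise RuntimeError("Only partial collection of documents returned.")
    return contents


docs = [{"_id": "1"}, {"_id": "2"}]

# Counts match, so every document is returned.
complete = asyncio.run(load_all(FakeCollection(docs, reported_count=2)))

# The collection reports more documents than the cursor yielded.
try:
    asyncio.run(load_all(FakeCollection(docs, reported_count=3)))
    partial_error = None
except RuntimeError as exc:
    partial_error = str(exc)
```

Raising instead of printing lets a caller, or a test using `pytest.raises`, observe the failure; whether the loader should do so is a design choice for the PR author.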
1 change: 1 addition & 0 deletions pyproject.toml
@@ -77,6 +77,7 @@ pexpect = {version = "^4.8.0", optional = true}
pyvespa = {version = "^0.33.0", optional = true}
O365 = {version = "^2.0.26", optional = true}
jq = {version = "^1.4.1", optional = true}
motor = "^3.1.2"

[tool.poetry.group.docs.dependencies]
autodoc_pydantic = "^1.8.0"
121 changes: 121 additions & 0 deletions tests/integration_tests/document_loaders/test_mongodb.py
@@ -0,0 +1,121 @@
import os
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from motor.motor_asyncio import AsyncIOMotorClient

from langchain.docstore.document import Document
from langchain.document_loaders import MongodbLoader

if "MONGODB_API_KEY" in os.environ:
    mongo_api_key_set = True
    mongo_api_key = os.environ["MONGODB_API_KEY"]
else:
    mongo_api_key_set = False


@pytest.fixture
def docs():
    return [
        {"_id": "1", "address": {"building": "1", "room": "1"}},
        {"_id": "2", "address": {"building": "2", "room": "2"}},
    ]


@pytest.fixture
def expected_documents():
    return [
        Document(
            page_content="{'_id': '1', 'address': {'building': '1', 'room': '1'}}",
            metadata={"database": "sample_restaurants", "collection": "restaurants"},
        ),
        Document(
            page_content="{'_id': '2', 'address': {'building': '2', 'room': '2'}}",
            metadata={"database": "sample_restaurants", "collection": "restaurants"},
        ),
    ]


@pytest.mark.asyncio
async def test_async_load_mocked(mocker, docs, expected_documents):
    async def mock_find():
        for doc in docs:
            yield doc

    mock_collection = MagicMock()
    mock_db = MagicMock()
    mock_client = MagicMock()

    mocker.patch.object(mock_collection, "find", return_value=mock_find())
    mocker.patch.object(AsyncIOMotorClient, "get_database", return_value=mock_db)
    mocker.patch.object(mock_db, "get_collection", return_value=mock_collection)

    with patch("motor.motor_asyncio.AsyncIOMotorClient", return_value=mock_client):
        loader = MongodbLoader(
            "mongodb://localhost:27017", "sample_restaurants", "restaurants"
        )

    documents = await loader._async_load()

    assert documents == expected_documents


@pytest.mark.asyncio
async def test_async_partial_load_mocked(mocker, docs, expected_documents):
    async def mock_find():
        for doc in docs:
            yield doc

    expected_documents.remove(expected_documents[1])

    mock_collection = MagicMock()
    mock_db = MagicMock()
    mock_client = MagicMock()

    mocker.patch.object(mock_collection, "find", return_value=mock_find())
    mocker.patch.object(AsyncIOMotorClient, "get_database", return_value=mock_db)
    mocker.patch.object(mock_db, "get_collection", return_value=mock_collection)

    with patch("motor.motor_asyncio.AsyncIOMotorClient", return_value=mock_client):
        loader = MongodbLoader(
            "mongodb://localhost:27017", "sample_restaurants", "restaurants"
        )

    with pytest.raises(Exception) as error_PartialLoad:
        await loader._async_load()

    assert (
        str(error_PartialLoad.value)
        == "Error: Only partial collection of documents returned."
    )


def test_load_mocked(expected_documents):
    mock_async_load = AsyncMock()

    mock_async_load.return_value = expected_documents

    with patch("langchain.document_loaders.MongodbLoader", MagicMock()):
        loader = MongodbLoader(
            "mongodb://localhost:27017", "test_db", "test_collection"
        )

    loader._async_load = mock_async_load

    documents = loader.load()

    assert documents == expected_documents


@pytest.mark.skipif(not mongo_api_key_set, reason="MONGODB_API_KEY not provided.")
def test_load_actual_db() -> None:
    if "MONGODB_API_KEY" in os.environ:
        mongo_api_key = os.environ["MONGODB_API_KEY"]
        db_name = os.environ["MONGODB_DB_NAME"]
        collection_name = os.environ["MONGODB_COLLECTION_NAME"]

        loader = MongodbLoader(mongo_api_key, db_name, collection_name)

        result = loader.load()

        assert len(result) > 0
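One possible way to mock a collection for tests like these, sketched under the assumption that the code under test iterates `find()` with `async for` and awaits `count_documents`. A plain `MagicMock` return value cannot be awaited, so the count is backed by `AsyncMock` here. `fake_cursor` and `consume` are hypothetical helpers, and the sketch runs with the standard library alone rather than with pytest.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

docs = [{"_id": "1"}, {"_id": "2"}]


async def fake_cursor(items):
    # Async generator standing in for a cursor; it can be consumed only once.
    for item in items:
        yield item


mock_collection = MagicMock()
mock_collection.find.return_value = fake_cursor(docs)
# AsyncMock makes the call awaitable and records that it was awaited.
mock_collection.count_documents = AsyncMock(return_value=len(docs))


async def consume(collection):
    found = [doc async for doc in collection.find()]
    total = await collection.count_documents({})
    return found, total


found, total = asyncio.run(consume(mock_collection))
mock_collection.count_documents.assert_awaited_once_with({})
```

Because the async generator is exhausted after one pass, each test needs its own `fake_cursor(...)` instance.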