Showing 37 changed files with 2,938 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,282 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"sidebar_label: Box\n", | ||
"---" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# BoxLoader\n", | ||
"\n", | ||
"This notebook provides a quick overview for getting started with the Box [document loader](/docs/integrations/document_loaders/). For detailed documentation of all BoxLoader features and configurations, head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.langchain_box_loader.BoxLoader.html).\n", | ||
"\n", | ||
"\n", | ||
"## Overview\n", | ||
"\n", | ||
"The `BoxLoader` class helps you get your unstructured content from Box in LangChain's `Document` format. You can do this with either a `List[str]` containing Box file IDs, or with a `str` containing a Box folder ID. \n", | ||
"\n", | ||
"One of the two is required. When loading from a folder ID, you can also set a `bool` to tell the loader to include the files in all of that folder's sub-folders. \n", | ||
"\n", | ||
":::info\n", | ||
"A Box instance can contain petabytes of files, and folders can contain millions of files. Be intentional about which folders you choose to index. We recommend never getting all files from folder 0 recursively. Folder ID 0 is your root folder.\n", | ||
":::\n", | ||
"\n", | ||
"Files without a text representation will be skipped.\n", | ||
"\n", | ||
"### Integration details\n", | ||
"\n", | ||
"| Class | Package | Local | Serializable | JS support |\n", | ||
"| :--- | :--- | :---: | :---: | :---: |\n", | ||
"| [BoxLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_box.document_loaders.langchain_boxloader.BoxLoader.html) | [langchain_box](https://api.python.langchain.com/en/latest/box_api_reference.html) | ✅ | ❌ | ❌ |\n", | ||
"\n", | ||
"### Loader features\n", | ||
"\n", | ||
"| Source | Document Lazy Loading | Async Support |\n", | ||
"| :---: | :---: | :---: |\n", | ||
"| BoxLoader | ✅ | ❌ |\n", | ||
"\n", | ||
"## Setup\n", | ||
"\n", | ||
"In order to use the Box package, you will need a few things:\n", | ||
"\n", | ||
"* A Box account: If you are not a current Box customer or want to test outside of your production Box instance, you can use a [free developer account](https://account.box.com/signup/n/developer#ty9l3).\n", | ||
"* [A Box app](https://developer.box.com/guides/getting-started/first-application/): This is configured in the [developer console](https://account.box.com/developers/console), and for Box AI, it must have the `Manage AI` scope enabled. This is also where you select your authentication method.\n", | ||
"* The app must be [enabled by the administrator](https://developer.box.com/guides/authorization/custom-app-approval/#manual-approval). For free developer accounts, this is whoever signed up for the account.\n", | ||
"\n", | ||
"### Credentials\n", | ||
"\n", | ||
"For these examples, we will use [token authentication](https://developer.box.com/guides/authentication/tokens/developer-tokens). Token authentication works with any [authentication method](https://developer.box.com/guides/authentication/): obtain the token however that method provides it. If you want to learn more about how to use other authentication types with `langchain-box`, visit the [Box provider](/docs/integrations/providers/box) document.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Enter your Box Developer Token: ········\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"import getpass\n", | ||
"import os\n", | ||
"\n", | ||
"box_developer_token = getpass.getpass(\"Enter your Box Developer Token: \")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n", | ||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Installation\n", | ||
"\n", | ||
"Install **langchain_box**." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install -qU langchain_box" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Initialization\n", | ||
"\n", | ||
"### Load files\n", | ||
"\n", | ||
"If you wish to load files, you must provide the `List` of file IDs at instantiation time. \n", | ||
"\n", | ||
"This requires one piece of information:\n", | ||
"\n", | ||
"* **box_file_ids** (`List[str]`): A list of Box file IDs. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_box.document_loaders import BoxLoader\n", | ||
"\n", | ||
"box_file_ids = [\"1514555423624\", \"1514553902288\"]\n", | ||
"\n", | ||
"loader = BoxLoader(\n", | ||
"    box_developer_token=box_developer_token,\n", | ||
"    box_file_ids=box_file_ids,\n", | ||
"    character_limit=10000,  # Optional. Defaults to no limit\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Load from folder\n", | ||
"\n", | ||
"If you wish to load files from a folder, you must provide a `str` with the Box folder ID at instantiation time. \n", | ||
"\n", | ||
"This requires one piece of information:\n", | ||
"\n", | ||
"* **box_folder_id** (`str`): A string containing a Box folder ID. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_box.document_loaders import BoxLoader\n", | ||
"\n", | ||
"box_folder_id = \"260932470532\"\n", | ||
"\n", | ||
"loader = BoxLoader(\n", | ||
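"    box_developer_token=box_developer_token,  # same token as in the file ID example\n", | ||
"    box_folder_id=box_folder_id,\n", | ||
"    recursive=False,  # Optional. Set to True to return the entire folder tree; defaults to False\n", | ||
"    character_limit=10000,  # Optional. Defaults to no limit\n", | ||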
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Load" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\\nInvoice Number: A5555\\n\\nLine Items:\\n - Gravitational Wave Detector Kit: $800\\n - Exoplanet Terrarium: $120\\nTotal: $920')" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"docs = loader.load()\n", | ||
"docs[0]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(docs[0].metadata)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Lazy Load" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"page = []\n", | ||
"for doc in loader.lazy_load():\n", | ||
"    page.append(doc)\n", | ||
"    if len(page) >= 10:\n", | ||
"        # do some paged operation, e.g.\n", | ||
"        # index.upsert(page)\n", | ||
"\n", | ||
"        page = []" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## API reference\n", | ||
"\n", | ||
"For detailed documentation of all BoxLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_box.document_loaders.langchain_box_loader.BoxLoader.html\n", | ||
"\n", | ||
"\n", | ||
"## Help\n", | ||
"\n", | ||
"If you have questions, you can check out our [developer documentation](https://developer.box.com) or reach out to us in our [developer community](https://community.box.com)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
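
A note on the "Lazy Load" cell in this notebook: the loop acts only on full pages of 10 documents, so any documents left over when the loader is exhausted are never passed to the paged operation. The following is a minimal sketch of one way to handle that, not part of the commit. The helper name `batched` and the stand-in list `fake_docs` are hypothetical, introduced only for illustration; in the notebook, the iterable would be `loader.lazy_load()`.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including a final partial page."""
    if size < 1:
        raise ValueError("size must be at least 1")
    page: List[T] = []
    for item in items:
        page.append(item)
        if len(page) == size:
            yield page
            page = []
    if page:
        # Flush whatever is left once the source iterable is exhausted.
        yield page


# Hypothetical stand-in for loader.lazy_load(); any iterable of documents works.
fake_docs = [f"doc-{i}" for i in range(23)]

page_sizes = [len(page) for page in batched(fake_docs, 10)]
print(page_sizes)  # [10, 10, 3]
```

Because the helper is a generator, it keeps the memory profile of lazy loading: only one page of documents is held at a time.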