Docs: Add lcel to combine_docs chains #12310

Merged
merged 5 commits into from
Oct 26, 2023
251 changes: 251 additions & 0 deletions docs/docs/modules/chains/document/map_reduce.ipynb
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2accd7d9-20b6-47f9-9cec-923809cc36c7",
"metadata": {},
"source": [
"# Map reduce\n",
"\n",
"The map reduce documents chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine documents chain to get a single output (the Reduce step). It can optionally first compress, or collapse, the mapped documents to make sure that they fit in the combine documents chain (which will often pass them to an LLM). This compression step is performed recursively if necessary.\n",
"\n",
"![map_reduce_diagram](/img/map_reduce.jpg)"
]
},
{
"cell_type": "markdown",
"id": "343fc972-40be-44f8-8ed3-305322661a00",
"metadata": {},
"source": [
"## Recreating with LCEL\n",
"\n",
"With [LangChain Expression Language](/docs/expression_language) we can recreate the `MapReduceDocumentsChain` functionality, gaining all the built-in LCEL features (batch, async, etc.) and far more freedom to customize individual parts of the chain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7bc161bc-4054-457a-9d04-7245093acd16",
"metadata": {},
"outputs": [],
"source": [
"from functools import partial\n",
"\n",
"from langchain.chains.combine_documents.reduce import _collapse_docs, _split_list_of_docs\n",
"from langchain.chat_models import ChatAnthropic\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.schema import Document, StrOutputParser\n",
"from langchain.schema.prompt_template import format_document\n",
"from langchain.schema.runnable import RunnableParallel, RunnablePassthrough"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a93ba908-5b81-4e91-a598-ee6fa05eac01",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatAnthropic()\n",
"\n",
"# Prompt and method for converting Document -> str.\n",
"document_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
"partial_format_document = partial(format_document, prompt=document_prompt)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "75918baa-df6b-4570-91eb-1acd1c87e09b",
"metadata": {},
"outputs": [],
"source": [
"# The chain we'll apply to each individual document.\n",
"# Returns a summary of the document.\n",
"map_chain = (\n",
"    {\"context\": partial_format_document}\n",
"    | PromptTemplate.from_template(\"Summarize this content:\\n\\n{context}\")\n",
"    | llm\n",
"    | StrOutputParser()\n",
")\n",
"\n",
"# A wrapper chain to keep the original Document metadata\n",
"map_as_doc_chain = (\n",
"    RunnableParallel({\"doc\": RunnablePassthrough(), \"content\": map_chain})\n",
"    | (lambda x: Document(page_content=x[\"content\"], metadata=x[\"doc\"].metadata))\n",
").with_config(run_name=\"Summarize (return doc)\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "720cb117-8a3e-4595-9e2c-dbbd7a3777b5",
"metadata": {},
"outputs": [],
"source": [
"# The chain we'll repeatedly apply to collapse subsets of the documents\n",
"# into a consolidated document until the total token size of our\n",
"# documents is below some max size.\n",
"def format_docs(docs):\n",
"    return \"\\n\\n\".join(partial_format_document(doc) for doc in docs)\n",
"\n",
"collapse_chain = (\n",
"    {\"context\": format_docs}\n",
"    | PromptTemplate.from_template(\"Collapse this content:\\n\\n{context}\")\n",
"    | llm\n",
"    | StrOutputParser()\n",
")\n",
"\n",
"def get_num_tokens(docs):\n",
"    return llm.get_num_tokens(format_docs(docs))\n",
"\n",
"def collapse(\n",
"    docs,\n",
"    config,\n",
"    token_max=4000,\n",
"):\n",
"    collapse_ct = 1\n",
"    while get_num_tokens(docs) > token_max:\n",
"        config[\"run_name\"] = f\"Collapse {collapse_ct}\"\n",
"        invoke = partial(collapse_chain.invoke, config=config)\n",
"        split_docs = _split_list_of_docs(docs, get_num_tokens, token_max)\n",
"        docs = [_collapse_docs(_docs, invoke) for _docs in split_docs]\n",
"        collapse_ct += 1\n",
"    return docs"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fe5c5597-3ea3-443e-ad9f-e2f055cf092f",
"metadata": {},
"outputs": [],
"source": [
"# The chain we'll use to combine our individual document summaries\n",
"# (or summaries over subsets of documents if we had to collapse the map results)\n",
"# into a final summary.\n",
"\n",
"reduce_chain = (\n",
"    {\"context\": format_docs}\n",
"    | PromptTemplate.from_template(\"Combine these summaries:\\n\\n{context}\")\n",
"    | llm\n",
"    | StrOutputParser()\n",
").with_config(run_name=\"Reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "fd1148ce-f693-42b5-91e4-304983e26be6",
"metadata": {},
"outputs": [],
"source": [
"# The final full chain\n",
"map_reduce = (map_as_doc_chain.map() | collapse | reduce_chain).with_config(run_name=\"Map reduce\")"
]
},
{
"cell_type": "markdown",
"id": "5a10615c-f3ab-4603-bf0c-e6aea73c5450",
"metadata": {},
"source": [
"## Example run"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "28a8f74c-2441-431a-b352-541d0ad1e75b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"\n",
"text = \"\"\"Nuclear power in space is the use of nuclear power in outer space, typically either small fission systems or radioactive decay for electricity or heat. Another use is for scientific observation, as in a Mössbauer spectrometer. The most common type is a radioisotope thermoelectric generator, which has been used on many space probes and on crewed lunar missions. Small fission reactors for Earth observation satellites, such as the TOPAZ nuclear reactor, have also been flown.[1] A radioisotope heater unit is powered by radioactive decay and can keep components from becoming too cold to function, potentially over a span of decades.[2]\n",
"\n",
"The United States tested the SNAP-10A nuclear reactor in space for 43 days in 1965,[3] with the next test of a nuclear reactor power system intended for space use occurring on 13 September 2012 with the Demonstration Using Flattop Fission (DUFF) test of the Kilopower reactor.[4]\n",
"\n",
"After a ground-based test of the experimental 1965 Romashka reactor, which used uranium and direct thermoelectric conversion to electricity,[5] the USSR sent about 40 nuclear-electric satellites into space, mostly powered by the BES-5 reactor. The more powerful TOPAZ-II reactor produced 10 kilowatts of electricity.[3]\n",
"\n",
"Examples of concepts that use nuclear power for space propulsion systems include the nuclear electric rocket (nuclear powered ion thruster(s)), the radioisotope rocket, and radioisotope electric propulsion (REP).[6] One of the more explored concepts is the nuclear thermal rocket, which was ground tested in the NERVA program. Nuclear pulse propulsion was the subject of Project Orion.[7]\n",
"\n",
"Regulation and hazard prevention\n",
"After the ban of nuclear weapons in space by the Outer Space Treaty in 1967, nuclear power has been discussed at least since 1972 as a sensitive issue by states.[8] Particularly its potential hazards to Earth's environment and thus also humans has prompted states to adopt in the U.N. General Assembly the Principles Relevant to the Use of Nuclear Power Sources in Outer Space (1992), particularly introducing safety principles for launches and to manage their traffic.[8]\n",
"\n",
"Benefits\n",
"\n",
"Both the Viking 1 and Viking 2 landers used RTGs for power on the surface of Mars. (Viking launch vehicle pictured)\n",
"While solar power is much more commonly used, nuclear power can offer advantages in some areas. Solar cells, although efficient, can only supply energy to spacecraft in orbits where the solar flux is sufficiently high, such as low Earth orbit and interplanetary destinations close enough to the Sun. Unlike solar cells, nuclear power systems function independently of sunlight, which is necessary for deep space exploration. Nuclear-based systems can have less mass than solar cells of equivalent power, allowing more compact spacecraft that are easier to orient and direct in space. In the case of crewed spaceflight, nuclear power concepts that can power both life support and propulsion systems may reduce both cost and flight time.[9]\n",
"\n",
"Selected applications and/or technologies for space include:\n",
"\n",
"Radioisotope thermoelectric generator\n",
"Radioisotope heater unit\n",
"Radioisotope piezoelectric generator\n",
"Radioisotope rocket\n",
"Nuclear thermal rocket\n",
"Nuclear pulse propulsion\n",
"Nuclear electric rocket\n",
"\"\"\"\n",
"\n",
"docs = [\n",
"    Document(\n",
"        page_content=split,\n",
"        metadata={\"source\": \"https://en.wikipedia.org/wiki/Nuclear_power_in_space\"}\n",
"    )\n",
"    for split in text.split(\"\\n\\n\")\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "846fe4ce-7016-4bc7-a8e0-7e914675d568",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Here is a summary that combines the key points about nuclear power in space:\n",
"\n",
"Nuclear power is used in space for electricity, heat, and scientific observation. The most common type is a radioisotope thermoelectric generator, which has powered space probes and lunar missions using the heat from radioactive decay. Small nuclear fission reactors have also been used to generate electricity for Earth observation satellites like the TOPAZ reactor. In addition, radioisotope heater units use radioactive decay to provide reliable heat that can keep components functioning properly over decades in the harsh space environment. Overall, nuclear power has proven useful for providing long-lasting power for space applications where solar power is not practical. Technologies like radioisotope decay heat and small fission reactors allow probes, satellites, and missions to operate far from the Sun and for extended periods by generating electricity and heat without reliance on solar energy.\n"
]
}
],
"source": [
"print(map_reduce.invoke(docs[0:1], config={\"max_concurrency\": 5}))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8b77e13-6db4-4096-a1a4-d0abe2979b6b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv",
"language": "python",
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
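
The control flow of the notebook's map, collapse, and reduce stages can be sketched without LangChain or an LLM. The block below is a minimal, illustrative sketch under stated assumptions, not the library's implementation: documents are plain strings, and `split_into_batches`, `map_reduce_sketch`, the word-count length function, the stub map/collapse/reduce functions, and the `max_rounds` guard are all hypothetical stand-ins introduced here.

```python
from typing import Callable, List, Tuple

Docs = List[str]


def split_into_batches(
    docs: Docs, length_fn: Callable[[Docs], int], token_max: int
) -> List[Docs]:
    """Greedily group docs so that each batch measures at most token_max."""
    batches: List[Docs] = []
    current: Docs = []
    for doc in docs:
        if length_fn([doc]) > token_max:
            raise ValueError("A single document exceeds token_max.")
        # Close the current batch if adding this doc would overflow it.
        if current and length_fn(current + [doc]) > token_max:
            batches.append(current)
            current = []
        current.append(doc)
    if current:
        batches.append(current)
    return batches


def map_reduce_sketch(
    docs: Docs,
    map_fn: Callable[[str], str],
    collapse_fn: Callable[[Docs], str],
    reduce_fn: Callable[[Docs], str],
    length_fn: Callable[[Docs], int],
    token_max: int,
    max_rounds: int = 10,  # hypothetical guard against non-shrinking collapses
) -> Tuple[str, int]:
    """Map each doc, collapse in rounds until it fits, then reduce once."""
    mapped = [map_fn(doc) for doc in docs]
    rounds = 0
    while length_fn(mapped) > token_max:
        if rounds >= max_rounds:
            raise RuntimeError("Collapsing did not converge.")
        batches = split_into_batches(mapped, length_fn, token_max)
        mapped = [collapse_fn(batch) for batch in batches]
        rounds += 1
    return reduce_fn(mapped), rounds


# Stubs standing in for LLM calls and token counting (assumptions).
def count_words(docs: Docs) -> int:
    return sum(len(doc.split()) for doc in docs)


def first_four_words(doc: str) -> str:
    return " ".join(doc.split()[:4])


def first_word_of_each(batch: Docs) -> str:
    return " ".join(doc.split()[0] for doc in batch)


def join_all(docs: Docs) -> str:
    return " | ".join(docs)


docs = [f"w{i}a w{i}b w{i}c w{i}d w{i}e" for i in range(6)]
summary, rounds = map_reduce_sketch(
    docs, first_four_words, first_word_of_each, join_all, count_words, token_max=10
)
print(summary)  # w0a w1a | w2a w3a | w4a w5a
print(rounds)   # 1
```

In the notebook the same roles are played by `map_chain`, `collapse_chain`, and `reduce_chain`, with `llm.get_num_tokens` measuring length. The `max_rounds` guard is an addition of this sketch: the notebook's `collapse` loop relies on each collapse call actually shrinking its input.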
5 changes: 0 additions & 5 deletions docs/docs/modules/chains/document/map_reduce.mdx

This file was deleted.
