Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diffbot Graph Transformer / Neo4j Graph document ingestion #9979

Merged
merged 23 commits into from
Sep 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
307 changes: 307 additions & 0 deletions docs/extras/use_cases/more/graph/diffbot_graphtransformer.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,307 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7f0b0c06-ee70-468c-8bf5-b023f9e5e0a2",
"metadata": {},
"source": [
"# Diffbot Graph Transformer\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/more/graph/diffbot_transformer.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Text data often contain rich relationships and insights that can be useful for various analytics, recommendation engines, or knowledge management applications.\n",
"\n",
"Diffbot's NLP API allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
"\n",
"By coupling Diffbot's NLP API with Neo4j, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
"\n",
"This combination allows for use cases such as:\n",
"\n",
"* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
"* Generating recommendations based on semantic relationships in the data.\n",
"* Creating advanced search features that understand the relationships between entities.\n",
"* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
"\n",
"## Overview\n",
"\n",
"LangChain provides tools to interact with Graph Databases:\n",
"\n",
"1. `Construct knowledge graphs from text` using graph transformer and store integrations \n",
"2. `Query a graph database` using chains for query creation and execution\n",
"3. `Interact with a graph database` using agents for robust and flexible querying \n",
"\n",
"## Quickstart\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "975648da-b24f-4164-a671-6772179e12df",
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain langchain-experimental openai neo4j wikipedia"
]
},
{
"cell_type": "markdown",
"id": "77718977-629e-46c2-b091-f9191b9ec569",
"metadata": {},
"source": [
"## Diffbot NLP Service\n",
"\n",
"Diffbot's NLP service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
"This extracted information can be used to construct a knowledge graph.\n",
"To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2cbf97d0-3682-439b-8750-b695ff726789",
"metadata": {},
"outputs": [],
"source": [
"from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
"\n",
"diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
]
},
{
"cell_type": "markdown",
"id": "5e3b894a-e3ee-46c7-8116-f8377f8f0159",
"metadata": {},
"source": [
"This code fetches Wikipedia articles about \"Baldur's Gate 3\" and then uses `DiffbotGraphTransformer` to extract entities and relationships.\n",
"The `DiffbotGraphTransformer` outputs a structured data `GraphDocument`, which can be used to populate a graph database.\n",
"Note that text chunking is avoided due to Diffbot's [character limit per API request](https://docs.diffbot.com/reference/introduction-to-natural-language-api)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "53f8df86-47a1-44a1-9a0f-6725b90703bc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WikipediaLoader\n",
"\n",
"query = \"Warren Buffett\"\n",
"raw_documents = WikipediaLoader(query=query).load()\n",
"graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)"
]
},
{
"cell_type": "markdown",
"id": "31bb851a-aab4-4b97-a6b7-fce397d32b47",
"metadata": {},
"source": [
"## Loading the data into a knowledge graph\n",
"\n",
"You will need to have a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or running a docker container. You can run a local docker container by running the executing the following script:\n",
"```\n",
"docker run \\\n",
" --name neo4j \\\n",
" -p 7474:7474 -p 7687:7687 \\\n",
" -d \\\n",
" -e NEO4J_AUTH=neo4j/pleaseletmein \\\n",
" -e NEO4J_PLUGINS=\\[\\\"apoc\\\"\\] \\\n",
" neo4j:latest\n",
"``` \n",
"If you are using the docker container, you need to wait a couple of second for the database to start."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0b2b6641-5a5d-467c-b148-e6aad5e4baa7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.graphs import Neo4jGraph\n",
"\n",
"url=\"bolt://localhost:7687\"\n",
"username=\"neo4j\"\n",
"password=\"pleaseletmein\"\n",
"\n",
"graph = Neo4jGraph(\n",
" url=url,\n",
" username=username, \n",
" password=password\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0b15e840-fe6f-45db-9193-1b4e2df5c12c",
"metadata": {},
"source": [
"The `GraphDocuments` can be loaded into a knowledge graph using the `add_graph_documents` method."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1a67c4a8-955c-42a2-9c5d-de3ac0e640ec",
"metadata": {},
"outputs": [],
"source": [
"graph.add_graph_documents(graph_documents)"
]
},
{
"cell_type": "markdown",
"id": "ed411e05-2b03-460d-997e-938482774f40",
"metadata": {},
"source": [
"## Refresh graph schema information\n",
"If the schema of database changes, you can refresh the schema information needed to generate Cypher statements"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "904c9ee3-787c-403f-857d-459ce5ad5a1b",
"metadata": {},
"outputs": [],
"source": [
"graph.refresh_schema()"
]
},
{
"cell_type": "markdown",
"id": "f19d1387-5899-4258-8c94-8ef5fa7db464",
"metadata": {},
"source": [
"## Querying the graph\n",
"We can now use the graph cypher QA chain to ask question of the graph. It is advisable to use **gpt-4** to construct Cypher queries to get the best experience."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9393b732-67c8-45c1-9ec2-089f49c62448",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import GraphCypherQAChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"chain = GraphCypherQAChain.from_llm(\n",
" cypher_llm=ChatOpenAI(temperature=0, model_name=\"gpt-4\"),\n",
" qa_llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\"),\n",
" graph=graph, verbose=True,\n",
" \n",
")\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1a9b3652-b436-404d-aa25-5fb576f23dc0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (p:Person {name: \"Warren Buffett\"})-[:EDUCATED_AT]->(o:Organization)\n",
"RETURN o.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'o.name': 'New York Institute of Finance'}, {'o.name': 'Alice Deal Junior High School'}, {'o.name': 'Woodrow Wilson High School'}, {'o.name': 'University of Nebraska'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Warren Buffett attended the University of Nebraska.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Which university did Warren Buffett attend?\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "adc0ba0f-a62c-4875-89ce-da717f3ab148",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Who is or was working at Berkshire Hathaway?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d636954b-d967-4e96-9489-92e11c74af35",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

__all__ = [
"DiffbotGraphTransformer",
]
Loading