
PromptTemplate doesn't work with multi layer metadata (e.g. BedrockKnowledgeBase/AmazonKnowledgeBasesRetriever) #28354

Open
bdavj opened this issue Nov 26, 2024 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments


bdavj commented Nov 26, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_core.documents import Document
from langchain_core.prompts import BasePromptTemplate

def _get_document_info(doc: Document, prompt: BasePromptTemplate[str]) -> dict:
    # Merge the nested Bedrock metadata into the top level so the prompt can see it.
    if "source_metadata" in doc.metadata:
        base_info = {"page_content": doc.page_content, **doc.metadata, **doc.metadata["source_metadata"]}
    else:
        base_info = {"page_content": doc.page_content, **doc.metadata}
    # ... remainder of the original helper (missing-metadata check and return) unchanged

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to use LangChain as a conversational retrieval chain to pull documents with metadata from a Bedrock Knowledge Base, and to include some of that Bedrock metadata in the document prompt via the prompt template.

The document metadata for the Document object looks like:
metadata: {
    location: "s3://xyz",
    score: 0.654,
    source_metadata: { field_i_want: 123, another_field: 456 }
}

The prompt template is failing in _get_document_info() on the missing_metadata check, because base_info only draws from the top level of doc.metadata and never sees the keys nested under source_metadata.
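
A minimal reproduction sketch (assuming format_document is importable from langchain_core.prompts; the Document contents just mirror the metadata shown above):

from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate, format_document

doc = Document(
    page_content="some retrieved text",
    metadata={
        "location": "s3://xyz",
        "score": 0.654,
        "source_metadata": {"field_i_want": 123, "another_field": 456},
    },
)

# The document prompt references a nested metadata field...
document_prompt = PromptTemplate.from_template(
    "{page_content}\n(field_i_want: {field_i_want})"
)

# ...so the missing-metadata check raises, because only the top-level keys
# (location, score, source_metadata) are considered.
format_document(doc, document_prompt)  # raises ValueError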

The following is a (hacky) resolution:

def _get_document_info(doc: Document, prompt: BasePromptTemplate[str]) -> dict:
    if 'source_metadata' in doc.metadata:
        base_info = {"page_content": doc.page_content, **doc.metadata, **doc.metadata['source_metadata']}
    else:
        base_info = {"page_content": doc.page_content, **doc.metadata}

However, the tactical resolution would be to allow the dict path into the nested metadata to be specified, rather than hard-coding source_metadata.
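
A minimal sketch of what that could look like (flatten_metadata and the metadata_paths parameter are hypothetical, not existing LangChain API):

from langchain_core.documents import Document

def flatten_metadata(doc: Document, metadata_paths: list[str]) -> dict:
    # Merge the selected nested metadata dicts into the top-level mapping
    # before the prompt's missing-metadata check runs.
    flat = {"page_content": doc.page_content, **doc.metadata}
    for path in metadata_paths:
        nested = doc.metadata.get(path)
        if isinstance(nested, dict):
            flat.update(nested)
    return flat

# e.g. flatten_metadata(doc, metadata_paths=["source_metadata"])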

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.6.0: Thu Sep 12 23:36:12 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6020
Python Version: 3.12.5 (main, Aug 6 2024, 19:08:49) [Clang 15.0.0 (clang-1500.3.9.4)]

Package Information

langchain_core: 0.3.15
langchain: 0.3.7
langchain_community: 0.3.5
langsmith: 0.1.142
langchain_aws: 0.2.7
langchain_text_splitters: 0.3.2

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.10
async-timeout: Installed. No version info available.
boto3: 1.35.57
dataclasses-json: 0.5.9
httpx: 0.27.2
httpx-sse: 0.4.0
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.11
packaging: 23.2
pydantic: 2.9.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.35
tenacity: 8.3.0
typing-extensions: 4.12.2

dosubot added the 🤖:bug label on Nov 26, 2024