How to retrieve the whole document for a chunk? #69

istvan-deak · 2024-08-30T10:56:32Z

What is your question or problem? Please describe.

I would like to use the long context window of the LLM of my choice and pass whole files to the prompt.

Describe what you would like to happen

During retrieval, I'd like the system to:

1. First fetch the small chunks, as it currently does
2. Then look up the parent IDs for those chunks
3. Return the larger documents, or even the whole file, associated with those parent IDs

This approach would allow for more context to be provided to the LLM, potentially improving its performance on tasks that require broader context.

szymondudycz · 2024-09-02T11:32:53Z

If you want to use whole files in indexing, just skip the splitter and make sure the parser doesn't split documents (e.g. use 'mode=single' in ParseUnstructured).

Doing exactly what you want, that is, indexing over small chunks but retrieving whole documents, is not easily supported. What you can do is write your own splitter that inserts the full document text into the metadata of each chunk. Then, after the chunks are retrieved, use the full document text from the metadata rather than the returned chunk text.
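
A rough sketch of that workaround, written in plain Python rather than against the library's actual splitter interface. `Chunk`, `split_with_full_text`, `full_documents`, and the `path` / `full_text` metadata keys are all hypothetical names chosen for illustration, and the fixed-size split is only a stand-in for whatever chunking you really use:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk record: the indexed text plus free-form metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_with_full_text(doc_text: str, path: str, chunk_size: int = 200) -> list[Chunk]:
    """Split a document into fixed-size chunks.

    Every chunk carries the whole document text in its metadata, so the
    parent document can be recovered from any retrieved chunk.
    """
    return [
        Chunk(
            text=doc_text[start:start + chunk_size],
            metadata={"path": path, "full_text": doc_text},
        )
        for start in range(0, len(doc_text), chunk_size)
    ]


def full_documents(retrieved: list[Chunk]) -> list[str]:
    """Swap retrieved chunks for their parent documents.

    Several chunks may come from the same file, so documents are
    deduplicated by path while keeping the retrieval order.
    """
    seen: set[str] = set()
    docs: list[str] = []
    for chunk in retrieved:
        path = chunk.metadata["path"]
        if path not in seen:
            seen.add(path)
            docs.append(chunk.metadata["full_text"])
    return docs
```

The obvious cost is that the full text is stored once per chunk, so index size grows with the number of chunks per document.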

dxtrous · 2024-09-05T07:57:19Z

@szymondudycz I believe this question has come up a number of times already. Perhaps we should make it into a feature request? The resolution could be e.g. a code template that shows how to have a table of full_document_metadata and a table of chunks with document_id in their metadata, how to retrieve full_document_metadata for a given chunk, and maybe also how to load/reread the document on demand (with a UDF).
@istvan-deak if you have any thoughts here, please don't hesitate to share.
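
For discussion, the template could look roughly like the sketch below. It uses plain dictionaries in place of real tables, and every name in it (`parent_metadata`, `reread_document`, `expand_to_documents`, the `path` field) is an assumption made for illustration, not an existing API. In a real pipeline the reread step would presumably be wrapped in a UDF:

```python
from pathlib import Path


def parent_metadata(chunk: dict, documents: dict) -> dict:
    """Look up the full-document metadata row for a chunk via its document_id."""
    return documents[chunk["metadata"]["document_id"]]


def reread_document(chunk: dict, documents: dict) -> str:
    """Load the parent document from disk on demand.

    Assumes the metadata row stores a readable file path under "path".
    """
    return Path(parent_metadata(chunk, documents)["path"]).read_text(encoding="utf-8")


def expand_to_documents(retrieved: list[dict], documents: dict) -> dict[str, str]:
    """Map each distinct document_id among the retrieved chunks to its full text.

    Each parent document is read only once, even if several of its
    chunks were retrieved.
    """
    texts: dict[str, str] = {}
    for chunk in retrieved:
        doc_id = chunk["metadata"]["document_id"]
        if doc_id not in texts:
            texts[doc_id] = reread_document(chunk, documents)
    return texts
```

Here `documents` plays the role of the full_document_metadata table (e.g. `{"doc-1": {"path": "docs/report.txt"}}`), and each chunk is a dict such as `{"text": "...", "metadata": {"document_id": "doc-1"}}`. Compared with copying the full text into every chunk, this keeps chunk metadata small at the price of a file read at query time.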

istvan-deak added the "question" label (Further information is requested) on Aug 30, 2024
