
[Question]: How to trace chunks to name of document? #3916

Closed
devMls opened this issue Dec 7, 2024 · 3 comments
Labels
question Further information is requested

Comments

@devMls
Contributor

devMls commented Dec 7, 2024

Describe your problem

I know that RAGFlow does not support metadata yet.
But I have a recurring problem and need some advice on how to deal with it.

I have several documents containing the same information, but with corrections:

Law 1 of 2013, with the maximum allowed values for a component
Revision of 2019, with other values
Revision of 2023, with other values

I want the latest values, but the LLM doesn't know which chunk is the "most recent" one, and it tells me this:

The information about the tolerance margin applicable to a fertilizer with amino acids is inconsistent in the provided knowledge base. There are several entries with different values:

Entry 1: States 15% of the declared value of the content of any main nutrient element, with an absolute maximum of: total N 0.9; organic N 0.4; P2O5 0.9; K2O 0.9. It also mentions a tolerance for total free amino acids: 10% of the declared value (maximum 1% in absolute value if the declared value is 10%, or 10% of the declared value if it is less than 10%). For each free amino acid declared in the aminogram, the tolerance is 25% of the declared value.

Entry 2: Mentions 20% of the declared value of the content of any main nutrient element, with an absolute maximum of: total N 1.1; P2O5 1.1; K2O 1.1. For total free amino acids, it gives 10% of the declared value, with a maximum of 3% in absolute value if the declared value is 10%, or 10% of the declared value if it is less than 10%. For each free amino acid declared in the aminogram, the tolerance is 25% of the declared value ##5 ##1.

Entry 3: States 10% of the declared value of the content of any main nutrient element, with an absolute maximum of: total N 0.9; organic N and other forms 0.5; P2O5 0.9; K2O 0.9. If it contains more than one main nutrient, the tolerance is 15% of the declared value, with maximums of 1.1 for total N, 0.5 for organic N, and 1.1 for P2O5 and K2O. For total free amino acids, it gives 10% of the declared value, with a maximum of 3% in absolute value if the declared value is 10%, or 10% of the declared value if it is less than 10%. For each free amino acid declared in the aminogram, the tolerance is 25% of the declared value.

It is crucial to determine which source is correct, or whether an interpretation that takes the different regulations or versions into account should be applied. Without more information, it is not possible to determine with certainty which tolerance margin applies.

This is a good answer: each entry comes from one law. But the LLM doesn't know which law each entry belongs to, and when I ask, it says it doesn't know. How can I associate each chunk with its source document? Maybe by setting keywords manually?

@devMls devMls added the question Further information is requested label Dec 7, 2024
@devMls
Contributor Author

devMls commented Dec 7, 2024

Maybe add an option in the ingestion settings that appends a tail (or payload) to each chunk, identifying the document it came from.
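A minimal sketch of the proposed ingest option, assuming a chunk is just its text plus the name of the document it came from. The `Chunk` class and `add_document_tail` function are hypothetical illustrations, not part of RAGFlow's API:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical chunk model: the chunk text plus its source document name.
    text: str
    doc_name: str


def add_document_tail(chunks, template="[Source: {doc_name}]"):
    """Return new chunks whose text ends with a tail naming the source document."""
    tagged = []
    for chunk in chunks:
        tail = template.format(doc_name=chunk.doc_name)
        # The tail becomes part of the text, so it travels with the chunk
        # into the index and into the LLM prompt.
        tagged.append(Chunk(text=f"{chunk.text}\n{tail}", doc_name=chunk.doc_name))
    return tagged


chunks = [
    Chunk("Tolerance: 15% of the declared value", "Law 1 of 2013"),
    Chunk("Tolerance: 10% of the declared value", "Revision of 2023"),
]
for c in add_document_tail(chunks):
    print(c.text)
```

Because the tail would be part of the chunk text, the LLM would see the document name next to each entry and could say which law a value comes from. The trade-off is that the tail adds a few tokens to every chunk.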

@Snify89

Snify89 commented Dec 7, 2024

A good, balanced approach is to improve (or add) timestamp support in general. The parser could extract all the dates it finds, store them, and sort by the newest date to improve relevance (and cluster by due date, creation date, etc.). I think this could improve accuracy further and would be a way to handle revisions.
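A rough sketch of the date-based ranking idea, assuming dates can be approximated by four-digit years (as in the "Law 1 of 2013" / "Revision of 2023" example above). `latest_year` and `sort_by_recency` are hypothetical helpers; a real parser would need proper date extraction:

```python
import re

# Assumption: a "date" is approximated by any four-digit year 1900-2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def latest_year(text):
    """Return the most recent year mentioned in text, or None if there is none."""
    years = [int(y) for y in YEAR_RE.findall(text)]
    return max(years) if years else None


def _recency_key(text):
    year = latest_year(text)
    # Undated chunks sort last; dated chunks sort newest-first.
    return (year is None, -(year or 0))


def sort_by_recency(chunks):
    """Sort chunk texts newest-first. The sort is stable, so ties keep their order."""
    return sorted(chunks, key=_recency_key)


chunks = [
    "Law 1 of 2013: N total 0.9",
    "Revision of 2023: N total 0.9, organic N 0.5",
    "Undated note",
    "Revision of 2019: N total 1.1",
]
print(sort_by_recency(chunks))
```

The year regex is deliberately crude: it would also match numbers such as "2000 mg", so in practice this would likely be combined with dates taken from document metadata rather than from the chunk text alone.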

@devMls
Contributor Author

devMls commented Dec 8, 2024

I opened a pull request to handle this :) #3690

@devMls devMls closed this as completed Dec 14, 2024