[Question]: Chunks do not make sense #1407
Comments
Can you provide this law document?
RAGFlow's parsing methods restructure the text blocks so that semantic expressions are interrupted as little as possible.
You can find the document here: I see the same behavior on shorter, non-law documents. I can understand that text blocks are restructured, but why does the text become gibberish? E.g. bAoNiCleEr doesn't make sense; it looks like a concatenation of parts of different words.
Got it. We're going to fix it.
### What problem does this PR solve? #1407 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve? infiniflow#1407 infiniflow#1656 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
Describe your problem
Hi,
I have a 500+ page law document that I chunked twice: once with the general method and default settings, and once with the law method and default settings.
In both cases, the resulting chunks look like gibberish, even though the text in the PDF reads normally.
What is the reason for this and how can this be solved?
Br
Alex