PDFToTextConverter is not forwarding layout parameter to private method #3131

Closed
1 task done
danielbichuetti opened this issue Sep 1, 2022 · 0 comments · Fixed by #3137
Labels
Contributions wanted! Looking for external contributions


danielbichuetti commented Sep 1, 2022

Describe the bug
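PDFToTextConverter supports text extraction in either stream order or layout order. Some PDFs have a data stream whose order is completely unrelated to the physical layout (because of the way they were built), so they need layout-order extraction. However, the public method does not forward the layout parameter to the private method.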

Error message
The extracted PDF text is completely out of order, so the text (the primary material of our work) is split into out-of-context fragments.

Expected behavior
Allow the user to select layout-based extraction through the public method.
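
A minimal sketch of the requested behavior, assuming the converter shells out to the `pdftotext` command-line tool, whose `-layout` flag keeps the physical layout. The class `SketchPDFConverter` and its methods `_build_command` and `_read_pdf` are hypothetical names used only for illustration. They are not Haystack's actual code. The point is that the public `convert` passes `layout` through instead of fixing its value.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


class SketchPDFConverter:
    """Hypothetical stand-in for illustration, not Haystack's PDFToTextConverter."""

    def convert(self, file_path: Path, layout: bool = False,
                encoding: Optional[str] = "UTF-8") -> str:
        # Forward the caller's `layout` choice unchanged to the private helper
        # rather than hard-coding it.
        return self._read_pdf(file_path, layout=layout, encoding=encoding)

    def _build_command(self, file_path: Path, layout: bool,
                       encoding: Optional[str]) -> List[str]:
        # Assemble the pdftotext invocation; "-" sends the text to stdout.
        command = ["pdftotext"]
        if encoding:
            command += ["-enc", encoding]
        if layout:
            # Layout order: follow the physical page layout, not the stream.
            command.append("-layout")
        command += [str(file_path), "-"]
        return command

    def _read_pdf(self, file_path: Path, layout: bool,
                  encoding: Optional[str]) -> str:
        # Requires the pdftotext binary to be installed and on PATH.
        result = subprocess.run(
            self._build_command(file_path, layout, encoding),
            capture_output=True,
            check=True,
        )
        return result.stdout.decode(encoding or "UTF-8", errors="ignore")
```

With this shape, a call such as `SketchPDFConverter().convert(Path("report.pdf"), layout=True)` would request layout order, while the default keeps stream order.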

Additional context

To Reproduce

FAQ Check

System:

  • OS: Ubuntu
  • GPU/CPU: CPU i7
  • Haystack version (commit or version number): 1.8.0