pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 656 #470
Labels
bug
Something isn't working
Comments
guglie added a commit to guglie/docling that referenced this issue on Nov 29, 2024
Signed-off-by: guglie <[email protected]>
This was referenced Nov 29, 2024
@guglie could you please provide the input PDF file to reproduce the issue?
@nikos-livathinos I cannot share the original confidential document, but let me generate one for you. It happens when a text block starts with an opening quote that is never closed. Maybe @gaspardpetit can share another file, as he had the same error.
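For anyone following along, here is a minimal sketch of how an unclosed quote can trigger this kind of error, independent of docling. The TSV string and the helper names parse_default and parse_no_quoting are made up for illustration and are not taken from the docling code base; only pandas.read_csv() and its sep and quoting parameters are real API. With the default quoting, the parser treats the opening quote as the start of a quoted field and reaches end of input before it closes. Passing quoting=csv.QUOTE_NONE is one common way to make pandas treat quote characters as plain text; whether that is the right fix for docling is an assumption here.

```python
import csv
import io

import pandas as pd

# Hypothetical TSV, loosely shaped like OCR output: the last text
# field opens a double quote that is never closed.
tsv = 'level\ttext\n5\thello\n5\t"unterminated\n'


def parse_default(data: str) -> pd.DataFrame:
    # Default quoting: '"' starts a quoted field, so the parser keeps
    # reading until end of input and raises pandas.errors.ParserError.
    return pd.read_csv(io.StringIO(data), sep="\t")


def parse_no_quoting(data: str) -> pd.DataFrame:
    # QUOTE_NONE: quote characters are treated as ordinary text.
    return pd.read_csv(io.StringIO(data), sep="\t", quoting=csv.QUOTE_NONE)


if __name__ == "__main__":
    try:
        parse_default(tsv)
    except pd.errors.ParserError as exc:
        print("default quoting failed:", exc)

    print(parse_no_quoting(tsv))
```

With quoting disabled, the quote character stays in the text column as a literal character, so any downstream cleanup would have to handle it.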
dolfim-ibm pushed a commit that referenced this issue on Dec 3, 2024
Signed-off-by: guglie <[email protected]>
ab-shrek pushed a commit to ab-shrek/docling that referenced this issue on Dec 6, 2024
Signed-off-by: guglie <[email protected]>
lucas-morin pushed a commit to lucas-morin/docling that referenced this issue on Dec 10, 2024
Signed-off-by: guglie <[email protected]>
cau-git pushed a commit that referenced this issue on Dec 17, 2024
Signed-off-by: guglie <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Bug
Converting one particular PDF raises the following error; the same options work on other PDFs.
It seems related to the pandas.read_csv() call on the TSV output of Tesseract.
Steps to reproduce
Docling version
Python version
Python 3.12.7