fix: ParserError EOF inside string (#470) #472

Merged: 1 commit, Dec 3, 2024

@guglie (Contributor) commented Nov 29, 2024

Do not interpret a quote at the start of text read by Tesseract as TSV cell quoting. Otherwise, parsing fails with a ParserError ("EOF inside string") when the Tesseract TSV output contains rows like these:

5	1	45	1	24	1	1557	1119	104	43	79.578239	"Example
5	1	45	1	24	2	1675	1119	76	43	93.807220	rows”
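
A minimal sketch of the idea, assuming the Tesseract TSV is loaded with `pandas.read_csv` (the linked issue reports a `pandas.errors.ParserError`). The sample data, the header row, and the `parse_tsv` helper are illustrative assumptions, not taken from this PR's diff. Passing `quoting=csv.QUOTE_NONE` tells the parser to treat quote characters as ordinary text:

```python
import csv
import io

import pandas as pd

# Illustrative sample modeled on the rows above, with Tesseract's usual
# TSV header. The first text cell starts with a straight quote that is
# never closed (the second row ends with a curly quote, U+201D).
TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    '5\t1\t45\t1\t24\t1\t1557\t1119\t104\t43\t79.578239\t"Example\n'
    "5\t1\t45\t1\t24\t2\t1675\t1119\t76\t43\t93.807220\trows\u201d\n"
)


def parse_tsv(data: str) -> pd.DataFrame:
    """Hypothetical helper: parse TSV text without any quote handling."""
    return pd.read_csv(io.StringIO(data), sep="\t", quoting=csv.QUOTE_NONE)


# With default quoting, the leading '"' opens a quoted field that never
# closes, so the parser is expected to fail before returning a frame.
try:
    pd.read_csv(io.StringIO(TSV), sep="\t")
except pd.errors.ParserError as exc:
    print("default quoting failed:", exc)

# With QUOTE_NONE, the quote is kept as part of the recognized text.
df = parse_tsv(TSV)
print(df["text"].tolist())
```

The trade-off is that genuinely quoted cells would no longer be unquoted, which is acceptable here only on the assumption that the TSV producer never quotes its cells.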

Issue resolved by this Pull Request:
Resolves #470

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


mergify bot commented Nov 29, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:

@dolfim-ibm (Contributor) left a comment

LGTM

@PeterStaar-IBM (Contributor) left a comment

lgtm!

@PeterStaar-IBM (Contributor) commented

@nikos-livathinos Can you quickly review? I would like your approval before we merge this.

@nikos-livathinos (Collaborator) left a comment

Looks good to me!

@dolfim-ibm dolfim-ibm merged commit c90c41c into DS4SD:main Dec 3, 2024
7 checks passed
ab-shrek pushed a commit to ab-shrek/docling that referenced this pull request Dec 6, 2024
lucas-morin pushed a commit to lucas-morin/docling that referenced this pull request Dec 10, 2024
cau-git pushed a commit that referenced this pull request Dec 17, 2024
Signed-off-by: guglie <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Development

Successfully merging this pull request may close these issues.

pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 656
5 participants