No tables found on page #238

Closed
kento1109 opened this issue Jan 2, 2019 · 2 comments

Comments

@kento1109

I want to extract the DXA Results Summary table from a PDF like this one:

Sample_Dexa_Report.pdf

However, I cannot get it to work: Camelot warns that no tables were found on the page.

I tried both lattice and stream modes, but neither worked.
How can I extract the table from this PDF?

@vinayak-mehta
Contributor

@kento1109 As mentioned in the README: "Camelot only works with text-based PDFs and not scanned documents. If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based."
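The click-and-drag test can also be approximated in code. The following is a minimal sketch, not part of Camelot: it assumes pdfminer.six is installed, and both the helper names and the `min_chars` threshold are hypothetical choices made for illustration. If almost no text comes out of the file, the pages are most likely images.

```python
def is_text_based(page_texts, min_chars=20):
    """Heuristic: True if any extracted text has at least `min_chars`
    non-whitespace characters. The threshold is an arbitrary assumption."""
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def pdf_is_text_based(path, min_chars=20):
    """Extract the text layer with pdfminer.six and apply the heuristic above."""
    # Imported lazily so is_text_based() works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return is_text_based([extract_text(path)], min_chars=min_chars)


# Example (requires pdfminer.six and the attached file):
# pdf_is_text_based("Sample_Dexa_Report.pdf")
```

A scanned report like the one attached would be expected to yield little or no text here, which matches the "no tables found" warning.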

Aside from being an image, the document you've attached is rotated. You can fix the rotation and try using OCR to extract data from this document.
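One possible way to do that is sketched below. It assumes the third-party packages pdf2image (which needs Poppler) and pytesseract (which needs the Tesseract binary) are installed. The function names and the `rotate_ccw` parameter are hypothetical, and the rotation angle has to be chosen by looking at the page.

```python
def ocr_pages(images, rotate_ccw=0, ocr=None):
    """Rotate each page image counter-clockwise by `rotate_ccw` degrees
    and return the OCR text of every page as a list of strings."""
    if ocr is None:
        # Imported lazily; only needed when no custom OCR callable is given.
        import pytesseract

        ocr = pytesseract.image_to_string
    texts = []
    for image in images:
        if rotate_ccw % 360:
            # PIL's Image.rotate() turns counter-clockwise; expand=True
            # enlarges the canvas so a rotated page is not cropped.
            image = image.rotate(rotate_ccw, expand=True)
        texts.append(ocr(image))
    return texts


def ocr_pdf(path, rotate_ccw=0, dpi=300):
    """Render each PDF page to an image with pdf2image, then OCR it."""
    from pdf2image import convert_from_path

    return ocr_pages(convert_from_path(path, dpi=dpi), rotate_ccw=rotate_ccw)


# Example (angle chosen by eye; adjust until the text is upright):
# texts = ocr_pdf("Sample_Dexa_Report.pdf", rotate_ccw=90)
```

The OCR output is plain text, so the table rows still have to be split into columns afterwards. Camelot cannot do that step for image-based input.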

@kento1109
Author

Thank you for the quick reply!
I noticed that this PDF is image-based when I parsed it with pdfminer.

First of all, I tried OCR to convert the image to text data.
