No tables found on page #238

kento1109 · 2019-01-02T07:59:04Z

I want to extract the DXA Results Summary table from a PDF like this one.

However, I can't get it to work: Camelot warns that no tables were found on the page.

I tried both lattice and stream modes, but neither worked.
How can I extract the table from this PDF?
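
For anyone reproducing this, here is a minimal sketch of trying both flavors. `camelot.read_pdf` and its `flavor` argument are Camelot's own; the helper name `count_tables_per_flavor`, its `read_pdf` override parameter, and the file name `report.pdf` are made up for illustration.

```python
def count_tables_per_flavor(path, pages="1", read_pdf=None):
    """Return how many tables each Camelot flavor detects on the given pages."""
    if read_pdf is None:
        # Imported lazily so the helper can be exercised without Camelot installed.
        import camelot
        read_pdf = camelot.read_pdf

    counts = {}
    for flavor in ("lattice", "stream"):
        tables = read_pdf(path, pages=pages, flavor=flavor)
        counts[flavor] = len(tables)  # the returned table list supports len()
    return counts


# Hypothetical usage:
# print(count_tables_per_flavor("report.pdf"))
```

If both flavors report zero tables, the cause may be a missing text layer rather than a parameter that needs tuning.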

vinayak-mehta · 2019-01-02T10:05:53Z

@kento1109 As mentioned in the README: "Camelot only works with text-based PDFs and not scanned documents. If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based."

Aside from being an image, the document you've attached is rotated. You can fix the rotation and try using OCR to extract data from this document.
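
A rough sketch of that suggestion, assuming the page has already been exported to an image file and that Pillow and pytesseract (plus the Tesseract binary) are installed. The rotation angle is a guess and has to be adjusted for the actual document; `ocr_rotated_image`, its override parameters, and `page.png` are hypothetical names.

```python
def ocr_rotated_image(image_path, degrees_ccw=90, open_image=None, ocr=None):
    """Rotate a scanned page image upright, then run OCR on it."""
    if open_image is None:
        from PIL import Image
        open_image = Image.open
    if ocr is None:
        import pytesseract
        ocr = pytesseract.image_to_string

    image = open_image(image_path)
    # Pillow rotates counter-clockwise; expand=True grows the canvas so that
    # nothing is cropped when the page is turned by 90 or 270 degrees.
    upright = image.rotate(degrees_ccw, expand=True)
    return ocr(upright)


# Hypothetical usage:
# text = ocr_rotated_image("page.png", degrees_ccw=270)
```

The result is plain text, so the table rows and columns still have to be rebuilt from it afterwards.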

kento1109 · 2019-01-02T11:17:47Z

Thank you for the quick reply!
I noticed that this PDF is image-based when I parsed it with pdfminer.

First of all, I tried using OCR to convert the image into text data.
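
The pdfminer observation above can be turned into a quick pre-check before calling Camelot. This is only a sketch, assuming pdfminer.six is installed; `looks_text_based`, its `extract` override parameter, and the threshold `min_chars=20` are arbitrary choices for illustration.

```python
def looks_text_based(path, min_chars=20, extract=None):
    """Heuristic: a PDF with almost no extractable text is probably a scan."""
    if extract is None:
        # pdfminer.six high-level helper; imported lazily.
        from pdfminer.high_level import extract_text
        extract = extract_text

    text = extract(path)
    # Ignore whitespace and page breaks, which image-only PDFs can still emit.
    visible = "".join(text.split())
    return len(visible) >= min_chars


# Hypothetical usage:
# if not looks_text_based("report.pdf"):
#     print("Probably image-based; Camelot will not find tables here.")
```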

vinayak-mehta closed this as completed Jan 2, 2019

vinayak-mehta mentioned this issue Jan 2, 2019

Raise a warning if PDF is image based #239

Closed
