Poor table auto-detection? #304

homofortis · 2019-04-05T08:41:11Z

First of all, Camelot is a great project with enormous potential!
In my experience, Camelot extracts most tables from a document properly, provided the right parameters are supplied. However, that is not always possible in the real world. Sometimes you have to deal with thousands of documents with different layouts, and processing them one by one is not an option. Auto-detection of tables does not seem to work very well at the moment: I tried a bulk table extraction from PDF documents with random layouts, and the results were very poor.
There is probably a need for a new, robust bulk extraction method that works for both framed and streamed tables and produces acceptable results. In other words, sometimes it may be worth trading accuracy for generalisation.

vinayak-mehta · 2019-04-12T12:26:43Z

@homofortis Did you randomly apply the flavors to that large set of PDFs? How did you go about measuring the accuracy of the results? Automatically choosing the flavor based on the tables in a PDF is a planned enhancement (#211).

The second issue is improving table auto-detection in the flavors themselves. It would help development if you could post that set of documents along with your findings.
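Until flavor selection is automated, one possible workaround for bulk runs is to try both flavors on each file and keep whichever reports the better parsing score. The sketch below is only an illustration of that idea, not how Camelot or #211 does it: `score_reports`, `best_flavor`, and `extract_with_fallback` are hypothetical helper names, and ranking by the mean `accuracy` value from each table's `parsing_report` is an assumed heuristic that may well pick the wrong flavor on some layouts.

```python
from typing import Dict, List, Optional

FLAVORS = ("lattice", "stream")


def score_reports(reports: List[Dict[str, float]]) -> float:
    """Mean of the 'accuracy' field over all tables found (assumed heuristic)."""
    return sum(r["accuracy"] for r in reports) / len(reports)


def best_flavor(reports_by_flavor: Dict[str, List[Dict[str, float]]]) -> Optional[str]:
    """Return the flavor with the highest score, or None if no tables were found."""
    scored = {
        flavor: score_reports(reports)
        for flavor, reports in reports_by_flavor.items()
        if reports  # skip flavors that detected nothing
    }
    if not scored:
        return None
    return max(scored, key=scored.get)


def extract_with_fallback(path: str, pages: str = "1"):
    """Run both flavors on one PDF and return (chosen_flavor, tables)."""
    import camelot  # third-party; imported here so the helpers above stay standalone

    results = {}
    for flavor in FLAVORS:
        try:
            results[flavor] = camelot.read_pdf(path, pages=pages, flavor=flavor)
        except Exception:
            # Treat a failed parse as "no tables" so a bulk run keeps going.
            results[flavor] = []
    reports = {
        flavor: [table.parsing_report for table in tables]
        for flavor, tables in results.items()
    }
    chosen = best_flavor(reports)
    return chosen, (results[chosen] if chosen else [])
```

Calling `extract_with_fallback("some.pdf")` on each file in a batch would give a rough per-document flavor choice. The reported accuracy is the parser's own estimate, so it is worth spot-checking a sample of outputs by hand before trusting the ranking.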

homofortis changed the title ~~Poor table auto-detection performance?~~ Poor table auto-detection? Apr 5, 2019

vinayak-mehta closed this as completed Jul 5, 2019
