This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

Poor table auto-detection? #304

Closed
homofortis opened this issue Apr 5, 2019 · 1 comment

Comments

@homofortis

First of all, Camelot is a great project with enormous potential!
In my experience, Camelot extracts most of the tables in a document correctly, provided the right parameters have been supplied. However, this is not always the case in the real world. Sometimes you have to deal with thousands of documents with different layouts, and processing them one by one is not an option. It seems that auto-detection of tables doesn't work very well at the moment. I tried to run a bulk table extraction on PDF documents with random layouts, and the results were very poor.
Perhaps there is a need for a new, robust bulk extraction method that works for both framed and streamed tables and produces acceptable results. In other words, it may sometimes be worth trading accuracy for generalisation.

@homofortis homofortis changed the title Poor table auto-detection performance? Poor table auto-detection? Apr 5, 2019
@vinayak-mehta
Contributor

@homofortis Did you randomly apply the flavors to that large set of PDFs? How did you go about measuring the accuracy of results? Automatically choosing the flavor based on tables in a PDF is a planned enhancement #211.

The second issue is improving table auto-detection within the flavors themselves. It would help if you could post that set of documents along with your findings; that would help us in development.
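A minimal sketch of how the flavor choice mentioned in #211 might be approximated in a bulk run, assuming each page is extracted with both flavors and the results are compared using Camelot's per-table `parsing_report` dict (its `accuracy` and `whitespace` keys). The helpers `summarize` and `choose_flavor`, and the ranking rule itself (highest mean accuracy, ties broken by lower mean whitespace), are hypothetical and not part of Camelot. The reports could be collected with something like `[t.parsing_report for t in camelot.read_pdf(path, flavor=flavor)]`.

```python
from statistics import mean
from typing import Any, Dict, List, Tuple

# Hypothetical helpers, NOT part of Camelot. Each report is assumed to look
# like Camelot's Table.parsing_report, e.g.
# {"accuracy": 99.0, "whitespace": 12.5, "order": 1, "page": "1"}.
Report = Dict[str, Any]


def summarize(reports: List[Report]) -> Tuple[float, float]:
    """Return (mean accuracy, negated mean whitespace); higher is better."""
    if not reports:
        # No tables detected with this flavor: rank it below any detection.
        return (0.0, float("-inf"))
    return (
        mean(r["accuracy"] for r in reports),
        -mean(r["whitespace"] for r in reports),
    )


def choose_flavor(reports_by_flavor: Dict[str, List[Report]]) -> str:
    """Pick the flavor with the best mean accuracy; ties go to less whitespace."""
    return max(reports_by_flavor, key=lambda f: summarize(reports_by_flavor[f]))


if __name__ == "__main__":
    reports = {
        "lattice": [],  # e.g. no ruling lines found on this page
        "stream": [{"accuracy": 95.0, "whitespace": 30.0, "order": 1, "page": "1"}],
    }
    print(choose_flavor(reports))
```

If no flavor finds any table, `choose_flavor` still returns the first key, so a real bulk pipeline would want to check for that case separately (for example, to queue the page for manual review).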
