This repository has been archived by the owner on Jan 6, 2025. It is now read-only.
First of all, Camelot is a great project with enormous potential!
In my experience, Camelot extracts most tables from a document correctly, provided the right parameters have been supplied. However, in the real world the right parameters are not always known in advance. Sometimes you have to deal with thousands of documents with different layouts, and processing them one by one is not an option. It seems that auto-detection of tables in documents doesn't work very well at the moment. I tried to run a bulk table extraction on PDF documents with random layouts, and the results were very poor.
There is probably a need for a new, robust bulk extraction method that works for both framed and streamed tables and produces acceptable results. In other words, it may sometimes be worth trading accuracy for generalisation.
homofortis changed the title from "Poor table auto-detection performance?" to "Poor table auto-detection?" on Apr 5, 2019
@homofortis Did you randomly apply the flavors to that large set of PDFs? How did you go about measuring the accuracy of results? Automatically choosing the flavor based on tables in a PDF is a planned enhancement #211.
The second issue is improving table auto-detection in the flavors themselves. It would help us in development if you could post that set of documents along with your findings.
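Until automatic flavor selection lands, one possible stopgap for bulk runs is to try both flavors on each PDF and keep the one whose tables report the higher mean accuracy. The sketch below is only an illustration of that idea, not how Camelot or #211 does it: `score`, `pick_flavor`, `camelot_reports` and `best_flavor_for` are hypothetical helper names, and it assumes each table's `parsing_report` carries an `accuracy` value. That figure is Camelot's own estimate, so it may not tell you whether a real table was found.

```python
from typing import Callable, Dict, List, Optional

FLAVORS = ("lattice", "stream")


def score(reports: List[dict]) -> float:
    """Mean reported accuracy over a flavor's tables; 0.0 when none were found."""
    if not reports:
        return 0.0
    return sum(r["accuracy"] for r in reports) / len(reports)


def pick_flavor(reports_by_flavor: Dict[str, List[dict]]) -> Optional[str]:
    """Return the flavor with the highest mean accuracy, or None if no tables."""
    best = max(
        reports_by_flavor,
        key=lambda flavor: score(reports_by_flavor[flavor]),
        default=None,
    )
    if best is None or not reports_by_flavor[best]:
        return None
    return best


def camelot_reports(path: str, flavor: str) -> List[dict]:
    """Run Camelot with one flavor and collect each table's parsing_report."""
    import camelot  # imported lazily so the helpers above work without it

    tables = camelot.read_pdf(path, flavor=flavor, pages="all")
    return [table.parsing_report for table in tables]


def best_flavor_for(
    path: str,
    extract: Callable[[str, str], List[dict]] = camelot_reports,
) -> Optional[str]:
    """Try every flavor on one PDF and pick the best-scoring one."""
    return pick_flavor({flavor: extract(path, flavor) for flavor in FLAVORS})
```

Calling `best_flavor_for(path)` over a directory of PDFs would give a per-document flavor choice. Logging the scores alongside it would also give a rough, repeatable way to report results on a shared document set.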