Splitted table captions #15

Open
MrUnknown789556 opened this issue Jul 15, 2023 · 1 comment
@MrUnknown789556

I installed the package and ran it.

It generated all tables from the article smoothly. Very nice and impressive.

However, I would like to point out a minor deficiency:

If a table's caption is split over more than one line, only one part (the lowest line of the caption text) is included in the extracted table.
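To make the caption problem concrete, here is a minimal sketch of how stacked caption lines could be merged. It is not Extractable's code: `TextLine`, `merge_caption_lines`, the `max_gap` threshold, and the "Table N" pattern are all hypothetical, and it assumes the caption sits above the table with coordinates measured from the top of the page.

```python
import re
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical text line with vertical bounds (origin at page top)."""
    text: str
    top: float
    bottom: float


# Assumed caption opener, e.g. "Table 3." or "Tab. 3"
CAPTION_START = re.compile(r"^\s*(table|tab\.)\s*\d+", re.IGNORECASE)


def merge_caption_lines(lines, table_top, max_gap=6.0):
    """Walk upward from the table and join caption lines that are stacked closely."""
    # Lines above the table, nearest first
    above = sorted(
        (line for line in lines if line.bottom <= table_top),
        key=lambda line: line.bottom,
        reverse=True,
    )
    collected = []
    next_top = table_top
    for line in above:
        # Stop once the vertical gap is too large to be the same caption
        if next_top - line.bottom > max_gap:
            break
        collected.append(line)
        next_top = line.top
        # Stop once the line that opens the caption has been reached
        if CAPTION_START.match(line.text):
            break
    # Restore reading order (top to bottom) before joining
    return " ".join(line.text.strip() for line in reversed(collected))
```

With a two-line caption directly above the table, both lines would be returned as one string instead of only the line nearest the table.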

With the attached test article, a "table" is also extracted that is not a table at all.

Best regards

Frank

The.pdf
The_table_12 1
The_table_8 1

@SuleyNL SuleyNL self-assigned this Oct 20, 2023
@SuleyNL
Owner

SuleyNL commented Oct 20, 2023

Hi, @MrUnknown789556,
Thanks for trying out Extractable. I take your feedback seriously and will work on it. Keep in mind Extractable is a work in progress, so you can contribute by leaving valuable feedback and by making changes in the code.

As for your points:

  1. If a table's caption is split over more than one line, only one part (the lowest line of the caption text) is included in the extracted table.
  • #TODO I will look into this; there should be a solution.
  2. With the attached test article, a "table" is also extracted that is not a table at all.
    This is an unfortunate byproduct of Extractable's ability to recognize tables that have no lines. It is a double-edged sword: sometimes there are tables without lines that we do want to detect, but in cases like this it is fooled by plain text laid out in a table-like format.
  • #TODO There is no quick fix for this, but it would be possible to have two versions:
    - one version that recognizes all tables, including those without lines,
    - and another version that recognizes only tables with lines.
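The two-version idea could also be expressed as a single switch rather than two separate builds. The sketch below is only an illustration under assumptions: `DetectionMode`, `DetectedTable`, `ruling_line_count`, and `filter_tables` are hypothetical names and not part of Extractable, and it assumes the detector can report how many ruling lines it found for each candidate.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionMode(Enum):
    """Hypothetical switch between the two proposed behaviours."""
    ALL_TABLES = "all"          # keep tables with and without lines
    LINED_ONLY = "lined_only"   # keep only tables that have lines


@dataclass
class DetectedTable:
    """Hypothetical detection result for one table candidate."""
    page: int
    ruling_line_count: int  # assumed to be reported by the detector


def filter_tables(tables, mode=DetectionMode.ALL_TABLES, min_lines=1):
    """Return the candidates that the chosen mode accepts."""
    if mode is DetectionMode.ALL_TABLES:
        return list(tables)
    # LINED_ONLY: drop candidates with fewer ruling lines than min_lines
    return [t for t in tables if t.ruling_line_count >= min_lines]
```

In `LINED_ONLY` mode, text that merely looks tabular would be dropped, at the cost of also dropping genuine tables that have no lines.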
