Commit
feat: Support tableformer model choice (#90)
* Support tableformer model choice
  Signed-off-by: Christoph Auer <[email protected]>
* Update datamodel structure
  Signed-off-by: Christoph Auer <[email protected]>
* Update docs
  Signed-off-by: Christoph Auer <[email protected]>
* Cleanup
  Signed-off-by: Christoph Auer <[email protected]>
* Add test unit for table options
  Signed-off-by: Christoph Auer <[email protected]>
* Ensure import backwards-compatibility for PipelineOptions
  Signed-off-by: Christoph Auer <[email protected]>
* Update README
  Signed-off-by: Christoph Auer <[email protected]>
* Adjust parameters on custom_convert
  Signed-off-by: Christoph Auer <[email protected]>
* Update Dockerfile
  Signed-off-by: Christoph Auer <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Showing 16 changed files with 711 additions and 592 deletions.
    @@ -0,0 +1,25 @@
    +from enum import Enum, auto
    +
    +from pydantic import BaseModel
    +
    +
    +class TableFormerMode(str, Enum):
    +    FAST = auto()
    +    ACCURATE = auto()
    +
    +
    +class TableStructureOptions(BaseModel):
    +    do_cell_matching: bool = (
    +        True
    +        # True:  Matches predictions back to PDF cells. Can break table output if PDF cells
    +        #        are merged across table columns.
    +        # False: Let table structure model define the text cells, ignore PDF cells.
    +    )
    +    mode: TableFormerMode = TableFormerMode.FAST
    +
    +
    +class PipelineOptions(BaseModel):
    +    do_table_structure: bool = True  # True: perform table structure extraction
    +    do_ocr: bool = True  # True: perform OCR, replace programmatic PDF text
    +
    +    table_structure_options: TableStructureOptions = TableStructureOptions()
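
The options added in this file nest one level deep: the TableFormer variant is chosen through `PipelineOptions.table_structure_options.mode`. The sketch below illustrates how a caller might override the defaults. It is not the code from this commit: to stay runnable without third-party packages it mirrors the field layout with standard-library dataclasses instead of pydantic's `BaseModel` (an assumption made purely for illustration), and the variable names `default_options` and `accurate_options` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TableFormerMode(str, Enum):
    FAST = auto()
    ACCURATE = auto()


# Stand-in for the pydantic model in the diff (assumption: dataclass mirror).
@dataclass
class TableStructureOptions:
    do_cell_matching: bool = True  # match predictions back to PDF cells
    mode: TableFormerMode = TableFormerMode.FAST


# Stand-in for the pydantic model in the diff (assumption: dataclass mirror).
@dataclass
class PipelineOptions:
    do_table_structure: bool = True
    do_ocr: bool = True
    # default_factory gives every PipelineOptions its own nested options object.
    table_structure_options: TableStructureOptions = field(
        default_factory=TableStructureOptions
    )


# Defaults: FAST mode with cell matching enabled.
default_options = PipelineOptions()

# Select the ACCURATE mode and let the table structure model define the
# text cells instead of matching back to PDF cells.
accurate_options = PipelineOptions(
    table_structure_options=TableStructureOptions(
        do_cell_matching=False,
        mode=TableFormerMode.ACCURATE,
    )
)
```

With the pydantic models from the diff, the same keyword-argument construction should apply, since `BaseModel` subclasses also accept their fields as keyword arguments.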