Integrate OCRmyPDF #32

R0Wi · 2020-10-13T15:31:20Z

Because of various issues with the compression of the PDF output, an alternative tool should be used to process PDF files. We want to integrate OCRmyPDF.

Issues are discussed here.

- Update README.md for new install instructions
  - ~~Only File create supported?~~
  - Install OCRmyPDF dependency
  - Remove unnecessary dependencies
- Change PdfOcrProcessor.php to use OCRmyPDF
  - Use a command-line wrapper
  - Use stdout for getting the PDF stream
- Update tests accordingly
- Change version to 1.20.1
- ~~Maybe find a way to configure commandline args (via flow parameter?)~~ (moved to OCRmyPDF enhancements #38)
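
As a rough illustration of the "command-line wrapper plus stdout" items in the list above, here is a minimal sketch in Python (not the app's PHP, and not the code that ended up in PdfOcrProcessor.php). It assumes `ocrmypdf` is installed and on the PATH, and it relies on OCRmyPDF accepting `-` for both input and output to read from stdin and write to stdout. The names `ocr_pdf`, `OcrError` and the `command` parameter are made up for this example.

```python
import subprocess
from typing import Sequence


class OcrError(RuntimeError):
    """Raised when the OCR command fails or returns no data (hypothetical name)."""


def ocr_pdf(
    pdf_bytes: bytes,
    command: Sequence[str] = ("ocrmypdf", "-", "-"),  # "-" "-": stdin in, stdout out
    timeout: float = 300.0,
) -> bytes:
    """Pipe a PDF through an external OCR command and return the resulting PDF bytes."""
    result = subprocess.run(
        list(command),
        input=pdf_bytes,          # PDF goes to the child's stdin
        stdout=subprocess.PIPE,   # processed PDF stream comes back on stdout
        stderr=subprocess.PIPE,   # keep diagnostics separate from the PDF data
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        message = result.stderr.decode("utf-8", errors="replace").strip()
        raise OcrError(f"OCR command exited with {result.returncode}: {message}")
    if not result.stdout:
        raise OcrError("OCR command produced no output")
    return result.stdout
```

Keeping stderr out of the returned stream matters here, because anything the tool logs would otherwise end up inside the PDF data. The `command` parameter is only there so the wrapper can be exercised without OCRmyPDF installed.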


R0Wi · 2020-10-15T16:52:47Z

First working version: https://github.com/R0Wi/workflow_ocr/suites/1346287870/artifacts/21769925
(I hope this artifact can be downloaded?)

@bahnwaerter FYI

thomasgg23 · 2020-10-16T10:22:40Z

Thanks, artifact can be downloaded.
Will test it on the weekend.

thomasgg23 · 2020-10-17T09:34:33Z

I am getting the following message when trying to create a MIME-based workflow.

R0Wi · 2020-10-17T09:42:47Z

That's strange, because I did not change anything regarding the UI. Are you on Nextcloud 20? And could you try to remove the "file renamed" condition?

thomasgg23 · 2020-10-17T09:50:25Z

This one is working.

Will try some combinations

R0Wi · 2020-10-17T09:57:39Z

OK, this looks more like a problem inside Nextcloud's workflow engine, since these things aren't really influenced by the workflow_ocr app.

Let me know your results 👍

bahnwaerter · 2020-10-23T21:24:10Z

> I am getting the following message when trying to create a workflow MIME based.

@R0Wi, I've tested your first working version and can confirm the issue observed by @thomasgg23. It seems to be a bug in the Nextcloud workflow engine. If the matches operator is applied, the workflow engine's validation method validateCheck throws the exception shown in the screenshot. The exception is thrown in AbstractStringCheck.php at line 91.

If the OCR workflow is set up properly (e.g. using the is operator), OCR processing works with the first working version that integrates OCRmyPDF. The processing succeeds and does not produce any notable warnings or errors.

R0Wi · 2020-10-24T07:09:17Z

Thanks for testing, @bahnwaerter. I will check that and possibly open a new issue in the Nextcloud server repo 👍

R0Wi · 2020-11-10T15:11:17Z

@bahnwaerter FYI: the configuration currently mentioned in the docs will lead to an infinite loop, see #34 (comment)

We should discuss some solutions ...

EDIT: should be fixed with efd7a3d
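
For readers unfamiliar with this kind of loop, the sketch below shows the general shape of a re-entrancy guard in Python. It is not the code from efd7a3d and not the app's ProcessingFileAccessor. It assumes the loop arises because writing the OCR result fires the same file event that started the OCR run. `ProcessingGuard`, `on_file_written` and `run_ocr` are hypothetical names.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Set


class ProcessingGuard:
    """Remembers which file ids are currently being processed (hypothetical helper)."""

    def __init__(self) -> None:
        self._active: Set[int] = set()

    def is_processing(self, file_id: int) -> bool:
        return file_id in self._active

    @contextmanager
    def processing(self, file_id: int) -> Iterator[None]:
        # Mark the file as in progress and always unmark it, even on errors.
        self._active.add(file_id)
        try:
            yield
        finally:
            self._active.discard(file_id)


def on_file_written(
    file_id: int,
    guard: ProcessingGuard,
    run_ocr: Callable[[int], None],
) -> bool:
    """Handle a file-write event; return False if it was skipped as self-triggered."""
    if guard.is_processing(file_id):
        # The write came from our own OCR run, so do not start another one.
        return False
    with guard.processing(file_id):
        run_ocr(file_id)
    return True
```

The point of the guard is that an event fired while the file is marked as in progress is ignored, so saving the OCR result cannot schedule another OCR run for the same file.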

* First working version with OCRmyPDF #32
* Fix variable assignment
* Use ProcessingFileAccessor to prevent infinite loop
* Update README for OCRmyPDF
* docs: update TOC
* Update README + app compliance
* Code compliance
* Apply suggestions from code review

Co-authored-by: Manuel Bentele
Co-authored-by: R0Wi
Co-authored-by: Robin Windey

R0Wi added the enhancement New feature or request label Oct 13, 2020

R0Wi added this to the v1.20.1 milestone Oct 13, 2020

R0Wi self-assigned this Oct 13, 2020

R0Wi mentioned this issue Oct 13, 2020

PDF files are getting bloating up #22

Closed

R0Wi added a commit that referenced this issue Oct 15, 2020

First working version with OCRmyPDF #32

b3d4be2

R0Wi mentioned this issue Oct 24, 2020

Workflowengine "The given regular expression is invalid" nextcloud/server#23666

Open

R0Wi mentioned this issue Nov 5, 2020

Missing Catalog - error message upon the OCR process in the workflow of a PDF #34

Closed

This was referenced Nov 11, 2020

ArgumentCountError #35

Closed

pdf size blows up #37

Closed

OCRmyPDF enhancements #38

Closed

R0Wi linked a pull request Nov 17, 2020 that will close this issue

support ocrmypdf#32 #39

Merged

R0Wi mentioned this issue Nov 19, 2020

This PDF document probably uses a compression technique which is not supported by the free parser shipped with FPDI #20

Closed

R0Wi closed this as completed in #39 Nov 30, 2020

bahnwaerter mentioned this issue Dec 14, 2020

error setting up the flow #41

Closed
