Integrate OCRmyPDF #32

Closed
5 tasks done
R0Wi opened this issue Oct 13, 2020 · 9 comments · Fixed by #39

@R0Wi
Contributor

R0Wi commented Oct 13, 2020

Because of various issues with the compression of the PDF output, an alternative tool should be used to process PDF files. We want to integrate OCRmyPDF.

Issues are discussed here.

  • Update README.md for new install instructions
    • Only File create supported?
    • Install OCRmyPDF dependency
    • Remove unnecessary dependencies
  • Change PdfOcrProcessor.php to use OCRmyPDF
    • Use a command-line wrapper
    • Use stdout to get the PDF stream
  • Update tests accordingly
  • Change version to 1.20.1
  • Maybe find a way to configure commandline args (via flow parameter?) (moved to OCRmyPDF enhancements #38)
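
The two sub-tasks under "Change PdfOcrProcessor.php" (wrap the command line, read the PDF from stdout) can be sketched as follows. This is only an illustration in Python, not the app's PHP code: the names `run_pdf_filter`, `OcrProcessError` and `ECHO_COMMAND` are hypothetical. It assumes the `ocrmypdf` CLI is on the PATH and relies on its convention that `-` for both the input and the output argument means stdin and stdout.

```python
import subprocess
import sys

# OCRmyPDF reads the input PDF from stdin and writes the result to stdout
# when both file arguments are "-"; "-q" suppresses progress output.
OCRMYPDF_COMMAND = ("ocrmypdf", "-q", "-", "-")

# Hypothetical stand-in that copies stdin to stdout unchanged, so the wrapper
# can be tried on a machine without OCRmyPDF installed.
ECHO_COMMAND = (
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
)


class OcrProcessError(RuntimeError):
    """Raised when the external command fails or returns no output."""


def run_pdf_filter(pdf_bytes: bytes, command=OCRMYPDF_COMMAND, timeout=300) -> bytes:
    """Pipe pdf_bytes through an external command and return its stdout."""
    result = subprocess.run(
        list(command),
        input=pdf_bytes,          # sent to the child's stdin
        stdout=subprocess.PIPE,   # the processed PDF stream
        stderr=subprocess.PIPE,   # kept separate so errors never pollute the PDF
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        message = result.stderr.decode("utf-8", errors="replace").strip()
        raise OcrProcessError(f"command exited with {result.returncode}: {message}")
    if not result.stdout:
        raise OcrProcessError("command produced no output")
    return result.stdout
```

Calling `run_pdf_filter(data)` would run OCRmyPDF itself; passing `command=ECHO_COMMAND` exercises the same plumbing without it. Keeping stderr separate from stdout is the design point here, since the PDF bytes arrive on stdout.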
@R0Wi R0Wi added the enhancement New feature or request label Oct 13, 2020
@R0Wi R0Wi added this to the v1.20.1 milestone Oct 13, 2020
@R0Wi R0Wi self-assigned this Oct 13, 2020
R0Wi added a commit that referenced this issue Oct 15, 2020
@R0Wi
Contributor Author

R0Wi commented Oct 15, 2020

First working version: https://github.com/R0Wi/workflow_ocr/suites/1346287870/artifacts/21769925
(I hope this artifact can be downloaded?)

@bahnwaerter FYI

@thomasgg23

Thanks, artifact can be downloaded.
Will test it on the weekend.

@thomasgg23

I am getting the following message when trying to create a MIME-type-based workflow.

image

@R0Wi
Contributor Author

R0Wi commented Oct 17, 2020

That's strange, because I did not change anything regarding the UI. Are you on Nextcloud 20? And could you try removing the "file renamed" condition?

@thomasgg23

This one is working.

Will try some combinations

image

@R0Wi
Contributor Author

R0Wi commented Oct 17, 2020

OK, this looks more like a problem inside Nextcloud's workflow engine, since these things aren't really influenced by the workflow_ocr app.

Let me know your results 👍

@bahnwaerter
Collaborator

bahnwaerter commented Oct 23, 2020

I am getting the following message when trying to create a MIME-type-based workflow.

image

@R0Wi, I've tested your first working version and can confirm the issue observed by @thomasgg23. It seems to be a bug in the Nextcloud workflow engine. If the matches operator is applied, the workflow engine's validation method validateCheck throws the value exception shown in the screenshot. The exception is thrown at line 91 of AbstractStringCheck.php.

If the OCR workflow is set up properly (e.g. using the is operator), OCR processing works with the first working version that integrates OCRmyPDF. Processing succeeds without any notable warnings or errors.

@R0Wi
Contributor Author

R0Wi commented Oct 24, 2020

Thanks for testing, @bahnwaerter. I will check that and possibly open a new issue in the Nextcloud server repo 👍

@R0Wi
Contributor Author

R0Wi commented Nov 10, 2020

@bahnwaerter FYI: the configuration currently mentioned in the docs will lead to an infinite loop, see #34 (comment)

We should discuss some solutions ...

EDIT: should be fixed with efd7a3d
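
For context, a loop like this typically arises when writing the OCR result back to the file is itself a file event that matches the workflow again. One common way to break the cycle is to remember which file is currently being processed and to skip events for it. The sketch below illustrates only that idea in Python; `ProcessingGuard`, `handle_file_event` and the `ocr` and `write_back` callbacks are hypothetical names, and this is not the app's actual fix.

```python
from contextlib import contextmanager


class ProcessingGuard:
    """Tracks the IDs of files whose OCR result is currently being written."""

    def __init__(self):
        self._active = set()

    def is_processing(self, file_id):
        return file_id in self._active

    @contextmanager
    def processing(self, file_id):
        # Mark the file as in progress for the duration of the with-block,
        # and always clear the mark, even if OCR or the write fails.
        self._active.add(file_id)
        try:
            yield
        finally:
            self._active.discard(file_id)


def handle_file_event(file_id, content, guard, ocr, write_back):
    """Run OCR for a file event unless the event was caused by our own write.

    Returns True if the file was processed, False if the event was skipped.
    """
    if guard.is_processing(file_id):
        return False
    with guard.processing(file_id):
        # write_back may fire another file event for the same file_id;
        # the guard turns that nested event into a no-op.
        write_back(file_id, ocr(content))
    return True
```

In this sketch the guard lives in memory for a single process; whether that is sufficient depends on how the events are dispatched, which is an assumption here.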

This was referenced Nov 11, 2020
@R0Wi R0Wi linked a pull request Nov 17, 2020 that will close this issue
@R0Wi R0Wi closed this as completed in #39 Nov 30, 2020
R0Wi added a commit that referenced this issue Nov 30, 2020
* First working version with OCRmyPDF #32

* Fix variable assignment

* Use ProcessingFileAccessor to prevent infinite loop

* Update README for OCRmyPDF

* docs: update TOC

* Update README + app compliance

* Code compliance

* Apply suggestions from code review

Co-authored-by: Manuel Bentele <[email protected]>
Co-authored-by: R0Wi <[email protected]>
github-actions bot pushed a commit that referenced this issue Nov 30, 2020
R0Wi added a commit that referenced this issue Nov 30, 2020
Co-authored-by: Robin Windey <[email protected]>