Feature/impl#144 (#145)
* Register EventService class

* Fire TextRecognizedEvent

* Add TextRecognizedEvent class

* Create sidecar and add recognized text to result

* Added PdfOcrProcessor constructor argument

* Added recognizedText variable to class

* Added EventService

* Refactored TextRecognizedEvent

* Added EventService

* Fixed tests

* composer run cs:fix

* Basic code cleanup

Signed-off-by: Robin Windey <[email protected]>

* Adjustments for #144

* Add additional tests
* Refactor code to use more "high-level" SidecarFileAccessor

Signed-off-by: Robin Windey <[email protected]>

* Add docs for #144

* Add section for events to README.md
* Remove TOC workflow

Signed-off-by: Robin Windey <[email protected]>

* Fix php7.4 syntax

Signed-off-by: Robin Windey <[email protected]>

* Add check if event is emitted

Signed-off-by: Robin Windey <[email protected]>

* Change TextRecognizedEvent interface to be more generic

Linked to #144

* Adjust docs to match new interface

Signed-off-by: Robin Windey <[email protected]>

* Fix codecov

Signed-off-by: Robin Windey <[email protected]>

Signed-off-by: Robin Windey <[email protected]>
Co-authored-by: Guido Schmitz <[email protected]>
Co-authored-by: Robin Windey <[email protected]>
3 people committed Sep 24, 2022
1 parent d89d43a commit 60a6ced
Showing 19 changed files with 645 additions and 78 deletions.
1 change: 1 addition & 0 deletions .github/workflows/codecov.yml
@@ -42,6 +42,7 @@ jobs:
        uses: actions/checkout@v2
        with:
          path: apps/${{ env.APP_NAME }}
          fetch-depth: 0

      - name: Set up php ${{ matrix.php-versions }}
        uses: shivammathur/setup-php@v2
13 changes: 0 additions & 13 deletions .github/workflows/toc_generator.yml

This file was deleted.

95 changes: 67 additions & 28 deletions README.md
@@ -6,36 +6,36 @@
[![Generic badge](https://img.shields.io/github/v/release/R0Wi/workflow_ocr)](https://github.com/R0Wi/workflow_ocr/releases)
[![Generic badge](https://img.shields.io/badge/Nextcloud-24-orange)](https://github.com/nextcloud/server)

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of contents

- [Setup](#setup)
- [App installation](#app-installation)
- [Nextcloud background jobs](#nextcloud-background-jobs)
- [Backend](#backend)
- [Usage](#usage)
- [Useful triggers](#useful-triggers)
- [Trigger OCR if file was created or updated](#trigger-ocr-if-file-was-created-or-updated)
- [Trigger OCR on tag assigning](#trigger-ocr-on-tag-assigning)
- [Settings](#settings)
- [Per workflow settings](#per-workflow-settings)
- [Global settings](#global-settings)
- [Testing your configuration](#testing-your-configuration)
- [How it works](#how-it-works)
- [General](#general)
- [PDF](#pdf)
- [Images](#images)
- [Development](#development)
- [Dev setup](#dev-setup)
- [Debugging](#debugging)
- [`docker`-based setup](#docker-based-setup)
- [Executing tests](#executing-tests)
- [Adding a new `OcrProcessor`](#adding-a-new-ocrprocessor)
- [Limitations](#limitations)
- [Used libraries & components](#used-libraries--components)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
- [Nextcloud Workflow OCR app](#nextcloud-workflow-ocr-app)
- [Table of contents](#table-of-contents)
- [Setup](#setup)
- [App installation](#app-installation)
- [Nextcloud background jobs](#nextcloud-background-jobs)
- [Backend](#backend)
- [Usage](#usage)
- [Useful triggers](#useful-triggers)
- [Trigger OCR if file was created or updated](#trigger-ocr-if-file-was-created-or-updated)
- [Trigger OCR on tag assigning](#trigger-ocr-on-tag-assigning)
- [Settings](#settings)
- [Per workflow settings](#per-workflow-settings)
- [Global settings](#global-settings)
- [Testing your configuration](#testing-your-configuration)
- [How it works](#how-it-works)
- [General](#general)
- [PDF](#pdf)
- [Images](#images)
- [Development](#development)
- [Dev setup](#dev-setup)
- [Debugging](#debugging)
- [`docker`-based setup](#docker-based-setup)
- [Executing tests](#executing-tests)
- [Adding a new `OcrProcessor`](#adding-a-new-ocrprocessor)
- [Events emitted by the app](#events-emitted-by-the-app)
- [`TextRecognizedEvent`](#textrecognizedevent)
- [Limitations](#limitations)
- [Used libraries & components](#used-libraries--components)

## Setup
### App installation
@@ -334,6 +334,45 @@ public static function registerOcrProcessors(IRegistrationContext $context) : vo

That's all. If you now create a new workflow based on your added mimetype, the app should trigger your implementation. The return value of `ocrFile(string $fileContent, WorkflowSettings $settings, GlobalSettings $globalSettings)` is interpreted as the content of the scanned file and is used to create a new file version in Nextcloud.

### Events emitted by the app

The app currently emits the following events, defined in `lib/Events`. You can use them as hooks to extend the app's functionality from within your own app.
Use the following sample code to implement a listener for these events:

```php
use OCA\WorkflowOcr\Events\TextRecognizedEvent;
use OCP\EventDispatcher\Event;
use OCP\EventDispatcher\IEventListener;

class TextRecognizedListener implements IEventListener {
	public function handle(Event $event): void {
		if (!$event instanceof TextRecognizedEvent) {
			return;
		}
		// Do something with the event ...
	}
}
```

Your implementation should then be registered in your app's `Application.php`:

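```php
use OCA\WorkflowOcr\Events\TextRecognizedEvent;
use OCP\AppFramework\Bootstrap\IRegistrationContext;

public function register(IRegistrationContext $context): void {
	$context->registerEventListener(TextRecognizedEvent::class, TextRecognizedListener::class);
}
```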
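How the registration above plays out at runtime can be pictured with the following self-contained sketch. It is a deliberately simplified, hypothetical model: `StubEvent`, `StubEventListener`, `StubTextRecognizedEvent` and `SimpleDispatcher` are stand-ins invented for illustration, not Nextcloud's `OCP\EventDispatcher` API, and the real dispatcher resolves listeners through Nextcloud's dependency injection container instead of calling `new` directly.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins; these are NOT the real OCP\EventDispatcher types.
class StubEvent {}

interface StubEventListener {
	public function handle(StubEvent $event): void;
}

final class StubTextRecognizedEvent extends StubEvent {
	/** @var string */
	private $recognizedText;

	public function __construct(string $recognizedText) {
		$this->recognizedText = $recognizedText;
	}

	public function getRecognizedText(): string {
		return $this->recognizedText;
	}
}

// Simplified dispatcher: maps an event class name to listener class names.
final class SimpleDispatcher {
	/** @var array<string, string[]> event class => listener classes */
	private $listeners = [];

	public function registerEventListener(string $eventClass, string $listenerClass): void {
		$this->listeners[$eventClass][] = $listenerClass;
	}

	/** @return StubEventListener[] the listener instances that handled the event */
	public function fire(StubEvent $event): array {
		$handled = [];
		foreach ($this->listeners[get_class($event)] ?? [] as $listenerClass) {
			// A real DI container would resolve constructor dependencies here.
			$listener = new $listenerClass();
			$listener->handle($event);
			$handled[] = $listener;
		}
		return $handled;
	}
}

final class TextRecognizedListener implements StubEventListener {
	/** @var string|null */
	public $lastText = null;

	public function handle(StubEvent $event): void {
		// Narrow the generic event type before calling event-specific getters.
		if (!$event instanceof StubTextRecognizedEvent) {
			return;
		}
		$this->lastText = $event->getRecognizedText();
	}
}

$dispatcher = new SimpleDispatcher();
$dispatcher->registerEventListener(StubTextRecognizedEvent::class, TextRecognizedListener::class);
$handled = $dispatcher->fire(new StubTextRecognizedEvent('Hello OCR'));
```

In this model, events of a class without registered listeners simply reach no one, and the `instanceof` guard in the listener serves to narrow the `StubEvent` parameter to the concrete event type.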

#### `TextRecognizedEvent`

This event is emitted when an OCR process has finished successfully. It provides the following information:

| Method | Return type | Description |
|--------|-------------|-------------|
| `getRecognizedText()` | `string` | The text recognized by the OCR process. |
| `getFile()` | `OCP\Files\File` | The Nextcloud file node where the OCR-processed file was stored. |

> **Note:** This event is emitted even if the recognized text is empty.
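Because the event also fires when no text was recognized, a listener will usually want to skip empty results. The sketch below shows that check in isolation and runs without Nextcloud: `StubFile` and `StubTextRecognizedEvent` are hypothetical stand-ins that only mirror the `getRecognizedText()` and `getFile()` getters from the table, `RecognizedTextCollector` is an invented example listener body, and treating whitespace-only text as empty is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for OCP\Files\File (only the method this sketch needs).
final class StubFile {
	/** @var string */
	private $name;

	public function __construct(string $name) {
		$this->name = $name;
	}

	public function getName(): string {
		return $this->name;
	}
}

// Hypothetical stand-in mirroring the documented TextRecognizedEvent getters.
final class StubTextRecognizedEvent {
	/** @var string */
	private $recognizedText;
	/** @var StubFile */
	private $file;

	public function __construct(string $recognizedText, StubFile $file) {
		$this->recognizedText = $recognizedText;
		$this->file = $file;
	}

	public function getRecognizedText(): string {
		return $this->recognizedText;
	}

	public function getFile(): StubFile {
		return $this->file;
	}
}

// Example listener body: collects non-empty OCR text per file name.
final class RecognizedTextCollector {
	/** @var array<string, string> file name => recognized text */
	private $texts = [];

	public function handle(StubTextRecognizedEvent $event): void {
		$text = trim($event->getRecognizedText());
		// The event also fires when OCR found no text, so ignore empty results.
		if ($text === '') {
			return;
		}
		$this->texts[$event->getFile()->getName()] = $text;
	}

	/** @return array<string, string> */
	public function getTexts(): array {
		return $this->texts;
	}
}

$collector = new RecognizedTextCollector();
$collector->handle(new StubTextRecognizedEvent("Invoice 2022-09\n", new StubFile('invoice.pdf')));
$collector->handle(new StubTextRecognizedEvent('   ', new StubFile('blank.pdf')));
```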

## Limitations
* **Currently only PDF documents (`application/pdf`) can be used as input.** Other mimetypes are ignored for now, but support for them might be added in the future.
* Pdf metadata (like author, comments, ...) is not available in the converted output pdf document.
11 changes: 11 additions & 0 deletions lib/AppInfo/Application.php
@@ -28,10 +28,14 @@
namespace OCA\WorkflowOcr\AppInfo;

use OCA\WorkflowOcr\Helper\IProcessingFileAccessor;
use OCA\WorkflowOcr\Helper\ISidecarFileAccessor;
use OCA\WorkflowOcr\Helper\ProcessingFileAccessor;
use OCA\WorkflowOcr\Helper\SidecarFileAccessor;
use OCA\WorkflowOcr\Listener\RegisterFlowOperationsListener;
use OCA\WorkflowOcr\OcrProcessors\IOcrProcessorFactory;
use OCA\WorkflowOcr\OcrProcessors\OcrProcessorFactory;
use OCA\WorkflowOcr\Service\IEventService;
use OCA\WorkflowOcr\Service\EventService;
use OCA\WorkflowOcr\Service\GlobalSettingsService;
use OCA\WorkflowOcr\Service\IGlobalSettingsService;
use OCA\WorkflowOcr\Service\IOcrService;
@@ -46,7 +50,10 @@
use OCP\AppFramework\Bootstrap\IBootContext;
use OCP\AppFramework\Bootstrap\IBootstrap;
use OCP\AppFramework\Bootstrap\IRegistrationContext;
use OCP\ITempManager;
use OCP\WorkflowEngine\Events\RegisterOperationsEvent;
use Psr\Container\ContainerInterface;
use Psr\Log\LoggerInterface;

class Application extends App implements IBootstrap {
	public const COMPOSER_DIR = __DIR__ . '/../../vendor/';
@@ -68,11 +75,15 @@ public function register(IRegistrationContext $context): void {
		$context->registerServiceAlias(IViewFactory::class, ViewFactory::class);
		$context->registerServiceAlias(IFilesystem::class, Filesystem::class);
		$context->registerServiceAlias(IGlobalSettingsService::class, GlobalSettingsService::class);
		$context->registerServiceAlias(IEventService::class, EventService::class);

		// BUG #43
		$context->registerService(ICommand::class, function () {
			return new CommandWrapper();
		}, false);
		$context->registerService(ISidecarFileAccessor::class, function (ContainerInterface $c) {
			return new SidecarFileAccessor($c->get(ITempManager::class), $c->get(LoggerInterface::class));
		}, false);

		$context->registerService(IProcessingFileAccessor::class, function () {
			return ProcessingFileAccessor::getInstance();
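The `ISidecarFileAccessor` registration passes `false` as the third argument of `registerService`. As we understand Nextcloud's `IRegistrationContext::registerService(string $name, callable $factory, bool $shared = true)`, this marks the service as non-shared, so the factory runs again whenever the service is resolved instead of handing out one cached instance. Presumably this keeps sidecar file state from leaking between OCR runs. The `TinyContainer` below is a hypothetical, minimal model of shared versus non-shared factories, not Nextcloud's actual container, and `PerJobState` is an invented placeholder service.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal container, NOT Nextcloud's DI container.
final class TinyContainer {
	/** @var array<string, callable> */
	private $factories = [];
	/** @var array<string, bool> */
	private $shared = [];
	/** @var array<string, object> cached instances of shared services */
	private $instances = [];

	public function registerService(string $name, callable $factory, bool $shared = true): void {
		$this->factories[$name] = $factory;
		$this->shared[$name] = $shared;
		unset($this->instances[$name]);
	}

	public function get(string $name): object {
		if (!isset($this->factories[$name])) {
			throw new InvalidArgumentException("Unknown service: $name");
		}
		if (!$this->shared[$name]) {
			// Non-shared: call the factory on every lookup.
			return ($this->factories[$name])($this);
		}
		if (!isset($this->instances[$name])) {
			// Shared: call the factory once and cache the result.
			$this->instances[$name] = ($this->factories[$name])($this);
		}
		return $this->instances[$name];
	}
}

// Hypothetical stateful service, standing in for something like a sidecar accessor.
final class PerJobState {
	/** @var string|null */
	public $path = null;
}

$container = new TinyContainer();
$container->registerService('perJob', function (TinyContainer $c) {
	return new PerJobState();
}, false);
$container->registerService('sharedJob', function (TinyContainer $c) {
	return new PerJobState();
});
```

With `false`, two lookups of `perJob` yield two distinct objects, while `sharedJob` always yields the same cached instance.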
8 changes: 8 additions & 0 deletions lib/BackgroundJobs/ProcessFileJob.php
@@ -35,6 +35,7 @@
use OCA\WorkflowOcr\Helper\IProcessingFileAccessor;
use OCA\WorkflowOcr\Model\WorkflowSettings;
use OCA\WorkflowOcr\Service\IOcrService;
use OCA\WorkflowOcr\Service\IEventService;
use OCA\WorkflowOcr\Wrapper\IFilesystem;
use OCA\WorkflowOcr\Wrapper\IViewFactory;
use OCP\AppFramework\Utility\ITimeFactory;
@@ -57,6 +58,8 @@ class ProcessFileJob extends \OCP\BackgroundJob\QueuedJob {
	private $rootFolder;
	/** @var IOcrService */
	private $ocrService;
	/** @var IEventService */
	private $eventService;
	/** @var IViewFactory */
	private $viewFactory;
	/** @var IFilesystem */
@@ -72,6 +75,7 @@ public function __construct(
		LoggerInterface $logger,
		IRootFolder $rootFolder,
		IOcrService $ocrService,
		IEventService $eventService,
		IViewFactory $viewFactory,
		IFilesystem $filesystem,
		IUserManager $userManager,
@@ -82,6 +86,7 @@ public function __construct(
		$this->logger = $logger;
		$this->rootFolder = $rootFolder;
		$this->ocrService = $ocrService;
		$this->eventService = $eventService;
		$this->viewFactory = $viewFactory;
		$this->filesystem = $filesystem;
		$this->userManager = $userManager;
@@ -179,15 +184,18 @@ private function processFile(string $filePath, WorkflowSettings $settings) : voi
			return;
		}


		$fileContent = $ocrFile->getFileContent();
		$nodeId = $node->getId();
		$originalFileExtension = $node->getExtension();
		$newFileExtension = $ocrFile->getFileExtension();

		if ($originalFileExtension === $newFileExtension) {
			$this->createNewFileVersion($filePath, $fileContent, $nodeId);
			$this->eventService->textRecognized($ocrFile, $node);
		} else {
			$this->createNewFileVersion($filePath.".pdf", $fileContent, $nodeId);
			$this->eventService->textRecognized($ocrFile, $node);
		}
	}

67 changes: 67 additions & 0 deletions lib/Events/TextRecognizedEvent.php
@@ -0,0 +1,67 @@
<?php

declare(strict_types=1);

/**
 * @copyright Copyright (c) 2022 Robin Windey <[email protected]>
 *
 * @license GNU AGPL version 3 or any later version
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as
 * published by the Free Software Foundation, either version 3 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */

namespace OCA\WorkflowOcr\Events;

use OCP\EventDispatcher\Event;
use OCP\Files\File;

/**
 * Class TextRecognizedEvent
 *
 * @package OCA\WorkflowOcr\Events
 */
class TextRecognizedEvent extends Event {


	/** @var string */
	private $recognizedText;

	/** @var File */
	private $file;


	/**
	 * TextRecognizedEvent constructor.
	 */
	public function __construct(string $recognizedText, File $file) {
		parent::__construct();

		$this->recognizedText = $recognizedText;
		$this->file = $file;
	}

	/**
	 * @return string $recognizedText
	 */
	public function getRecognizedText(): string {
		return $this->recognizedText;
	}

	/**
	 * @return File $file
	 */
	public function getFile(): File {
		return $this->file;
	}
}
40 changes: 40 additions & 0 deletions lib/Helper/ISidecarFileAccessor.php
@@ -0,0 +1,40 @@
<?php

declare(strict_types=1);

/**
 * @copyright Copyright (c) 2022 Robin Windey <[email protected]>
 *
 * @license GNU AGPL version 3 or any later version
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as
 * published by the Free Software Foundation, either version 3 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */

namespace OCA\WorkflowOcr\Helper;

interface ISidecarFileAccessor {
	/**
	 * Creates a new temporary sidecar file for OCR text content.
	 * If a file was already created, the path to the existing file is returned.
	 *
	 * @return string|bool Path to the sidecar file or false if the file could not be created
	 */
	public function getOrCreateSidecarFile();

	/**
	 * Gets the content of the created sidecar file. File has to be created
	 * before calling this method.
	 */
	public function getSidecarFileContent(): string;
}
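The interface contract (create the sidecar file once, reuse the same path afterwards, and read its content only after creation) can be illustrated with a small standalone sketch. `TempDirSidecarFileAccessor` is hypothetical: it uses PHP's `tempnam()` and `sys_get_temp_dir()` instead of the `ITempManager` and `LoggerInterface` dependencies that the app's `SidecarFileAccessor` receives in `Application.php`. Throwing a `LogicException` when the file has not been created yet is a design choice of this sketch, not behaviour documented by the interface.

```php
<?php
declare(strict_types=1);

// Local copy of the interface from this commit, so the sketch is self-contained.
interface ISidecarFileAccessor {
	/** @return string|bool Path to the sidecar file or false on failure */
	public function getOrCreateSidecarFile();

	public function getSidecarFileContent(): string;
}

// Hypothetical implementation based on plain PHP temp files.
final class TempDirSidecarFileAccessor implements ISidecarFileAccessor {
	/** @var string|null Path of the sidecar file once it has been created */
	private $sidecarFilePath = null;

	/** @return string|bool */
	public function getOrCreateSidecarFile() {
		if ($this->sidecarFilePath !== null) {
			// Already created: hand out the same path again.
			return $this->sidecarFilePath;
		}
		$path = tempnam(sys_get_temp_dir(), 'ocr_sidecar_');
		if ($path === false) {
			return false;
		}
		$this->sidecarFilePath = $path;
		return $path;
	}

	public function getSidecarFileContent(): string {
		if ($this->sidecarFilePath === null) {
			throw new LogicException('Sidecar file has not been created yet');
		}
		$content = file_get_contents($this->sidecarFilePath);
		return $content === false ? '' : $content;
	}
}
```

In this sketch, whatever process performs the OCR writes its text into the returned path, and deleting the temporary file afterwards is left to the caller.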
