This repository has been archived by the owner on Jun 28, 2019. It is now read-only.
A Tesseract 4 gRPC service container for optical character recognition

occrp-attic/recognize-text

recognize-text

This is a service container that performs OCR on images submitted as a byte stream, using Tesseract 4.0. Requests should specify the language of the text where possible, so that the appropriate script detection can be applied.

Images submitted via the API should be kept under 4 MiB where possible. That is gRPC's default maximum message size, and larger payloads may be rejected.
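
The size guideline above can be checked on the client before an image is sent. The following is a minimal sketch, not part of this repository: the names `MAX_IMAGE_BYTES` and `fits_in_grpc_message` are hypothetical, and the check only looks at the raw image bytes, so it does not account for the few extra bytes the surrounding request message adds.

```python
# Hypothetical client-side guard; names are illustrative, not from recognize-text.
MAX_IMAGE_BYTES = 4 * 1024 * 1024  # 4 MiB, the limit recommended above


def fits_in_grpc_message(data: bytes, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Return True if the image payload is strictly smaller than the limit."""
    return len(data) < limit
```

A caller could read the image with `open(path, "rb").read()` and downscale or re-encode it whenever this function returns `False`.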

Usage

To use this OCR container, a client needs the same protocol buffer definitions that the server uses. The simplest way to get them is the servicelayer Python library, but the protocol can also be used independently.

Check out the following client code to see how recognize-text is used:

https://github.com/alephdata/servicelayer/blob/master/servicelayer/rpc/__init__.py

You can also inspect the protocol buffer files directly here:

https://github.com/alephdata/servicelayer/tree/master/protos/servicelayer/rpc

The container image can be pulled from Docker Hub and exposes the service on port 50000:

docker pull alephdata/recognize-text
docker run -p 50000:50000 -ti alephdata/recognize-text
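
Once the container is running, it can be useful to confirm that the published port accepts connections before sending OCR requests. The sketch below uses only the Python standard library and assumes the port mapping from the `docker run` command above. The function name `service_is_reachable` is hypothetical, and the check only verifies that a TCP connection can be opened, not that the gRPC service behind it is healthy.

```python
import socket


def service_is_reachable(host: str = "localhost", port: int = 50000,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `service_is_reachable()` with the defaults checks `localhost:50000`; pass a different host if Docker runs on a remote machine.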

License

MIT, see LICENSE.
