diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/2.0.0/.buildinfo b/2.0.0/.buildinfo new file mode 100644 index 000000000..1ae33ee03 --- /dev/null +++ b/2.0.0/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 5c56db9ce576f5d7e2c0c707e63c9814 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/2.0.0/.doctrees/advanced.doctree b/2.0.0/.doctrees/advanced.doctree new file mode 100644 index 000000000..d116c8f4d Binary files /dev/null and b/2.0.0/.doctrees/advanced.doctree differ diff --git a/2.0.0/.doctrees/api.doctree b/2.0.0/.doctrees/api.doctree new file mode 100644 index 000000000..b34a1e7dd Binary files /dev/null and b/2.0.0/.doctrees/api.doctree differ diff --git a/2.0.0/.doctrees/environment.pickle b/2.0.0/.doctrees/environment.pickle new file mode 100644 index 000000000..0d22fc382 Binary files /dev/null and b/2.0.0/.doctrees/environment.pickle differ diff --git a/2.0.0/.doctrees/gpu.doctree b/2.0.0/.doctrees/gpu.doctree new file mode 100644 index 000000000..9a940f17b Binary files /dev/null and b/2.0.0/.doctrees/gpu.doctree differ diff --git a/2.0.0/.doctrees/index.doctree b/2.0.0/.doctrees/index.doctree new file mode 100644 index 000000000..a4bbb9b14 Binary files /dev/null and b/2.0.0/.doctrees/index.doctree differ diff --git a/2.0.0/.doctrees/ketos.doctree b/2.0.0/.doctrees/ketos.doctree new file mode 100644 index 000000000..fb3f414c7 Binary files /dev/null and b/2.0.0/.doctrees/ketos.doctree differ diff --git a/2.0.0/.doctrees/models.doctree b/2.0.0/.doctrees/models.doctree new file mode 100644 index 000000000..ebc25f021 Binary files /dev/null and b/2.0.0/.doctrees/models.doctree differ diff --git a/2.0.0/.doctrees/training.doctree b/2.0.0/.doctrees/training.doctree new file mode 100644 index 000000000..c4680eaf3 Binary files /dev/null and b/2.0.0/.doctrees/training.doctree differ diff --git a/2.0.0/.doctrees/vgsl.doctree b/2.0.0/.doctrees/vgsl.doctree new file mode 100644 index 000000000..d36673e59 Binary files /dev/null and b/2.0.0/.doctrees/vgsl.doctree differ diff --git a/2.0.0/.nojekyll b/2.0.0/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/2.0.0/_sources/advanced.rst.txt b/2.0.0/_sources/advanced.rst.txt new file mode 100644 index 000000000..4f9f4db43 --- /dev/null +++ b/2.0.0/_sources/advanced.rst.txt @@ -0,0 +1,227 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken binarization (converting color and grayscale images into bitonal +ones), layout analysis/page segmentation (extracting topological text lines +from an image), recognition (feeding text lines images into an classifiers), +and finally serialization of results into an appropriate format such as hOCR or +ALTO. + +Input Specification +------------------- + +All kraken subcommands operating on input-output pairs, i.e. producing one +output document for one input document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +Binarization +------------ + +The binarization subcommand accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. 
In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +=========== ==== +option type +=========== ==== +--threshold FLOAT +--zoom FLOAT +--escale FLOAT +--border FLOAT +--perc INTEGER RANGE +--range INTEGER +--low INTEGER RANGE +--high INTEGER RANGE +=========== ==== + +Page Segmentation and Script Detection +-------------------------------------- + +The `segment` subcommand access two operations page segmentation into lines and +script detection of those lines. + +Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +`JSON `_ file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left). + +The script detection splits extracted lines from the segmenter into strip +sharing a particular script that can then be recognized by supplying +appropriate models for each detected script to the `ocr` subcommand. + +Combined output from both consists of lists in the `boxes` field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are `ISO 15924 +`_ 4 character codes. + +.. code-block:: console + + $ kraken -i 14.tif lines.txt segment + $ cat lines.json + { + "boxes" : [ + [ + ["Grek", [561, 216, 1626,309]] + ], + [ + ["Latn", [2172, 197, 2424, 244]] + ], + [ + ["Grek", [1678, 221, 2236, 320]], + ["Arab", [2241, 221, 2302, 320]] + ], + + ["Grek", [412, 318, 2215, 416]], + ["Latn", [2208, 318, 2424, 416]] + ], + ... + ], + "text_direction" : "horizontal-tb" + } + +Script detection is automatically enabled; by explicitly disabling script +detection the `boxes` field will contain only a list of line bounding boxes: + +.. code-block:: console + + [546, 216, 1626, 309], + [2169, 197, 2423, 244], + [1676, 221, 2293, 320], + ... + [503, 2641, 848, 2681] + +Available page segmentation parameters are: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +--scale FLOAT Estimate of the average line height on the page +-m, --maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, --black-colseps / -w, --white-colseps Switch to black column separators. +-r, --remove-hlines / -l, --hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +=============================================== ====== + +The parameters specific to the script identification are: + +=============================================== ====== +option action +=============================================== ====== +-s/-n Enables/disables script detection +-a, --allowed-script Whitelists specific scripts for detection output. Other detected script runs are merged with their adjacent scripts, after a heuristic pre-merging step. +=============================================== ====== + +Model Repository +---------------- + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be accessed from the command line using a few subcommands. 
For +evaluating a series of models it is also possible to just clone the repository +using the normal git client. + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. code-block:: console + + $ kraken list + Retrieving model list ✓ + default (pyrnn) - A converted version of en-default.pyrnn.gz + toy (clstm) - A toy model trained on 400 lines of the UW3 data set. + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show toy + name: toy.clstm + + A toy model trained on 400 lines of the UW3 data set. + + author: Benjamin Kiessling (mittagessen@l.unchti.me) + http://kraken.re + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get toy + Retrieving model ✓ + +Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the ``show`` command, e.g.: + +.. code-block:: console + + $ kraken -i ... ... ocr -m toy + +Additions and updates to existing models are always welcome! Just open a pull +request or write an email. + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm + +All polytonic Greek text portions will be recognized using the `porson.clstm` +model while Latin text will be fed into the `antiqua.clstm` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. + +The ``ocr`` subcommand is able to serialize the recognition results either as +plain text (default), as `hOCR `_, into `ALTO +`_, or abbyyXML containing additional +metadata such as bounding boxes and confidences: + +.. code-block:: console + + $ kraken -i ... ... ocr -t # text output + $ kraken -i ... ... ocr -h # hOCR output + $ kraken -i ... ... ocr -a # ALTO output + $ kraken -i ... ... ocr -y # abbyyXML output + +hOCR output is slightly different from hOCR files produced by ocropus. Each +``ocr_line`` span contains not only the bounding box of the line but also +character boxes (``x_bboxes`` attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ``ocrx_word`` +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the ``x_conf`` attribute. + +Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input. 
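+
+As a closing example, the input specification, command chaining, multi-model
+recognition, and serialization options described above can be combined into a
+single call. The model names are the illustrative ones used earlier and ALTO
+output is selected; substitute the models actually installed on your system:
+
+.. code-block:: console
+
+    $ kraken -i 14.tif 14.xml binarize segment ocr -m Grek:porson.clstm -m Latn:antiqua.clstm -m Arab:ignore -a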
diff --git a/2.0.0/_sources/api.rst.txt b/2.0.0/_sources/api.rst.txt new file mode 100644 index 000000000..c133c8419 --- /dev/null +++ b/2.0.0/_sources/api.rst.txt @@ -0,0 +1,94 @@ +kraken API +========== + +.. module:: kraken + +Kraken provides routines which are usable by third party tools. In general +you can expect function in the ``kraken`` package to remain stable. We will try +to keep these backward compatible, but as kraken is still in an early +development stage and the API is still quite rudimentary nothing can be +garantueed. + +kraken.binarization module +-------------------------- + +.. automodule:: kraken.binarization + :members: + :show-inheritance: + +kraken.serialization module +--------------------------- + +.. automodule:: kraken.serialization + :members: + :show-inheritance: + +kraken.pageseg module +--------------------- + +.. automodule:: kraken.pageseg + :members: + :show-inheritance: + +kraken.rpred module +------------------- + +.. automodule:: kraken.rpred + :members: + :show-inheritance: + +kraken.transcribe module +------------------------ + +.. automodule:: kraken.transcribe + :members: + :show-inheritance: + +kraken.linegen module +--------------------- + +.. automodule:: kraken.linegen + :members: + :show-inheritance: + +kraken.lib.models module +------------------------ + +.. automodule:: kraken.lib.models + :members: + :show-inheritance: + +kraken.lib.vgsl module +---------------------- + +.. automodule:: kraken.lib.vgsl + :members: + :show-inheritance: + +kraken.lib.codec +---------------- + +.. automodule:: kraken.lib.codec + :members: + :show-inheritance: + +kraken.lib.train module +----------------------- + +.. automodule:: kraken.lib.train + :members: + :show-inheritance: + +kraken.lib.dataset module +------------------------- + +.. automodule:: kraken.lib.dataset + :members: + :show-inheritance: + +kraken.lib.ctc_decoder +---------------------- + +.. automodule:: kraken.lib.ctc_decoder + :members: + :show-inheritance: diff --git a/2.0.0/_sources/gpu.rst.txt b/2.0.0/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/2.0.0/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/2.0.0/_sources/index.rst.txt b/2.0.0/_sources/index.rst.txt new file mode 100644 index 000000000..41c5f767b --- /dev/null +++ b/2.0.0/_sources/index.rst.txt @@ -0,0 +1,154 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API + Models + +kraken is a turn-key OCR system forked from `ocropus +`_. It is intended to rectify a number of +issues while preserving (mostly) functional equivalence. + +Features +======== + +kraken's main features are: + + - Script detection and multi-script recognition support + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, abbyXML, and hOCR output + - Word bounding boxes and character cuts + - `Public repository `_ of model files + - :ref:`Lightweight model files ` + - :ref:`Variable recognition network architectures ` + +All functionality not pertaining to OCR and prerequisite steps has been +removed, i.e. no more error rate measuring, etc. + +Pull requests and code contributions are always welcome. 
+ +Installation +============ + +kraken requires some external libraries to run. On Debian/Ubuntu they may be +installed using: + +.. code-block:: console + + # apt install libpangocairo-1.0 libxml2 libblas3 liblapack3 python3-dev python3-pip + +pip +--- + +.. code-block:: console + + $ pip3 install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip3 install . + +conda +----- + +If you are running `Anaconda `_/miniconda, use: + +.. code-block:: console + + $ conda install -c mittagessen kraken + +Models +------ + +Finally you'll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user's kraken directory: + +.. code-block:: console + + $ kraken get default + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.clstm + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +Quickstart +========== + +Recognizing text on an image using the default parameters including the +prerequisite steps of binarization and page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt binarize segment ocr + Loading RNN ✓ + Processing ⣻ + +To binarize a single image using the nlbin algorithm: + +.. code-block:: console + + $ kraken -i image.tif bw.tif binarize + +To segment a binarized image into reading-order sorted lines: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment + +To OCR a binarized image using the default RNN and the previously generated +page segmentation: + +.. code-block:: console + + $ kraken -i bw.tif image.txt ocr --lines lines.json + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_ retained +from the original ``ocropus`` distribution. diff --git a/2.0.0/_sources/ketos.rst.txt b/2.0.0/_sources/ketos.rst.txt new file mode 100644 index 000000000..1af797490 --- /dev/null +++ b/2.0.0/_sources/ketos.rst.txt @@ -0,0 +1,519 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +Thanks to the magic of `Connectionist Temporal Classification +`_ prerequisites for creating a +new recognition model are quite modest. The basic requirement is a number of +text lines (``ground truth``) that correspond to line images and some time for +training. 
+ +Transcription +------------- + +Transcription is done through local browser based HTML transcription +environments. These are created by the ``ketos transcribe`` command line util. +Its basic input is just a number of image files and an output path to write the +HTML file to: + +.. code-block:: console + + $ ketos transcribe -o output.html image_1.png image_2.png ... + +While it is possible to put multiple images into a single transcription +environment splitting into one-image-per-HTML will ease parallel transcription +by multiple people. + +The above command reads in the image files, converts them to black and white, +tries to split them into line images, and puts an editable text field next to +the image in the HTML. There are a handful of option changing the output: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets the principal text direction both for the segmenter and in the HTML. Can be one of horizontal-lr, horizontal-rl, vertical-lr, vertical-rl. +--scale A segmenter parameter giving an estimate of average line height. Usually it shouldn't be set manually. +--bw / --orig Disables binarization of input images. If color or grayscale training data is desired this option has to be set. +-m, --maxcolseps A segmenter parameter limiting the number of columns that can be found in the input image by setting the maximum number of column separators. Set to 0 to disable column detection. +-b, --black_colseps / -w, --white_colseps A segmenter parameter selecting white or black column separators. +-f, --font The font family to use for rendering the text in the HTML. +-fs, --font-style The font style to use in the HTML. +-p, --prefill A model to use for prefilling the transcription. (Optional) +-o, --output Output HTML file. +=============================================== ====== + +It is possible to use an existing model to prefill the transcription environments: + +.. code-block:: console + + $ ketos transcribe -p ~/arabic.mlmodel -p output.html image_1.png image_2.png ... + +Transcription has to be diplomatic, i.e. contain the exact character sequence +in the line image, including original orthography. Some deviations, such as +consistently omitting vocalization in Arabic texts, is possible as long as they +are systematic and relatively minor. + +After transcribing a number of lines the results have to be saved, either using +the ``Download`` button on the lower right or through the regular ``Save Page +As`` function of the browser. All the work done is contained directly in the +saved files and it is possible to save partially transcribed files and continue +work later. + +Next the contents of the filled transcription environments have to be +extracted through the ``ketos extract`` command: + +.. code-block:: console + + $ ketos extract --output output_directory *.html + + +There are some options dealing with color images and text normalization: + +======================================================= ====== +option action +======================================================= ====== +-b, --binarize / --no-binarize Binarizes color/grayscale images (default) or retains the original in the output. +-u, --normalization Normalizes text to one of the following Unicode normalization forms: NFD, NFKD, NFC, NFKC +-s, --normalize-whitespace / --no-normalize-whitespace Normalizes whitespace in extracted text. 
There are several different Unicode `whitespace + `_ characters that + are replaced by a standard space when not disabled. +--reorder / --no-reorder Tells ketos to reorder the code + point for each line into + left-to-right order. Unicode + code points are always in + reading order, e.g. the first + code point in an Arabic line + will be the rightmost + character. This option reorders + them into ``display order``, + i.e. the first code point is + the leftmost, the second one + the next from the left and so + on. The ``train`` subcommand + does this automatically, so it + usually isn't needed. +-r, --rotate / --no-rotate Skips rotation of vertical lines. +-o, --output Output directory, defaults to ``training`` +======================================================= ====== + +The result will be a directory filled with line image text pairs ``NNNNNN.png`` +and ``NNNNNN.gt.txt`` and a ``manifest.txt`` containing a list of all extracted +lines. + +Training +-------- + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Training data is in all cases just +a directory containing image-text file pairs as produced by the +``transcribe/extract`` tools. Here are its command line options: + +======================================================= ====== +option action +======================================================= ====== +-p, --pad Left and right padding around lines +-o, --output Output model file prefix. Defaults to model. +-s, --spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, --append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, --load Load existing file to continue training +-F, --savefreq Model save frequency in epochs during + training +-R, --report Report creation frequency in epochs +-q, --quit Stop condition for training. Set to `early` + for early stopping (default) or `dumb` for fixed + number of epochs. +-N, --epochs Number of epochs to train for. Set to -1 for indefinite training. +--lag Number of epochs to wait before stopping + training without improvement. Only used when using early stopping. +--min-delta Minimum improvement between epochs to reset + early stopping. Defaults to 0.005. +-d, --device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, --lrate Learning rate [default: 0.001] +-m, --momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, --weight-decay Weight decay. +--schedule Sets the learning rate scheduler. May be either constant or 1cycle. For 1cycle + the cycle length is determined by the `--epoch` option. +-p, --partition Ground truth data partition ratio between train/validation set +-u, --normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, --codec Load a codec JSON definition (invalid if loading existing model) +--resize Codec/output layer resizing option. If set + to `add` code points will be added, `both` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, --reorder / --no-reorder Reordering of code points to display order. +-t, --training-files File(s) with additional paths to training data. 
Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, --evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +--preload / --no-preload Hard enable/disable for training data preloading. Preloading + training data into memory is enabled per default for sets with less than 2500 lines. +--threads Number of OpenMP threads when running on CPU. Defaults to min(4, #cores). +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolut minimal example to train a new model is: + +.. code-block:: console + + $ ketos train training_data/*.png + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory. + +In some cases, such as color inputs, changing the network architecture might be +useful: + +.. code-block:: console + + $ ketos train -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.png + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta an/or +lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 --min-delta 0.001 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -i model_best.mlmodel syr/*.png + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. code-block:: console + + $ ketos train -i model_5.mlmodel --no-preload kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``add`` and ``both``. +``add`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``both`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize add -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... 
+ +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize both -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``both`` mode 2 of the original characters were removed and 3 new ones were added. + + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Testing +------- + +Picking a particular model from a pool or getting a more detailled look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailled report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-m, --model Model(s) to evaluate. +-e, --evaluation-files File(s) with paths to evaluation data. +-d, --device Select device to use. +-p, --pad Left and right padding around lines. + + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with `-e/--evaluation-files` or by just +adding a number of image files as the final argument: + +.. 
code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailled +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. + +Artificial Training Data +------------------------ + +It is possible to rely on artificially created training data, instead of +laborously creating ground truth by manual means. A proper typeface and some +text in the target language will be needed. + +For many popular historical fonts there are free reproductions which quite +closely match printed editions. Most are available in your distribution's + +repositories and often shipped with TeX Live. + +Some good places to start for non-Latin scripts are: + +- `Amiri `_, a classical Arabic typeface by Khaled + Hosny +- The `Greek Font Society `_ offers freely + licensed (historical) typefaces for polytonic Greek. +- The friendly religious fanatics from `SIL `_ + assemble a wide variety of fonts for non-Latin scripts. + +Next we need some text to generate artificial line images from. It should be a +typical example of the type of printed works you want to recognize and at least +500-1000 lines in length. + +A minimal invocation to the line generation tool will look like this: + +.. code-block:: console + + $ ketos linegen -f Amiri da1.txt da2.txt + Reading texts ✓ + Read 3692 unique lines + Σ (len: 99) + Symbols: !(),-./0123456789:ABEFGHILMNPRS[]_acdefghiklmnoprstuvyz«»،؟ءآأؤإئابةتثجحخدذرزسشصضطظعغـفقكلمنهوىيپ + Writing images ✓ + +The output will be written to a directory called ``training_data``, although +this may be changed using the ``-o`` option. Each text line is rendered using +the Amiri typeface. + +Alphabet and Normalization +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Let's take a look at important information in the preamble: + +.. code-block:: console + + Read 3692 unique lines + Σ (len: 99) + Symbols: !(),-./0123456789:ABEFGHILMNPRS[]_acdefghiklmnoprstuvyz«»،؟ﺀﺁﺃﺅﺈﺋﺎﺑﺔﺘﺜﺠﺤﺧﺩﺫﺭﺰﺴﺸﺼﻀﻄﻈﻌﻐـﻔﻘﻜﻠﻤﻨﻫﻭﻰﻳپ + +ketos tells us that it found 3692 unique lines which contained 99 different +``symbols`` or ``code points``. 
We can see the training data contains all of +the Arabic script including accented precomposed characters, but only a subset +of Latin characters, numerals, and punctuation. A trained model will be able to +recognize only these exact symbols, e.g. a ``C`` or ``j`` on the page will +never be recognized. Either accept this limitation or add additional text lines +to the training corpus until the alphabet matches your needs. + +We can also force a normalization form using the ``-u`` option; per default +none is applied. For example: + +.. code-block:: console + + $ ketos linegen -u NFD -f "GFS Philostratos" grc.txt + Reading texts ✓ + Read 2860 unique lines + Σ (len: 132) + Symbols: #&'()*,-./0123456789:;ABCDEGHILMNOPQRSTVWXZ]abcdefghiklmnopqrstuvxy §·ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρςστυφχψω—‘’“ + Combining Characters: COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING DIAERESIS, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING DOT BELOW, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI + + + $ ketos linegen -u NFC -f "GFS Philostratos" grc.txt + Reading texts ✓ + Read 2860 unique lines + Σ (len: 231) + Symbols: #&'()*,-./0123456789:;ABCDEGHILMNOPQRSTVWXZ]abcdefghiklmnopqrstuvxy §·ΐΑΒΓΔΕΖΘΙΚΛΜΝΞΟΠΡΣΤΦΧΨΩάέήίαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώἀἁἂἃἄἅἈἌἎἐἑἓἔἕἘἙἜἝἠἡἢἣἤἥἦἧἩἭἮἰἱἳἴἵἶἷἸἹἼὀὁὂὃὄὅὈὉὌὐὑὓὔὕὖὗὙὝὠὡὢὤὥὦὧὨὩὰὲὴὶὸὺὼᾄᾐᾑᾔᾗᾠᾤᾧᾳᾶᾷῃῄῆῇῒῖῥῦῬῳῴῶῷ—‘’“ + Combining Characters: COMBINING ACUTE ACCENT, COMBINING DOT BELOW + +While there hasn't been any study on the effect of different normalizations on +recognition accuracy there are some benefits to NFD, namely decreased model +size and easier validation of the alphabet. + +Other Parameters +~~~~~~~~~~~~~~~~ + +Sometimes it is desirable to draw a certain number of lines randomly from one +or more large texts. The ``-n`` option does just that: + +.. code-block:: console + + $ ketos linegen -u NFD -n 100 -f Amiri da1.txt da2.txt da3.txt da4.txt + Reading texts ✓ + Read 114265 unique lines + Sampling 100 lines ✓ + Σ (len: 64) + Symbols: !(),-./0123456789:[]{}«»،؛؟ءابةتثجحخدذرزسشصضطظعغـفقكلمنهوىي– + Combining Characters: ARABIC MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + Writing images ⢿ + +It is also possible to adjust to amount of degradation/distortion of line +images by using the ``-s/-r/-d/-ds`` switches: + +.. code-block:: console + + $ ketos linegen -m 0.2 -s 0.002 -r 0.001 -d 3 Downloads/D/A/da1.txt + Reading texts ✓ + Read 859 unique lines + Σ (len: 46) + Symbols: !"-.:،؛؟ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي + Writing images ⣽ + + +Sometimes the shaping engine misbehaves using some fonts (notably ``GFS +Philostratos``) by rendering texts in certain normalizations incorrectly if the +font does not contain glyphs for decomposed characters. One sign are misplaced +diacritics and glyphs in different fonts. A workaround is renormalizing the +text for rendering purposes (here to NFC): + +.. code-block:: console + + $ ketos linegen -ur NFC -u NFD -f "GFS Philostratos" grc.txt + + diff --git a/2.0.0/_sources/models.rst.txt b/2.0.0/_sources/models.rst.txt new file mode 100644 index 000000000..4ff0b7f90 --- /dev/null +++ b/2.0.0/_sources/models.rst.txt @@ -0,0 +1,58 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +.. 
_pyrnn: + +pyrnn +----- + +These are serialized instances of python ``lstm.SeqRecognizer`` objects. Using +such a model just entails loading the pickle and calling the appropriate +functions to perform recognition much like a shared library in other +programming languages. + +Support for these models has been dropped with kraken 1.0 as python 2.7 is +phased out. + +pronn +----- + +Legacy python models can be converted to a protobuf based serialization. These +are loadable by kraken 1.0 and will be automatically converted to Core ML. + +Protobuf models have several advantages over pickled ones. They are noticeably +smaller (80Mb vs 1.8Mb for the default model), don't allow arbitrary code +execution, and are upward compatible with python 3. Because they are so much +more lightweight they are also loaded much faster. + +clstm +----- + +`clstm `_, a small and fast implementation of +LSTM networks that was used in previous kraken versions. The model files can be +loaded with pytorch-based kraken and will be converted to Core ML. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Conversion +---------- + +Per default pronn/clstm models are automatically converted to the new Core ML +format when explicitely defined using the ``-m`` option to the ``ocr`` utility +on the command line. They are stored in the user kraken directory (default is +~/.kraken) and will be automatically substituted in future runs. + +If conversion is not desired, e.g. because there is a bug in the conversion +routine, it can be disabled using the ``--disable-autoconversion`` switch. diff --git a/2.0.0/_sources/training.rst.txt b/2.0.0/_sources/training.rst.txt new file mode 100644 index 000000000..82f6ec97b --- /dev/null +++ b/2.0.0/_sources/training.rst.txt @@ -0,0 +1,513 @@ +.. _training: + +Training a kraken model +======================= + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Training a new model for kraken requires a variable amount of training data +manually generated from page images which have to be typographically similar to +the target prints that are to be recognized. As the system works on unsegmented +inputs for both training and recognition and its base unit is a text line, +training data are just transcriptions aligned to line images. + +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. 
Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles is usually +required. For complex layouts such as newspapers it is advisable to split the +page manually into columns as the line extraction algorithm run to create +transcription environments does not deal well with non-codex page layouts. A +fairly user-friendly software for semi-automatic batch processing of image +scans is `Scantailor `_ albeit most work can be done +using a standard image editor. + +The total number of scans required depends on the nature of the script to be +recognized. Only features that are found on the page images and training data +derived from it can later be recognized, so it is important that the coverage +of typographic features is exhaustive. Training a single script model for a +fairly small script such as Arabic or Hebrew requires at least 800 lines, while +multi-script models, e.g. combined polytonic Greek and Latin, will require +significantly more transcriptions. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Transcription +------------- + +Transcription is done through local browser based HTML transcription +environments. These are created by the ``ketos transcribe`` command line util +that is part of kraken. Its basic input is just a number of image files and an +output path to write the HTML file to: + +.. code-block:: console + + $ ketos transcribe -o output.html image_1.png image_2.png ... + +While it is possible to put multiple images into a single transcription +environment splitting into one-image-per-HTML will ease parallel transcription +by multiple people. + +The above command reads in the image files, converts them to black and white if +necessary, tries to split them into line images, and puts an editable text +field next to the image in the HTML. + +Transcription has to be diplomatic, i.e. contain the exact character sequence +in the line image, including original orthography. Some deviations, such as +consistently omitting vocalization in Arabic texts, is possible as long as they +are systematic and relatively minor. + +.. note:: + + The page segmentation algorithm extracting lines from images is + optimized for ``western`` page layouts and may recognize lines + erroneously, lumping multiple lines together or cutting them in half. + The most efficient way to deal with these errors is just skipping the + affected lines by leaving the text box empty. + +.. tip:: + + Copy-paste transcription can significantly speed up the whole process. + Either transcribe scans of a work where a digital edition already + exists (but does not for typographically similar prints) or find a + sufficiently similar edition as a base. 
+ +After transcribing a number of lines the results have to be saved, either using +the ``Download`` button on the lower left or through the regular ``Save Page +As`` (CTRL+S) function of the browser. All the work done is contained directly +in the saved files and it is possible to save partially transcribed files and +continue work later. + +Next the contents of the filled transcription environments have to be +extracted through the ``ketos extract`` command: + +.. code-block:: console + + $ ketos extract --output output_directory --normalization NFD *.html + +with + +--output + The output directory where all line image-text pairs (training data) + are written, defaulting to ``training/`` +--normalization + Unicode has code points to encode most glyphs encountered in the wild. + A lesser known feature is that there usually are multiple ways to + encode a glyph. `Unicode normalization + `_ ensures that equal glyphs are + encoded in the same way, i.e. that the encoded representation across + the training data set is consistent and there is only one way the + network can recognize a particular feature on the page. Usually it is + sufficient to set the normalization to Normalization Form + Decomposed (NFD), as it reduces the the size of the overall script to + be recognized slightly. + +The result will be a directory filled with line image text pairs ``NNNNNN.png`` +and ``NNNNNN.gt.txt`` and a ``manifest.txt`` containing a list of all extracted +lines. + +.. note:: + + At this point it is recommended to review the content of the training + data directory before proceeding. + +Training +-------- + +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. ``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. 
While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldomly improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedlyduring the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... 
+ [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . 
} + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occuring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is an central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 
365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/2.0.0/_sources/vgsl.rst.txt b/2.0.0/_sources/vgsl.rst.txt new file mode 100644 index 000000000..c3e475d30 --- /dev/null +++ b/2.0.0/_sources/vgsl.rst.txt @@ -0,0 +1,185 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, heigh, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. 
code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +Convolutional Layers +-------------------- + +.. code-block:: console + + C[{name}](s|t|r|l|m)[{name}],, + s = sigmoid + t = tanh + r = relu + l = linear + m = softmax + +Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying +the selected nonlinearity. + +Recurrent Layers +---------------- + +.. code-block:: console + + L[{name}](f|r|b)(x|y)[s][{name}] LSTM cell with n outputs. + G[{name}](f|r|b)(x|y)[s][{name}] GRU cell with n outputs. + f runs the RNN forward only. + r runs the RNN reversed only. + b runs the RNN bidirectionally. + s (optional) summarizes the output in the requested dimension, return the last step. + +Adds either an LSTM or GRU recurrent layer to the network using eiter the `x` +(width) or `y` (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32` +input will execute 16 independent forward passes on `906x32` tensors resulting +in an output of shape `1, 16, 906, 25`. If this isn't desired either run a +summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1, +906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and +channel dimension for an `1, 1, 906, 512` input to the recurrent layer. + +Helper and Plumbing Layers +-------------------------- + +Max Pool +^^^^^^^^ +.. code-block:: console + + Mp[{name}],[,,] + +Adds a maximum pooling with `(y, x)` kernel_size and `(y_stride, x_stride)` stride. + +Reshape +^^^^^^^ + +.. code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. 
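+To make the shape bookkeeping concrete, here is a small sketch in plain
+PyTorch (illustrative only, not kraken's implementation; it assumes the
+NHWC layout used by the input specifications on this page) of what
+``S1(1x48)1,3`` does to a ``1, 48, 1020, 8`` input:
+
+.. code-block:: python
+
+   import torch
+
+   # (batch, height, width, channels) input as in the worked example below
+   x = torch.randn(1, 48, 1020, 8)
+   n, h, w, c = x.shape
+
+   # fold the height dimension into the channel dimension; the exact
+   # interleaving of height and channel values inside the 384 features
+   # is an implementation detail and may differ in kraken itself
+   y = x.permute(0, 2, 1, 3).reshape(n, 1, w, h * c)
+
+   print(y.shape)   # torch.Size([1, 1, 1020, 384])
+
+The formal description of the splitting parameters follows below.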
+ +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. diff --git a/2.0.0/_static/alabaster.css b/2.0.0/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/2.0.0/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + +div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox 
input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils 
{ + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px -30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { 
+ display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. */ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/2.0.0/_static/basic.css b/2.0.0/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/2.0.0/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + 
background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + +div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, 
blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific 
styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + +dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; 
+ -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} + +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/2.0.0/_static/blla_heatmap.jpg b/2.0.0/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/2.0.0/_static/blla_heatmap.jpg differ diff --git a/2.0.0/_static/blla_output.jpg b/2.0.0/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/2.0.0/_static/blla_output.jpg differ diff --git a/2.0.0/_static/bw.png b/2.0.0/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/2.0.0/_static/bw.png differ diff --git a/2.0.0/_static/custom.css b/2.0.0/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/2.0.0/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/2.0.0/_static/doctools.js b/2.0.0/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/2.0.0/_static/doctools.js @@ -0,0 
+1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if 
(!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/2.0.0/_static/documentation_options.js b/2.0.0/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/2.0.0/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/2.0.0/_static/file.png b/2.0.0/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/2.0.0/_static/file.png differ diff --git a/2.0.0/_static/graphviz.css b/2.0.0/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/2.0.0/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/2.0.0/_static/kraken.png b/2.0.0/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/2.0.0/_static/kraken.png differ diff --git a/2.0.0/_static/kraken_recognition.svg b/2.0.0/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/2.0.0/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... 
+ + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/2.0.0/_static/kraken_segmentation.svg b/2.0.0/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/2.0.0/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/2.0.0/_static/kraken_segmodel.svg b/2.0.0/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/2.0.0/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/2.0.0/_static/kraken_torchseqrecognizer.svg b/2.0.0/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/2.0.0/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/2.0.0/_static/kraken_workflow.svg b/2.0.0/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/2.0.0/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + Recognition + + + + + + + + + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/2.0.0/_static/language_data.js b/2.0.0/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/2.0.0/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" + v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = 
new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/2.0.0/_static/minus.png b/2.0.0/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/2.0.0/_static/minus.png differ diff --git a/2.0.0/_static/normal-reproduction-low-resolution.jpg b/2.0.0/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/2.0.0/_static/normal-reproduction-low-resolution.jpg differ diff --git a/2.0.0/_static/pat.png b/2.0.0/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/2.0.0/_static/pat.png differ diff --git a/2.0.0/_static/plus.png b/2.0.0/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/2.0.0/_static/plus.png differ diff --git a/2.0.0/_static/pygments.css b/2.0.0/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ b/2.0.0/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } 
/* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ +.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/2.0.0/_static/searchtools.js b/2.0.0/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/2.0.0/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. 
+ /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. + objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." 
+ ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. +// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/2.0.0/_static/sphinx_highlight.js b/2.0.0/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/2.0.0/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/2.0.0/advanced.html b/2.0.0/advanced.html new file mode 100644 index 000000000..bc43c4778 --- /dev/null +++ b/2.0.0/advanced.html @@ -0,0 +1,342 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the case of kraken: binarization (converting color and grayscale images into bitonal ones), layout analysis/page segmentation (extracting topological text lines from an image), recognition (feeding text line images into a classifier), and finally serialization of results into an appropriate format such as hOCR or ALTO.

+
+

Input Specification

+

All kraken subcommands operating on input-output pairs, i.e. producing one output document for one input document, follow the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular, subcommands may be chained.

+
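A typical chain runs binarization, segmentation, and recognition on the same input in a single invocation, for example:
$ kraken -i image.tif image.txt binarize segment ocr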
+
+

Binarization

+

The binarization subcommand accepts almost the same parameters as ocropus-nlbin. Only options not related to binarization, e.g. skew detection, are missing. In addition, error checking (image sizes, inversion detection, grayscale enforcement) is always disabled and kraken will happily binarize any image that is thrown at it.

+

Available parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

type

–threshold

FLOAT

–zoom

FLOAT

–escale

FLOAT

–border

FLOAT

–perc

INTEGER RANGE

–range

INTEGER

–low

INTEGER RANGE

–high

INTEGER RANGE

+
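For instance, the threshold can be set explicitly while leaving the remaining parameters at their defaults (0.6 is only an illustrative value, not a recommendation):
$ kraken -i input.tif bw.tif binarize --threshold 0.6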
+
+

Page Segmentation and Script Detection

+

The segment subcommand provides access to two operations: page segmentation into lines and script detection of those lines.

+

Page segmentation is mostly parameterless, although a switch to change the color of column separators has been retained. The segmentation is written as a JSON file containing bounding boxes in reading order and the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom reading order, or vertical-ltr/rtl for vertical lines read from left to right or right to left).

+

The script detection splits the lines extracted by the segmenter into strips sharing a particular script that can then be recognized by supplying appropriate models for each detected script to the ocr subcommand.

+

Combined output from both consists of lists in the boxes field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are ISO 15924 4 character codes.

+
$ kraken -i 14.tif lines.txt segment
+$ cat lines.json
+{
+   "boxes" : [
+    [
+        ["Grek", [561, 216, 1626,309]]
+    ],
+    [
+        ["Latn", [2172, 197, 2424, 244]]
+    ],
+    [
+        ["Grek", [1678, 221, 2236, 320]],
+        ["Arab", [2241, 221, 2302, 320]]
+    ],
+
+        ["Grek", [412, 318, 2215, 416]],
+        ["Latn", [2208, 318, 2424, 416]]
+    ],
+    ...
+   ],
+   "text_direction" : "horizontal-tb"
+}
+
+
+

Script detection is enabled by default; if it is explicitly disabled, the boxes field will contain only a list of line bounding boxes:

+
[546, 216, 1626, 309],
+[2169, 197, 2423, 244],
+[1676, 221, 2293, 320],
+...
+[503, 2641, 848, 2681]
+
+
+

Available page segmentation parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

-d, –text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

–scale FLOAT

Estimate of the average line height on the page

-m, –maxcolseps

Maximum number of columns in the input document. Set to 0 for uni-column layouts.

-b, –black-colseps / -w, –white-colseps

Switch to black column separators.

-r, –remove-hlines / -l, –hlines

Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

+

The parameters specific to the script identification are:

+ + + + + + + + + + + + + + +

option

action

-s/-n

Enables/disables script detection

-a, –allowed-script

Whitelists specific scripts for detection output. Other detected script runs are merged with their adjacent scripts, after a heuristic pre-merging step.

+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client.

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list   ✓
+default (pyrnn) - A converted version of en-default.pyrnn.gz
+toy (clstm) - A toy model trained on 400 lines of the UW3 data set.
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show toy
+name: toy.clstm
+
+A toy model trained on 400 lines of the UW3 data set.
+
+author: Benjamin Kiessling (mittagessen@l.unchti.me)
+http://kraken.re
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get toy
+Retrieving model        ✓
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the show command, e.g.:

+
$ kraken -i ... ... ocr -m toy
+
+
+

Additions and updates to existing models are always welcome! Just open a pull +request or write an email.

+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm
+
+
+

All polytonic Greek text portions will be recognized using the porson.clstm +model while Latin text will be fed into the antiqua.clstm model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
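For example, to recognize Greek lines while skipping Latin ones, the mapping might look like this (porson.clstm is the placeholder model name used above):
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:ignore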

The ocr subcommand is able to serialize the recognition results either as plain text (default), hOCR, ALTO, or abbyyXML containing additional metadata such as bounding boxes and confidences:

+
$ kraken -i ... ... ocr -t # text output
+$ kraken -i ... ... ocr -h # hOCR output
+$ kraken -i ... ... ocr -a # ALTO output
+$ kraken -i ... ... ocr -y # abbyyXML output
+
+
+

hOCR output is slightly different from hOCR files produced by ocropus. Each ocr_line span contains not only the bounding box of the line but also character boxes (x_bboxes attribute) indicating the coordinates of each character. In each line, alternating sequences of alphanumeric and non-alphanumeric (in the Unicode sense) characters are put into ocrx_word spans. Both have bounding boxes as attributes and the recognition confidence for each character in the x_conf attribute.

+

Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/api.html b/2.0.0/api.html new file mode 100644 index 000000000..48b3dee83 --- /dev/null +++ b/2.0.0/api.html @@ -0,0 +1,163 @@ + + + + + + + + kraken API — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken API

+

Kraken provides routines which are usable by third party tools. In general you can expect functions in the kraken package to remain stable. We will try to keep these backward compatible, but as kraken is still in an early development stage and the API is still quite rudimentary, nothing can be guaranteed.

+
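A minimal sketch of driving the pipeline from Python is shown below. It assumes the modules listed on this page expose nlbin, segment, load_any, and rpred with the signatures used in recent kraken releases, so treat it as illustrative rather than normative; toy.clstm is just a placeholder model path.
from PIL import Image
from kraken import binarization, pageseg, rpred
from kraken.lib import models

im = Image.open('image.tif')              # input page image
bw = binarization.nlbin(im)               # binarize with the nlbin algorithm
seg = pageseg.segment(bw)                 # line bounding boxes + text direction
net = models.load_any('toy.clstm')        # load any supported model format
for record in rpred.rpred(net, bw, seg):  # recognize line by line
    print(record.prediction)              # recognized text of each line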
+

kraken.binarization module

+
+
+

kraken.serialization module

+
+
+

kraken.pageseg module

+
+
+

kraken.rpred module

+
+
+

kraken.transcribe module

+
+
+

kraken.linegen module

+
+
+

kraken.lib.models module

+
+
+

kraken.lib.vgsl module

+
+
+

kraken.lib.codec

+
+
+

kraken.lib.train module

+
+
+

kraken.lib.dataset module

+
+
+

kraken.lib.ctc_decoder

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/genindex.html b/2.0.0/genindex.html new file mode 100644 index 000000000..5b2e1ee12 --- /dev/null +++ b/2.0.0/genindex.html @@ -0,0 +1,124 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+ K + | M + +
+

K

+ + +
    +
  • + kraken + +
  • +
+ +

M

+ + +
    +
  • + module + +
  • +
+ + + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/gpu.html b/2.0.0/gpu.html new file mode 100644 index 000000000..048858777 --- /dev/null +++ b/2.0.0/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
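A quick way to check that pytorch actually sees the GPU, and to move training onto it via the device option documented for ketos train (the data path is a placeholder):
$ python3 -c "import torch; print(torch.cuda.is_available())"
$ ketos train -d cuda:0 training_data/*.png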
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/index.html b/2.0.0/index.html new file mode 100644 index 000000000..51a26f3fb --- /dev/null +++ b/2.0.0/index.html @@ -0,0 +1,222 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system forked from ocropus. It is intended to rectify a number of +issues while preserving (mostly) functional equivalence.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

All functionality not pertaining to OCR and prerequisite steps has been +removed, i.e. no more error rate measuring, etc.

+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

kraken requires some external libraries to run. On Debian/Ubuntu they may be +installed using:

+
# apt install libpangocairo-1.0 libxml2 libblas3 liblapack3 python3-dev python3-pip
+
+
+
+

pip

+
$ pip3 install kraken
+
+
+

or by running pip in the git repository:

+
$ pip3 install .
+
+
+
+
+

conda

+

If you are running Anaconda/miniconda, use:

+
$ conda install -c mittagessen kraken
+
+
+
+
+

Models

+

Finally you’ll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user’s kraken directory:

+
$ kraken get default
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.clstm
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+
+
+
+

Quickstart

+

Recognizing text on an image using the default parameters including the +prerequisite steps of binarization and page segmentation:

+
$ kraken -i image.tif image.txt binarize segment ocr
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To binarize a single image using the nlbin algorithm:

+
$ kraken -i image.tif bw.tif binarize
+
+
+

To segment a binarized image into reading-order sorted lines:

+
$ kraken -i bw.tif lines.json segment
+
+
+

To OCR a binarized image using the default RNN and the previously generated +page segmentation:

+
$ kraken -i bw.tif image.txt ocr --lines lines.json
+
+
+

All commands and their parameters are documented; just add the standard --help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training a kraken model.

+
+
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License retained +from the original ocropus distribution.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/ketos.html b/2.0.0/ketos.html new file mode 100644 index 000000000..8443f674e --- /dev/null +++ b/2.0.0/ketos.html @@ -0,0 +1,665 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos command line utility in depth. For a gentle introduction to model training please refer to the tutorial.

+

Thanks to the magic of Connectionist Temporal Classification, the prerequisites for creating a new recognition model are quite modest. The basic requirement is a number of text lines (ground truth) that correspond to line images and some time for training.

+
+

Transcription

+

Transcription is done through local, browser-based HTML transcription environments. These are created by the ketos transcribe command line utility. Its basic input is just a number of image files and an output path to write the HTML file to:

+
$ ketos transcribe -o output.html image_1.png image_2.png ...
+
+
+

While it is possible to put multiple images into a single transcription environment, splitting the input into one image per HTML file will ease parallel transcription by multiple people.

+

The above command reads in the image files, converts them to black and white, tries to split them into line images, and puts an editable text field next to the image in the HTML. There are a handful of options changing the output:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-d, –text-direction

Sets the principal text direction both for the segmenter and in the HTML. Can be one of horizontal-lr, horizontal-rl, vertical-lr, vertical-rl.

–scale

A segmenter parameter giving an estimate of average line height. Usually it shouldn’t be set manually.

–bw / –orig

Disables binarization of input images. If color or grayscale training data is desired this option has to be set.

-m, –maxcolseps

A segmenter parameter limiting the number of columns that can be found in the input image by setting the maximum number of column separators. Set to 0 to disable column detection.

-b, –black_colseps / -w, –white_colseps

A segmenter parameter selecting white or black column separators.

-f, –font

The font family to use for rendering the text in the HTML.

-fs, –font-style

The font style to use in the HTML.

-p, –prefill

A model to use for prefilling the transcription. (Optional)

-o, –output

Output HTML file.

+

It is possible to use an existing model to prefill the transcription environments:

+
$ ketos transcribe -p ~/arabic.mlmodel -o output.html image_1.png image_2.png ...
+
+
+

Transcription has to be diplomatic, i.e. contain the exact character sequence in the line image, including original orthography. Some deviations, such as consistently omitting vocalization in Arabic texts, are possible as long as they are systematic and relatively minor.

+

After transcribing a number of lines, the results have to be saved, either using the Download button on the lower right or through the regular Save Page As function of the browser. All the work done is contained directly in the saved files, and it is possible to save partially transcribed files and continue work later.

+

Next the contents of the filled transcription environments have to be +extracted through the ketos extract command:

+
$ ketos extract --output output_directory *.html
+
+
+

There are some options dealing with color images and text normalization:

+ + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-b, –binarize / –no-binarize

Binarizes color/grayscale images (default) or retains the original in the output.

-u, –normalization

Normalizes text to one of the following Unicode normalization forms: NFD, NFKD, NFC, NFKC

-s, –normalize-whitespace / –no-normalize-whitespace

Normalizes whitespace in extracted text. There are several different Unicode whitespace characters that +are replaced by a standard space when not disabled.

–reorder / –no-reorder

Tells ketos to reorder the code +point for each line into +left-to-right order. Unicode +code points are always in +reading order, e.g. the first +code point in an Arabic line +will be the rightmost +character. This option reorders +them into display order, +i.e. the first code point is +the leftmost, the second one +the next from the left and so +on. The train subcommand +does this automatically, so it +usually isn’t needed.

-r, –rotate / –no-rotate

Skips rotation of vertical lines.

-o, –output

Output directory, defaults to training

+

The result will be a directory filled with line image-text pairs NNNNNN.png and NNNNNN.gt.txt and a manifest.txt containing a list of all extracted lines.

+
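A hypothetical listing of such a directory (file names follow the NNNNNN pattern mentioned above; the exact numbering is illustrative):
$ ls training/
000000.png  000000.gt.txt  000001.png  000001.gt.txt  ...  manifest.txt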
+
+

Training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Training data is in all cases just +a directory containing image-text file pairs as produced by the +transcribe/extract tools. Here are its command line options:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-p, –pad

Left and right padding around lines

-o, –output

Output model file prefix. Defaults to model.

-s, –spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, –append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, –load

Load existing file to continue training

-F, –savefreq

Model save frequency in epochs during +training

-R, –report

Report creation frequency in epochs

-q, –quit

Stop condition for training. Set to early +for early stopping (default) or dumb for fixed +number of epochs.

-N, –epochs

Number of epochs to train for. Set to -1 for indefinite training.

–lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

–min-delta

Minimum improvement between epochs to reset +early stopping. Defaults to 0.005.

-d, –device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

–optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, –lrate

Learning rate [default: 0.001]

-m, –momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, –weight-decay

Weight decay.

–schedule

Sets the learning rate scheduler. May be either constant or 1cycle. For 1cycle +the cycle length is determined by the –epoch option.

-p, –partition

Ground truth data partition ratio between train/validation set

-u, –normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, –codec

Load a codec JSON definition (invalid if loading existing model)

–resize

Codec/output layer resizing option. If set +to add code points will be added, both +will set the layer to match exactly the +training data, fail will abort if training +data and model codec do not match. Only valid when refining an existing model.

-n, –reorder / –no-reorder

Reordering of code points to display order.

-t, –training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, –evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

–preload / –no-preload

Hard enable/disable for training data preloading. Preloading +training data into memory is enabled per default for sets with less than 2500 lines.

–threads

Number of OpenMP threads when running on CPU. Defaults to min(4, #cores).

+
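Several of the options above are commonly combined; a sketch of such an invocation (output prefix and data paths are placeholders):
$ ketos train -d cuda:0 -o syriac -u NFD --lag 10 syr/*.png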
+

From Scratch

+

The absolute minimal example to train a new model is:

+
$ ketos train training_data/*.png
+
+
+

Training will continue until the error stops improving, and the best model (among the intermediate results) will be saved in the current directory.

+
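With the default output prefix this produces a series of checkpoints plus a best model in the current directory; a hypothetical listing (epoch numbers are illustrative):
$ ls
model_1.mlmodel  model_2.mlmodel  ...  model_25.mlmodel  model_best.mlmodel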

In some cases, such as color inputs, changing the network architecture might be +useful:

+
$ ketos train -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.png
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal results, such as stopping training too soon. Adjusting the minimum delta and/or the lag can be useful:

+
$ ketos train --lag 10 --min-delta 0.001 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -i model_best.mlmodel syr/*.png
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel --no-preload kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes for dealing with mismatching alphabets: add and both. add resizes the output layer and codec of the loaded model to include all characters in the new training set without removing any characters. both will make the resulting model an exact match with the new training set by both removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize add -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize both -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In both mode, 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different, the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The new model will behave exactly like one trained from scratch, except that it will potentially train a lot faster.

+
+
+
+

Testing

+

Picking a particular model from a pool or getting a more detailed look at the recognition accuracy can be done with the test command. It uses transcribed lines, the test set, in the same format as the train command, recognizes the line images with one or more models, and creates a detailed report of the differences from the ground truth for each of them.

+
+
-m, --model
+

Model(s) to evaluate.

+
+
-e, --evaluation-files
+

File(s) with paths to evaluation data.

+
+
-d, --device
+

Select device to use.

+
+
-p, --pad
+

Left and right padding around lines.

+
+
+

Transcriptions are handed to the command in the same way as for the train +command, either through a manifest with -e/–evaluation-files or by just +adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain character accuracy measured per script and a detailed list of confusions. When evaluating multiple models the last line of the output will be the average accuracy and the standard deviation across all of them.

+
+
+

Artificial Training Data

+

It is possible to rely on artificially created training data, instead of laboriously creating ground truth by manual means. A proper typeface and some text in the target language will be needed.

+

For many popular historical fonts there are free reproductions which quite closely match printed editions. Most are available in your distribution's repositories and often shipped with TeX Live.

+

Some good places to start for non-Latin scripts are:

+
    +
  • Amiri, a classical Arabic typeface by Khaled +Hosny

  • +
  • The Greek Font Society offers freely +licensed (historical) typefaces for polytonic Greek.

  • +
  • The friendly religious fanatics from SIL +assemble a wide variety of fonts for non-Latin scripts.

  • +
+

Next we need some text to generate artificial line images from. It should be a +typical example of the type of printed works you want to recognize and at least +500-1000 lines in length.

+

A minimal invocation to the line generation tool will look like this:

+
$ ketos linegen -f Amiri da1.txt da2.txt
+Reading texts   ✓
+Read 3692 unique lines
+Σ (len: 99)
+Symbols:  !(),-./0123456789:ABEFGHILMNPRS[]_acdefghiklmnoprstuvyz«»،؟ءآأؤإئابةتثجحخدذرزسشصضطظعغـفقكلمنهوىيپ
+Writing images  ✓
+
+
+

The output will be written to a directory called training_data, although +this may be changed using the -o option. Each text line is rendered using +the Amiri typeface.

+
+

Alphabet and Normalization

+

Let’s take a look at important information in the preamble:

+
Read 3692 unique lines
+Σ (len: 99)
+Symbols:  !(),-./0123456789:ABEFGHILMNPRS[]_acdefghiklmnoprstuvyz«»،؟ﺀﺁﺃﺅﺈﺋﺎﺑﺔﺘﺜﺠﺤﺧﺩﺫﺭﺰﺴﺸﺼﻀﻄﻈﻌﻐـﻔﻘﻜﻠﻤﻨﻫﻭﻰﻳپ
+
+
+

ketos tells us that it found 3692 unique lines which contained 99 different +symbols or code points. We can see the training data contains all of +the Arabic script including accented precomposed characters, but only a subset +of Latin characters, numerals, and punctuation. A trained model will be able to +recognize only these exact symbols, e.g. a C or j on the page will +never be recognized. Either accept this limitation or add additional text lines +to the training corpus until the alphabet matches your needs.

+

We can also force a normalization form using the -u option; per default +none is applied. For example:

+
$ ketos linegen -u NFD -f "GFS Philostratos" grc.txt
+Reading texts   ✓
+Read 2860 unique lines
+Σ (len: 132)
+Symbols:  #&'()*,-./0123456789:;ABCDEGHILMNOPQRSTVWXZ]abcdefghiklmnopqrstuvxy §·ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρςστυφχψω—‘’“
+Combining Characters: COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING DIAERESIS, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING DOT BELOW, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI
+
+
+$ ketos linegen -u NFC -f "GFS Philostratos" grc.txt
+Reading texts   ✓
+Read 2860 unique lines
+Σ (len: 231)
+Symbols:  #&'()*,-./0123456789:;ABCDEGHILMNOPQRSTVWXZ]abcdefghiklmnopqrstuvxy §·ΐΑΒΓΔΕΖΘΙΚΛΜΝΞΟΠΡΣΤΦΧΨΩάέήίαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώἀἁἂἃἄἅἈἌἎἐἑἓἔἕἘἙἜἝἠἡἢἣἤἥἦἧἩἭἮἰἱἳἴἵἶἷἸἹἼὀὁὂὃὄὅὈὉὌὐὑὓὔὕὖὗὙὝὠὡὢὤὥὦὧὨὩὰὲὴὶὸὺὼᾄᾐᾑᾔᾗᾠᾤᾧᾳᾶᾷῃῄῆῇῒῖῥῦῬῳῴῶῷ—‘’“
+Combining Characters: COMBINING ACUTE ACCENT, COMBINING DOT BELOW
+
+
+

While there hasn't been any study on the effect of different normalizations on recognition accuracy, there are some benefits to NFD, namely decreased model size and easier validation of the alphabet.

+
+
+

Other Parameters

+

Sometimes it is desirable to draw a certain number of lines randomly from one +or more large texts. The -n option does just that:

+
$ ketos linegen -u NFD -n 100 -f Amiri da1.txt da2.txt da3.txt da4.txt
+Reading texts   ✓
+Read 114265 unique lines
+Sampling 100 lines      ✓
+Σ (len: 64)
+Symbols:  !(),-./0123456789:[]{}«»،؛؟ءابةتثجحخدذرزسشصضطظعغـفقكلمنهوىي–
+Combining Characters: ARABIC MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+Writing images ⢿
+
+
+

It is also possible to adjust the amount of degradation/distortion of line images by using the -s/-r/-d/-ds switches:

+
$ ketos linegen -m 0.2 -s 0.002 -r 0.001 -d 3 Downloads/D/A/da1.txt
+Reading texts   ✓
+Read 859 unique lines
+Σ (len: 46)
+Symbols:  !"-.:،؛؟ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي
+Writing images  ⣽
+
+
+

Sometimes the shaping engine misbehaves with some fonts (notably GFS Philostratos) by rendering texts in certain normalizations incorrectly if the font does not contain glyphs for decomposed characters. One sign is misplaced diacritics and glyphs in different fonts. A workaround is renormalizing the text for rendering purposes (here to NFC):

+
$ ketos linegen -ur NFC -u NFD -f "GFS Philostratos" grc.txt
+
+
+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/models.html b/2.0.0/models.html new file mode 100644 index 000000000..c4c7c5c17 --- /dev/null +++ b/2.0.0/models.html @@ -0,0 +1,155 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: pronn +files serializing old pickled pyrnn models as protobuf, clstm’s native +serialization, and versatile Core ML models.

+
+

pyrnn

+

These are serialized instances of python lstm.SeqRecognizer objects. Using +such a model just entails loading the pickle and calling the appropriate +functions to perform recognition much like a shared library in other +programming languages.

+

Support for these models has been dropped with kraken 1.0 as python 2.7 is +phased out.

+
+
+

pronn

+

Legacy python models can be converted to a protobuf based serialization. These +are loadable by kraken 1.0 and will be automatically converted to Core ML.

+

Protobuf models have several advantages over pickled ones. They are noticeably smaller (the default model shrinks from 80Mb pickled to 1.8Mb as protobuf), don't allow arbitrary code execution, and are upward compatible with python 3. Because they are so much more lightweight they are also loaded much faster.

+
+
+

clstm

+

clstm is a small and fast implementation of LSTM networks that was used in previous kraken versions. The model files can be loaded with pytorch-based kraken and will be converted to Core ML.

+
+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken.

+
+
+

Conversion

+

Per default pronn/clstm models are automatically converted to the new Core ML format when explicitly defined using the -m option to the ocr utility on the command line. They are stored in the user kraken directory (default is ~/.kraken) and will be automatically substituted in future runs.

+
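For example, pointing the -m option at a hypothetical local clstm file converts it on first use and caches the Core ML version in ~/.kraken:
$ kraken -i image.tif image.txt binarize segment ocr -m ./mymodel.clstm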

If conversion is not desired, e.g. because there is a bug in the conversion +routine, it can be disabled using the --disable-autoconversion switch.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/objects.inv b/2.0.0/objects.inv new file mode 100644 index 000000000..35f9fb65d --- /dev/null +++ b/2.0.0/objects.inv @@ -0,0 +1,7 @@ +# Sphinx inventory version 2 +# Project: kraken +# Version: +# The remainder of this file is compressed using zlib. +xڅ�1o� �w~�I��]�y�"���$�)\md�-C����l q�np���Nu�k4���k#o�x�^*w��d[��wnJ�N�d# {��6����S�柨W��v%jU0 5 �V�QF�O�;W ? GV����_��/.� �;�Tc��M�p/�o�Kc�]ccd,r��b�P+��b�.�=��s�N��B���9��[�y��c�m�͍`;�Zy��GV��j̣�3��d��;Q� ���D�b�Ln�`����>����y�2�$������8s�S���t���j�- +��}� +qA棞i'M� \ No newline at end of file diff --git a/2.0.0/py-modindex.html b/2.0.0/py-modindex.html new file mode 100644 index 000000000..3458216d8 --- /dev/null +++ b/2.0.0/py-modindex.html @@ -0,0 +1,114 @@ + + + + + + + Python Module Index — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Python Module Index

+ +
+ k +
+ + + + + + + +
 
+ k
+ kraken +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/search.html b/2.0.0/search.html new file mode 100644 index 000000000..3a7e3b28e --- /dev/null +++ b/2.0.0/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

Search

+ + + + +

+ Searching for multiple words only shows matches that contain + all words. +

+ + +
+ + + +
+ + +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/searchindex.js b/2.0.0/searchindex.js new file mode 100644 index 000000000..ad7bb4c11 --- /dev/null +++ b/2.0.0/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"Advanced Usage": [[0, null]], "Alphabet and Normalization": [[4, "alphabet-and-normalization"]], "Artificial Training Data": [[4, "artificial-training-data"]], "Basics": [[7, "basics"]], "Binarization": [[0, "binarization"]], "Conversion": [[5, "conversion"]], "Convolutional Layers": [[7, "convolutional-layers"]], "CoreML": [[5, "coreml"]], "Evaluation and Validation": [[6, "evaluation-and-validation"]], "Examples": [[7, "examples"]], "Features": [[3, "features"]], "Fine Tuning": [[4, "fine-tuning"]], "From Scratch": [[4, "from-scratch"]], "GPU Acceleration": [[2, null]], "Helper and Plumbing Layers": [[7, "helper-and-plumbing-layers"]], "Image acquisition and preprocessing": [[6, "image-acquisition-and-preprocessing"]], "Input Specification": [[0, "input-specification"]], "Installation": [[3, "installation"]], "Installing kraken": [[6, "installing-kraken"]], "License": [[3, "license"]], "Max Pool": [[7, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[3, "models"], [5, null]], "Other Parameters": [[4, "other-parameters"]], "Page Segmentation and Script Detection": [[0, "page-segmentation-and-script-detection"]], "Quickstart": [[3, "quickstart"]], "Recognition": [[0, "recognition"], [6, "recognition"]], "Recurrent Layers": [[7, "recurrent-layers"]], "Regularization Layers": [[7, "regularization-layers"]], "Reshape": [[7, "reshape"]], "Slicing": [[4, "slicing"]], "Testing": [[4, "testing"]], "Training": [[4, null], [4, "id1"], [6, "id1"]], "Training Tutorial": [[3, "training-tutorial"]], "Training a kraken model": [[6, null]], "Transcription": [[4, "transcription"], [6, "transcription"]], "VGSL network specification": [[7, null]], "clstm": [[5, "clstm"]], "conda": [[3, "conda"]], "kraken": [[3, null]], "kraken API": [[1, null]], "kraken.binarization module": [[1, "kraken-binarization-module"]], "kraken.lib.codec": [[1, "kraken-lib-codec"]], "kraken.lib.ctc_decoder": [[1, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[1, "kraken-lib-dataset-module"]], "kraken.lib.models module": [[1, "kraken-lib-models-module"]], "kraken.lib.train module": [[1, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[1, "kraken-lib-vgsl-module"]], "kraken.linegen module": [[1, "kraken-linegen-module"]], "kraken.pageseg module": [[1, "kraken-pageseg-module"]], "kraken.rpred module": [[1, "kraken-rpred-module"]], "kraken.serialization module": [[1, "kraken-serialization-module"]], "kraken.transcribe module": [[1, "kraken-transcribe-module"]], "pip": [[3, "pip"]], "pronn": [[5, "pronn"]], "pyrnn": [[5, "pyrnn"]]}, "docnames": ["advanced", "api", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"kraken": [[1, "module-kraken", false]], "module": [[1, "module-kraken", false]]}, "objects": {"": [[1, 0, 0, "-", "kraken"]]}, "objnames": {"0": ["py", "module", "Python module"]}, "objtypes": {"0": "py:module"}, 
"terms": {"": [0, 3, 4, 5, 6, 7], "0": [0, 3, 4, 5, 6, 7], "00": [4, 6], "001": [4, 6], "002": 4, "005": 4, "0123456789": [3, 4, 6], "01c59": 7, "0245": 6, "04": 6, "06": 6, "09": 6, "0d": 6, "1": [3, 4, 5, 6, 7], "10": [4, 6], "100": [4, 6, 7], "1000": 4, "1020": 7, "1024": 7, "106": 4, "108": 4, "11": 6, "114265": 4, "1184": 6, "12": [4, 6, 7], "122": 4, "128": 7, "13": 6, "132": [4, 6], "1339": 6, "1359": 6, "136": 4, "14": [0, 4], "1416": [3, 6], "143": 6, "144": 4, "15": 6, "1558": 6, "1567": 4, "157": 6, "15924": 0, "16": [4, 7], "161": 6, "1623": 6, "1626": 0, "1676": 0, "1678": 0, "1681": 6, "1697": 6, "17": 4, "172": 4, "1724": 6, "174": 4, "1754": 6, "176": 6, "18": [4, 6], "19": 4, "192": 4, "197": 0, "199": 4, "1996": [3, 6], "1cycl": 4, "1d": 7, "1st": [3, 6], "1x12": [4, 7], "1x16": 7, "1x48": 7, "2": [3, 4, 5, 6, 7], "20": [4, 7], "204": 6, "207": 4, "2096": 6, "215": 4, "216": 0, "2169": 0, "2172": 0, "22": [4, 6], "2208": 0, "221": 0, "2215": 0, "2236": 0, "2241": 0, "2293": 0, "23": 4, "2302": 0, "231": 4, "2334": 6, "2364": 6, "24": 6, "241": 4, "2423": 0, "2424": 0, "244": 0, "246": 4, "25": [4, 6, 7], "2500": [4, 6], "256": [4, 6, 7], "259": 6, "26": 6, "2641": 0, "266": 4, "2681": 0, "27": 4, "270": 6, "27046": 6, "274": 4, "28": 4, "2860": 4, "29": 4, "2d": 7, "3": [4, 5, 6, 7], "30": [4, 6], "300dpi": 6, "307": 6, "309": 0, "31": 4, "318": 0, "32": [4, 7], "320": 0, "328": 4, "336": 6, "3418": 6, "35000": [3, 6], "3504": 6, "3519": 6, "35619": 6, "365": [3, 6], "3680": 6, "3692": 4, "38": 4, "384": 7, "39": 4, "4": [0, 4, 6, 7], "40": 6, "400": 0, "412": 0, "416": 0, "428": 6, "431": 6, "46": 4, "47": 6, "48": [4, 6, 7], "488": 6, "49": [4, 6], "5": [4, 6, 7], "50": [4, 6], "500": 4, "503": 0, "512": 7, "52": [4, 6], "5226": 4, "5230": 4, "5258": 6, "536": 4, "545": 6, "546": 0, "56": 6, "561": 0, "575": 4, "577": 6, "59": [6, 7], "5951": 6, "599": 6, "6": [4, 6, 7], "60": [4, 6], "6022": 4, "62": 4, "63": 4, "64": [4, 7], "646": 6, "66": [4, 6], "7": [4, 5, 6, 7], "7012": 4, "7015": 6, "71": 4, "7272": 6, "7281": 6, "7593": 4, "773": 4, "7857": 4, "788": [4, 6], "794": 4, "7943": 4, "8": [4, 6, 7], "800": 6, "8014": 4, "80mb": 5, "81": [4, 6], "811": 6, "82": 4, "824": 6, "8337": 4, "8344": 4, "8374": 4, "84": 6, "8445": 6, "8479": 6, "848": 0, "8481": 6, "8482": 6, "8484": 6, "8485": 6, "8486": 6, "8487": 6, "8488": 6, "8489": 6, "8490": 6, "8491": 6, "8492": 6, "8493": 6, "8494": 6, "8495": 6, "8496": 6, "8497": 6, "8498": 6, "8499": 6, "8500": 6, "8501": 6, "8502": 6, "8503": 6, "8504": 6, "8505": 6, "8506": 6, "8507": 6, "8508": 6, "8509": 6, "8510": 6, "8511": 6, "8512": 6, "859": 4, "8616": 4, "8620": 4, "876": 6, "8760": 4, "8762": 4, "8790": 4, "8795": 4, "8797": 4, "88": [4, 6], "8802": 4, "8804": 4, "8806": 4, "8813": 4, "8876": 4, "8878": 4, "8883": 4, "889": [3, 6], "8mb": 5, "9": [4, 6, 7], "906": 7, "906x32": 7, "9315": 6, "9318": 6, "9350": 6, "9361": 6, "9381": 6, "9541": 6, "9550": 6, "96": 6, "97": [3, 6], "98": 6, "99": [4, 6], "9918": 6, "9920": 6, "9924": 6, "A": [0, 3, 4, 6, 7], "As": [4, 6], "At": 6, "By": 6, "For": [0, 4, 6, 7], "If": [0, 3, 4, 5, 6, 7], "In": [0, 1, 4, 6], "It": [0, 3, 4, 6], "Its": [4, 6], "NO": 6, "On": 3, "One": 4, "The": [0, 2, 4, 5, 6, 7], "There": [0, 3, 4, 5, 6], "These": [4, 5, 6], "To": [0, 3, 4, 6], "_acdefghiklmnoprstuvyz": 4, "abbyxml": 3, "abbyyxml": 0, "abcdefghiklmnopqrstuvxi": 4, "abcdeghilmnopqrstvwxz": 4, "abefghilmnpr": 4, "abl": [0, 4], "abort": [4, 6], "about": 6, "abov": [3, 4, 6], "absolut": 4, 
"acceler": [4, 6], "accent": 4, "accept": [0, 4], "access": 0, "account": 6, "accuraci": [3, 4, 6], "achiev": 6, "across": [4, 6], "action": [0, 4], "activ": [4, 6, 7], "actual": [3, 6], "acut": 4, "ad": [4, 6], "adam": 4, "add": [3, 4, 7], "addit": [0, 4], "adjac": 0, "adjust": [4, 6], "advantag": 5, "advis": 6, "affect": 6, "after": [0, 4, 6, 7], "again": 6, "ah": [3, 6], "aku": [3, 6], "al": [3, 6], "alam": [3, 6], "albeit": 6, "algorithm": [0, 3, 6], "align": 6, "all": [0, 3, 4, 5, 6], "allow": [0, 4, 5, 6], "almost": 0, "along": 7, "alphabet": [3, 6, 7], "alphanumer": 0, "alreadi": 6, "also": [0, 4, 5, 6], "altern": [0, 7], "although": [0, 4], "alto": [0, 3, 6], "alwai": [0, 3, 4], "amiri": 4, "amiss": 6, "among": 4, "amount": [4, 6], "an": [0, 1, 3, 4, 6, 7], "anaconda": 3, "analysi": [0, 6], "ani": [0, 4], "annot": 0, "anoth": [4, 6, 7], "antiqua": 0, "anymor": [4, 6], "apach": 3, "apart": 2, "append": [4, 6, 7], "appli": [4, 6, 7], "approach": 6, "appropri": [0, 5, 6, 7], "apt": 3, "ar": [0, 1, 3, 4, 5, 6, 7], "arab": [0, 3, 4, 6], "arbitrari": [5, 6, 7], "architectur": [3, 4, 5, 7], "archiv": 6, "argument": 4, "around": [4, 6], "assembl": 4, "assign": 6, "attribut": 0, "augment": [4, 6, 7], "author": 0, "autoconvers": 5, "automat": [0, 4, 5, 6, 7], "avail": [0, 3, 4, 6], "averag": [0, 4, 6], "axi": 7, "b": [0, 4, 6, 7], "backend": 2, "backward": 1, "base": [4, 5, 6, 7], "basic": [0, 4, 6], "batch": [6, 7], "bayr\u016bt": [3, 6], "becaus": [5, 6], "been": [0, 3, 4, 5, 6], "befor": [4, 6, 7], "beforehand": 6, "behav": [4, 7], "being": 7, "below": [3, 4, 6], "benefit": 4, "benjamin": 0, "best": [4, 6], "between": [0, 4, 6], "bi": 7, "bidi": 3, "bidirection": 7, "binar": [3, 4, 6], "biton": 0, "black": [0, 4, 6], "black_colsep": 4, "block": 7, "border": 0, "both": [0, 2, 4, 6], "bottom": [0, 3], "bound": [0, 3], "box": [0, 3, 6], "break": 6, "browser": [4, 6], "bug": 5, "build": [4, 6], "buld\u0101n": [3, 6], "button": [4, 6], "bw": [3, 4], "bw_imag": 6, "c": [3, 4, 7], "call": [4, 5, 6], "can": [0, 1, 2, 3, 4, 5, 6], "capabl": 4, "case": [0, 3, 4, 6], "cat": 0, "caveat": 4, "ce": [3, 6], "cell": 7, "cent": 6, "central": [3, 6], "certain": [0, 4, 6], "chain": [0, 6], "chang": [0, 4], "channel": 7, "charact": [0, 3, 4, 5, 6], "check": 0, "circumst": 6, "class": [4, 6], "classic": [3, 4, 6], "classif": [4, 6, 7], "classifi": [0, 7], "claus": 6, "client": 0, "clone": 0, "close": 4, "clstm": [0, 3], "code": [0, 3, 4, 5, 6], "codec": 4, "codex": 6, "color": [0, 4, 6, 7], "colsep": 0, "column": [0, 4, 6], "com": 6, "combin": [0, 4, 6, 7], "comma": 4, "command": [0, 3, 4, 5, 6], "common": [4, 6], "compact": 5, "compat": [1, 2, 4, 5], "complet": [4, 6], "complex": 6, "compress": 6, "compris": 6, "comput": [2, 6], "computation": 6, "conda": 6, "condit": [3, 4], "confid": 0, "confus": 4, "connect": 6, "connectionist": 4, "consist": [0, 4, 6, 7], "constant": 4, "construct": 6, "contain": [0, 4, 5, 6], "content": [4, 6], "continu": [4, 6], "contrast": 6, "contribut": 3, "conv": [4, 7], "convers": 6, "convert": [0, 4, 5, 6], "convolut": 4, "coordin": 0, "copi": 6, "core": [4, 5], "corpu": 4, "correct": [4, 6], "correspond": [0, 4], "cost": 6, "count": [4, 6], "coupl": 6, "coverag": 6, "cpu": [4, 6], "cr3": [4, 7], "creat": [4, 6, 7], "creation": 4, "ctc": 4, "ctrl": 6, "cuda": [2, 4], "cudnn": 2, "curat": 0, "current": [4, 5], "cut": [3, 6], "cycl": 4, "d": [0, 3, 4, 6, 7], "da1": 4, "da2": 4, "da3": 4, "da4": 4, "data": [0, 6, 7], "de": [3, 6], "deal": [4, 6], "debian": 3, "debug": [4, 6], 
"decai": 4, "decid": 0, "decompos": [4, 6], "decreas": [4, 6], "deem": 0, "default": [0, 3, 4, 5, 6, 7], "defin": [0, 4, 5, 7], "definit": [4, 7], "degrad": 4, "degre": 6, "delet": [4, 6], "delta": 4, "depend": [0, 6], "depth": [4, 6, 7], "deriv": 6, "describ": 4, "descript": [0, 4], "desir": [4, 5, 7], "destroi": 4, "detail": [0, 4, 6], "detect": [3, 4], "determin": 4, "dev": 3, "develop": 1, "deviat": [4, 6], "devic": [4, 6], "diacrit": 4, "diaeres": 6, "diaeresi": [4, 6], "dialect": 7, "dice": 4, "differ": [0, 4, 6, 7], "digit": 6, "dim": [4, 6, 7], "dimens": 7, "diplomat": [4, 6], "direct": [0, 4, 6, 7], "directli": [4, 6], "directori": [3, 4, 5, 6], "disabl": [0, 4, 5, 6], "disk": 6, "displai": 4, "distort": 4, "distribut": [3, 4, 7], "do": [3, 4, 5, 6, 7], "do0": [4, 7], "document": [0, 3, 4, 6], "doe": [4, 6], "doesn": 6, "don": 5, "done": [4, 6], "dot": [4, 6], "down": 6, "download": [3, 4, 6], "draw": 4, "drop": [5, 7], "dropout": [4, 6, 7], "dumb": 4, "dure": [4, 6], "e": [0, 3, 4, 5, 6, 7], "each": [0, 4, 6], "earli": [1, 4, 6], "eas": [4, 6], "easier": 4, "easiest": 6, "easili": 6, "edit": [3, 4, 6], "editor": 6, "edu": [3, 6], "effect": 4, "effici": 6, "eiter": 7, "either": [0, 4, 6, 7], "email": 0, "emploi": 6, "empti": 6, "en": 0, "enabl": [0, 2, 4, 6, 7], "encod": 6, "encount": 6, "enforc": [0, 4], "engin": 4, "english": 3, "enough": 6, "ensur": 6, "entail": 5, "env": 6, "environ": [4, 6], "epoch": [4, 6], "equal": [6, 7], "equival": [3, 7], "erron": 6, "error": [0, 3, 4, 6], "escal": 0, "estim": [0, 4, 6], "etc": 3, "evalu": [0, 4], "even": 6, "everyth": 4, "exact": [4, 6], "exactli": 4, "exampl": [4, 6], "except": 4, "execut": [0, 5, 6, 7], "exhaust": 6, "exist": [0, 4, 6], "expect": [1, 6, 7], "experi": 6, "experiment": [3, 6], "explicit": [4, 5], "explicitli": [0, 6], "extend": 7, "extent": 6, "extern": 3, "extract": [0, 3, 4, 6], "f": [4, 6, 7], "fail": 4, "fairli": 6, "fallback": 0, "fals": [4, 6, 7], "famili": 4, "fanat": 4, "faq\u012bh": [3, 6], "fast": 5, "faster": [4, 5, 6, 7], "featur": [0, 6, 7], "fed": [0, 7], "feed": 0, "feminin": 6, "fetch": 6, "few": [0, 4], "field": [0, 4, 6], "file": [0, 3, 4, 5, 6], "fill": [4, 6], "filter": [4, 7], "final": [0, 3, 4, 6, 7], "find": 6, "finish": 6, "first": [4, 6, 7], "fit": 6, "fix": [4, 6], "flag": 3, "float": 0, "follow": [0, 4, 7], "font": 4, "forc": 4, "fork": 3, "form": [4, 6], "format": [0, 4, 5, 6], "formul": 7, "forward": 7, "found": [4, 6], "free": 4, "freeli": [0, 4, 6], "frequenc": [4, 6], "friendli": [4, 6], "from": [0, 2, 3, 6, 7], "full": 6, "function": [1, 3, 4, 5, 6], "further": 3, "futur": 5, "g": [0, 4, 5, 6, 7], "garantue": 1, "gener": [0, 1, 3, 4, 6], "gentl": 4, "get": [0, 3, 4, 6], "gf": 4, "git": [0, 3], "githubusercont": 6, "give": 4, "given": [4, 7], "glyph": [4, 6], "good": 4, "gpu": 4, "graph": 7, "graphem": 6, "grave": 4, "grayscal": [0, 4, 6, 7], "grc": 4, "greek": [0, 4, 6], "grei": 0, "grek": 0, "ground": [3, 4, 6], "group": 6, "gru": 7, "gt": [4, 6], "guid": 6, "gz": 0, "h": [0, 6], "ha": [0, 3, 4, 5, 6, 7], "half": 6, "hamza": [3, 4, 6], "hand": [4, 6], "happili": 0, "hard": [4, 6], "hasn": 4, "have": [0, 2, 3, 4, 5, 6], "hebrew": 6, "heigh": 7, "height": [0, 4, 7], "held": 6, "help": [3, 6], "here": 4, "heurist": 0, "high": [0, 6, 7], "higher": 7, "highli": [4, 6], "histor": 4, "hline": 0, "hocr": [0, 3, 6], "horizont": [0, 4], "hosni": 4, "hour": 6, "how": 6, "html": [4, 6], "http": [0, 6], "h\u0101d\u012b": [3, 6], "i": [0, 1, 3, 4, 5, 6, 7], "ibn": [3, 6], "identif": 0, "identifi": 0, 
"ignor": [0, 4], "imag": [0, 3, 4, 7], "image_1": [4, 6], "image_2": [4, 6], "implement": [5, 7], "import": [4, 6], "importantli": 6, "improv": [0, 4, 6], "includ": [0, 3, 4, 6], "incorrect": 6, "incorrectli": 4, "increas": 6, "indefinit": 4, "independ": 7, "index": 4, "indic": [0, 4, 6], "infer": 4, "inform": [0, 3, 4, 6], "inherit": [4, 6], "initi": [4, 6, 7], "input": [4, 6, 7], "input_1": [0, 6], "input_2": [0, 6], "input_imag": 6, "insert": [4, 6, 7], "inspect": 6, "instal": 2, "instanc": 5, "instead": [4, 6], "insuffici": 6, "integ": [0, 6, 7], "intend": 3, "intens": 6, "intermedi": [4, 6], "introduct": 4, "intuit": 7, "invalid": 4, "invers": 0, "invoc": 4, "invok": 6, "involv": 6, "isn": [4, 7], "iso": 0, "issu": 3, "iter": [3, 6], "its": [4, 6], "j": 4, "jpeg": 6, "json": [0, 3, 4], "just": [0, 3, 4, 5, 6], "kamil": 4, "keep": 1, "kei": 3, "kernel": [4, 7], "kernel_s": 7, "keto": [4, 6], "keyword": 0, "khale": 4, "kiessl": 0, "kind": [5, 6], "kit\u0101b": [3, 6], "know": 6, "known": 6, "kraken": [0, 2, 5, 7], "kutub": [3, 6], "l": [0, 6, 7], "labor": 4, "lack": 6, "lag": 4, "languag": [4, 5, 7], "larg": [4, 6], "larger": 6, "last": [4, 7], "later": [4, 6], "latest": 2, "latin": [0, 4, 6], "latn": 0, "layer": [4, 6], "layout": [0, 6], "lbx100": [4, 6, 7], "lbx128": [4, 7], "lbx256": [4, 7], "learn": 4, "least": [4, 6], "leav": [6, 7], "left": [0, 3, 4, 6], "leftmost": 4, "legaci": [4, 5, 6, 7], "leipzig": [3, 6], "len": 4, "length": 4, "less": [4, 6], "lesser": 6, "let": [4, 6], "level": 6, "lfx25": 7, "lfys20": 7, "lfys64": [4, 7], "libblas3": 3, "liblapack3": 3, "libpangocairo": 3, "libr": 3, "librari": [3, 5], "libxml2": 3, "licens": [0, 4], "lightweight": [3, 5], "like": [4, 5, 6], "limit": 4, "line": [0, 3, 4, 5, 6, 7], "linear": [4, 6, 7], "linegen": 4, "linux": 6, "list": [0, 3, 4, 6], "live": 4, "ll": 3, "load": [3, 4, 5, 6], "loadabl": 5, "local": [4, 6], "log": [4, 6], "long": [4, 6], "look": [4, 6], "lossless": 6, "lot": 4, "low": 0, "lower": [4, 6], "lr": [0, 4, 6], "lrate": 4, "lstm": [5, 7], "ltr": 0, "lump": 6, "m": [0, 4, 5, 6, 7], "mac": 6, "maddah": [3, 4, 6], "made": 6, "magic": 4, "mai": [0, 3, 4, 6], "main": 3, "make": 4, "mani": 4, "manifest": [4, 6], "manual": [4, 6], "map": 0, "mark": 6, "markedli": 6, "master": 6, "match": 4, "matter": 6, "maxcolsep": [0, 4], "maxim": [3, 6], "maximum": [0, 4, 7], "maxpool": [4, 7], "me": 0, "mean": [4, 6], "measur": [3, 4], "memori": [4, 6], "merg": 0, "metadata": [0, 3, 5, 6], "might": [4, 6], "min": 4, "miniconda": 3, "minim": 4, "minimum": 4, "minor": [4, 6], "misbehav": 4, "mismatch": [4, 6], "misplac": 4, "misrecogn": 6, "miss": [0, 4, 6], "mittagessen": [0, 3, 6], "ml": 5, "mlmodel": [4, 6], "mode": 4, "model": [4, 7], "model_1": 4, "model_25": 4, "model_5": 4, "model_best": 4, "model_fil": 6, "model_nam": 6, "model_name_best": 6, "modern": 6, "modest": 4, "modif": 4, "momentum": [4, 6], "more": [0, 3, 4, 5, 6, 7], "most": [4, 6], "mostli": [0, 3, 6, 7], "move": [6, 7], "mp": 7, "mp2": [4, 7], "mp3": [4, 7], "much": 5, "multi": [0, 3, 6], "multipl": [0, 4, 6], "n": [0, 4, 7], "name": [0, 3, 4, 6, 7], "nativ": 5, "natur": 6, "necessari": 6, "need": 4, "net": 6, "network": [3, 4, 5, 6], "neural": [5, 6], "never": [4, 6], "new": [2, 4, 5, 6, 7], "newspap": 6, "next": [4, 6], "nfc": 4, "nfd": [4, 6], "nfkc": 4, "nfkd": 4, "nlbin": [0, 3], "nnnnnn": [4, 6], "non": [0, 4, 6, 7], "none": [4, 6, 7], "nonlinear": 7, "normal": [0, 6, 7], "notabl": 4, "noth": 1, "notic": 5, "now": 6, "number": [0, 3, 4, 6, 7], "numer": [4, 6], 
"nvidia": 2, "o": [4, 6], "o1c103": 7, "object": 5, "obvious": 6, "occur": 6, "ocr": [0, 3, 5, 6], "ocr_lin": 0, "ocropu": [0, 3], "ocrx_word": 0, "off": [4, 6], "offer": 4, "often": [4, 6], "old": 5, "omit": [3, 4, 6], "onc": 4, "one": [0, 4, 6, 7], "ones": [0, 4, 5], "onli": [0, 4, 6, 7], "open": 0, "openmp": [4, 6], "oper": [0, 7], "optic": [0, 6], "optim": [4, 6], "option": [0, 4, 5, 7], "order": [0, 3, 4, 7], "orig": 4, "origin": [3, 4, 6], "orthographi": [4, 6], "other": [0, 5, 6, 7], "otherwis": 4, "out": [4, 5, 6, 7], "output": [0, 3, 4, 6, 7], "output_1": [0, 6], "output_2": [0, 6], "output_dir": 6, "output_directori": [4, 6], "output_fil": 6, "over": 5, "overal": 6, "overfit": 6, "overrid": 4, "p": 4, "packag": [1, 6], "pad": 4, "page": [3, 4, 6], "pair": [0, 4, 6], "paragraph": 0, "parallel": [4, 6], "param": [4, 6, 7], "paramet": [0, 3, 6], "parameterless": 0, "part": [6, 7], "parti": 1, "partial": [4, 6], "particular": [0, 4, 6, 7], "partit": 4, "pass": [6, 7], "past": 6, "path": [4, 6], "pattern": 6, "pdf": 6, "pdfimag": 6, "pdftocairo": 6, "peopl": [4, 6], "per": [4, 5, 6], "perc": 0, "perform": [5, 6], "period": 6, "perispomeni": 4, "pertain": 3, "phase": 5, "philostrato": 4, "pick": 4, "pickl": 5, "pinpoint": 6, "pip3": 3, "pixel": 7, "place": [0, 3, 4, 6], "placement": 6, "plain": 0, "pleas": 4, "png": [4, 6], "point": [4, 6], "polyton": [0, 4, 6], "pool": 4, "popular": 4, "porson": 0, "portion": 0, "possibl": [0, 4, 6], "potenti": 4, "pre": 0, "preambl": 4, "precompos": 4, "prefer": 6, "prefil": 4, "prefilt": 0, "prefix": [4, 6], "prefix_epoch": 6, "preload": [4, 6], "prepar": [3, 6], "prepend": 7, "prerequisit": [3, 4], "preserv": 3, "prevent": 6, "previou": 5, "previous": [3, 4], "princip": [0, 4], "print": [0, 4, 6], "prob": 7, "probabl": [4, 6, 7], "proceed": 6, "process": [3, 4, 6, 7], "produc": [0, 4, 6], "program": 5, "progress": 6, "project": 7, "proper": 4, "properli": 6, "protobuf": 5, "prove": 6, "provid": [0, 1, 3, 7], "public": 3, "pull": [0, 3], "punctuat": 4, "purpos": [4, 6, 7], "put": [0, 4, 6], "pyrnn": 0, "python": 5, "python3": 3, "pytorch": [2, 5], "q": 4, "qualiti": 6, "quit": [1, 4], "r": [0, 4, 7], "rais": 4, "random": 6, "randomli": 4, "rang": 0, "rapidli": 6, "rate": [3, 4, 6], "ratio": 4, "raw": 6, "re": 0, "reach": 6, "read": [0, 3, 4, 6], "real": 6, "recogn": [0, 3, 4, 6], "recognit": [2, 3, 4, 5, 7], "recommend": 6, "rectifi": 3, "recurr": 5, "reduc": [6, 7], "refer": [4, 6], "refin": 4, "regular": [4, 6], "rel": [4, 6], "relat": [0, 6], "relax": 6, "reli": 4, "reliabl": 6, "religi": 4, "relu": 7, "remain": [1, 6], "remaind": 7, "remedi": 6, "remov": [0, 3, 4, 6, 7], "render": 4, "renorm": 4, "reorder": [4, 6], "repeatedlydur": 6, "replac": 4, "report": [4, 6], "repositori": [3, 4, 6], "represent": 6, "reproduct": 4, "request": [0, 3, 7], "requir": [0, 3, 4, 6, 7], "reset": 4, "reshap": 4, "resiz": 4, "respect": 7, "result": [0, 4, 6, 7], "resum": 4, "retain": [0, 3, 4], "retrain": 6, "retriev": [0, 3, 6], "return": 7, "revers": [4, 7], "review": 6, "rgb": 7, "right": [0, 3, 4, 6], "rightmost": 4, "rl": [0, 4], "rmsprop": [4, 6], "rnn": [3, 4, 6, 7], "romanov": [3, 6], "rotat": 4, "rough": 6, "routin": [1, 5], "rtl": 0, "rudimentari": 1, "rukkakha": 6, "rule": 6, "run": [0, 2, 3, 4, 5, 6, 7], "s1": [4, 7], "same": [0, 4, 6], "sampl": 4, "sarah": [3, 6], "savant": [3, 6], "save": [4, 6], "savefreq": [4, 6], "scale": [0, 4, 7], "scan": 6, "scantailor": 6, "schedul": 4, "script": [3, 4, 6], "scroung": 3, "second": 4, "section": 6, "see": [4, 
6], "seen": 6, "segment": [3, 4, 6], "seldomli": 6, "select": [4, 7], "semant": 6, "semi": [0, 6], "sens": 0, "separ": [0, 4, 6], "seqrecogn": 5, "sequenc": [0, 4, 6, 7], "seri": 0, "serial": [0, 5], "set": [0, 4, 6, 7], "sever": [4, 5, 6], "sgd": 4, "shape": [4, 7], "share": [0, 5], "shell": 6, "ship": 4, "short": [0, 7], "should": [4, 6], "shouldn": 4, "show": [0, 3, 6], "shown": [0, 6], "sigmoid": 7, "sign": 4, "significantli": 6, "sil": 4, "similar": 6, "simpl": [6, 7], "singl": [3, 4, 6, 7], "size": [0, 4, 6, 7], "skew": [0, 6], "skip": [4, 6], "slightli": [0, 6, 7], "small": [0, 5, 6, 7], "smaller": 5, "so": [2, 4, 5, 6, 7], "societi": 4, "softmax": 7, "softwar": 6, "some": [0, 3, 4, 6], "someth": 6, "sometim": [4, 6], "somewhat": 6, "soon": [4, 6], "sort": [3, 6], "sourc": [6, 7], "space": [4, 6], "span": 0, "spec": 4, "special": 0, "specif": 6, "specifi": 4, "speckl": 6, "speed": 6, "split": [0, 4, 6, 7], "squash": 7, "stabl": 1, "stack": [4, 7], "stage": 1, "standard": [3, 4, 6], "start": [4, 6], "stddev": 4, "step": [0, 3, 6, 7], "still": [0, 1], "stop": [3, 4, 6], "store": 5, "stride": [4, 7], "string": 7, "strip": [0, 7], "stub": 4, "studi": 4, "style": 4, "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "suboptim": 4, "subset": 4, "substitut": [4, 5, 6], "suffer": 6, "suffici": [4, 6], "suit": 6, "suitabl": [0, 6], "summar": [4, 6, 7], "superflu": 6, "suppli": 0, "support": [3, 5], "switch": [0, 4, 5, 6], "symbol": [4, 6], "syntax": [0, 4, 7], "syr": [4, 6], "syriac": 6, "syriac_best": 6, "system": [3, 6], "systemat": [4, 6], "t": [0, 4, 5, 6, 7], "tabl": 6, "take": [4, 6], "tanh": 7, "target": [4, 6], "task": 6, "tb": 0, "tell": 4, "tempor": 4, "tensor": 7, "tensorflow": 7, "term": 3, "tesseract": 7, "test": [3, 6], "test_model": 4, "tex": 4, "text": [0, 3, 4, 6], "text_direct": 0, "than": [4, 6], "thank": 4, "thei": [3, 4, 5, 6], "them": [0, 4, 6], "therefor": 6, "therein": 6, "thi": [4, 5, 6, 7], "third": 1, "those": [0, 4], "thread": [4, 6], "three": 5, "threshold": 0, "through": [4, 6], "thrown": 0, "tif": [0, 3], "tiff": 6, "time": [4, 6, 7], "togeth": 6, "toi": 0, "too": [4, 7], "tool": [1, 4, 6, 7], "top": [0, 3], "topograph": 0, "topolog": 0, "total": 6, "train": [0, 2, 7], "training_data": 4, "transcrib": [4, 6], "transpos": [4, 6, 7], "treat": 7, "tri": [4, 6], "true": 7, "truth": [3, 4, 6], "try": 1, "turn": 3, "tutori": 4, "two": [0, 4], "txt": [0, 3, 4, 6], "type": [0, 4, 6, 7], "typefac": 4, "typic": 4, "typograph": [0, 6], "u": 4, "ubuntu": 3, "unchti": 0, "unclean": 6, "under": 3, "undesir": 7, "unduli": 0, "uni": [0, 3, 6], "unicod": [0, 4, 6], "uniqu": [4, 6], "unit": 6, "unpredict": 6, "unrepres": 6, "unseg": 6, "until": 4, "untrain": 4, "unus": 4, "up": [3, 6], "updat": 0, "upon": 0, "upward": [5, 6], "ur": 4, "us": [0, 2, 3, 4, 5, 6, 7], "usabl": 1, "user": [3, 5, 6], "usual": [4, 6], "util": [4, 5, 6], "uw3": 0, "v": [4, 5, 6], "valid": [0, 4], "valu": [0, 7], "variabl": [3, 6, 7], "varieti": 4, "verbos": 6, "versatil": 5, "version": [0, 2, 5], "vertic": [0, 4], "vgsl": 4, "vocal": [3, 4, 6], "vv": 6, "w": [0, 4, 7], "wa": [0, 3, 5, 6], "wai": [4, 6], "wait": 4, "want": [4, 6], "warn": 6, "warp": 6, "we": [1, 4, 6], "weak": 6, "websit": 6, "weight": 4, "welcom": [0, 3], "well": 6, "were": 4, "western": 6, "wget": 6, "when": [4, 5, 6, 7], "where": 6, "which": [0, 1, 2, 4, 6], "while": [0, 3, 4, 6], "white": [0, 4, 6], "white_colsep": 4, "whitelist": 0, "whitespac": 4, "whole": 6, "wide": [4, 7], "width": [6, 7], "wild": 6, 
"wildli": 6, "without": [4, 6], "word": 3, "work": [4, 6], "workaround": 4, "world": 6, "write": [0, 4, 6], "written": [0, 4, 6], "x": [4, 6, 7], "x_bbox": 0, "x_conf": 0, "x_stride": 7, "xa0": 6, "xdg_base_dir": 0, "y": [0, 7], "y_stride": 7, "yml": 6, "you": [1, 3, 4, 6], "your": [0, 4], "ypogegrammeni": 4, "y\u016bsuf": [3, 6], "zero": [6, 7], "zoom": 0, "\u02bf\u0101lam": [3, 6], "\u0390\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c6\u03c7\u03c8\u03c9\u03ac\u03ad\u03ae\u03af\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c2\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03ca\u03cb\u03cc\u03cd\u03ce\u1f00\u1f01\u1f02\u1f03\u1f04\u1f05\u1f00\u1f04\u1f06\u1f10\u1f11\u1f13\u1f14\u1f15\u1f10\u1f11\u1f14\u1f15\u1f20\u1f21\u1f22\u1f23\u1f24\u1f25\u1f26\u1f27\u1f21\u1f25\u1f26\u1f30\u1f31\u1f33\u1f34\u1f35\u1f36\u1f37\u1f30\u1f31\u1f34\u1f40\u1f41\u1f42\u1f43\u1f44\u1f45\u1f40\u1f41\u1f44\u1f50\u1f51\u1f53\u1f54\u1f55\u1f56\u1f57\u1f51\u1f55\u1f60\u1f61\u1f62\u1f64\u1f65\u1f66\u1f67\u1f60\u1f61\u1f70\u1f72\u1f74\u1f76\u1f78\u1f7a\u1f7c\u1f84\u1f90\u1f91\u1f94\u1f97\u1fa0\u1fa4\u1fa7\u1fb3\u1fb6\u1fb7\u1fc3\u1fc4\u1fc6\u1fc7\u1fd2\u1fd6\u1fe5\u1fe6\u1fe5\u1ff3\u1ff4\u1ff6\u1ff7": 4, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c2\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 4, "\u03c3": 4, "\u0621": 4, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e": 4, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 4, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 4, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": [3, 6], "\u0627": 4, "\u0628": 4, "\u0629": 4, "\u062a": 4, "\u062b": 4, "\u062c": 4, "\u062d": 4, "\u062e": 4, "\u062f": 4, "\u0630": 4, "\u0631": 4, "\u0632": 4, "\u0633": 4, "\u0634": 4, "\u0635": 4, "\u0636": 4, "\u0637": 4, "\u0638": 4, "\u0639": 4, "\u063a": 4, "\u0640": 4, "\u0641": 4, "\u0642": 4, "\u0643": 4, "\u0644": 4, "\u0645": 4, "\u0646": 4, "\u0647": 4, "\u0648": 4, "\u0649": 4, "\u064a": 4, "\u0710": 6, "\u0712": 6, "\u0713": 6, "\u0715": 6, "\u0717": 6, "\u0718": 6, "\u0719": 6, "\u071a": 6, "\u071b": 6, "\u071d": 6, "\u071f": 6, "\u0720": 6, "\u0721": 6, "\u0722": 6, "\u0723": 6, "\u0725": 6, "\u0726": 6, "\u0728": 6, "\u0729": 6, "\u072a": 6, "\u072b": 6, "\u072c": 6, "\ufe80\ufe81\ufe83\ufe85\ufe88\ufe8b\ufe8e\ufe91\ufe94\ufe98\ufe9c\ufea0\ufea4\ufea7\ufea9\ufeab\ufead\ufeb0\ufeb4\ufeb8\ufebc\ufec0\ufec4\ufec8\ufecc\ufed0\u0640\ufed4\ufed8\ufedc\ufee0\ufee4\ufee8\ufeeb\ufeed\ufef0\ufef3\u067e": 4}, "titles": ["Advanced Usage", "kraken API", "GPU Acceleration", "kraken", "Training", "Models", "Training a kraken model", "VGSL 
network specification"], "titleterms": {"acceler": 2, "acquisit": 6, "advanc": 0, "alphabet": 4, "api": 1, "artifici": 4, "basic": 7, "binar": [0, 1], "clstm": 5, "codec": 1, "conda": 3, "convers": 5, "convolut": 7, "coreml": 5, "ctc_decod": 1, "data": 4, "dataset": 1, "detect": 0, "evalu": 6, "exampl": 7, "featur": 3, "fine": 4, "from": 4, "gpu": 2, "helper": 7, "imag": 6, "input": 0, "instal": [3, 6], "kraken": [1, 3, 6], "layer": 7, "lib": 1, "licens": 3, "linegen": 1, "max": 7, "model": [0, 1, 3, 5, 6], "modul": 1, "network": 7, "normal": 4, "other": 4, "page": 0, "pageseg": 1, "paramet": 4, "pip": 3, "plumb": 7, "pool": 7, "preprocess": 6, "pronn": 5, "pyrnn": 5, "quickstart": 3, "recognit": [0, 6], "recurr": 7, "regular": 7, "repositori": 0, "reshap": 7, "rpred": 1, "scratch": 4, "script": 0, "segment": 0, "serial": 1, "slice": 4, "specif": [0, 7], "test": 4, "train": [1, 3, 4, 6], "transcrib": 1, "transcript": [4, 6], "tune": 4, "tutori": 3, "usag": 0, "valid": 6, "vgsl": [1, 7]}}) \ No newline at end of file diff --git a/2.0.0/training.html b/2.0.0/training.html new file mode 100644 index 000000000..5a7215a90 --- /dev/null +++ b/2.0.0/training.html @@ -0,0 +1,559 @@ + + + + + + + + Training a kraken model — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training a kraken model

+

kraken is an optical character recognition package that can be trained fairly easily for a large number of scripts. In contrast to other systems requiring segmentation down to glyph level before classification, it is uniquely suited for the recognition of connected scripts, because the neural network is trained to assign the correct characters to unsegmented training data.

+

Training a new model for kraken requires a variable amount of training data +manually generated from page images which have to be typographically similar to +the target prints that are to be recognized. As the system works on unsegmented +inputs for both training and recognition and its base unit is a text line, +training data are just transcriptions aligned to line images.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First a number of high quality scans, preferably color or grayscale and at least 300dpi, are required. Scans should be in a lossless image format such as TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such as pdftocairo or pdfimages. While each of these requirements can be relaxed to a degree, the final accuracy will suffer to some extent. For example, only slightly compressed JPEG scans are generally suitable for training and recognition.

+

Depending on the source of the scans some preprocessing such as splitting scans into pages, correcting skew and warp, and removing speckles is usually required. For complex layouts such as newspapers it is advisable to split the page manually into columns, as the line extraction algorithm used to create transcription environments does not deal well with non-codex page layouts. A fairly user-friendly tool for semi-automatic batch processing of image scans is Scantailor, although most work can be done using a standard image editor.

+

The total number of scans required depends on the nature of the script to be recognized. Only features that are found on the page images and in the training data derived from them can later be recognized, so it is important that the coverage of typographic features is exhaustive. Training a single-script model for a fairly small script such as Arabic or Hebrew requires at least 800 lines, while multi-script models, e.g. combined polytonic Greek and Latin, will require significantly more transcriptions.

+

There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +western texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed.

+
+
+

Transcription

+

Transcription is done through local, browser-based HTML transcription environments. These are created by the ketos transcribe command line utility that is part of kraken. Its basic input is just a number of image files and an output path to write the HTML file to:

+
$ ketos transcribe -o output.html image_1.png image_2.png ...
+
+
+

While it is possible to put multiple images into a single transcription environment, splitting the input into one image per HTML file eases parallel transcription by multiple people.

+

The above command reads in the image files, converts them to black and white if +necessary, tries to split them into line images, and puts an editable text +field next to the image in the HTML.

+

Transcription has to be diplomatic, i.e. contain the exact character sequence in the line image, including original orthography. Some deviations, such as consistently omitting vocalization in Arabic texts, are possible as long as they are systematic and relatively minor.

+
+

Note

+

The page segmentation algorithm extracting lines from images is +optimized for western page layouts and may recognize lines +erroneously, lumping multiple lines together or cutting them in half. +The most efficient way to deal with these errors is just skipping the +affected lines by leaving the text box empty.

+
+
+

Tip

+

Copy-paste transcription can significantly speed up the whole process. +Either transcribe scans of a work where a digital edition already +exists (but does not for typographically similar prints) or find a +sufficiently similar edition as a base.

+
+

After transcribing a number of lines the results have to be saved, either using +the Download button on the lower left or through the regular Save Page +As (CTRL+S) function of the browser. All the work done is contained directly +in the saved files and it is possible to save partially transcribed files and +continue work later.

+

Next the contents of the filled transcription environments have to be +extracted through the ketos extract command:

+
$ ketos extract --output output_directory --normalization NFD *.html
+
+
+

with

+
+
--output
+

The output directory where all line image-text pairs (training data) +are written, defaulting to training/

+
+
--normalization
+

Unicode has code points to encode most glyphs encountered in the wild. A lesser known feature is that there usually are multiple ways to encode a glyph. Unicode normalization ensures that equal glyphs are encoded in the same way, i.e. that the encoded representation across the training data set is consistent and there is only one way the network can recognize a particular feature on the page. Usually it is sufficient to set the normalization to Normalization Form Decomposed (NFD), as it slightly reduces the size of the overall script to be recognized (see the short example after this list).

+
+
+
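To make the effect of normalization concrete, here is a short Python sketch using only the standard library (the umlaut is just an illustrative example):

import unicodedata

composed = "ä"                    # single code point U+00E4
decomposed = "a\u0308"            # 'a' followed by U+0308 COMBINING DIAERESIS

print(composed == decomposed)                                # False: two different encodings of the same glyph
print(unicodedata.normalize("NFD", composed) == decomposed)  # True: NFD maps both to the decomposed form
print(unicodedata.normalize("NFC", decomposed) == composed)  # True: NFC maps both to the precomposed form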

The result will be a directory filled with line image text pairs NNNNNN.png +and NNNNNN.gt.txt and a manifest.txt containing a list of all extracted +lines.

+
+

Note

+

At this point it is recommended to review the content of the training +data directory before proceeding.

+
+
+
+

Training

+

The training data in output_dir may now be used to train a new model by +invoking the ketos train command. Just hand a list of images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used to estimate the actual recognition accuracy achieved in the real world. These are never shown to the network during training but will be recognized periodically to evaluate the accuracy of the model. By default the validation set will comprise 10% of the training data.

+
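Conceptually the split is nothing more than holding back a random tenth of the line images; the following Python sketch is purely illustrative and not kraken's actual implementation (output_dir is the directory from the previous step):

import glob
import random

lines = sorted(glob.glob('output_dir/*.png'))
random.shuffle(lines)

cutoff = int(len(lines) * 0.9)
training_set, validation_set = lines[:cutoff], lines[cutoff:]
print(f'{len(training_set)} training lines, {len(validation_set)} validation lines')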

Basic model training is mostly automatic, although there are multiple parameters that can be adjusted:

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an integer equal to or larger than 1, with 1 meaning a report is created each time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with +--load. To continue training from a base model with another +training set refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for accelerated training. The default setting preloads data sets with fewer than 2500 lines; explicitly adding --preload will preload arbitrarily sized sets. --no-preload disables preloading in all circumstances.

+
+
+

Training a network will take some time on a modern computer, even with the default parameters. While the exact time required is unpredictable, as training is a somewhat random process, a rough guide is that accuracy seldom improves after 50 epochs, which are typically reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as early stopping that stops training as soon as +the error rate on the validation set doesn’t improve anymore. This will +prevent overfitting, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models model_name-1.mlmodel, model_name-2.mlmodel, … in the directory the script was executed in. Let's take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This might take a while, as preprocessing the whole set and putting it into memory is computationally intensive. Loading can be made faster by disabling preloading, at the cost of performing preprocessing repeatedly during the training process.

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

this line shows the results of the validation set evaluation. The error after 2 epochs is 545 incorrect characters out of 3504 characters in the validation set, for a character accuracy of 84.4%. The error should decrease fairly rapidly. If accuracy remains around 0.30 something is amiss, e.g. non-reordered right-to-left text or wildly incorrect transcriptions. Abort training, correct the error(s), and start again.

+
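The reported value is simply the proportion of correctly recognized characters; recomputing it in Python from the numbers above:

chars = 3504      # characters in the validation set
errors = 545      # incorrect characters after epoch 2
accuracy = 1 - errors / chars
print(f'{accuracy:.4f}')   # 0.8445, i.e. roughly 84.4% character accuracy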

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+

ketos can also produce more verbose output with training set and network +information by appending one or more -v to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a validation set of 88 lines. 49 different classes, i.e. Unicode code points, were found in these 788 lines. These affect the output size of the network; obviously only these 49 different classes/code points can later be output by the network. Importantly, we can see that certain characters occur markedly less often than others. Characters like the Syriac feminine dot and numerals that occur fewer than 10 times will most likely not be recognized well by the trained net.

+
+
+
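A quick way to inspect the grapheme distribution of a training set before training is to count the characters in the extracted transcriptions. The following Python sketch assumes the NNNNNN.gt.txt files produced by ketos extract live in output_directory/ (an illustrative path):

import glob
import unicodedata
from collections import Counter

counts = Counter()
for path in glob.glob('output_directory/*.gt.txt'):
    with open(path, encoding='utf-8') as fp:
        counts.update(fp.read().rstrip('\n'))

# rarest graphemes first; these are the ones unlikely to be learned well
for char, count in sorted(counts.items(), key=lambda item: item[1]):
    print(f'{count:6d}  {unicodedata.name(char, repr(char))}')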

Evaluation and Validation

+

While output during training is detailed enough to know when to stop training, one usually wants to know the specific kinds of errors to expect. Doing more in-depth error analysis also makes it possible to pinpoint weaknesses in the training data, e.g. above-average error rates for numerals indicate either a lack of representation of numerals in the training data or erroneous transcription in the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent.

+

The next table lists the number of insertions (characters occurring in the ground truth but not in the recognition output), substitutions (misrecognized characters), and deletions (superfluous characters recognized by the model).

+
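The accuracy in the report header follows directly from these counts; checking the arithmetic in Python with the numbers printed above:

insertions, deletions, substitutions = 157, 81, 98
characters = 35619

errors = insertions + deletions + substitutions   # 336, matching the report
accuracy = (characters - errors) / characters
print(f'{accuracy:.2%}')                          # 99.06%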

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report consists of errors sorted by frequency and a per-character accuracy report. Importantly, most errors are incorrect recognition of combining marks such as dots and diaereses. These may have several sources: different dot placement in training and validation set, incorrect transcription such as non-systematic transcription, or unclean, speckled scans. Depending on the error source, correction most often involves adding more training data and fixing transcriptions. Sometimes it may even be advisable to remove unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training related tasks. Optical character recognition is a multi-step process consisting of binarization (conversion of input images to black and white), page segmentation (extracting lines from the image), and recognition (converting line images to character sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/2.0.0/vgsl.html b/2.0.0/vgsl.html new file mode 100644 index 000000000..c1c6cb7da --- /dev/null +++ b/2.0.0/vgsl.html @@ -0,0 +1,272 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in the order [batch, height, width, channels], with zero-valued dimensions being variable. Integer-valued height or width input specifications will result in the input images being automatically scaled in either dimension.

+

When channels are set to 1, grayscale or B/W inputs are expected; 3 expects RGB color images. Higher values in combination with a height of 1 result in the network being fed 1 pixel wide grayscale strips scaled to the size of the channel dimension.

+

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.

+
+
+
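For illustration, such a specification string can also be instantiated programmatically. The sketch below assumes that kraken.lib.vgsl.TorchVGSLModel accepts a VGSL definition string directly in its constructor; consult the API documentation for the exact interface:

from kraken.lib import vgsl

# assumption: the constructor parses a VGSL definition string into a network
spec = '[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]'
net = vgsl.TorchVGSLModel(spec)
# the resulting model can then be handed to kraken's training and recognition routines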

Examples

+
[1,1,0,48 Lbx100 Do 01c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The extended dropout layer syntax is used to reduce the drop probability on the depth dimension, as the default is too high for convolutional layers. The remainder of the height dimension (12) is reshaped into the depth dimension before applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection.

+
+
+

Convolutional Layers

+
C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying +the selected nonlinearity.

+
+
+
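As a rough PyTorch equivalent (an illustration only; padding and other details of kraken's implementation may differ, and PyTorch uses NCHW instead of the NHWC order of the VGSL input block), Cr3,3,32 on a single-channel input corresponds to something like:

import torch

# Cr3,3,32: 3x3 convolution with 32 output channels followed by a ReLU nonlinearity
conv = torch.nn.Sequential(
    torch.nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), padding=1),
    torch.nn.ReLU(),
)
print(conv(torch.zeros(1, 1, 48, 200)).shape)   # torch.Size([1, 32, 48, 200])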

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension and the non-time-axis dimension (height/width) is treated as another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors resulting in an output of shape 1, 16, 906, 25. If this isn’t desired, either run a summarizing layer in the other direction, e.g. Lfys20, yielding a 1, 1, 906, 20 input for the subsequent layer, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimension for a 1, 1, 906, 512 input to the recurrent layer.

+
+
+
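The following PyTorch sketch is purely illustrative (it is not kraken's implementation) and reproduces the shape behaviour of the Lfx25 example, treating the 16 rows as independent sequences of 906 steps with 32 features each:

import torch

x = torch.randn(1, 16, 906, 32)    # batch, height, width, channels
sequences = x[0]                   # 16 independent sequences of length 906 with 32 features

lstm = torch.nn.LSTM(input_size=32, hidden_size=25, batch_first=True)
out, _ = lstm(sequences)           # forward-only pass, analogous to Lfx25
print(out.shape)                   # torch.Size([16, 906, 25]), i.e. 1, 16, 906, 25 overall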

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a maximum pooling with (y, x) kernel_size and (y_stride, x_stride) stride.

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d to a,b and distributes a into dimension e, respectively b into f. Either e or f has to be equal to d. So S1(1, 48)1, 3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension and distribute the 48 sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.

+
+
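The shape arithmetic of this example can be verified with a few lines of NumPy; note that this only checks the resulting shapes, not necessarily the exact element ordering of kraken's S layer:

import numpy as np

x = np.zeros((1, 48, 1020, 8))    # batch, height, width, channels
# split the height of 48 into 1 x 48, keep the 1 as height, move the 48 into the channel dimension
y = x.transpose(0, 2, 1, 3).reshape(1, 1, 1020, 48 * 8)
print(y.shape)                    # (1, 1, 1020, 384)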

Note

+

This S layer is equivalent to the one implemented in the tensorflow +implementation of VGSL, i.e. behaves differently from tesseract.

+
+
+
+
+

Regularization Layers

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to 0.5 drop probability and 1D dropout. Set dim to 2 after convolutional layers.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/.buildinfo b/3.0/.buildinfo new file mode 100644 index 000000000..93216c585 --- /dev/null +++ b/3.0/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 57bc08b2c4a63553f3a0ea3b9d8c2b41 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/3.0/.doctrees/advanced.doctree b/3.0/.doctrees/advanced.doctree new file mode 100644 index 000000000..e2023838d Binary files /dev/null and b/3.0/.doctrees/advanced.doctree differ diff --git a/3.0/.doctrees/api.doctree b/3.0/.doctrees/api.doctree new file mode 100644 index 000000000..8ab7b75ae Binary files /dev/null and b/3.0/.doctrees/api.doctree differ diff --git a/3.0/.doctrees/api_docs.doctree b/3.0/.doctrees/api_docs.doctree new file mode 100644 index 000000000..02cab8cc0 Binary files /dev/null and b/3.0/.doctrees/api_docs.doctree differ diff --git a/3.0/.doctrees/environment.pickle b/3.0/.doctrees/environment.pickle new file mode 100644 index 000000000..c40eec9cf Binary files /dev/null and b/3.0/.doctrees/environment.pickle differ diff --git a/3.0/.doctrees/gpu.doctree b/3.0/.doctrees/gpu.doctree new file mode 100644 index 000000000..80288bb8b Binary files /dev/null and b/3.0/.doctrees/gpu.doctree differ diff --git a/3.0/.doctrees/index.doctree b/3.0/.doctrees/index.doctree new file mode 100644 index 000000000..93c928d2e Binary files /dev/null and b/3.0/.doctrees/index.doctree differ diff --git a/3.0/.doctrees/ketos.doctree b/3.0/.doctrees/ketos.doctree new file mode 100644 index 000000000..8f61002f2 Binary files /dev/null and b/3.0/.doctrees/ketos.doctree differ diff --git a/3.0/.doctrees/models.doctree b/3.0/.doctrees/models.doctree new file mode 100644 index 000000000..09257e202 Binary files /dev/null and b/3.0/.doctrees/models.doctree differ diff --git a/3.0/.doctrees/training.doctree b/3.0/.doctrees/training.doctree new file mode 100644 index 000000000..a419fca9c Binary files /dev/null and b/3.0/.doctrees/training.doctree differ diff --git a/3.0/.doctrees/vgsl.doctree b/3.0/.doctrees/vgsl.doctree new file mode 100644 index 000000000..d79b183e9 Binary files /dev/null and b/3.0/.doctrees/vgsl.doctree differ diff --git a/3.0/.nojekyll b/3.0/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/3.0/_sources/advanced.rst.txt b/3.0/_sources/advanced.rst.txt new file mode 100644 index 000000000..8600c45e5 --- /dev/null +++ b/3.0/_sources/advanced.rst.txt @@ -0,0 +1,228 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken binarization (converting color and grayscale images into bitonal +ones), layout analysis/page segmentation (extracting topological text lines +from an image), recognition (feeding text lines images into an classifiers), +and finally serialization of results into an appropriate format such as hOCR or +ALTO. + +Input Specification +------------------- + +All kraken subcommands operating on input-output pairs, i.e. producing one +output document for one input document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +Binarization +------------ + +The binarization subcommand accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. 
skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +=========== ==== +option type +=========== ==== +--threshold FLOAT +--zoom FLOAT +--escale FLOAT +--border FLOAT +--perc INTEGER RANGE +--range INTEGER +--low INTEGER RANGE +--high INTEGER RANGE +=========== ==== + +Page Segmentation and Script Detection +-------------------------------------- + +The `segment` subcommand access two operations page segmentation into lines and +script detection of those lines. + +Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +`JSON `_ file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left). + +The script detection splits extracted lines from the segmenter into strip +sharing a particular script that can then be recognized by supplying +appropriate models for each detected script to the `ocr` subcommand. + +Combined output from both consists of lists in the `boxes` field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are `ISO 15924 +`_ 4 character codes. + +.. code-block:: console + + $ kraken -i 14.tif lines.txt segment + $ cat lines.json + { + "boxes" : [ + [ + ["Grek", [561, 216, 1626,309]] + ], + [ + ["Latn", [2172, 197, 2424, 244]] + ], + [ + ["Grek", [1678, 221, 2236, 320]], + ["Arab", [2241, 221, 2302, 320]] + ], + + ["Grek", [412, 318, 2215, 416]], + ["Latn", [2208, 318, 2424, 416]] + ], + ... + ], + "script_detection": true, + "text_direction" : "horizontal-tb" + } + +Script detection is automatically enabled; by explicitly disabling script +detection the `boxes` field will contain only a list of line bounding boxes: + +.. code-block:: console + + [546, 216, 1626, 309], + [2169, 197, 2423, 244], + [1676, 221, 2293, 320], + ... + [503, 2641, 848, 2681] + +Available page segmentation parameters are: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +--scale FLOAT Estimate of the average line height on the page +-m, --maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, --black-colseps / -w, --white-colseps Switch to black column separators. +-r, --remove-hlines / -l, --hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +=============================================== ====== + +The parameters specific to the script identification are: + +=============================================== ====== +option action +=============================================== ====== +-s/-n Enables/disables script detection +-a, --allowed-script Whitelists specific scripts for detection output. Other detected script runs are merged with their adjacent scripts, after a heuristic pre-merging step. 
+=============================================== ====== + +Model Repository +---------------- + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client. + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. code-block:: console + + $ kraken list + Retrieving model list ✓ + default (pyrnn) - A converted version of en-default.pyrnn.gz + toy (clstm) - A toy model trained on 400 lines of the UW3 data set. + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show toy + name: toy.clstm + + A toy model trained on 400 lines of the UW3 data set. + + author: Benjamin Kiessling (mittagessen@l.unchti.me) + http://kraken.re + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get toy + Retrieving model ✓ + +Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the ``show`` command, e.g.: + +.. code-block:: console + + $ kraken -i ... ... ocr -m toy + +Additions and updates to existing models are always welcome! Just open a pull +request or write an email. + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm + +All polytonic Greek text portions will be recognized using the `porson.clstm` +model while Latin text will be fed into the `antiqua.clstm` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. + +The ``ocr`` subcommand is able to serialize the recognition results either as +plain text (default), as `hOCR `_, into `ALTO +`_, or abbyyXML containing additional +metadata such as bounding boxes and confidences: + +.. code-block:: console + + $ kraken -i ... ... ocr -t # text output + $ kraken -i ... ... ocr -h # hOCR output + $ kraken -i ... ... ocr -a # ALTO output + $ kraken -i ... ... ocr -y # abbyyXML output + +hOCR output is slightly different from hOCR files produced by ocropus. Each +``ocr_line`` span contains not only the bounding box of the line but also +character boxes (``x_bboxes`` attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ``ocrx_word`` +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the ``x_conf`` attribute. 
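+
+To tie the preceding options together, a complete invocation that segments a
+page, recognizes two scripts with separate models, and serializes the result
+as ALTO might look like the following sketch (the model file names are
+placeholders):
+
+.. code-block:: console
+
+   $ kraken -i 14.tif 14.xml segment ocr -m Grek:porson.clstm -m Latn:antiqua.clstm -a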
+ +Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input. diff --git a/3.0/_sources/api.rst.txt b/3.0/_sources/api.rst.txt new file mode 100644 index 000000000..3ee8c5e43 --- /dev/null +++ b/3.0/_sources/api.rst.txt @@ -0,0 +1,379 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and netork +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + {'text_direction': 'horizontal-lr', + 'boxes': [[0, 29, 232, 56], + [28, 54, 121, 84], + [9, 73, 92, 117], + [103, 76, 145, 131], + [7, 105, 119, 230], + [10, 228, 126, 345], + ... + ], + 'script_detection': False} + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmentation and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. 
code-block:: python + + >>> from kraken import blla + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + {'text_direction': 'horizontal-lr', + 'type': 'baselines', + 'script_detection': False, + 'lines': [{'script': 'default', + 'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]], + 'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]}, + ...], + 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...] + '$par': ... + '$nop': ...}} + +Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking. + +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existant in for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. + +Recognition +----------- + +The character recognizer is equally based on a neural network which has to be +loaded first. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models and a warning will be raised if there +is a mismatch for binary input models. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(model, im, baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but a record object containing +the character prediction, cuts (approximate locations), and confidences. + +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'box' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. The :math:`C +\times W` softmax output matrix is accessible as an attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the :func:`kraken.rpred.rpred` iterator. 
To get a mapping +from the label space :math:`C` the network operates in to Unicode code points a +codec is used. An arbitrary sequence of labels can generate an arbitrary number +of Unicode code points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformtion by the functional blocks: + +.. code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> xml.parse_alto(alto_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + } + + >>> page_doc = '/path/to/page' + >>> xml.parse_page(page_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + + +Serialization +------------- + +The serialization module can be used to transform the :class:`ocr_records +` returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders `jinja2 +`_ templates in `kraken/templates` through +the :func:`kraken.serialization.serialize` function. + +.. code-block:: python + + >>> from kraken.lib import serialization + + >>> records = [record for record in pred_it] + >>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + + +Training +-------- + +There are catch-all constructors for quickly setting up +:cls:`kraken.lib.train.KrakenTrainer` instances for all training needs. 
They +largely map the comand line utils `ketos train` and `ketos segtrain` to a +programmatic interface. The arguments are identical, apart from a +differentiation between general arguments (data sources and setup, file names, +devices, ...) and hyperparameters (optimizers, learning rate schedules, +augmentation. + +Training a recognition model from a number of xml files in ALTO or PAGE XML: + +.. code-block:: python + + >>> from kraken.lib.train import KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> trainer = KrakenTrainer.recognition_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer.run() + +Likewise for a baseline and region segmentation model: + +.. code-block:: python + + >>> from kraken.lib.train import KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> trainer = KrakenTrainer.segmentation_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer.run() + +Both constructing the trainer object and the training itself can take quite a +bit of time. The constructor provides a callback for each iterative process +during object initialization that is intended to set up a progress bar: + +.. code-block:: python + + >>> from kraken.lib.train import KrakenTrainer + + >>> def progress_callback(string, length): + print(f'starting process "{string}" of length {length}') + return lambda: print('.', end='') + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:25] # training data is shuffled internally + >>> evaluation_files = ground_truth[25:95] + >>> trainer = KrakenTrainer.segmentation_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', progress_callback=progress_callback, augment=True) + starting process "Building training set" of length 25 + ......................... + starting process "Building validation set" of length 70 + ...................................................................... + >>> trainer.run() + +Executing the trainer object has two callbacks as arguments, one called after +each iteration and one returning the evaluation metrics after the end of each +epoch: + +.. code-block:: python + + >>> from kraken.lib.train import KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> trainer = KrakenTrainer.segmentation_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> def _update_progress(): + print('.', end='') + >>> def _print_eval(epoch, accuracy, **kwargs): + print(accuracy) + >>> trainer.run(_print_eval, _update_progress) + .........................0.0 + .........................0.0 + .........................0.0 + .........................0.0 + .........................0.0 + ... + +The metrics differ for recognition +(:func:`kraken.lib.train.recognition_evaluator_fn`) and segmentation +(:func:`kraken.lib.train.baseline_label_evaluator_fn`). + +Depending on the stopping method chosen the last model file might not be the +one with the best accuracy. 
Per default early stopping is used which aborts +training after a certain number of epochs without improvement. In that case the +best model and evaluation loss can be determined through: + +.. code-block:: python + + >>> trainer.stopper.best_epoch + >>> trainer.stopper.best_loss + >>> best_model_path = f'{trainer.filename_prefix}_{trainer.stopper.best_epoch}.mlmodel' + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/3.0/_sources/api_docs.rst.txt b/3.0/_sources/api_docs.rst.txt new file mode 100644 index 000000000..dba52f429 --- /dev/null +++ b/3.0/_sources/api_docs.rst.txt @@ -0,0 +1,119 @@ +API reference +============== + +kraken.binarization module +-------------------------- + +.. automodule:: kraken.binarization + :members: + :show-inheritance: + +kraken.serialization module +--------------------------- + +.. automodule:: kraken.serialization + :members: + :show-inheritance: + +kraken.blla module +------------------ + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. Note that + recognition models are not interchangeable between segmenters. + +.. automodule:: kraken.blla + :members: + :show-inheritance: + +kraken.pageseg module +--------------------- + +.. note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. automodule:: kraken.pageseg + :members: + :show-inheritance: + +kraken.rpred module +------------------- + +.. automodule:: kraken.rpred + :members: + :show-inheritance: + +kraken.transcribe module +------------------------ + +.. automodule:: kraken.transcribe + :members: + :show-inheritance: + +kraken.linegen module +--------------------- + +.. automodule:: kraken.linegen + :members: + :show-inheritance: + +kraken.lib.models module +------------------------ + +.. automodule:: kraken.lib.models + :members: + :show-inheritance: + +kraken.lib.vgsl module +---------------------- + +.. automodule:: kraken.lib.vgsl + :members: + :show-inheritance: + +kraken.lib.xml module +--------------------- + +.. automodule:: kraken.lib.xml + :members: + :show-inheritance: + +kraken.lib.codec +---------------- + +.. automodule:: kraken.lib.codec + :members: + :show-inheritance: + +kraken.lib.train module +----------------------- + +.. automodule:: kraken.lib.train + :members: + :show-inheritance: + +kraken.lib.dataset module +------------------------- + +.. automodule:: kraken.lib.dataset + :members: + :show-inheritance: + +kraken.lib.segmentation module +------------------------------ + +.. automodule:: kraken.lib.segmentation + :members: + :show-inheritance: + +kraken.lib.ctc_decoder +---------------------- + +.. automodule:: kraken.lib.ctc_decoder + :members: + :show-inheritance: diff --git a/3.0/_sources/gpu.rst.txt b/3.0/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/3.0/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. 
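+
+As a minimal sketch, training on the first CUDA device can be requested
+through the ``-d/--device`` option documented for the ``ketos`` training
+tools; device identifiers follow the usual pytorch naming scheme:
+
+.. code-block:: console
+
+   $ ketos train -d cuda:0 training_data/*.png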
+ + diff --git a/3.0/_sources/index.rst.txt b/3.0/_sources/index.rst.txt new file mode 100644 index 000000000..5aca96103 --- /dev/null +++ b/3.0/_sources/index.rst.txt @@ -0,0 +1,157 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API tutorial + API reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. + +Features +======== + +kraken's main features are: + + - Fully trainable layout analysis and character recognition + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - `Public repository `_ of model files + - :ref:`Lightweight model files ` + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +kraken requires some external libraries to run. On Debian/Ubuntu they may be +installed using: + +.. code-block:: console + + # apt install libpangocairo-1.0 libxml2 libblas3 liblapack3 python3-dev python3-pip libvips + +pip +--- + +.. code-block:: console + + $ pip3 install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip3 install . + +conda +----- + +Install the latest development version through `conda `_: + +:: + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +or: + +:: + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment_cuda.yml + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Models +------ + +Finally you'll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user's kraken directory: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.2577813 + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.2577813 + name: 10.5281/zenodo.2577813 + + A generalized model for English printed text + + This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p + scripts: Latn + alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE + accuracy: 99.95% + license: Apache-2.0 + author(s): Kiessling, Benjamin + date: 2019-02-26 + +Quickstart +========== + +Recognizing text on an image using the default parameters including the +prerequisite steps of binarization and page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr + Loading RNN ✓ + Processing ⣻ + +To binarize a single image using the nlbin algorithm (usually not required with the baseline segmenter): + +.. code-block:: console + + $ kraken -i image.tif bw.tif binarize + +To segment a binarized image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the default RNN: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. 
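+
+Several images can be processed in a single call by repeating the ``-i``
+option (the file names below are placeholders):
+
+.. code-block:: console
+
+   $ kraken -i first.tif first.txt -i second.tif second.txt segment -bl ocr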
+ +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. diff --git a/3.0/_sources/ketos.rst.txt b/3.0/_sources/ketos.rst.txt new file mode 100644 index 000000000..0e7e46ef1 --- /dev/null +++ b/3.0/_sources/ketos.rst.txt @@ -0,0 +1,345 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, and the XML-based formats that are commony used for +archival of annotation and transcription data. It is recommended to use the XML +formats as they are interchangeable with other tools, do not incur +transformation losses, and allow training all components of kraken from the +same datasets easily. + +ALTO +~~~~ + +Kraken parses and produces files according to the upcoming version of the ALTO +standard: 4.2. It validates against version 4.1 with the exception of the +`redefinition `_ of the `BASELINE` +attribute to accomodate polygonal chain baselines. An example showing the +attributes necessary for segmentation and recognition training follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Recognition training +-------------------- + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its command line options: + +======================================================= ====== +option action +======================================================= ====== +-p, --pad Left and right padding around lines +-o, --output Output model file prefix. Defaults to model. +-s, --spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, --append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, --load Load existing file to continue training +-F, --savefreq Model save frequency in epochs during + training +-R, --report Report creation frequency in epochs +-q, --quit Stop condition for training. Set to `early` + for early stopping (default) or `dumb` for fixed + number of epochs. +-N, --epochs Number of epochs to train for. Set to -1 for indefinite training. +--lag Number of epochs to wait before stopping + training without improvement. 
Only used when using early stopping. +--min-delta Minimum improvement between epochs to reset + early stopping. Defaults to 0.005. +-d, --device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, --lrate Learning rate [default: 0.001] +-m, --momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, --weight-decay Weight decay. +--schedule Sets the learning rate scheduler. May be either constant or 1cycle. For 1cycle + the cycle length is determined by the `--epoch` option. +-p, --partition Ground truth data partition ratio between train/validation set +-u, --normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, --codec Load a codec JSON definition (invalid if loading existing model) +--resize Codec/output layer resizing option. If set + to `add` code points will be added, `both` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, --reorder / --no-reorder Reordering of code points to display order. +-t, --training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, --evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +--preload / --no-preload Hard enable/disable for training data preloading. Preloading + training data into memory is enabled per default for sets with less than 2500 lines. +--threads Number of OpenMP threads when running on CPU. Defaults to min(4, #cores). +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train training_data/*.png + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory. + +In some cases, such as color inputs, changing the network architecture might be +useful: + +.. code-block:: console + + $ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta an/or +lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 --min-delta 0.001 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. 
code-block:: console + + $ ketos train -i model_5.mlmodel --no-preload kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``add`` and ``both``. +``add`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``both`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize add -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize both -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``both`` mode 2 of the original characters were removed and 3 new ones were added. + + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. 
code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Segmentation training +--------------------- + +Training a segmentation model is very similar to training one for + + +Testing +------- + +Picking a particular model from a pool or getting a more detailled look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailled report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-m, --model Model(s) to evaluate. +-e, --evaluation-files File(s) with paths to evaluation data. +-d, --device Select device to use. +-p, --pad Left and right padding around lines. + + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with `-e/--evaluation-files` or by just +adding a number of image files as the final argument: + +.. code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailled +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. diff --git a/3.0/_sources/models.rst.txt b/3.0/_sources/models.rst.txt new file mode 100644 index 000000000..24033d111 --- /dev/null +++ b/3.0/_sources/models.rst.txt @@ -0,0 +1,17 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. 
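+
+Regardless of the serialization format, models are loaded through the single
+entry point shown in the API documentation; a short sketch (the path is a
+placeholder):
+
+.. code-block:: python
+
+   >>> from kraken.lib import models
+   >>> net = models.load_any('path/to/model.mlmodel')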
+ +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + diff --git a/3.0/_sources/training.rst.txt b/3.0/_sources/training.rst.txt new file mode 100644 index 000000000..3a339a750 --- /dev/null +++ b/3.0/_sources/training.rst.txt @@ -0,0 +1,456 @@ +.. _training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. + +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. + +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. 
Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Training +-------- + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. + +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. ``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldomly improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. 
This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedlyduring the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... + [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 
811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . } + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... 
+ +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occuring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is an central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. 
code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/3.0/_sources/vgsl.rst.txt b/3.0/_sources/vgsl.rst.txt new file mode 100644 index 000000000..6ba6df1c9 --- /dev/null +++ b/3.0/_sources/vgsl.rst.txt @@ -0,0 +1,199 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, heigh, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. 
+
+.. code-block:: console
+
+  [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+  Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+  layer  type     params
+  0      conv     kernel 3 x 3 filters 16 activation r
+  1      maxpool  kernel 3 x 3 stride 3 x 3
+  2      rnn      direction f transposed True summarize True out 64 legacy None
+  3      rnn      direction b transposed False summarize False out 128 legacy None
+  4      rnn      direction b transposed False summarize False out 256 legacy None
+  5      dropout  probability 0.5 dims 1
+  6      linear   augmented False out 59
+
+A model with arbitrarily sized color image input, an initial summarizing
+recurrent layer to squash the height to 64, followed by two bidirectional
+recurrent layers and a linear projection.
+
+Convolutional Layers
+--------------------
+
+.. code-block:: console
+
+  C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<y_stride>,<x_stride>]
+  s = sigmoid
+  t = tanh
+  r = relu
+  l = linear
+  m = softmax
+
+Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying
+the selected nonlinearity. The stride can be adjusted with the optional last
+two parameters.
+
+Recurrent Layers
+----------------
+
+.. code-block:: console
+
+  L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+  G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+  f runs the RNN forward only.
+  r runs the RNN reversed only.
+  b runs the RNN bidirectionally.
+  s (optional) summarizes the output in the requested dimension, returning the last step.
+
+Adds either an LSTM or GRU recurrent layer to the network using either the `x`
+(width) or `y` (height) dimension as the time axis. Input features are the
+channel dimension and the non-time-axis dimension (height/width) is treated as
+another batch dimension. For example, a `Lfx25` layer on a `1, 16, 906, 32`
+input will execute 16 independent forward passes on `906x32` tensors, resulting
+in an output of shape `1, 16, 906, 25`. If this isn't desired, either run a
+summarizing layer in the other direction, e.g. `Lfys20`, yielding a `1, 1,
+906, 20` input to the recurrent layer, or prepend a reshape layer `S1(1x16)1,3`
+combining the height and channel dimension into a `1, 1, 906, 512` input to the
+recurrent layer.
+
+Helper and Plumbing Layers
+--------------------------
+
+Max Pool
+^^^^^^^^
+.. code-block:: console
+
+  Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+Adds a max pooling layer with kernel size `(y, x)` and stride `(y_stride, x_stride)`.
+
+Reshape
+^^^^^^^
+
+.. code-block:: console
+
+  S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+  dimension.
+
+The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into
+dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to
+`d`. So `S1(1x48)1,3` on a `1, 48, 1020, 8` input will first reshape into
+`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension, and distribute
+the `48`-sized tensor into the channel dimension, resulting in a `1, 1, 1020,
+48*8=384` sized output. `S` layers are mostly used to remove an undesirable
+non-1 height before a recurrent layer.
+
+.. note::
+
+  This `S` layer is equivalent to the one implemented in the TensorFlow
+  implementation of VGSL, i.e. it behaves differently from Tesseract's.
+
+Regularization Layers
+---------------------
+
+Dropout
+^^^^^^^
+
+.. code-block:: console
+
+  Do[{name}][<p>],[<dim>] Insert a 1D or 2D dropout layer
+
+Adds a 1D or 2D dropout layer with a given probability.
Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/3.0/_static/alabaster.css b/3.0/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/3.0/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + +div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { 
+ color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list 
td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px -30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. 
*/ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/3.0/_static/basic.css b/3.0/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/3.0/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + 
+table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + +div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} 
+ +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles 
----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + +dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + 
+table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} + +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/3.0/_static/blla_heatmap.jpg b/3.0/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/3.0/_static/blla_heatmap.jpg differ diff --git a/3.0/_static/blla_output.jpg b/3.0/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/3.0/_static/blla_output.jpg differ diff --git a/3.0/_static/bw.png b/3.0/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/3.0/_static/bw.png differ diff --git a/3.0/_static/custom.css b/3.0/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/3.0/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/3.0/_static/doctools.js b/3.0/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/3.0/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } 
+ } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/3.0/_static/documentation_options.js b/3.0/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/3.0/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/3.0/_static/file.png b/3.0/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/3.0/_static/file.png differ diff --git a/3.0/_static/graphviz.css b/3.0/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/3.0/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/3.0/_static/kraken.png b/3.0/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/3.0/_static/kraken.png differ diff --git a/3.0/_static/kraken_recognition.svg b/3.0/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/3.0/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... 
+ + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/3.0/_static/kraken_segmentation.svg b/3.0/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/3.0/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/3.0/_static/kraken_segmodel.svg b/3.0/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/3.0/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/3.0/_static/kraken_torchseqrecognizer.svg b/3.0/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/3.0/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/3.0/_static/kraken_workflow.svg b/3.0/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/3.0/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + Recognition + + + + + + + + + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/3.0/_static/language_data.js b/3.0/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/3.0/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" + v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = 
new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/3.0/_static/minus.png b/3.0/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/3.0/_static/minus.png differ diff --git a/3.0/_static/normal-reproduction-low-resolution.jpg b/3.0/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/3.0/_static/normal-reproduction-low-resolution.jpg differ diff --git a/3.0/_static/pat.png b/3.0/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/3.0/_static/pat.png differ diff --git a/3.0/_static/plus.png b/3.0/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/3.0/_static/plus.png differ diff --git a/3.0/_static/pygments.css b/3.0/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ b/3.0/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } /* Keyword.Pseudo */ 
+.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ +.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/3.0/_static/searchtools.js b/3.0/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/3.0/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. 
+ /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. + objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." 
+ ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. +// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/3.0/_static/sphinx_highlight.js b/3.0/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/3.0/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/3.0/advanced.html b/3.0/advanced.html new file mode 100644 index 000000000..6b5e44053 --- /dev/null +++ b/3.0/advanced.html @@ -0,0 +1,343 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the case of kraken: binarization (converting color and grayscale images into bitonal ones), layout analysis/page segmentation (extracting topological text lines from an image), recognition (feeding text line images into a classifier), and finally serialization of the results into an appropriate format such as hOCR or ALTO.

+
+

Input Specification

+

All kraken subcommands operating on input-output pairs, i.e. producing one output document for one input document, follow the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular, subcommands may be chained.

+
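For instance, binarization, segmentation, and recognition can be run in one pass. This is a sketch assuming a recognition model file named model.mlmodel is available; the model name is illustrative:

$ kraken -i image.tif image.txt binarize segment ocr -m model.mlmodel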
+
+

Binarization

+

The binarization subcommand accepts almost the same parameters as ocropus-nlbin. Only options not related to binarization, e.g. skew detection, are missing. In addition, error checking (image sizes, inversion detection, grayscale enforcement) is always disabled and kraken will happily binarize any image that is thrown at it.

+

Available parameters are:

option        type

--threshold   FLOAT
--zoom        FLOAT
--escale      FLOAT
--border      FLOAT
--perc        INTEGER RANGE
--range       INTEGER
--low         INTEGER RANGE
--high        INTEGER RANGE

+
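For example, the binarization threshold can be adjusted from its default (an illustrative value; see the option table above):

$ kraken -i input.tif bw.tif binarize --threshold 0.45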
+
+

Page Segmentation and Script Detection

+

The segment subcommand performs two operations: page segmentation into lines and script detection of those lines.

+

Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +JSON file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left).

+

Script detection splits the lines extracted by the segmenter into strips sharing a particular script, which can then be recognized by supplying appropriate models for each detected script to the ocr subcommand.

+

Combined output from both consists of lists in the boxes field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are ISO 15924 4 character codes.

+
$ kraken -i 14.tif lines.txt segment
+$ cat lines.json
+{
+   "boxes" : [
+    [
+        ["Grek", [561, 216, 1626,309]]
+    ],
+    [
+        ["Latn", [2172, 197, 2424, 244]]
+    ],
+    [
+        ["Grek", [1678, 221, 2236, 320]],
+        ["Arab", [2241, 221, 2302, 320]]
+    ],
+
+        ["Grek", [412, 318, 2215, 416]],
+        ["Latn", [2208, 318, 2424, 416]]
+    ],
+    ...
+   ],
+   "script_detection": true,
+   "text_direction" : "horizontal-tb"
+}
+
+
+

Script detection is automatically enabled; when it is explicitly disabled the boxes field will contain only a list of line bounding boxes:

+
[546, 216, 1626, 309],
+[2169, 197, 2423, 244],
+[1676, 221, 2293, 320],
+...
+[503, 2641, 848, 2681]
+
+
+

Available page segmentation parameters are:

option                                      action

-d, --text-direction                        Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.
--scale FLOAT                               Estimate of the average line height on the page
-m, --maxcolseps                            Maximum number of columns in the input document. Set to 0 for uni-column layouts.
-b, --black-colseps / -w, --white-colseps   Switch to black column separators.
-r, --remove-hlines / -l, --hlines          Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

+
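For example, a page of right-to-left text laid out in a single column could be segmented with (illustrative values; adjust to the material at hand):

$ kraken -i input.tif lines.json segment -d horizontal-rl --maxcolseps 0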

The parameters specific to the script identification are:

option                  action

-s / -n                 Enables/disables script detection
-a, --allowed-script    Whitelists specific scripts for detection output. Other detected script runs are merged with their adjacent scripts, after a heuristic pre-merging step.

+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client.

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list   ✓
+default (pyrnn) - A converted version of en-default.pyrnn.gz
+toy (clstm) - A toy model trained on 400 lines of the UW3 data set.
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show toy
+name: toy.clstm
+
+A toy model trained on 400 lines of the UW3 data set.
+
+author: Benjamin Kiessling (mittagessen@l.unchti.me)
+http://kraken.re
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get toy
+Retrieving model        ✓
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the show command, e.g.:

+
$ kraken -i ... ... ocr -m toy
+
+
+

Additions and updates to existing models are always welcome! Just open a pull +request or write an email.

+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm
+
+
+

All polytonic Greek text portions will be recognized using the porson.clstm +model while Latin text will be fed into the antiqua.clstm model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm
+
+
+

It is also possible to disable recognition for a particular script by mapping it to the special model keyword ignore. Ignored lines will still be serialized but will not contain any recognition results.

+
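For example, to recognize the Greek portions while skipping Arabic runs entirely (a sketch reusing the model name from above):

$ kraken -i ... ... ocr -m Grek:porson.clstm -m Arab:ignore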

The ocr subcommand is able to serialize the recognition results either as +plain text (default), as hOCR, into ALTO, or abbyyXML containing additional +metadata such as bounding boxes and confidences:

+
$ kraken -i ... ... ocr -t # text output
+$ kraken -i ... ... ocr -h # hOCR output
+$ kraken -i ... ... ocr -a # ALTO output
+$ kraken -i ... ... ocr -y # abbyyXML output
+
+
+

hOCR output is slightly different from hOCR files produced by ocropus. Each +ocr_line span contains not only the bounding box of the line but also +character boxes (x_bboxes attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ocrx_word +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the x_conf attribute.

+

Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/api.html b/3.0/api.html new file mode 100644 index 000000000..0107fbbff --- /dev/null +++ b/3.0/api.html @@ -0,0 +1,450 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
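A typical session therefore only needs a handful of imports; the following sketch collects the modules used in the remainder of this tutorial:

>>> from PIL import Image
>>> from kraken import binarization, blla, rpred
>>> from kraken.lib import models, vgsl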
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization, although depending on the particular setup of the pipeline and the models utilized it can be optional. For the non-trainable legacy bounding box segmenter binarization is mandatory, although it is still possible to feed color and grayscale images to the recognizer. The trainable baseline segmenter can work with black and white, grayscale, and color images, depending on the training data and network configuration utilized; grayscale and color data are used in almost all cases, though.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The legacy segmenter requires only a b/w image object, although some additional parameters exist, largely to change the principal text direction (important for column ordering and top-to-bottom scripts) and to explicitly mask non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+{'text_direction': 'horizontal-lr',
+ 'boxes': [[0, 29, 232, 56],
+           [28, 54, 121, 84],
+           [9, 73, 92, 117],
+           [103, 76, 145, 131],
+           [7, 105, 119, 230],
+           [10, 228, 126, 345],
+           ...
+          ],
+ 'script_detection': False}
+
+
+
+
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies image pixels into baselines and regions. Because it is trainable, a segmentation model is required in addition to the image to be segmented, and it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

Afterwards the model can be passed, together with an image object, to the segmentation method kraken.blla.segment():

+
>>> from kraken import blla
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+{'text_direction': 'horizontal-lr',
+ 'type': 'baselines',
+ 'script_detection': False,
+ 'lines': [{'script': 'default',
+            'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]],
+            'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]},
+           ...],
+ 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...]
+             '$par': ...
+             '$nop':  ...}}
+
+
+

Optional parameters are largely the same as for the legacy segmenter, i.e. text direction and masking; a sketch follows.

+
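This sketch assumes the parameter names text_direction and mask (check the API reference of your installed version); mask.png stands in for a hypothetical image marking non-text regions:

>>> mask_im = Image.open('mask.png')
>>> baseline_seg = blla.segment(im, text_direction='horizontal-lr', mask=mask_im, model=model)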

Images are automatically converted into the proper mode for recognition, except in the case of models trained on binary images, as there is a plethora of different binarization algorithms available, each with strengths and weaknesses. For most material the kraken-provided binarization should be sufficient, though. This does not mean that a segmentation model trained on RGB images will have equal accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality will often be modest or non-existent for color models, while non-binarized inputs to a binary model will cause severe degradation (and a warning to that effect).

+

By default segmentation is performed on the CPU, although the neural network can be run on a GPU with the device argument. As the vast majority of the processing required is postprocessing, the performance gain will most likely be modest.

+
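For instance, the same call can be moved onto the first CUDA device (a sketch assuming a CUDA-enabled installation):

>>> baseline_seg = blla.segment(im, model=model, device='cuda:0')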
+
+
+

Recognition

+

The character recognizer is equally based on a neural network which has to be +loaded first.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

Afterwards, given an image, a segmentation, and the model, one can perform text recognition. The code is identical for both legacy and baseline segmentations. As with segmentation, input images are automatically converted to the correct color mode, except in the case of binary models; a warning will be raised if there is a mismatch for binary input models.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken import rpred
+# single model recognition
+>>> pred_it = rpred(model, im, baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+
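A sketch of the multi-model path for a script-annotated segmentation, reusing the hypothetical model files from the command line section; the first argument maps script identifiers to loaded models (verify the argument order against the API reference of your version):

>>> from kraken.rpred import mm_rpred
>>> greek_model = models.load_any('porson.clstm')
>>> latin_model = models.load_any('antiqua.clstm')
>>> pred_it = mm_rpred({'Grek': greek_model, 'Latn': latin_model}, im, baseline_seg)
>>> for record in pred_it:
        print(record)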

The output isn’t just a sequence of characters but a record object containing +the character prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'box'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The \(C +\times W\) softmax output matrix is accessible as an attribute on the +kraken.lib.models.TorchSeqRecognizer after each step of the kraken.rpred.rpred() iterator. To get a mapping +from the label space \(C\) the network operates in to Unicode code points a +codec is used. An arbitrary sequence of labels can generate an arbitrary number +of Unicode code points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.output
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.

+
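As an illustration, the greedy decoder can be applied to the output matrix by hand and the resulting label sequence mapped back through the codec (a sketch; the exact return values may differ between versions):

>>> from kraken.lib import ctc_decoder
>>> labels = ctc_decoder.greedy_decoder(model.output)
>>> model.codec.decode(labels)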
+
+

XML Parsing

+

Sometimes it is desired to take the data in an existing XML serialization format like PageXML or ALTO and apply an OCR function to it. The kraken.lib.xml module includes parsers extracting information into data structures processable with minimal transformation by the functional blocks:

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> xml.parse_alto(alto_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+>>> page_doc = '/path/to/page'
+>>> xml.parse_page(page_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+
+
+
+
+

Serialization

+

The serialization module can be used to transform the ocr_records returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders jinja2 templates in kraken/templates through +the kraken.serialization.serialize() function.

+
>>> from kraken.lib import serialization
+
+>>> records = [record for record in pred_it]
+>>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+
+
+

Training

+

There are catch-all constructors for quickly setting up kraken.lib.train.KrakenTrainer instances for all training needs. They largely map the command line utilities ketos train and ketos segtrain to a programmatic interface. The arguments are identical, apart from a differentiation between general arguments (data sources and setup, file names, devices, …) and hyperparameters (optimizers, learning rate schedules, augmentation, …).

+

Training a recognition model from a number of xml files in ALTO or PAGE XML:

+
>>> from kraken.lib.train import KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> trainer = KrakenTrainer.recognition_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer.run()
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> from kraken.lib.train import KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> trainer = KrakenTrainer.segmentation_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer.run()
+
+
+

Both constructing the trainer object and the training itself can take quite a +bit of time. The constructor provides a callback for each iterative process +during object initialization that is intended to set up a progress bar:

+
>>> from kraken.lib.train import KrakenTrainer
+
+>>> def progress_callback(string, length):
+        print(f'starting process "{string}" of length {length}')
+        return lambda: print('.', end='')
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:25] # training data is shuffled internally
+>>> evaluation_files = ground_truth[25:95]
+>>> trainer = KrakenTrainer.segmentation_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', progress_callback=progress_callback, augment=True)
+starting process "Building training set" of length 25
+.........................
+starting process "Building validation set" of length 70
+......................................................................
+>>> trainer.run()
+
+
+

Executing the trainer object has two callbacks as arguments, one called after +each iteration and one returning the evaluation metrics after the end of each +epoch:

+
>>> from kraken.lib.train import KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> trainer = KrakenTrainer.segmentation_train_gen(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> def _update_progress():
+        print('.', end='')
+>>> def _print_eval(epoch, accuracy, **kwargs):
+        print(accuracy)
+>>> trainer.run(_print_eval, _update_progress)
+.........................0.0
+.........................0.0
+.........................0.0
+.........................0.0
+.........................0.0
+...
+
+
+

The metrics differ for recognition +(kraken.lib.train.recognition_evaluator_fn()) and segmentation +(kraken.lib.train.baseline_label_evaluator_fn()).

+

Depending on the stopping method chosen the last model file might not be the one with the best accuracy. By default early stopping is used, which aborts training after a certain number of epochs without improvement. In that case the best model and evaluation loss can be determined through:

+
>>> trainer.stopper.best_epoch
+>>> trainer.stopper.best_loss
+>>> best_model_path = f'{trainer.filename_prefix}_{trainer.stopper.best_epoch}.mlmodel'
+
+
+

This is only a small subset of the training functionality. It is suggested to have a closer look at the command line parameters for features such as transfer learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/api_docs.html b/3.0/api_docs.html new file mode 100644 index 000000000..52028f148 --- /dev/null +++ b/3.0/api_docs.html @@ -0,0 +1,182 @@ + + + + + + + + API reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API reference

+
+

kraken.binarization module

+
+
+

kraken.serialization module

+
+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+

kraken.rpred module

+
+
+

kraken.transcribe module

+
+
+

kraken.linegen module

+
+
+

kraken.lib.models module

+
+
+

kraken.lib.vgsl module

+
+
+

kraken.lib.xml module

+
+
+

kraken.lib.codec

+
+
+

kraken.lib.train module

+
+
+

kraken.lib.dataset module

+
+
+

kraken.lib.segmentation module

+
+
+

kraken.lib.ctc_decoder

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/genindex.html b/3.0/genindex.html new file mode 100644 index 000000000..05e1b0ba6 --- /dev/null +++ b/3.0/genindex.html @@ -0,0 +1,96 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+ +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/gpu.html b/3.0/gpu.html new file mode 100644 index 000000000..2e911e900 --- /dev/null +++ b/3.0/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
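As an example, model training can be moved onto the first CUDA device with the device option of ketos train (assuming CUDA and cuDNN are set up as described above):

$ ketos train -d cuda:0 training_data/*.png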
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/index.html b/3.0/index.html new file mode 100644 index 000000000..b1230b344 --- /dev/null +++ b/3.0/index.html @@ -0,0 +1,225 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

kraken requires some external libraries to run. On Debian/Ubuntu they may be +installed using:

+
# apt install libpangocairo-1.0 libxml2 libblas3 liblapack3 python3-dev python3-pip libvips
+
+
+
+

pip

+
$ pip3 install kraken
+
+
+

or by running pip in the git repository:

+
$ pip3 install .
+
+
+
+
+

conda

+

Install the latest development version through conda:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

or:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment_cuda.yml
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Models

+

Finally you’ll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user’s kraken directory:

+
$ kraken get 10.5281/zenodo.2577813
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.2577813
+name: 10.5281/zenodo.2577813
+
+A generalized model for English printed text
+
+This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p
+scripts: Latn
+alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE
+accuracy: 99.95%
+license: Apache-2.0
+author(s): Kiessling, Benjamin
+date: 2019-02-26
+
+
+
+
+
+

Quickstart

+

Recognizing text on an image using the default parameters including the +prerequisite steps of binarization and page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To binarize a single image using the nlbin algorithm (usually not required with the baseline segmenter):

+
$ kraken -i image.tif bw.tif binarize
+
+
+

To segment a binarized image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the default RNN:

+
$ kraken -i bw.tif image.txt segment -bl ocr
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/ketos.html b/3.0/ketos.html new file mode 100644 index 000000000..21bf0c972 --- /dev/null +++ b/3.0/ketos.html @@ -0,0 +1,542 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text.

+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low level format, and the XML-based formats that are commonly used for archival of annotation and transcription data. It is recommended to use the XML formats as they are interchangeable with other tools, do not incur transformation losses, and allow training all components of kraken from the same datasets easily.

+
+

ALTO

+

Kraken parses and produces files according to the upcoming version of the ALTO standard: 4.2. It validates against version 4.1 with the exception of the redefinition of the BASELINE attribute to accommodate polygonal chain baselines. An example showing the attributes necessary for segmentation and recognition training follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523" 
+						   WIDTH="5234" 
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..." 
+							  WIDTH="..." 
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K" 
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its command line options:

option                         action

-p, --pad                      Left and right padding around lines
-o, --output                   Output model file prefix. Defaults to model.
-s, --spec                     VGSL spec of the network to train. CTC layer will be added automatically. default: [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]
-a, --append                   Removes layers before argument and then appends spec. Only works when loading an existing model
-i, --load                     Load existing file to continue training
-F, --savefreq                 Model save frequency in epochs during training
-R, --report                   Report creation frequency in epochs
-q, --quit                     Stop condition for training. Set to early for early stopping (default) or dumb for a fixed number of epochs.
-N, --epochs                   Number of epochs to train for. Set to -1 for indefinite training.
--lag                          Number of epochs to wait before stopping training without improvement. Only used when using early stopping.
--min-delta                    Minimum improvement between epochs to reset early stopping. Defaults to 0.005.
-d, --device                   Select device to use (cpu, cuda:0, cuda:1, …). GPU acceleration requires CUDA.
--optimizer                    Select optimizer (Adam, SGD, RMSprop).
-r, --lrate                    Learning rate [default: 0.001]
-m, --momentum                 Momentum used with SGD optimizer. Ignored otherwise.
-w, --weight-decay             Weight decay.
--schedule                     Sets the learning rate scheduler. May be either constant or 1cycle. For 1cycle the cycle length is determined by the --epochs option.
-p, --partition                Ground truth data partition ratio between train/validation set
-u, --normalization            Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.
-c, --codec                    Load a codec JSON definition (invalid if loading existing model)
--resize                       Codec/output layer resizing option. If set to add, code points will be added; both will set the layer to match exactly the training data; fail will abort if training data and model codec do not match. Only valid when refining an existing model.
-n, --reorder / --no-reorder   Reordering of code points to display order.
-t, --training-files           File(s) with additional paths to training data. Used to enforce an explicit train/validation set split and deal with training sets with more lines than the command line can process. Can be used more than once.
-e, --evaluation-files         File(s) with paths to evaluation data. Overrides the -p parameter.
--preload / --no-preload       Hard enable/disable for training data preloading. Preloading training data into memory is enabled per default for sets with less than 2500 lines.
--threads                      Number of OpenMP threads when running on CPU. Defaults to min(4, #cores).

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +PAGE XML documents is similar to the segmentation training:

+
$ ketos train training_data/*.png
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory.

+

In some cases, such as color inputs, changing the network architecture might be +useful:

+
$ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal results, such as stopping training too soon. Adjusting the minimum delta and/or lag can be useful:

+
$ ketos train --lag 10 --min-delta 0.001 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel --no-preload kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, add and both. +add resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. both +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize add -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize both -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In both mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The modified model will behave exactly like a new one, except potentially training a +lot faster.

+
+
+
+

Segmentation training

+

Training a segmentation model is very similar to training one for +recognition.
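As a rough sketch only: segmentation training is invoked on annotated ALTO or PAGE XML ground truth through the segtrain subcommand referenced in the API documentation; the -o output prefix and the file glob below are placeholders and the exact options may differ between versions.

$ ketos segtrain -o segmodel training_data/*.xml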

+
+
+

Testing

+

Picking a particular model from a pool or getting a more detailed look at the +recognition accuracy can be done with the test command. It uses transcribed +lines, the test set, in the same format as the train command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them.

+
+
-m, --model
+

Model(s) to evaluate.

+
+
-e, --evaluation-files
+

File(s) with paths to evaluation data.

+
+
-d, --device
+

Select device to use.

+
+
-p, --pad
+

Left and right padding around lines.

+
+
+

Transcriptions are handed to the command in the same way as for the train +command, either through a manifest with -e/--evaluation-files or by just +adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will show the average accuracy and the standard deviation across all of them.
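When comparing several models it is usually sufficient to hand all of them to a single invocation. A sketch, assuming the -m option can be repeated for each model to evaluate (model_1.mlmodel and model_2.mlmodel are placeholders):

$ ketos test -m model_1.mlmodel -m model_2.mlmodel -e test.txt

Each model then gets its own report, followed by the summary line with the average accuracy and standard deviation.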

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/models.html b/3.0/models.html new file mode 100644 index 000000000..b90cbb3a8 --- /dev/null +++ b/3.0/models.html @@ -0,0 +1,118 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: pronn +files serializing old pickled pyrnn models as protobuf, clstm’s native +serialization, and versatile Core ML models.

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/objects.inv b/3.0/objects.inv new file mode 100644 index 000000000..e55f650bf Binary files /dev/null and b/3.0/objects.inv differ diff --git a/3.0/search.html b/3.0/search.html new file mode 100644 index 000000000..b2ec1e681 --- /dev/null +++ b/3.0/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

Search

+ + + + +

+ Searching for multiple words only shows matches that contain + all words. +

+ + +
+ + + +
+ + +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/searchindex.js b/3.0/searchindex.js new file mode 100644 index 000000000..e3b911c63 --- /dev/null +++ b/3.0/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ALTO": [[5, "alto"]], "API Quickstart": [[1, null]], "API reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Binarization": [[0, "binarization"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input Specification": [[0, "input-specification"]], "Installation": [[4, "installation"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[4, "models"], [6, null]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation and Script Detection": [[0, "page-segmentation-and-script-detection"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Quickstart": [[4, "quickstart"]], "Recognition": [[0, "recognition"], [1, "recognition"], [7, "recognition"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Reshape": [[8, "reshape"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"]], "Slicing": [[5, "slicing"]], "Testing": [[5, "testing"]], "Training": [[1, "training"], [5, null], [7, "id1"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "conda": [[4, "conda"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.lib.codec": [[2, "kraken-lib-codec"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]], "pip": [[4, "pip"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, 
"sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [0, 4, 5, 6, 7, 8], "0": [0, 1, 4, 5, 7, 8], "00": [5, 7], "001": [5, 7], "005": 5, "0123456789": [4, 7], "01c59": 8, "02": 4, "0245": 7, "04": 7, "06": 7, "07": 5, "09": 7, "0d": 7, "1": [4, 5, 7, 8], "10": [1, 4, 5, 7], "100": [5, 7, 8], "10000": 4, "1015": 1, "1020": 8, "1024": 8, "103": 1, "105": 1, "106": 5, "108": 5, "11": 7, "1161": 1, "117": 1, "1184": 7, "119": 1, "1195": 1, "12": [5, 7, 8], "121": 1, "122": 5, "124": 1, "125": 5, "126": 1, "128": 8, "13": 7, "131": 1, "132": 7, "1339": 7, "1359": 7, "136": 5, "1377": 1, "1385": 1, "1388": 1, "1397": 1, "14": [0, 5], "1408": 1, "1410": 1, "1412": 1, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [5, 7], "1558": 7, "1567": 5, "157": 7, "15924": 0, "16": [5, 8], "161": 7, "1623": 7, "1626": 0, "1676": 0, "1678": 0, "1681": 7, "1697": 7, "17": 5, "1708": 1, "1716": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1824": 1, "19": [1, 5], "192": 5, "197": 0, "198": 5, "199": 5, "1996": 7, "1cycl": 5, "1d": 8, "1st": 7, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [4, 5, 7, 8], "20": [1, 5, 8], "200": 5, "2000": 1, "2001": 5, "2016": 1, "2017": 1, "2019": [4, 5], "204": 7, "2041": 1, "207": [1, 5], "2072": 1, "2077": 1, "2078": 1, "2096": 7, "210": 5, "215": 5, "216": [0, 1], "2169": 0, "2172": 0, "22": [5, 7], "2208": 0, "221": 0, "2215": 0, "2236": 0, "2241": 0, "228": 1, "2293": 0, "23": 5, "230": 1, "2302": 0, "232": 1, "2334": 7, "2364": 7, "24": [1, 7], "241": 5, "2423": 0, "2424": 0, "2426": 1, "244": 0, "246": 5, "2483": 1, "25": [1, 5, 7, 8], "250": 1, "2500": [5, 7], "253": 1, "256": [5, 7, 8], "2577813": 4, "259": 7, "26": [4, 7], "2641": 0, "266": 5, "2681": 0, "27": 5, "270": 7, "27046": 7, "274": 5, "28": [1, 5], "29": [1, 5], "2d": 8, "3": [5, 7, 8], "30": [5, 7], "300": 5, "300dpi": 7, "307": 7, "309": 0, "31": 5, "318": 0, "32": [5, 8], "320": 0, "328": 5, "3292": 1, "336": 7, "3367": 1, "3398": 1, "3414": 1, "3418": 7, "3437": 1, "345": 1, "3455": 1, "35000": 7, "3504": 7, "3514": 1, "3519": 7, "35619": 7, "365": 7, "3680": 7, "38": 5, "384": 8, "39": 5, "4": [0, 1, 5, 7, 8], "40": 7, "400": [0, 5], "4000": 5, "412": 0, "416": 0, "428": 7, "431": 7, "46": 5, "47": 7, "471": 1, "473": 1, "48": [5, 7, 8], "488": 7, "49": [5, 7], "491": 1, "5": [1, 5, 7, 8], "50": [5, 7], "500": 5, "503": 0, "509": 1, "512": 8, "515": 1, "52": [5, 7], "522": 1, "5226": 5, "523": 5, "5230": 5, "5234": 5, "524": 1, "5258": 7, "5281": 4, "534": 1, "536": [1, 5], "54": 1, "545": 7, "546": 0, "56": [1, 7], "561": 0, "562": 1, "575": [1, 5], "577": 7, "59": [7, 8], "5951": 7, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "62": 5, "63": 5, "64": [5, 8], "646": 7, "66": [5, 7], "668": 1, "69": 1, "7": [1, 5, 7, 8], "70": 1, "7012": 5, "7015": 7, "71": 5, "7272": 7, "7281": 7, "73": 1, "74": 1, "7593": 5, "76": 1, "773": 5, "7857": 5, "788": [5, 7], "79": 1, "794": 5, "7943": 5, "8": [5, 7, 8], "80": 5, "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": [1, 7], "8445": 7, "8479": 7, "848": 0, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 
7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [1, 5, 7, 8], "906": 8, "906x32": 8, "92": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [1, 4], "9541": 7, "9550": 7, "96": 7, "97": 7, "98": 7, "99": [4, 7], "9918": 7, "9920": 7, "9924": 7, "A": [0, 4, 7, 8], "As": 1, "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 5, 7, 8], "In": [0, 1, 5, 7], "It": [0, 1, 5, 7], "NO": 7, "On": 4, "One": 5, "The": [0, 1, 3, 5, 7, 8], "There": [0, 1, 4, 5, 6, 7], "These": [1, 7], "To": [0, 1, 4, 5, 7], "_": 1, "_print_ev": 1, "_update_progress": 1, "abbyxml": 4, "abbyyxml": 0, "abcdefghijklmnopqrstuvwxyz": 4, "abl": [0, 5, 7], "abort": [1, 5, 7], "about": 7, "abov": [5, 7], "absolut": 5, "acceler": [4, 5, 7], "accept": [0, 5], "access": [0, 1], "accomod": 5, "accord": 5, "account": 7, "accuraci": [1, 4, 5, 7], "achiev": 7, "across": 5, "action": [0, 5], "activ": [5, 7, 8], "actual": [4, 7], "ad": [5, 7], "adam": 5, "add": [4, 5, 8], "addit": [0, 1, 5], "adjac": 0, "adjust": [5, 7, 8], "advis": 7, "affect": 7, "after": [0, 1, 5, 7, 8], "afterward": 1, "again": 7, "against": 5, "ah": 7, "aku": 7, "al": 7, "alam": 7, "albeit": 7, "aletheia": 7, "algorithm": [0, 1, 4], "all": [0, 1, 4, 5, 6, 7], "allow": [0, 5, 6, 7], "almost": [0, 1], "along": 8, "alphabet": [4, 5, 7, 8], "alphanumer": 0, "also": [0, 1, 5, 7], "altern": [0, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 4, 7], "alto_doc": 1, "alwai": [0, 4], "amiss": 7, "among": 5, "amount": 7, "an": [0, 1, 4, 5, 7, 8], "analysi": [0, 4, 7], "ani": [0, 1, 5], "annot": [0, 5], "anoth": [5, 7, 8], "antiqua": 0, "anymor": [5, 7], "apach": 4, "apart": [1, 3], "append": [5, 7, 8], "appli": [1, 7, 8], "applic": [1, 7], "approach": 7, "appropri": [0, 4, 7, 8], "approxim": 1, "apt": 4, "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "argument": [1, 5], "around": [5, 7], "arrai": 1, "assign": [5, 7], "attribut": [0, 1, 5], "augment": [1, 5, 7, 8], "author": [0, 4], "auto": 1, "automat": [0, 1, 5, 7, 8], "avail": [0, 1, 4, 5, 7], "averag": [0, 5, 7], "axi": 8, "b": [0, 1, 5, 7, 8], "backend": 3, "bar": 1, "base": [1, 2, 5, 6, 7, 8], "baselin": [2, 4, 5, 7], "baseline_label_evaluator_fn": 1, "baseline_seg": 1, "basic": [0, 7], "batch": [7, 8], "bayr\u016bt": 7, "becaus": [1, 7], "been": [0, 4, 7], "befor": [5, 7, 8], "beforehand": 7, "behav": [5, 8], "being": [1, 8], "below": [5, 7], "benjamin": [0, 4], "best": [1, 5, 7], "best_epoch": 1, "best_loss": 1, "best_model_path": 1, "between": [0, 1, 2, 5, 7], "bi": 8, "bidi": 4, "bidirection": 8, "binar": [1, 4, 7], "binari": 1, "bit": 1, "biton": 0, "bl": 4, "black": [0, 1, 7], "blla": 1, "block": [1, 8], "block_i": 5, "block_n": 5, "boilerpl": 1, "border": 0, "both": [0, 1, 3, 5, 7], "bottom": [0, 1, 4], "bound": [0, 1, 2, 4], "boundari": [1, 5], "box": [0, 1, 2, 4, 5], "break": 7, "build": [1, 5, 7], "buld\u0101n": 7, "bw": 4, "bw_im": 1, "bw_imag": 7, "c": [1, 5, 8], "call": [1, 7], "callback": 1, "can": [0, 1, 3, 4, 5, 7, 8], "capabl": 5, "case": [0, 1, 5, 7], "cat": 0, "catch": 1, "caus": 1, "caveat": 
5, "ce": 7, "cell": 8, "cent": 7, "central": [4, 7], "certain": [0, 1, 7], "chain": [0, 5, 7], "chang": [0, 1, 5], "channel": 8, "charact": [0, 1, 4, 5, 6, 7], "check": 0, "chosen": 1, "circumst": 7, "cl": 1, "class": [5, 7], "classic": 7, "classif": [7, 8], "classifi": [0, 1, 8], "claus": 7, "client": 0, "clone": 0, "closer": 1, "clstm": [0, 6], "code": [0, 1, 4, 5, 7], "codec": [1, 5], "collect": 7, "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1], "com": [4, 7], "comand": 1, "combin": [0, 7, 8], "command": [0, 1, 4, 5, 7], "common": [5, 7], "commoni": 5, "compact": 6, "compat": [3, 5], "complet": [5, 7], "complex": [1, 7], "compon": 5, "composedblocktyp": 5, "compress": 7, "compris": 7, "comput": [3, 7], "computation": 7, "conda": 7, "condit": [4, 5], "confid": [0, 1], "configur": 1, "conform": 5, "confus": 5, "connect": 7, "consist": [0, 1, 7, 8], "constant": 5, "construct": [1, 7], "constructor": 1, "contain": [0, 1, 5, 6, 7], "content": 5, "continu": [1, 5, 7], "contrast": 7, "contrib": 1, "contribut": 4, "conv": [5, 8], "convers": 7, "convert": [0, 1, 5, 7], "convolut": 5, "coord": 5, "coordin": 0, "core": [5, 6], "corpu": 4, "correct": [1, 5, 7], "correspond": 0, "cost": 7, "count": [5, 7], "coupl": 7, "coverag": 7, "cpu": [1, 5, 7], "cr3": [5, 8], "creat": [4, 5, 7, 8], "creation": 5, "ctc": 5, "ctc_decod": 1, "cuda": [3, 4, 5], "cudnn": 3, "curat": 0, "current": [5, 6], "custom": 5, "cut": [1, 4], "cycl": 5, "d": [0, 5, 7, 8], "data": [0, 1, 7, 8], "dataset": 5, "date": 4, "de": 7, "deal": 5, "debian": 4, "debug": [1, 5, 7], "decai": 5, "decid": 0, "decod": 1, "decreas": 7, "deem": 0, "def": 1, "default": [0, 1, 4, 5, 6, 7, 8], "defin": [0, 5, 8], "definit": [5, 8], "degrad": 1, "degre": 7, "delet": [5, 7], "delta": 5, "depend": [0, 1, 7], "depth": [5, 7, 8], "describ": 5, "descript": [0, 5], "desir": [1, 8], "desktop": 7, "destroi": 5, "detail": [0, 5, 7], "determin": [1, 5], "dev": 4, "develop": 4, "deviat": 5, "devic": [1, 5, 7], "diaeres": 7, "diaeresi": 7, "dialect": 8, "dice": 5, "differ": [0, 1, 5, 7, 8], "differenti": 1, "digit": 5, "dim": [5, 7, 8], "dimens": 8, "direct": [0, 1, 5, 7, 8], "directori": [1, 4, 5, 7], "disabl": [0, 5, 7], "disk": 7, "displai": 5, "distribut": 8, "do": [4, 5, 6, 7, 8], "do0": [5, 8], "document": [0, 1, 4, 5, 7], "doe": [1, 5, 7], "doesn": 7, "domain": 5, "done": [5, 7], "dot": 7, "down": 7, "download": [4, 7], "draw": 1, "driver": 1, "drop": [1, 8], "dropout": [5, 7], "dumb": 5, "dure": [1, 5, 7], "e": [0, 1, 5, 7, 8], "each": [0, 1, 5, 7, 8], "earli": [1, 5, 7], "easiest": 7, "easili": [5, 7], "edit": 7, "editor": 7, "edu": 7, "eiter": 8, "either": [0, 5, 7, 8], "element": 5, "email": 0, "emploi": 7, "en": 0, "enabl": [0, 1, 3, 5, 7, 8], "encapsul": 1, "encod": [5, 7], "end": 1, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "env": [4, 7], "environ": [4, 7], "environment_cuda": 4, "epoch": [1, 5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 5, 7], "escal": 0, "escriptorium": 7, "estim": [0, 7], "evalu": [0, 1, 5], "evaluation_data": 1, "evaluation_fil": 1, "even": 7, "everyth": 5, "exact": [5, 7], "exactli": 5, "exampl": [5, 7], "except": [1, 5], "execut": [0, 1, 7, 8], "exhaust": 7, "exist": [0, 1, 5, 7], "expect": [7, 8], "experi": 7, "experiment": 7, "explicit": [1, 5], "explicitli": [0, 7], "extend": 8, "extent": 7, "extern": 4, "extract": [0, 1, 4, 7], "f": [1, 4, 5, 7, 8], "fail": 5, "fairli": 7, "fallback": 0, "fals": [1, 5, 7, 8], "faq\u012bh": 7, "faster": [5, 7, 8], "featur": [0, 1, 7, 8], "fed": 
[0, 1, 8], "feed": [0, 1], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [0, 5], "figur": 1, "file": [0, 1, 4, 5, 6, 7], "filenam": 5, "filename_prefix": 1, "filter": [1, 5, 8], "final": [0, 4, 5, 7, 8], "find": [5, 7], "fine": 7, "finish": 7, "first": [1, 7, 8], "fit": 7, "fix": [5, 7], "flag": 4, "float": 0, "follow": [0, 5, 8], "foo": 1, "format": [0, 1, 6, 7], "format_typ": 1, "formul": 8, "forward": 8, "found": [5, 7], "fp": 1, "free": 5, "freeli": [0, 7], "frequenc": [5, 7], "friendli": 7, "from": [0, 1, 3, 7, 8], "full": 7, "fulli": [2, 4], "function": 1, "fundament": 1, "further": 4, "g": [0, 7, 8], "gain": 1, "gener": [0, 1, 4, 5, 7], "gentl": 5, "get": [0, 1, 4, 5, 7], "git": [0, 4], "githubusercont": [4, 7], "given": [1, 5, 8], "glob": 1, "glyph": [5, 7], "gn": 8, "go": 7, "gov": 5, "gpu": [1, 5], "grain": 7, "graph": 8, "graphem": 7, "grayscal": [0, 1, 7, 8], "greedy_decod": 1, "greek": [0, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "group": 7, "gru": 8, "gt": 5, "guid": 7, "gz": 0, "h": [0, 7], "ha": [0, 1, 4, 5, 7, 8], "hamza": [5, 7], "hand": [5, 7], "handwritten": 1, "happili": 0, "hard": [5, 7], "hardwar": 4, "have": [0, 1, 3, 4, 5, 7], "hebrew": 7, "heigh": 8, "height": [0, 5, 8], "held": 7, "help": [4, 7], "here": 5, "heurist": 0, "high": [0, 1, 7, 8], "higher": 8, "highli": [5, 7], "histor": 4, "hline": 0, "hocr": [0, 4, 7], "horizont": [0, 1], "hour": 7, "how": 7, "hpo": 5, "http": [0, 4, 5, 7], "hundr": 7, "hyperparamet": 1, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": 5, "ident": 1, "identif": 0, "identifi": 0, "ignor": [0, 5], "im": 1, "imag": [0, 1, 4, 5, 8], "image_nam": 1, "image_s": 1, "imagefilenam": 5, "imaginari": 7, "immedi": 5, "implement": [1, 8], "import": [1, 7], "importantli": [5, 7], "improv": [0, 1, 5, 7], "includ": [0, 1, 4, 5, 7], "incorrect": 7, "increas": 7, "incur": 5, "indefinit": 5, "independ": 8, "index": 5, "indic": [0, 5, 7], "infer": [5, 7], "inform": [0, 1, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "initi": [1, 5, 7, 8], "input": [1, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "insert": [5, 7, 8], "inspect": 7, "instal": 3, "instanc": [1, 5], "instead": [5, 7], "insuffici": 7, "integ": [0, 7, 8], "integr": 7, "intend": 1, "intens": 7, "interchang": [2, 5], "interfac": [1, 2], "intermedi": [1, 5, 7], "intern": [1, 7], "introduct": 5, "intuit": 8, "invalid": 5, "inventori": 7, "invers": 0, "invok": 7, "involv": 7, "isn": [1, 7, 8], "iso": 0, "iter": [1, 7], "its": [5, 7], "itself": 1, "jinja2": 1, "jpeg": 7, "jpg": 5, "json": [0, 4, 5], "just": [0, 1, 4, 5, 7], "kamil": 5, "kei": 4, "kernel": [5, 8], "kernel_s": 8, "keto": [1, 5, 7], "keyword": 0, "kiessl": [0, 4], "kind": [5, 6, 7], "kit\u0101b": 7, "know": 7, "known": 7, "kraken": [0, 1, 3, 5, 6, 8], "krakentrain": 1, "kutub": 7, "kwarg": 1, "l": [0, 7, 8], "l2c": 1, "label": 1, "lack": 7, "lag": 5, "lambda": 1, "languag": [5, 8], "larg": [1, 4, 7], "larger": 7, "last": [1, 5, 8], "later": 7, "latest": [3, 4], "latin": [0, 4], "latn": [0, 4], "latter": 1, "layer": [5, 7], "layout": [0, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": [5, 8], "lbx256": [5, 8], "learn": [1, 5], "least": 7, "leav": 8, "left": [0, 4, 5, 7], "legaci": [2, 5, 7, 8], "leipzig": 7, "length": [1, 5], "less": [5, 7], "let": 7, "level": [1, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": [5, 8], "lib": 1, "libblas3": 4, "liblapack3": 4, "libpangocairo": 4, "libr": 4, "librari": 4, "libvip": 4, "libxml2": 4, "licens": 0, "lightweight": 4, "like": [1, 5, 7], 
"likewis": [1, 7], "limit": 5, "line": [0, 1, 4, 5, 7, 8], "line_0": 5, "line_k": 5, "linear": [5, 7, 8], "linux": 7, "list": [0, 4, 5, 7], "ll": 4, "load": [1, 4, 5, 7], "load_ani": 1, "load_model": 1, "loader": 1, "loc": 5, "locat": [1, 5, 7], "log": [5, 7], "look": [1, 5, 7], "loss": [1, 5], "lossless": 7, "lot": [1, 5], "low": [0, 1, 5], "lower": 5, "lr": [0, 1, 7], "lrate": 5, "lstm": 8, "ltr": 0, "m": [0, 5, 7, 8], "mac": 7, "maddah": 7, "made": 7, "mai": [0, 4, 5, 7], "main": 4, "major": 1, "make": 5, "mandatori": 1, "manifest": 5, "manual": 7, "manuscript": 7, "map": [0, 1], "mark": 7, "markedli": 7, "mask": 1, "master": [4, 7], "match": 5, "materi": [1, 4, 7], "matrix": 1, "matter": 7, "maxcolsep": 0, "maxim": 7, "maximum": [0, 8], "maxpool": [5, 8], "me": 0, "mean": [1, 7], "measur": 5, "measurementunit": 5, "memori": [5, 7], "merg": 0, "metadata": [0, 4, 5, 6, 7], "method": 1, "metric": 1, "might": [1, 5, 7], "min": 5, "minim": [1, 5], "minimum": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 5, 7], "mittagessen": [0, 4, 7], "ml": 6, "mlmodel": [1, 5, 7], "mm_rpred": 1, "mode": [1, 5], "model": [1, 5, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "modern": [4, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "more": [0, 5, 7, 8], "most": [1, 5, 7], "mostli": [0, 1, 7, 8], "move": [7, 8], "mp": 8, "mp2": [5, 8], "mp3": [5, 8], "multi": [0, 1, 4, 7], "multipl": [0, 5, 7], "n": [0, 5, 8], "name": [0, 1, 4, 7, 8], "nativ": 6, "natur": 7, "naugment": 4, "necessari": [5, 7], "need": [1, 7], "net": 7, "netork": 1, "network": [1, 4, 5, 6, 7], "neural": [1, 6, 7], "never": 7, "nevertheless": 1, "new": [3, 5, 7, 8], "next": [1, 7], "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": 5, "nlbin": [0, 1, 4], "noisi": 7, "non": [0, 1, 4, 5, 7, 8], "none": [5, 7, 8], "nonlinear": 8, "nop": 1, "normal": [0, 5], "note": 2, "notion": 1, "now": 7, "number": [0, 1, 5, 7, 8], "numer": [1, 7], "numpi": 1, "nvidia": 3, "o": [5, 7], "o1c103": 8, "object": 1, "obtain": 7, "obvious": 7, "occur": 7, "ocr": [0, 1, 4, 7], "ocr_lin": 0, "ocr_record": 1, "ocropu": 0, "ocrx_word": 0, "off": [5, 7], "often": [1, 7], "old": 6, "omit": 7, "onc": 5, "one": [0, 1, 5, 7, 8], "ones": [0, 5], "onli": [0, 1, 5, 7, 8], "open": [0, 1], "openmp": [5, 7], "oper": [0, 1, 8], "optic": [0, 7], "optim": [1, 4, 5, 7], "option": [0, 1, 5, 8], "order": [0, 1, 4, 5, 8], "org": 5, "origin": [1, 5], "other": [0, 5, 7, 8], "otherwis": 5, "out": [5, 7, 8], "output": [0, 1, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "overfit": 7, "overrid": 5, "p": [4, 5], "packag": 7, "pad": 5, "page": [1, 4, 7], "page_doc": 1, "pagecont": 5, "pageseg": 1, "pagexml": [1, 4, 7], "pair": 0, "par": 1, "paragraph": 0, "param": [5, 7, 8], "paramet": [0, 1, 4, 5, 7, 8], "parameterless": 0, "pars": 5, "parse_alto": 1, "parse_pag": 1, "parser": [1, 5], "part": [1, 5, 7, 8], "parti": 1, "particular": [0, 1, 5, 7, 8], "partit": 5, "pass": [7, 8], "path": [1, 5], "pattern": 7, "pcgt": 5, "pdf": 7, "pdfimag": 7, "pdftocairo": 7, "per": [1, 5, 7], "perc": 0, "perform": [1, 7], "period": 7, "pick": 5, "pickl": 6, "pil": 1, "pillow": 1, "pinpoint": 7, "pip3": 4, "pipelin": 1, "pixel": [1, 5, 8], "place": [0, 4, 7], "placement": 7, "plain": 0, "pleas": 5, "plethora": 1, "png": [1, 5, 7], "point": [1, 5, 7], "polygon": [5, 7], "polyton": [0, 7], "pool": 5, "porson": 0, "portion": 0, "possibl": [0, 1, 5, 7], "postprocess": 1, 
"potenti": 5, "power": 7, "pre": 0, "pred_it": 1, "predict": 1, "prefer": 7, "prefilt": 0, "prefix": [5, 7], "prefix_epoch": 7, "preload": [5, 7], "prepar": 7, "prepend": 8, "prerequisit": 4, "prevent": 7, "previous": 5, "primaresearch": 5, "primari": 1, "princip": [0, 1], "print": [0, 1, 4, 5, 7], "printspac": 5, "prob": 8, "probabl": [5, 7, 8], "process": [1, 4, 5, 7, 8], "produc": [0, 5, 7], "programmat": 1, "progress": [1, 7], "progress_callback": 1, "project": 8, "pronn": 6, "proper": 1, "properli": 7, "protobuf": 6, "prove": 7, "provid": [0, 1, 2, 4, 7, 8], "public": 4, "pull": [0, 4], "purpos": [1, 7, 8], "put": [0, 7], "pyrnn": [0, 6], "python3": 4, "pytorch": [3, 6], "q": 5, "qualiti": [1, 7], "quickli": 1, "quit": [1, 5], "r": [0, 5, 8], "rais": [1, 5], "random": 7, "rang": 0, "rapidli": 7, "rate": [1, 5, 7], "ratio": 5, "raw": [1, 4, 7], "re": 0, "reach": 7, "read": [0, 4], "real": 7, "rec_model_path": 1, "recogn": [0, 1, 4, 5, 7], "recognit": [2, 3, 4, 6, 8], "recognition_evaluator_fn": 1, "recognition_train_gen": 1, "recommend": [1, 5, 7], "record": 1, "recurr": 6, "redefinit": 5, "reduc": 8, "refer": [1, 5, 7], "refin": 5, "region": [1, 4, 5, 7], "region_typ": 5, "regular": 5, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reliabl": 7, "relu": 8, "remain": 7, "remaind": 8, "remedi": 7, "remov": [0, 5, 7, 8], "render": 1, "reorder": [5, 7], "repeatedlydur": 7, "report": [5, 7], "repositori": [4, 7], "represent": 7, "request": [0, 4, 8], "requir": [0, 1, 4, 5, 7, 8], "requisit": 7, "reset": 5, "reshap": 5, "resiz": 5, "respect": 8, "result": [0, 1, 5, 7, 8], "resum": 5, "retain": [0, 5], "retrain": 7, "retriev": [0, 4, 5, 7], "return": [1, 8], "revers": 8, "rgb": [1, 8], "right": [0, 4, 5, 7], "rl": 0, "rmsprop": [5, 7], "rnn": [4, 5, 7, 8], "romanov": 7, "rough": 7, "routin": 1, "rpred": 1, "rtl": 0, "rukkakha": 7, "rule": 7, "run": [0, 1, 3, 4, 5, 7, 8], "s1": [5, 8], "same": [0, 1, 5, 7], "sampl": 7, "sarah": 7, "savant": 7, "save": [5, 7], "savefreq": [5, 7], "scale": [0, 8], "scan": 7, "scantailor": 7, "schedul": [1, 5], "schema": 5, "schemaloc": 5, "script": [1, 4, 5, 7], "script_detect": [0, 1], "scriptal": 1, "scroung": 4, "section": 7, "see": [5, 7], "seen": 7, "seg": 1, "segment": [4, 7], "segment_k": 5, "segmentation_train_gen": 1, "segtrain": 1, "seldomli": 7, "select": [5, 8], "semant": [5, 7], "semi": [0, 7], "sens": 0, "separ": [0, 7, 8], "sequenc": [0, 1, 5, 7, 8], "seri": 0, "serial": [0, 6], "set": [0, 1, 5, 7, 8], "setup": 1, "sever": [1, 7], "sgd": 5, "shape": 8, "share": 0, "shell": 7, "short": [0, 8], "should": [1, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "sigmoid": 8, "similar": [1, 5, 7], "simpl": [1, 7, 8], "singl": [1, 4, 7, 8], "size": [0, 1, 5, 7, 8], "skew": [0, 7], "slightli": [0, 7, 8], "small": [0, 1, 7, 8], "so": [1, 3, 7, 8], "softmax": [1, 8], "softwar": 7, "some": [0, 1, 4, 5, 7], "someth": 7, "sometim": [1, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [4, 7], "sourc": [1, 7, 8], "sourceimageinform": 5, "sp": 5, "space": [1, 4, 5, 7], "span": 0, "spec": 5, "special": 0, "specif": 7, "specifi": 5, "speckl": 7, "split": [0, 5, 7, 8], "squash": 8, "stack": [5, 8], "standard": [4, 5, 7], "start": [1, 7], "stddev": 5, "step": [0, 1, 4, 7, 8], "still": [0, 1], "stop": [1, 5, 7], "stopper": 1, "straightforward": 1, "strength": 1, "strict": 5, "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [1, 5, 8], "strip": [0, 8], "structur": [1, 5], "stub": 5, "subcommand": 0, "subcommand_1": 0, "subcommand_2": 0, 
"subcommand_n": 0, "suboptim": 5, "subset": 1, "substitut": [5, 7], "suffer": 7, "suffici": [1, 5], "suggest": 1, "suit": 7, "suitabl": [0, 7], "summar": [5, 7, 8], "superflu": 7, "suppli": [0, 7], "support": [1, 4, 6], "switch": [0, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [4, 7], "systemat": 7, "t": [0, 1, 5, 7, 8], "tabl": 7, "tag": 5, "take": [1, 5, 7], "tanh": 8, "task": 7, "tb": 0, "templat": 1, "tensor": 8, "tensorflow": 8, "term": 4, "tesseract": 8, "test": 7, "test_model": 5, "text": [0, 1, 4, 5, 7], "text_direct": [0, 1], "textblock": 5, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": 5, "textregion": 5, "than": [5, 7], "thei": [1, 4, 5, 7], "them": [0, 5], "therefor": 7, "therein": 7, "thi": [1, 4, 5, 6, 7, 8], "third": 1, "those": [0, 5], "though": 1, "thousand": 7, "thread": [5, 7], "three": 6, "threshold": 0, "through": [1, 4, 5, 7], "thrown": 0, "tif": [0, 4], "tiff": 7, "tightli": 7, "time": [1, 7, 8], "tip": 1, "toi": 0, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 4], "topograph": 0, "topolog": 0, "torchseqrecogn": 1, "torchvgslmodel": 1, "total": 7, "train": [0, 3, 8], "trainabl": [1, 2, 4, 5], "trainer": 1, "training_data": [1, 5], "training_fil": 1, "transcrib": [5, 7], "transcript": 5, "transfer": 1, "transform": [1, 5], "transformt": 1, "transpos": [5, 7, 8], "treat": [7, 8], "true": [0, 1, 8], "truth": [5, 7], "turn": 4, "tutori": [1, 5], "two": [0, 1, 5, 8], "txt": [0, 4, 5], "type": [0, 1, 5, 7, 8], "typefac": [5, 7], "typograph": [0, 7], "u": 5, "ubuntu": 4, "unchti": 0, "unclean": 7, "undecod": 1, "under": 4, "undesir": 8, "unduli": 0, "uni": [0, 7], "unicod": [0, 1, 5, 7], "uniqu": 7, "unnecessarili": 1, "unpredict": 7, "unrepres": 7, "unseg": 7, "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4], "upcom": 5, "updat": 0, "upon": 0, "upward": 7, "us": [0, 1, 3, 4, 5, 7, 8], "usabl": 1, "user": [4, 7], "usual": [1, 4, 5, 7], "utf": 5, "util": [1, 5, 7], "uw3": 0, "v": [5, 7], "v4": 5, "valid": [0, 1, 5], "valu": [0, 5, 8], "variabl": [4, 8], "varieti": 5, "vast": 1, "verbos": 7, "veri": 5, "versatil": 6, "version": [0, 3, 4, 5], "vertic": 0, "vgsl": [1, 5], "vocal": 7, "vpo": 5, "vv": 7, "w": [0, 1, 5, 8], "w3": 5, "wa": [0, 7], "wai": [1, 5, 7], "wait": 5, "want": 7, "warn": [1, 7], "warp": 7, "we": [5, 7], "weak": [1, 7], "websit": 7, "weight": 5, "welcom": [0, 4], "well": 7, "were": 5, "western": 7, "wget": [4, 7], "what": 7, "when": [5, 7, 8], "where": 7, "which": [0, 1, 3, 5], "while": [0, 1, 7], "white": [0, 1, 7], "whitelist": 0, "whole": 7, "wide": 8, "width": [5, 7, 8], "wildli": 7, "without": [1, 5, 7], "word": [4, 5], "word_text": 5, "work": [1, 5, 7], "world": 7, "write": [0, 1], "written": [0, 7], "www": 5, "x": [5, 7, 8], "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x_bbox": 0, "x_conf": 0, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xml": 7, "xmln": 5, "xmlschema": 5, "xsd": 5, "xsi": 5, "y": [0, 8], "y_stride": 8, "yml": [4, 7], "you": [4, 7], "your": 0, "y\u016bsuf": 7, "zenodo": 4, "zero": [7, 8], "zoom": 0, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, 
"\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7}, "titles": ["Advanced Usage", "API Quickstart", "API reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"acceler": 3, "acquisit": 7, "advanc": 0, "alto": 5, "annot": 7, "api": [1, 2], "baselin": 1, "basic": [1, 8], "binar": [0, 2], "blla": 2, "codec": 2, "concept": 1, "conda": 4, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": 2, "detect": 0, "dropout": 8, "evalu": 7, "exampl": 8, "featur": 4, "fine": 5, "format": 5, "from": 5, "gpu": 3, "group": 8, "helper": 8, "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": 1, "lib": 2, "licens": 4, "linegen": 2, "max": 8, "model": [0, 2, 4, 6], "modul": 2, "network": 8, "normal": 8, "page": [0, 5], "pageseg": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "preprocess": [1, 7], "quickstart": [1, 4], "recognit": [0, 1, 5, 7], "recurr": 8, "refer": 2, "regular": 8, "repositori": 0, "reshap": 8, "rpred": 2, "scratch": 5, "script": 0, "segment": [0, 1, 2, 5], "serial": [1, 2], "slice": 5, "specif": [0, 8], "test": 5, "train": [1, 2, 4, 5, 7], "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/3.0/training.html b/3.0/training.html new file mode 100644 index 000000000..272ebc0eb --- /dev/null +++ b/3.0/training.html @@ -0,0 +1,505 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other systems requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition this is the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be +activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First a number of high quality scans, preferably color or grayscale and at +least 300dpi, are required. Scans should be in a lossless image format such as +TIFF or PNG; images in PDF files have to be extracted beforehand using a tool +such as pdftocairo or pdfimages. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition.
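As a sketch of the extraction step mentioned above (document.pdf and the page prefix are placeholders; available options depend on the installed poppler version):

$ pdftocairo -png -r 300 document.pdf page
$ pdfimages -all document.pdf page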

+

Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is Scantailor, although most work can be done using a standard image +editor.

+

The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy.

+

There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +western texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: escriptorium integrates kraken tightly including +training and inference, while Aletheia is a powerful desktop +application that can create fine-grained annotations.

+
+
+

Training

+

The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models.

+

The training data in output_dir may now be used to train a new model by +invoking the ketos train command. Just hand a list of images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. By default the validation +set will comprise 10% of the training data.
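The 10% split is only the default; assuming the installed version of ketos train exposes the split ratio through a -p/--partition option (an assumption, as the option is not documented on this page), it could be changed like this:

$ ketos train -p 0.8 output_dir/*.png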

+

Basic model training is mostly automatic, although there are multiple parameters +that can be adjusted (a combined invocation is sketched after the list below):

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as +prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an +integer equal to or larger than 1, with 1 meaning a report is created each +time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with +--load. To continue training from a base model with another +training set refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for +accelerated training. The default setting preloads data sets with less +than 2500 lines, explicitly adding --preload will preload arbitrary +sized sets. --no-preload disables preloading in all circumstances.

+
+
+
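A combined invocation using the parameters above might look like the following sketch; the syriac prefix, the frequencies, and the loaded model file are arbitrary placeholder values:

$ ketos train --output syriac --report 1 --savefreq 1 --load syriac_best.mlmodel --no-preload output_dir/*.png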

Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable, as training +is a somewhat random process, a rough guide is that accuracy seldom improves +after 50 epochs, typically reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as early stopping that stops training as soon as +the error rate on the validation set doesn’t improve anymore. This will +prevent overfitting, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein.
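The patience of the early stopping mechanism can be tuned with the --lag and --min-delta options described in the ketos documentation; a sketch with arbitrary values:

$ ketos train --lag 10 --min-delta 0.001 output_dir/*.png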

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, … in the directory the script was executed in. Let's +take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process.

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left text or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again.

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+

ketos can also produce more verbose output with training set and network +information by appending one or more -v to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +were found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net.

+
+
+

Evaluation and Validation

+

While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows one to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent.

+

The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output +formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/3.0/vgsl.html b/3.0/vgsl.html new file mode 100644 index 000000000..dda0e425f --- /dev/null +++ b/3.0/vgsl.html @@ -0,0 +1,288 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension.

+

When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension.

+

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do O1c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (12) is reshaped into the depth dimension before +applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection.

+
+
+

Convolutional Layers

+
C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters.
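To make the notation concrete (an illustration rather than one of the shipped model definitions): Cr3,3,64 is a 3 x 3 ReLU convolution producing 64 output channels with the implicit default stride, while the optional trailing parameters set an explicit stride of 2 in both dimensions:

Cr3,3,64
Cr3,3,64,2,2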

+
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, returning only the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension and the non-time-axis dimension (height/width) is treated as another batch dimension. For example, an Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors, resulting in an output of shape 1, 16, 906, 25. If this isn't desired, either run a summarizing layer in the other direction first, e.g. Lfys20 producing a 1, 1, 906, 20 input for the following layer, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimensions into a 1, 1, 906, 512 input to the recurrent layer.
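The shape handling of the Lfx25 example can be sketched in plain PyTorch (purely illustrative, not kraken's actual code): the height dimension is folded into the batch dimension and an LSTM runs along the width axis:

    import torch
    from torch import nn

    x = torch.zeros(1, 16, 906, 32)    # N, H, W, C layout used above
    seq = x.reshape(1 * 16, 906, 32)   # 16 independent sequences of length 906
    lstm = nn.LSTM(input_size=32, hidden_size=25, batch_first=True)
    out, _ = lstm(seq)                 # -> (16, 906, 25)
    print(out.reshape(1, 16, 906, 25).shape)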

+
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a max pooling layer with kernel size (y, x) and stride (y_stride, x_stride).
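In PyTorch terms a Mp2,2 block corresponds roughly to the following (illustrative only):

    from torch import nn

    # approximate equivalent of Mp2,2: 2x2 kernel with 2x2 stride
    pool = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))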

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d to a,b and distributes a into dimension e, respectively b into f. Either e or f has to be equal to d. So S1(1, 48)1, 3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48 sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height dimension before a recurrent layer.
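The shape arithmetic of the example can be reproduced with a plain tensor reshape (illustrative only; the exact memory layout used internally may differ):

    import torch

    x = torch.zeros(1, 48, 1020, 8)                         # N, H, W, C
    y = x.permute(0, 2, 1, 3).reshape(1, 1, 1020, 48 * 8)   # height folded into channels
    print(y.shape)                                          # torch.Size([1, 1, 1020, 384])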

+
+

Note

+

This S layer is equivalent to the one in the TensorFlow implementation of VGSL, i.e. it behaves differently from Tesseract's.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to 0.5 drop probability and 1D dropout. Set dim to 2 after convolutional layers.
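Roughly equivalent PyTorch modules (illustrative only):

    from torch import nn

    drop_1d = nn.Dropout(p=0.5)    # Do
    drop_2d = nn.Dropout2d(p=0.1)  # Do0.1,2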

+
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, +normalizing each separately.
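Roughly equivalent to the following PyTorch module (illustrative only; shown for a 256-channel input, which Gn32 splits into 32 groups of 8 channels):

    from torch import nn

    gn = nn.GroupNorm(num_groups=32, num_channels=256)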

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/.buildinfo b/4.0/.buildinfo new file mode 100644 index 000000000..5ffeded98 --- /dev/null +++ b/4.0/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 39820f44c85672ce70d25b526c6d40ae +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/4.0/.doctrees/advanced.doctree b/4.0/.doctrees/advanced.doctree new file mode 100644 index 000000000..a7a8cbe79 Binary files /dev/null and b/4.0/.doctrees/advanced.doctree differ diff --git a/4.0/.doctrees/api.doctree b/4.0/.doctrees/api.doctree new file mode 100644 index 000000000..eed772079 Binary files /dev/null and b/4.0/.doctrees/api.doctree differ diff --git a/4.0/.doctrees/api_docs.doctree b/4.0/.doctrees/api_docs.doctree new file mode 100644 index 000000000..2fa07e7df Binary files /dev/null and b/4.0/.doctrees/api_docs.doctree differ diff --git a/4.0/.doctrees/environment.pickle b/4.0/.doctrees/environment.pickle new file mode 100644 index 000000000..dcc6795be Binary files /dev/null and b/4.0/.doctrees/environment.pickle differ diff --git a/4.0/.doctrees/gpu.doctree b/4.0/.doctrees/gpu.doctree new file mode 100644 index 000000000..9d3d9ffb1 Binary files /dev/null and b/4.0/.doctrees/gpu.doctree differ diff --git a/4.0/.doctrees/index.doctree b/4.0/.doctrees/index.doctree new file mode 100644 index 000000000..770c3728f Binary files /dev/null and b/4.0/.doctrees/index.doctree differ diff --git a/4.0/.doctrees/ketos.doctree b/4.0/.doctrees/ketos.doctree new file mode 100644 index 000000000..de9ba4289 Binary files /dev/null and b/4.0/.doctrees/ketos.doctree differ diff --git a/4.0/.doctrees/models.doctree b/4.0/.doctrees/models.doctree new file mode 100644 index 000000000..2949bd446 Binary files /dev/null and b/4.0/.doctrees/models.doctree differ diff --git a/4.0/.doctrees/training.doctree b/4.0/.doctrees/training.doctree new file mode 100644 index 000000000..5ea193766 Binary files /dev/null and b/4.0/.doctrees/training.doctree differ diff --git a/4.0/.doctrees/vgsl.doctree b/4.0/.doctrees/vgsl.doctree new file mode 100644 index 000000000..173646e02 Binary files /dev/null and b/4.0/.doctrees/vgsl.doctree differ diff --git a/4.0/.nojekyll b/4.0/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/4.0/_sources/advanced.rst.txt b/4.0/_sources/advanced.rst.txt new file mode 100644 index 000000000..ebb9a0bfb --- /dev/null +++ b/4.0/_sources/advanced.rst.txt @@ -0,0 +1,255 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken binarization (converting color and grayscale images into bitonal +ones), layout analysis/page segmentation (extracting topological text lines +from an image), recognition (feeding text lines images into an classifiers), +and finally serialization of results into an appropriate format such as hOCR or +ALTO. + +Input Specification +------------------- + +All kraken subcommands operating on input-output pairs, i.e. producing one +output document for one input document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. 
+ +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Binarization +------------ + +The binarization subcommand accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +=========== ==== +option type +=========== ==== +--threshold FLOAT +--zoom FLOAT +--escale FLOAT +--border FLOAT +--perc INTEGER RANGE +--range INTEGER +--low INTEGER RANGE +--high INTEGER RANGE +=========== ==== + +Page Segmentation and Script Detection +-------------------------------------- + +The `segment` subcommand access two operations page segmentation into lines and +script detection of those lines. + +Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +`JSON `_ file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left). + +The script detection splits extracted lines from the segmenter into strip +sharing a particular script that can then be recognized by supplying +appropriate models for each detected script to the `ocr` subcommand. + +Combined output from both consists of lists in the `boxes` field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are `ISO 15924 +`_ 4 character codes. + +.. code-block:: console + + $ kraken -i 14.tif lines.txt segment + $ cat lines.json + { + "boxes" : [ + [ + ["Grek", [561, 216, 1626,309]] + ], + [ + ["Latn", [2172, 197, 2424, 244]] + ], + [ + ["Grek", [1678, 221, 2236, 320]], + ["Arab", [2241, 221, 2302, 320]] + ], + + ["Grek", [412, 318, 2215, 416]], + ["Latn", [2208, 318, 2424, 416]] + ], + ... + ], + "script_detection": true, + "text_direction" : "horizontal-tb" + } + +Script detection is automatically enabled; by explicitly disabling script +detection the `boxes` field will contain only a list of line bounding boxes: + +.. 
code-block:: console + + [546, 216, 1626, 309], + [2169, 197, 2423, 244], + [1676, 221, 2293, 320], + ... + [503, 2641, 848, 2681] + +Available page segmentation parameters are: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +--scale FLOAT Estimate of the average line height on the page +-m, --maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, --black-colseps / -w, --white-colseps Switch to black column separators. +-r, --remove-hlines / -l, --hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +=============================================== ====== + +Model Repository +---------------- + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client. + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. code-block:: console + + $ kraken list + Retrieving model list ✓ + default (pyrnn) - A converted version of en-default.pyrnn.gz + toy (clstm) - A toy model trained on 400 lines of the UW3 data set. + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show toy + name: toy.clstm + + A toy model trained on 400 lines of the UW3 data set. + + author: Benjamin Kiessling (mittagessen@l.unchti.me) + http://kraken.re + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get toy + Retrieving model ✓ + +Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the ``show`` command, e.g.: + +.. code-block:: console + + $ kraken -i ... ... ocr -m toy + +Additions and updates to existing models are always welcome! Just open a pull +request or write an email. + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm + +All polytonic Greek text portions will be recognized using the `porson.clstm` +model while Latin text will be fed into the `antiqua.clstm` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. 
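+The same mapping can also be set up through the Python API. A minimal sketch,
+assuming ``kraken.rpred.mm_rpred`` accepts a mapping from script identifiers to
+loaded recognition models and that ``im`` and ``segmentation`` are an image and
+a script-annotated segmentation as described above (see the API reference for
+the exact signature):
+
+.. code-block:: python
+
+    >>> from kraken.lib import models
+    >>> from kraken.rpred import mm_rpred
+
+    >>> greek = models.load_any('porson.clstm')
+    >>> latin = models.load_any('antiqua.clstm')
+    >>> preds = mm_rpred({'Grek': greek, 'Latn': latin, 'default': latin}, im, segmentation)
+    >>> for record in preds:
+    ...     print(record)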
+ +The ``ocr`` subcommand is able to serialize the recognition results either as +plain text (default), as `hOCR `_, into `ALTO +`_, or abbyyXML containing additional +metadata such as bounding boxes and confidences: + +.. code-block:: console + + $ kraken -i ... ... ocr -t # text output + $ kraken -i ... ... ocr -h # hOCR output + $ kraken -i ... ... ocr -a # ALTO output + $ kraken -i ... ... ocr -y # abbyyXML output + +hOCR output is slightly different from hOCR files produced by ocropus. Each +``ocr_line`` span contains not only the bounding box of the line but also +character boxes (``x_bboxes`` attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ``ocrx_word`` +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the ``x_conf`` attribute. + +Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input. diff --git a/4.0/_sources/api.rst.txt b/4.0/_sources/api.rst.txt new file mode 100644 index 000000000..effad7c4f --- /dev/null +++ b/4.0/_sources/api.rst.txt @@ -0,0 +1,406 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and netork +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. 
code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + {'text_direction': 'horizontal-lr', + 'boxes': [[0, 29, 232, 56], + [28, 54, 121, 84], + [9, 73, 92, 117], + [103, 76, 145, 131], + [7, 105, 119, 230], + [10, 228, 126, 345], + ... + ], + 'script_detection': False} + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmentation and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + {'text_direction': 'horizontal-lr', + 'type': 'baselines', + 'script_detection': False, + 'lines': [{'script': 'default', + 'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]], + 'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]}, + ...], + 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...] + '$par': ... + '$nop': ...}} + >>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking. + +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. + +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. 
Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as the if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch for binary input models. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(model, im, baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but an +:class:`kraken.rpred.ocr_record` record object containing the character +prediction, cuts (approximate locations), and confidences. + +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'box' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. 
The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformtion by the functional blocks: + +.. code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> xml.parse_alto(alto_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + } + + >>> page_doc = '/path/to/page' + >>> xml.parse_page(page_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + + +Serialization +------------- + +The serialization module can be used to transform the :class:`ocr_records +` returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders `jinja2 +`_ templates in `kraken/templates` through +the :func:`kraken.serialization.serialize` function. + +.. 
code-block:: python + + >>> from kraken.lib import serialization + + >>> records = [record for record in pred_it] + >>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/4.0/_sources/api_docs.rst.txt b/4.0/_sources/api_docs.rst.txt new file mode 100644 index 000000000..46379f2b8 --- /dev/null +++ b/4.0/_sources/api_docs.rst.txt @@ -0,0 +1,251 @@ +************* +API Reference +************* + +kraken.blla module +================== + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +===================== + +.. 
note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +kraken.rpred module +=================== + +.. autoapifunction:: kraken.rpred.bidi_record + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapiclass:: kraken.rpred.ocr_record + :members: + +.. autoapifunction:: kraken.rpred.rpred + + +kraken.serialization module +=========================== + +.. autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +kraken.lib.models module +======================== + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.vgsl module +====================== + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +===================== + +.. autoapifunction:: kraken.lib.xml.parse_xml + +.. autoapifunction:: kraken.lib.xml.parse_page + +.. autoapifunction:: kraken.lib.xml.parse_alto + +kraken.lib.codec module +======================= + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.lib.train module +======================= + +Training Schedulers +------------------- + +.. autoapiclass:: kraken.lib.train.TrainScheduler + :members: + +.. autoapiclass:: kraken.lib.train.annealing_step + :members: + +.. autoapiclass:: kraken.lib.train.annealing_const + :members: + +.. autoapiclass:: kraken.lib.train.annealing_exponential + :members: + +.. autoapiclass:: kraken.lib.train.annealing_reduceonplateau + :members: + +.. autoapiclass:: kraken.lib.train.annealing_cosine + :members: + +.. autoapiclass:: kraken.lib.train.annealing_onecycle + :members: + +Training Stoppers +----------------- + +.. autoapiclass:: kraken.lib.train.TrainStopper + :members: + +.. autoapiclass:: kraken.lib.train.EarlyStopping + :members: + +.. autoapiclass:: kraken.lib.train.EpochStopping + :members: + +.. autoapiclass:: kraken.lib.train.NoStopping + :members: + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +========================= + +Datasets +-------- + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Helpers +------- + +.. autoapifunction:: kraken.lib.dataset.compute_error + +.. autoapifunction:: kraken.lib.dataset.preparse_xml_data + +.. autoapifunction:: kraken.lib.dataset.generate_input_transforms + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. autoapifunction:: kraken.lib.segmentation.denoising_hysteresis_thresh + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. 
autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + + +kraken.lib.ctc_decoder +====================== + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +===================== + +.. autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/4.0/_sources/gpu.rst.txt b/4.0/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/4.0/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/4.0/_sources/index.rst.txt b/4.0/_sources/index.rst.txt new file mode 100644 index 000000000..1e99c0f83 --- /dev/null +++ b/4.0/_sources/index.rst.txt @@ -0,0 +1,243 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. + +Features +======== + +kraken's main features are: + + - Fully trainable layout analysis and character recognition + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - `Public repository `_ of model files + - :ref:`Lightweight model files ` + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). 
Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. + +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone git://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone git://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user's kraken directory: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.2577813 + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.2577813 + name: 10.5281/zenodo.2577813 + + A generalized model for English printed text + + This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p + scripts: Latn + alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE + accuracy: 99.95% + license: Apache-2.0 + author(s): Kiessling, Benjamin + date: 2019-02-26 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the default model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. 
code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `escriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://ec.europa.eu/regional_policy/images/information/logos/eu_flag.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://www.gouvernement.fr/sites/default/files/styles/illustration-centre/public/contenu/illustration/2018/10/logo_investirlavenir_rvb.png + :width: 100 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005. + + diff --git a/4.0/_sources/ketos.rst.txt b/4.0/_sources/ketos.rst.txt new file mode 100644 index 000000000..ee761532f --- /dev/null +++ b/4.0/_sources/ketos.rst.txt @@ -0,0 +1,656 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation training and the binary format for recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. 
In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported. Binary +datasets drastically improve loading performance allowing the saturation of +most GPUs with minimal computational overhead while also allowing training with +datasets that are larger than the systems main memory. A minor drawback is a +~30% increase in dataset size in comparison to the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the `--workers` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... + +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... + +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +Recognition training +-------------------- + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, --output Output model file prefix. Defaults to model. +-s, --spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, --append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, --load Load existing file to continue training +-F, --savefreq Model save frequency in epochs during + training +-q, --quit Stop condition for training. Set to `early` + for early stopping (default) or `dumb` for fixed + number of epochs. +-N, --epochs Number of epochs to train for. +--min-epochs Minimum number of epochs to train for when using early stopping. +--lag Number of epochs to wait before stopping + training without improvement. 
Only used when using early stopping. +-d, --device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, --lrate Learning rate [default: 0.001] +-m, --momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, --weight-decay Weight decay. +--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, --partition Ground truth data partition ratio between train/validation set +-u, --normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, --codec Load a codec JSON definition (invalid if loading existing model) +--resize Codec/output layer resizing option. If set + to `add` code points will be added, `both` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, --reorder / --no-reorder Reordering of code points to display order. +-t, --training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, --evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, --format-type Sets the training and evaluation data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +--augment / --no-augment Enables/disables data augmentation. +--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases, such as color inputs, changing the network architecture might be +useful: + +.. code-block:: console + + $ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta an/or +lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 --min-delta 0.001 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. 
code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``add`` and ``both``. +``add`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``both`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize add -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize both -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``both`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. 
+ +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. 
+left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Segmentation training +--------------------- + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. 
code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + Training line types: + default 2 53980 + foo 8 134 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + val check [------------------------------------] 0/0 + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailled or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. code-block:: console + + $ ketos segtrain -f xml --valid-baselines default training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + paragraph 6 10218 + +Finally, we can merge baselines and regions into each other: + +.. code-block:: console + + $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml + Training line types: + default 2 54114 + ... + $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml + ... + Training region types: + graphic 3 151 + text 4 11346 + separator 5 5431 + ... + +These options are combinable to massage the dataset into any typology you want. + +Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option: + +.. 
code-block:: console + + $ ketos segtrain --topline -f xml hebrew_training_data/*.xml + $ ketos segtrain --centerline -f xml chinese_training_data/*.xml + $ ketos segtrain --baseline -f xml latin_training_data/*.xml + +Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved: + +.. code-block:: console + + $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml + ... + +Recognition Testing +------------------- + +Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-f, --format-type Sets the test set data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +-m, --model Model(s) to evaluate. +-e, --evaluation-files File(s) with paths to evaluation data. +-d, --device Select device to use. +--pad Left and right padding around lines. +======================================================= ====== + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with `-e/--evaluation-files` or by just +adding a number of image files as the final argument: + +.. code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. 
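+A typical model selection run therefore hands several checkpoints to the
+command at once. A sketch of such an invocation, assuming the documented
+``-m`` option can be repeated and using placeholder file names:
+
+.. code-block:: console
+
+   $ ketos test -m model_10.mlmodel -m model_25.mlmodel -m model_best.mlmodel -e test.txt
+
+This prints one report per model; the average accuracy and standard deviation
+given on the last line then allow a quick comparison between the candidates.
+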
+ + diff --git a/4.0/_sources/models.rst.txt b/4.0/_sources/models.rst.txt new file mode 100644 index 000000000..b393f0738 --- /dev/null +++ b/4.0/_sources/models.rst.txt @@ -0,0 +1,24 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Segmentation Models +------------------- + +Recognition Models +------------------ + + diff --git a/4.0/_sources/training.rst.txt b/4.0/_sources/training.rst.txt new file mode 100644 index 000000000..f514da49b --- /dev/null +++ b/4.0/_sources/training.rst.txt @@ -0,0 +1,463 @@ +.. _training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. + +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. 
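+To illustrate the PDF extraction step mentioned above, a rough sketch using
+``pdftocairo`` (output naming and available switches can vary between poppler
+versions; ``scan.pdf`` and the ``page`` prefix are placeholders):
+
+.. code-block:: console
+
+   $ pdftocairo -png -r 300 scan.pdf page
+
+This renders each page as a 300dpi PNG (``page-1.png``, ``page-2.png``, ...)
+that can then be annotated and transcribed.
+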
+ +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Dataset Compilation +------------------- + +.. _compilation: + +Training +-------- + +.. _training_step: + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. + +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. 
``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldom improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... 
+ [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . 
} + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 
365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/4.0/_sources/vgsl.rst.txt b/4.0/_sources/vgsl.rst.txt new file mode 100644 index 000000000..913a7b5b1 --- /dev/null +++ b/4.0/_sources/vgsl.rst.txt @@ -0,0 +1,199 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. 
code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +Convolutional Layers +-------------------- + +.. code-block:: console + + C[{name}](s|t|r|l|m)[{name}],,[,,] + s = sigmoid + t = tanh + r = relu + l = linear + m = softmax + +Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters. + +Recurrent Layers +---------------- + +.. code-block:: console + + L[{name}](f|r|b)(x|y)[s][{name}] LSTM cell with n outputs. + G[{name}](f|r|b)(x|y)[s][{name}] GRU cell with n outputs. + f runs the RNN forward only. + r runs the RNN reversed only. + b runs the RNN bidirectionally. + s (optional) summarizes the output in the requested dimension, return the last step. + +Adds either an LSTM or GRU recurrent layer to the network using either the `x` +(width) or `y` (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32` +input will execute 16 independent forward passes on `906x32` tensors resulting +in an output of shape `1, 16, 906, 25`. If this isn't desired either run a +summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1, +906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and +channel dimension for an `1, 1, 906, 512` input to the recurrent layer. + +Helper and Plumbing Layers +-------------------------- + +Max Pool +^^^^^^^^ +.. code-block:: console + + Mp[{name}],[,,] + +Adds a maximum pooling with `(y, x)` kernel_size and `(y_stride, x_stride)` stride. + +Reshape +^^^^^^^ + +.. code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. 
+ +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +Dropout +^^^^^^^ + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/4.0/_static/alabaster.css b/4.0/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/4.0/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + +div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > 
a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + 
font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px -30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, 
div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. */ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/4.0/_static/basic.css b/4.0/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/4.0/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + 
+div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 
8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + 
+dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} 
+ +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/4.0/_static/blla_heatmap.jpg b/4.0/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/4.0/_static/blla_heatmap.jpg differ diff --git a/4.0/_static/blla_output.jpg b/4.0/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/4.0/_static/blla_output.jpg differ diff --git a/4.0/_static/bw.png b/4.0/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/4.0/_static/bw.png differ diff --git a/4.0/_static/custom.css b/4.0/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/4.0/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/4.0/_static/doctools.js b/4.0/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/4.0/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/4.0/_static/documentation_options.js b/4.0/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/4.0/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/4.0/_static/file.png b/4.0/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/4.0/_static/file.png differ diff --git a/4.0/_static/graphviz.css b/4.0/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/4.0/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx 
stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/4.0/_static/kraken.png b/4.0/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/4.0/_static/kraken.png differ diff --git a/4.0/_static/kraken_recognition.svg b/4.0/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/4.0/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... + + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/4.0/_static/kraken_segmentation.svg b/4.0/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/4.0/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/4.0/_static/kraken_segmodel.svg b/4.0/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/4.0/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/4.0/_static/kraken_torchseqrecognizer.svg b/4.0/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/4.0/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/4.0/_static/kraken_workflow.svg b/4.0/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/4.0/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + Recognition + + + + + + + 
+ + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/4.0/_static/language_data.js b/4.0/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/4.0/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" 
+ v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/4.0/_static/minus.png b/4.0/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/4.0/_static/minus.png differ diff --git a/4.0/_static/normal-reproduction-low-resolution.jpg b/4.0/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/4.0/_static/normal-reproduction-low-resolution.jpg differ diff --git a/4.0/_static/pat.png b/4.0/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/4.0/_static/pat.png differ diff --git a/4.0/_static/plus.png b/4.0/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/4.0/_static/plus.png differ diff --git a/4.0/_static/pygments.css b/4.0/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ 
b/4.0/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } /* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ 
+.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/4.0/_static/searchtools.js b/4.0/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/4.0/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. + /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. 
+ objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." + ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. 
+// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/4.0/_static/sphinx_highlight.js b/4.0/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/4.0/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/4.0/advanced.html b/4.0/advanced.html new file mode 100644 index 000000000..d3fc3cee5 --- /dev/null +++ b/4.0/advanced.html @@ -0,0 +1,353 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the case of kraken: binarization (converting color and grayscale images into bitonal ones), layout analysis/page segmentation (extracting topological text lines from an image), recognition (feeding text line images into a classifier), and finally serialization of the results into an appropriate format such as hOCR or ALTO.

+
+

Input Specification

+

All kraken subcommands operating on input-output pairs, i.e. producing one output document for one input document, follow the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular, subcommands may be chained.

+
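For example, assuming a hypothetical input image image.tif and the toy model introduced in the model repository section below, a complete pipeline can be expressed by chaining all three principal subcommands:
$ kraken -i image.tif image.txt binarize segment ocr -m toy
Each subcommand operates on the intermediate results of the one before it, so the chain mirrors the processing steps described above.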

There are other ways to define inputs and outputs, as the syntax shown above can become rather cumbersome for large numbers of files.

+

As such, there are a couple of ways to deal with multiple files in a compact way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write one output file per page, named with an index (which can be changed using the -p option) and the suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 files but also from various XML formats. In these cases the appropriate data is automatically selected from the inputs: image data for segmentation, or line and region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

The code is able to automatically determine if a file is in PageXML or ALTO format.

+
+
+

Binarization

+

The binarization subcommand accepts almost the same parameters as ocropus-nlbin. Only options not related to binarization, e.g. skew detection, are missing. In addition, error checking (image sizes, inversion detection, grayscale enforcement) is always disabled and kraken will happily binarize any image that is thrown at it.

+

Available parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

type

–threshold

FLOAT

–zoom

FLOAT

–escale

FLOAT

–border

FLOAT

–perc

INTEGER RANGE

–range

INTEGER

–low

INTEGER RANGE

–high

INTEGER RANGE

+
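Using the options above, a minimal binarization call might look like the following sketch, where input.tif and bw.png are placeholder file names and the threshold value is purely illustrative:
$ kraken -i input.tif bw.png binarize --threshold 0.5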
+
+

Page Segmentation and Script Detection

+

The segment subcommand provides access to two operations: page segmentation into lines and script detection of those lines.

+

Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +JSON file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left).

+

The script detection splits the lines extracted by the segmenter into strips sharing a particular script, which can then be recognized by supplying appropriate models for each detected script to the ocr subcommand.

+

Combined output from both consists of lists in the boxes field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are ISO 15924 4 character codes.

+
$ kraken -i 14.tif lines.json segment
+$ cat lines.json
+{
+   "boxes" : [
+    [
+        ["Grek", [561, 216, 1626,309]]
+    ],
+    [
+        ["Latn", [2172, 197, 2424, 244]]
+    ],
+    [
+        ["Grek", [1678, 221, 2236, 320]],
+        ["Arab", [2241, 221, 2302, 320]]
+    ],
+
+        ["Grek", [412, 318, 2215, 416]],
+        ["Latn", [2208, 318, 2424, 416]]
+    ],
+    ...
+   ],
+   "script_detection": true,
+   "text_direction" : "horizontal-tb"
+}
+
+
+

Script detection is enabled by default; if it is explicitly disabled, the boxes field will contain only a list of line bounding boxes:

+
[546, 216, 1626, 309],
+[2169, 197, 2423, 244],
+[1676, 221, 2293, 320],
+...
+[503, 2641, 848, 2681]
+
+
+

Available page segmentation parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

-d, –text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

–scale FLOAT

Estimate of the average line height on the page

-m, –maxcolseps

Maximum number of columns in the input document. Set to 0 for uni-column layouts.

-b, –black-colseps / -w, –white-colseps

Switch to black column separators.

-r, –remove-hlines / -l, –hlines

Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

+
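Using the parameters from the table above, a right-to-left, single-column page might, for instance, be segmented as in the following sketch (the file names are placeholders):
$ kraken -i page.tif lines.json segment -d horizontal-rl --maxcolseps 0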
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client.

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list   ✓
+default (pyrnn) - A converted version of en-default.pyrnn.gz
+toy (clstm) - A toy model trained on 400 lines of the UW3 data set.
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show toy
+name: toy.clstm
+
+A toy model trained on 400 lines of the UW3 data set.
+
+author: Benjamin Kiessling (mittagessen@l.unchti.me)
+http://kraken.re
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get toy
+Retrieving model        ✓
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the show command, e.g.:

+
$ kraken -i ... ... ocr -m toy
+
+
+

Additions and updates to existing models are always welcome! Just open a pull +request or write an email.

+
+
+

Recognition

+

Recognition requires a grayscale or binarized image, a page segmentation for that image, and a model file. In particular, there is no requirement to use the page segmentation algorithm contained in the segment subcommand or the binarization provided by kraken.

+

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm
+
+
+

All polytonic Greek text portions will be recognized using the porson.clstm +model while Latin text will be fed into the antiqua.clstm model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
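For instance, building on the multi-script example above, recognition of Latin lines could be suppressed while Greek lines are still recognized (the script-to-model mapping is purely illustrative):
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:ignore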

The ocr subcommand is able to serialize the recognition results either as plain text (the default) or as hOCR, ALTO, or abbyyXML, the latter formats containing additional metadata such as bounding boxes and confidences:

+
$ kraken -i ... ... ocr -t # text output
+$ kraken -i ... ... ocr -h # hOCR output
+$ kraken -i ... ... ocr -a # ALTO output
+$ kraken -i ... ... ocr -y # abbyyXML output
+
+
+

hOCR output is slightly different from hOCR files produced by ocropus. Each +ocr_line span contains not only the bounding box of the line but also +character boxes (x_bboxes attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ocrx_word +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the x_conf attribute.

+

Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/api.html b/4.0/api.html new file mode 100644 index 000000000..7b7b8b226 --- /dev/null +++ b/4.0/api.html @@ -0,0 +1,3056 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all functionality of the OCR engine. Most functional blocks, i.e. binarization, segmentation, recognition, and serialization, are encapsulated in one high-level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization although, depending on the particular setup of the pipeline and the models utilized, it can be optional. For the non-trainable legacy bounding box segmenter binarization is mandatory although it is still possible to feed color and grayscale images to the recognizer. The trainable baseline segmenter can work with black and white, grayscale, and color images, depending on the training data and network configuration utilized; though grayscale and color data are used in almost all cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+{'text_direction': 'horizontal-lr',
+ 'boxes': [[0, 29, 232, 56],
+           [28, 54, 121, 84],
+           [9, 73, 92, 117],
+           [103, 76, 145, 131],
+           [7, 105, 119, 230],
+           [10, 228, 126, 345],
+           ...
+          ],
+ 'script_detection': False}
+
+
+
+
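The optional parameters can be passed directly to pageseg.segment(). A minimal sketch, assuming right-to-left column ordering is wanted and the lower half of the page should be excluded from segmentation; the mask construction here is purely illustrative:

>>> import numpy as np
>>> from kraken import pageseg

# bi-level mask of the same size as the image; 0-valued regions are ignored
>>> mask = np.ones(bw_im.size[::-1], dtype='uint8')
>>> mask[bw_im.size[1] // 2:, :] = 0
>>> seg = pageseg.segment(bw_im, text_direction='horizontal-rl', mask=mask)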
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies image pixels into baselines and regions. Because it is trainable, a segmentation model is required in addition to the image to be segmented, and it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

[Diagram: a segmentation model (TorchVGSLModel) bundles a neural network with metadata comprising the line and region types, bounding regions, and the baseline location flag.]

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+{'text_direction': 'horizontal-lr',
+ 'type': 'baselines',
+ 'script_detection': False,
+ 'lines': [{'script': 'default',
+            'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]],
+            'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]},
+           ...],
+ 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...]
+             '$par': ...
+             '$nop':  ...}}
+>>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking.

+

Images are automatically converted into the proper mode for recognition, except in the case of models trained on binary images, as there is a plethora of different binarization algorithms available, each with strengths and weaknesses. For most material the kraken-provided binarization should be sufficient, though. This does not mean that a segmentation model trained on RGB images will have equal accuracy for B/W, grayscale, and RGB inputs; nevertheless, the drop in quality will often be modest or non-existent for color models, while non-binarized inputs to a binary model will cause severe degradation (and a warning to that effect).

+

Per default segmentation is performed on the CPU although the neural network can be run on a GPU with the device argument. As the vast majority of the processing required is postprocessing, the performance gain will most likely be modest, though.

+

The above API is the simplest way to perform a complete segmentation. The process consists of multiple steps such as pixel labelling, separate region and baseline vectorization, and bounding polygon calculation:

[Diagram: the trainable segmentation pipeline. Pixel labelling produces line/separator heatmaps and region heatmaps; baselines are vectorized and oriented while regions are vectorized separately; bounding polygon calculation and line ordering then yield oriented baselines, bounding polygons, and region boundaries.]

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
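As a rough sketch of such fine-grained use (reusing the baseline segmentation result from above and simplifying parameter handling considerably), the polygonizer and reading order functions can be called directly:

>>> from kraken.lib import segmentation

# baselines as vectorized by the earlier pipeline steps, here taken from
# the blla output above for illustration
>>> baselines = [line['baseline'] for line in baseline_seg['lines']]
# recompute bounding polygons on the grayscale image
>>> polygons = segmentation.calculate_polygonal_environment(im.convert('L'), baselines)
# establish reading order over (baseline, polygon) pairs
>>> ordered = segmentation.polygonal_reading_order(list(zip(baselines, polygons)))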
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

[Diagram: the recognition network produces an output matrix over 'time' steps (the width dimension); the CTC decoder collapses it into a label sequence (e.g. 15, 10, 1, ...) which the codec then maps to a character sequence (e.g. o, c, u, ...).]

As the customization of this two-stage decoding process is usually reserved for specialized use cases, sensible defaults are chosen: codecs are part of the model file and do not have to be supplied manually; the preferred CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
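The default decoder can be exchanged on the loaded recognizer; a minimal sketch, assuming that overriding the decoder attribute (see the API reference below) is sufficient and using functools.partial to fix a hypothetical beam width:

>>> from functools import partial
>>> from kraken.lib import ctc_decoder

# replace greedy decoding with same-prefix-merge beam search decoding
>>> model.decoder = partial(ctc_decoder.beam_decoder, beam_size=5)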
+

The sequence recognizer wrapper combines the neural network itself, a codec, metadata such as whether the input is supposed to be grayscale or binarized, and an instance of a CTC decoder that performs the conversion of the raw output tensor of the network into a sequence of labels:

[Diagram: a transcription model (TorchSeqRecognizer) bundles the neural network with a codec, metadata, and a CTC decoder.]

Afterwards, given an image, a segmentation, and the model one can perform text recognition. The code is identical for both legacy and baseline segmentations. As for segmentation, input images are auto-converted to the correct color mode, except in the case of binary models, for which a warning will be raised if there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken import rpred
+# single model recognition
+>>> pred_it = rpred(model, im, baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+

The output isn’t just a sequence of characters but a kraken.rpred.ocr_record object containing the character prediction, cuts (approximate character locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'box'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The \(C +\times W\) softmax output matrix is accessible as the outputs attribute on the +kraken.lib.models.TorchSeqRecognizer after each step of the +kraken.rpred.rpred() iterator. To get a mapping from the label space +\(C\) the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.outputs
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.

+
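A short sketch of applying one of these decoders and the codec by hand, reusing the raw output matrix exposed through the outputs attribute mentioned above:

>>> from kraken.lib import ctc_decoder

# greedy/best path decoding of the (C, W) softmax matrix into labels
>>> labels = ctc_decoder.greedy_decoder(model.outputs)
# map the (label, start, end, confidence) tuples back to code points
>>> model.codec.decode(labels)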
+
+

XML Parsing

+

Sometimes it is desired to take the data in an existing XML serialization format like PageXML or ALTO and apply an OCR function on it. The kraken.lib.xml module includes parsers extracting information into data structures processable with minimal transformation by the functional blocks:

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> xml.parse_alto(alto_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+>>> page_doc = '/path/to/page'
+>>> xml.parse_page(page_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+
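When it is not known in advance whether a file is ALTO or PageXML, the format-autodetecting parser from the same module (documented in the API reference below) can be used instead:

>>> xml.parse_xml('/path/to/alto/or/page')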
+
+

Serialization

+

The serialization module can be used to transform the ocr_records returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders jinja2 templates in kraken/templates through +the kraken.serialization.serialize() function.

+
+>>> from kraken import serialization
+
+>>> records = [record for record in pred_it]
+>>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+
+
+

Training

+

Training is largely implemented with the pytorch lightning framework. There are separate LightningModules for recognition and segmentation training and a small wrapper around lightning’s Trainer class that mainly sets up model handling and verbosity options for the CLI.

+
+>>> import glob
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
+>>> import glob
+>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
+>>> import glob
+>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to have a closer look at the command line parameters for features such as transfer learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/api_docs.html b/4.0/api_docs.html new file mode 100644 index 000000000..353bbbef5 --- /dev/null +++ b/4.0/api_docs.html @@ -0,0 +1,2684 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (str) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
+
+
Returns:
+

A dictionary containing the text direction and under the key ‘lines’ a +list of reading order sorted baselines (polylines) and their respective +polygonal boundaries. The last and first point of each boundary polygon +are connected.

+
 {'text_direction': '$dir',
+  'type': 'baseline',
+  'lines': [
+     {'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0, x1, y1], ... [x_m, y_m]]},
+     {'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}
+   ]
+   'regions': [
+     {'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'},
+     {'region': [[x0, ...]], 'type': 'text'}
+   ]
+ }
+
+
+

+
+
Raises:
+
+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A dictionary containing the text direction and a list of reading order +sorted bounding boxes under the key ‘boxes’:

+
{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),...]}
+
+
+

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+

kraken.rpred module

+
+
+kraken.rpred.bidi_record(record, base_dir=None)
+

Reorders a record using the Unicode BiDi algorithm.

+

Models trained for RTL or mixed scripts still emit classes in LTR order +requiring reordering for proper display.

+
+
Parameters:
+

record (kraken.rpred.ocr_record)

+
+
Returns:
+

kraken.rpred.ocr_record

+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+filtered_tags = []
+
+ +
+
+im
+
+ +
+
+im_str
+
+ +
+
+miss = []
+
+ +
+
+nets
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags
+
+ +
+
+tags_ignore
+
+ +
+
+ts
+
+ +
+ +
+
+class kraken.rpred.ocr_record(prediction, cuts, confidences, line)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • confidences (List[float])

  • +
  • line (Union[List, Dict[str, List]])

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+confidences
+
+ +
+
+cuts
+
+ +
+
+prediction
+
+ +
+
+tags
+
+ +
+
+type
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of +coordinates (x0, y0, x1, y1) of a text line in the image +and an entry ‘text_direction’ containing +‘horizontal-lr/rl/vertical-lr/rl’.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible +with padding.

  • +
  • bidi_reordering (bool|str) – Reorder classes in the ocr_record according to +the Unicode bidirectional algorithm for correct +display. Set to L|R to change base text +direction.

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[ocr_record, None, None]

+
+
+
+ +
+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='hocr')
+

Serializes a list of ocr_records into an output document.

+

Serializes a list of predictions and their corresponding positions by doing +some hOCR-specific preprocessing and then renders them through one of +several jinja2 templates.

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • records (iterable) – List of kraken.rpred.ocr_record

  • +
  • image_name (str) – Name of the source image

  • +
  • image_size (tuple) – Dimensions of the source image

  • +
  • writing_mode (str) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values +are horizontal-tb, vertical-rl, and +vertical-lr.

  • +
  • scripts (list) – List of scripts contained in the OCR records

  • +
  • regions (list) – Dictionary mapping region types to a list of region +polygons.

  • +
  • template (str) – Selector for the serialization format. May be +‘hocr’ or ‘alto’.

  • +
+
+
Returns:
+

(str) rendered template.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize_segmentation(segresult, image_name=None, image_size=(0, 0), template='hocr')
+

Serializes a segmentation result into an output document.

+
+
Parameters:
+
    +
  • segresult (Dict[str, Any]) – Result of blla.segment

  • +
  • image_name (str) – Name of the source image

  • +
  • image_size (tuple) – Dimensions of the source image

  • +
  • template (str) – Selector for the serialization format. May be +‘hocr’ or ‘alto’.

  • +
+
+
Returns:
+

(str) rendered template.

+
+
Return type:
+

str

+
+
+
+ +
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads anything that was, is, and will be a valid ocropus model and +instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN +configuration in the file.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing converted python BIDILSTMs (recognition +only)

  • +
  • protobuf models containing CLSTM networks (recognition only)

  • +
  • protobuf models containing VGSL segmentation and recognition networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • pyrnn for pickled BIDILSTMs

  • +
  • clstm for protobuf models generated by clstm

  • +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (str) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VGSL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel +vector at each step along its time axis, i.e. either put the non-time-axis +dimension into the channels dimension or use a summarizing RNN squashing +the time axis to 1 and putting the output into the channels dimension +respectively.

+
+
Parameters:
+

spec (str)

+
+
+
+
+input
+

Expected input tensor as a 4-tuple.

+
+
Type:
+

tuple

+
+
+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+
Type:
+

torch.nn.Sequential

+
+
+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+
Type:
+

torch.nn.Module

+
+
+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+
Type:
+

dict

+
+
+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+
Type:
+

str

+
+
+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and append layers spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx)
+

Builds a reshape layer

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx)
+

Builds an LSTM/GRU layer returning number of outputs and layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_clstm_model(path)
+

Loads a CLSTM model to VGSL.

+
+
Parameters:
+

path (Union[str, pathlib.Path])

+
+
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, pathlib.Path]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException if the model data is invalid (not a

  • +
  • string, protobuf file, or without appropriate metadata).

  • +
  • FileNotFoundError if the path doesn't point to a file.

  • +
+
+
+
+ +
+
+classmethod load_pronn_model(path)
+

Loads a pronn model to VGSL.

+
+
Parameters:
+

path (Union[str, pathlib.Path])

+
+
+
+ +
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+user_metadata: dict[str, str]
+
+ +
+ +
+
+

kraken.lib.xml module

+
+
+kraken.lib.xml.parse_xml(filename)
+

Parses either a PageXML or ALTO file with autodetermination of the file +format.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to an XML file.

+
+
Returns:
+

A dict {'image': impath, 'lines': [{'boundary': [[x0, y0], ...], 'baseline': [[x0, y0], ...], 'text': 'apdjfqpf', 'tags': ['script_type_0', 'script_type_1']}, ...], 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}

+
+
Return type:
+

dict

+
+
+
+ +
+
+kraken.lib.xml.parse_page(filename)
+

Parses a PageXML file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to a PageXML file.

+
+
Returns:
+

A dict {'image': impath, 'lines': [{'boundary': [[x0, y0], ...], 'baseline': [[x0, y0], ...], 'text': 'apdjfqpf', 'tags': {'script': 'script_type', 'split': 'train', 'type': 'type_1'}}, ...], 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}

+
+
Return type:
+

dict

+
+
+
+ +
+
+kraken.lib.xml.parse_alto(filename)
+

Parses an ALTO file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to an ALTO file.

+
+
Returns:
+

A dict {'image': impath, 'lines': [{'boundary': [[x0, y0], ...], 'baseline': [[x0, y0], ...], 'text': 'apdjfqpf', 'tags': {'script': 'script_type', 'split': 'train', 'type': 'type_1'}}, ...], 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}

+
+
Return type:
+

dict

+
+
+
+ +
+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.

+
+
+
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if a subsequence is not encodable and the codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
+
+

kraken.lib.train module

+
+

Training Schedulers

+
+
+

Training Stoppers

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(callbacks=None, enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, *args, **kwargs)
+
+
Parameters:
+
    +
  • callbacks (Optional[Union[List[pytorch_lightning.callbacks.Callback], pytorch_lightning.callbacks.Callback]])

  • +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
+
+
+
+
+fit(*args, **kwargs)
+
+ +
+
+on_validation_end()
+
+ +
+ +
+
+
+

kraken.lib.dataset module

+
+

Datasets

+
+
+class kraken.lib.dataset.BaselineSet(imgs=None, suffix='.path', line_width=4, im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • imgs (Sequence[str])

  • +
  • suffix (str)

  • +
  • line_width (int)

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • mode (str)

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(image, baselines=None, regions=None, *args, **kwargs)
+

Adds a page to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • baseline (dict) – A list containing dicts with a list of coordinates +and tags [{‘baseline’: [[x0, y0], …, +[xn, yn]], ‘tags’: (‘script_type’,)}, …]

  • +
  • regions (dict) – A dict containing list of lists of coordinates +{‘region_type_0’: [[x0, y0], …, [xn, yn]]], +‘region_type_1’: …}.

  • +
  • image (Union[str, PIL.Image.Image])

  • +
  • baselines (List[List[List[Tuple[int, int]]]])

  • +
+
+
+
+ +
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mode
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, text, baseline, boundary, *args, **kwargs)
+

Parses a sample for the dataset and returns it.

+

This function is mainly used for parallelized loading of training data.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
  • image (Union[str, PIL.Image.Image])

  • +
+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+class kraken.lib.dataset.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • split (Callable[[str], str])

  • +
  • suffix (str)

  • +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line-image-text pair to the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, *args, **kwargs)
+

Parses a sample for this dataset.

+

This is mostly used to parallelize populating the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

Dict

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+split
+
+ +
+
+suffix
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+

Helpers

+
+
+kraken.lib.dataset.compute_error(model, batch)
+

Computes error report from a model and a list of line image-text pairs.

+
+
Parameters:
+
+
+
Returns:
+

A tuple with total number of characters and edit distance across the +whole validation set.

+
+
Return type:
+

Tuple[int, int]

+
+
+
+ +
+
+kraken.lib.dataset.preparse_xml_data(filenames, format_type='xml', repolygonize=False)
+

Loads training data from a set of xml files.

+

Extracts line information from Page/ALTO xml files for training of +recognition models.

+
+
Parameters:
+
    +
  • filenames (Sequence[Union[str, pathlib.Path]]) – List of XML files.

  • +
  • format_type (str) – Either page, alto or xml for autodetermination.

  • +
  • repolygonize (bool) – (Re-)calculates polygon information using the kraken +algorithm.

  • +
+
+
Returns:
+

A list of dicts {'text': text, 'baseline': [[x0, y0], ...], 'boundary': [[x0, y0], ...], 'image': PIL.Image}.

+
+
Return type:
+

list of dicts

+
+
+
+ +
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (str)

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its +polygonization.

  • +
  • regions (Sequence) – List of region polygons.

  • +
  • text_direction (str) – Set principal text direction for column ordering. +Can be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

A reordered input.

+
+
Return type:
+

Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]

+
+
+
+ +
+
+kraken.lib.segmentation.denoising_hysteresis_thresh(im, low, high, sigma)
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5)
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (sequence) – List of lists containing a single baseline per +entry.

  • +
  • suppl_obj (sequence) – List of lists containing additional polylines +that should be considered hard boundaries for +polygonizaton purposes. Can be used to prevent +polygonization into non-text areas such as +illustrations or to compute the polygonization of +a subset of the lines in an image.

  • +
  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. +Overrides data in im. The default map is +gaussian_filter(sobel(im), 2).

  • +
  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of +the input. Values of 0 are used for aspect-preserving +scaling. None skips input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are +assumed to be on the bottom of the text line and will +be offset upwards, if set to True, baselines are on the +top and will be offset downwards. If set to None, no +offset will be applied.

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be compute for +a baseline None is returned instead.

+
+
+
+ +
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, polygonal boundary, and two points on the baseline, return the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (list) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (list) – A bounding polygon around the baseline (same format as +baseline).

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+kraken.lib.segmentation.extract_polygons(im, bounds)
+

Yields the subimages of image im defined in the list of bounding polygons +with baselines preserving order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (Dict[str, Any]) –

    A list of dicts in baseline format:

    {'type': 'baselines',
     'lines': [{'baseline': [[x_0, y_0], ..., [x_n, y_n]],
                'boundary': [[x_0, y_0], ..., [x_n, y_n]]},
               ...]}

    or in bounding box format:

    {'boxes': [[x_0, y_0, x_1, y_1], ...],
     'text_direction': 'horizontal-lr'}

    +

  • +
+
+
Yields:
+

The extracted subimage

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
+
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates back the network output to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, prob). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates back the network output to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
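A toy illustration of greedy decoding on a made-up (C, W) softmax matrix; the values are chosen so that the column argmaxes are 0, 2, 2, 0, i.e. a single run of the non-blank class 2:

import numpy as np
from kraken.lib.ctc_decoder import greedy_decoder

outputs = np.array([[0.90, 0.10, 0.20, 0.80],
                    [0.05, 0.10, 0.10, 0.10],
                    [0.05, 0.80, 0.70, 0.10]])
print(greedy_decoder(outputs))   # something like [(2, 1, 2, 0.8)]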
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates back the network output to a label sequence in the same way as the original ocropy/clstm.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+
+ +
+
+message
+
+ +
+
+width
+
+ +
+ +
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren’t further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
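A short sketch using the defaults documented above and catching the exception raised for unusable input; 'page.png' is an assumed file name:

from PIL import Image
from kraken.binarization import nlbin
from kraken.lib.exceptions import KrakenInputException

try:
    bw = nlbin(Image.open('page.png'))
    bw.save('page.bin.png')
except KrakenInputException as exc:
    print(f'binarization failed: {exc}')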
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None, records=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im (PIL.Image) – Input image

  • +
  • segmentation (dict) – Output of the segment method.

  • +
  • records (list) – A list of ocr_record objects.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[dict] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode='wb') to write to.

+
+
+
+ +
+ +
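A hedged sketch of the legacy transcription workflow built from the methods documented above; the input file name is an assumption and the segmentation comes from the legacy bounding-box segmenter:

from PIL import Image
from kraken import pageseg
from kraken.transcribe import TranscriptionInterface

im = Image.open('page.png').convert('1')    # legacy tools expect bitonal input
seg = pageseg.segment(im)                   # legacy box segmentation
ti = TranscriptionInterface()
ti.add_page(im, segmentation=seg)
with open('transcription.html', 'wb') as fp:
    ti.write(fp)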
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/genindex.html b/4.0/genindex.html new file mode 100644 index 000000000..d8352cacb --- /dev/null +++ b/4.0/genindex.html @@ -0,0 +1,673 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + \ No newline at end of file diff --git a/4.0/gpu.html b/4.0/gpu.html new file mode 100644 index 000000000..3a1f38e44 --- /dev/null +++ b/4.0/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.
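Whether pytorch actually sees a usable device can be checked directly from Python before selecting a GPU with the -d cuda option of the training tools (a minimal sketch, assuming the environment containing kraken's torch dependency is active):

import torch

print(torch.cuda.is_available())   # True once CUDA and cuDNN are set up correctly
print(torch.cuda.device_count())   # number of GPUs visible to pytorch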

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/index.html b/4.0/index.html new file mode 100644 index 000000000..c34dd32c8 --- /dev/null +++ b/4.0/index.html @@ -0,0 +1,1037 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through the on-board pip utility and the Anaconda scientific computing distribution is supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to install the pdf extras package from PyPI:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone git://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone git://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user’s kraken directory:

+
$ kraken get 10.5281/zenodo.2577813
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.2577813
+name: 10.5281/zenodo.2577813
+
+A generalized model for English printed text
+
+This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p
+scripts: Latn
+alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE
+accuracy: 99.95%
+license: Apache-2.0
+author(s): Kiessling, Benjamin
+date: 2019-02-26
+
+
+
+
+
+

Quickstart

+

An OCR pipeline consists of multiple steps, primarily preprocessing, segmentation, and recognition, each of which takes the output of the previous step and sometimes additional files such as models and templates that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or run separately:

[Pipeline diagram: Image → Segmentation (Segmentation Model) → Baselines, Regions, and Order → Recognition (Recognition Model) → OCR Records → Serialization (Output Template) → Output File]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the default model:

+
$ kraken -i bw.tif image.txt segment -bl ocr
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded from +the European Union’s Horizon 2020 Framework Programme for Research and +Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la +Recherche au titre du Programme d’Investissements d’Avenir portant la référence +ANR-21-ESRE-0005.

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/ketos.html b/4.0/ketos.html new file mode 100644 index 000000000..281cb9031 --- /dev/null +++ b/4.0/ketos.html @@ -0,0 +1,798 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text.

+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low-level format, the XML-based formats that are commonly used for archival of annotation and transcription data, and in the case of recognizer training a precompiled binary format. It is recommended to use the XML formats for segmentation training and the binary format for recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523" 
+						   WIDTH="5234" 
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..." 
+							  WIDTH="..." 
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K" 
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a binary dataset format offering a couple of advantages is supported. Binary datasets drastically improve loading performance, allowing the saturation of most GPUs with minimal computational overhead while also allowing training with datasets that are larger than the system's main memory. A minor drawback is a ~30% increase in dataset size in comparison to the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of individual files containing many lines this process can take a long time. It can easily be parallelized by specifying the number of separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The above command assigns 80% of the source lines to the training set, 10% to the validation set, and 10% to the test set. The training and validation sets in the dataset file are used automatically by ketos train (unless told otherwise) while the remaining 10%, the test set, is used by ketos test.

+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its most important command line options:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-o, –output

Output model file prefix. Defaults to model.

-s, –spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, –append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, –load

Load existing file to continue training

-F, –savefreq

Model save frequency in epochs during +training

-q, –quit

Stop condition for training. Set to early +for early stopping (default) or dumb for fixed +number of epochs.

-N, –epochs

Number of epochs to train for.

–min-epochs

Minimum number of epochs to train for when using early stopping.

–lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

-d, –device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

–optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, –lrate

Learning rate [default: 0.001]

-m, –momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, –weight-decay

Weight decay.

–schedule

Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or +reduceonplateau. For 1cycle the cycle length is determined by the –epoch option.

-p, –partition

Ground truth data partition ratio between train/validation set

-u, –normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, –codec

Load a codec JSON definition (invalid if loading existing model)

–resize

Codec/output layer resizing option. If set +to add code points will be added, both +will set the layer to match exactly the +training data, fail will abort if training +data and model codec do not match. Only valid when refining an existing model.

-n, –reorder / –no-reorder

Reordering of code points to display order.

-t, –training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, –evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

-f, –format-type

Sets the training and evaluation data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

–augment / –no-augment

Enables/disables data augmentation.

–workers

Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.

+

In some cases, such as color inputs, changing the network architecture might be +useful:

+
$ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal results, such as stopping training too soon. Adjusting the minimum delta and/or lag can be useful:

+
$ ketos train --lag 10 --min-delta 0.001 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, add and both. +add resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. both +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize add -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize both -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In both mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The sliced model will behave exactly like a newly initialized one, except that it will potentially train a lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, different types of whitespace exist, and mixed bidirectional text +can be written differently depending on the base line direction.

+

Ketos provides options to normalize input into consistent forms that make processing of data from multiple sources possible. Principally, two options are available: one for Unicode normalization and one for whitespace normalization. The Unicode normalization switch (disabled per default) allows one to select one of the 4 normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+
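The effect of these forms can be inspected with Python's standard unicodedata module; this small illustration is independent of any kraken API:

import unicodedata

s = 'caf\u00e9'                                  # precomposed 'é'
print(list(unicodedata.normalize('NFC', s)))     # ['c', 'a', 'f', 'é']
print(list(unicodedata.normalize('NFD', s)))     # ['c', 'a', 'f', 'e', COMBINING ACUTE ACCENT]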

Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further the behavior of the BiDi algorithm can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a codec) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
+

It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the label decoded from the raw network output and Unicode +code points (see this diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation.

+

The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual.

+

There are multiple approaches one could follow when constructing a custom codec: randomized block codes, i.e. producing random fixed-length labels for each code point; Huffman coding, i.e. variable-length label sequences depending on the frequency of each code point in some text (not necessarily the training set); or structural decomposition, i.e. describing each code point through a sequence of labels that describe the shape of the grapheme, similar to how some input systems for Chinese characters function.

+

While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
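Such a codec is plain JSON and can also be generated programmatically; a short sketch reproducing the mapping above:

import json

codec = {"S": [50, 53, 74, 23],
         "A": [95, 60, 19, 95],
         "B": [2, 96, 28, 29],
         "\u1f05": [91, 14, 95, 90]}
with open('sample.codec', 'w', encoding='utf-8') as fp:
    json.dump(codec, fp, ensure_ascii=False)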
+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text +recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+val check  [------------------------------------]  0/0
+
+
+

This takes all text lines and regions encoded in the XML files and trains a +model to recognize them.

+

Most other options available in transcription training are also available in +segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition there are a couple of specific options that allow filtering of baseline and region types. Datasets are often annotated to a level that is too detailed or contain undesirable types, e.g. when combining segmentation data from different sources. The most basic option is the suppression of all of either baseline or region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options are combinable to massage the dataset into any typology you want.

+

Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Recognition Testing

+

Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the test command. It uses transcribed +lines, the test set, in the same format as the train command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them.

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

-f, –format-type

Sets the test set data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

-m, –model

Model(s) to evaluate.

-e, –evaluation-files

File(s) with paths to evaluation data.

-d, –device

Select device to use.

–pad

Left and right padding around lines.

+

Transcriptions are handed to the command in the same way as for the train +command, either through a manifest with -e/–evaluation-files or by just +adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain character accuracy measured per script and a detailed list of confusions. When evaluating multiple models the last line of the output will be the average accuracy and the standard deviation across all of them.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/models.html b/4.0/models.html new file mode 100644 index 000000000..bf0c1a5a2 --- /dev/null +++ b/4.0/models.html @@ -0,0 +1,126 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural networks that do the actual character recognition supported by kraken: pronn files serializing old pickled pyrnn models as protobuf, clstm's native serialization, and versatile Core ML models.

+
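All of these formats can be loaded through a single helper; a brief sketch (the file name is an assumption):

from kraken.lib import models

net = models.load_any('model.mlmodel')
print(type(net))       # wrapped TorchSeqRecognizer
print(net.nn.spec)     # VGSL spec of the underlying network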
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken.

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.0/objects.inv b/4.0/objects.inv new file mode 100644 index 000000000..0a03a2242 Binary files /dev/null and b/4.0/objects.inv differ diff --git a/4.0/search.html b/4.0/search.html new file mode 100644 index 000000000..cb5c191b4 --- /dev/null +++ b/4.0/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

+ + + + + + + \ No newline at end of file diff --git a/4.0/searchindex.js b/4.0/searchindex.js new file mode 100644 index 000000000..f558e9d7f --- /dev/null +++ b/4.0/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ALTO": [[5, "alto"]], "API Quickstart": [[1, null]], "API Reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Binarization": [[0, "binarization"]], "Binary Datasets": [[5, "binary-datasets"]], "Codecs": [[5, "codecs"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dataset Compilation": [[7, "dataset-compilation"]], "Datasets": [[2, "datasets"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Finding Recognition Models": [[4, "finding-recognition-models"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "Funding": [[4, "funding"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Helpers": [[2, "helpers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input Specification": [[0, "input-specification"]], "Installation": [[4, "installation"]], "Installation using Conda": [[4, "installation-using-conda"]], "Installation using Pip": [[4, "installation-using-pip"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy modules": [[2, "legacy-modules"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Loss and Evaluation Functions": [[2, "loss-and-evaluation-functions"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[6, null]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation and Script Detection": [[0, "page-segmentation-and-script-detection"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Quickstart": [[4, "quickstart"]], "Recognition": [[0, "recognition"], [1, "recognition"], [7, "recognition"]], "Recognition Models": [[6, "recognition-models"]], "Recognition Testing": [[5, "recognition-testing"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Related Software": [[4, "related-software"]], "Reshape": [[8, "reshape"]], "Segmentation Models": [[6, "segmentation-models"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"]], "Slicing": [[5, "slicing"]], "Text Normalization and Unicode": [[5, "text-normalization-and-unicode"]], "Trainer": [[2, "trainer"]], "Training": [[1, "training"], [5, null], [7, "compilation"]], "Training Schedulers": [[2, "training-schedulers"]], "Training Stoppers": [[2, "training-stoppers"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.lib.codec module": [[2, "kraken-lib-codec-module"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], 
"kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.exceptions": [[2, "kraken-lib-exceptions"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"add() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.add", false]], "add() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add", false]], "add() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add", false]], "add_codec() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.add_codec", false]], "add_labels() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.add_labels", false]], "add_page() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.add_page", false]], "alphabet (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.alphabet", false]], "alphabet (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.alphabet", false]], "append() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.append", false]], "aug (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.aug", false]], "aug (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.aug", false]], "aug (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.aug", false]], "base_dir (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.base_dir", false]], "baselineset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.BaselineSet", false]], "beam_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.beam_decoder", false]], "bidi_record() (in module kraken.rpred)": [[2, "kraken.rpred.bidi_record", false]], "bidi_reordering (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bidi_reordering", false]], "blank_threshold_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.blank_threshold_decoder", false]], "blocks (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.blocks", false]], "bounds (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bounds", false]], "build_addition() (kraken.lib.vgsl.torchvgslmodel method)": [[2, 
"kraken.lib.vgsl.TorchVGSLModel.build_addition", false]], "build_conv() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_conv", false]], "build_dropout() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_dropout", false]], "build_groupnorm() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_groupnorm", false]], "build_identity() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_identity", false]], "build_maxpool() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_maxpool", false]], "build_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_output", false]], "build_parallel() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_parallel", false]], "build_reshape() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_reshape", false]], "build_rnn() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_rnn", false]], "build_series() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_series", false]], "c_sorted (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.c_sorted", false]], "calculate_polygonal_environment() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.calculate_polygonal_environment", false]], "class_mapping (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_mapping", false]], "class_stats (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_stats", false]], "codec (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.codec", false]], "codec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.codec", false]], "compute_error() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.compute_error", false]], "compute_polygon_section() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.compute_polygon_section", false]], "confidences (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.confidences", false]], "criterion (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id0", false], [2, "kraken.lib.vgsl.TorchVGSLModel.criterion", false]], "cuts (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.cuts", false]], "decode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.decode", false]], "decoder (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.decoder", false]], "denoising_hysteresis_thresh() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.denoising_hysteresis_thresh", false]], "device (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.device", false]], "encode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.encode", false]], "encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.encode", false]], "encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.encode", false]], "env (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.env", false]], "eval() (kraken.lib.vgsl.torchvgslmodel method)": [[2, 
"kraken.lib.vgsl.TorchVGSLModel.eval", false]], "extract_polygons() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.extract_polygons", false]], "filtered_tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.filtered_tags", false]], "fit() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.fit", false]], "font (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.font", false]], "forward() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.forward", false]], "greedy_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.greedy_decoder", false]], "groundtruthdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.GroundTruthDataset", false]], "height (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id13", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.height", false]], "hyper_params (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.hyper_params", false]], "idx (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.idx", false]], "im (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im", false]], "im_mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.im_mode", false]], "im_mode (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.im_mode", false]], "im_mode (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.im_mode", false]], "im_str (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im_str", false]], "imgs (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.imgs", false]], "init_weights() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.init_weights", false]], "input (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id1", false], [2, "kraken.lib.vgsl.TorchVGSLModel.input", false]], "is_valid (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.is_valid", false]], "kind (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.kind", false]], "krakencairosurfaceexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCairoSurfaceException", false]], "krakencodecexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCodecException", false]], "krakenencodeexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenEncodeException", false]], "krakeninputexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInputException", false]], "krakeninvalidmodelexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInvalidModelException", false]], "krakenrecordexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRecordException", false]], "krakenrepoexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRepoException", false]], "krakenstoptrainingexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenStopTrainingException", false]], "krakentrainer (class in kraken.lib.train)": [[2, "kraken.lib.train.KrakenTrainer", false]], "l2c (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c", false]], "line_idx (kraken.transcribe.transcriptioninterface attribute)": 
[[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_clstm_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_clstm_model", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "load_pronn_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_pronn_model", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id14", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "miss (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.miss", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id2", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.rpred)": [[2, "kraken.rpred.ocr_record", false]], "on_validation_end() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.on_validation_end", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id3", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": 
[[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "parse() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.parse", false]], "parse() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.parse", false]], "parse_alto() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_alto", false]], "parse_page() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_page", false]], "parse_xml() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_xml", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.prediction", false]], "preparse_xml_data() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.preparse_xml_data", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], 
"serialize_segmentation() (in module kraken.serialization)": [[2, "kraken.serialization.serialize_segmentation", false]], "set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "suffix (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.suffix", false]], "tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags", false]], "tags (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "ts (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.ts", false]], "type (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.type", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id4", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width 
(kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id15", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 2, 1, "", "add_labels"], [2, 3, 1, "", "c_sorted"], [2, 2, 1, "", "decode"], [2, 2, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 3, 1, "", "l2c"], [2, 4, 1, "", "max_label"], [2, 2, 1, "", "merge"], [2, 3, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "PolygonGTDataset"], [2, 0, 1, "", "compute_error"], [2, 0, 1, "", "preparse_xml_data"]], "kraken.lib.dataset.BaselineSet": [[2, 2, 1, "", "add"], [2, 3, 1, "", "aug"], [2, 3, 1, "", "class_mapping"], [2, 3, 1, "", "class_stats"], [2, 3, 1, "", "im_mode"], [2, 3, 1, "", "imgs"], [2, 3, 1, "", "line_width"], [2, 3, 1, "", "mbl_dict"], [2, 3, 1, "", "mode"], [2, 3, 1, "", "mreg_dict"], [2, 3, 1, "", "num_classes"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "targets"], [2, 2, 1, "", "transform"], [2, 3, 1, "", "transforms"], [2, 3, 1, "", "valid_baselines"], [2, 3, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "split"], [2, 3, 1, "", "suffix"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], [2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 3, 1, "id13", "height"], [2, 3, 1, "id14", "message"], [2, 3, 1, "id15", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 3, 1, "", "codec"], [2, 3, 1, "", "decoder"], [2, 3, 1, "", "device"], [2, 2, 1, "", "forward"], [2, 3, 1, "", "kind"], [2, 3, 1, "", "nn"], [2, 3, 1, "", "one_channel_mode"], [2, 2, 1, "", "predict"], [2, 2, 1, "", "predict_labels"], [2, 2, 1, "", "predict_string"], [2, 3, 1, "", "seg_type"], [2, 2, 1, "", "to"], [2, 3, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "denoising_hysteresis_thresh"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", 
"vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], "kraken.lib.train.KrakenTrainer": [[2, 2, 1, "", "fit"], [2, 2, 1, "", "on_validation_end"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 2, 1, "", "add_codec"], [2, 2, 1, "", "append"], [2, 3, 1, "", "blocks"], [2, 2, 1, "", "build_addition"], [2, 2, 1, "", "build_conv"], [2, 2, 1, "", "build_dropout"], [2, 2, 1, "", "build_groupnorm"], [2, 2, 1, "", "build_identity"], [2, 2, 1, "", "build_maxpool"], [2, 2, 1, "", "build_output"], [2, 2, 1, "", "build_parallel"], [2, 2, 1, "", "build_reshape"], [2, 2, 1, "", "build_rnn"], [2, 2, 1, "", "build_series"], [2, 3, 1, "", "codec"], [2, 3, 1, "id0", "criterion"], [2, 2, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 3, 1, "", "idx"], [2, 2, 1, "", "init_weights"], [2, 3, 1, "id1", "input"], [2, 2, 1, "", "load_clstm_model"], [2, 2, 1, "", "load_model"], [2, 2, 1, "", "load_pronn_model"], [2, 3, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 3, 1, "", "named_spec"], [2, 3, 1, "id2", "nn"], [2, 4, 1, "id3", "one_channel_mode"], [2, 3, 1, "", "ops"], [2, 3, 1, "", "pattern"], [2, 2, 1, "", "resize_output"], [2, 2, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 2, 1, "", "set_num_threads"], [2, 3, 1, "", "spec"], [2, 2, 1, "", "to"], [2, 2, 1, "", "train"], [2, 3, 1, "id4", "user_metadata"]], "kraken.lib.xml": [[2, 0, 1, "", "parse_alto"], [2, 0, 1, "", "parse_page"], [2, 0, 1, "", "parse_xml"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 0, 1, "", "bidi_record"], [2, 1, 1, "", "mm_rpred"], [2, 1, 1, "", "ocr_record"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 3, 1, "", "bidi_reordering"], [2, 3, 1, "", "bounds"], [2, 3, 1, "", "filtered_tags"], [2, 3, 1, "", "im"], [2, 3, 1, "", "im_str"], [2, 3, 1, "", "miss"], [2, 3, 1, "", "nets"], [2, 3, 1, "", "one_channel_modes"], [2, 3, 1, "", "pad"], [2, 3, 1, "", "seg_types"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "tags_ignore"], [2, 3, 1, "", "ts"]], "kraken.rpred.ocr_record": [[2, 3, 1, "", "base_dir"], [2, 3, 1, "", "confidences"], [2, 3, 1, "", "cuts"], [2, 3, 1, "", "prediction"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "type"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"], [2, 0, 1, "", "serialize_segmentation"]], "kraken.transcribe": [[2, 1, 1, "", "TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 2, 1, "", "add_page"], [2, 3, 1, "", "env"], [2, 3, 1, "", "font"], [2, 3, 1, "", "line_idx"], [2, 3, 1, "", "page_idx"], [2, 3, 1, "", "pages"], [2, 3, 1, "", "seg_idx"], [2, 3, 1, "", "text_direction"], [2, 3, 1, "", "tmpl"], [2, 2, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "method", "Python method"], "3": ["py", "attribute", "Python attribute"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:method", "3": "py:attribute", "4": "py:property"}, "terms": {"": [1, 2, 4, 5, 6, 7, 8], "0": [0, 1, 2, 4, 5, 7, 8], "00": [5, 7], "0001": 5, "0005": 4, "001": [5, 7], "0123456789": [4, 7], "01c59": 8, "02": 4, "0245": 7, "04": 7, "06": 7, "07": 5, "09": 7, "0d": 7, "1": [1, 2, 5, 7, 8], "10": [1, 4, 5, 7], "100": [2, 5, 7, 8], "10000": 4, "1015": 1, "1020": 8, "10218": 5, "1024": 8, "103": 1, "105": 1, "106": 5, "108": 5, "11": 7, "1128": 5, "11346": 5, "1161": 1, "117": 1, "1184": 7, "119": 1, "1195": 1, "12": [5, 7, 8], "120": 5, "1200": 5, "121": 
1, "122": 5, "124": 1, "125": 5, "126": 1, "128": [5, 8], "13": [5, 7], "131": 1, "132": 7, "1339": 7, "134": 5, "135": 5, "1359": 7, "136": 5, "1377": 1, "1385": 1, "1388": 1, "1397": 1, "14": [0, 5], "1408": [1, 2], "1410": 1, "1412": 1, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "15924": 0, "16": [2, 5, 8], "161": 7, "1623": 7, "1626": 0, "1676": 0, "1678": 0, "1681": 7, "1697": 7, "17": [2, 5], "1708": 1, "1716": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1824": 1, "19": [1, 5], "192": 5, "197": 0, "198": 5, "199": 5, "1996": 7, "1cycl": 5, "1d": 8, "1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [2, 4, 5, 7, 8], "20": [1, 2, 5, 8], "200": 5, "2000": 1, "2001": 5, "2006": 2, "2014": 2, "2016": 1, "2017": 1, "2019": [4, 5], "2020": 4, "204": 7, "2041": 1, "207": [1, 5], "2072": 1, "2077": 1, "2078": 1, "2096": 7, "21": 4, "210": 5, "215": 5, "216": [0, 1], "2169": 0, "2172": 0, "22": [5, 7], "2208": 0, "221": 0, "2215": 0, "2236": 0, "2241": 0, "228": 1, "2293": 0, "23": 5, "230": 1, "2302": 0, "232": 1, "2334": 7, "2364": 7, "23rd": 2, "24": [1, 7], "241": 5, "2423": 0, "2424": 0, "2426": 1, "244": 0, "246": 5, "2483": 1, "25": [1, 5, 7, 8], "250": 1, "2500": 7, "253": 1, "256": [5, 7, 8], "2577813": 4, "259": 7, "26": [4, 7], "2641": 0, "266": 5, "2681": 0, "27": 5, "270": 7, "27046": 7, "274": 5, "28": [1, 5], "2873": 2, "29": [1, 5], "2d": [2, 8], "3": [2, 5, 7, 8], "30": [5, 7], "300": 5, "300dpi": 7, "307": 7, "309": 0, "31": 5, "318": 0, "32": [5, 8], "320": 0, "328": 5, "3292": 1, "336": 7, "3367": 1, "3398": 1, "3414": 1, "3418": 7, "3437": 1, "345": 1, "3455": 1, "35000": 7, "3504": 7, "3514": 1, "3519": 7, "35619": 7, "365": 7, "3680": 7, "38": 5, "384": 8, "39": 5, "4": [0, 1, 2, 5, 7, 8], "40": 7, "400": [0, 5], "4000": 5, "412": 0, "416": 0, "428": 7, "431": 7, "46": 5, "47": 7, "471": 1, "473": 1, "48": [5, 7, 8], "488": 7, "49": [5, 7], "491": 1, "4d": 2, "5": [1, 2, 5, 7, 8], "50": [5, 7], "500": 5, "503": 0, "509": 1, "512": 8, "515": 1, "52": [5, 7], "522": 1, "5226": 5, "523": 5, "5230": 5, "5234": 5, "524": 1, "5258": 7, "5281": 4, "53": 5, "534": 1, "536": [1, 5], "53980": 5, "54": 1, "54114": 5, "5431": 5, "545": 7, "546": 0, "56": [1, 7], "561": 0, "562": 1, "575": [1, 5], "577": 7, "59": [7, 8], "5951": 7, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "62": 5, "63": 5, "64": [5, 8], "646": 7, "66": [5, 7], "668": 1, "69": 1, "7": [1, 5, 7, 8], "70": 1, "7012": 5, "7015": 7, "71": 5, "7272": 7, "7281": 7, "73": 1, "74": [1, 5], "7593": 5, "76": 1, "773": 5, "7857": 5, "788": [5, 7], "79": 1, "794": 5, "7943": 5, "8": [5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": [1, 7], "8445": 7, "8479": 7, "848": 0, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [1, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "92": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [4, 5], "9541": 7, "9550": 
7, "96": [5, 7], "97": 7, "98": 7, "99": [4, 7], "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], "As": [0, 1, 2], "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7], "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "abbyxml": 4, "abbyyxml": 0, "abcdefghijklmnopqrstuvwxyz": 4, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 5, 7], "absolut": [2, 5], "abugida": 5, "acceler": [4, 5, 7], "accept": [0, 2, 5], "access": [0, 1], "accord": [2, 5], "accordingli": 2, "account": 7, "accur": 5, "accuraci": [1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [5, 7, 8], "actual": [2, 4, 7], "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "advantag": 5, "advis": 7, "affect": 7, "after": [1, 5, 7, 8], "afterward": 1, "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": 4, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algorithm": [0, 1, 2, 5], "all": [0, 1, 2, 4, 5, 6, 7], "allow": [5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [2, 4, 5, 7, 8], "alphanumer": 0, "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [0, 5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 2, 4, 7], "alto_doc": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 7], "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [5, 7], "anyth": 2, "apach": 4, "apart": [3, 5], "apdjfqpf": 2, "api": 5, "append": [0, 2, 5, 7, 8], "appli": [1, 2, 4, 7, 8], "applic": [1, 7], "approach": [5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approxim": 1, "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": 2, "aren": 2, "arg": 2, "argument": [1, 5], "arm": 4, "around": [1, 2, 5, 7], "arrai": [1, 2], "arrow": 5, "arxiv": 2, "aspect": 2, "assign": [2, 5, 7], "associ": 1, "assum": 2, "attach": [1, 5], "attribut": [0, 1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "auto": [1, 2, 5], "autodetermin": 2, "automat": [0, 1, 2, 5, 7, 8], "auxiliari": 1, "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awni": 2, "axi": [2, 8], "b": [0, 1, 5, 7, 8], "back": 2, "backend": 3, "background": 2, "base": [1, 2, 5, 6, 7, 8], "base_dir": 2, "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 7, 8], "bayr\u016bt": 7, "bbox": 2, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 7], "becom": 0, "been": [0, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": 5, "being": [1, 2, 8], "below": [5, 7], "benjamin": [0, 4], "best": [2, 5, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "bidi": [2, 4, 5], "bidi_record": 2, "bidi_reord": 2, "bidilstm": 2, "bidirect": [2, 5], "bidirection": 8, "binar": [1, 7], "binari": [1, 2], "bit": 1, "biton": [0, 2], "bl": 4, "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [1, 2, 5, 8], "block_i": 5, "block_n": 5, "board": 4, "boilerpl": 1, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": 
[1, 2, 5], "box": [0, 1, 2, 4, 5], "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_seri": 2, "buld\u0101n": 7, "bw": 4, "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 2], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": [1, 2], "can": [0, 1, 2, 3, 4, 5, 7, 8], "capabl": 5, "case": [0, 1, 2, 5, 7], "cat": 0, "caus": [1, 2], "caveat": 5, "cd": 4, "ce": [4, 7], "cell": 8, "cent": 7, "centerlin": 5, "central": [4, 7], "certain": [0, 2, 7], "chain": [0, 4, 7], "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charset": 2, "check": [0, 5], "chines": 5, "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumst": 7, "class": [1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "client": 0, "clone": [0, 4], "close": 4, "closer": 1, "clstm": [0, 2, 6], "code": [0, 1, 2, 4, 5, 7], "codec": 1, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [4, 7], "combin": [0, 1, 5, 7, 8], "come": 2, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compos": 2, "composedblocktyp": 5, "compound": 2, "compress": 7, "compris": 7, "comput": [2, 3, 4, 5, 7], "computation": 7, "compute_error": 2, "compute_polygon_sect": 2, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5], "conform": 5, "confus": 5, "connect": [2, 7], "connectionist": 2, "consid": 2, "consist": [0, 1, 4, 7, 8], "constant": 5, "construct": [5, 7], "contain": [0, 1, 2, 4, 5, 6, 7], "content": 5, "continu": [1, 2, 5, 7], "contrast": 7, "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [0, 2, 4], "core": 6, "coreml": 2, "corpu": [4, 5], "correct": [1, 2, 5, 7], "correspond": [0, 1, 2], "cosin": 5, "cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "creat": [2, 4, 5, 7, 8], "criterion": 2, "ctc": [1, 2, 5], "ctc_decod": 1, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [2, 5, 6], "custom": [1, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataset": 1, "dataset_larg": 5, "date": 4, "de": [4, 7], "deal": [0, 5], "debug": [1, 5, 7], "decai": 5, "decid": 0, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "deem": 0, "def": 1, "default": [0, 1, 2, 4, 5, 6, 7, 8], "default_split": 2, "defin": [0, 1, 2, 4, 5, 8], "definit": [5, 8], "degrad": 1, "degre": 7, "del_indic": 2, "delet": [2, 5, 7], "delta": 5, "denoising_hysteresis_thresh": 2, "depend": [0, 1, 4, 5, 7], "depth": [5, 7, 8], "describ": [2, 5], "descript": [0, 5], "descriptor": 2, "deseri": 2, "desir": [1, 8], "desktop": 7, "destin": 2, "destroi": 5, "detail": [0, 5, 7], "detect": 2, "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diacrit": 5, "diaeres": 7, 
"diaeresi": 7, "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [2, 5], "differ": [0, 1, 4, 5, 7, 8], "difficult": 5, "digit": 5, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": [2, 5], "direct": [0, 1, 2, 4, 5, 7, 8], "directli": [0, 5], "directori": [1, 4, 5, 7], "disabl": [0, 2, 5, 7], "disk": 7, "displai": [2, 5], "dist1": 2, "dist2": 2, "distanc": 2, "distribut": 8, "dnn": 2, "do": [1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "document": [0, 1, 2, 4, 5, 7], "doe": [1, 2, 5, 7], "doesn": [2, 5, 7], "domain": [1, 5], "done": [5, 7], "dot": 7, "down": 7, "download": [4, 7], "downward": 2, "drastic": 5, "draw": 1, "drawback": 5, "driver": 1, "drop": [1, 8], "dropout": [2, 5, 7], "du": 4, "dumb": 5, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": [2, 7], "editor": 7, "edu": 7, "either": [0, 2, 5, 7, 8], "element": 5, "email": 0, "emit": 2, "emploi": 7, "empti": 2, "en": 0, "enabl": [0, 1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2], "end_separ": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escripta": 4, "escriptorium": [4, 7], "esr": 4, "estim": [0, 2, 7], "et": 2, "european": 4, "eval": 2, "evalu": [0, 5], "evaluation_data": 1, "evaluation_fil": 1, "even": 7, "everyth": 5, "exact": [5, 7], "exactli": [1, 5], "exampl": [1, 5, 7], "except": [1, 5], "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 7], "experiment": 7, "explicit": [1, 5], "explicitli": [0, 5, 7], "exponenti": 5, "express": 0, "extend": 8, "extens": 5, "extent": 7, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "f": [0, 4, 5, 7, 8], "f_t": 2, "factor": 2, "fail": 5, "fairli": 7, "fallback": 0, "fals": [1, 2, 5, 7, 8], "faq\u012bh": 7, "faster": [5, 7, 8], "fd": 2, "featur": [0, 1, 2, 7, 8], "fed": [0, 1, 2, 5, 8], "feed": [0, 1], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [0, 2, 5], "figur": 1, "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "fill": 2, "filter": [1, 2, 5, 8], "filtered_tag": 2, "final": [0, 2, 4, 5, 7, 8], "find": [5, 7], "fine": [1, 7], "finish": 7, "first": [0, 1, 2, 5, 7, 8], "fit": [1, 2, 7], "fix": [5, 7], "flag": [1, 2, 4], "float": [0, 2], "flush": 2, "fname": 2, "follow": [0, 2, 5, 8], "font": 2, "font_styl": 2, "foo": [1, 5], "forg": 4, "form": [2, 5], "format": [0, 1, 2, 6, 7], "format_typ": [1, 2], "formul": 8, "forward": [2, 8], "found": [1, 5, 7], "fp": 1, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4], "function": [1, 5], "fundament": 1, "further": [1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gain": 1, "garantue": 2, "gaussian_filt": 2, "gener": [0, 1, 2, 4, 5, 7], "gentl": 5, "get": [0, 1, 4, 5, 7], "git": [0, 4], "github": 4, "githubusercont": 7, "gitter": 4, "given": [1, 2, 5, 8], "glob": [0, 1], "glyph": [5, 7], "gn": 8, "gn32": 5, "go": 7, "good": 5, "gov": 5, "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphic": 5, "grave": 2, "grayscal": [0, 1, 
2, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "gz": 0, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 4, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [1, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "heatmap": 1, "hebrew": [5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "help": [4, 7], "here": 5, "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "histor": 4, "hline": 0, "hoc": 5, "hocr": [0, 2, 4, 7], "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 5, 7], "hpo": 5, "html": 2, "http": [0, 5, 7], "huffmann": 5, "human": 5, "hundr": 7, "hyper_param": 2, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": 5, "ident": 1, "identifi": 0, "idx": 2, "ignor": [0, 2, 5], "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_str": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_nam": [1, 2], "image_s": [1, 2], "imagefilenam": 5, "imaginari": 7, "img": 2, "immedi": 5, "impath": 2, "implement": [1, 8], "implicitli": 5, "import": [1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "incompat": 2, "incorrect": 7, "increas": [5, 7], "independ": 8, "index": [0, 2, 5], "indic": [0, 2, 5, 7], "individu": 5, "infer": [2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [1, 2, 5, 7, 8], "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "insert": [2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": 7, "instal": 3, "instanc": [1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interchang": 2, "interfac": [2, 4], "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "irregular": 5, "is_valid": 2, "isn": [1, 2, 7, 8], "iso": 0, "iter": [1, 2, 7], "its": [2, 5, 7], "itself": 1, "j": 2, "jinja2": [1, 2], "jpeg": 7, "jpeg2000": [0, 4], "jpg": 5, "json": [0, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "kamil": 5, "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [5, 7], "keyword": 0, "kiessl": [0, 4], "kind": [2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "la": 4, "label": [1, 2, 5], "lack": 7, "lag": 5, "languag": [5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [2, 5, 8], "lastli": 5, "later": 7, "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": [5, 8], "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [5, 7], "leav": [5, 8], "left": [0, 2, 4, 5, 7], "legaci": [5, 7, 8], "leipzig": 7, "len": 2, "length": [2, 5], "less": 7, "let": 7, 
"level": [1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": [5, 8], "lib": 1, "libr": 4, "librari": 1, "licens": 0, "lightn": 1, "lightningmodul": 1, "lightweight": 4, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": 5, "line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_idx": 2, "line_k": 5, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 2, 4, 5, 7], "ll": 4, "load": [1, 2, 4, 5, 7], "load_ani": [1, 2], "load_clstm_model": 2, "load_model": [1, 2], "load_pronn_model": 2, "loadabl": 2, "loader": 1, "loc": 5, "locat": [1, 2, 5, 7], "log": [5, 7], "logic": 5, "logograph": 5, "long": 5, "longest": 2, "look": [1, 5, 7], "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": [0, 2], "m": [0, 2, 5, 7, 8], "mac": [4, 7], "machin": 2, "maddah": 7, "made": 7, "mai": [0, 2, 5, 7], "main": [4, 5], "mainli": [1, 2], "major": 1, "make": 5, "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [1, 2, 7], "manuscript": 7, "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 2], "massag": 5, "master": 7, "match": [2, 5], "materi": [1, 4, 7], "matrix": 1, "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mbl_dict": 2, "me": 0, "mean": [1, 2, 7], "measur": 5, "measurementunit": 5, "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [1, 2], "might": [5, 7], "min": [2, 5], "min_epoch": 2, "min_length": 2, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [0, 4, 7], "mix": [2, 5], "ml": 6, "mlmodel": [5, 7], "mm_rpred": [1, 2], "mode": [1, 2, 5], "model": [1, 5, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [4, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "more": [0, 1, 2, 4, 5, 7, 8], "most": [1, 2, 5, 7], "mostli": [0, 1, 2, 4, 5, 7, 8], "move": [2, 7, 8], "mp": 8, "mp2": [5, 8], "mp3": [5, 8], "mreg_dict": 2, "much": [1, 2, 4], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 4, 5, 7], "myprintingcallback": 1, "n": [2, 5, 8], "name": [0, 2, 4, 7, 8], "named_spec": 2, "national": 4, "nativ": 6, "natur": 7, "naugment": 4, "nchw": 2, "ndarrai": 2, "necessari": [2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 7], "net": [1, 2, 7], "netork": 1, "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "never": 7, "nevertheless": [1, 5], "new": [2, 3, 5, 7, 8], "next": [1, 7], "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": 5, "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [2, 5, 7, 8], "nonlinear": 8, "nop": 1, "normal": [0, 2], "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": 2, "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 7], "numpi": [1, 2], "nvidia": 3, "o": [0, 1, 4, 5, 7], "o1c103": 8, "object": [1, 2], "obtain": 7, "obvious": 7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_lin": 0, "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "ocrx_word": 0, "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [1, 5, 7], "old": 6, "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "on_validation_end": 2, "onc": 5, "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": [0, 5], "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 
2, "open": [0, 1], "openmp": [2, 5, 7], "oper": [0, 1, 2, 8], "optic": [0, 7], "optim": [4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 1, 2, 4, 5, 8], "org": 5, "orient": 1, "origin": [1, 2, 5], "orthogon": 2, "other": [0, 5, 7, 8], "otherwis": [2, 5], "out": [5, 7, 8], "output": [0, 1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "over": 2, "overfit": 7, "overhead": 5, "overlap": 5, "overrid": [2, 5], "overwritten": 2, "p": [0, 4, 5], "packag": [4, 7], "pad": [2, 5], "padding_left": 2, "padding_right": 2, "page": [1, 2, 4, 7], "page_doc": 1, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagexml": [0, 1, 2, 4, 7], "pair": [0, 2], "par": [1, 4], "paragraph": [0, 5], "parallel": [2, 5], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "pars": [2, 5], "parse_alto": [1, 2], "parse_pag": [1, 2], "parse_xml": 2, "parser": [1, 2, 5], "part": [1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlib": 2, "pattern": [2, 7], "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [1, 2, 5, 7], "perc": [0, 2], "percentag": 2, "percentil": 2, "perform": [1, 2, 4, 5, 7], "period": 7, "pick": 5, "pickl": [2, 6], "pil": [1, 2], "pillow": 1, "pinpoint": 7, "pipelin": 1, "pixel": [1, 5, 8], "pl_modul": 1, "place": [0, 4, 7], "placement": 7, "plain": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [1, 2, 5, 7], "polygon": [1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "popul": 2, "porson": 0, "portant": 4, "portion": 0, "posit": 2, "possibl": [0, 1, 2, 5, 7], "postprocess": [1, 5], "potenti": 5, "power": 7, "practic": 5, "pratiqu": 4, "pre": 5, "precis": 5, "precompil": 5, "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preload": 7, "prematur": 5, "prepar": 7, "preparse_xml_data": 2, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "preserv": 2, "prevent": [2, 7], "previou": 4, "previous": 5, "primaresearch": 5, "primari": [1, 5], "primarili": 4, "princip": [0, 1, 2, 5], "print": [0, 1, 4, 5, 7], "printspac": 5, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "produc": [0, 1, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": [2, 6], "proper": [1, 2], "properli": 7, "properti": 2, "proport": 5, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": 4, "pull": [0, 4], "purpos": [1, 2, 7, 8], "put": [0, 2, 7], "py": 1, "pypi": 4, "pyrnn": [0, 2, 6], "python": [2, 4], "pytorch": [1, 3, 6], "pytorch_lightn": [1, 2], "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [1, 7], "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "ran": 4, "random": [5, 7], "rang": [0, 2], "rapidli": 7, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [1, 5, 7], "rb": 2, "re": [0, 2], "reach": 7, "read": [0, 2, 4, 5], "reader": 5, "reading_ord": 2, "reading_order_fn": 2, "real": 7, "realiz": 5, "reason": 2, "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [2, 3, 8], "recognitino": 2, "recognitionmodel": 1, "recommend": [1, 5, 7], "record": [1, 2, 4], "rectangl": 2, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "refer": [1, 5, 7], 
"referenc": 2, "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_typ": 5, "region_type_0": 2, "region_type_1": 2, "regular": 5, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reliabl": 7, "relu": 8, "remain": [5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "repolygon": [1, 2], "report": [2, 5, 7], "repositori": [4, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolv": 5, "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 5, 7, 8], "resum": 5, "retain": [0, 2, 5], "retrain": 7, "retriev": [0, 4, 5, 7], "return": [1, 2, 8], "reus": 2, "revers": 8, "rgb": [1, 8], "right": [0, 2, 4, 5, 7], "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "romanov": 7, "rough": 7, "routin": 1, "rpred": 1, "rtl": [0, 2], "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "same": [0, 1, 2, 4, 5, 7], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": 5, "schemaloc": 5, "scientif": 4, "script": [1, 2, 4, 5, 7], "script_detect": [0, 1], "script_typ": 2, "script_type_0": 2, "script_type_1": 2, "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": 2, "second": [0, 2], "section": [1, 7], "see": [1, 5, 7], "seen": [1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, "segresult": 2, "segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sens": 0, "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "seqrecogn": 2, "sequenc": [0, 1, 2, 5, 7, 8], "sequenti": 2, "seri": 0, "serial": [0, 4, 6], "serialize_segment": [1, 2], "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 2, 7], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "sigma": 2, "sigmoid": 8, "similar": [1, 5, 7], "simpl": [1, 5, 7, 8], "singl": [1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "slice": 2, "slightli": [0, 5, 7, 8], "slow": 5, "slower": 5, "small": [0, 1, 2, 5, 7, 8], "so": [1, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": 7, "some": [0, 1, 2, 4, 5, 7], "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [2, 5, 7, 8], "sourceimageinform": 5, "sp": 5, "space": [1, 2, 4, 5, 7], "span": 0, "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [2, 5, 7], "specifi": [0, 5], "speckl": 7, "speech": 2, "speedup": 5, "split": [0, 2, 5, 7, 8], "spot": 4, "squash": [2, 8], "stabl": [1, 4], "stack": [2, 5, 8], "stage": 1, "standard": [1, 4, 5, 7], "start": [1, 2, 7], "start_separ": 2, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1, 2], "stop": [5, 7], "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": [0, 8], "structur": [1, 4, 5], "stub": 5, "sub": 1, "subcommand": [0, 4], 
"subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsequ": [1, 2], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], "suffix": [0, 2], "suggest": 1, "suit": 7, "suitabl": [0, 7], "summar": [2, 5, 7, 8], "superflu": 7, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [1, 4, 5, 6], "suppos": 1, "suppress": 5, "surfac": 2, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [4, 5, 7], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [2, 5], "tags_ignor": 2, "take": [1, 4, 5, 7], "tanh": 8, "target": 2, "task": 7, "tb": [0, 2], "technic": 4, "tell": 5, "templat": [1, 2, 4], "tempor": 2, "tensor": [1, 2, 8], "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [0, 1, 2, 4, 7], "text_direct": [0, 1, 2], "text_transform": 2, "textblock": 5, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": 5, "textregion": 5, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 2, 5], "therefor": [0, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "third": 1, "those": [0, 5], "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": 6, "threshold": [0, 2], "through": [1, 2, 4, 5, 7], "thrown": 0, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "time": [1, 2, 5, 7, 8], "tip": 1, "titr": 4, "tmpl": 2, "toi": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], "toplin": [2, 5], "topograph": 0, "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": [2, 7], "train": [0, 3, 8], "trainabl": [1, 2, 4, 5], "trainer": 1, "training_data": [1, 5], "training_fil": 1, "transcrib": [5, 7], "transcript": [1, 2, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4], "transformt": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "true": [0, 1, 2, 8], "truth": [5, 7], "try": 2, "tupl": 2, "turn": 4, "tutori": [1, 5], "two": [0, 1, 2, 5, 8], "txt": [0, 2, 4, 5], "type": [0, 1, 2, 5, 7, 8], "type_1": 2, "typefac": [5, 7], "typograph": [0, 7], "typologi": 5, "u": [1, 5], "u1f05": 5, "un": 4, "unchti": 0, "unclean": 7, "unclear": 5, "undecod": 1, "under": [2, 4], "undesir": [5, 8], "unduli": 0, "unencod": 2, "uni": [0, 7], "unicod": [0, 1, 2, 7], "uniformli": 2, "union": [2, 4], "uniqu": 7, "universit\u00e9": 4, "unless": 5, "unnecessarili": 1, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "updat": 0, "upon": 0, "upward": [2, 5, 7], "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "user": [2, 4, 5, 7], "user_metadata": 2, "usual": [1, 5, 7], "utf": 5, "util": [1, 4, 5, 7], "uw3": 0, "v": [5, 7], "v4": 5, "val": 5, "valid": [0, 2, 5], "valid_baselin": 2, "valid_region": 2, "validation_set": 2, "valu": [0, 1, 2, 5, 8], "variabl": [2, 4, 5, 8], "variant": 5, "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [1, 2], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": 5, "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": 5, "vocabulari": 2, "vocal": 7, "vpo": 5, "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": 5, "wa": [0, 2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warn": [1, 7], "warp": 7, "we": [2, 5, 7], "weak": [1, 7], "websit": 7, "weight": [2, 5], "welcom": [0, 4], "well": [5, 7], "were": [2, 5], "western": 7, "wget": 7, "what": [1, 7], "when": [1, 2, 5, 7, 8], "where": [2, 7], 
"whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "width": [1, 2, 5, 7, 8], "wildli": 7, "without": [2, 5, 7], "word": [4, 5], "word_text": 5, "work": [1, 2, 5, 7], "worker": 5, "world": 7, "would": 5, "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], "www": 5, "x": [2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x2": 2, "x64": 4, "x_0": 2, "x_1": 2, "x_bbox": 0, "x_conf": 0, "x_m": 2, "x_n": 2, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xml": [0, 7], "xmln": 5, "xmlschema": 5, "xn": 2, "xsd": 5, "xsi": 5, "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_0": 2, "y_1": 2, "y_m": 2, "y_n": 2, "y_stride": 8, "yield": 2, "yk": 2, "ym": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "your": 0, "y\u016bsuf": 7, "zenodo": 4, "zero": [2, 7, 8], "zoom": [0, 2], "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"acceler": 3, "acquisit": 7, "advanc": 0, "alto": 5, "annot": 7, "api": [1, 2], "baselin": 1, "basic": [1, 8], "binar": [0, 2], "binari": 5, "blla": 2, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "detect": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": 5, "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "max": 8, "model": [0, 2, 4, 6], "modul": 2, "network": 8, "normal": [5, 8], "page": [0, 5], "pageseg": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "preprocess": [1, 7], "quickstart": [1, 4], "recognit": [0, 1, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "rpred": 2, "schedul": 2, "scratch": 5, "script": 0, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": [0, 8], "stopper": 2, "test": 5, "text": 5, "train": [1, 2, 4, 5, 7], "trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/4.0/training.html b/4.0/training.html new file mode 100644 index 
000000000..08c3c4fcf --- /dev/null +++ b/4.0/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly easily for a large number of scripts. In contrast to other systems requiring segmentation down to glyph level before classification, it is uniquely suited for the recognition of connected scripts, because the neural network is trained to assign the correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and recognition, the conversion of line images into text, can be trained in kraken. To train models for either, we require training data, i.e. examples of page segmentations and transcriptions that are similar to what we want to be able to recognize. For segmentation the examples are the locations of baselines, i.e. the imaginary lines the text is written on, and the polygons of regions. For recognition they are the text contained in each line. There are multiple ways to supply training data, but the easiest is through PageXML or ALTO files.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works on both Linux and Mac OS X. After installing conda, download the environment file and create the environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First, a number of high-quality scans, preferably color or grayscale and at least 300dpi, are required. Scans should be in a lossless image format such as TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such as pdftocairo or pdfimages. While each of these requirements can be relaxed to a degree, the final accuracy will suffer to some extent. For example, only lightly compressed JPEG scans are generally suitable for training and recognition.
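As a concrete example, a PDF can be rendered into per-page 300dpi PNG images with pdftocairo before any further processing. The file name document.pdf and the output prefix page below are placeholders:

$ pdftocairo -png -r 300 document.pdf page

This writes page-1.png, page-2.png, … which can then be treated like any other scans.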

+

Depending on the source of the scans, some preprocessing such as splitting scans into pages, correcting skew and warp, and removing speckles can be advisable, although it isn't strictly necessary, as the segmenter can be trained to handle noisy material with high accuracy. A fairly user-friendly tool for semi-automatic batch processing of image scans is Scantailor, although most of the work can be done using a standard image editor.

+

The total number of scans required depends on the kind of model to train (segmentation or recognition), the complexity of the layout, and the nature of the script to recognize. Only features that are found in the training data can later be recognized, so it is important that the coverage of typographic features is exhaustive. Training a small segmentation model for a particular kind of material might require fewer than a few hundred samples, while a general model can well go into the thousands of pages. Likewise, a specific recognition model for a printed script with a small grapheme inventory such as Arabic or Hebrew requires around 800 lines, while manuscripts, complex scripts (such as polytonic Greek), and general models for multiple typefaces and hands need more training data for the same accuracy.

+

There is no hard rule for the amount of training data, and it may be necessary to retrain a model after the initial training data proves insufficient. Most Western texts contain between 25 and 40 lines per page, so upward of 30 pages have to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of baselines, regions, and text. There are a number of tools available that can create ALTO and PageXML files containing the requisite information for either segmentation or recognition training: escriptorium integrates kraken tightly, including training and inference, while Aletheia is a powerful desktop application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data, e.g. a collection of PageXML documents obtained through annotation and transcription, may now be used to train segmentation and/or transcription models.

+

The training data in output_dir may now be used to train a new model by invoking the ketos train command. Just hand a list of images to the command, for example:

+
$ ketos train output_dir/*.png
+
+
+

to start training.
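As mentioned above, PageXML or ALTO files can also serve as training data directly. The sketch below assumes the -f/--format-type option found in recent ketos versions, which is not documented on this page, and uses placeholder file names:

$ ketos train -f page page_*.xml
$ ketos segtrain -f page page_*.xml

The first command trains a recognition model from the transcribed lines in the documents, the second a segmentation model from their baselines and regions.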

+

A number of lines will be split off into a separate held-out set that is used to estimate the actual recognition accuracy achieved in the real world. These lines are never shown to the network during training but will be recognized periodically to evaluate the accuracy of the model. By default the validation set comprises 10% of the training data.
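If a different split is desired, the ratio can usually be changed on the command line. The line below is only a sketch assuming a --partition option as found in other kraken releases, which is not documented on this page; its value is the fraction of lines kept for training:

$ ketos train --partition 0.8 output_dir/*.png

Here 80% of the lines would be used for training and the remaining 20% for validation.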

+

Basic model training is mostly automatic, although there are multiple parameters that can be adjusted; a combined invocation using several of these options is sketched at the end of the list:

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an integer equal to or larger than 1, with 1 meaning a report is created each time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with --load. To continue training from a base model with another training set, refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for accelerated training. The default setting preloads data sets with fewer than 2500 lines; explicitly adding --preload will preload arbitrarily sized sets, while --no-preload disables preloading in all circumstances.
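Putting several of the options above together, a training run that writes checkpoints under a custom prefix and evaluates and saves after every pass over the training set might look as follows (mymodel is a placeholder prefix; an earlier checkpoint could be resumed by additionally passing --load mymodel_5.mlmodel):

$ ketos train --output mymodel --report 1 --savefreq 1 output_dir/*.png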

+
+
+

Training a network will take some time on a modern computer, even with the default parameters. While the exact time required is unpredictable, as training is a somewhat random process, a rough guide is that accuracy seldom improves after 50 epochs, which are reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a fairly reliable approach known as early stopping, which stops training as soon as the error rate on the validation set no longer improves. This prevents overfitting, i.e. fitting the model to recognize only the training data properly instead of the general patterns contained therein.
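How patient this early stopping is can usually be tuned. The line below is only a sketch assuming a --lag option as found in other kraken releases, which is not documented on this page and which sets the number of evaluation passes without improvement after which training is aborted:

$ ketos train --lag 10 output_dir/*.png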

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models, model_name-1.mlmodel, model_name-2.mlmodel, …, in the directory the script was executed in. Let's take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This might take a while as preprocessing the whole set and putting it into memory is computationally intensive. The loading step can be sped up by disabling preloading, at the cost of performing preprocessing repeatedly during the training process.

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, i.e. that the alphabets of the sets are not equal. Increasing the size of the validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

This line shows the results of the validation set evaluation. The error after 2 epochs is 545 incorrect characters out of 3504 characters in the validation set, for a character accuracy of 84.4%. The error should decrease fairly rapidly. If accuracy remains around 0.30 something is amiss, e.g. non-reordered right-to-left text or wildly incorrect transcriptions. Abort training, correct the error(s) and start again.
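The reported accuracy is simply the share of validation characters recognized correctly; for the epoch above this works out as follows (plain arithmetic, shown here with a Python one-liner):

$ python3 -c "print(round((3504 - 545) / 3504, 4))"
0.8445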

+

After training is finished the best model is saved as model_name_best.mlmodel. It is highly recommended to also archive the training log and data for later reference.

+

ketos can also produce more verbose output with training set and network information by appending one or more -v switches to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a validation set of 88 lines. 49 different classes, i.e. Unicode code points, were found in these 788 lines. These affect the output size of the network; obviously only these 49 different classes/code points can later be output by the network. Importantly, we can see that certain characters occur markedly less often than others. Characters like the Syriac feminine dot and numerals that occur fewer than 10 times will most likely not be recognized well by the trained net.
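To get a feel for this imbalance before training, character frequencies in the ground truth can be tallied with standard shell tools. The sketch below assumes legacy plain-text transcriptions (one .gt.txt file per line image) and a UTF-8 locale; for XML ground truth the text would have to be extracted first:

$ cat output_dir/*.gt.txt | grep -o . | sort | uniq -c | sort -rn | head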

+
+
+

Evaluation and Validation

+

While output during training is detailed enough to know when to stop training, one usually wants to know the specific kinds of errors to expect. Doing more in-depth error analysis also makes it possible to pinpoint weaknesses in the training data, e.g. above-average error rates for numerals indicate either a lack of representation of numerals in the training data or erroneous transcription in the first place.

+

First, the trained model has to be applied to some line transcriptions with the ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number of characters in the ground truth, the errors in the recognition output, and the resulting accuracy in percent.

+

The next table lists the number of insertions (characters occurring in the ground truth but not in the recognition output), deletions (superfluous characters recognized by the model), and substitutions (misrecognized characters).
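As a quick sanity check (plain arithmetic on the example report above), the three error classes sum to the total error count and reproduce the reported accuracy:

$ python3 -c "print(157 + 81 + 98, round(1 - 336 / 35619, 4))"
336 0.9906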

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report consists of the errors sorted by frequency and a per-character accuracy report. Importantly, most errors are incorrect recognition of combining marks such as dots and diaereses. These may have several sources: different dot placement in training and validation set, incorrect transcription such as non-systematic transcription, or unclean, speckled scans. Depending on the error source, correction most often involves adding more training data and fixing transcriptions. Sometimes it may even be advisable to remove unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training-related tasks. Optical character recognition is a multi-step process consisting of binarization (conversion of input images to black and white), page segmentation (extracting lines from the image), and recognition (converting line images to character sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining -i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

diff --git a/4.0/vgsl.html b/4.0/vgsl.html
new file mode 100644
index 000000000..877c4a303

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language (VGSL), enabling the specification of different network architectures for image processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] with zero-valued dimensions being variable. Integer-valued height or width input specifications will result in the input images being automatically scaled in either dimension.

+

When channels is set to 1, grayscale or B/W inputs are expected; 3 expects RGB color images. Higher values in combination with a height of 1 result in the network being fed 1-pixel-wide grayscale strips scaled to the size of the channel dimension.
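For instance (an illustrative invocation rather than a shipped default), a network accepting color images of arbitrary size sets the channel dimension to 3 and leaves height and width at 0; such a definition can be passed to training with the -s option:

$ ketos train -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Do]' training_data/*.png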

+

After the input, a number of layers are defined. Layers operate on the channel dimension; this is intuitive for convolutional layers, but a recurrent layer doing sequence classification along the width axis on an image of a particular height requires the height dimension to be moved to the channel dimension, e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally, an output definition is appended. When training sequence classification networks with the provided tools the appropriate output definition is automatically appended to the network based on the alphabet of the training data.

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do O1c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The extended dropout layer syntax is used to reduce the drop probability on the depth dimension, as the default is too high for convolutional layers. The remainder of the height dimension (12) is reshaped into the depth dimension before applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrarily sized color image input, an initial summarizing recurrent layer to squash the height to 64, followed by 2 bi-directional recurrent layers and a linear projection.

+
+
+

Convolutional Layers

+
C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying the selected nonlinearity. The stride can be adjusted with the optional last two parameters.
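For example (a hypothetical spec, not one of the bundled defaults), Cr3,3,64,2,2 adds a 3x3 ReLU convolution with 64 output channels and a stride of 2 in both directions; embedded in a training call this might look like:

$ ketos train -s '[1,48,0,1 Cr3,3,64,2,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]' training_data/*.png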

+
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network, using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension, and the non-time-axis dimension (height or width) is treated as another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors, resulting in an output of shape 1, 16, 906, 25. If this isn't desired, either run a summarizing layer in the other direction first, e.g. Lfys20, which yields a 1, 1, 906, 20 input for the following layer, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimensions into a 1, 1, 906, 512 input to the recurrent layer.

+
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a max pooling layer with kernel size (y, x) and stride (y_stride, x_stride).

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d to a,b and distributes a into dimension e, respectively b into f. Either e or f has to be equal to d. So S1(1x48)1,3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48-sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.

+
+

Note

+

This S layer is equivalent to the one implemented in the TensorFlow implementation of VGSL, i.e. it behaves differently from Tesseract's.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to 0.5 drop probability and 1D dropout. Set dim to 2 after convolutional layers.

+
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, normalizing each separately.
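A hypothetical spec fragment inserting group normalization after a convolution could look like the following, where Gn8 splits the 32 channels produced by the convolution into 8 groups:

$ ketos train -s '[1,48,0,1 Cr3,3,32 Gn8 Mp2,2 S1(1x24)1,3 Lbx100 Do]' training_data/*.png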

+
+
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/.buildinfo b/4.1/.buildinfo new file mode 100644 index 000000000..149316600 --- /dev/null +++ b/4.1/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: eeac3be1dad7423c724f84e87831c07c +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/4.1/.doctrees/advanced.doctree b/4.1/.doctrees/advanced.doctree new file mode 100644 index 000000000..b7156e2ca Binary files /dev/null and b/4.1/.doctrees/advanced.doctree differ diff --git a/4.1/.doctrees/api.doctree b/4.1/.doctrees/api.doctree new file mode 100644 index 000000000..cdeb9e9be Binary files /dev/null and b/4.1/.doctrees/api.doctree differ diff --git a/4.1/.doctrees/api_docs.doctree b/4.1/.doctrees/api_docs.doctree new file mode 100644 index 000000000..b996bf255 Binary files /dev/null and b/4.1/.doctrees/api_docs.doctree differ diff --git a/4.1/.doctrees/environment.pickle b/4.1/.doctrees/environment.pickle new file mode 100644 index 000000000..fb44825ae Binary files /dev/null and b/4.1/.doctrees/environment.pickle differ diff --git a/4.1/.doctrees/gpu.doctree b/4.1/.doctrees/gpu.doctree new file mode 100644 index 000000000..ee85af155 Binary files /dev/null and b/4.1/.doctrees/gpu.doctree differ diff --git a/4.1/.doctrees/index.doctree b/4.1/.doctrees/index.doctree new file mode 100644 index 000000000..13d558d58 Binary files /dev/null and b/4.1/.doctrees/index.doctree differ diff --git a/4.1/.doctrees/ketos.doctree b/4.1/.doctrees/ketos.doctree new file mode 100644 index 000000000..50cf667af Binary files /dev/null and b/4.1/.doctrees/ketos.doctree differ diff --git a/4.1/.doctrees/models.doctree b/4.1/.doctrees/models.doctree new file mode 100644 index 000000000..a20f6d849 Binary files /dev/null and b/4.1/.doctrees/models.doctree differ diff --git a/4.1/.doctrees/training.doctree b/4.1/.doctrees/training.doctree new file mode 100644 index 000000000..8404f7b81 Binary files /dev/null and b/4.1/.doctrees/training.doctree differ diff --git a/4.1/.doctrees/vgsl.doctree b/4.1/.doctrees/vgsl.doctree new file mode 100644 index 000000000..c4cc7192f Binary files /dev/null and b/4.1/.doctrees/vgsl.doctree differ diff --git a/4.1/.nojekyll b/4.1/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/4.1/_sources/advanced.rst.txt b/4.1/_sources/advanced.rst.txt new file mode 100644 index 000000000..ebb9a0bfb --- /dev/null +++ b/4.1/_sources/advanced.rst.txt @@ -0,0 +1,255 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken binarization (converting color and grayscale images into bitonal +ones), layout analysis/page segmentation (extracting topological text lines +from an image), recognition (feeding text lines images into an classifiers), +and finally serialization of results into an appropriate format such as hOCR or +ALTO. + +Input Specification +------------------- + +All kraken subcommands operating on input-output pairs, i.e. producing one +output document for one input document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. 
+ +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Binarization +------------ + +The binarization subcommand accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +=========== ==== +option type +=========== ==== +--threshold FLOAT +--zoom FLOAT +--escale FLOAT +--border FLOAT +--perc INTEGER RANGE +--range INTEGER +--low INTEGER RANGE +--high INTEGER RANGE +=========== ==== + +Page Segmentation and Script Detection +-------------------------------------- + +The `segment` subcommand access two operations page segmentation into lines and +script detection of those lines. + +Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +`JSON `_ file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left). + +The script detection splits extracted lines from the segmenter into strip +sharing a particular script that can then be recognized by supplying +appropriate models for each detected script to the `ocr` subcommand. + +Combined output from both consists of lists in the `boxes` field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are `ISO 15924 +`_ 4 character codes. + +.. code-block:: console + + $ kraken -i 14.tif lines.txt segment + $ cat lines.json + { + "boxes" : [ + [ + ["Grek", [561, 216, 1626,309]] + ], + [ + ["Latn", [2172, 197, 2424, 244]] + ], + [ + ["Grek", [1678, 221, 2236, 320]], + ["Arab", [2241, 221, 2302, 320]] + ], + + ["Grek", [412, 318, 2215, 416]], + ["Latn", [2208, 318, 2424, 416]] + ], + ... + ], + "script_detection": true, + "text_direction" : "horizontal-tb" + } + +Script detection is automatically enabled; by explicitly disabling script +detection the `boxes` field will contain only a list of line bounding boxes: + +.. 
code-block:: console + + [546, 216, 1626, 309], + [2169, 197, 2423, 244], + [1676, 221, 2293, 320], + ... + [503, 2641, 848, 2681] + +Available page segmentation parameters are: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +--scale FLOAT Estimate of the average line height on the page +-m, --maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, --black-colseps / -w, --white-colseps Switch to black column separators. +-r, --remove-hlines / -l, --hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +=============================================== ====== + +Model Repository +---------------- + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client. + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. code-block:: console + + $ kraken list + Retrieving model list ✓ + default (pyrnn) - A converted version of en-default.pyrnn.gz + toy (clstm) - A toy model trained on 400 lines of the UW3 data set. + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show toy + name: toy.clstm + + A toy model trained on 400 lines of the UW3 data set. + + author: Benjamin Kiessling (mittagessen@l.unchti.me) + http://kraken.re + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get toy + Retrieving model ✓ + +Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the ``show`` command, e.g.: + +.. code-block:: console + + $ kraken -i ... ... ocr -m toy + +Additions and updates to existing models are always welcome! Just open a pull +request or write an email. + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm + +All polytonic Greek text portions will be recognized using the `porson.clstm` +model while Latin text will be fed into the `antiqua.clstm` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. 
+ +The ``ocr`` subcommand is able to serialize the recognition results either as +plain text (default), as `hOCR `_, into `ALTO +`_, or abbyyXML containing additional +metadata such as bounding boxes and confidences: + +.. code-block:: console + + $ kraken -i ... ... ocr -t # text output + $ kraken -i ... ... ocr -h # hOCR output + $ kraken -i ... ... ocr -a # ALTO output + $ kraken -i ... ... ocr -y # abbyyXML output + +hOCR output is slightly different from hOCR files produced by ocropus. Each +``ocr_line`` span contains not only the bounding box of the line but also +character boxes (``x_bboxes`` attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ``ocrx_word`` +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the ``x_conf`` attribute. + +Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input. diff --git a/4.1/_sources/api.rst.txt b/4.1/_sources/api.rst.txt new file mode 100644 index 000000000..effad7c4f --- /dev/null +++ b/4.1/_sources/api.rst.txt @@ -0,0 +1,406 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and netork +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. 
code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + {'text_direction': 'horizontal-lr', + 'boxes': [[0, 29, 232, 56], + [28, 54, 121, 84], + [9, 73, 92, 117], + [103, 76, 145, 131], + [7, 105, 119, 230], + [10, 228, 126, 345], + ... + ], + 'script_detection': False} + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmentation and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + {'text_direction': 'horizontal-lr', + 'type': 'baselines', + 'script_detection': False, + 'lines': [{'script': 'default', + 'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]], + 'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]}, + ...], + 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...] + '$par': ... + '$nop': ...}} + >>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking. + +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. + +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. 
Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as the if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch for binary input models. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(model, im, baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but an +:class:`kraken.rpred.ocr_record` record object containing the character +prediction, cuts (approximate locations), and confidences. + +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'box' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. 
The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformtion by the functional blocks: + +.. code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> xml.parse_alto(alto_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + } + + >>> page_doc = '/path/to/page' + >>> xml.parse_page(page_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + + +Serialization +------------- + +The serialization module can be used to transform the :class:`ocr_records +` returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders `jinja2 +`_ templates in `kraken/templates` through +the :func:`kraken.serialization.serialize` function. + +.. 
code-block:: python + + >>> from kraken.lib import serialization + + >>> records = [record for record in pred_it] + >>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/4.1/_sources/api_docs.rst.txt b/4.1/_sources/api_docs.rst.txt new file mode 100644 index 000000000..46379f2b8 --- /dev/null +++ b/4.1/_sources/api_docs.rst.txt @@ -0,0 +1,251 @@ +************* +API Reference +************* + +kraken.blla module +================== + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +===================== + +.. 
note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +kraken.rpred module +=================== + +.. autoapifunction:: kraken.rpred.bidi_record + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapiclass:: kraken.rpred.ocr_record + :members: + +.. autoapifunction:: kraken.rpred.rpred + + +kraken.serialization module +=========================== + +.. autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +kraken.lib.models module +======================== + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.vgsl module +====================== + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +===================== + +.. autoapifunction:: kraken.lib.xml.parse_xml + +.. autoapifunction:: kraken.lib.xml.parse_page + +.. autoapifunction:: kraken.lib.xml.parse_alto + +kraken.lib.codec module +======================= + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.lib.train module +======================= + +Training Schedulers +------------------- + +.. autoapiclass:: kraken.lib.train.TrainScheduler + :members: + +.. autoapiclass:: kraken.lib.train.annealing_step + :members: + +.. autoapiclass:: kraken.lib.train.annealing_const + :members: + +.. autoapiclass:: kraken.lib.train.annealing_exponential + :members: + +.. autoapiclass:: kraken.lib.train.annealing_reduceonplateau + :members: + +.. autoapiclass:: kraken.lib.train.annealing_cosine + :members: + +.. autoapiclass:: kraken.lib.train.annealing_onecycle + :members: + +Training Stoppers +----------------- + +.. autoapiclass:: kraken.lib.train.TrainStopper + :members: + +.. autoapiclass:: kraken.lib.train.EarlyStopping + :members: + +.. autoapiclass:: kraken.lib.train.EpochStopping + :members: + +.. autoapiclass:: kraken.lib.train.NoStopping + :members: + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +========================= + +Datasets +-------- + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Helpers +------- + +.. autoapifunction:: kraken.lib.dataset.compute_error + +.. autoapifunction:: kraken.lib.dataset.preparse_xml_data + +.. autoapifunction:: kraken.lib.dataset.generate_input_transforms + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. autoapifunction:: kraken.lib.segmentation.denoising_hysteresis_thresh + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. 
autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + + +kraken.lib.ctc_decoder +====================== + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +===================== + +.. autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/4.1/_sources/gpu.rst.txt b/4.1/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/4.1/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/4.1/_sources/index.rst.txt b/4.1/_sources/index.rst.txt new file mode 100644 index 000000000..1e99c0f83 --- /dev/null +++ b/4.1/_sources/index.rst.txt @@ -0,0 +1,243 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. + +Features +======== + +kraken's main features are: + + - Fully trainable layout analysis and character recognition + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - `Public repository `_ of model files + - :ref:`Lightweight model files ` + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). 
Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. + +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone git://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone git://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user's kraken directory: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.2577813 + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.2577813 + name: 10.5281/zenodo.2577813 + + A generalized model for English printed text + + This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p + scripts: Latn + alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE + accuracy: 99.95% + license: Apache-2.0 + author(s): Kiessling, Benjamin + date: 2019-02-26 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the default model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. 
code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `escriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://ec.europa.eu/regional_policy/images/information/logos/eu_flag.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://www.gouvernement.fr/sites/default/files/styles/illustration-centre/public/contenu/illustration/2018/10/logo_investirlavenir_rvb.png + :width: 100 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005. + + diff --git a/4.1/_sources/ketos.rst.txt b/4.1/_sources/ketos.rst.txt new file mode 100644 index 000000000..ee761532f --- /dev/null +++ b/4.1/_sources/ketos.rst.txt @@ -0,0 +1,656 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation training and the binary format for recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. 
In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported. Binary +datasets drastically improve loading performance allowing the saturation of +most GPUs with minimal computational overhead while also allowing training with +datasets that are larger than the systems main memory. A minor drawback is a +~30% increase in dataset size in comparison to the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the `--workers` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... + +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... + +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +Recognition training +-------------------- + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, --output Output model file prefix. Defaults to model. +-s, --spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, --append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, --load Load existing file to continue training +-F, --savefreq Model save frequency in epochs during + training +-q, --quit Stop condition for training. Set to `early` + for early stopping (default) or `dumb` for fixed + number of epochs. +-N, --epochs Number of epochs to train for. +--min-epochs Minimum number of epochs to train for when using early stopping. +--lag Number of epochs to wait before stopping + training without improvement. 
Only used when using early stopping. +-d, --device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, --lrate Learning rate [default: 0.001] +-m, --momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, --weight-decay Weight decay. +--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, --partition Ground truth data partition ratio between train/validation set +-u, --normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, --codec Load a codec JSON definition (invalid if loading existing model) +--resize Codec/output layer resizing option. If set + to `add` code points will be added, `both` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, --reorder / --no-reorder Reordering of code points to display order. +-t, --training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, --evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, --format-type Sets the training and evaluation data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +--augment / --no-augment Enables/disables data augmentation. +--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases, such as color inputs, changing the network architecture might be +useful: + +.. code-block:: console + + $ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta an/or +lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 --min-delta 0.001 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. 
code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``add`` and ``both``. +``add`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``both`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize add -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize both -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``both`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. 
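+
+The layer structure needed to pick a slice point is printed in the debug log
+at the start of training. One way to obtain it for the default architecture
+is to start a verbose training run, e.g.:
+
+.. code-block:: console
+
+    $ ketos -vv train syr/*.png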
+ +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. 
+left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Segmentation training +--------------------- + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. 
code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + Training line types: + default 2 53980 + foo 8 134 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + val check [------------------------------------] 0/0 + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailled or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. code-block:: console + + $ ketos segtrain -f xml --valid-baselines default training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + paragraph 6 10218 + +Finally, we can merge baselines and regions into each other: + +.. code-block:: console + + $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml + Training line types: + default 2 54114 + ... + $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml + ... + Training region types: + graphic 3 151 + text 4 11346 + separator 5 5431 + ... + +These options are combinable to massage the dataset into any typology you want. + +Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option: + +.. 
code-block:: console + + $ ketos segtrain --topline -f xml hebrew_training_data/*.xml + $ ketos segtrain --centerline -f xml chinese_training_data/*.xml + $ ketos segtrain --baseline -f xml latin_training_data/*.xml + +Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved: + +.. code-block:: console + + $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml + ... + +Recognition Testing +------------------- + +Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-f, --format-type Sets the test set data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +-m, --model Model(s) to evaluate. +-e, --evaluation-files File(s) with paths to evaluation data. +-d, --device Select device to use. +--pad Left and right padding around lines. +======================================================= ====== + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with `-e/--evaluation-files` or by just +adding a number of image files as the final argument: + +.. code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. 
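+
+Several models can be compared in a single run. The sketch below assumes two
+hypothetical model files and a precompiled binary test set, repeating the
+``-m`` option once per model:
+
+.. code-block:: console
+
+    $ ketos test -m model_a.mlmodel -m model_b.mlmodel -f binary dataset.arrow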
+ + diff --git a/4.1/_sources/models.rst.txt b/4.1/_sources/models.rst.txt new file mode 100644 index 000000000..b393f0738 --- /dev/null +++ b/4.1/_sources/models.rst.txt @@ -0,0 +1,24 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Segmentation Models +------------------- + +Recognition Models +------------------ + + diff --git a/4.1/_sources/training.rst.txt b/4.1/_sources/training.rst.txt new file mode 100644 index 000000000..f514da49b --- /dev/null +++ b/4.1/_sources/training.rst.txt @@ -0,0 +1,463 @@ +.. _training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. + +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. 
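+
+If the source material is only available as PDF, the page images can be
+rendered beforehand with one of the tools mentioned above; a minimal sketch
+using ``pdftocairo`` and a hypothetical ``scans.pdf``:
+
+.. code-block:: console
+
+    $ pdftocairo -png -r 300 scans.pdf page
+
+This writes one numbered PNG file per page at 300dpi which can then be
+annotated and transcribed.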
+ +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Dataset Compilation +------------------- + +.. _compilation: + +Training +-------- + +.. _training_step: + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. + +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. 
``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldom improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... 
+ [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . 
} + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 
365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/4.1/_sources/vgsl.rst.txt b/4.1/_sources/vgsl.rst.txt new file mode 100644 index 000000000..913a7b5b1 --- /dev/null +++ b/4.1/_sources/vgsl.rst.txt @@ -0,0 +1,199 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. 
code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +Convolutional Layers +-------------------- + +.. code-block:: console + + C[{name}](s|t|r|l|m)[{name}],,[,,] + s = sigmoid + t = tanh + r = relu + l = linear + m = softmax + +Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters. + +Recurrent Layers +---------------- + +.. code-block:: console + + L[{name}](f|r|b)(x|y)[s][{name}] LSTM cell with n outputs. + G[{name}](f|r|b)(x|y)[s][{name}] GRU cell with n outputs. + f runs the RNN forward only. + r runs the RNN reversed only. + b runs the RNN bidirectionally. + s (optional) summarizes the output in the requested dimension, return the last step. + +Adds either an LSTM or GRU recurrent layer to the network using either the `x` +(width) or `y` (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32` +input will execute 16 independent forward passes on `906x32` tensors resulting +in an output of shape `1, 16, 906, 25`. If this isn't desired either run a +summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1, +906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and +channel dimension for an `1, 1, 906, 512` input to the recurrent layer. + +Helper and Plumbing Layers +-------------------------- + +Max Pool +^^^^^^^^ +.. code-block:: console + + Mp[{name}],[,,] + +Adds a maximum pooling with `(y, x)` kernel_size and `(y_stride, x_stride)` stride. + +Reshape +^^^^^^^ + +.. code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. 
+ +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +Dropout +^^^^^^^ + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/4.1/_static/alabaster.css b/4.1/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/4.1/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + +div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > 
a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + 
font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px -30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, 
div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. */ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/4.1/_static/basic.css b/4.1/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/4.1/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + 
+div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 
8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + 
+dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} 
+ +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/4.1/_static/blla_heatmap.jpg b/4.1/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/4.1/_static/blla_heatmap.jpg differ diff --git a/4.1/_static/blla_output.jpg b/4.1/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/4.1/_static/blla_output.jpg differ diff --git a/4.1/_static/bw.png b/4.1/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/4.1/_static/bw.png differ diff --git a/4.1/_static/custom.css b/4.1/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/4.1/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/4.1/_static/doctools.js b/4.1/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/4.1/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/4.1/_static/documentation_options.js b/4.1/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/4.1/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/4.1/_static/file.png b/4.1/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/4.1/_static/file.png differ diff --git a/4.1/_static/graphviz.css b/4.1/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/4.1/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx 
stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/4.1/_static/kraken.png b/4.1/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/4.1/_static/kraken.png differ diff --git a/4.1/_static/kraken_recognition.svg b/4.1/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/4.1/_static/kraken_recognition.svg @@ -0,0 +1,948 @@
+ [SVG markup stripped in extraction; surviving text labels of the recognition diagram: "Output Matrix", "Labels", "Label Sequence", "15, 10, 1, ...", "'Time' Steps", "'Time' Steps (Width)", "Neural Net", "Character Sequence", "o, c, u, ...", "CTC decoder", "Codec".]
diff --git a/4.1/_static/kraken_segmentation.svg b/4.1/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/4.1/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@
+ [SVG markup stripped in extraction; surviving text labels of the segmentation diagram: "Pixel Labelling", "Line and Separator Heatmaps", "Bounding Polygon Calculation", "Baseline Vectorization and Orientation", "Oriented Baselines", "Line Ordering", "Bounding Polygons", "Trainable", "Segmentation", "Region Heatmaps", "Region Vectorization", "Region Boundaries".]
diff --git a/4.1/_static/kraken_segmodel.svg b/4.1/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/4.1/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@
+ [SVG markup stripped in extraction; surviving text labels: "Segmentation Model (TorchVGSLModel)", "Metadata", "Line and Region Types", "Baseline location flag", "Bounding Regions", "Neural Network".]
diff --git a/4.1/_static/kraken_torchseqrecognizer.svg b/4.1/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/4.1/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@
+ [SVG markup stripped in extraction; surviving text labels: "Transcription Model (TorchSeqRecognizer)", "Codec", "Metadata", "CTC Decoder", "Neural Network".]
diff --git a/4.1/_static/kraken_workflow.svg b/4.1/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/4.1/_static/kraken_workflow.svg @@ -0,0 +1,753 @@
+ [SVG markup stripped in extraction; surviving text labels of the workflow diagram: "Segmentation", "Recognition", "Serialization", "Recognition Model", "Segmentation Model", "OCR Records", "Baselines, Regions, and Order", "Output File", "Output Template", "Image".]
diff --git a/4.1/_static/language_data.js b/4.1/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/4.1/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?"
+ v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/4.1/_static/minus.png b/4.1/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/4.1/_static/minus.png differ diff --git a/4.1/_static/normal-reproduction-low-resolution.jpg b/4.1/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/4.1/_static/normal-reproduction-low-resolution.jpg differ diff --git a/4.1/_static/pat.png b/4.1/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/4.1/_static/pat.png differ diff --git a/4.1/_static/plus.png b/4.1/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/4.1/_static/plus.png differ diff --git a/4.1/_static/pygments.css b/4.1/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ 
b/4.1/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } /* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ 
+.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/4.1/_static/searchtools.js b/4.1/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/4.1/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. + /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. 
+ objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." + ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. 
+// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/4.1/_static/sphinx_highlight.js b/4.1/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/4.1/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/4.1/advanced.html b/4.1/advanced.html new file mode 100644 index 000000000..a22159521 --- /dev/null +++ b/4.1/advanced.html @@ -0,0 +1,353 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the +case of kraken binarization (converting color and grayscale images into bitonal +ones), layout analysis/page segmentation (extracting topological text lines +from an image), recognition (feeding text line images into a classifier), +and finally serialization of results into an appropriate format such as hOCR or +ALTO.

+
+

Input Specification

+

All kraken subcommands operating on input-output pairs, i.e. producing one +output document for one input document, follow the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular subcommands may be chained.

+

There are other ways to define inputs and outputs, as the syntax shown above can +become rather cumbersome for large numbers of files.

+

As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write +one output file per page, named with an index (which can be changed using the -p option) and the +suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 +files but also from various XML formats. In these cases the appropriate data is +automatically selected from the inputs: image data for segmentation, or line and +region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

The code is able to automatically determine if a file is in PageXML or ALTO format.

+
+
+

Binarization

+

The binarization subcommand accepts almost the same parameters as +ocropus-nlbin. Only options not related to binarization, e.g. skew +detection, are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it.

+

Available parameters are:

option         type
-----------    -------------
--threshold    FLOAT
--zoom         FLOAT
--escale       FLOAT
--border       FLOAT
--perc         INTEGER RANGE
--range        INTEGER
--low          INTEGER RANGE
--high         INTEGER RANGE
+
+
+

Page Segmentation and Script Detection

+

The segment subcommand performs two operations: page segmentation into lines and +script detection of those lines.

+

Page segmentation is mostly parameterless, although a switch to change the +color of column separators has been retained. The segmentation is written as a +JSON file containing bounding boxes in reading order and +the general text direction (horizontal, i.e. LTR or RTL text in top-to-bottom +reading order or vertical-ltr/rtl for vertical lines read from left-to-right or +right-to-left).

+

The script detection splits extracted lines from the segmenter into strips +sharing a particular script that can then be recognized by supplying +appropriate models for each detected script to the ocr subcommand.

+

Combined output from both consists of lists in the boxes field corresponding +to a topographical line and containing one or more bounding boxes of a +particular script. Identifiers are ISO 15924 4 character codes.

+
$ kraken -i 14.tif lines.txt segment
+$ cat lines.json
+{
+   "boxes" : [
+    [
+        ["Grek", [561, 216, 1626,309]]
+    ],
+    [
+        ["Latn", [2172, 197, 2424, 244]]
+    ],
+    [
+        ["Grek", [1678, 221, 2236, 320]],
+        ["Arab", [2241, 221, 2302, 320]]
+    ],
+    [
+        ["Grek", [412, 318, 2215, 416]],
+        ["Latn", [2208, 318, 2424, 416]]
+    ],
+    ...
+   ],
+   "script_detection": true,
+   "text_direction" : "horizontal-tb"
+}
+
+
+

Script detection is automatically enabled; when it is explicitly disabled the boxes field will contain only a list of line bounding boxes:

+
[546, 216, 1626, 309],
+[2169, 197, 2423, 244],
+[1676, 221, 2293, 320],
+...
+[503, 2641, 848, 2681]
+
+
+

Available page segmentation parameters are:

option                                       action
-------------------------------------------  ------
-d, --text-direction                         Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.
--scale FLOAT                                Estimate of the average line height on the page.
-m, --maxcolseps                             Maximum number of columns in the input document. Set to 0 for uni-column layouts.
-b, --black-colseps / -w, --white-colseps    Switch to black column separators.
-r, --remove-hlines / -l, --hlines           Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.
+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be accessed from the command line using a few subcommands. For +evaluating a series of models it is also possible to just clone the repository +using the normal git client.

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list   ✓
+default (pyrnn) - A converted version of en-default.pyrnn.gz
+toy (clstm) - A toy model trained on 400 lines of the UW3 data set.
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show toy
+name: toy.clstm
+
+A toy model trained on 400 lines of the UW3 data set.
+
+author: Benjamin Kiessling (mittagessen@l.unchti.me)
+http://kraken.re
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get toy
+Retrieving model        ✓
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +shown by the show command, e.g.:

+
$ kraken -i ... ... ocr -m toy
+
+
+

Additions and updates to existing models are always welcome! Just open a pull +request or write an email.

+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm
+
+
+

All polytonic Greek text portions will be recognized using the porson.clstm +model while Latin text will be fed into the antiqua.clstm model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+

The ocr subcommand is able to serialize the recognition results either as +plain text (default) or as hOCR, ALTO, or abbyyXML, which contain additional +metadata such as bounding boxes and confidences:

+
$ kraken -i ... ... ocr -t # text output
+$ kraken -i ... ... ocr -h # hOCR output
+$ kraken -i ... ... ocr -a # ALTO output
+$ kraken -i ... ... ocr -y # abbyyXML output
+
+
+

hOCR output is slightly different from hOCR files produced by ocropus. Each +ocr_line span contains not only the bounding box of the line but also +character boxes (x_bboxes attribute) indicating the coordinates of each +character. In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ocrx_word +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the x_conf attribute.

+

Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/api.html b/4.1/api.html new file mode 100644 index 000000000..416c44642 --- /dev/null +++ b/4.1/api.html @@ -0,0 +1,3056 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all functionality of the OCR engine. Most functional blocks (binarization, segmentation, recognition, and serialization) are encapsulated in one high-level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The legacy segmenter requires just a b/w image +object as its basic parameter, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and to explicitly mask non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+{'text_direction': 'horizontal-lr',
+ 'boxes': [[0, 29, 232, 56],
+           [28, 54, 121, 84],
+           [9, 73, 92, 117],
+           [103, 76, 145, 131],
+           [7, 105, 119, 230],
+           [10, 228, 126, 345],
+           ...
+          ],
+ 'script_detection': False}
+
+
+
+
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented, and +it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

[Diagram: a segmentation model (TorchVGSLModel) bundles a neural network with metadata: line and region types, bounding regions, and a baseline location flag.]

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+{'text_direction': 'horizontal-lr',
+ 'type': 'baselines',
+ 'script_detection': False,
+ 'lines': [{'script': 'default',
+            'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]],
+            'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]},
+           ...],
+ 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...]
+             '$par': ...
+             '$nop':  ...}}
+>>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking.

+

Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that effect).

+

By default segmentation is performed on the CPU, although the neural network +can be run on a GPU with the device argument. As the vast majority of the +processing required is postprocessing, the performance gain will most likely be modest.

+
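Passing a device is enough to move the network to the GPU; a minimal sketch (assuming a CUDA-capable device is visible to torch, with 'cuda:0' as a placeholder device string):

>>> baseline_seg = blla.segment(im, model=model, device='cuda:0')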

The above API is the simplest way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation:

[Diagram: the trainable segmentation pipeline. Pixel labelling yields line/separator heatmaps and region heatmaps; baseline vectorization and orientation plus region vectorization produce oriented baselines and region boundaries; bounding polygon calculation and line ordering yield the final bounding polygons.]

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

[Diagram: recognition data flow. The neural net maps the input line to an output matrix over 'time' steps (width); the CTC decoder reduces it to a label sequence (e.g. 15, 10, 1, ...) which the codec maps to a character sequence (e.g. o, c, u, ...).]

As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are used: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

The sequence recognizer wrapper combines the neural network itself, a +codec, metadata such as whether the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels:

[Diagram: a transcription model (TorchSeqRecognizer) bundles a neural network with a codec, metadata, and a CTC decoder.]

Afterwards, given an image, a segmentation, and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +As with segmentation, input images are auto-converted to the correct color +mode, except in the case of binary models, for which a warning will be raised if +there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken import rpred
+# single model recognition
+>>> pred_it = rpred(model, im, baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+

The output isn’t just a sequence of characters but a +kraken.rpred.ocr_record object containing the character +prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'box'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The C × W softmax output matrix is accessible as the outputs attribute on the kraken.lib.models.TorchSeqRecognizer after each step of the kraken.rpred.rpred() iterator. To get a mapping from the label space C the network operates in to Unicode code points a codec is used. An arbitrary sequence of labels can generate an arbitrary number of Unicode code points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.output
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.

+
+
+

XML Parsing

+

Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +kraken.lib.xml module includes parsers extracting information into data +structures processable with minimal transformation by the functional blocks:

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> xml.parse_alto(alto_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+>>> page_doc = '/path/to/page'
+>>> xml.parse_page(page_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+
+
+
+
+

Serialization

+

The serialization module can be used to transform the ocr_records returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders jinja2 templates in kraken/templates through +the kraken.serialization.serialize() function.

+
>>> from kraken.lib import serialization
+
+>>> records = [record for record in pred_it]
+>>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+
+
+

Training

+

Training is largely implemented with the pytorch lightning framework. There are separate LightningModules for recognition and segmentation training and a small wrapper around lightning's Trainer class that mainly sets up model handling and verbosity options for the CLI.

+
>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features such as transfer +learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/api_docs.html b/4.1/api_docs.html new file mode 100644 index 000000000..0941491e9 --- /dev/null +++ b/4.1/api_docs.html @@ -0,0 +1,2702 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (str) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
+
+
Returns:
+

A dictionary containing the text direction and under the key ‘lines’ a +list of reading order sorted baselines (polylines) and their respective +polygonal boundaries. The last and first point of each boundary polygon +are connected.

+
 {'text_direction': '$dir',
+  'type': 'baseline',
+  'lines': [
+     {'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0, x1, y1], ... [x_m, y_m]]},
+     {'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}
+   ],
+   'regions': [
+     {'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'},
+     {'region': [[x0, ...]], 'type': 'text'}
+   ]
+ }
+
+
+

+
+
Raises:
+
+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
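A minimal usage sketch of the call described above (the model path is a placeholder; model loading uses kraken.lib.vgsl as shown in the API quickstart):

>>> from PIL import Image
>>> from kraken import blla
>>> from kraken.lib import vgsl

>>> model = vgsl.TorchVGSLModel.load_model('path/to/segmentation/model')
>>> im = Image.open('page.png')
>>> seg = blla.segment(im, model=model, text_direction='horizontal-lr')
>>> seg['lines'][0]['baseline']   # first baseline polyline in reading order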
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A dictionary containing the text direction and a list of reading order +sorted bounding boxes under the key ‘boxes’:

+
{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),...]}
+
+
+

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
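A minimal usage sketch (assumes a binarized page image; the path is a placeholder):

>>> from PIL import Image
>>> from kraken import pageseg

>>> bw_im = Image.open('bw.png')   # mode '1' or 'L'
>>> seg = pageseg.segment(bw_im, text_direction='horizontal-lr', maxcolseps=0)
>>> seg['boxes'][0]   # first line bounding box (x1, y1, x2, y2) in reading order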
+
+

kraken.rpred module

+
+
+kraken.rpred.bidi_record(record, base_dir=None)
+

Reorders a record using the Unicode BiDi algorithm.

+

Models trained for RTL or mixed scripts still emit classes in LTR order +requiring reordering for proper display.

+
+
Parameters:
+

record (kraken.rpred.ocr_record)

+
+
Returns:
+

kraken.rpred.ocr_record

+
+
Return type:
+

ocr_record

+
+
+
+ +
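A short sketch of reordering records for display (pred_it is assumed to be an rpred iterator as shown in the API quickstart):

>>> from kraken.rpred import bidi_record

>>> display_records = [bidi_record(record) for record in pred_it]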
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+filtered_tags = []
+
+ +
+
+im
+
+ +
+
+im_str
+
+ +
+
+miss = []
+
+ +
+
+nets
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags
+
+ +
+
+tags_ignore
+
+ +
+
+ts
+
+ +
+ +
+
+class kraken.rpred.ocr_record(prediction, cuts, confidences, line)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • confidences (List[float])

  • +
  • line (Union[List, Dict[str, List]])

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+confidences
+
+ +
+
+cuts
+
+ +
+
+prediction
+
+ +
+
+tags
+
+ +
+
+type
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer +object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of +coordinates (x0, y0, x1, y1) of a text line in the image +and an entry ‘text_direction’ containing +‘horizontal-lr/rl/vertical-lr/rl’.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible +with padding.

  • +
  • bidi_reordering (bool|str) – Reorder classes in the ocr_record according to +the Unicode bidirectional algorithm for correct +display. Set to L|R to change base text +direction.

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[ocr_record, None, None]

+
+
+
+ +
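A minimal usage sketch (the model path is a placeholder; im and seg are a Pillow image and a segmentation as returned by blla.segment or pageseg.segment):

>>> from kraken import rpred
>>> from kraken.lib import models

>>> net = models.load_any('path/to/recognition/model')
>>> for record in rpred.rpred(net, im, seg):
        print(record.prediction)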
+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='hocr')
+

Serializes a list of ocr_records into an output document.

+

Serializes a list of predictions and their corresponding positions by doing +some hOCR-specific preprocessing and then renders them through one of +several jinja2 templates.

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • records (iterable) – List of kraken.rpred.ocr_record

  • +
  • image_name (str) – Name of the source image

  • +
  • image_size (tuple) – Dimensions of the source image

  • +
  • writing_mode (str) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values +are horizontal-tb, vertical-rl, and +vertical-lr.

  • +
  • scripts (list) – List of scripts contained in the OCR records

  • +
  • regions (list) – Dictionary mapping region types to a list of region +polygons.

  • +
  • template (str) – Selector for the serialization format. May be +‘hocr’ or ‘alto’.

  • +
+
+
Returns:
+

(str) rendered template.

+
+
Return type:
+

str

+
+
+
+ +
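A short sketch of rendering a list of records into ALTO (records as collected from an rpred iterator; image name and size are placeholders):

>>> from kraken import serialization

>>> alto = serialization.serialize(records, image_name='page.png', image_size=(2000, 3000), template='alto')
>>> with open('page.xml', 'w') as fp:
        fp.write(alto)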
+
+kraken.serialization.serialize_segmentation(segresult, image_name=None, image_size=(0, 0), template='hocr')
+

Serializes a segmentation result into an output document.

+
+
Parameters:
+
    +
  • segresult (Dict[str, Any]) – Result of blla.segment

  • +
  • image_name (str) – Name of the source image

  • +
  • image_size (tuple) – Dimensions of the source image

  • +
  • template (str) – Selector for the serialization format. May be +‘hocr’ or ‘alto’.

  • +
+
+
Returns:
+

(str) rendered template.

+
+
Return type:
+

str

+
+
+
+ +
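A short sketch of writing a segmentation-only ALTO document (baseline_seg as returned by blla.segment; names are placeholders):

>>> from kraken import serialization

>>> alto = serialization.serialize_segmentation(baseline_seg, image_name='page.png', image_size=(2000, 3000), template='alto')
>>> with open('segmentation.xml', 'w') as fp:
        fp.write(alto)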
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads anything that was, is, and will be a valid ocropus model and +instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN +configuration in the file.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing converted python BIDILSTMs (recognition +only)

  • +
  • protobuf models containing CLSTM networks (recognition only)

  • +
  • protobuf models containing VGSL segmentation and recognition +networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • pyrnn for pickled BIDILSTMs

  • +
  • clstm for protobuf models generated by clstm

  • +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (str) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
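A minimal sketch of loading a model and inspecting the detected source kind (the path is a placeholder):

>>> from kraken.lib import models

>>> net = models.load_any('path/to/model', device='cpu')
>>> net.kind              # 'pyrnn', 'clstm', or 'vgsl'
>>> net.one_channel_mode  # '1' for binary-input models, 'L' for grayscale, None otherwise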
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VGSL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel +vector at each step along its time axis, i.e. either put the non-time-axis +dimension into the channels dimension or use a summarizing RNN squashing +the time axis to 1 and putting the output into the channels dimension +respectively.

+
+
Parameters:
+

spec (str)

+
+
+
+
+input
+

Expected input tensor as a 4-tuple.

+
+
Type:
+

tuple

+
+
+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+
Type:
+

torch.nn.Sequential

+
+
+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+
Type:
+

torch.nn.Module

+
+
+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+
Type:
+

dict

+
+
+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+
Type:
+

str

+
+
+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and appends layers spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx)
+

Builds a reshape layer

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx)
+

Builds an LSTM/GRU layer returning number of outputs and layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_clstm_model(path)
+

Loads a CLSTM model to VGSL.

+
+
Parameters:
+

path (Union[str, pathlib.Path])

+
+
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, pathlib.Path]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException if the model data is invalid (not a string, protobuf file, or without appropriate metadata).

  • +
  • FileNotFoundError if the path doesn't point to a file.

  • +
+
+
+
+ +
+
+classmethod load_pronn_model(path)
+

Loads a pronn model to VGSL.

+
+
Parameters:
+

path (Union[str, pathlib.Path])

+
+
+
+ +
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+user_metadata: dict[str, str]
+
+ +
+ +
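A short sketch of loading, inspecting, and re-saving a model with the methods above (paths are placeholders; the metadata key is illustrative):

>>> from kraken.lib import vgsl

>>> net = vgsl.TorchVGSLModel.load_model('path/to/model.mlmodel')
>>> net.model_type            # kind of model contained in the file
>>> net.one_channel_mode      # '1', 'L', or None
>>> net.user_metadata['note'] = 'resized output'   # flushed into the file on save
>>> net.save_model('path/to/model_copy.mlmodel')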
+
+

kraken.lib.xml module

+
+
+kraken.lib.xml.parse_xml(filename)
+

Parses either a PageXML or ALTO file with autodetermination of the file +format.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to an XML file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
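A minimal sketch of the format-agnostic parser (the path is a placeholder):

>>> from kraken.lib import xml

>>> doc = xml.parse_xml('transcription.xml')   # ALTO or PageXML, autodetected
>>> doc['image']
>>> [line['baseline'] for line in doc['lines']][:2]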
+
+kraken.lib.xml.parse_page(filename)
+

Parses a PageXML file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to a PageXML file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+kraken.lib.xml.parse_alto(filename)
+

Parses an ALTO file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to an ALTO file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.

+
+
+
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if a subsequence is not encodable and the +codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences +while not necessarily having the same labels for them as c2. +Retains matching character -> label mappings from both codecs, removes +mappings not in c2, and adds mappings not in c1. Compound labels in c2 for +code point sequences not in c1 containing labels also in use in c1 are +added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
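A small sketch of building a codec from a plain character set and round-tripping a string (labels are assigned automatically and are 1-indexed, as described above):

>>> from kraken.lib.codec import PytorchCodec

>>> codec = PytorchCodec('abc ')
>>> labels = codec.encode('a cab')                        # torch.IntTensor of labels
>>> codec.decode([(int(l), 0, 0, 1.0) for l in labels])   # [(code point, start, end, confidence), ...]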
+
+

kraken.lib.train module

+
+

Training Schedulers

+
+
+

Training Stoppers

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(callbacks=None, enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, *args, **kwargs)
+
+
Parameters:
+
    +
  • callbacks (Optional[Union[List[pytorch_lightning.callbacks.Callback], pytorch_lightning.callbacks.Callback]])

  • +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
+
+
+
+
+fit(*args, **kwargs)
+
+ +
+ +
+
+
+

kraken.lib.dataset module

+
+

Datasets

+
+
+class kraken.lib.dataset.BaselineSet(imgs=None, suffix='.path', line_width=4, im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • imgs (Sequence[str])

  • +
  • suffix (str)

  • +
  • line_width (int)

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • mode (str)

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(image, baselines=None, regions=None, *args, **kwargs)
+

Adds a page to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • baseline (dict) – A list containing dicts with a list of coordinates +and tags [{‘baseline’: [[x0, y0], …, +[xn, yn]], ‘tags’: (‘script_type’,)}, …]

  • +
  • regions (dict) – A dict containing list of lists of coordinates +{‘region_type_0’: [[x0, y0], …, [xn, yn]]], +‘region_type_1’: …}.

  • +
  • image (Union[str, PIL.Image.Image])

  • +
  • baselines (List[List[List[Tuple[int, int]]]])

  • +
+
+
+
+ +
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mode
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, text, baseline, boundary, *args, **kwargs)
+

Parses a sample for the dataset and returns it.

+

This function is mainly used for parallelized loading of training data.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
  • image (Union[str, PIL.Image.Image])

  • +
+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+class kraken.lib.dataset.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • split (Callable[[str], str])

  • +
  • suffix (str)

  • +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line-image-text pair to the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, *args, **kwargs)
+

Parses a sample for this dataset.

+

This is mostly used to parallelize populating the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

Dict

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+split
+
+ +
+
+suffix
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
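A minimal sketch of assembling and encoding such a dataset (file names are hypothetical and the default image transforms are omitted):

from kraken.lib.dataset import GroundTruthDataset

ds = GroundTruthDataset()        # expects a .gt.txt transcription next to each image
for img in ['line_0001.png', 'line_0002.png']:
    ds.add(img)
ds.encode()      # builds a codec from ds.alphabet and encodes all text lines
sample = ds[0]   # an encoded training sample, ready for a DataLoader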
+
+

Helpers

+
+
+kraken.lib.dataset.compute_error(model, batch)
+

Computes error report from a model and a list of line image-text pairs.

+
+
Parameters:
+
+
+
Returns:
+

A tuple with total number of characters and edit distance across the +whole validation set.

+
+
Return type:
+

Tuple[int, int]

+
+
+
+ +
+
+kraken.lib.dataset.preparse_xml_data(filenames, format_type='xml', repolygonize=False)
+

Loads training data from a set of xml files.

+

Extracts line information from Page/ALTO xml files for training of +recognition models.

+
+
Parameters:
+
    +
  • filenames (Sequence[Union[str, pathlib.Path]]) – List of XML files.

  • +
  • format_type (str) – Either page, alto or xml for automatic format detection.

  • +
  • repolygonize (bool) – (Re-)calculates polygon information using the kraken +algorithm.

  • +
+
+
Returns:
+

A list of dicts {'text': text, 'baseline': [[x0, y0], …], 'boundary': [[x0, y0], …], 'image': PIL.Image}.

+
+
Return type:
+

List[Dict]

+
+
+
+ +
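A short, hedged usage sketch with hypothetical file names:

from kraken.lib.dataset import preparse_xml_data

# format_type='xml' lets the parser determine whether a file is PAGE or ALTO.
samples = preparse_xml_data(['page_0001.xml', 'page_0002.xml'], format_type='xml')
for line in samples:
    print(line['text'], line['baseline'][0])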
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (str)

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
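For illustration, a sketch with three made-up horizontal lines given as (row slice, column slice) pairs:

from kraken.lib.segmentation import reading_order

lines = [(slice(10, 40), slice(50, 500)),
         (slice(45, 75), slice(50, 500)),
         (slice(80, 110), slice(50, 500))]
order = reading_order(lines, text_direction='lr')
# order[i, j] is True if line i comes before line j in reading order.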
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its +polygonization.

  • +
  • regions (Sequence) – List of region polygons.

  • +
  • text_direction (str) – Set principal text direction for column ordering. +Can be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

A reordered input.

+
+
Return type:
+

Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]

+
+
+
+ +
+
+kraken.lib.segmentation.denoising_hysteresis_thresh(im, low, high, sigma)
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5)
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (sequence) – List of lists containing a single baseline per +entry.

  • +
  • suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.

  • +
  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. +Overrides data in im. The default map is +gaussian_filter(sobel(im), 2).

  • +
  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of +the input. Values of 0 are used for aspect-preserving +scaling. None skips input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are +assumed to be on the bottom of the text line and will +be offset upwards, if set to True, baselines are on the +top and will be offset downwards. If set to None, no +offset will be applied.

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

+
+
+
+ +
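A hedged sketch with a hypothetical page image and made-up baselines:

from PIL import Image
from kraken.lib.segmentation import calculate_polygonal_environment

im = Image.open('page.png').convert('L')   # grayscale page image
baselines = [[(10, 120), (480, 115)],      # one list of (x, y) points per baseline
             [(10, 180), (480, 178)]]
polygons = calculate_polygonal_environment(im, baselines)
# one bounding polygon (or None) per input baseline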
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • regions (Sequence) – List of region polygons to scale.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (list) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (list) – A bounding polygon around the baseline (same format as +baseline).

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
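A small sketch with a made-up baseline and bounding polygon:

from kraken.lib.segmentation import compute_polygon_section

baseline = [(10, 100), (510, 100)]                        # horizontal baseline
boundary = [(10, 80), (510, 80), (510, 120), (10, 120)]   # polygon around it
section = compute_polygon_section(baseline, boundary, 50, 150)
# rectangle covering the stretch between 50 and 150 px along the baseline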
+
+kraken.lib.segmentation.extract_polygons(im, bounds)
+

Yields the subimages of image im defined in the list of bounding polygons +with baselines preserving order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (Dict[str, Any]) – A list of dicts, either in baseline format:

    {'type': 'baselines',
     'lines': [{'baseline': [[x_0, y_0], …, [x_n, y_n]],
                'boundary': [[x_0, y_0], …, [x_n, y_n]]},
               …]}

    or in bounding box format:

    {'boxes': [[x_0, y_0, x_1, y_1], …],
     'text_direction': 'horizontal-lr'}

    +

  • +
+
+
Yields:
+

The extracted subimage

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
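A hedged sketch of cropping line images from a page, with a hypothetical image and made-up coordinates:

from PIL import Image
from kraken.lib.segmentation import extract_polygons

im = Image.open('page.png')
bounds = {'type': 'baselines',
          'lines': [{'baseline': [[10, 100], [500, 100]],
                     'boundary': [[10, 80], [500, 80], [500, 120], [10, 120]]}]}
for line in extract_polygons(im, bounds):
    # each item contains the cropped line image (depending on the version it may
    # be yielded together with its line record)
    pass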
+
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates back the network output to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, prob), where prob is the probability of the decoded label in that region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates back the network output to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
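For illustration, a sketch decoding a fake posterior matrix (class 0 is the CTC blank); the exact positions and values in the result may differ:

import numpy as np
from kraken.lib.ctc_decoder import greedy_decoder

# (C, W) matrix with 3 classes and 5 time steps.
outputs = np.array([[0.90, 0.10, 0.10, 0.80, 0.90],
                    [0.05, 0.80, 0.10, 0.10, 0.05],
                    [0.05, 0.10, 0.80, 0.10, 0.05]])
print(greedy_decoder(outputs))
# e.g. [(1, 1, 1, 0.8), (2, 2, 2, 0.8)]: (class, start, end, max value in the region)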
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates back the network output to a label sequence in the same way as the original ocropy/clstm.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+ +
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren’t further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
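A minimal usage sketch with a hypothetical input scan:

from PIL import Image
from kraken.binarization import nlbin

im = Image.open('page.jpg')
bw = nlbin(im)       # bitonal PIL.Image.Image using the default parameters
bw.save('page.bw.png')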
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None, records=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im (PIL.Image) – Input image

  • +
  • segmentation (dict) – Output of the segment method.

  • +
  • records (list) – A list of ocr_record objects.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[dict] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode='wb') to write to.

+
+
+
+ +
+ +
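A hedged sketch of generating a transcription environment for a single page, using the legacy bounding-box segmenter and a hypothetical input image:

from PIL import Image
from kraken import binarization, pageseg
from kraken.transcribe import TranscriptionInterface

im = Image.open('page.png')
bw = binarization.nlbin(im)
seg = pageseg.segment(bw)              # legacy bounding-box segmentation
ti = TranscriptionInterface()
ti.add_page(im, segmentation=seg)
with open('transcription.html', 'wb') as fp:
    ti.write(fp)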
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/genindex.html b/4.1/genindex.html new file mode 100644 index 000000000..aacce22f9 --- /dev/null +++ b/4.1/genindex.html @@ -0,0 +1,671 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/gpu.html b/4.1/gpu.html new file mode 100644 index 000000000..1d48be7f5 --- /dev/null +++ b/4.1/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
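As a rough illustration of device selection through the Python API (the model path is hypothetical; the training tools expose the same choice through their -d/--device option):

from kraken.lib import models

net = models.load_any('en_best.mlmodel', device='cuda:0')   # runs inference on the first GPU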
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/index.html b/4.1/index.html new file mode 100644 index 000000000..991e52f36 --- /dev/null +++ b/4.1/index.html @@ -0,0 +1,1037 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through the on-board pip utility and the anaconda scientific computing environment is supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support, it is necessary to install the pdf extras package from PyPI:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone git://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone git://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user’s kraken directory:

+
$ kraken get 10.5281/zenodo.2577813
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.2577813
+name: 10.5281/zenodo.2577813
+
+A generalized model for English printed text
+
+This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p
+scripts: Latn
+alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE
+accuracy: 99.95%
+license: Apache-2.0
+author(s): Kiessling, Benjamin
+date: 2019-02-26
+
+
+
+
+
+

Quickstart

+

An OCR system consists of multiple steps, primarily preprocessing, segmentation, and recognition, each of which takes the output of the previous step and sometimes additional files such as models and templates that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or run separately:

[Flowchart: an input Image is processed by Segmentation (using a Segmentation Model) into Baselines, Regions, and Order, then by Recognition (using a Recognition Model) into OCR Records, which Serialization combines with an Output Template into the Output File.]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the default model:

+
$ kraken -i bw.tif image.txt segment -bl ocr
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded from +the European Union’s Horizon 2020 Framework Programme for Research and +Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la +Recherche au titre du Programme d’Investissements d’Avenir portant la référence +ANR-21-ESRE-0005.

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/ketos.html b/4.1/ketos.html new file mode 100644 index 000000000..dd82027e5 --- /dev/null +++ b/4.1/ketos.html @@ -0,0 +1,798 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text.

+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low level format, the XML-based formats that are commonly used for archival of annotation and transcription data, and in the case of recognizer training a precompiled binary format. It is recommended to use the XML formats for segmentation training and the binary format for recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523" 
+						   WIDTH="5234" 
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..." 
+							  WIDTH="..." 
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K" 
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a binary dataset format offering a couple of advantages is supported. Binary datasets drastically improve loading performance, allowing the saturation of most GPUs with minimal computational overhead, while also allowing training with datasets that are larger than the system's main memory. A minor drawback is a ~30% increase in dataset size in comparison to the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of individual files containing many lines this process can take a long time. It can easily be parallelized by specifying the number of separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The above command assigns 80% of the source lines to the training set, 10% to the validation set, and 10% to the test set. The training and validation sets in the dataset file are used automatically by ketos train (unless told otherwise) while the test set is only used by ketos test.

+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its most important command line options:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-o, –output

Output model file prefix. Defaults to model.

-s, –spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, –append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, –load

Load existing file to continue training

-F, –savefreq

Model save frequency in epochs during +training

-q, –quit

Stop condition for training. Set to early +for early stopping (default) or dumb for fixed +number of epochs.

-N, –epochs

Number of epochs to train for.

–min-epochs

Minimum number of epochs to train for when using early stopping.

–lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

-d, –device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

–optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, –lrate

Learning rate [default: 0.001]

-m, –momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, –weight-decay

Weight decay.

–schedule

Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or +reduceonplateau. For 1cycle the cycle length is determined by the –epoch option.

-p, –partition

Ground truth data partition ratio between train/validation set

-u, –normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, –codec

Load a codec JSON definition (invalid if loading existing model)

–resize

Codec/output layer resizing option. If set +to add code points will be added, both +will set the layer to match exactly the +training data, fail will abort if training +data and model codec do not match. Only valid when refining an existing model.

-n, –reorder / –no-reorder

Reordering of code points to display order.

-t, –training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, –evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

-f, –format-type

Sets the training and evaluation data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

–augment / –no-augment

Enables/disables data augmentation.

–workers

Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.

+

In some cases, such as color inputs, changing the network architecture might be +useful:

+
$ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal results such as stopping training too soon. Adjusting the minimum delta and/or lag can be useful:

+
$ ketos train --lag 10 --min-delta 0.001 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, add and both. +add resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. both +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize add -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize both -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In both mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The new model will behave exactly like a new one, except potentially training a +lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, different types of whitespace exist, and mixed bidirectional text +can be written differently depending on the base line direction.

+

Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for Unicode normalization and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+

Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further the behavior of the BiDi algorithm can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a codec) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
+

It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the label decoded from the raw network output and Unicode +code points (see this diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation.

+

The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual.

+

There are multiple approaches one could follow when constructing a custom codec: randomized block codes, i.e. producing random fixed-length labels for each code point, Huffman coding, i.e. variable length label sequences depending on the frequency of each code point in some text (not necessarily the training set), or structural decomposition, i.e. describing each code point through a sequence of labels that describe the shape of the grapheme similar to how some input systems for Chinese characters function.

+

While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text +recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+val check  [------------------------------------]  0/0
+
+
+

This takes all text lines and regions encoded in the XML files and trains a +model to recognize them.

+

Most other options available in transcription training are also available in +segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition there are a couple of specific options that allow filtering of baseline and region types. Datasets are often annotated to a level that is too detailed or contain undesirable types, e.g. when combining segmentation data from different sources. The most basic option is the suppression of all of either baseline or region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options are combinable to massage the dataset into any typology you want.

+

Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Recognition Testing

+

Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the test command. It uses transcribed +lines, the test set, in the same format as the train command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them.

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

-f, –format-type

Sets the test set data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

-m, –model

Model(s) to evaluate.

-e, –evaluation-files

File(s) with paths to evaluation data.

-d, –device

Select device to use.

–pad

Left and right padding around lines.

+

Transcriptions are handed to the command in the same way as for the train +command, either through a manifest with -e/–evaluation-files or by just +adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain character accuracy measured per script and a detailed list of confusions. When evaluating multiple models the last line of the output will list the average accuracy and the standard deviation across all of them.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/models.html b/4.1/models.html new file mode 100644 index 000000000..e7cba0e0e --- /dev/null +++ b/4.1/models.html @@ -0,0 +1,126 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: pronn +files serializing old pickled pyrnn models as protobuf, clstm’s native +serialization, and versatile Core ML models.

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken.

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/objects.inv b/4.1/objects.inv new file mode 100644 index 000000000..f3f2e3cdb Binary files /dev/null and b/4.1/objects.inv differ diff --git a/4.1/search.html b/4.1/search.html new file mode 100644 index 000000000..9dbf91c3a --- /dev/null +++ b/4.1/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

Search

+ + + +
+ + +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/searchindex.js b/4.1/searchindex.js new file mode 100644 index 000000000..1b880b344 --- /dev/null +++ b/4.1/searchindex.js @@ -0,0 +1 @@
[[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_clstm_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_clstm_model", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "load_pronn_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_pronn_model", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id14", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "miss (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.miss", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id2", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.rpred)": [[2, "kraken.rpred.ocr_record", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id3", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": 
[[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "parse() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.parse", false]], "parse() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.parse", false]], "parse_alto() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_alto", false]], "parse_page() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_page", false]], "parse_xml() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_xml", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.prediction", false]], "preparse_xml_data() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.preparse_xml_data", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], "serialize_segmentation() (in module kraken.serialization)": [[2, "kraken.serialization.serialize_segmentation", false]], 
"set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "suffix (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.suffix", false]], "tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags", false]], "tags (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "ts (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.ts", false]], "type (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.type", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id4", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id15", false], [2, 
"kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 2, 1, "", "add_labels"], [2, 3, 1, "", "c_sorted"], [2, 2, 1, "", "decode"], [2, 2, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 3, 1, "", "l2c"], [2, 4, 1, "", "max_label"], [2, 2, 1, "", "merge"], [2, 3, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "PolygonGTDataset"], [2, 0, 1, "", "compute_error"], [2, 0, 1, "", "preparse_xml_data"]], "kraken.lib.dataset.BaselineSet": [[2, 2, 1, "", "add"], [2, 3, 1, "", "aug"], [2, 3, 1, "", "class_mapping"], [2, 3, 1, "", "class_stats"], [2, 3, 1, "", "im_mode"], [2, 3, 1, "", "imgs"], [2, 3, 1, "", "line_width"], [2, 3, 1, "", "mbl_dict"], [2, 3, 1, "", "mode"], [2, 3, 1, "", "mreg_dict"], [2, 3, 1, "", "num_classes"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "targets"], [2, 2, 1, "", "transform"], [2, 3, 1, "", "transforms"], [2, 3, 1, "", "valid_baselines"], [2, 3, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "split"], [2, 3, 1, "", "suffix"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], [2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 3, 1, "id13", "height"], [2, 3, 1, "id14", "message"], [2, 3, 1, "id15", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 3, 1, "", "codec"], [2, 3, 1, "", "decoder"], [2, 3, 1, "", "device"], [2, 2, 1, "", "forward"], [2, 3, 1, "", "kind"], [2, 3, 1, "", "nn"], [2, 3, 1, "", "one_channel_mode"], [2, 2, 1, "", "predict"], [2, 2, 1, "", "predict_labels"], [2, 2, 1, "", "predict_string"], [2, 3, 1, "", "seg_type"], [2, 2, 1, "", "to"], [2, 3, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "denoising_hysteresis_thresh"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", "vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], 
"kraken.lib.train.KrakenTrainer": [[2, 2, 1, "", "fit"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 2, 1, "", "add_codec"], [2, 2, 1, "", "append"], [2, 3, 1, "", "blocks"], [2, 2, 1, "", "build_addition"], [2, 2, 1, "", "build_conv"], [2, 2, 1, "", "build_dropout"], [2, 2, 1, "", "build_groupnorm"], [2, 2, 1, "", "build_identity"], [2, 2, 1, "", "build_maxpool"], [2, 2, 1, "", "build_output"], [2, 2, 1, "", "build_parallel"], [2, 2, 1, "", "build_reshape"], [2, 2, 1, "", "build_rnn"], [2, 2, 1, "", "build_series"], [2, 3, 1, "", "codec"], [2, 3, 1, "id0", "criterion"], [2, 2, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 3, 1, "", "idx"], [2, 2, 1, "", "init_weights"], [2, 3, 1, "id1", "input"], [2, 2, 1, "", "load_clstm_model"], [2, 2, 1, "", "load_model"], [2, 2, 1, "", "load_pronn_model"], [2, 3, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 3, 1, "", "named_spec"], [2, 3, 1, "id2", "nn"], [2, 4, 1, "id3", "one_channel_mode"], [2, 3, 1, "", "ops"], [2, 3, 1, "", "pattern"], [2, 2, 1, "", "resize_output"], [2, 2, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 2, 1, "", "set_num_threads"], [2, 3, 1, "", "spec"], [2, 2, 1, "", "to"], [2, 2, 1, "", "train"], [2, 3, 1, "id4", "user_metadata"]], "kraken.lib.xml": [[2, 0, 1, "", "parse_alto"], [2, 0, 1, "", "parse_page"], [2, 0, 1, "", "parse_xml"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 0, 1, "", "bidi_record"], [2, 1, 1, "", "mm_rpred"], [2, 1, 1, "", "ocr_record"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 3, 1, "", "bidi_reordering"], [2, 3, 1, "", "bounds"], [2, 3, 1, "", "filtered_tags"], [2, 3, 1, "", "im"], [2, 3, 1, "", "im_str"], [2, 3, 1, "", "miss"], [2, 3, 1, "", "nets"], [2, 3, 1, "", "one_channel_modes"], [2, 3, 1, "", "pad"], [2, 3, 1, "", "seg_types"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "tags_ignore"], [2, 3, 1, "", "ts"]], "kraken.rpred.ocr_record": [[2, 3, 1, "", "base_dir"], [2, 3, 1, "", "confidences"], [2, 3, 1, "", "cuts"], [2, 3, 1, "", "prediction"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "type"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"], [2, 0, 1, "", "serialize_segmentation"]], "kraken.transcribe": [[2, 1, 1, "", "TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 2, 1, "", "add_page"], [2, 3, 1, "", "env"], [2, 3, 1, "", "font"], [2, 3, 1, "", "line_idx"], [2, 3, 1, "", "page_idx"], [2, 3, 1, "", "pages"], [2, 3, 1, "", "seg_idx"], [2, 3, 1, "", "text_direction"], [2, 3, 1, "", "tmpl"], [2, 2, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "method", "Python method"], "3": ["py", "attribute", "Python attribute"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:method", "3": "py:attribute", "4": "py:property"}, "terms": {"": [1, 2, 4, 5, 6, 7, 8], "0": [0, 1, 2, 4, 5, 7, 8], "00": [5, 7], "0001": 5, "0005": 4, "001": [5, 7], "0123456789": [4, 7], "01c59": 8, "02": 4, "0245": 7, "04": 7, "06": 7, "07": 5, "09": 7, "0d": 7, "1": [1, 2, 5, 7, 8], "10": [1, 4, 5, 7], "100": [2, 5, 7, 8], "10000": 4, "1015": 1, "1020": 8, "10218": 5, "1024": 8, "103": 1, "105": 1, "106": 5, "108": 5, "11": 7, "1128": 5, "11346": 5, "1161": 1, "117": 1, "1184": 7, "119": 1, "1195": 1, "12": [5, 7, 8], "120": 5, "1200": 5, "121": 1, "122": 5, "124": 1, "125": 5, "126": 1, "128": [5, 8], "13": [5, 7], "131": 1, "132": 7, "1339": 7, "134": 
5, "135": 5, "1359": 7, "136": 5, "1377": 1, "1385": 1, "1388": 1, "1397": 1, "14": [0, 5], "1408": [1, 2], "1410": 1, "1412": 1, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "15924": 0, "16": [2, 5, 8], "161": 7, "1623": 7, "1626": 0, "1676": 0, "1678": 0, "1681": 7, "1697": 7, "17": [2, 5], "1708": 1, "1716": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1824": 1, "19": [1, 5], "192": 5, "197": 0, "198": 5, "199": 5, "1996": 7, "1cycl": 5, "1d": 8, "1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [2, 4, 5, 7, 8], "20": [1, 2, 5, 8], "200": 5, "2000": 1, "2001": 5, "2006": 2, "2014": 2, "2016": 1, "2017": 1, "2019": [4, 5], "2020": 4, "204": 7, "2041": 1, "207": [1, 5], "2072": 1, "2077": 1, "2078": 1, "2096": 7, "21": 4, "210": 5, "215": 5, "216": [0, 1], "2169": 0, "2172": 0, "22": [5, 7], "2208": 0, "221": 0, "2215": 0, "2236": 0, "2241": 0, "228": 1, "2293": 0, "23": 5, "230": 1, "2302": 0, "232": 1, "2334": 7, "2364": 7, "23rd": 2, "24": [1, 7], "241": 5, "2423": 0, "2424": 0, "2426": 1, "244": 0, "246": 5, "2483": 1, "25": [1, 5, 7, 8], "250": 1, "2500": 7, "253": 1, "256": [5, 7, 8], "2577813": 4, "259": 7, "26": [4, 7], "2641": 0, "266": 5, "2681": 0, "27": 5, "270": 7, "27046": 7, "274": 5, "28": [1, 5], "2873": 2, "29": [1, 5], "2d": [2, 8], "3": [2, 5, 7, 8], "30": [5, 7], "300": 5, "300dpi": 7, "307": 7, "309": 0, "31": 5, "318": 0, "32": [5, 8], "320": 0, "328": 5, "3292": 1, "336": 7, "3367": 1, "3398": 1, "3414": 1, "3418": 7, "3437": 1, "345": 1, "3455": 1, "35000": 7, "3504": 7, "3514": 1, "3519": 7, "35619": 7, "365": 7, "3680": 7, "38": 5, "384": 8, "39": 5, "4": [0, 1, 2, 5, 7, 8], "40": 7, "400": [0, 5], "4000": 5, "412": 0, "416": 0, "428": 7, "431": 7, "46": 5, "47": 7, "471": 1, "473": 1, "48": [5, 7, 8], "488": 7, "49": [5, 7], "491": 1, "4d": 2, "5": [1, 2, 5, 7, 8], "50": [5, 7], "500": 5, "503": 0, "509": 1, "512": 8, "515": 1, "52": [5, 7], "522": 1, "5226": 5, "523": 5, "5230": 5, "5234": 5, "524": 1, "5258": 7, "5281": 4, "53": 5, "534": 1, "536": [1, 5], "53980": 5, "54": 1, "54114": 5, "5431": 5, "545": 7, "546": 0, "56": [1, 7], "561": 0, "562": 1, "575": [1, 5], "577": 7, "59": [7, 8], "5951": 7, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "62": 5, "63": 5, "64": [5, 8], "646": 7, "66": [5, 7], "668": 1, "69": 1, "7": [1, 5, 7, 8], "70": 1, "7012": 5, "7015": 7, "71": 5, "7272": 7, "7281": 7, "73": 1, "74": [1, 5], "7593": 5, "76": 1, "773": 5, "7857": 5, "788": [5, 7], "79": 1, "794": 5, "7943": 5, "8": [5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": [1, 7], "8445": 7, "8479": 7, "848": 0, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [1, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "92": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [4, 5], "9541": 7, "9550": 7, "96": [5, 7], "97": 7, "98": 7, "99": [4, 7], "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], 
"As": [0, 1, 2], "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7], "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "abbyxml": 4, "abbyyxml": 0, "abcdefghijklmnopqrstuvwxyz": 4, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 5, 7], "absolut": [2, 5], "abugida": 5, "acceler": [4, 5, 7], "accept": [0, 2, 5], "access": [0, 1], "accord": [2, 5], "accordingli": 2, "account": 7, "accur": 5, "accuraci": [1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [5, 7, 8], "actual": [2, 4, 7], "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "advantag": 5, "advis": 7, "affect": 7, "after": [1, 5, 7, 8], "afterward": 1, "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": 4, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algorithm": [0, 1, 2, 5], "all": [0, 1, 2, 4, 5, 6, 7], "allow": [5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [2, 4, 5, 7, 8], "alphanumer": 0, "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [0, 5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 2, 4, 7], "alto_doc": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 7], "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [5, 7], "anyth": 2, "apach": 4, "apart": [3, 5], "apdjfqpf": 2, "api": 5, "append": [0, 2, 5, 7, 8], "appli": [1, 2, 4, 7, 8], "applic": [1, 7], "approach": [5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approxim": 1, "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": 2, "aren": 2, "arg": 2, "argument": [1, 5], "arm": 4, "around": [1, 2, 5, 7], "arrai": [1, 2], "arrow": 5, "arxiv": 2, "aspect": 2, "assign": [2, 5, 7], "associ": 1, "assum": 2, "attach": [1, 5], "attribut": [0, 1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "auto": [1, 2, 5], "autodetermin": 2, "automat": [0, 1, 2, 5, 7, 8], "auxiliari": 1, "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awni": 2, "axi": [2, 8], "b": [0, 1, 5, 7, 8], "back": 2, "backend": 3, "background": 2, "base": [1, 2, 5, 6, 7, 8], "base_dir": 2, "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 7, 8], "bayr\u016bt": 7, "bbox": 2, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 7], "becom": 0, "been": [0, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": 5, "being": [1, 2, 8], "below": [5, 7], "benjamin": [0, 4], "best": [2, 5, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "bidi": [2, 4, 5], "bidi_record": 2, "bidi_reord": 2, "bidilstm": 2, "bidirect": [2, 5], "bidirection": 8, "binar": [1, 7], "binari": [1, 2], "bit": 1, "biton": [0, 2], "bl": 4, "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [1, 2, 5, 8], "block_i": 5, "block_n": 5, "board": 4, "boilerpl": 1, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": [1, 2, 5], "box": [0, 1, 2, 4, 5], "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 
2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_seri": 2, "buld\u0101n": 7, "bw": 4, "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 2], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": [1, 2], "can": [0, 1, 2, 3, 4, 5, 7, 8], "capabl": 5, "case": [0, 1, 2, 5, 7], "cat": 0, "caus": [1, 2], "caveat": 5, "cd": 4, "ce": [4, 7], "cell": 8, "cent": 7, "centerlin": 5, "central": [4, 7], "certain": [0, 2, 7], "chain": [0, 4, 7], "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charset": 2, "check": [0, 5], "chines": 5, "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumst": 7, "class": [1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "client": 0, "clone": [0, 4], "close": 4, "closer": 1, "clstm": [0, 2, 6], "code": [0, 1, 2, 4, 5, 7], "codec": 1, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [4, 7], "combin": [0, 1, 5, 7, 8], "come": 2, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compos": 2, "composedblocktyp": 5, "compound": 2, "compress": 7, "compris": 7, "comput": [2, 3, 4, 5, 7], "computation": 7, "compute_error": 2, "compute_polygon_sect": 2, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5], "conform": 5, "confus": 5, "connect": [2, 7], "connectionist": 2, "consid": 2, "consist": [0, 1, 4, 7, 8], "constant": 5, "construct": [5, 7], "contain": [0, 1, 2, 4, 5, 6, 7], "content": 5, "continu": [1, 2, 5, 7], "contrast": 7, "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [0, 2, 4], "core": 6, "coreml": 2, "corpu": [4, 5], "correct": [1, 2, 5, 7], "correspond": [0, 1, 2], "cosin": 5, "cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "creat": [2, 4, 5, 7, 8], "criterion": 2, "ctc": [1, 2, 5], "ctc_decod": 1, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [2, 5, 6], "custom": [1, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataset": 1, "dataset_larg": 5, "date": 4, "de": [4, 7], "deal": [0, 5], "debug": [1, 5, 7], "decai": 5, "decid": 0, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "deem": 0, "def": 1, "default": [0, 1, 2, 4, 5, 6, 7, 8], "default_split": 2, "defin": [0, 1, 2, 4, 5, 8], "definit": [5, 8], "degrad": 1, "degre": 7, "del_indic": 2, "delet": [2, 5, 7], "delta": 5, "denoising_hysteresis_thresh": 2, "depend": [0, 1, 4, 5, 7], "depth": [5, 7, 8], "describ": [2, 5], "descript": [0, 5], "descriptor": 2, "deseri": 2, "desir": [1, 8], "desktop": 7, "destin": 2, "destroi": 5, "detail": [0, 5, 7], "detect": 2, "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diacrit": 5, "diaeres": 7, "diaeresi": 7, "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [2, 5], "differ": [0, 1, 4, 5, 7, 8], 
"difficult": 5, "digit": 5, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": [2, 5], "direct": [0, 1, 2, 4, 5, 7, 8], "directli": [0, 5], "directori": [1, 4, 5, 7], "disabl": [0, 2, 5, 7], "disk": 7, "displai": [2, 5], "dist1": 2, "dist2": 2, "distanc": 2, "distribut": 8, "dnn": 2, "do": [1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "document": [0, 1, 2, 4, 5, 7], "doe": [1, 2, 5, 7], "doesn": [2, 5, 7], "domain": [1, 5], "done": [5, 7], "dot": 7, "down": 7, "download": [4, 7], "downward": 2, "drastic": 5, "draw": 1, "drawback": 5, "driver": 1, "drop": [1, 8], "dropout": [2, 5, 7], "du": 4, "dumb": 5, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": [2, 7], "editor": 7, "edu": 7, "either": [0, 2, 5, 7, 8], "element": 5, "email": 0, "emit": 2, "emploi": 7, "empti": 2, "en": 0, "enabl": [0, 1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2], "end_separ": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escripta": 4, "escriptorium": [4, 7], "esr": 4, "estim": [0, 2, 7], "et": 2, "european": 4, "eval": 2, "evalu": [0, 5], "evaluation_data": 1, "evaluation_fil": 1, "even": 7, "everyth": 5, "exact": [5, 7], "exactli": [1, 5], "exampl": [1, 5, 7], "except": [1, 5], "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 7], "experiment": 7, "explicit": [1, 5], "explicitli": [0, 5, 7], "exponenti": 5, "express": 0, "extend": 8, "extens": 5, "extent": 7, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "f": [0, 4, 5, 7, 8], "f_t": 2, "factor": 2, "fail": 5, "fairli": 7, "fallback": 0, "fals": [1, 2, 5, 7, 8], "faq\u012bh": 7, "faster": [5, 7, 8], "fd": 2, "featur": [0, 1, 2, 7, 8], "fed": [0, 1, 2, 5, 8], "feed": [0, 1], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [0, 2, 5], "figur": 1, "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "fill": 2, "filter": [1, 2, 5, 8], "filtered_tag": 2, "final": [0, 2, 4, 5, 7, 8], "find": [5, 7], "fine": [1, 7], "finish": 7, "first": [0, 1, 2, 5, 7, 8], "fit": [1, 2, 7], "fix": [5, 7], "flag": [1, 2, 4], "float": [0, 2], "flush": 2, "fname": 2, "follow": [0, 2, 5, 8], "font": 2, "font_styl": 2, "foo": [1, 5], "forg": 4, "form": [2, 5], "format": [0, 1, 2, 6, 7], "format_typ": [1, 2], "formul": 8, "forward": [2, 8], "found": [1, 5, 7], "fp": 1, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4], "function": [1, 5], "fundament": 1, "further": [1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gain": 1, "garantue": 2, "gaussian_filt": 2, "gener": [0, 1, 2, 4, 5, 7], "gentl": 5, "get": [0, 1, 4, 5, 7], "git": [0, 4], "github": 4, "githubusercont": 7, "gitter": 4, "given": [1, 2, 5, 8], "glob": [0, 1], "glyph": [5, 7], "gn": 8, "gn32": 5, "go": 7, "good": 5, "gov": 5, "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphic": 5, "grave": 2, "grayscal": [0, 1, 2, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 7], "grei": 0, "grek": 0, "ground": [5, 
7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "gz": 0, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 4, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [1, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "heatmap": 1, "hebrew": [5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "help": [4, 7], "here": 5, "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "histor": 4, "hline": 0, "hoc": 5, "hocr": [0, 2, 4, 7], "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 5, 7], "hpo": 5, "html": 2, "http": [0, 5, 7], "huffmann": 5, "human": 5, "hundr": 7, "hyper_param": 2, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": 5, "ident": 1, "identifi": 0, "idx": 2, "ignor": [0, 2, 5], "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_str": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_nam": [1, 2], "image_s": [1, 2], "imagefilenam": 5, "imaginari": 7, "img": 2, "immedi": 5, "impath": 2, "implement": [1, 8], "implicitli": 5, "import": [1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "incompat": 2, "incorrect": 7, "increas": [5, 7], "independ": 8, "index": [0, 2, 5], "indic": [0, 2, 5, 7], "individu": 5, "infer": [2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [1, 2, 5, 7, 8], "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "insert": [2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": 7, "instal": 3, "instanc": [1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interchang": 2, "interfac": [2, 4], "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "irregular": 5, "is_valid": 2, "isn": [1, 2, 7, 8], "iso": 0, "iter": [1, 2, 7], "its": [2, 5, 7], "itself": 1, "j": 2, "jinja2": [1, 2], "jpeg": 7, "jpeg2000": [0, 4], "jpg": 5, "json": [0, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "kamil": 5, "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [5, 7], "keyword": 0, "kiessl": [0, 4], "kind": [2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "la": 4, "label": [1, 2, 5], "lack": 7, "lag": 5, "languag": [5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [2, 5, 8], "lastli": 5, "later": 7, "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": [5, 8], "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [5, 7], "leav": [5, 8], "left": [0, 2, 4, 5, 7], "legaci": [5, 7, 8], "leipzig": 7, "len": 2, "length": [2, 5], "less": 7, "let": 7, "level": [1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": [5, 8], "lib": 1, "libr": 4, "librari": 1, "licens": 0, 
"lightn": 1, "lightningmodul": 1, "lightweight": 4, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": 5, "line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_idx": 2, "line_k": 5, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 2, 4, 5, 7], "ll": 4, "load": [1, 2, 4, 5, 7], "load_ani": [1, 2], "load_clstm_model": 2, "load_model": [1, 2], "load_pronn_model": 2, "loadabl": 2, "loader": 1, "loc": 5, "locat": [1, 2, 5, 7], "log": [5, 7], "logic": 5, "logograph": 5, "long": 5, "longest": 2, "look": [1, 5, 7], "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": [0, 2], "m": [0, 2, 5, 7, 8], "mac": [4, 7], "machin": 2, "maddah": 7, "made": 7, "mai": [0, 2, 5, 7], "main": [4, 5], "mainli": [1, 2], "major": 1, "make": 5, "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [1, 2, 7], "manuscript": 7, "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 2], "massag": 5, "master": 7, "match": [2, 5], "materi": [1, 4, 7], "matrix": 1, "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mbl_dict": 2, "me": 0, "mean": [1, 2, 7], "measur": 5, "measurementunit": 5, "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [1, 2], "might": [5, 7], "min": [2, 5], "min_epoch": 2, "min_length": 2, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [0, 4, 7], "mix": [2, 5], "ml": 6, "mlmodel": [5, 7], "mm_rpred": [1, 2], "mode": [1, 2, 5], "model": [1, 5, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [4, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "more": [0, 1, 2, 4, 5, 7, 8], "most": [1, 2, 5, 7], "mostli": [0, 1, 2, 4, 5, 7, 8], "move": [2, 7, 8], "mp": 8, "mp2": [5, 8], "mp3": [5, 8], "mreg_dict": 2, "much": [1, 2, 4], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 4, 5, 7], "myprintingcallback": 1, "n": [2, 5, 8], "name": [0, 2, 4, 7, 8], "named_spec": 2, "national": 4, "nativ": 6, "natur": 7, "naugment": 4, "nchw": 2, "ndarrai": 2, "necessari": [2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 7], "net": [1, 2, 7], "netork": 1, "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "never": 7, "nevertheless": [1, 5], "new": [2, 3, 5, 7, 8], "next": [1, 7], "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": 5, "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [2, 5, 7, 8], "nonlinear": 8, "nop": 1, "normal": [0, 2], "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": 2, "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 7], "numpi": [1, 2], "nvidia": 3, "o": [0, 1, 4, 5, 7], "o1c103": 8, "object": [1, 2], "obtain": 7, "obvious": 7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_lin": 0, "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "ocrx_word": 0, "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [1, 5, 7], "old": 6, "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "onc": 5, "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": [0, 5], "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 2, "open": [0, 1], "openmp": [2, 5, 7], "oper": [0, 1, 2, 8], "optic": [0, 7], "optim": [4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 
1, 2, 4, 5, 8], "org": 5, "orient": 1, "origin": [1, 2, 5], "orthogon": 2, "other": [0, 5, 7, 8], "otherwis": [2, 5], "out": [5, 7, 8], "output": [0, 1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "over": 2, "overfit": 7, "overhead": 5, "overlap": 5, "overrid": [2, 5], "overwritten": 2, "p": [0, 4, 5], "packag": [4, 7], "pad": [2, 5], "padding_left": 2, "padding_right": 2, "page": [1, 2, 4, 7], "page_doc": 1, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagexml": [0, 1, 2, 4, 7], "pair": [0, 2], "par": [1, 4], "paragraph": [0, 5], "parallel": [2, 5], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "pars": [2, 5], "parse_alto": [1, 2], "parse_pag": [1, 2], "parse_xml": 2, "parser": [1, 2, 5], "part": [1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlib": 2, "pattern": [2, 7], "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [1, 2, 5, 7], "perc": [0, 2], "percentag": 2, "percentil": 2, "perform": [1, 2, 4, 5, 7], "period": 7, "pick": 5, "pickl": [2, 6], "pil": [1, 2], "pillow": 1, "pinpoint": 7, "pipelin": 1, "pixel": [1, 5, 8], "pl_modul": 1, "place": [0, 4, 7], "placement": 7, "plain": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [1, 2, 5, 7], "polygon": [1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "popul": 2, "porson": 0, "portant": 4, "portion": 0, "posit": 2, "possibl": [0, 1, 2, 5, 7], "postprocess": [1, 5], "potenti": 5, "power": 7, "practic": 5, "pratiqu": 4, "pre": 5, "precis": 5, "precompil": 5, "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preload": 7, "prematur": 5, "prepar": 7, "preparse_xml_data": 2, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "preserv": 2, "prevent": [2, 7], "previou": 4, "previous": 5, "primaresearch": 5, "primari": [1, 5], "primarili": 4, "princip": [0, 1, 2, 5], "print": [0, 1, 4, 5, 7], "printspac": 5, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "produc": [0, 1, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": [2, 6], "proper": [1, 2], "properli": 7, "properti": 2, "proport": 5, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": 4, "pull": [0, 4], "purpos": [1, 2, 7, 8], "put": [0, 2, 7], "py": 1, "pypi": 4, "pyrnn": [0, 2, 6], "python": [2, 4], "pytorch": [1, 3, 6], "pytorch_lightn": [1, 2], "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [1, 7], "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "ran": 4, "random": [5, 7], "rang": [0, 2], "rapidli": 7, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [1, 5, 7], "rb": 2, "re": [0, 2], "reach": 7, "read": [0, 2, 4, 5], "reader": 5, "reading_ord": 2, "reading_order_fn": 2, "real": 7, "realiz": 5, "reason": 2, "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [2, 3, 8], "recognitino": 2, "recognitionmodel": 1, "recommend": [1, 5, 7], "record": [1, 2, 4], "rectangl": 2, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "refer": [1, 5, 7], "referenc": 2, "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_typ": 5, "region_type_0": 2, "region_type_1": 2, "regular": 5, "rel": 5, 
"relat": [0, 1, 5, 7], "relax": 7, "reliabl": 7, "relu": 8, "remain": [5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "repolygon": [1, 2], "report": [2, 5, 7], "repositori": [4, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolv": 5, "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 5, 7, 8], "resum": 5, "retain": [0, 2, 5], "retrain": 7, "retriev": [0, 4, 5, 7], "return": [1, 2, 8], "reus": 2, "revers": 8, "rgb": [1, 8], "right": [0, 2, 4, 5, 7], "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "romanov": 7, "rough": 7, "routin": 1, "rpred": 1, "rtl": [0, 2], "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "same": [0, 1, 2, 4, 5, 7], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": 5, "schemaloc": 5, "scientif": 4, "script": [1, 2, 4, 5, 7], "script_detect": [0, 1], "script_typ": 2, "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": 2, "second": [0, 2], "section": [1, 7], "see": [1, 5, 7], "seen": [1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, "segresult": 2, "segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sens": 0, "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "seqrecogn": 2, "sequenc": [0, 1, 2, 5, 7, 8], "sequenti": 2, "seri": 0, "serial": [0, 4, 6], "serialize_segment": [1, 2], "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 2, 7], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "sigma": 2, "sigmoid": 8, "similar": [1, 5, 7], "simpl": [1, 5, 7, 8], "singl": [1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "slice": 2, "slightli": [0, 5, 7, 8], "slow": 5, "slower": 5, "small": [0, 1, 2, 5, 7, 8], "so": [1, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": 7, "some": [0, 1, 2, 4, 5, 7], "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [2, 5, 7, 8], "sourceimageinform": 5, "sp": 5, "space": [1, 2, 4, 5, 7], "span": 0, "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [2, 5, 7], "specifi": [0, 5], "speckl": 7, "speech": 2, "speedup": 5, "split": [0, 2, 5, 7, 8], "spot": 4, "squash": [2, 8], "stabl": [1, 4], "stack": [2, 5, 8], "stage": 1, "standard": [1, 4, 5, 7], "start": [1, 2, 7], "start_separ": 2, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1, 2], "stop": [5, 7], "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": [0, 8], "structur": [1, 4, 5], "stub": 5, "sub": 1, "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsequ": [1, 2], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], 
"suffix": [0, 2], "suggest": 1, "suit": 7, "suitabl": [0, 7], "summar": [2, 5, 7, 8], "superflu": 7, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [1, 4, 5, 6], "suppos": 1, "suppress": 5, "surfac": 2, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [4, 5, 7], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [2, 5], "tags_ignor": 2, "take": [1, 4, 5, 7], "tanh": 8, "target": 2, "task": 7, "tb": [0, 2], "technic": 4, "tell": 5, "templat": [1, 2, 4], "tempor": 2, "tensor": [1, 2, 8], "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [0, 1, 2, 4, 7], "text_direct": [0, 1, 2], "text_transform": 2, "textblock": 5, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": 5, "textregion": 5, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 2, 5], "therefor": [0, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "third": 1, "those": [0, 5], "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": 6, "threshold": [0, 2], "through": [1, 2, 4, 5, 7], "thrown": 0, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "time": [1, 2, 5, 7, 8], "tip": 1, "titr": 4, "tmpl": 2, "toi": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], "toplin": [2, 5], "topograph": 0, "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": [2, 7], "train": [0, 3, 8], "trainabl": [1, 2, 4, 5], "trainer": 1, "training_data": [1, 5], "training_fil": 1, "transcrib": [5, 7], "transcript": [1, 2, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4], "transformt": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "true": [0, 1, 2, 8], "truth": [5, 7], "try": 2, "tupl": 2, "turn": 4, "tutori": [1, 5], "two": [0, 1, 2, 5, 8], "txt": [0, 2, 4, 5], "type": [0, 1, 2, 5, 7, 8], "typefac": [5, 7], "typograph": [0, 7], "typologi": 5, "u": [1, 5], "u1f05": 5, "un": 4, "unchti": 0, "unclean": 7, "unclear": 5, "undecod": 1, "under": [2, 4], "undesir": [5, 8], "unduli": 0, "unencod": 2, "uni": [0, 7], "unicod": [0, 1, 2, 7], "uniformli": 2, "union": [2, 4], "uniqu": 7, "universit\u00e9": 4, "unless": 5, "unnecessarili": 1, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "updat": 0, "upon": 0, "upward": [2, 5, 7], "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "user": [2, 4, 5, 7], "user_metadata": 2, "usual": [1, 5, 7], "utf": 5, "util": [1, 4, 5, 7], "uw3": 0, "v": [5, 7], "v4": 5, "val": 5, "valid": [0, 2, 5], "valid_baselin": 2, "valid_region": 2, "validation_set": 2, "valu": [0, 1, 2, 5, 8], "variabl": [2, 4, 5, 8], "variant": 5, "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [1, 2], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": 5, "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": 5, "vocabulari": 2, "vocal": 7, "vpo": 5, "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": 5, "wa": [0, 2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warn": [1, 7], "warp": 7, "we": [2, 5, 7], "weak": [1, 7], "websit": 7, "weight": [2, 5], "welcom": [0, 4], "well": [5, 7], "were": [2, 5], "western": 7, "wget": 7, "what": [1, 7], "when": [1, 2, 5, 7, 8], "where": [2, 7], "whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "width": [1, 2, 5, 7, 8], 
"wildli": 7, "without": [2, 5, 7], "word": [4, 5], "word_text": 5, "work": [1, 2, 5, 7], "worker": 5, "world": 7, "would": 5, "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], "www": 5, "x": [2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x2": 2, "x64": 4, "x_0": 2, "x_1": 2, "x_bbox": 0, "x_conf": 0, "x_m": 2, "x_n": 2, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xml": [0, 7], "xmln": 5, "xmlschema": 5, "xn": 2, "xsd": 5, "xsi": 5, "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_0": 2, "y_1": 2, "y_m": 2, "y_n": 2, "y_stride": 8, "yield": 2, "yk": 2, "ym": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "your": 0, "y\u016bsuf": 7, "zenodo": 4, "zero": [2, 7, 8], "zoom": [0, 2], "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"acceler": 3, "acquisit": 7, "advanc": 0, "alto": 5, "annot": 7, "api": [1, 2], "baselin": 1, "basic": [1, 8], "binar": [0, 2], "binari": 5, "blla": 2, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "detect": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": 5, "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "max": 8, "model": [0, 2, 4, 6], "modul": 2, "network": 8, "normal": [5, 8], "page": [0, 5], "pageseg": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "preprocess": [1, 7], "quickstart": [1, 4], "recognit": [0, 1, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "rpred": 2, "schedul": 2, "scratch": 5, "script": 0, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": [0, 8], "stopper": 2, "test": 5, "text": 5, "train": [1, 2, 4, 5, 7], "trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/4.1/training.html b/4.1/training.html new file mode 100644 index 000000000..19299d21f --- /dev/null +++ b/4.1/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other systems requiring +segmentation down to the glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either, we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the locations of baselines, i.e. +the imaginary lines the text is written on, and the polygons of regions. For +recognition they are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works on both Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be +activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First, a number of high quality scans, preferably color or grayscale and at +least 300dpi, are required. Scans should be in a lossless image format such as +TIFF or PNG; images in PDF files have to be extracted beforehand using a tool +such as pdftocairo or pdfimages. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only lightly compressed JPEG scans are generally suitable for +training and recognition.

+

Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable, +although it isn’t strictly necessary as the segmenter can be trained to handle +noisy material with high accuracy. A fairly user-friendly piece of software for +semi-automatic batch processing of image scans is Scantailor, although most work can be done using a standard image +editor.

+

The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout, and the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require fewer than a few hundred samples, while a general +model can well go into the thousands of pages. Likewise, a specific recognition +model for a printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, while manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands need +more training data for the same accuracy.

+

There is no hard rule for the amount of training data and it may be necessary to +retrain a model after the initial training data proves insufficient. Most +western texts contain between 25 and 40 lines per page, so for roughly 800 lines +upward of 30 pages have to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: escriptorium integrates kraken tightly, including +training and inference, while Aletheia is a powerful desktop +application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data, e.g. a collection of PAGE XML documents obtained through +annotation and transcription, may now be used to train segmentation and/or +transcription models.

+

The training data in output_dir may now be used to train a new model by +invoking the ketos train command. Just hand a list of images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. By default the validation +set comprises 10% of the training data.

+

Basic model training is mostly automatic, although there are multiple parameters +that can be adjusted (a combined invocation is shown after the list):

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as +prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an +integer equal to or larger than 1, with 1 meaning a report is created each +time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with +--load. To continue training from a base model with another +training set refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for +accelerated training. The default setting preloads data sets with fewer +than 2500 lines; explicitly adding --preload will preload arbitrarily +sized sets. --no-preload disables preloading in all circumstances.

+
+
+
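As a minimal sketch combining the options described above (the model prefix and file name here are purely illustrative), one could resume training from an existing model while evaluating and saving a model after every pass over the training set:
$ ketos train --load model_5.mlmodel --output model --report 1 --savefreq 1 output_dir/*.png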

Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable, as training +is a somewhat random process, a rough guide is that accuracy seldom improves +after 50 epochs, which are reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as early stopping that stops training as soon as +the error rate on the validation set doesn’t improve anymore. This will +prevent overfitting, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, … in the directory the script was executed in. Let’s +take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Startup can be made faster by disabling preloading, at the +cost of performing preprocessing repeatedly during the training process.
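If startup time matters more than per-epoch speed, preloading can be disabled explicitly; a minimal sketch using the --no-preload switch described above:
$ ketos train --no-preload output_dir/*.png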

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set, +for a character accuracy of 84.4% (1 - 545/3504 ≈ 0.8445). It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left text or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again.

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+

ketos can also produce more verbose output with training set and network +information by appending one or more -v to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +were found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur fewer than 10 times will most likely not be recognized well by the +trained net.

+
+
+

Evaluation and Validation

+

While output during training is detailed enough to know when to stop training, +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows one to pinpoint weaknesses in the training +data, e.g. above-average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent.

+

The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report is a list of errors sorted by frequency and a +per-character accuracy report. Importantly, most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in the training and validation sets, incorrect transcription +such as non-systematic transcription, or unclean, speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line images into character +sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output +formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.1/vgsl.html b/4.1/vgsl.html new file mode 100644 index 000000000..94147cb69 --- /dev/null +++ b/4.1/vgsl.html @@ -0,0 +1,288 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension.

+

When channels are set to 1, grayscale or B/W inputs are expected; 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension.
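Putting these rules together, the input blocks used in the examples on this page can be read as follows (layer bodies elided):
[1,48,0,1 ...]   grayscale or B/W lines scaled to a height of 48 pixels, variable width
[1,0,0,3 ...]    color input of variable height and width
[1,1,0,48 ...]   lines fed as 1 pixel wide grayscale strips scaled to 48 values in the channel dimension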

+

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do 01c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce the drop probability on the depth +dimension, as the default is too high for convolutional layers. The remainder of +the height dimension (12) is reshaped into the depth dimension before +applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrarily sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection.

+
+
+

Convolutional Layers

+
C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters.
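Following the syntax above, two illustrative (hypothetical) convolutional layer definitions:
Cr3,3,64        3 x 3 convolution with 64 output channels and relu nonlinearity
Cr3,3,64,2,2    the same convolution with a stride of 2 in both dimensions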

+
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network using either the x +(width) or y (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 +input will execute 16 independent forward passes on 906x32 tensors, resulting +in an output of shape 1, 16, 906, 25. If this isn’t desired, either run a +summarizing layer in the other direction, e.g. Lfys20 for an input 1, 1, +906, 20, or prepend a reshape layer S1(1x16)1,3 combining the height and +channel dimension for a 1, 1, 906, 512 input to the recurrent layer.

+
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a maximum pooling with (y, x) kernel_size and (y_stride, x_stride) stride.
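For illustration, two hypothetical pooling definitions following this syntax (when the stride is omitted it defaults to the kernel size, as in the example models above):
Mp2,2           2 x 2 max pooling with 2 x 2 stride
Mp2,2,1,2       2 x 2 max pooling with a vertical stride of 1 and a horizontal stride of 2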

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d into a,b and distributes a into +dimension e, respectively b into f. Either e or f has to be equal to +d. So S1(1, 48)1, 3 on a 1, 48, 1020, 8 input will first reshape into +1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute +the 48-sized part into the channel dimension, resulting in a 1, 1, 1020, +48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 +height before a recurrent layer.

+
+

Note

+

This S layer is equivalent to the one implemented in the tensorflow +implementation of VGSL, i.e. behaves differently from tesseract.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to 0.5 drop +probability and 1D dropout. Set dim to 2 after convolutional layers.

+
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, +normalizing each separately.
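No Gn layer appears in the example models above, so as a purely illustrative definition:
Gn8             group normalization separating the input into 8 groups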

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/.buildinfo b/4.2.0/.buildinfo new file mode 100644 index 000000000..fa83bc397 --- /dev/null +++ b/4.2.0/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 4b6babeacebb54753e3c485869a88dc5 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/4.2.0/.doctrees/advanced.doctree b/4.2.0/.doctrees/advanced.doctree new file mode 100644 index 000000000..9d552858e Binary files /dev/null and b/4.2.0/.doctrees/advanced.doctree differ diff --git a/4.2.0/.doctrees/api.doctree b/4.2.0/.doctrees/api.doctree new file mode 100644 index 000000000..0a1a590e4 Binary files /dev/null and b/4.2.0/.doctrees/api.doctree differ diff --git a/4.2.0/.doctrees/api_docs.doctree b/4.2.0/.doctrees/api_docs.doctree new file mode 100644 index 000000000..ecc4eed8e Binary files /dev/null and b/4.2.0/.doctrees/api_docs.doctree differ diff --git a/4.2.0/.doctrees/environment.pickle b/4.2.0/.doctrees/environment.pickle new file mode 100644 index 000000000..a364fef37 Binary files /dev/null and b/4.2.0/.doctrees/environment.pickle differ diff --git a/4.2.0/.doctrees/gpu.doctree b/4.2.0/.doctrees/gpu.doctree new file mode 100644 index 000000000..0604531c4 Binary files /dev/null and b/4.2.0/.doctrees/gpu.doctree differ diff --git a/4.2.0/.doctrees/index.doctree b/4.2.0/.doctrees/index.doctree new file mode 100644 index 000000000..cc8f8e9e6 Binary files /dev/null and b/4.2.0/.doctrees/index.doctree differ diff --git a/4.2.0/.doctrees/ketos.doctree b/4.2.0/.doctrees/ketos.doctree new file mode 100644 index 000000000..f751e9251 Binary files /dev/null and b/4.2.0/.doctrees/ketos.doctree differ diff --git a/4.2.0/.doctrees/models.doctree b/4.2.0/.doctrees/models.doctree new file mode 100644 index 000000000..8fddf74fd Binary files /dev/null and b/4.2.0/.doctrees/models.doctree differ diff --git a/4.2.0/.doctrees/training.doctree b/4.2.0/.doctrees/training.doctree new file mode 100644 index 000000000..c2da395d7 Binary files /dev/null and b/4.2.0/.doctrees/training.doctree differ diff --git a/4.2.0/.doctrees/vgsl.doctree b/4.2.0/.doctrees/vgsl.doctree new file mode 100644 index 000000000..a5091aaea Binary files /dev/null and b/4.2.0/.doctrees/vgsl.doctree differ diff --git a/4.2.0/.nojekyll b/4.2.0/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/4.2.0/_images/blla_heatmap.jpg b/4.2.0/_images/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/4.2.0/_images/blla_heatmap.jpg differ diff --git a/4.2.0/_images/blla_output.jpg b/4.2.0/_images/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/4.2.0/_images/blla_output.jpg differ diff --git a/4.2.0/_images/bw.png b/4.2.0/_images/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/4.2.0/_images/bw.png differ diff --git a/4.2.0/_images/pat.png b/4.2.0/_images/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/4.2.0/_images/pat.png differ diff --git a/4.2.0/_sources/advanced.rst.txt b/4.2.0/_sources/advanced.rst.txt new file mode 100644 index 000000000..c6d6bcbeb --- /dev/null +++ b/4.2.0/_sources/advanced.rst.txt @@ -0,0 +1,439 @@ +.. 
_advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text lines images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML. + +Input Specification +------------------- + +Kraken inputs and their outputs can be defined in multiple ways. The most +simple are input-output pairs, i.e. producing one output document for one input +document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. + +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Binarization +------------ + +.. _binarization: + +.. note:: + + Binarization is deprecated and mostly not necessary anymore. It can often + worsen text recognition results especially for documents with uneven + lighting, faint writing, etc. + +The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +=========== ==== +option type +=========== ==== +--threshold FLOAT +--zoom FLOAT +--escale FLOAT +--border FLOAT +--perc INTEGER RANGE +--range INTEGER +--low INTEGER RANGE +--high INTEGER RANGE +=========== ==== + +To binarize a image: + +.. code-block:: console + + $ kraken -i input.jpg bw.png binarize + +.. note:: + + Some image formats, notably JPEG, do not support a black and white + image mode. Per default the output format according to the output file + name extension will be honored. If this is not possible, a warning will + be printed and the output forced to PNG: + + .. 
code-block:: console + + $ kraken -i input.jpg bw.jpg binarize + Binarizing [06/24/22 09:56:23] WARNING jpeg does not support 1bpp images. Forcing to png. + ✓ + +Page Segmentation +----------------- + +The `segment` subcommand accesses page segmentation into lines and regions with +the two layout analysis methods implemented: the trainable baseline segmenter +that is capable of detecting both lines of different types and regions and a +legacy non-trainable segmenter that produces bounding boxes. + +Universal parameters of either segmenter are: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +-m, --mask Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes. +=============================================== ====== + +Baseline Segmentation +^^^^^^^^^^^^^^^^^^^^^ + +The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxilary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxilary classes and a single line type +without regions can be seen below: + +.. image:: _static/blla_heatmap.jpg + :width: 800 + :alt: BLLA output heatmap + +In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as: + +.. image:: _static/blla_output.jpg + :width: 800 + :alt: BLLA final output + +The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segment is activated with the `-bl` +option: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl + +New models optimized for other kinds of documents can be trained (see +:ref:`here `). These can be applied with the `-i` option of the +`segment` subcommand: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel + +Legacy Box Segmentation +^^^^^^^^^^^^^^^^^^^^^^^ + +The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left). + +Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply :ref:`binarization ` first or supply only +pre-binarized inputs. + +The legacy segmenter can be applied on some input image with: + +.. 
code-block:: console + + $ kraken -i 14.tif lines.json segment -x + $ cat lines.json + +Available specific parameters are: + +=============================================== ====== +option action +=============================================== ====== +--scale FLOAT Estimate of the average line height on the page +-m, --maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, --black-colseps / -w, --white-colseps Switch to black column separators. +-r, --remove-hlines / -l, --hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +-p, --pad Adds left and right padding around lines in the output. +=============================================== ====== + +Principal Text Direction +^^^^^^^^^^^^^^^^^^^^^^^^ + +The principal text direction selected with the `-d/--text-direction` is a +switch used in the reading order heuristic to determine the order of text +blocks (regions) and individual lines. It roughly corresponds to the `block +flow direction +`_ in CSS with +an additional option. Valid options consist of two parts, an initial principal +line orientation (`horizontal` or `vertical`) followed by a block order (`lr` +for left-to-right or `rl` for right-to-left). + +.. warning: + + The principal text direction is independent of the direction of the + *inline text direction* (which is left-to-right for writing systems like + Latin and right-to-left for ones like Hebrew or Arabic). Kraken deals + automatically with the inline text direction through the BiDi algorithm + but can't infer the principal text direction automatically as it is + determined by factors like layout, type of document, primary script in + the document, and other factors. The differents types of text + directionality and their relation can be confusing, the `W3C writing + mode `_ document explains + the fundamentals, although the model used in Kraken differs slightly. + +The first part is usually `horizontal` for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom: + +.. image:: _static/bw.png + :width: 800 + :alt: Horizontal Latin script text + +Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left: + +.. image:: https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg/577px-Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg + :width: 800 + :alt: Vertical Chinese text + +The second part is dependent on a number of factors as the order in which text +blocks are read is not fixed for every writing system. In mono-script texts it +is usually determined by the inline text direction, i.e. Latin script texts +columns are read starting with the top-left column followed by the column to +its right and so on, continuing with the left-most column below if none remain +to the right (inverse for right-to-left scripts like Arabic which start on the +top right-most columns, continuing leftward, and returning to the right-most +column just below when none remain). + +In multi-script documents the order of is determined by the primary writing +system employed in the document, e.g. for a modern book containing both Latin +and Arabic script text it would be set to `lr` when Latin is primary, e.g. 
when +the binding is on the left side of the book seen from the title cover, and +vice-versa (`rl` if binding is on the right on the title cover). The analogue +applies to text written with vertical lines. + +With these explications there are four different text directions available: + +=============================================== ====== +Text Direction Examples +=============================================== ====== +horizontal-lr Latin script texts, Mixed LTR/RTL docs with principal LTR script +horizontal-rl Arabic script texts, Mixed LTR/RTL docs with principal RTL script +vertical-lr Vertical script texts read from left-to-right. +vertical-rl Vertical script texts read from right-to-left. +=============================================== ====== + +Masking +^^^^^^^ + +It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -m mask.png + +Model Repository +---------------- + +.. _repo: + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands. + +Querying and Model Retrieval +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. code-block:: console + + $ kraken list + Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07 + 10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration) + 10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature) + 10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0 + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.5617783 + name: 10.5281/zenodo.5617783 + + Cremma-Medieval Old French Model (Litterature) + + .... + scripts: Latn + alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128 + accuracy: 95.49% + license: CC-BY-SA-2.0 + author(s): Pinche, Ariane + date: 2021-10-29 + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.5617783 + Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10 + Model name: cremma_medieval_bicerin.mlmodel + +Models will be placed in ``$XDG_BASE_DIR`` and can be accessed using their name as +printed in the last line of the ``kraken get`` output. + +.. code-block:: console + + $ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel + +Publishing +^^^^^^^^^^ + +When one would like to share a model with the wider world (for fame and glory!) 
+it is possible (and recommended) to upload them to repository. The process +consists of 2 stages: the creation of the deposit on the Zenodo platform +followed by approval of the model in the community making it discoverable for +other kraken users. + +For uploading model a Zenodo account and a personal access token is required. +After account creation tokens can be created under the account settings: + +.. image:: _static/pat.png + :width: 800 + :alt: Zenodo token creation dialogue + +With the token models can then be uploaded: + +.. code-block:: console + + $ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617783 + +A number of important metadata will be asked for such as a short description of +the model, long form description, recognized scripts, and authorship. +Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. +can't be changed or deleted so it is important to make sure that all the +information is correct. Each deposit also has a unique persistent identifier, a +DOI, that can be used to refer to it, e.g. in publications or when pointing +someone to a particular model. + +Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users. + +It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with `kraken get` +and its DOI. It is mostly suggested for preliminary models that might get +updated later: + +.. code-block:: console + + $ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617734 + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm + +All polytonic Greek text portions will be recognized using the `porson.clstm` +model while Latin text will be fed into the `antiqua.clstm` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. + +The ``ocr`` subcommand is able to serialize the recognition results either as +plain text (default), as `hOCR `_, into `ALTO +`_, or abbyyXML containing additional +metadata such as bounding boxes and confidences: + +.. code-block:: console + + $ kraken -i ... ... ocr -t # text output + $ kraken -i ... ... ocr -h # hOCR output + $ kraken -i ... ... ocr -a # ALTO output + $ kraken -i ... ... ocr -y # abbyyXML output + +hOCR output is slightly different from hOCR files produced by ocropus. Each +``ocr_line`` span contains not only the bounding box of the line but also +character boxes (``x_bboxes`` attribute) indicating the coordinates of each +character. 
In each line alternating sequences of alphanumeric and +non-alphanumeric (in the unicode sense) characters are put into ``ocrx_word`` +spans. Both have bounding boxes as attributes and the recognition confidence +for each character in the ``x_conf`` attribute. + +Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input. diff --git a/4.2.0/_sources/api.rst.txt b/4.2.0/_sources/api.rst.txt new file mode 100644 index 000000000..a907f33dc --- /dev/null +++ b/4.2.0/_sources/api.rst.txt @@ -0,0 +1,406 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + {'text_direction': 'horizontal-lr', + 'boxes': [[0, 29, 232, 56], + [28, 54, 121, 84], + [9, 73, 92, 117], + [103, 76, 145, 131], + [7, 105, 119, 230], + [10, 228, 126, 345], + ... + ], + 'script_detection': False} + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first: + +.. 
code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + {'text_direction': 'horizontal-lr', + 'type': 'baselines', + 'script_detection': False, + 'lines': [{'script': 'default', + 'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]], + 'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]}, + ...], + 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...] + '$par': ... + '$nop': ...}} + >>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking. + +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. + +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. 
each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as the if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch for binary input models. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(model, im, baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but an +:class:`kraken.rpred.ocr_record` record object containing the character +prediction, cuts (approximate locations), and confidences. + +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'box' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. 
code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformtion by the functional blocks: + +.. code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> xml.parse_alto(alto_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + } + + >>> page_doc = '/path/to/page' + >>> xml.parse_page(page_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + + +Serialization +------------- + +The serialization module can be used to transform the :class:`ocr_records +` returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders `jinja2 +`_ templates in `kraken/templates` through +the :func:`kraken.serialization.serialize` function. + +.. code-block:: python + + >>> from kraken.lib import serialization + + >>> records = [record for record in pred_it] + >>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. 
code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/4.2.0/_sources/api_docs.rst.txt b/4.2.0/_sources/api_docs.rst.txt new file mode 100644 index 000000000..46379f2b8 --- /dev/null +++ b/4.2.0/_sources/api_docs.rst.txt @@ -0,0 +1,251 @@ +************* +API Reference +************* + +kraken.blla module +================== + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +===================== + +.. note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +kraken.rpred module +=================== + +.. autoapifunction:: kraken.rpred.bidi_record + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapiclass:: kraken.rpred.ocr_record + :members: + +.. autoapifunction:: kraken.rpred.rpred + + +kraken.serialization module +=========================== + +.. 
autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +kraken.lib.models module +======================== + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.vgsl module +====================== + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +===================== + +.. autoapifunction:: kraken.lib.xml.parse_xml + +.. autoapifunction:: kraken.lib.xml.parse_page + +.. autoapifunction:: kraken.lib.xml.parse_alto + +kraken.lib.codec module +======================= + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.lib.train module +======================= + +Training Schedulers +------------------- + +.. autoapiclass:: kraken.lib.train.TrainScheduler + :members: + +.. autoapiclass:: kraken.lib.train.annealing_step + :members: + +.. autoapiclass:: kraken.lib.train.annealing_const + :members: + +.. autoapiclass:: kraken.lib.train.annealing_exponential + :members: + +.. autoapiclass:: kraken.lib.train.annealing_reduceonplateau + :members: + +.. autoapiclass:: kraken.lib.train.annealing_cosine + :members: + +.. autoapiclass:: kraken.lib.train.annealing_onecycle + :members: + +Training Stoppers +----------------- + +.. autoapiclass:: kraken.lib.train.TrainStopper + :members: + +.. autoapiclass:: kraken.lib.train.EarlyStopping + :members: + +.. autoapiclass:: kraken.lib.train.EpochStopping + :members: + +.. autoapiclass:: kraken.lib.train.NoStopping + :members: + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +========================= + +Datasets +-------- + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Helpers +------- + +.. autoapifunction:: kraken.lib.dataset.compute_error + +.. autoapifunction:: kraken.lib.dataset.preparse_xml_data + +.. autoapifunction:: kraken.lib.dataset.generate_input_transforms + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. autoapifunction:: kraken.lib.segmentation.denoising_hysteresis_thresh + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + + +kraken.lib.ctc_decoder +====================== + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +===================== + +.. 
autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/4.2.0/_sources/gpu.rst.txt b/4.2.0/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/4.2.0/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/4.2.0/_sources/index.rst.txt b/4.2.0/_sources/index.rst.txt new file mode 100644 index 000000000..85c8b81e7 --- /dev/null +++ b/4.2.0/_sources/index.rst.txt @@ -0,0 +1,243 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. + +Features +======== + +kraken's main features are: + + - Fully trainable layout analysis and character recognition + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - :ref:`Public repository ` of model files + - :ref:`Lightweight model files ` + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. 
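+
+A quick way to verify that the installation succeeded and whether pytorch can
+use a CUDA device for GPU acceleration is a short Python session. This is only
+an illustrative sketch, not an official kraken command; it merely imports two
+documented kraken modules and queries pytorch directly:
+
+.. code-block:: python
+
+  >>> import torch
+  >>> from kraken.lib import models, vgsl
+  >>> torch.__version__
+  >>> torch.cuda.is_available()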
+ +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user's kraken directory: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.2577813 + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.2577813 + name: 10.5281/zenodo.2577813 + + A generalized model for English printed text + + This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p + scripts: Latn + alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE + accuracy: 99.95% + license: Apache-2.0 + author(s): Kiessling, Benjamin + date: 2019-02-26 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the default model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `escriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). 
There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://ec.europa.eu/regional_policy/images/information/logos/eu_flag.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://www.gouvernement.fr/sites/default/files/styles/illustration-centre/public/contenu/illustration/2018/10/logo_investirlavenir_rvb.png + :width: 100 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005. + + diff --git a/4.2.0/_sources/ketos.rst.txt b/4.2.0/_sources/ketos.rst.txt new file mode 100644 index 000000000..2ac1993cc --- /dev/null +++ b/4.2.0/_sources/ketos.rst.txt @@ -0,0 +1,694 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation training and the binary format for recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported. 
Binary +datasets drastically improve loading performance allowing the saturation of +most GPUs with minimal computational overhead while also allowing training with +datasets that are larger than the systems main memory. A minor drawback is a +~30% increase in dataset size in comparison to the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the `--workers` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... + +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... + +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +Recognition training +-------------------- + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, --output Output model file prefix. Defaults to model. +-s, --spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, --append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, --load Load existing file to continue training +-F, --savefreq Model save frequency in epochs during + training +-q, --quit Stop condition for training. Set to `early` + for early stopping (default) or `dumb` for fixed + number of epochs. +-N, --epochs Number of epochs to train for. +--min-epochs Minimum number of epochs to train for when using early stopping. +--lag Number of epochs to wait before stopping + training without improvement. Only used when using early stopping. +-d, --device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, --lrate Learning rate [default: 0.001] +-m, --momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, --weight-decay Weight decay. +--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, --partition Ground truth data partition ratio between train/validation set +-u, --normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. 
+-c, --codec Load a codec JSON definition (invalid if loading existing model) +--resize Codec/output layer resizing option. If set + to `add` code points will be added, `both` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, --reorder / --no-reorder Reordering of code points to display order. +-t, --training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, --evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, --format-type Sets the training and evaluation data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +--augment / --no-augment Enables/disables data augmentation. +--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases, such as color inputs, changing the network architecture might be +useful: + +.. code-block:: console + + $ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta an/or +lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 --min-delta 0.001 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. 
Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``add`` and ``both``. +``add`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``both`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize add -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize both -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``both`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. 
code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. 
the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Unsupervised recognition pretraining +------------------------------------ + +Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices. 
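+
+The following is only a toy sketch of the contrastive objective described
+above and not kraken's actual implementation: a masked slice of the line
+features is treated as the positive sample that a prediction has to be
+distinguished from a number of randomly sampled distractor slices. All names,
+shapes, and the stand-in prediction are assumptions chosen purely for
+illustration:
+
+.. code-block:: python
+
+  import torch
+  import torch.nn.functional as F
+
+  def toy_contrastive_loss(feats, mask_start, mask_width, num_negatives=3):
+      # feats: (seq_len, dim) feature sequence of a single text line
+      seq_len, dim = feats.shape
+      # the masked (in-painted) region is the positive sample
+      positive = feats[mask_start:mask_start + mask_width].mean(0)
+      # distractors are sampled from random positions of the same line
+      starts = torch.randint(0, seq_len - mask_width, (num_negatives,))
+      negatives = torch.stack([feats[int(s):int(s) + mask_width].mean(0)
+                               for s in starts])
+      candidates = torch.cat([positive.unsqueeze(0), negatives])
+      # a real model would predict an embedding for the masked region;
+      # here the positive plus noise stands in to keep the sketch runnable
+      prediction = positive + 0.1 * torch.randn(dim)
+      logits = F.cosine_similarity(prediction.unsqueeze(0), candidates)
+      # index 0 (the true slice) is the class that has to be picked
+      return F.cross_entropy(logits.unsqueeze(0),
+                             torch.zeros(1, dtype=torch.long))
+
+  loss = toy_contrastive_loss(torch.randn(128, 64), mask_start=40, mask_width=4)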
+ +All data sources accepted by the supervised trainer are valid for pretraining. + +The basic pretraining call is very similar to a training one: + +.. code-block:: console + + $ ketos pretrain -f binary foo.arrow + +There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples. + +.. code-block:: console + + $ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow + +Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced: + +.. code-block:: console + + $ ketos train -i pretrain_best.mlmodel --warmup 5000 -f binary labelled.arrow + +It is recommended to use learning rate warmup (`warmup`) for at least one epoch +to improve convergence of the pretrained model. + +Segmentation training +--------------------- + +.. _segtrain: + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + Training line types: + default 2 53980 + foo 8 134 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + val check [------------------------------------] 0/0 + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. 
code-block:: console + + $ ketos segtrain -f xml --valid-baselines default training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + paragraph 6 10218 + +Finally, we can merge baselines and regions into each other: + +.. code-block:: console + + $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml + Training line types: + default 2 54114 + ... + $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml + ... + Training region types: + graphic 3 151 + text 4 11346 + separator 5 5431 + ... + +These options are combinable to massage the dataset into any typology you want. + +Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option: + +.. code-block:: console + + $ ketos segtrain --topline -f xml hebrew_training_data/*.xml + $ ketos segtrain --centerline -f xml chinese_training_data/*.xml + $ ketos segtrain --baseline -f xml latin_training_data/*.xml + +Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved: + +.. code-block:: console + + $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml + ... + +Recognition Testing +------------------- + +Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-f, --format-type Sets the test set data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +-m, --model Model(s) to evaluate. +-e, --evaluation-files File(s) with paths to evaluation data. +-d, --device Select device to use. +--pad Left and right padding around lines. +======================================================= ====== + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with `-e/--evaluation-files` or by just +adding a number of image files as the final argument: + +.. 
code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. + + diff --git a/4.2.0/_sources/models.rst.txt b/4.2.0/_sources/models.rst.txt new file mode 100644 index 000000000..b393f0738 --- /dev/null +++ b/4.2.0/_sources/models.rst.txt @@ -0,0 +1,24 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Segmentation Models +------------------- + +Recognition Models +------------------ + + diff --git a/4.2.0/_sources/training.rst.txt b/4.2.0/_sources/training.rst.txt new file mode 100644 index 000000000..f514da49b --- /dev/null +++ b/4.2.0/_sources/training.rst.txt @@ -0,0 +1,463 @@ +.. _training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. 
+ +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. + +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Dataset Compilation +------------------- + +.. _compilation: + +Training +-------- + +.. _training_step: + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. 
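+
+Before starting a training run it can be useful to sanity-check the annotated
+files with the XML parsers from the Python API (see the API reference). The
+snippet below is a minimal sketch; it only relies on the ``lines`` and
+``regions`` keys documented for the parser output, and the file path is a
+placeholder:
+
+.. code-block:: python
+
+  >>> from kraken.lib import xml
+
+  >>> doc = xml.parse_xml('/path/to/ground_truth.xml')
+  >>> len(doc['lines'])
+  >>> sum(len(regs) for regs in doc['regions'].values())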
+
+The training data in ``output_dir`` may now be used to train a new model by
+invoking the ``ketos train`` command. Just hand a list of images to the command
+such as:
+
+.. code-block:: console
+
+  $ ketos train output_dir/*.png
+
+to start training.
+
+A number of lines will be split off into a separate held-out set that is used
+to estimate the actual recognition accuracy achieved in the real world. These
+are never shown to the network during training but will be recognized
+periodically to evaluate the accuracy of the model. Per default the validation
+set will comprise 10% of the training data.
+
+Basic model training is mostly automatic although there are multiple
+parameters that can be adjusted:
+
+--output
+  Sets the prefix for models generated during training. They will be saved as
+  ``prefix_epochs.mlmodel``.
+--report
+  How often evaluation passes are run on the validation set. It is an
+  integer equal to or larger than 1 with 1 meaning a report is created each
+  time the complete training set has been seen by the network.
+--savefreq
+  How often intermediate models are saved to disk. It is an integer with
+  the same semantics as ``--report``.
+--load
+  Continuing training is possible by loading an existing model file with
+  ``--load``. To continue training from a base model with another
+  training set refer to the full :ref:`ketos ` documentation.
+--preload
+  Enables/disables preloading of the training set into memory for
+  accelerated training. The default setting preloads data sets with fewer
+  than 2500 lines; explicitly adding ``--preload`` will preload arbitrarily
+  sized sets. ``--no-preload`` disables preloading in all circumstances.
+
+Training a network will take some time on a modern computer, even with the
+default parameters. While the exact time required is unpredictable, as
+training is a somewhat random process, a rough guide is that accuracy seldom
+improves after 50 epochs, which are usually reached after between 8 and 24
+hours of training.
+
+When to stop training is a matter of experience; the default setting employs a
+fairly reliable approach known as `early stopping
+`_ that stops training as soon as
+the error rate on the validation set doesn't improve anymore. This will
+prevent `overfitting `_, i.e.
+fitting the model to recognize only the training data properly instead of the
+general patterns contained therein.
+
+.. code-block:: console
+
+  $ ketos train output_dir/*.png
+  Building training set  [####################################]  100%
+  Building validation set  [####################################]  100%
+  [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+  Initializing model ✓
+  Accuracy report (0) -1.5951 3680 9550
+  epoch 0/-1  [####################################]  788/788
+  Accuracy report (1) 0.0245 3504 3418
+  epoch 1/-1  [####################################]  788/788
+  Accuracy report (2) 0.8445 3504 545
+  epoch 2/-1  [####################################]  788/788
+  Accuracy report (3) 0.9541 3504 161
+  epoch 3/-1  [------------------------------------] 13/788  0d 00:22:09
+  ...
+
+By now there should be a couple of models (model_name-1.mlmodel,
+model_name-2.mlmodel, ...) in the directory the script was executed in. Let's
+take a look at each part of the output.
+
+.. code-block:: console
+
+  Building training set  [####################################]  100%
+  Building validation set  [####################################]  100%
+
+shows the progress of loading the training and validation set into memory. 
This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... + [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. 
Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . } + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. 
code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/4.2.0/_sources/vgsl.rst.txt b/4.2.0/_sources/vgsl.rst.txt new file mode 100644 index 000000000..913a7b5b1 --- /dev/null +++ b/4.2.0/_sources/vgsl.rst.txt @@ -0,0 +1,199 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. 
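+A specification like the example above can also be instantiated directly from
+Python for quick experimentation. The sketch below assumes kraken's
+``kraken.lib.vgsl.TorchVGSLModel`` class (the class kraken itself uses to build
+networks from VGSL strings); attribute names such as ``nn`` and ``spec`` may
+differ slightly between versions:
+
+.. code-block:: python
+
+   # Build a network from a VGSL definition string. The .nn attribute is
+   # assumed to hold the underlying torch.nn.Module, .spec the parsed spec.
+   from kraken.lib import vgsl
+
+   spec = '[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]'
+   net = vgsl.TorchVGSLModel(spec)
+
+   print(net.spec)                                      # normalized spec string
+   print(sum(p.numel() for p in net.nn.parameters()))   # total parameter count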
+Layers operate on the channel dimension; this is intuitive for convolutional
+layers, but a recurrent layer doing sequence classification along the width
+axis on an image of a particular height requires the height dimension to be
+moved to the channel dimension, e.g.:
+
+.. code-block:: console
+
+   [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+or using the alternative, slightly faster formulation:
+
+.. code-block:: console
+
+   [1,1,0,48 Lbx100 O1c103]
+
+Finally, an output definition is appended. When training sequence
+classification networks with the provided tools, the appropriate output
+definition is automatically appended to the network based on the alphabet of
+the training data.
+
+Examples
+--------
+
+.. code-block:: console
+
+   [1,1,0,48 Lbx100 Do O1c59]
+
+   Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+   layer        type    params
+   0            rnn     direction b transposed False summarize False out 100 legacy None
+   1            dropout probability 0.5 dims 1
+   2            linear  augmented False out 59
+
+A simple recurrent recognition model with a single LSTM layer classifying lines
+normalized to 48 pixels in height.
+
+.. code-block:: console
+
+   [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c59]
+
+   Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+   layer        type    params
+   0            conv    kernel 3 x 3 filters 32 activation r
+   1            dropout probability 0.1 dims 2
+   2            maxpool kernel 2 x 2 stride 2 x 2
+   3            conv    kernel 3 x 3 filters 64 activation r
+   4            dropout probability 0.1 dims 2
+   5            maxpool kernel 2 x 2 stride 2 x 2
+   6            reshape from 1 1 x 12 to 1/3
+   7            rnn     direction b transposed False summarize False out 100 legacy None
+   8            dropout probability 0.5 dims 1
+   9            linear  augmented False out 59
+
+A model with a small convolutional stack before a recurrent LSTM layer. The
+extended dropout layer syntax is used to reduce the drop probability on the
+depth dimension, as the default is too high for convolutional layers. The
+remainder of the height dimension (`12`) is reshaped into the depth dimension
+before applying the final recurrent and linear layers.
+
+.. code-block:: console
+
+   [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+   Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+   layer        type    params
+   0            conv    kernel 3 x 3 filters 16 activation r
+   1            maxpool kernel 3 x 3 stride 3 x 3
+   2            rnn     direction f transposed True summarize True out 64 legacy None
+   3            rnn     direction b transposed False summarize False out 128 legacy None
+   4            rnn     direction b transposed False summarize False out 256 legacy None
+   5            dropout probability 0.5 dims 1
+   6            linear  augmented False out 59
+
+A model with arbitrarily sized color image input, an initial summarizing
+recurrent layer to squash the height to 64, followed by two bidirectional
+recurrent layers and a linear projection.
+
+Convolutional Layers
+--------------------
+
+.. code-block:: console
+
+   C[{name}](s|t|r|l|m)<y>,<x>,<d>[,<y_stride>,<x_stride>]
+   s = sigmoid
+   t = tanh
+   r = relu
+   l = linear
+   m = softmax
+
+Adds a 2D convolution with kernel size `(y, x)` and `d` output channels,
+applying the selected nonlinearity. The stride can be adjusted with the
+optional last two parameters.
+
+Recurrent Layers
+----------------
+
+.. code-block:: console
+
+   L[{name}](f|r|b)(x|y)[s]<n> LSTM cell with n outputs.
+   G[{name}](f|r|b)(x|y)[s]<n> GRU cell with n outputs.
+   f runs the RNN forward only.
+   r runs the RNN reversed only.
+   b runs the RNN bidirectionally.
+   s (optional) summarizes the output in the requested dimension, return the last step.
+
+Adds either an LSTM or GRU recurrent layer to the network using either the `x`
+(width) or `y` (height) dimension as the time axis. Input features are the
+channel dimension and the non-time-axis dimension (height/width) is treated as
+another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32`
+input will execute 16 independent forward passes on `906x32` tensors resulting
+in an output of shape `1, 16, 906, 25`. If this isn't desired, either run a
+summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1,
+906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and
+channel dimension for an `1, 1, 906, 512` input to the recurrent layer.
+
+Helper and Plumbing Layers
+--------------------------
+
+Max Pool
+^^^^^^^^
+
+.. code-block:: console
+
+   Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+Adds a maximum pooling layer with `(y, x)` kernel size and `(y_stride, x_stride)` stride.
+
+Reshape
+^^^^^^^
+
+.. code-block:: console
+
+   S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+                                dimension.
+
+The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into
+dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to
+`d`. So `S1(1x48)1,3` on an `1, 48, 1020, 8` input will first reshape into
+`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension, and distribute
+the `48` sized tensor into the channel dimension, resulting in a `1, 1, 1020,
+48*8=384` sized output. `S` layers are mostly used to remove an undesirable
+non-1 height before a recurrent layer.
+
+.. note::
+
+   This `S` layer is equivalent to the one implemented in the tensorflow
+   implementation of VGSL, i.e. it behaves differently from tesseract.
+
+Regularization Layers
+---------------------
+
+Dropout
+^^^^^^^
+
+.. code-block:: console
+
+   Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+Adds a 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop
+probability and 1D dropout. Set `dim` to `2` after convolutional layers.
+
+Group Normalization
+^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: console
+
+   Gn<groups> Inserts a group normalization layer
+
+Adds a group normalization layer separating the input into `<groups>` groups,
+normalizing each separately.
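+
+To make the layer semantics above more concrete, the following is a rough
+PyTorch sketch of the module stack a spec such as
+``[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 O1c59]`` describes.
+It is an illustration only, not kraken's implementation; 'same' padding for the
+convolutions and NCHW tensor layout are assumptions:
+
+.. code-block:: python
+
+   import torch
+   import torch.nn as nn
+
+   class VGSLSketch(nn.Module):
+       def __init__(self, classes: int):
+           super().__init__()
+           self.features = nn.Sequential(
+               nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),   # Cr3,3,32
+               nn.MaxPool2d(2, 2),                          # Mp2,2
+               nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # Cr3,3,64
+               nn.MaxPool2d(2, 2),                          # Mp2,2
+           )
+           # S1(1x12)1,3: fold the remaining height (48 / 4 = 12) into the channels.
+           self.rnn = nn.LSTM(64 * 12, 100, bidirectional=True, batch_first=True)  # Lbx100
+           self.output = nn.Linear(2 * 100, classes)        # O1c<classes>
+
+       def forward(self, x):              # x: (N, 1, 48, W)
+           x = self.features(x)           # (N, 64, 12, W/4)
+           n, c, h, w = x.shape
+           x = x.reshape(n, c * h, w)     # fold height into the channel dimension
+           x = x.permute(0, 2, 1)         # (N, W/4, 768): width is the time axis
+           x, _ = self.rnn(x)             # (N, W/4, 200)
+           return self.output(x)          # per-time-step class scores
+
+   scores = VGSLSketch(classes=59)(torch.rand(1, 1, 48, 200))
+   print(scores.shape)                    # torch.Size([1, 50, 59])
+
+Running the sketch on a ``1 × 1 × 48 × 200`` line image yields class scores of
+shape ``(1, 50, 59)``, i.e. one prediction per down-sampled column along the
+width axis, which is what the sequence classification output described above
+operates on.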
diff --git a/4.2.0/_static/alabaster.css b/4.2.0/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/4.2.0/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + +div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } 
+div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} 
+ +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px -30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. 
*/ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/4.2.0/_static/basic.css b/4.2.0/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/4.2.0/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ 
*/ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + +div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 
10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other 
body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + +dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + 
+table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} + +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/4.2.0/_static/blla_heatmap.jpg b/4.2.0/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/4.2.0/_static/blla_heatmap.jpg differ diff --git a/4.2.0/_static/blla_output.jpg b/4.2.0/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/4.2.0/_static/blla_output.jpg differ diff --git a/4.2.0/_static/bw.png b/4.2.0/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/4.2.0/_static/bw.png differ diff --git a/4.2.0/_static/custom.css b/4.2.0/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/4.2.0/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/4.2.0/_static/doctools.js b/4.2.0/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/4.2.0/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } 
+ } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/4.2.0/_static/documentation_options.js b/4.2.0/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/4.2.0/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/4.2.0/_static/file.png b/4.2.0/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/4.2.0/_static/file.png differ diff --git a/4.2.0/_static/graphviz.css b/4.2.0/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/4.2.0/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/4.2.0/_static/kraken.png b/4.2.0/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/4.2.0/_static/kraken.png differ diff --git a/4.2.0/_static/kraken_recognition.svg b/4.2.0/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/4.2.0/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... 
+ + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/4.2.0/_static/kraken_segmentation.svg b/4.2.0/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/4.2.0/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/4.2.0/_static/kraken_segmodel.svg b/4.2.0/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/4.2.0/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/4.2.0/_static/kraken_torchseqrecognizer.svg b/4.2.0/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/4.2.0/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/4.2.0/_static/kraken_workflow.svg b/4.2.0/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/4.2.0/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + Recognition + + + + + + + + + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/4.2.0/_static/language_data.js b/4.2.0/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/4.2.0/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" + v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = 
new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/4.2.0/_static/minus.png b/4.2.0/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/4.2.0/_static/minus.png differ diff --git a/4.2.0/_static/normal-reproduction-low-resolution.jpg b/4.2.0/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/4.2.0/_static/normal-reproduction-low-resolution.jpg differ diff --git a/4.2.0/_static/pat.png b/4.2.0/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/4.2.0/_static/pat.png differ diff --git a/4.2.0/_static/plus.png b/4.2.0/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/4.2.0/_static/plus.png differ diff --git a/4.2.0/_static/pygments.css b/4.2.0/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ b/4.2.0/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } 
/* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ +.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/4.2.0/_static/searchtools.js b/4.2.0/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/4.2.0/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. 
+ /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. + objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." 
+ ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. +// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/4.2.0/_static/sphinx_highlight.js b/4.2.0/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/4.2.0/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/4.2.0/advanced.html b/4.2.0/advanced.html new file mode 100644 index 000000000..e0496cd99 --- /dev/null +++ b/4.2.0/advanced.html @@ -0,0 +1,517 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps; in the case of kraken these are layout analysis/page segmentation (extracting topological text lines from an image), recognition (feeding text line images into a classifier), and finally serialization of the results into an appropriate format such as ALTO or PageXML.

+
+

Input Specification

+

Kraken inputs and their outputs can be defined in multiple ways. The simplest are input-output pairs, i.e. producing one output document for one input document, following the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular subcommands may be chained.

+

There are other ways to define inputs and outputs, as the syntax shown above can become rather cumbersome for large numbers of files.

+

As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the -p option) and the +suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 files but also from various XML formats. In these cases the appropriate data is automatically selected from the inputs: image data for segmentation, or line and region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

The code is able to automatically determine if a file is in PageXML or ALTO format.

+
+
+

Binarization

+
+

Note

+

Binarization is deprecated and mostly not necessary anymore. It can often +worsen text recognition results especially for documents with uneven +lighting, faint writing, etc.

+
+

The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +ocropus-nlbin. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it.

+

Available parameters are:

option        type
--threshold   FLOAT
--zoom        FLOAT
--escale      FLOAT
--border      FLOAT
--perc        INTEGER RANGE
--range       INTEGER
--low         INTEGER RANGE
--high        INTEGER RANGE
+

To binarize an image:

+
$ kraken -i input.jpg bw.png binarize
+
+
+
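The parameters from the table above can be appended to the subcommand. For example, with illustrative values that are not tuned recommendations:

$ kraken -i input.jpg bw.png binarize --threshold 0.5 --border 0.1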
+

Note

+

Some image formats, notably JPEG, do not support a black and white +image mode. Per default the output format according to the output file +name extension will be honored. If this is not possible, a warning will +be printed and the output forced to PNG:

+
$ kraken -i input.jpg bw.jpg binarize
+Binarizing      [06/24/22 09:56:23] WARNING  jpeg does not support 1bpp images. Forcing to png.
+
+
+
+
+
+
+

Page Segmentation

+

The segment subcommand performs page segmentation into lines and regions using one of the two implemented layout analysis methods: a trainable baseline segmenter that is capable of detecting both lines of different types and regions, and a legacy non-trainable segmenter that produces bounding boxes.

+

Universal parameters of either segmenter are:

+ + + + + + + + + + + + + + +

option

action

-d, –text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

-m, –mask

Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes.

+
+

Baseline Segmentation

+

The baseline segmenter works by applying a segmentation model to a page image, labelling each pixel of the image with one or more classes, each class corresponding to a line or region of a specific type. In addition there are two auxiliary classes that are used to determine the line orientation. A simplified example of a composite image of the auxiliary classes and a single line type without regions can be seen below:

[Image: BLLA output heatmap]

In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as:

[Image: BLLA final output]

The primary determinant of segmentation quality is the segmentation model employed. There is a default model that works reasonably well on printed and handwritten material on undegraded, even writing surfaces such as paper or parchment. The output of this model consists of a single line type and a generic text region class that denotes coherent blocks of text. This model is employed automatically when the baseline segmenter is activated with the -bl option:

+
$ kraken -i input.jpg segmentation.json segment -bl
+
+
+

New models optimized for other kinds of documents can be trained (see +here). These can be applied with the -i option of the +segment subcommand:

+
$ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel
+
+
+
+
+

Legacy Box Segmentation

+

The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left).

+

Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply binarization first or supply only +pre-binarized inputs.

+

The legacy segmenter can be applied on some input image with:

+
$ kraken -i 14.tif lines.json segment -x
+$ cat lines.json
+
+
+

Available specific parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

–scale FLOAT

Estimate of the average line height on the page

-m, –maxcolseps

Maximum number of columns in the input document. Set to 0 for uni-column layouts.

-b, –black-colseps / -w, –white-colseps

Switch to black column separators.

-r, –remove-hlines / -l, –hlines

Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

-p, –pad

Adds left and right padding around lines in the output.

+
+
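As an illustration, the column handling options can be combined with the legacy segmenter for a single-column document (a sketch, not a general recommendation):

$ kraken -i 14.tif lines.json segment -x --maxcolseps 0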
+

Principal Text Direction

+

The principal text direction selected with the -d/--text-direction option is used by the reading order heuristic to determine the order of text blocks (regions) and individual lines. It roughly corresponds to the block flow direction in CSS with an additional option. Valid options consist of two parts, an initial principal line orientation (horizontal or vertical) followed by a block order (lr for left-to-right or rl for right-to-left).

+

The first part is usually horizontal for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom:

[Image: Horizontal Latin script text]

Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left:

[Image: Vertical Chinese text]

The second part depends on a number of factors, as the order in which text blocks are read is not fixed for every writing system. In mono-script texts it is usually determined by the inline text direction, i.e. columns in Latin script texts are read starting with the top-left column followed by the column to its right and so on, continuing with the left-most column below if none remain to the right (inverse for right-to-left scripts like Arabic, which start with the top right-most column, continue leftward, and return to the right-most column just below when none remain).

+

In multi-script documents the order is determined by the primary writing system employed in the document, e.g. for a modern book containing both Latin and Arabic script text it would be set to lr when Latin is primary, e.g. when the binding is on the left side of the book seen from the title cover, and vice versa (rl if the binding is on the right of the title cover). The analogue applies to text written with vertical lines.

+

With these explanations in mind, there are four different text directions available (a usage example follows the table):

Text Direction   Examples
horizontal-lr    Latin script texts, Mixed LTR/RTL docs with principal LTR script
horizontal-rl    Arabic script texts, Mixed LTR/RTL docs with principal RTL script
vertical-lr      Vertical script texts read from left-to-right.
vertical-rl      Vertical script texts read from right-to-left.
+
+
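For example, to segment a page whose lines are horizontal but whose blocks are read right-to-left (as in a primarily Arabic document), the direction is passed to the segmenter (file names are placeholders):

$ kraken -i input.jpg segmentation.json segment -bl -d horizontal-rl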
+

Masking

+

It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white:

+
$ kraken -i input.jpg segmentation.json segment -bl -m mask.png
+
+
+
+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands.

+
+

Querying and Model Retrieval

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07
+10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration)
+10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature)
+10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show 10.5281/zenodo.5617783
+name: 10.5281/zenodo.5617783
+
+Cremma-Medieval Old French Model (Litterature)
+
+....
+scripts: Latn
+alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128
+accuracy: 95.49%
+license: CC-BY-SA-2.0
+author(s): Pinche, Ariane
+date: 2021-10-29
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get 10.5281/zenodo.5617783
+Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10
+Model name: cremma_medieval_bicerin.mlmodel
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +printed in the last line of the kraken get output.

+
$ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel
+
+
+
+
+

Publishing

+

When one would like to share a model with the wider world (for fame and glory!) it is possible (and recommended) to upload it to the repository. The process consists of two stages: the creation of the deposit on the Zenodo platform, followed by approval of the model in the community, making it discoverable for other kraken users.

+

For uploading models a Zenodo account and a personal access token are required. After account creation, tokens can be created under the account settings:

[Image: Zenodo token creation dialogue]

With the token models can then be uploaded:

+
$ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617783
+
+
+

A number of important metadata fields will be asked for, such as a short description of the model, a long form description, recognized scripts, and authorship. Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. it can’t be changed or deleted, so it is important to make sure that all the information is correct. Each deposit also has a unique persistent identifier, a DOI, that can be used to refer to it, e.g. in publications or when pointing someone to a particular model.

+

Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users.

+

It is possible to deposit models without including them in the queryable repository. Models uploaded this way are not truly private and can still be found through the standard Zenodo search and downloaded with kraken get using their DOI. This is mostly suggested for preliminary models that might get updated later:

+
$ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617734
+
+
+
+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+
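In the most common case segmentation and recognition are simply chained in a single invocation; a typical call with the baseline segmenter and a single recognition model looks like this (file and model names are placeholders):

$ kraken -i image.tif image.txt segment -bl ocr -m model.mlmodel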

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm
+
+
+

All polytonic Greek text portions will be recognized using the porson.clstm +model while Latin text will be fed into the antiqua.clstm model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
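For instance, to recognize only the Greek portions of the document from the examples above while skipping Latin lines (model name is a placeholder):

$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:ignore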

The ocr subcommand is able to serialize the recognition results as plain text (default), hOCR, ALTO, or abbyyXML, the latter formats containing additional metadata such as bounding boxes and confidences:

+
$ kraken -i ... ... ocr -t # text output
+$ kraken -i ... ... ocr -h # hOCR output
+$ kraken -i ... ... ocr -a # ALTO output
+$ kraken -i ... ... ocr -y # abbyyXML output
+
+
+

hOCR output is slightly different from hOCR files produced by ocropus. Each ocr_line span contains not only the bounding box of the line but also character boxes (x_bboxes attribute) indicating the coordinates of each character. In each line, alternating sequences of alphanumeric and non-alphanumeric (in the Unicode sense) characters are put into ocrx_word spans. Both have bounding boxes as attributes, and the recognition confidence for each character is given in the x_conf attribute.

+

Paragraph detection has been removed as it was deemed to be unduly dependent on +certain typographic features which may not be valid for your input.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/api.html b/4.2.0/api.html new file mode 100644 index 000000000..d1164af32 --- /dev/null +++ b/4.2.0/api.html @@ -0,0 +1,3056 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all functionality of the OCR engine. Most functional blocks (binarization, segmentation, recognition, and serialization) are encapsulated in one high-level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization, although depending on the particular setup of the pipeline and the models utilized it can be optional. For the non-trainable legacy bounding box segmenter binarization is mandatory, although it is still possible to feed color and grayscale images to the recognizer. The trainable baseline segmenter can work with black and white, grayscale, and color images, depending on the training data and network configuration utilized; in practice grayscale and color data are used in almost all cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The only required input to the legacy segmenter is a b/w image object, although some additional parameters exist, largely to change the principal text direction (important for column ordering and top-to-bottom scripts) and to explicitly mask non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+{'text_direction': 'horizontal-lr',
+ 'boxes': [[0, 29, 232, 56],
+           [28, 54, 121, 84],
+           [9, 73, 92, 117],
+           [103, 76, 145, 131],
+           [7, 105, 119, 230],
+           [10, 228, 126, 345],
+           ...
+          ],
+ 'script_detection': False}
+
+
+
+
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

[Diagram: a segmentation model (TorchVGSLModel) bundles a neural network with metadata describing the available line and region types, bounding regions, and the baseline location flag.]

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+{'text_direction': 'horizontal-lr',
+ 'type': 'baselines',
+ 'script_detection': False,
+ 'lines': [{'script': 'default',
+            'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]],
+            'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]},
+           ...],
+ 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...]
+             '$par': ...
+             '$nop':  ...}}
+>>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking.

+

Images are automatically converted into the proper mode for recognition, except in the case of models trained on binary images, as there is a plethora of different binarization algorithms available, each with strengths and weaknesses. For most material the kraken-provided binarization should be sufficient, though. This does not mean that a segmentation model trained on RGB images will have equal accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality will often be modest or non-existent for color models, while non-binarized inputs to a binary model will cause severe degradation (and a warning to that effect).

+

Per default segmentation is performed on the CPU, although the neural network can be run on a GPU with the device argument. As the vast majority of the processing required is postprocessing, the performance gain will most likely be modest, though.

+
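A minimal sketch of running the segmenter on a GPU; the device string follows the usual torch conventions and cuda:0 is only an example:

>>> baseline_seg = blla.segment(im, model=model, device='cuda:0')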

The above API is the simplest way to perform a complete segmentation. The process consists of multiple steps such as pixel labelling, separate region and baseline vectorization, and bounding polygon calculation:

[Diagram: the trainable segmentation pipeline. Pixel labelling produces line/separator heatmaps and region heatmaps; baseline vectorization and orientation together with region vectorization yield oriented baselines and region boundaries; bounding polygon calculation and line ordering produce the final bounding polygons.]

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

[Diagram: the neural network maps the input line to an output matrix over 'time' steps (width); the CTC decoder collapses it into a label sequence (e.g. 15, 10, 1, ...) which the codec converts into a character sequence (e.g. o, c, u, ...).]

As the customization of this two-stage decoding process is usually reserved for specialized use cases, sensible defaults are provided: codecs are part of the model file and do not have to be supplied manually, and the preferred CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

The sequence recognizer wrapper combines the neural network itself, a codec, metadata such as whether the input is supposed to be grayscale or binarized, and an instance of a CTC decoder that performs the conversion of the raw output tensor of the network into a sequence of labels:

[Diagram: a transcription model (TorchSeqRecognizer) bundles a neural network with a codec, metadata, and a CTC decoder.]

Afterwards, given an image, a segmentation, and the model, one can perform text recognition. The code is identical for both legacy and baseline segmentations. As with segmentation, input images are auto-converted to the correct color mode, except in the case of binary models, for which a warning will be raised if there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken import rpred
+# single model recognition
+>>> pred_it = rpred(model, im, baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
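For instance, the full text of a page can be collected from the records using only the record attributes shown below:

>>> pred_it = rpred(model, im, baseline_seg)
>>> text = '\n'.join(record.prediction for record in pred_it)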
+

The output isn’t just a sequence of characters but a kraken.rpred.ocr_record object containing the character prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'box'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The \(C +\times W\) softmax output matrix is accessible as the outputs attribute on the +kraken.lib.models.TorchSeqRecognizer after each step of the +kraken.rpred.rpred() iterator. To get a mapping from the label space +\(C\) the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.output
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.

+
+
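A rough sketch of decoding the raw matrix manually, assuming greedy_decoder can be applied directly to the outputs attribute populated by the previous prediction step:

>>> from kraken.lib.ctc_decoder import greedy_decoder
>>> labels = greedy_decoder(model.outputs)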
+

XML Parsing

+

Sometimes it is desirable to take the data in an existing XML serialization format like PageXML or ALTO and apply an OCR function to it. The kraken.lib.xml module includes parsers extracting information into data structures processable with minimal transformation by the functional blocks:

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> xml.parse_alto(alto_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+>>> page_doc = '/path/to/page'
+>>> xml.parse_page(page_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+
+
+
+
+

Serialization

+

The serialization module can be used to transform the ocr_records returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders jinja2 templates in kraken/templates through +the kraken.serialization.serialize() function.

+
>>> from kraken.lib import serialization
+
+>>> records = [record for record in pred_it]
+>>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+
+
+

Training

+

Training is largely implemented with the pytorch lightning framework. There are separate LightningModule classes for recognition and segmentation training, and a small wrapper around lightning’s Trainer class that mainly sets up model handling and verbosity options for the CLI.

+
>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to have a closer look at the command line parameters for features such as transfer learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/api_docs.html b/4.2.0/api_docs.html new file mode 100644 index 000000000..e998c09e0 --- /dev/null +++ b/4.2.0/api_docs.html @@ -0,0 +1,2686 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the legacy segmenter interface refer to the pageseg module. Note that recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (str) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
+
+
Returns:
+

A dictionary containing the text direction and under the key ‘lines’ a +list of reading order sorted baselines (polylines) and their respective +polygonal boundaries. The last and first point of each boundary polygon +are connected.

+
 {'text_direction': '$dir',
+  'type': 'baseline',
+  'lines': [
+     {'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]},
+     {'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}
+   ]
+   'regions': [
+     {'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'},
+     {'region': [[x0, ...]], 'type': 'text'}
+   ]
+ }
+
+
+

+
+
Raises:
+
+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
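A minimal usage sketch (assuming 'page.tif' is a placeholder path and the default segmentation model is available locally):

>>> from PIL import Image
>>> from kraken import blla
>>> im = Image.open('page.tif')
>>> seg = blla.segment(im)  # loads the default baseline segmentation model
>>> len(seg['lines'])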
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable baseline segmenter interface refer to the blla module. Note that recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A dictionary containing the text direction and a list of reading order +sorted bounding boxes under the key ‘boxes’:

+
{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),...]}
+
+
+

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
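A usage sketch for the legacy segmenter (assuming 'page.tif' is a placeholder path; the input has to be binarized first, e.g. with kraken.binarization.nlbin):

>>> from PIL import Image
>>> from kraken import pageseg, binarization
>>> im = Image.open('page.tif')
>>> bw = binarization.nlbin(im)
>>> seg = pageseg.segment(bw, text_direction='horizontal-lr')
>>> seg['boxes'][:3]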
+
+

kraken.rpred module

+
+
+kraken.rpred.bidi_record(record, base_dir=None)
+

Reorders a record using the Unicode BiDi algorithm.

+

Models trained for RTL or mixed scripts still emit classes in LTR order +requiring reordering for proper display.

+
+
Parameters:
+

record (kraken.rpred.ocr_record)

+
+
Returns:
+

kraken.rpred.ocr_record

+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+filtered_tags = []
+
+ +
+
+im
+
+ +
+
+im_str
+
+ +
+
+miss = []
+
+ +
+
+nets
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags
+
+ +
+
+tags_ignore
+
+ +
+
+ts
+
+ +
+ +
+
+class kraken.rpred.ocr_record(prediction, cuts, confidences, line)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • confidences (List[float])

  • +
  • line (Union[List, Dict[str, List]])

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+confidences
+
+ +
+
+cuts
+
+ +
+
+prediction
+
+ +
+
+tags
+
+ +
+
+type
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of +coordinates (x0, y0, x1, y1) of a text line in the image +and an entry ‘text_direction’ containing +‘horizontal-lr/rl/vertical-lr/rl’.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible +with padding.

  • +
  • bidi_reordering (bool|str) – Reorder classes in the ocr_record according to +the Unicode bidirectional algorithm for correct +display. Set to L|R to change base text +direction.

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[ocr_record, None, None]

+
+
+
+ +
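Putting segmentation and recognition together, a sketch of the prediction loop (assuming 'en_best.mlmodel' is a placeholder path to a recognition model, im is the page image, and seg is the output of one of the segmenters above):

>>> from kraken import rpred
>>> from kraken.lib import models
>>> net = models.load_any('en_best.mlmodel')
>>> pred_it = rpred.rpred(net, im, seg)
>>> for record in pred_it:
        print(record.prediction)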
+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='hocr', processing_steps=None)
+

Serializes a list of ocr_records into an output document.

+

Serializes a list of predictions and their corresponding positions by doing +some hOCR-specific preprocessing and then renders them through one of +several jinja2 templates.

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • records (Sequence[kraken.rpred.ocr_record]) – List of kraken.rpred.ocr_record

  • +
  • image_name (str) – Name of the source image

  • +
  • image_size (Tuple[int, int]) – Dimensions of the source image

  • +
  • writing_mode (str) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values are +horizontal-tb, vertical-rl, and vertical-lr.

  • +
  • scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records

  • +
  • regions (Optional[Dict[str, List[List[Tuple[int, int]]]]]) – Dictionary mapping region types to a list of region polygons.

  • +
  • template (str) – Selector for the serialization format. May be ‘hocr’, +‘alto’, ‘page’ or any template found in the template directory.

  • +
  • processing_steps (Optional[List[Dict[str, Union[Dict, str, float, int, bool]]]]) –

    A list of dictionaries describing the processing kraken performed on the inputs:

    +
    {'category': 'preprocessing',
    + 'description': 'natural language description of process',
    + 'settings': {'arg0': 'foo', 'argX': 'bar'}
    +}
    +
    +
    +

  • +
+
+
Returns:
+

The rendered template

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize_segmentation(segresult, image_name=None, image_size=(0, 0), template='hocr', processing_steps=None)
+

Serializes a segmentation result into an output document.

+
+
Parameters:
+
    +
  • segresult (Dict[str, Any]) – Result of blla.segment

  • +
  • image_name (str) – Name of the source image

  • +
  • image_size (tuple) – Dimensions of the source image

  • +
  • template (str) – Selector for the serialization format. May be +‘hocr’ or ‘alto’.

  • +
  • processing_steps (Optional[List[Dict[str, Union[Dict, str, float, int, bool]]]])

  • +
+
+
Returns:
+

(str) rendered template.

+
+
Return type:
+

str

+
+
+
+ +
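A sketch serializing a bare segmentation result to ALTO (assuming seg is the output of blla.segment and im the corresponding PIL image):

>>> from kraken import serialization
>>> alto = serialization.serialize_segmentation(seg, image_name='page.tif', image_size=im.size, template='alto')
>>> with open('segmentation.xml', 'w') as fp:
        fp.write(alto)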
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads anything that was, is, and will be a valid ocropus model and +instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN +configuration in the file.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing VGSL segmentation and recognition +networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (str) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
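A short sketch (assuming 'model.mlmodel' is a placeholder path to a recognition model on disk):

>>> from kraken.lib import models
>>> net = models.load_any('model.mlmodel', device='cpu')
>>> net.kind  # 'vgsl' for VGSL models loaded from CoreML files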
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VSGL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel +vector at each step along its time axis, i.e. either put the non-time-axis +dimension into the channels dimension or use a summarizing RNN squashing +the time axis to 1 and putting the output into the channels dimension +respectively.

+
+
Parameters:
+

spec (str)

+
+
+
+
+input
+

Expected input tensor as a 4-tuple.

+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and append layers spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+property aux_layers
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx)
+

Builds a reshape layer

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx)
+

Builds an LSTM/GRU layer returning number of outputs and layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_wav2vec2(input, blocks, idx)
+

Builds a Wav2Vec2 masking layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, pathlib.Path]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException – if the model data is invalid (not a string, protobuf file, or without appropriate metadata).

  • +
  • FileNotFoundError – if the path doesn't point to a file.

  • +
+
+
+
+ +
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+user_metadata: dict[str, Any]
+
+ +
+ +
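A sketch instantiating a small recognition network from a VGSL spec (the spec string is an arbitrary example loosely modelled on the ketos default, not a recommended architecture):

>>> from kraken.lib import vgsl
>>> net = vgsl.TorchVGSLModel('[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 O1c57]')
>>> net.input  # expected input as a (batch, channels, height, width) tuple
>>> net.save_model('net.mlmodel')  # serializes the network to a CoreML file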
+
+

kraken.lib.xml module

+
+
+kraken.lib.xml.parse_xml(filename)
+

Parses either a PageXML or ALTO file with autodetermination of the file +format.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to an XML file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
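For example (assuming 'page_0001.xml' is a placeholder path to a PageXML or ALTO file):

>>> from kraken.lib import xml
>>> doc = xml.parse_xml('page_0001.xml')
>>> doc['image']
>>> [line['text'] for line in doc['lines']][:3]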
+
+kraken.lib.xml.parse_page(filename)
+

Parses a PageXML file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to a PageXML file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+kraken.lib.xml.parse_alto(filename)
+

Parses an ALTO file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, pathlib.Path]) – path to an ALTO file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.

+
+
+
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if the a subsequence is not encodable and the +codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
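A small sketch of encoding and decoding with an automatically assigned label alphabet:

>>> from kraken.lib.codec import PytorchCodec
>>> codec = PytorchCodec('abcdefghijklmnopqrstuvwxyz ')
>>> labels = codec.encode('hello world')   # torch.IntTensor of 1-indexed labels
>>> codec.decode([(l, 0, 0, 1.0) for l in labels.tolist()])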
+
+

kraken.lib.train module

+
+

Training Schedulers

+
+
+

Training Stoppers

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, pb_ignored_metrics=('loss', 'val_metric'), *args, **kwargs)
+
+
Parameters:
+
    +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
+
+
+
+
+fit(*args, **kwargs)
+
+ +
+ +
+
+
+

kraken.lib.dataset module

+
+

Datasets

+
+
+class kraken.lib.dataset.BaselineSet(imgs=None, suffix='.path', line_width=4, im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • imgs (Sequence[str])

  • +
  • suffix (str)

  • +
  • line_width (int)

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • mode (str)

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(image, baselines=None, regions=None, *args, **kwargs)
+

Adds a page to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • baseline (dict) – A list containing dicts with a list of coordinates +and tags [{‘baseline’: [[x0, y0], …, +[xn, yn]], ‘tags’: (‘script_type’,)}, …]

  • +
  • regions (dict) – A dict containing list of lists of coordinates +{‘region_type_0’: [[x0, y0], …, [xn, yn]]], +‘region_type_1’: …}.

  • +
  • image (Union[str, PIL.Image.Image])

  • +
  • baselines (List[List[List[Tuple[int, int]]]])

  • +
+
+
+
+ +
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mode
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, ignore_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • ignore_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+ignore_empty_lines
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, text, baseline, boundary, *args, **kwargs)
+

Parses a sample for the dataset and returns it.

+

This function is mainly used for parallelized loading of training data.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
  • image (Union[str, PIL.Image.Image])

  • +
+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+class kraken.lib.dataset.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, ignore_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • split (Callable[[str], str])

  • +
  • suffix (str)

  • +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • ignore_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line-image-text pair to the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+ignore_empty_lines
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, *args, **kwargs)
+

Parses a sample for this dataset.

+

This is mostly used to parallelize populating the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

Dict

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+split
+
+ +
+
+suffix
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
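A sketch of assembling a bounding-box dataset from line images with accompanying .gt.txt transcriptions (paths are placeholders):

>>> import glob
>>> from kraken.lib.dataset import GroundTruthDataset
>>> ds = GroundTruthDataset()
>>> for im in glob.glob('lines/*.png'):
        ds.add(im)            # transcription is read from the matching .gt.txt file
>>> ds.encode()               # builds a codec from ds.alphabet and encodes all lines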
+
+

Helpers

+
+
+kraken.lib.dataset.compute_error(model, batch)
+

Computes error report from a model and a list of line image-text pairs.

+
+
Parameters:
+
+
+
Returns:
+

A tuple with total number of characters and edit distance across the +whole validation set.

+
+
Return type:
+

Tuple[int, int]

+
+
+
+ +
+
+kraken.lib.dataset.preparse_xml_data(filenames, format_type='xml', repolygonize=False)
+

Loads training data from a set of xml files.

+

Extracts line information from Page/ALTO xml files for training of +recognition models.

+
+
Parameters:
+
    +
  • filenames (Sequence[Union[str, pathlib.Path]]) – List of XML files.

  • +
  • format_type (str) – Either page, alto or xml for autodetermination.

  • +
  • repolygonize (bool) – (Re-)calculates polygon information using the kraken +algorithm.

  • +
+
+
Returns:
+

A list of dicts {‘text’: text, ‘baseline’: [[x0, y0], …], ‘boundary’: [[x0, y0], …], ‘image’: PIL.Image}.

+
+
Return type:
+

List[Dict]

+
+
+
+ +
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (str)

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its +polygonization.

  • +
  • regions (Sequence) – List of region polygons.

  • +
  • text_direction (str) – Set principal text direction for column ordering. +Can be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

A reordered input.

+
+
Return type:
+

Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]

+
+
+
+ +
+
+kraken.lib.segmentation.denoising_hysteresis_thresh(im, low, high, sigma)
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5)
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (sequence) – List of lists containing a single baseline per +entry.

  • +
  • suppl_obj (sequence) – List of lists containing additional polylines +that should be considered hard boundaries for +polygonizaton purposes. Can be used to prevent +polygonization into non-text areas such as +illustrations or to compute the polygonization of +a subset of the lines in an image.

  • +
  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. +Overrides data in im. The default map is +gaussian_filter(sobel(im), 2).

  • +
  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of +the input. Values of 0 are used for aspect-preserving +scaling. None skips input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are +assumed to be on the bottom of the text line and will +be offset upwards, if set to True, baselines are on the +top and will be offset downwards. If set to None, no +offset will be applied.

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be compute for +a baseline None is returned instead.

+
+
+
+ +
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and it’s +polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and it’s +polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, polygonal boundary, and two points on the baseline return the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (list) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (list) – A bounding polygon around the baseline (same format as +baseline).

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+kraken.lib.segmentation.extract_polygons(im, bounds)
+

Yields the subimages of image im defined in the list of bounding polygons +with baselines preserving order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (Dict[str, Any]) –

    A list of dicts in baseline:

    +
    {'type': 'baselines',
    + 'lines': [{'baseline': [[x_0, y_0], ... [x_n, y_n]],
    +            'boundary': [[x_0, y_0], ... [x_n, y_n]]},
    +           ....]
    +}
    +
    +
    +

    or bounding box format:

    +
    {'boxes': [[x_0, y_0, x_1, y_1], ...], 'text_direction': 'horizontal-lr'}
    +
    +
    +

  • +
+
+
Yields:
+

The extracted subimage

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
+
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates back the network output to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, prob). prob is the probability of the decoded label in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates back the network output to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
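A sketch decoding a dummy (C, W) softmax output (random values, purely illustrative):

>>> import numpy as np
>>> from kraken.lib.ctc_decoder import greedy_decoder
>>> outputs = np.random.rand(10, 50)        # (C, W) network output
>>> outputs /= outputs.sum(axis=0)          # normalize columns like a softmax
>>> labels = greedy_decoder(outputs)        # [(class, start, end, max), ...]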
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates back the network output to a label sequence as the original +ocropy/clstm.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+
+ +
+
+message
+
+ +
+
+width
+
+ +
+ +
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use cases. In most cases their use is not necessary and they aren't further developed for interoperability with new functionality, e.g. the transcription and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
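For example (assuming 'scan.jpg' is a placeholder path):

>>> from PIL import Image
>>> from kraken import binarization
>>> im = Image.open('scan.jpg')
>>> bw = binarization.nlbin(im)   # returns a bitonal PIL image
>>> bw.save('scan.bw.png')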
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None, records=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im (PIL.Image) – Input image

  • +
  • segmentation (dict) – Output of the segment method.

  • +
  • records (list) – A list of ocr_record objects.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[dict] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode=’rb’) to write to.

+
+
+
+ +
+ +
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/genindex.html b/4.2.0/genindex.html new file mode 100644 index 000000000..d3dfb17f4 --- /dev/null +++ b/4.2.0/genindex.html @@ -0,0 +1,677 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/gpu.html b/4.2.0/gpu.html new file mode 100644 index 000000000..b7699a4d0 --- /dev/null +++ b/4.2.0/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU acceleration both for training and recognition. Apart from a compatible Nvidia GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/index.html b/4.2.0/index.html new file mode 100644 index 000000000..f9084be35 --- /dev/null +++ b/4.2.0/index.html @@ -0,0 +1,1037 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through the on-board pip utility and the Anaconda scientific computing Python distribution are supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the pdf extras package for PyPi:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a recognition model to do the actual recognition of characters. To download the default English text recognition model and place it in the user’s kraken directory:

+
$ kraken get 10.5281/zenodo.2577813
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.2577813
+name: 10.5281/zenodo.2577813
+
+A generalized model for English printed text
+
+This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p
+scripts: Latn
+alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE
+accuracy: 99.95%
+license: Apache-2.0
+author(s): Kiessling, Benjamin
+date: 2019-02-26
+
+
+
+
+
+

Quickstart

+

An OCR pipeline consists of multiple steps, primarily preprocessing, segmentation, and recognition, each of which takes the output of the previous step and sometimes additional files such as models and templates that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or run separately:

[Flowchart: the kraken processing pipeline. An Image is fed to Segmentation (using a Segmentation Model), producing Baselines, Regions, and Order; these go to Recognition (using a Recognition Model), producing OCR Records; Serialization combines the records with an Output Template to produce the Output File.]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the default model:

+
$ kraken -i bw.tif image.txt segment -bl ocr
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded from +the European Union’s Horizon 2020 Framework Programme for Research and +Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la +Recherche au titre du Programme d’Investissements d’Avenir portant la référence +ANR-21-ESRE-0005.

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/ketos.html b/4.2.0/ketos.html new file mode 100644 index 000000000..379f6effe --- /dev/null +++ b/4.2.0/ketos.html @@ -0,0 +1,826 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text.

+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low level format, the XML-based formats that are commonly used for archival of annotation and transcription data, and in the case of recognizer training a precompiled binary format. It is recommended to use the XML formats for segmentation training and the binary format for recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523" 
+						   WIDTH="5234" 
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..." 
+							  WIDTH="..." 
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K" 
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a binary dataset format offering a couple of advantages is supported. Binary datasets drastically improve loading performance allowing the saturation of most GPUs with minimal computational overhead while also allowing training with datasets that are larger than the system's main memory. A minor drawback is a ~30% increase in dataset size in comparison to the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of individual files containing many lines this process can take a long time. It can easily be parallelized by specifying the number of separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow reproducibility and comparability between training and evaluation runs. Training, validation, and test splits can be pre-defined from multiple sources. By default they are sourced from tags defined in the source XML files unless the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The line above assigns 80% of the source lines to the training set, 10% to the validation set, and 10% to the test set. The training and validation sets in the dataset file are used automatically by ketos train (unless told otherwise) while the remaining 10% test set is selected by ketos test.
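Once compiled, the dataset file can be used directly for recognition training by selecting the binary format (a sketch; dataset.arrow is the file produced by ketos compile above):

$ ketos train -f binary dataset.arrow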

+
+
+
+

Recognition training

+

The training utility allows training of VGSL-specified models +both from scratch and from existing models. Here are its most important command line options:

=============================================== ======
option                                          action
=============================================== ======
-o, --output                                    Output model file prefix. Defaults to model.
-s, --spec                                      VGSL spec of the network to train. A CTC layer will be added automatically. default: [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]
-a, --append                                    Removes layers before the argument and then appends the spec. Only works when loading an existing model.
-i, --load                                      Load existing file to continue training.
-F, --savefreq                                  Model save frequency in epochs during training.
-q, --quit                                      Stop condition for training. Set to early for early stopping (default) or dumb for a fixed number of epochs.
-N, --epochs                                    Number of epochs to train for.
--min-epochs                                    Minimum number of epochs to train for when using early stopping.
--lag                                           Number of epochs to wait before stopping training without improvement. Only used when using early stopping.
-d, --device                                    Select device to use (cpu, cuda:0, cuda:1, ...). GPU acceleration requires CUDA.
--optimizer                                     Select optimizer (Adam, SGD, RMSprop).
-r, --lrate                                     Learning rate [default: 0.001]
-m, --momentum                                  Momentum used with SGD optimizer. Ignored otherwise.
-w, --weight-decay                              Weight decay.
--schedule                                      Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or reduceonplateau. For 1cycle the cycle length is determined by the --epochs option.
-p, --partition                                 Ground truth data partition ratio between train/validation set.
-u, --normalization                             Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.
-c, --codec                                     Load a codec JSON definition (invalid if loading an existing model).
--resize                                        Codec/output layer resizing option. If set to add, code points will be added; both will set the layer to match the training data exactly; fail will abort if training data and model codec do not match. Only valid when refining an existing model.
-n, --reorder / --no-reorder                    Reordering of code points to display order.
-t, --training-files                            File(s) with additional paths to training data. Used to enforce an explicit train/validation set split and to deal with training sets with more lines than the command line can process. Can be used more than once.
-e, --evaluation-files                          File(s) with paths to evaluation data. Overrides the -p parameter.
-f, --format-type                               Sets the training and evaluation data format. Valid choices are 'path', 'xml' (default), 'alto', 'page', or 'binary'. In alto, page, and xml mode all data is extracted from XML files containing both baselines and a link to source images. In path mode arguments are image files sharing a prefix up to the last extension with JSON .path files containing the baseline information. In binary mode arguments are precompiled binary dataset files.
--augment / --no-augment                        Enables/disables data augmentation.
--workers                                       Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.
=============================================== ======
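The manifest options (-t/--training-files and -e/--evaluation-files) are useful when a fixed validation set has to be reused or when there are more files than the shell can pass on the command line. A hedged sketch of how they can be combined with positional arguments (directory and list file names are placeholders):

$ find extra_data -name '*.xml' > train_extra.lst
$ find validation_data -name '*.xml' > val.lst
$ ketos train -f xml -t train_extra.lst -e val.lst training_data/*.xml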

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.
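Early stopping corresponds to the default -q early stop condition from the option table above. If a fixed epoch budget is preferred instead, the documented -q/-N options can be combined; a sketch with an arbitrary epoch count:

$ ketos train -q dumb -N 50 -f xml training_data/*.xml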

+

In some cases, such as color inputs, changing the network architecture might be +useful:

+
$ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta and/or +lag can be useful:

+
$ ketos train --lag 10 --min-delta 0.001 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes for dealing with mismatching alphabets, add and both. +add resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. both +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize add -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize both -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In both mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly +different, the modification of the final linear layer to add/remove characters +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and train only those +instead of training a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The sliced model will behave exactly like a model trained from scratch, except that it will +potentially train a lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many +scripts, characters with diacritics can be encoded either as a single code point +or as a base character plus the diacritic; different types of whitespace exist; and mixed bidirectional text +can be written differently depending on the base line direction.
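For example, 'é' can be stored either as the single precomposed code point U+00E9 or as 'e' followed by the combining acute accent U+0301; both render identically but yield different label sequences for the codec. A quick check with Python's standard unicodedata module (not a kraken command) shows the length difference between the two forms:

$ python -c "import unicodedata as u; print(len(u.normalize('NFC', 'e\u0301')), len(u.normalize('NFD', '\u00e9')))"
1 2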

+

Ketos provides options to normalize input into consistent forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for Unicode normalization and one for whitespace normalization. The +Unicode normalization switch (disabled by default) allows one to select one of +the four normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+

Whitespace normalization is enabled by default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled, as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to model accurately and map onto the different space code +points. Nevertheless, it can be disabled with:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further, the behavior of the BiDi algorithm can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a codec) in the order a line is fed into the network, i.e. +left-to-right, also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order in which the characters in a line are read +by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction, which is simply the default direction of the input fields used +when the ground truth was initially transcribed. The base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it; otherwise it will have to be supplied if it differs from the +default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
+

It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the label decoded from the raw network output and Unicode +code points (see this diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation.

+

The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual.

+

There are multiple approaches one could follow when constructing a custom codec: +randomized block codes, i.e. producing random fixed-length labels for each code +point; Huffman coding, i.e. variable-length label sequences depending on the +frequency of each code point in some text (not necessarily the training set); +or structural decomposition, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme, similar to how some +input systems for Chinese characters function.

+

While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
+
+
+

Unsupervised recognition pretraining

+

Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices.

+

All data sources accepted by the supervised trainer are valid for pretraining.

+

The basic pretraining call is very similar to a training one:

+
$ ketos pretrain -f binary foo.arrow
+
+
+
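Since all supervised data sources are accepted, pretraining can also be run directly from XML files rather than a compiled dataset; a hedged sketch assuming the same -f/--format-type switch as ketos train:

$ ketos pretrain -f xml training_data/*.xml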

There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples.

+
$ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow
+
+
+

Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced:

+
$ ketos train -i pretrain_best.mlmodel --warmup 5000 -f binary labelled.arrow
+
+
+

It is recommended to use learning rate warmup (the --warmup option) for at least one epoch +to improve convergence of the pretrained model.

+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text +recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+val check  [------------------------------------]  0/0
+
+
+

This takes all text lines and regions encoded in the XML files and trains a +model to recognize them.

+

Most other options available in transcription training are also available in +segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of all of +either baseline or region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options are combinable to massage the dataset into any typology you want.
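For instance, a hedged combination of the switches documented above that keeps only the default line type and drops region training entirely:

$ ketos segtrain -f xml --valid-baselines default --suppress-regions training_data/*.xml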

+

Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is +upwards by default, but as it is also possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese), +the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Recognition Testing

+

Picking a particular model from a pool or getting a more detailed look at the +recognition accuracy can be done with the test command. It takes transcribed +lines (the test set) in the same format as the train command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them.

=============================================== ======
option                                          action
=============================================== ======
-f, --format-type                               Sets the test set data format. Valid choices are 'path', 'xml' (default), 'alto', 'page', or 'binary'. In alto, page, and xml mode all data is extracted from XML files containing both baselines and a link to source images. In path mode arguments are image files sharing a prefix up to the last extension with JSON .path files containing the baseline information. In binary mode arguments are precompiled binary dataset files.
-m, --model                                     Model(s) to evaluate.
-e, --evaluation-files                          File(s) with paths to evaluation data.
-d, --device                                    Select device to use.
--pad                                           Left and right padding around lines.
=============================================== ======

+

Transcriptions are handed to the command in the same way as for the train +command, either through a manifest with -e/--evaluation-files or by just +adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will contain the average accuracy and the standard deviation across all of them.
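To compare several candidate models in a single run the -m option can presumably be repeated, as the plural in its description suggests; a sketch with placeholder file names:

$ ketos test -f binary -m model_1.mlmodel -m model_2.mlmodel dataset.arrow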

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/models.html b/4.2.0/models.html new file mode 100644 index 000000000..34e659892 --- /dev/null +++ b/4.2.0/models.html @@ -0,0 +1,126 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural +networks that do all the character recognition supported by kraken: pronn +files serializing old pickled pyrnn models as protobuf, clstm's native +serialization, and versatile Core ML models.

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken.

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/objects.inv b/4.2.0/objects.inv new file mode 100644 index 000000000..3df3867ff Binary files /dev/null and b/4.2.0/objects.inv differ diff --git a/4.2.0/search.html b/4.2.0/search.html new file mode 100644 index 000000000..3134374c5 --- /dev/null +++ b/4.2.0/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

Search

+ + + + +

+ Searching for multiple words only shows matches that contain + all words. +

+ + +
+ + + +
+ + +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/searchindex.js b/4.2.0/searchindex.js new file mode 100644 index 000000000..acea7a836 --- /dev/null +++ b/4.2.0/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ALTO": [[5, "alto"]], "API Quickstart": [[1, null]], "API Reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline Segmentation": [[0, "baseline-segmentation"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Binarization": [[0, "binarization"]], "Binary Datasets": [[5, "binary-datasets"]], "Codecs": [[5, "codecs"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dataset Compilation": [[7, "dataset-compilation"]], "Datasets": [[2, "datasets"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Finding Recognition Models": [[4, "finding-recognition-models"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "Funding": [[4, "funding"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Helpers": [[2, "helpers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input Specification": [[0, "input-specification"]], "Installation": [[4, "installation"]], "Installation using Conda": [[4, "installation-using-conda"]], "Installation using Pip": [[4, "installation-using-pip"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy Box Segmentation": [[0, "legacy-box-segmentation"]], "Legacy modules": [[2, "legacy-modules"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Loss and Evaluation Functions": [[2, "loss-and-evaluation-functions"]], "Masking": [[0, "masking"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[6, null]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation": [[0, "page-segmentation"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Principal Text Direction": [[0, "principal-text-direction"]], "Publishing": [[0, "publishing"]], "Querying and Model Retrieval": [[0, "querying-and-model-retrieval"]], "Quickstart": [[4, "quickstart"]], "Recognition": [[0, "recognition"], [1, "recognition"], [7, "recognition"]], "Recognition Models": [[6, "recognition-models"]], "Recognition Testing": [[5, "recognition-testing"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Related Software": [[4, "related-software"]], "Reshape": [[8, "reshape"]], "Segmentation Models": [[6, "segmentation-models"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"]], "Slicing": [[5, "slicing"]], "Text Normalization and Unicode": [[5, "text-normalization-and-unicode"]], "Trainer": [[2, "trainer"]], "Training": [[1, "training"], [5, null], [7, "compilation"]], "Training Schedulers": [[2, "training-schedulers"]], "Training Stoppers": [[2, "training-stoppers"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "Unsupervised recognition pretraining": [[5, 
"unsupervised-recognition-pretraining"]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.lib.codec module": [[2, "kraken-lib-codec-module"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.exceptions": [[2, "kraken-lib-exceptions"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"add() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.add", false]], "add() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add", false]], "add() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add", false]], "add_codec() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.add_codec", false]], "add_labels() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.add_labels", false]], "add_page() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.add_page", false]], "alphabet (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.alphabet", false]], "alphabet (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.alphabet", false]], "append() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.append", false]], "aug (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.aug", false]], "aug (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.aug", false]], "aug (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.aug", false]], "aux_layers (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.aux_layers", false]], "base_dir (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.base_dir", false]], "baselineset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.BaselineSet", false]], "beam_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.beam_decoder", false]], "bidi_record() (in module kraken.rpred)": [[2, "kraken.rpred.bidi_record", false]], 
"bidi_reordering (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bidi_reordering", false]], "blank_threshold_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.blank_threshold_decoder", false]], "blocks (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.blocks", false]], "bounds (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bounds", false]], "build_addition() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_addition", false]], "build_conv() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_conv", false]], "build_dropout() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_dropout", false]], "build_groupnorm() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_groupnorm", false]], "build_identity() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_identity", false]], "build_maxpool() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_maxpool", false]], "build_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_output", false]], "build_parallel() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_parallel", false]], "build_reshape() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_reshape", false]], "build_rnn() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_rnn", false]], "build_series() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_series", false]], "build_wav2vec2() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_wav2vec2", false]], "c_sorted (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.c_sorted", false]], "calculate_polygonal_environment() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.calculate_polygonal_environment", false]], "class_mapping (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_mapping", false]], "class_stats (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_stats", false]], "codec (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.codec", false]], "codec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.codec", false]], "compute_error() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.compute_error", false]], "compute_polygon_section() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.compute_polygon_section", false]], "confidences (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.confidences", false]], "criterion (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id0", false], [2, "kraken.lib.vgsl.TorchVGSLModel.criterion", false]], "cuts (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.cuts", false]], "decode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.decode", false]], "decoder (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.decoder", false]], "denoising_hysteresis_thresh() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.denoising_hysteresis_thresh", false]], "device 
(kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.device", false]], "encode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.encode", false]], "encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.encode", false]], "encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.encode", false]], "env (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.env", false]], "eval() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.eval", false]], "extract_polygons() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.extract_polygons", false]], "filtered_tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.filtered_tags", false]], "fit() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.fit", false]], "font (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.font", false]], "forward() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.forward", false]], "greedy_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.greedy_decoder", false]], "groundtruthdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.GroundTruthDataset", false]], "height (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id5", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.height", false]], "hyper_params (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.hyper_params", false]], "idx (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.idx", false]], "ignore_empty_lines (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.ignore_empty_lines", false]], "ignore_empty_lines (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.ignore_empty_lines", false]], "im (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im", false]], "im_mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.im_mode", false]], "im_mode (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.im_mode", false]], "im_mode (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.im_mode", false]], "im_str (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im_str", false]], "imgs (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.imgs", false]], "init_weights() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.init_weights", false]], "input (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id1", false], [2, "kraken.lib.vgsl.TorchVGSLModel.input", false]], "is_valid (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.is_valid", false]], "kind (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.kind", false]], "krakencairosurfaceexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCairoSurfaceException", false]], "krakencodecexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCodecException", false]], "krakenencodeexception (class in kraken.lib.exceptions)": [[2, 
"kraken.lib.exceptions.KrakenEncodeException", false]], "krakeninputexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInputException", false]], "krakeninvalidmodelexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInvalidModelException", false]], "krakenrecordexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRecordException", false]], "krakenrepoexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRepoException", false]], "krakenstoptrainingexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenStopTrainingException", false]], "krakentrainer (class in kraken.lib.train)": [[2, "kraken.lib.train.KrakenTrainer", false]], "l2c (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c", false]], "line_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id6", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "miss (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.miss", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id2", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.rpred)": [[2, "kraken.rpred.ocr_record", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode 
(kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id3", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "parse() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.parse", false]], "parse() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.parse", false]], "parse_alto() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_alto", false]], "parse_page() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_page", false]], "parse_xml() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_xml", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.prediction", false]], "preparse_xml_data() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.preparse_xml_data", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type 
(kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], "serialize_segmentation() (in module kraken.serialization)": [[2, "kraken.serialization.serialize_segmentation", false]], "set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "suffix (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.suffix", false]], "tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags", false]], "tags (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "ts (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.ts", false]], "type (kraken.rpred.ocr_record attribute)": 
[[2, "kraken.rpred.ocr_record.type", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id4", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id7", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 2, 1, "", "add_labels"], [2, 3, 1, "", "c_sorted"], [2, 2, 1, "", "decode"], [2, 2, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 3, 1, "", "l2c"], [2, 4, 1, "", "max_label"], [2, 2, 1, "", "merge"], [2, 3, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "PolygonGTDataset"], [2, 0, 1, "", "compute_error"], [2, 0, 1, "", "preparse_xml_data"]], "kraken.lib.dataset.BaselineSet": [[2, 2, 1, "", "add"], [2, 3, 1, "", "aug"], [2, 3, 1, "", "class_mapping"], [2, 3, 1, "", "class_stats"], [2, 3, 1, "", "im_mode"], [2, 3, 1, "", "imgs"], [2, 3, 1, "", "line_width"], [2, 3, 1, "", "mbl_dict"], [2, 3, 1, "", "mode"], [2, 3, 1, "", "mreg_dict"], [2, 3, 1, "", "num_classes"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "targets"], [2, 2, 1, "", "transform"], [2, 3, 1, "", "transforms"], [2, 3, 1, "", "valid_baselines"], [2, 3, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "ignore_empty_lines"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "split"], [2, 3, 1, "", "suffix"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "ignore_empty_lines"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], [2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 3, 1, "id5", "height"], [2, 3, 1, "id6", "message"], [2, 3, 1, "id7", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 3, 1, "", "codec"], [2, 3, 1, "", "decoder"], [2, 3, 1, "", "device"], [2, 2, 1, "", 
"forward"], [2, 3, 1, "", "kind"], [2, 3, 1, "", "nn"], [2, 3, 1, "", "one_channel_mode"], [2, 2, 1, "", "predict"], [2, 2, 1, "", "predict_labels"], [2, 2, 1, "", "predict_string"], [2, 3, 1, "", "seg_type"], [2, 2, 1, "", "to"], [2, 3, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "denoising_hysteresis_thresh"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", "vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], "kraken.lib.train.KrakenTrainer": [[2, 2, 1, "", "fit"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 2, 1, "", "add_codec"], [2, 2, 1, "", "append"], [2, 4, 1, "", "aux_layers"], [2, 3, 1, "", "blocks"], [2, 2, 1, "", "build_addition"], [2, 2, 1, "", "build_conv"], [2, 2, 1, "", "build_dropout"], [2, 2, 1, "", "build_groupnorm"], [2, 2, 1, "", "build_identity"], [2, 2, 1, "", "build_maxpool"], [2, 2, 1, "", "build_output"], [2, 2, 1, "", "build_parallel"], [2, 2, 1, "", "build_reshape"], [2, 2, 1, "", "build_rnn"], [2, 2, 1, "", "build_series"], [2, 2, 1, "", "build_wav2vec2"], [2, 3, 1, "", "codec"], [2, 3, 1, "id0", "criterion"], [2, 2, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 3, 1, "", "idx"], [2, 2, 1, "", "init_weights"], [2, 3, 1, "id1", "input"], [2, 2, 1, "", "load_model"], [2, 3, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 3, 1, "", "named_spec"], [2, 3, 1, "id2", "nn"], [2, 4, 1, "id3", "one_channel_mode"], [2, 3, 1, "", "ops"], [2, 3, 1, "", "pattern"], [2, 2, 1, "", "resize_output"], [2, 2, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 2, 1, "", "set_num_threads"], [2, 3, 1, "", "spec"], [2, 2, 1, "", "to"], [2, 2, 1, "", "train"], [2, 3, 1, "id4", "user_metadata"]], "kraken.lib.xml": [[2, 0, 1, "", "parse_alto"], [2, 0, 1, "", "parse_page"], [2, 0, 1, "", "parse_xml"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 0, 1, "", "bidi_record"], [2, 1, 1, "", "mm_rpred"], [2, 1, 1, "", "ocr_record"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 3, 1, "", "bidi_reordering"], [2, 3, 1, "", "bounds"], [2, 3, 1, "", "filtered_tags"], [2, 3, 1, "", "im"], [2, 3, 1, "", "im_str"], [2, 3, 1, "", "miss"], [2, 3, 1, "", "nets"], [2, 3, 1, "", "one_channel_modes"], [2, 3, 1, "", "pad"], [2, 3, 1, "", "seg_types"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "tags_ignore"], [2, 3, 1, "", "ts"]], "kraken.rpred.ocr_record": [[2, 3, 1, "", "base_dir"], [2, 3, 1, "", "confidences"], [2, 3, 1, "", "cuts"], [2, 3, 1, "", "prediction"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "type"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"], [2, 0, 1, "", "serialize_segmentation"]], "kraken.transcribe": [[2, 1, 1, "", "TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 2, 1, "", "add_page"], [2, 3, 1, "", "env"], [2, 3, 1, "", "font"], [2, 3, 1, "", "line_idx"], [2, 3, 1, "", "page_idx"], [2, 3, 1, "", "pages"], [2, 3, 1, "", "seg_idx"], [2, 3, 1, "", "text_direction"], [2, 3, 1, "", "tmpl"], [2, 2, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "method", "Python method"], "3": ["py", "attribute", "Python attribute"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": 
"py:class", "2": "py:method", "3": "py:attribute", "4": "py:property"}, "terms": {"": [0, 1, 2, 4, 5, 6, 7, 8], "0": [0, 1, 2, 4, 5, 7, 8], "00": [0, 5, 7], "0001": 5, "0005": 4, "001": [5, 7], "0123456789": [0, 4, 7], "01c59": 8, "02": 4, "0245": 7, "04": 7, "06": [0, 7], "07": [0, 5], "09": [0, 7], "0d": 7, "0xe8e5": 0, "0xf038": 0, "0xf128": 0, "1": [0, 1, 2, 5, 7, 8], "10": [0, 1, 4, 5, 7], "100": [0, 2, 5, 7, 8], "10000": 4, "1015": 1, "1020": 8, "10218": 5, "1024": 8, "103": 1, "105": 1, "106": 5, "108": 5, "11": 7, "1128": 5, "11346": 5, "1161": 1, "117": 1, "1184": 7, "119": 1, "1195": 1, "12": [5, 7, 8], "120": 5, "1200": 5, "121": 1, "122": 5, "124": 1, "125": 5, "126": 1, "128": [5, 8], "13": [5, 7], "131": 1, "132": 7, "1339": 7, "134": 5, "135": 5, "1359": 7, "136": 5, "1377": 1, "1385": 1, "1388": 1, "1397": 1, "14": [0, 5], "1408": [1, 2], "1410": 1, "1412": 1, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "16": [0, 2, 5, 8], "161": 7, "1623": 7, "1681": 7, "1697": 7, "17": [2, 5], "1708": 1, "1716": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1824": 1, "19": [1, 5], "192": 5, "198": 5, "199": 5, "1996": 7, "1bpp": 0, "1cycl": 5, "1d": 8, "1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [0, 2, 4, 5, 7, 8], "20": [1, 2, 5, 8], "200": 5, "2000": 1, "2001": 5, "2006": 2, "2014": 2, "2016": 1, "2017": 1, "2019": [4, 5], "2020": 4, "2021": 0, "204": 7, "2041": 1, "207": [1, 5], "2072": 1, "2077": 1, "2078": 1, "2096": 7, "21": 4, "210": 5, "215": 5, "216": 1, "22": [0, 5, 7], "228": 1, "23": [0, 5], "230": 1, "232": 1, "2334": 7, "2364": 7, "23rd": 2, "24": [0, 1, 7], "241": 5, "2426": 1, "246": 5, "2483": 1, "25": [1, 5, 7, 8], "250": 1, "2500": 7, "253": 1, "256": [5, 7, 8], "2577813": 4, "259": 7, "26": [4, 7], "266": 5, "27": 5, "270": 7, "27046": 7, "274": 5, "28": [1, 5], "2873": 2, "29": [0, 1, 5], "2d": [2, 8], "3": [2, 5, 7, 8], "30": [5, 7], "300": 5, "300dpi": 7, "307": 7, "31": 5, "32": [5, 8], "328": 5, "3292": 1, "336": 7, "3367": 1, "3398": 1, "3414": 1, "3418": 7, "3437": 1, "345": 1, "3455": 1, "35000": 7, "3504": 7, "3514": 1, "3519": 7, "35619": 7, "365": 7, "3680": 7, "38": 5, "384": 8, "39": 5, "4": [1, 2, 5, 7, 8], "40": 7, "400": 5, "4000": 5, "428": 7, "431": 7, "46": 5, "47": 7, "471": 1, "473": 1, "48": [5, 7, 8], "488": 7, "49": [0, 5, 7], "491": 1, "4d": 2, "5": [1, 2, 5, 7, 8], "50": [5, 7], "500": 5, "5000": 5, "509": 1, "512": 8, "515": 1, "52": [5, 7], "522": 1, "5226": 5, "523": 5, "5230": 5, "5234": 5, "524": 1, "5258": 7, "5281": [0, 4], "53": 5, "534": 1, "536": [1, 5], "53980": 5, "54": 1, "54114": 5, "5431": 5, "545": 7, "5468665": 0, "56": [0, 1, 7], "5617734": 0, "5617783": 0, "562": 1, "575": [1, 5], "577": 7, "59": [7, 8], "5951": 7, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "62": 5, "63": 5, "64": [5, 8], "646": 7, "6542744": 0, "66": [5, 7], "668": 1, "69": 1, "7": [1, 5, 7, 8], "70": 1, "7012": 5, "7015": 7, "71": 5, "7272": 7, "7281": 7, "73": 1, "74": [1, 5], "7593": 5, "76": 1, "773": 5, "7857": 5, "788": [5, 7], "79": 1, "794": 5, "7943": 5, "8": [0, 5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": [1, 7], "8445": 7, "8479": 7, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, 
"8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [1, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "92": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [0, 4, 5], "9541": 7, "9550": 7, "96": [5, 7], "97": 7, "98": 7, "99": [4, 7], "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], "As": [0, 1, 2], "BY": 0, "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7], "Its": 0, "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "With": 0, "aaebv2": 0, "abbyyxml": [0, 4], "abcdefghijklmnopqrstuvwxyz": 4, "abcdefghijklmnopqrstuvxabcdefghijklmnopqrstuvwxyz": 0, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 5, 7], "absolut": [2, 5], "abugida": 5, "acceler": [4, 5, 7], "accent": 0, "accept": [0, 2, 5], "access": [0, 1], "access_token": 0, "accord": [0, 2, 5], "accordingli": 2, "account": [0, 7], "accur": 5, "accuraci": [0, 1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [0, 5, 7, 8], "actual": [2, 4, 5, 7], "acut": 0, "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [0, 2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "administr": 0, "advantag": 5, "advis": 7, "affect": 7, "after": [0, 1, 5, 7, 8], "afterward": [0, 1], "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": 4, "aim": 5, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algorithm": [0, 1, 2, 5], "all": [0, 1, 2, 4, 5, 6, 7], "allow": [5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [0, 2, 4, 5, 7, 8], "alphanumer": 0, "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [0, 5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 2, 4, 7], "alto_doc": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 7], "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analogu": 0, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [0, 2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [0, 5, 7], "anyth": 2, "apach": 4, "apart": [0, 3, 5], "apdjfqpf": 2, "api": 5, "append": [0, 2, 5, 7, 8], "appli": [0, 1, 2, 4, 7, 8], "applic": [1, 7], "approach": [5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approv": 0, "approxim": 1, "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": [0, 2], "aren": 2, "arg": 2, "arg0": 2, "argument": [1, 5], "argx": 2, "arian": 0, "arm": 4, "around": [0, 1, 2, 5, 7], "arrai": [1, 2], "arrow": 5, "arxiv": 2, "ask": 0, "aspect": 2, "assign": [2, 5, 7], "associ": 1, "assum": 2, "attach": [1, 5], "attribut": [0, 1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "authorship": 0, "auto": [1, 2, 5], "autodetermin": 2, "automat": [0, 1, 2, 5, 7, 8], "aux_lay": 2, "auxilari": 0, "auxiliari": 1, "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awni": 2, "axi": [2, 8], "b": [0, 1, 5, 7, 8], "back": 2, "backend": 3, "background": [0, 2], "bar": 2, "base": [1, 2, 5, 6, 7, 8], "base_dir": 2, "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 7, 8], 
"bayr\u016bt": 7, "bbox": 2, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 7], "becom": 0, "been": [0, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": 5, "being": [1, 2, 5, 8], "below": [0, 5, 7], "benjamin": 4, "best": [2, 5, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "bidi": [2, 4, 5], "bidi_record": 2, "bidi_reord": 2, "bidirect": [2, 5], "bidirection": 8, "binar": [1, 7], "binari": [0, 1, 2], "bind": 0, "bit": 1, "biton": 2, "bl": [0, 4], "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [0, 1, 2, 5, 8], "block_i": 5, "block_n": 5, "board": 4, "boilerpl": 1, "book": 0, "bookhand": 0, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": [0, 1, 2, 5], "box": [1, 2, 4, 5], "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_seri": 2, "build_wav2vec2": 2, "buld\u0101n": 7, "bw": [0, 4], "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [0, 1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 2], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": 1, "can": [0, 1, 2, 3, 4, 5, 7, 8], "cannot": 0, "capabl": [0, 5], "case": [0, 1, 2, 5, 7], "cat": 0, "categori": 2, "caus": [1, 2], "caveat": 5, "cc": 0, "cd": 4, "ce": [4, 7], "cell": 8, "cent": 7, "centerlin": 5, "central": [4, 7], "certain": [0, 2, 7], "chain": [0, 4, 7], "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charset": 2, "check": [0, 5], "chines": [0, 5], "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumst": 7, "class": [0, 1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "clone": 4, "close": 4, "closer": 1, "clstm": [0, 2, 6], "code": [0, 1, 2, 4, 5, 7], "codec": 1, "coher": 0, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [4, 7], "combin": [0, 1, 5, 7, 8], "come": 2, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "commun": 0, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compos": 2, "composedblocktyp": 5, "composit": 0, "compound": 2, "compress": 7, "compris": 7, "comput": [0, 2, 3, 4, 5, 7], "computation": 7, "compute_error": 2, "compute_polygon_sect": 2, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5], "conform": 5, "confus": 5, "connect": [2, 7], "connectionist": 2, "consid": 2, "consist": [0, 1, 4, 7, 8], "constant": 5, "construct": [5, 7], "contain": [0, 1, 2, 4, 5, 6, 7], "contemporari": 0, "content": 5, "continu": [0, 1, 2, 5, 7], "contrast": [5, 7], "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "converg": 5, "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [0, 2, 4], "core": 6, "coreml": 2, "corpu": [4, 5], "correct": [0, 1, 2, 5, 7], "correspond": [0, 1, 2], "cosin": 5, "cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "cover": 0, "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "creat": [0, 2, 4, 5, 7, 8], 
"creation": 0, "cremma": 0, "cremma_medieval_bicerin": 0, "criterion": 2, "css": 0, "ctc": [1, 2, 5], "ctc_decod": 1, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [2, 5, 6], "curv": 0, "custom": [1, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataset": 1, "dataset_larg": 5, "date": [0, 4], "de": [4, 7], "deal": [0, 5], "debug": [1, 5, 7], "decai": 5, "decid": 0, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "deem": 0, "def": 1, "default": [0, 1, 2, 4, 5, 6, 7, 8], "default_split": 2, "defin": [0, 1, 2, 4, 5, 8], "definit": [5, 8], "degrad": 1, "degre": 7, "del_indic": 2, "delet": [0, 2, 5, 7], "delta": 5, "denoising_hysteresis_thresh": 2, "denot": 0, "depend": [0, 1, 4, 5, 7], "deposit": 0, "deprec": 0, "depth": [5, 7, 8], "describ": [2, 5], "descript": [0, 2, 5], "descriptor": 2, "deseri": 2, "desir": [1, 8], "desktop": 7, "destin": 2, "destroi": 5, "detail": [0, 5, 7], "detect": [0, 2], "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diacrit": 5, "diaeres": 7, "diaeresi": 7, "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [2, 5], "differ": [0, 1, 4, 5, 7, 8], "difficult": 5, "digit": 5, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": [2, 5], "direct": [1, 2, 4, 5, 7, 8], "directli": [0, 5], "directori": [1, 2, 4, 5, 7], "disabl": [0, 2, 5, 7], "discover": 0, "disk": 7, "displai": [2, 5], "dist1": 2, "dist2": 2, "distanc": 2, "distinguish": 5, "distractor": 5, "distribut": 8, "dnn": 2, "do": [0, 1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "doc": 0, "document": [0, 1, 2, 4, 5, 7], "doe": [0, 1, 2, 5, 7], "doesn": [2, 5, 7], "doi": 0, "domain": [1, 5], "done": [0, 5, 7], "dot": 7, "down": 7, "download": [0, 4, 7], "downward": 2, "drastic": 5, "draw": 1, "drawback": [0, 5], "driver": 1, "drop": [1, 8], "dropout": [2, 5, 7], "du": 4, "dumb": 5, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": [2, 7], "editor": 7, "edu": 7, "effect": 0, "either": [0, 2, 5, 7, 8], "element": 5, "emit": 2, "emploi": [0, 7], "empti": 2, "enabl": [1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2], "end_separ": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escripta": 4, "escriptorium": [4, 7], "especi": 0, "esr": 4, "estim": [0, 2, 7], "et": 2, "etc": 0, "european": 4, "eval": 2, "evalu": 5, "evaluation_data": 1, "evaluation_fil": 1, "even": [0, 7], "everi": 0, "everyth": 5, "exact": [5, 7], "exactli": [1, 5], "exampl": [0, 1, 5, 7], "except": [1, 5], "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 7], "experiment": 7, "explic": 0, "explicit": [1, 5], "explicitli": [5, 7], "exponenti": 5, "express": 0, "extend": 8, "extens": [0, 5], "extent": 7, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "f": [0, 4, 5, 7, 8], "f_t": 2, "factor": [0, 2], "fail": 5, "faint": 0, "fairli": 7, "fallback": 0, "fals": [1, 2, 5, 7, 8], "fame": 0, "fancy_model": 0, "faq\u012bh": 7, "fashion": 5, "faster": [5, 7, 8], "fd": 2, "featur": [0, 1, 2, 5, 7, 8], 
"fed": [0, 1, 2, 5, 8], "feed": [0, 1], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [2, 5], "figur": 1, "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "fill": 2, "filter": [1, 2, 5, 8], "filtered_tag": 2, "final": [0, 2, 4, 5, 7, 8], "find": [0, 5, 7], "fine": [1, 7], "finish": 7, "first": [0, 1, 2, 5, 7, 8], "fit": [1, 2, 7], "fix": [0, 5, 7], "flag": [1, 2, 4], "float": [0, 2], "flow": 0, "flush": 2, "fname": 2, "follow": [0, 2, 5, 8], "font": 2, "font_styl": 2, "foo": [1, 2, 5], "forc": 0, "foreground": 0, "forg": 4, "form": [0, 2, 5], "format": [0, 1, 2, 6, 7], "format_typ": [1, 2], "formul": 8, "forward": [2, 8], "found": [0, 1, 2, 5, 7], "four": 0, "fp": 1, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "french": 0, "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4], "function": [1, 5], "fundament": 1, "further": [1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gain": 1, "garantue": 2, "gaussian_filt": 2, "gener": [0, 1, 2, 4, 5, 7], "gentl": 5, "get": [0, 1, 4, 5, 7], "git": 4, "github": 4, "githubusercont": 7, "gitter": 4, "given": [1, 2, 5, 8], "glob": [0, 1], "glori": 0, "glyph": [5, 7], "gn": 8, "gn32": 5, "go": 7, "good": 5, "gov": 5, "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphic": 5, "grave": 2, "grayscal": [0, 1, 2, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 4, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [0, 1, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "heatmap": [0, 1], "hebrew": [0, 5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "help": [4, 7], "here": [0, 5], "heurist": 0, "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "histor": 4, "hline": 0, "hoc": 5, "hocr": [0, 2, 4, 7], "honor": 0, "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 5, 7], "hpo": 5, "html": 2, "http": [4, 5, 7], "huffmann": 5, "human": 5, "hundr": 7, "hyper_param": 2, "hyperparamet": 5, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": 5, "ident": 1, "identifi": 0, "idx": 2, "ignor": [0, 2, 5], "ignore_empty_lin": 2, "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_str": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_nam": [1, 2], "image_s": [1, 2], "imagefilenam": 5, "imaginari": 7, "img": 2, "immedi": 5, "impath": 2, "implement": [0, 1, 8], "implicitli": 5, "import": [0, 1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "inclus": 0, "incompat": 2, "incorrect": 7, "increas": [5, 7], "independ": 8, "index": [0, 2, 5], "indic": [0, 2, 5, 7], "individu": [0, 5], "infer": [2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [0, 1, 2, 5, 7, 8], "inlin": 0, "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "insert": [2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": 7, "instal": 3, "instanc": [0, 1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interact": 0, "interchang": 
2, "interfac": [2, 4], "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "irregular": 5, "is_valid": 2, "isn": [1, 2, 7, 8], "iter": [1, 2, 7], "its": [0, 2, 5, 7], "itself": 1, "j": 2, "jinja2": [1, 2], "jpeg": [0, 7], "jpeg2000": [0, 4], "jpg": [0, 5], "json": [0, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "kamil": 5, "keep": 0, "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [0, 5, 7], "keyword": 0, "kiessl": 4, "kind": [0, 2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "la": 4, "label": [0, 1, 2, 5], "lack": 7, "lag": 5, "languag": [2, 5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [0, 2, 5, 8], "lastli": 5, "later": [0, 7], "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": [5, 8], "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [5, 7], "leav": [5, 8], "lectaurep": 0, "left": [0, 2, 4, 5, 7], "leftward": 0, "legaci": [5, 7, 8], "leipzig": 7, "len": 2, "length": [2, 5], "less": 7, "let": 7, "letter": 0, "level": [0, 1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": [5, 8], "lib": 1, "libr": 4, "librari": 1, "licens": 0, "light": 0, "lightn": 1, "lightningmodul": 1, "lightweight": 4, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": [0, 5], "line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_idx": 2, "line_k": 5, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 2, 4, 5, 7], "litteratur": 0, "ll": 4, "load": [1, 2, 4, 5, 7], "load_ani": [1, 2], "load_model": [1, 2], "loadabl": 2, "loader": 1, "loc": 5, "locat": [1, 2, 5, 7], "log": [5, 7], "logic": 5, "logograph": 5, "long": [0, 5], "longest": 2, "look": [1, 5, 7], "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": [0, 2], "m": [0, 2, 5, 7, 8], "mac": [4, 7], "machin": 2, "macron": 0, "maddah": 7, "made": 7, "mai": [0, 2, 5, 7], "main": [4, 5], "mainli": [1, 2], "major": 1, "make": [0, 5], "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [0, 1, 2, 7], "manuscript": [0, 7], "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 2, 5], "massag": 5, "master": 7, "match": [2, 5], "materi": [0, 1, 4, 7], "matrix": 1, "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mb": 0, "mbl_dict": 2, "mean": [1, 2, 7], "measur": 5, "measurementunit": 5, "mediev": 0, "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [0, 1, 2], "might": [0, 5, 7], "min": [2, 5], "min_epoch": 2, "min_length": 2, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [4, 7], "mix": [0, 2, 5], "ml": 6, "mlmodel": [0, 5, 7], "mm_rpred": [1, 2], "mode": [0, 1, 2, 5], "model": [1, 5, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, 
"model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [0, 4, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "mono": 0, "more": [0, 1, 2, 4, 5, 7, 8], "most": [0, 1, 2, 5, 7], "mostli": [0, 1, 2, 4, 5, 7, 8], "move": [2, 7, 8], "mp": 8, "mp2": [5, 8], "mp3": [5, 8], "mreg_dict": 2, "much": [1, 2, 4, 5], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 4, 5, 7], "myprintingcallback": 1, "n": [2, 5, 8], "name": [0, 2, 4, 7, 8], "named_spec": 2, "national": 4, "nativ": 6, "natur": [2, 7], "naugment": 4, "nchw": 2, "ndarrai": 2, "necessari": [0, 2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 7], "neg": 5, "net": [1, 2, 7], "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "never": 7, "nevertheless": [1, 5], "new": [0, 2, 3, 5, 7, 8], "next": [1, 7], "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": 5, "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [0, 2, 5, 7, 8], "nonlinear": 8, "nop": 1, "normal": 2, "notabl": 0, "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": [2, 5], "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 7], "numpi": [1, 2], "nvidia": 3, "o": [0, 1, 4, 5, 7], "o1c103": 8, "object": [1, 2], "obtain": 7, "obvious": 7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_lin": 0, "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "ocrx_word": 0, "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [0, 1, 5, 7], "old": [0, 6], "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "onc": [0, 5], "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": 5, "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 2, "open": 1, "openmp": [2, 5, 7], "oper": [1, 2, 8], "optic": [0, 7], "optim": [0, 4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 1, 2, 4, 5, 8], "org": 5, "orient": [0, 1], "origin": [1, 2, 5], "orthogon": 2, "other": [0, 5, 7, 8], "otherwis": [2, 5], "out": [0, 5, 7, 8], "output": [0, 1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "over": 2, "overfit": 7, "overhead": 5, "overlap": 5, "overrid": [2, 5], "overwritten": 2, "p": [0, 4, 5], "packag": [4, 7], "pad": [0, 2, 5], "padding_left": 2, "padding_right": 2, "page": [1, 2, 4, 7], "page_doc": 1, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagexml": [0, 1, 2, 4, 7], "paint": 5, "pair": [0, 2], "paper": 0, "par": [1, 4], "paradigm": 0, "paragraph": [0, 5], "parallel": [2, 5], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "parchment": 0, "pars": [2, 5], "parse_alto": [1, 2], "parse_pag": [1, 2], "parse_xml": 2, "parser": [1, 2, 5], "part": [0, 1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlib": 2, "pattern": [2, 7], "pb_ignored_metr": 2, "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [0, 1, 2, 5, 7], "perc": [0, 2], "percentag": 2, "percentil": 2, "perform": [1, 2, 4, 5, 7], "period": 7, "persist": 0, "person": 0, "pick": 5, "pickl": 6, "pil": [1, 2], "pillow": 1, "pinch": 0, "pinpoint": 7, "pipelin": 1, "pixel": [0, 1, 5, 8], "pl_modul": 1, "place": [0, 4, 7], "placement": 7, "plain": 0, "platform": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [0, 1, 2, 5, 7], "polygon": [0, 1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "popul": 2, "porson": 0, "portant": 4, 
"portion": 0, "posit": [2, 5], "possibl": [0, 1, 2, 5, 7], "postprocess": [1, 5], "potenti": 5, "power": 7, "practic": 5, "pratiqu": 4, "pre": [0, 5], "precis": 5, "precompil": 5, "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preliminari": 0, "preload": 7, "prematur": 5, "prepar": 7, "preparse_xml_data": 2, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "preserv": 2, "pretrain_best": 5, "prevent": [2, 7], "previou": 4, "previous": 5, "primaresearch": 5, "primari": [0, 1, 5], "primarili": 4, "princip": [1, 2, 5], "print": [0, 1, 4, 5, 7], "printspac": 5, "privat": 0, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "processing_step": 2, "produc": [0, 1, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": 6, "proper": [1, 2], "properli": 7, "properti": 2, "proport": 5, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": [0, 4], "pull": 4, "purpos": [0, 1, 2, 7, 8], "put": [0, 2, 7], "py": 1, "pypi": 4, "pyrnn": 6, "python": 4, "pytorch": [0, 1, 3, 6], "pytorch_lightn": 1, "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [0, 1, 7], "queryabl": 0, "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "ran": 4, "random": [5, 7], "randomli": 5, "rang": [0, 2], "rapidli": 7, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [0, 1, 5, 7], "rb": 2, "re": 2, "reach": 7, "read": [0, 2, 4, 5], "reader": 5, "reading_ord": 2, "reading_order_fn": 2, "real": 7, "realiz": 5, "reason": [0, 2], "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [2, 3, 8], "recognitionmodel": 1, "recommend": [0, 1, 5, 7], "record": [1, 2, 4], "rectangl": 2, "rectangular": 0, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "refer": [0, 1, 5, 7], "referenc": 2, "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_typ": 5, "region_type_0": 2, "region_type_1": 2, "regular": 5, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reliabl": 7, "relu": 8, "remain": [0, 5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "repolygon": [1, 2], "report": [2, 5, 7], "repositori": [4, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolv": 5, "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 5, 7, 8], "resum": 5, "retain": [2, 5], "retrain": 7, "retriev": [4, 5, 7], "return": [0, 1, 2, 8], "reus": 2, "revers": 8, "rgb": [1, 8], "right": [0, 2, 4, 5, 7], "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "romanov": 7, "rotat": 0, "rough": 7, "roughli": 0, "routin": 1, "rpred": 1, "rtl": [0, 2], "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "sa": 0, "same": [0, 1, 2, 4, 5, 7], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": 5, "schemaloc": 5, "scientif": 4, "script": [0, 1, 2, 4, 5, 7], "script_detect": 1, "script_typ": 2, "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": [0, 2], "second": [0, 2], "section": [1, 
7], "see": [0, 1, 5, 7], "seen": [0, 1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, "segresult": 2, "segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sens": 0, "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "sephardi": 0, "seqrecogn": 2, "sequenc": [0, 1, 2, 5, 7, 8], "serial": [0, 4, 6], "serialize_segment": [1, 2], "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 2, 7], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "side": 0, "sigma": 2, "sigmoid": 8, "similar": [1, 5, 7], "simpl": [0, 1, 5, 7, 8], "simplifi": 0, "singl": [0, 1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "slice": 2, "slightli": [0, 5, 7, 8], "slow": 5, "slower": 5, "small": [0, 1, 2, 5, 7, 8], "so": [0, 1, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": 7, "some": [0, 1, 2, 4, 5, 7], "someon": 0, "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [2, 5, 7, 8], "sourceimageinform": 5, "sp": 5, "space": [0, 1, 2, 4, 5, 7], "span": 0, "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [2, 5, 7], "specifi": [0, 5], "speckl": 7, "speech": 2, "speedup": 5, "split": [2, 5, 7, 8], "spot": 4, "squash": [2, 8], "stabl": [1, 4], "stack": [2, 5, 8], "stage": [0, 1], "standard": [0, 1, 4, 5, 7], "start": [0, 1, 2, 5, 7], "start_separ": 2, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1, 2], "stop": [5, 7], "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": 8, "structur": [1, 4, 5], "stub": 5, "sub": 1, "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsampl": 5, "subsequ": [1, 2], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], "suffix": [0, 2], "suggest": [0, 1], "suit": 7, "suitabl": [0, 7], "summar": [2, 5, 7, 8], "superflu": 7, "supervis": 5, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [0, 1, 4, 5, 6], "suppos": 1, "suppress": [0, 5], "sure": 0, "surfac": [0, 2], "surrog": 5, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [0, 4, 5, 7], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [2, 5], "tags_ignor": 2, "take": [1, 4, 5, 7], "tanh": 8, "target": 2, "task": [5, 7], "tb": 2, "technic": 4, "tell": 5, "templat": [1, 2, 4], "tempor": 2, "tensor": [1, 2, 8], "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [1, 2, 4, 7], "text_direct": [1, 2], "text_transform": 2, "textblock": 5, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": 5, "textregion": 5, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 2, 5], "therefor": [0, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "third": 1, "those": 5, "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": 6, "threshold": [0, 2], "through": [0, 1, 2, 4, 5, 7], "thrown": 0, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "tild": 0, "time": [1, 2, 5, 7, 8], "tip": 1, "titl": 0, "titr": 4, "tmpl": 2, "token": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], 
"toplin": [2, 5], "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": [2, 7], "train": [0, 3, 8], "trainabl": [0, 1, 2, 4, 5], "trainer": [1, 5], "training_data": [1, 5], "training_fil": 1, "transcrib": [5, 7], "transcript": [1, 2, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4], "transformt": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "true": [1, 2, 8], "truli": 0, "truth": [5, 7], "try": 2, "tupl": 2, "turn": 4, "tutori": [1, 5], "tweak": 0, "two": [0, 1, 2, 5, 8], "txt": [0, 2, 4, 5], "type": [0, 1, 2, 5, 7, 8], "typefac": [5, 7], "typograph": [0, 7], "typologi": 5, "u": [0, 1, 5], "u1f05": 5, "un": 4, "unclean": 7, "unclear": 5, "undecod": 1, "undegrad": 0, "under": [0, 2, 4], "undesir": [5, 8], "unduli": 0, "unencod": 2, "uneven": 0, "uni": [0, 7], "unicod": [0, 1, 2, 7], "uniformli": 2, "union": [2, 4], "uniqu": [0, 7], "univers": 0, "universit\u00e9": 4, "unless": 5, "unnecessarili": 1, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "updat": 0, "upload": 0, "upon": 0, "upward": [2, 5, 7], "ur": 0, "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "user": [0, 2, 4, 5, 7], "user_metadata": 2, "usual": [0, 1, 5, 7], "utf": 5, "util": [1, 4, 5, 7], "v": [5, 7], "v4": 5, "val": 5, "val_metr": 2, "valid": [0, 2, 5], "valid_baselin": 2, "valid_region": 2, "validation_set": 2, "valu": [0, 1, 2, 5, 8], "variabl": [2, 4, 5, 8], "variant": 5, "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [0, 1, 2], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": [0, 5], "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": [0, 5], "visual": 0, "vocabulari": 2, "vocal": 7, "vpo": 5, "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": 5, "wa": [0, 2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warmup": 5, "warn": [0, 1, 7], "warp": 7, "wav2vec2": 2, "we": [2, 5, 7], "weak": [1, 7], "websit": 7, "weight": [2, 5], "welcom": 4, "well": [0, 5, 7], "were": [2, 5], "western": 7, "wget": 7, "what": [1, 7], "when": [0, 1, 2, 5, 7, 8], "where": [0, 2, 7], "whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "wider": 0, "width": [1, 2, 5, 7, 8], "wildli": 7, "without": [0, 2, 5, 7], "word": [4, 5], "word_text": 5, "work": [0, 1, 2, 5, 7], "worker": 5, "world": [0, 7], "worsen": 0, "would": [0, 5], "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], "www": 5, "x": [0, 2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x2": 2, "x64": 4, "x_0": 2, "x_1": 2, "x_bbox": 0, "x_conf": 0, "x_m": 2, "x_n": 2, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xml": [0, 7], "xmln": 5, "xmlschema": 5, "xn": 2, "xsd": 5, "xsi": 5, "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_0": 2, "y_1": 2, "y_m": 2, "y_n": 2, "y_stride": 8, "yield": 2, "yk": 2, "ym": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "your": 0, "y\u016bsuf": 7, "zenodo": [0, 4], "zero": [2, 7, 8], "zigzag": 0, "zoom": [0, 2], "\u00e3\u00ed\u00f1\u00f5": 0, "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u0127\u0129\u0142\u0169\u01ba\u1d49\u1ebd": 0, "\u02bf\u0101lam": 7, "\u0621": 5, 
"\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7, "\u2079\ua751\ua753\ua76f\ua770": 0}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"acceler": 3, "acquisit": 7, "advanc": 0, "alto": 5, "annot": 7, "api": [1, 2], "baselin": [0, 1], "basic": [1, 8], "binar": [0, 2], "binari": 5, "blla": 2, "box": 0, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "direct": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": 5, "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [0, 1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "mask": 0, "max": 8, "model": [0, 2, 4, 6], "modul": 2, "network": 8, "normal": [5, 8], "page": [0, 5], "pageseg": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "preprocess": [1, 7], "pretrain": 5, "princip": 0, "publish": 0, "queri": 0, "quickstart": [1, 4], "recognit": [0, 1, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "retriev": 0, "rpred": 2, "schedul": 2, "scratch": 5, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": [0, 8], "stopper": 2, "test": 5, "text": [0, 5], "train": [1, 2, 4, 5, 7], "trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "unsupervis": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/4.2.0/training.html b/4.2.0/training.html new file mode 100644 index 000000000..c1a7e00d4 --- /dev/null +++ b/4.2.0/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly easily for a large number of scripts. In contrast to other systems requiring segmentation down to the glyph level before classification, it is uniquely suited for the recognition of connected scripts, because the neural network is trained to assign the correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and recognition, the conversion of line images into text, can be trained in kraken. To train models for either we require training data, i.e. examples of page segmentations and transcriptions that are similar to what we want to be able to recognize. For segmentation the examples are the locations of baselines, i.e. the imaginary lines the text is written on, and the polygons of regions. For recognition it is the text contained in a line. There are multiple ways to supply training data, but the easiest is through PageXML or ALTO files.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works on both Linux and Mac OS X. After installing conda, download the environment file and create the environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be activated first:

+
$ conda activate kraken
+
+
+
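If conda is not an option, kraken can usually also be installed from PyPI; this is a hedged alternative to the recommended conda setup above and may require installing some system dependencies manually:

$ pip install kraken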
+
+

Image acquisition and preprocessing

+

First, a number of high-quality scans, preferably in color or grayscale and at least 300dpi, are required. Scans should be in a lossless image format such as TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such as pdftocairo or pdfimages, as sketched below. While each of these requirements can be relaxed to a degree, the final accuracy will suffer to some extent. For example, only lightly compressed JPEG scans are generally suitable for training and recognition.

+
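A sketch of the PDF extraction step mentioned above (scan.pdf and the page output prefix are placeholder names; the -png flag of pdfimages requires a reasonably recent poppler):

$ pdftocairo -png -r 300 scan.pdf page
$ pdfimages -png scan.pdf page

Either command writes one PNG file per page, which can then be fed to the annotation tools described below.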

Depending on the source of the scans some preprocessing, such as splitting scans into pages, correcting skew and warp, and removing speckles, can be advisable, although it isn’t strictly necessary as the segmenter can be trained to handle noisy material with high accuracy. A fairly user-friendly piece of software for semi-automatic batch processing of image scans is Scantailor, although most of the work can also be done with a standard image editor.

+

The total number of scans required depends on the kind of model to train (segmentation or recognition), the complexity of the layout, and the nature of the script to recognize. Only features that are found in the training data can later be recognized, so it is important that the coverage of typographic features is exhaustive. Training a small segmentation model for a particular kind of material might require no more than a few hundred samples, while a general model can well go into the thousands of pages. Likewise, a specific recognition model for printed script with a small grapheme inventory such as Arabic or Hebrew requires around 800 lines, while manuscripts, complex scripts (such as polytonic Greek), and general models for multiple typefaces and hands need more training data for the same accuracy.

+

There is no hard rule for the amount of training data, and it may be necessary to retrain a model after the initial training data proves insufficient. Most Western texts contain between 25 and 40 lines per page, so upwards of 30 pages have to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of baselines, regions, and text. There are a number of tools available that can create ALTO and PageXML files containing the requisite information for either segmentation or recognition training: escriptorium integrates kraken tightly, including training and inference, while Aletheia is a powerful desktop application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data, e.g. a collection of PAGE XML documents obtained through annotation and transcription, may now be used to train segmentation and/or transcription models.

+

The training data in output_dir may now be used to train a new model by +invoking the ketos train command. Just hand a list of images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used to estimate the actual recognition accuracy achieved in the real world. These are never shown to the network during training but will be recognized periodically to evaluate the accuracy of the model. By default the validation set will comprise 10% of the training data.

+
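The size of the held-out set can usually be adjusted; a hedged sketch, assuming the -p/--partition option present in recent ketos versions, which sets the fraction of lines kept for training:

$ ketos train -p 0.8 output_dir/*.png

Under that assumption this keeps 80% of the lines for training and 20% for validation.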

Basic model training is mostly automatic, although there are multiple parameters that can be adjusted; a combined example follows the list:

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an integer equal to or larger than 1, with 1 meaning a report is created each time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with --load. To continue training from a base model with a different training set, refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for accelerated training. The default setting preloads data sets with fewer than 2500 lines; explicitly adding --preload will preload arbitrarily sized sets. --no-preload disables preloading in all circumstances.

+
+
+
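Putting the options above together, a typical invocation might look like the following; mymodel and output_dir are placeholder names and the flag values are only illustrative:

$ ketos train --output mymodel --report 1 --savefreq 1 --preload output_dir/*.png

Training can later be resumed from one of the saved checkpoints with --load, e.g. if it had progressed to epoch 10:

$ ketos train --load mymodel_10.mlmodel --output mymodel output_dir/*.png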

Training a network will take some time on a modern computer, even with the default parameters. While the exact time required is unpredictable, as training is a somewhat random process, a rough guide is that accuracy seldom improves after 50 epochs, which are typically reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a fairly reliable approach known as early stopping that stops training as soon as the error rate on the validation set doesn’t improve anymore. This prevents overfitting, i.e. the model learning to recognize only the training data properly instead of the general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models, model_name-1.mlmodel, model_name-2.mlmodel, …, in the directory the script was executed in. Let's take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This might take a while, as preprocessing the whole set and putting it into memory is computationally intensive. Loading can be made faster by disabling preloading, at the cost of performing the preprocessing repeatedly during the training process.

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

this line shows the results of the validation set evaluation. The error after 2 epochs is 545 incorrect characters out of 3504 characters in the validation set, for a character accuracy of 84.4%. The error should decrease fairly rapidly. If accuracy remains around 0.30 something is amiss, e.g. non-reordered right-to-left text or wildly incorrect transcriptions. Abort training, correct the error(s), and start again.

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+
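A minimal way to follow that recommendation with plain shell tooling (file and directory names are placeholders):

$ ketos train output_dir/*.png 2>&1 | tee training.log
$ mkdir -p archive/run-01
$ cp model_name_best.mlmodel training.log archive/run-01/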

ketos can also produce more verbose output with training set and network information by appending one or more -v switches to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a validation set of 88 lines. 49 different classes, i.e. Unicode code points, were found in these 788 lines. These determine the output size of the network; obviously only these 49 classes/code points can later be output by the network. Importantly, we can see that certain characters occur markedly less often than others. Characters like the Syriac feminine dot and numerals that occur fewer than 10 times will most likely not be recognized well by the trained net.

+
+
+

Evaluation and Validation

+

While the output during training is detailed enough to know when to stop training, one usually wants to know the specific kinds of errors to expect. Doing more in-depth error analysis also allows one to pinpoint weaknesses in the training data; e.g. above-average error rates for numerals indicate either a lack of representation of numerals in the training data or erroneous transcription in the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent.

+

The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report consists of errors sorted by frequency and a per-character accuracy report. Importantly, most errors are incorrect recognitions of combining marks such as dots and diaereses. These may have several sources: different dot placement in the training and validation sets, incorrect transcription such as non-systematic transcription, or unclean, speckled scans. Depending on the error source, correction most often involves adding more training data and fixing transcriptions. Sometimes it may even be advisable to remove unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training related tasks. Optical character recognition is a multi-step process consisting of binarization (conversion of input images to black and white), page segmentation (extracting lines from the image), and recognition (converting line images to character sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.2.0/vgsl.html b/4.2.0/vgsl.html new file mode 100644 index 000000000..d504d371f --- /dev/null +++ b/4.2.0/vgsl.html @@ -0,0 +1,288 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension.

+

When channels are set to 1, grayscale or B/W inputs are expected; 3 expects RGB color images. Higher values in combination with a height of 1 result in the network being fed 1-pixel-wide grayscale strips scaled to the size of the channel dimension.

+
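A few input definitions illustrating these rules, all taken from the examples in this document (the layer part is elided):

[1,48,0,1 ...]   lines scaled to a height of 48 pixels, variable width, a single grayscale/B/W channel
[1,0,0,3 ...]    fully variable-sized RGB color input
[1,1,0,48 ...]   1-pixel-high strips with the original height moved into 48 channels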

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do O1c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The extended dropout layer syntax is used to reduce the drop probability on the depth dimension, as the default is too high for convolutional layers. The remainder of the height dimension (12) is reshaped into the depth dimension before applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrarily sized color image input, an initial summarizing recurrent layer to squash the height to 64, followed by 2 bidirectional recurrent layers and a linear projection.

+
+
+

Convolutional Layers

+
C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters.

+
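For example, following the syntax above, the definition below adds a 3×3 ReLU convolution with 64 output channels and an explicit stride of 2 in both dimensions (the values are purely illustrative):

Cr3,3,64,2,2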
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network, using either the x (width) or y (height) dimension as the time axis. Input features are taken from the channel dimension and the non-time axis (height or width) is treated as another batch dimension. For example, an Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors, resulting in an output of shape 1, 16, 906, 25. If this isn’t desired, either run a summarizing layer in the other direction, e.g. Lfys20, which squashes the height and yields an output of shape 1, 1, 906, 20, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimensions into a 1, 1, 906, 512 input to the recurrent layer.

+
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a max pooling layer with kernel size (y, x) and stride (y_stride, x_stride).
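For orientation (the mapping to torch modules is an assumption), a Mp2,2 layer behaves like a standard 2D max pooling module:

.. code-block:: python

   >>> import torch
   >>> import torch.nn as nn
   >>> pool = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))   # roughly `Mp2,2`
   >>> pool(torch.randn(1, 32, 48, 200)).shape
   torch.Size([1, 32, 24, 100])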

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d to a,b and distributes a into dimension e, respectively b into f. Either e or f has to be equal to d. So S1(1x48)1,3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48-sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.
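The shape arithmetic can be verified with a plain PyTorch reshape. This sketch only demonstrates the resulting shapes; the exact channel ordering produced by kraken's S layer is not guaranteed to match:

.. code-block:: python

   >>> import torch
   >>> x = torch.randn(1, 48, 1020, 8)                     # NHWC as written in the VGSL spec
   >>> n, h, w, c = x.shape
   >>> y = x.permute(0, 2, 1, 3).reshape(n, 1, w, h * c)   # fold the height into the channels
   >>> y.shape
   torch.Size([1, 1, 1020, 384])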

+
+

Note

+

This S layer is equivalent to the one implemented in the tensorflow implementation of VGSL, i.e. it behaves differently from the one in tesseract.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to a drop probability of 0.5 and 1D dropout. Set dim to 2 after convolutional layers.
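Illustratively (the mapping to torch modules is an assumption), Do0.1,2 corresponds to 2D channel dropout while a bare Do is 1D dropout with the default probability of 0.5:

.. code-block:: python

   >>> import torch.nn as nn
   >>> do_2d = nn.Dropout2d(p=0.1)   # roughly `Do0.1,2`, used after convolutional blocks
   >>> do_1d = nn.Dropout(p=0.5)     # roughly a bare `Do` before the final linear layer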

+
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, normalizing each separately.
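As a rough illustration (assuming standard torch group normalization semantics), a Gn32 layer applied to a 64-channel input corresponds to:

.. code-block:: python

   >>> import torch
   >>> import torch.nn as nn
   >>> gn = nn.GroupNorm(num_groups=32, num_channels=64)   # roughly `Gn32`
   >>> gn(torch.randn(1, 64, 48, 200)).shape
   torch.Size([1, 64, 48, 200])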

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/.buildinfo b/4.3.0/.buildinfo new file mode 100644 index 000000000..74c273527 --- /dev/null +++ b/4.3.0/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 95dc8dab335796ef807f4cf1a22f28c0 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/4.3.0/.doctrees/advanced.doctree b/4.3.0/.doctrees/advanced.doctree new file mode 100644 index 000000000..bb410a0de Binary files /dev/null and b/4.3.0/.doctrees/advanced.doctree differ diff --git a/4.3.0/.doctrees/api.doctree b/4.3.0/.doctrees/api.doctree new file mode 100644 index 000000000..32fdc5b46 Binary files /dev/null and b/4.3.0/.doctrees/api.doctree differ diff --git a/4.3.0/.doctrees/api_docs.doctree b/4.3.0/.doctrees/api_docs.doctree new file mode 100644 index 000000000..7f115e4b2 Binary files /dev/null and b/4.3.0/.doctrees/api_docs.doctree differ diff --git a/4.3.0/.doctrees/environment.pickle b/4.3.0/.doctrees/environment.pickle new file mode 100644 index 000000000..a408cfcfa Binary files /dev/null and b/4.3.0/.doctrees/environment.pickle differ diff --git a/4.3.0/.doctrees/gpu.doctree b/4.3.0/.doctrees/gpu.doctree new file mode 100644 index 000000000..25cbfdb8c Binary files /dev/null and b/4.3.0/.doctrees/gpu.doctree differ diff --git a/4.3.0/.doctrees/index.doctree b/4.3.0/.doctrees/index.doctree new file mode 100644 index 000000000..974213a1a Binary files /dev/null and b/4.3.0/.doctrees/index.doctree differ diff --git a/4.3.0/.doctrees/ketos.doctree b/4.3.0/.doctrees/ketos.doctree new file mode 100644 index 000000000..9298f8514 Binary files /dev/null and b/4.3.0/.doctrees/ketos.doctree differ diff --git a/4.3.0/.doctrees/models.doctree b/4.3.0/.doctrees/models.doctree new file mode 100644 index 000000000..188f7cb31 Binary files /dev/null and b/4.3.0/.doctrees/models.doctree differ diff --git a/4.3.0/.doctrees/training.doctree b/4.3.0/.doctrees/training.doctree new file mode 100644 index 000000000..6059a2d7f Binary files /dev/null and b/4.3.0/.doctrees/training.doctree differ diff --git a/4.3.0/.doctrees/vgsl.doctree b/4.3.0/.doctrees/vgsl.doctree new file mode 100644 index 000000000..4b48b0ebe Binary files /dev/null and b/4.3.0/.doctrees/vgsl.doctree differ diff --git a/4.3.0/.nojekyll b/4.3.0/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/4.3.0/_images/blla_heatmap.jpg b/4.3.0/_images/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/4.3.0/_images/blla_heatmap.jpg differ diff --git a/4.3.0/_images/blla_output.jpg b/4.3.0/_images/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/4.3.0/_images/blla_output.jpg differ diff --git a/4.3.0/_images/bw.png b/4.3.0/_images/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/4.3.0/_images/bw.png differ diff --git a/4.3.0/_images/normal-reproduction-low-resolution.jpg b/4.3.0/_images/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/4.3.0/_images/normal-reproduction-low-resolution.jpg differ diff --git a/4.3.0/_images/pat.png b/4.3.0/_images/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/4.3.0/_images/pat.png differ diff --git a/4.3.0/_sources/advanced.rst.txt b/4.3.0/_sources/advanced.rst.txt new file mode 100644 index 000000000..70448d5e8 --- /dev/null +++ 
b/4.3.0/_sources/advanced.rst.txt @@ -0,0 +1,462 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text lines images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML. + +Input and Outputs +----------------- + +Kraken inputs and their outputs can be defined in multiple ways. The most +simple are input-output pairs, i.e. producing one output document for one input +document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. + +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Output formats +^^^^^^^^^^^^^^ + +All commands have a default output format such as raw text for `ocr`, a plain +image for `binarize`, or a JSON definition of the the segmentation for +`segment`. These are specific to kraken and generally not suitable for further +processing by other software but a number of standardized data exchange formats +can be selected. Per default `ALTO `_, +`PageXML `_, `hOCR +`_, and abbyyXML containing additional metadata such as +bounding boxes and confidences are implemented. In addition, custom `jinja +`_ templates can be loaded to crate +individualised output such as TEI. + +Output formats are selected on the main `kraken` command and apply to the last +subcommand defined in the subcommand chain. For example: + +.. code-block:: console + + $ kraken --alto -i ... segment -bl + +will serialize a plain segmentation in ALTO into the specified output file. + +The currently available format switches are: + +.. code-block:: console + + $ kraken -n -i ... ... # native output + $ kraken -a -i ... ... # ALTO output + $ kraken -x -i ... ... # PageXML output + $ kraken -h -i ... ... # hOCR output + $ kraken -y -i ... ... # abbyyXML output + +Custom templates can be loaded with the `--template` option: + +.. 
code-block:: console + + $ kraken --template /my/awesome/template.tmpl -i ... ... + +The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates `here +`_. + +Binarization +------------ + +.. _binarization: + +.. note:: + + Binarization is deprecated and mostly not necessary anymore. It can often + worsen text recognition results especially for documents with uneven + lighting, faint writing, etc. + +The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +=========== ==== +option type +=========== ==== +--threshold FLOAT +--zoom FLOAT +--escale FLOAT +--border FLOAT +--perc INTEGER RANGE +--range INTEGER +--low INTEGER RANGE +--high INTEGER RANGE +=========== ==== + +To binarize a image: + +.. code-block:: console + + $ kraken -i input.jpg bw.png binarize + +.. note:: + + Some image formats, notably JPEG, do not support a black and white + image mode. Per default the output format according to the output file + name extension will be honored. If this is not possible, a warning will + be printed and the output forced to PNG: + + .. code-block:: console + + $ kraken -i input.jpg bw.jpg binarize + Binarizing [06/24/22 09:56:23] WARNING jpeg does not support 1bpp images. Forcing to png. + ✓ + +Page Segmentation +----------------- + +The `segment` subcommand accesses page segmentation into lines and regions with +the two layout analysis methods implemented: the trainable baseline segmenter +that is capable of detecting both lines of different types and regions and a +legacy non-trainable segmenter that produces bounding boxes. + +Universal parameters of either segmenter are: + +=============================================== ====== +option action +=============================================== ====== +-d, --text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +-m, --mask Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes. +=============================================== ====== + +Baseline Segmentation +^^^^^^^^^^^^^^^^^^^^^ + +The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below: + +.. image:: _static/blla_heatmap.jpg + :width: 800 + :alt: BLLA output heatmap + +In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as: + +.. 
image:: _static/blla_output.jpg + :width: 800 + :alt: BLLA final output + +The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segment is activated with the `-bl` +option: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl + +New models optimized for other kinds of documents can be trained (see +:ref:`here `). These can be applied with the `-i` option of the +`segment` subcommand: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel + +Legacy Box Segmentation +^^^^^^^^^^^^^^^^^^^^^^^ + +The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left). + +Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply :ref:`binarization ` first or supply only +pre-binarized inputs. + +The legacy segmenter can be applied on some input image with: + +.. code-block:: console + + $ kraken -i 14.tif lines.json segment -x + $ cat lines.json + +Available specific parameters are: + +=============================================== ====== +option action +=============================================== ====== +--scale FLOAT Estimate of the average line height on the page +-m, --maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, --black-colseps / -w, --white-colseps Switch to black column separators. +-r, --remove-hlines / -l, --hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +-p, --pad Adds left and right padding around lines in the output. +=============================================== ====== + +Principal Text Direction +^^^^^^^^^^^^^^^^^^^^^^^^ + +The principal text direction selected with the `-d/--text-direction` is a +switch used in the reading order heuristic to determine the order of text +blocks (regions) and individual lines. It roughly corresponds to the `block +flow direction +`_ in CSS with +an additional option. Valid options consist of two parts, an initial principal +line orientation (`horizontal` or `vertical`) followed by a block order (`lr` +for left-to-right or `rl` for right-to-left). + +.. warning: + + The principal text direction is independent of the direction of the + *inline text direction* (which is left-to-right for writing systems like + Latin and right-to-left for ones like Hebrew or Arabic). Kraken deals + automatically with the inline text direction through the BiDi algorithm + but can't infer the principal text direction automatically as it is + determined by factors like layout, type of document, primary script in + the document, and other factors. 
The differents types of text + directionality and their relation can be confusing, the `W3C writing + mode `_ document explains + the fundamentals, although the model used in Kraken differs slightly. + +The first part is usually `horizontal` for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom: + +.. image:: _static/bw.png + :width: 800 + :alt: Horizontal Latin script text + +Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left: + +.. image:: https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg/577px-Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg + :width: 800 + :alt: Vertical Chinese text + +The second part is dependent on a number of factors as the order in which text +blocks are read is not fixed for every writing system. In mono-script texts it +is usually determined by the inline text direction, i.e. Latin script texts +columns are read starting with the top-left column followed by the column to +its right and so on, continuing with the left-most column below if none remain +to the right (inverse for right-to-left scripts like Arabic which start on the +top right-most columns, continuing leftward, and returning to the right-most +column just below when none remain). + +In multi-script documents the order of is determined by the primary writing +system employed in the document, e.g. for a modern book containing both Latin +and Arabic script text it would be set to `lr` when Latin is primary, e.g. when +the binding is on the left side of the book seen from the title cover, and +vice-versa (`rl` if binding is on the right on the title cover). The analogue +applies to text written with vertical lines. + +With these explications there are four different text directions available: + +=============================================== ====== +Text Direction Examples +=============================================== ====== +horizontal-lr Latin script texts, Mixed LTR/RTL docs with principal LTR script +horizontal-rl Arabic script texts, Mixed LTR/RTL docs with principal RTL script +vertical-lr Vertical script texts read from left-to-right. +vertical-rl Vertical script texts read from right-to-left. +=============================================== ====== + +Masking +^^^^^^^ + +It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -m mask.png + +Model Repository +---------------- + +.. _repo: + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands. + +Querying and Model Retrieval +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. 
code-block:: console + + $ kraken list + Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07 + 10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration) + 10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature) + 10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0 + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.5617783 + name: 10.5281/zenodo.5617783 + + Cremma-Medieval Old French Model (Litterature) + + .... + scripts: Latn + alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128 + accuracy: 95.49% + license: CC-BY-SA-2.0 + author(s): Pinche, Ariane + date: 2021-10-29 + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.5617783 + Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10 + Model name: cremma_medieval_bicerin.mlmodel + +Models will be placed in ``$XDG_BASE_DIR`` and can be accessed using their name as +printed in the last line of the ``kraken get`` output. + +.. code-block:: console + + $ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel + +Publishing +^^^^^^^^^^ + +When one would like to share a model with the wider world (for fame and glory!) +it is possible (and recommended) to upload them to repository. The process +consists of 2 stages: the creation of the deposit on the Zenodo platform +followed by approval of the model in the community making it discoverable for +other kraken users. + +For uploading model a Zenodo account and a personal access token is required. +After account creation tokens can be created under the account settings: + +.. image:: _static/pat.png + :width: 800 + :alt: Zenodo token creation dialogue + +With the token models can then be uploaded: + +.. code-block:: console + + $ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617783 + +A number of important metadata will be asked for such as a short description of +the model, long form description, recognized scripts, and authorship. +Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. +can't be changed or deleted so it is important to make sure that all the +information is correct. Each deposit also has a unique persistent identifier, a +DOI, that can be used to refer to it, e.g. in publications or when pointing +someone to a particular model. + +Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users. + +It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with `kraken get` +and its DOI. 
It is mostly suggested for preliminary models that might get +updated later: + +.. code-block:: console + + $ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617734 + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm + +All polytonic Greek text portions will be recognized using the `porson.clstm` +model while Latin text will be fed into the `antiqua.clstm` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. + + diff --git a/4.3.0/_sources/api.rst.txt b/4.3.0/_sources/api.rst.txt new file mode 100644 index 000000000..a907f33dc --- /dev/null +++ b/4.3.0/_sources/api.rst.txt @@ -0,0 +1,406 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. 
code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + {'text_direction': 'horizontal-lr', + 'boxes': [[0, 29, 232, 56], + [28, 54, 121, 84], + [9, 73, 92, 117], + [103, 76, 145, 131], + [7, 105, 119, 230], + [10, 228, 126, 345], + ... + ], + 'script_detection': False} + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + {'text_direction': 'horizontal-lr', + 'type': 'baselines', + 'script_detection': False, + 'lines': [{'script': 'default', + 'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]], + 'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]}, + ...], + 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...] + '$par': ... + '$nop': ...}} + >>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking. + +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. 
+ +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as the if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch for binary input models. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(model, im, baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but an +:class:`kraken.rpred.ocr_record` record object containing the character +prediction, cuts (approximate locations), and confidences. 
+ +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'box' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformtion by the functional blocks: + +.. code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> xml.parse_alto(alto_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + } + + >>> page_doc = '/path/to/page' + >>> xml.parse_page(page_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + + +Serialization +------------- + +The serialization module can be used to transform the :class:`ocr_records +` returned by the prediction iterator into a text +based (most often XML) format for 
archival. The module renders `jinja2 +`_ templates in `kraken/templates` through +the :func:`kraken.serialization.serialize` function. + +.. code-block:: python + + >>> from kraken.lib import serialization + + >>> records = [record for record in pred_it] + >>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/4.3.0/_sources/api_docs.rst.txt b/4.3.0/_sources/api_docs.rst.txt new file mode 100644 index 000000000..46379f2b8 --- /dev/null +++ b/4.3.0/_sources/api_docs.rst.txt @@ -0,0 +1,251 @@ +************* +API Reference +************* + +kraken.blla module +================== + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. 
Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +===================== + +.. note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +kraken.rpred module +=================== + +.. autoapifunction:: kraken.rpred.bidi_record + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapiclass:: kraken.rpred.ocr_record + :members: + +.. autoapifunction:: kraken.rpred.rpred + + +kraken.serialization module +=========================== + +.. autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +kraken.lib.models module +======================== + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.vgsl module +====================== + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +===================== + +.. autoapifunction:: kraken.lib.xml.parse_xml + +.. autoapifunction:: kraken.lib.xml.parse_page + +.. autoapifunction:: kraken.lib.xml.parse_alto + +kraken.lib.codec module +======================= + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.lib.train module +======================= + +Training Schedulers +------------------- + +.. autoapiclass:: kraken.lib.train.TrainScheduler + :members: + +.. autoapiclass:: kraken.lib.train.annealing_step + :members: + +.. autoapiclass:: kraken.lib.train.annealing_const + :members: + +.. autoapiclass:: kraken.lib.train.annealing_exponential + :members: + +.. autoapiclass:: kraken.lib.train.annealing_reduceonplateau + :members: + +.. autoapiclass:: kraken.lib.train.annealing_cosine + :members: + +.. autoapiclass:: kraken.lib.train.annealing_onecycle + :members: + +Training Stoppers +----------------- + +.. autoapiclass:: kraken.lib.train.TrainStopper + :members: + +.. autoapiclass:: kraken.lib.train.EarlyStopping + :members: + +.. autoapiclass:: kraken.lib.train.EpochStopping + :members: + +.. autoapiclass:: kraken.lib.train.NoStopping + :members: + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +========================= + +Datasets +-------- + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Helpers +------- + +.. autoapifunction:: kraken.lib.dataset.compute_error + +.. autoapifunction:: kraken.lib.dataset.preparse_xml_data + +.. autoapifunction:: kraken.lib.dataset.generate_input_transforms + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. 
autoapifunction:: kraken.lib.segmentation.denoising_hysteresis_thresh + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + + +kraken.lib.ctc_decoder +====================== + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +===================== + +.. autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/4.3.0/_sources/gpu.rst.txt b/4.3.0/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/4.3.0/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/4.3.0/_sources/index.rst.txt b/4.3.0/_sources/index.rst.txt new file mode 100644 index 000000000..9b7da028f --- /dev/null +++ b/4.3.0/_sources/index.rst.txt @@ -0,0 +1,242 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. 
+ +Features +======== + +kraken's main features are: + + - Fully trainable layout analysis and character recognition + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - :ref:`Public repository ` of model files + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. + +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user's kraken directory: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.2577813 + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.2577813 + name: 10.5281/zenodo.2577813 + + A generalized model for English printed text + + This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p + scripts: Latn + alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE + accuracy: 99.95% + license: Apache-2.0 + author(s): Kiessling, Benjamin + date: 2019-02-26 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. 
code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the default model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `escriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: _static/normal-reproduction-low-resolution.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. image:: https://www.gouvernement.fr/sites/default/files/styles/illustration-centre/public/contenu/illustration/2018/10/logo_investirlavenir_rvb.png + :width: 100 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005. + + diff --git a/4.3.0/_sources/ketos.rst.txt b/4.3.0/_sources/ketos.rst.txt new file mode 100644 index 000000000..d173bd148 --- /dev/null +++ b/4.3.0/_sources/ketos.rst.txt @@ -0,0 +1,711 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation training and the binary format for recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.2. 
An example showing the +attributes necessary for segmentation and recognition training follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported. Binary +datasets drastically improve loading performance allowing the saturation of +most GPUs with minimal computational overhead while also allowing training with +datasets that are larger than the systems main memory. A minor drawback is a +~30% increase in dataset size in comparison to the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the `--workers` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... + +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... + +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +Recognition training +-------------------- + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, \--output Output model file prefix. Defaults to model. +-s, \--spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, \--append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, \--load Load existing file to continue training +-F, \--savefreq Model save frequency in epochs during + training +-q, \--quit Stop condition for training. 
Set to `early` + for early stopping (default) or `dumb` for fixed + number of epochs. +-N, \--epochs Number of epochs to train for. +\--min-epochs Minimum number of epochs to train for when using early stopping. +\--lag Number of epochs to wait before stopping + training without improvement. Only used when using early stopping. +-d, \--device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +\--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, \--lrate Learning rate [default: 0.001] +-m, \--momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, \--weight-decay Weight decay. +\--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, \--partition Ground truth data partition ratio between train/validation set +-u, \--normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, \--codec Load a codec JSON definition (invalid if loading existing model) +\--resize Codec/output layer resizing option. If set + to `add` code points will be added, `both` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, \--reorder / \--no-reorder Reordering of code points to display order. +-t, \--training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, \--evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, \--format-type Sets the training and evaluation data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +\--augment / \--no-augment Enables/disables data augmentation. +\--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases, such as color inputs, changing the network architecture might be +useful: + +.. code-block:: console + + $ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the minimum delta an/or +lag can be useful: + +.. 
code-block:: console + + $ ketos train --lag 10 --min-delta 0.001 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``add`` and ``both``. +``add`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``both`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize add -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize both -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``both`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. 
If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. 
code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. 
code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Unsupervised recognition pretraining +------------------------------------ + +Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices. + +All data sources accepted by the supervised trainer are valid for pretraining +but for performance reasons it is recommended to use pre-compiled binary +datasets. One thing to keep in mind is that compilation filters out empty +(non-transcribed) text lines per default which is undesirable for pretraining. +With the `--keep-empty-lines` option all valid lines will be written to the +dataset file: + +.. code-block:: console + + $ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml + + +The basic pretraining call is very similar to a training one: + +.. code-block:: console + + $ ketos pretrain -f binary foo.arrow + +There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples. + +.. code-block:: console + + $ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow + +Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced: + +.. code-block:: console + + $ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow + +It is necessary to use learning rate warmup (`warmup`) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations. + +Segmentation training +--------------------- + +.. _segtrain: + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + Training line types: + default 2 53980 + foo 8 134 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + val check [------------------------------------] 0/0 + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. 
code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. code-block:: console + + $ ketos segtrain -f xml --valid-baselines default training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + paragraph 6 10218 + +Finally, we can merge baselines and regions into each other: + +.. code-block:: console + + $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml + Training line types: + default 2 54114 + ... + $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml + ... + Training region types: + graphic 3 151 + text 4 11346 + separator 5 5431 + ... + +These options are combinable to massage the dataset into any typology you want. + +Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option: + +.. code-block:: console + + $ ketos segtrain --topline -f xml hebrew_training_data/*.xml + $ ketos segtrain --centerline -f xml chinese_training_data/*.xml + $ ketos segtrain --baseline -f xml latin_training_data/*.xml + +Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved: + +.. code-block:: console + + $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml + ... + +Recognition Testing +------------------- + +Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them. 
+ +======================================================= ====== +option action +======================================================= ====== +-f, --format-type Sets the test set data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +-m, --model Model(s) to evaluate. +-e, --evaluation-files File(s) with paths to evaluation data. +-d, --device Select device to use. +--pad Left and right padding around lines. +======================================================= ====== + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with `-e/--evaluation-files` or by just +adding a number of image files as the final argument: + +.. code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. + + diff --git a/4.3.0/_sources/models.rst.txt b/4.3.0/_sources/models.rst.txt new file mode 100644 index 000000000..b393f0738 --- /dev/null +++ b/4.3.0/_sources/models.rst.txt @@ -0,0 +1,24 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Segmentation Models +------------------- + +Recognition Models +------------------ + + diff --git a/4.3.0/_sources/training.rst.txt b/4.3.0/_sources/training.rst.txt new file mode 100644 index 000000000..f514da49b --- /dev/null +++ b/4.3.0/_sources/training.rst.txt @@ -0,0 +1,463 @@ +.. 
_training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. + +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. + +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. 
+ +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Dataset Compilation +------------------- + +.. _compilation: + +Training +-------- + +.. _training_step: + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. + +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. ``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldom improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. 
code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... + [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 
811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . } + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... 
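+
+Code points that are hard to render on their own, such as the combining marks
+and whitespace variants above, are listed by their Unicode names. When
+inspecting confusions programmatically, the same readable labels can be
+produced with the standard library (a generic sketch, not kraken's own
+reporting code):
+
+.. code-block:: python
+
+    # Sketch: map code points to the readable labels seen in the report above.
+    import unicodedata
+
+    def describe(char: str) -> str:
+        # repr() is a fallback for code points without an assigned name.
+        return unicodedata.name(char, repr(char))
+
+    for c in ('ܢ', '\u0323', '\u00a0'):
+        print(describe(c))
+    # SYRIAC LETTER NUN
+    # COMBINING DOT BELOW
+    # NO-BREAK SPACE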
+ +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. 
code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/4.3.0/_sources/vgsl.rst.txt b/4.3.0/_sources/vgsl.rst.txt new file mode 100644 index 000000000..913a7b5b1 --- /dev/null +++ b/4.3.0/_sources/vgsl.rst.txt @@ -0,0 +1,199 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. 
The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +Convolutional Layers +-------------------- + +.. code-block:: console + + C[{name}](s|t|r|l|m)[{name}],,[,,] + s = sigmoid + t = tanh + r = relu + l = linear + m = softmax + +Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters. + +Recurrent Layers +---------------- + +.. code-block:: console + + L[{name}](f|r|b)(x|y)[s][{name}] LSTM cell with n outputs. + G[{name}](f|r|b)(x|y)[s][{name}] GRU cell with n outputs. + f runs the RNN forward only. + r runs the RNN reversed only. + b runs the RNN bidirectionally. + s (optional) summarizes the output in the requested dimension, return the last step. + +Adds either an LSTM or GRU recurrent layer to the network using either the `x` +(width) or `y` (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32` +input will execute 16 independent forward passes on `906x32` tensors resulting +in an output of shape `1, 16, 906, 25`. If this isn't desired either run a +summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1, +906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and +channel dimension for an `1, 1, 906, 512` input to the recurrent layer. + +Helper and Plumbing Layers +-------------------------- + +Max Pool +^^^^^^^^ +.. code-block:: console + + Mp[{name}],[,,] + +Adds a maximum pooling with `(y, x)` kernel_size and `(y_stride, x_stride)` stride. + +Reshape +^^^^^^^ + +.. code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. + +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +Dropout +^^^^^^^ + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. 
Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/4.3.0/_static/alabaster.css b/4.3.0/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/4.3.0/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + +div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + 
+a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + 
+table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px -30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. 
*/ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/4.3.0/_static/basic.css b/4.3.0/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/4.3.0/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ 
*/ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + +div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 
10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other 
body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + +dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + 
+table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} + +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/4.3.0/_static/blla_heatmap.jpg b/4.3.0/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/4.3.0/_static/blla_heatmap.jpg differ diff --git a/4.3.0/_static/blla_output.jpg b/4.3.0/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/4.3.0/_static/blla_output.jpg differ diff --git a/4.3.0/_static/bw.png b/4.3.0/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/4.3.0/_static/bw.png differ diff --git a/4.3.0/_static/custom.css b/4.3.0/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/4.3.0/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/4.3.0/_static/doctools.js b/4.3.0/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/4.3.0/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } 
+ } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/4.3.0/_static/documentation_options.js b/4.3.0/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/4.3.0/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/4.3.0/_static/file.png b/4.3.0/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/4.3.0/_static/file.png differ diff --git a/4.3.0/_static/graphviz.css b/4.3.0/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/4.3.0/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/4.3.0/_static/kraken.png b/4.3.0/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/4.3.0/_static/kraken.png differ diff --git a/4.3.0/_static/kraken_recognition.svg b/4.3.0/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/4.3.0/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... 
+ + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/4.3.0/_static/kraken_segmentation.svg b/4.3.0/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/4.3.0/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/4.3.0/_static/kraken_segmodel.svg b/4.3.0/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/4.3.0/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/4.3.0/_static/kraken_torchseqrecognizer.svg b/4.3.0/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/4.3.0/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/4.3.0/_static/kraken_workflow.svg b/4.3.0/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/4.3.0/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + Recognition + + + + + + + + + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/4.3.0/_static/language_data.js b/4.3.0/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/4.3.0/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" + v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = 
new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/4.3.0/_static/minus.png b/4.3.0/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/4.3.0/_static/minus.png differ diff --git a/4.3.0/_static/normal-reproduction-low-resolution.jpg b/4.3.0/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/4.3.0/_static/normal-reproduction-low-resolution.jpg differ diff --git a/4.3.0/_static/pat.png b/4.3.0/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/4.3.0/_static/pat.png differ diff --git a/4.3.0/_static/plus.png b/4.3.0/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/4.3.0/_static/plus.png differ diff --git a/4.3.0/_static/pygments.css b/4.3.0/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ b/4.3.0/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } 
/* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ +.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/4.3.0/_static/searchtools.js b/4.3.0/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/4.3.0/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. 
+ /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. + objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." 
+ ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. +// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/4.3.0/_static/sphinx_highlight.js b/4.3.0/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/4.3.0/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/4.3.0/advanced.html b/4.3.0/advanced.html new file mode 100644 index 000000000..f0843d6cc --- /dev/null +++ b/4.3.0/advanced.html @@ -0,0 +1,534 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the case of kraken, layout analysis/page segmentation (extracting topological text lines from an image), recognition (feeding text line images into a classifier), and finally serialization of results into an appropriate format such as ALTO or PageXML.

+
+

Input and Outputs

+

Kraken inputs and their outputs can be defined in multiple ways. The simplest are input-output pairs, i.e. producing one output document for one input document, following the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular subcommands may be chained.

+

There are other ways to define inputs and outputs, as the syntax shown above can become rather cumbersome for large numbers of files.

+

As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. A second way +is to input multi-image files directly. These can be either in PDF, TIFF, or +JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write one output file per page, named with an index (which can be changed using the -p option) and the suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 files but also from various XML formats. In these cases the appropriate data is automatically selected from the inputs, image data for segmentation or line and region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

The code is able to automatically determine if a file is in PageXML or ALTO format.

+
+

Output formats

+

All commands have a default output format such as raw text for ocr, a plain image for binarize, or a JSON definition of the segmentation for segment. These are specific to kraken and generally not suitable for further processing by other software, but a number of standardized data exchange formats can be selected. Per default ALTO, PageXML, hOCR, and abbyyXML containing additional metadata such as bounding boxes and confidences are implemented. In addition, custom jinja templates can be loaded to create individualised output such as TEI.

+

Output formats are selected on the main kraken command and apply to the last +subcommand defined in the subcommand chain. For example:

+
$ kraken --alto -i ... segment -bl
+
+
+

will serialize a plain segmentation in ALTO into the specified output file.

+

The currently available format switches are:

+
$ kraken -n -i ... ... # native output
+$ kraken -a -i ... ... # ALTO output
+$ kraken -x -i ... ... # PageXML output
+$ kraken -h -i ... ... # hOCR output
+$ kraken -y -i ... ... # abbyyXML output
+
+
+

Custom templates can be loaded with the –template option:

+
$ kraken --template /my/awesome/template.tmpl -i ... ...
+
+
+

The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates here.

+
+
+
+

Binarization

+
+

Note

+

Binarization is deprecated and mostly not necessary anymore. It can often +worsen text recognition results especially for documents with uneven +lighting, faint writing, etc.

+
+

The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +ocropus-nlbin. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it.

+

Available parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

type

–threshold

FLOAT

–zoom

FLOAT

–escale

FLOAT

–border

FLOAT

–perc

INTEGER RANGE

–range

INTEGER

–low

INTEGER RANGE

–high

INTEGER RANGE

+

To binarize an image:

+
$ kraken -i input.jpg bw.png binarize
+
+
+
+

Note

+

Some image formats, notably JPEG, do not support a black and white +image mode. Per default the output format according to the output file +name extension will be honored. If this is not possible, a warning will +be printed and the output forced to PNG:

+
$ kraken -i input.jpg bw.jpg binarize
+Binarizing      [06/24/22 09:56:23] WARNING  jpeg does not support 1bpp images. Forcing to png.
+
+
+
+
+
+
+

Page Segmentation

+

The segment subcommand performs page segmentation into lines and regions with one of the two layout analysis methods implemented: the trainable baseline segmenter, which is capable of detecting both lines of different types and regions, and a legacy non-trainable segmenter that produces bounding boxes.

+

Universal parameters of either segmenter are:

+ + + + + + + + + + + + + + +

option

action

-d, –text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

-m, –mask

Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes.

+
+

Baseline Segmentation

+

The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below:

+BLLA output heatmap + +

In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as:

+BLLA final output + +

The primary determinant of segmentation quality is the segmentation model employed. There is a default model that works reasonably well on printed and handwritten material on undegraded, even writing surfaces such as paper or parchment. The output of this model consists of a single line type and a generic text region class that denotes coherent blocks of text. This model is employed automatically when the baseline segmenter is activated with the -bl option:

+
$ kraken -i input.jpg segmentation.json segment -bl
+
+
+

New models optimized for other kinds of documents can be trained (see +here). These can be applied with the -i option of the +segment subcommand:

+
$ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel
+
+
+
+
+

Legacy Box Segmentation

+

The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left).

+

Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply binarization first or supply only +pre-binarized inputs.

+

The legacy segmenter can be applied on some input image with:

+
$ kraken -i 14.tif lines.json segment -x
+$ cat lines.json
+
+
+

Available specific parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

–scale FLOAT

Estimate of the average line height on the page

-m, –maxcolseps

Maximum number of columns in the input document. Set to 0 for uni-column layouts.

-b, –black-colseps / -w, –white-colseps

Switch to black column separators.

-r, –remove-hlines / -l, –hlines

Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

-p, –pad

Adds left and right padding around lines in the output.

+
+
+

Principal Text Direction

+

The principal text direction selected with the -d/–text-direction option is used by the reading order heuristic to determine the order of text blocks (regions) and individual lines. It roughly corresponds to the block flow direction in CSS with an additional option. Valid options consist of two parts, an initial principal line orientation (horizontal or vertical) followed by a block order (lr for left-to-right or rl for right-to-left).

+

The first part is usually horizontal for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom:

+Horizontal Latin script text + +

Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left:

+Vertical Chinese text + +

The second part is dependent on a number of factors, as the order in which text blocks are read is not fixed for every writing system. In mono-script texts it is usually determined by the inline text direction, i.e. in Latin script texts columns are read starting with the top-left column, followed by the column to its right, and so on, continuing with the left-most column below if none remain to the right (inversely for right-to-left scripts like Arabic, which start with the top right-most column, continue leftward, and return to the right-most column just below when none remain).

+

In multi-script documents the block order is determined by the primary writing system employed in the document, e.g. for a modern book containing both Latin and Arabic script text it would be set to lr when Latin is primary, e.g. when the binding is on the left side of the book seen from the title cover, and vice versa (rl if the binding is on the right on the title cover). The same applies analogously to text written with vertical lines.

+

Combining these two parts yields the four available text directions:

+ + + + + + + + + + + + + + + + + + + + +

Text Direction

Examples

horizontal-lr

Latin script texts, Mixed LTR/RTL docs with principal LTR script

horizontal-rl

Arabic script texts, Mixed LTR/RTL docs with principal RTL script

vertical-lr

Vertical script texts read from left-to-right.

vertical-rl

Vertical script texts read from right-to-left.

+
+
+

Masking

+

It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white:

+
$ kraken -i input.jpg segmentation.json segment -bl -m mask.png
+
+
+
+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands.

+
+

Querying and Model Retrieval

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07
+10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration)
+10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature)
+10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show 10.5281/zenodo.5617783
+name: 10.5281/zenodo.5617783
+
+Cremma-Medieval Old French Model (Litterature)
+
+....
+scripts: Latn
+alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128
+accuracy: 95.49%
+license: CC-BY-SA-2.0
+author(s): Pinche, Ariane
+date: 2021-10-29
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get 10.5281/zenodo.5617783
+Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10
+Model name: cremma_medieval_bicerin.mlmodel
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +printed in the last line of the kraken get output.

+
$ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel
+
+
+
+
+

Publishing

+

When one would like to share a model with the wider world (for fame and glory!) it is possible (and recommended) to upload it to the repository. The process consists of two stages: the creation of the deposit on the Zenodo platform followed by approval of the model in the community, making it discoverable for other kraken users.

+

For uploading models a Zenodo account and a personal access token are required. After account creation tokens can be created under the account settings:

+Zenodo token creation dialogue + +

With the token models can then be uploaded:

+
$ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617783
+
+
+

A number of important metadata fields will be asked for, such as a short description of the model, a long form description, recognized scripts, and authorship. Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. it can’t be changed or deleted, so it is important to make sure that all the information is correct. Each deposit also has a unique persistent identifier, a DOI, that can be used to refer to it, e.g. in publications or when pointing someone to a particular model.

+

Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users.

+

It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with kraken get +and its DOI. It is mostly suggested for preliminary models that might get +updated later:

+
$ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617734
+
+
+
+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.clstm -m Latn:antiqua.clstm
+
+
+

All polytonic Greek text portions will be recognized using the porson.clstm +model while Latin text will be fed into the antiqua.clstm model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.clstm
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/api.html b/4.3.0/api.html new file mode 100644 index 000000000..254646372 --- /dev/null +++ b/4.3.0/api.html @@ -0,0 +1,3056 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all functionality of the OCR engine. Most functional blocks (binarization, segmentation, recognition, and serialization) are encapsulated in one high-level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The legacy segmenter requires just a b/w image object as its sole mandatory parameter, although some additional parameters exist, largely to change the principal text direction (important for column ordering and top-to-bottom scripts) and explicit masking of non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+{'text_direction': 'horizontal-lr',
+ 'boxes': [[0, 29, 232, 56],
+           [28, 54, 121, 84],
+           [9, 73, 92, 117],
+           [103, 76, 145, 131],
+           [7, 105, 119, 230],
+           [10, 228, 126, 345],
+           ...
+          ],
+ 'script_detection': False}
+
+
+
+
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

+ + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + +

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+{'text_direction': 'horizontal-lr',
+ 'type': 'baselines',
+ 'script_detection': False,
+ 'lines': [{'script': 'default',
+            'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]],
+            'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]},
+           ...],
+ 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...]
+             '$par': ...
+             '$nop':  ...}}
+>>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking.

+

Images are automatically converted into the proper mode for recognition, except in the case of models trained on binary images, as there is a plethora of different binarization algorithms available, each with strengths and weaknesses. For most material the kraken-provided binarization should be sufficient, though. This does not mean that a segmentation model trained on RGB images will have equal accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality will often be modest or non-existent for color models, while non-binarized inputs to a binary model will cause severe degradation (and a warning to that effect).

+

Per default segmentation is performed on the CPU, although the neural network can be run on a GPU with the device argument. As the vast majority of the processing required is postprocessing, the performance gain will most likely be modest.
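For example, assuming a CUDA-capable GPU is available (the device string follows the usual torch conventions and is purely illustrative here), the segmenter from the snippet above can be moved off the CPU:

>>> baseline_seg = blla.segment(im, model=model, device='cuda:0')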

+

The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

+ + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... + + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + +

As the customization of this two-stage decoding process is usually reserved for specialized use cases, sensible defaults are chosen: codecs are part of the model file and do not have to be supplied manually; the preferred CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

The sequence recognizer wrapper combines the neural network itself, a codec, metadata such as whether the input is supposed to be grayscale or binarized, and an instance of a CTC decoder that performs the conversion of the raw output tensor of the network into a sequence of labels:

+ + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + +
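As a quick illustration (continuing from the model loaded above; the concrete values naturally depend on the model file), these parts are exposed as attributes on the wrapper:

>>> model.codec              # codec mapping labels to code points
>>> model.one_channel_mode   # '1' for binary, 'L' for grayscale input models
>>> model.seg_type           # segmentation type the model expects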

Afterwards, given an image, a segmentation, and the model one can perform text recognition. The code is identical for both legacy and baseline segmentations. As with segmentation, input images are auto-converted to the correct color mode, except in the case of binary models, for which a warning will be raised if there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken.rpred import rpred
+# single model recognition
+>>> pred_it = rpred(model, im, baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+

The output isn’t just a sequence of characters but a kraken.rpred.ocr_record object containing the character prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'box'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The \(C +\times W\) softmax output matrix is accessible as the outputs attribute on the +kraken.lib.models.TorchSeqRecognizer after each step of the +kraken.rpred.rpred() iterator. To get a mapping from the label space +\(C\) the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.outputs
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.
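As a rough sketch (assuming a recognition step has already populated the output matrix as shown above), a decoder can also be applied to the raw matrix manually and the resulting label tuples passed through the codec:

>>> from kraken.lib.ctc_decoder import greedy_decoder

>>> labels = greedy_decoder(model.outputs)   # [(label, start, end, confidence), ...]
>>> model.codec.decode(labels)               # [(code point, start, end, confidence), ...]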

+
+
+

XML Parsing

+

Sometimes it is desired to take the data in an existing XML serialization format like PageXML or ALTO and apply an OCR function on it. The kraken.lib.xml module includes parsers extracting information into data structures processable with minimal transformation by the functional blocks:

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> xml.parse_alto(alto_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+>>> page_doc = '/path/to/page'
+>>> xml.parse_page(page_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+
+
+
+
+

Serialization

+

The serialization module can be used to transform the ocr_records returned by the prediction iterator into a text-based (most often XML) format for archival. The module renders jinja2 templates in kraken/templates through the kraken.serialization.serialize() function.

+
>>> from kraken import serialization
+
+>>> records = [record for record in pred_it]
+>>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+
+
+

Training

+

Training is largely implemented with the pytorch lightning framework. There are separate LightningModules for recognition and segmentation training and a small wrapper around lightning’s Trainer class that mainly sets up model handling and verbosity options for the CLI.

+
>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to have a closer look at the command line parameters for features such as transfer learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/api_docs.html b/4.3.0/api_docs.html new file mode 100644 index 000000000..09cba0dd9 --- /dev/null +++ b/4.3.0/api_docs.html @@ -0,0 +1,2691 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (str) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
+
+
Returns:
+

A dictionary containing the text direction and under the key ‘lines’ a +list of reading order sorted baselines (polylines) and their respective +polygonal boundaries. The last and first point of each boundary polygon +are connected.

+
 {'text_direction': '$dir',
+  'type': 'baseline',
+  'lines': [
+     {'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0, x1, y1], ... [x_m, y_m]]},
+     {'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}
+   ]
+   'regions': [
+     {'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'},
+     {'region': [[x0, ...]], 'type': 'text'}
+   ]
+ }
+
+
+

+
+
Raises:
+
+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
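A minimal usage sketch (not part of the reference itself; file names are hypothetical) showing that the model argument accepts either a single TorchVGSLModel or a list of them:

>>> from PIL import Image
>>> from kraken import blla
>>> from kraken.lib import vgsl

>>> im = Image.open('page.png')
>>> line_model = vgsl.TorchVGSLModel.load_model('lines.mlmodel')
>>> region_model = vgsl.TorchVGSLModel.load_model('regions.mlmodel')
>>> seg = blla.segment(im, model=[line_model, region_model])
>>> seg['lines'][0]['baseline']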
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A dictionary containing the text direction and a list of reading order +sorted bounding boxes under the key ‘boxes’:

+
{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),...]}
+
+
+

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
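A minimal sketch (hypothetical file name) assuming a pre-binarized, uni-column page; setting maxcolseps to 0 disables whitespace column detection:

>>> from PIL import Image
>>> from kraken import pageseg

>>> bw_im = Image.open('bw.png')   # bi-level image of mode '1' or 'L'
>>> seg = pageseg.segment(bw_im, text_direction='horizontal-lr', maxcolseps=0)
>>> seg['boxes'][:3]               # (x1, y1, x2, y2) tuples in reading order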
+
+

kraken.rpred module

+
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+filtered_tags = []
+
+ +
+
+im
+
+ +
+
+im_str
+
+ +
+
+miss = []
+
+ +
+
+nets
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags
+
+ +
+
+tags_ignore
+
+ +
+
+ts
+
+ +
+ +
+
+class kraken.rpred.ocr_record(prediction, cuts, confidences, display_order=True)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (Sequence[Union[Tuple[int, int], Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]]])

  • +
  • confidences (Sequence[float])

  • +
  • display_order (bool)

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+property confidences: List[float]
+
+
Return type:
+

List[float]

+
+
+
+ +
+
+property cuts: Sequence
+
+
Return type:
+

Sequence

+
+
+
+ +
+
+abstract display_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+abstract logical_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+property prediction: str
+
+
Return type:
+

str

+
+
+
+ +
+
+abstract property type
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSegRecognizer +object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of +coordinates (x0, y0, x1, y1) of a text line in the image +and an entry ‘text_direction’ containing +‘horizontal-lr/rl/vertical-lr/rl’.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible +with padding.

  • +
  • bidi_reordering (bool|str) – Reorder classes in the ocr_record according to +the Unicode bidirectional algorithm for correct +display. Set to L|R to change base text +direction.

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[ocr_record, None, None]

+
+
+
+ +
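A short sketch (coordinates and file names are made up) constructing the bounds dictionary by hand instead of taking it from one of the segmenters:

>>> from PIL import Image
>>> from kraken.lib import models
>>> from kraken.rpred import rpred

>>> net = models.load_any('recognition.mlmodel')
>>> im = Image.open('page.png')
>>> bounds = {'text_direction': 'horizontal-lr', 'boxes': [(30, 40, 820, 95), (30, 105, 790, 160)]}
>>> for record in rpred(net, im, bounds):
        print(record.prediction)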
+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='alto', template_source='native', processing_steps=None)
+

Serializes a list of ocr_records into an output document.

+

Serializes a list of predictions and their corresponding positions by doing +some hOCR-specific preprocessing and then renders them through one of +several jinja2 templates.

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • records (Sequence[kraken.rpred.ocr_record]) – List of kraken.rpred.ocr_record

  • +
  • image_name (Union[os.PathLike, str]) – Name of the source image

  • +
  • image_size (Tuple[int, int]) – Dimensions of the source image

  • +
  • writing_mode (Literal['horizontal-tb', 'vertical-lr', 'vertical-rl']) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values are +horizontal-tb, vertical-rl, and vertical-lr.

  • +
  • scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records

  • +
  • regions (Optional[Dict[str, List[List[Tuple[int, int]]]]]) – Dictionary mapping region types to a list of region polygons.

  • +
  • template ([os.PathLike, str]) – Selector for the serialization format. May be ‘hocr’, +‘alto’, ‘page’ or any template found in the template +directory. If template_source is set to custom a path to a +template is expected.

  • +
  • template_source (Literal['native', 'custom']) – Switch to enable loading of custom templates from +outside the kraken package.

  • +
  • processing_steps (Optional[List[Dict[str, Union[Dict, str, float, int, bool]]]]) –

    A list of dictionaries describing the processing kraken performed on the inputs:

    +
    {'category': 'preprocessing',
    + 'description': 'natural language description of process',
    + 'settings': {'arg0': 'foo', 'argX': 'bar'}
    +}
    +
    +
    +

  • +
+
+
Returns:
+

The rendered template

+
+
Return type:
+

str

+
+
+
+ +
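As a sketch (continuing from the records produced by the prediction iterator in the quickstart; all concrete values here are invented), a processing step entry following the structure above can be passed along:

>>> from kraken import serialization

>>> steps = [{'category': 'preprocessing', 'description': 'binarization', 'settings': {'threshold': 0.5}}]
>>> alto = serialization.serialize(records, image_name='page.png',
                                   image_size=(2000, 3000), template='alto',
                                   processing_steps=steps)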
+
+kraken.serialization.serialize_segmentation(segresult, image_name=None, image_size=(0, 0), template='alto', template_source='native', processing_steps=None)
+

Serializes a segmentation result into an output document.

+
+
Parameters:
+
    +
  • segresult (Dict[str, Any]) – Result of blla.segment

  • +
  • image_name (Union[os.PathLike, str]) – Name of the source image

  • +
  • image_size (Tuple[int, int]) – Dimensions of the source image

  • +
  • template (Union[os.PathLike, str]) – Selector for the serialization format. Any value accepted by +serialize is valid.

  • +
  • template_source (Literal['native', 'custom']) – Enables/disables loading of external templates.

  • +
  • processing_steps (Optional[List[Dict[str, Union[Dict, str, float, int, bool]]]])

  • +
+
+
Returns:
+

(str) rendered template.

+
+
Return type:
+

str

+
+
+
+ +
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing the sequence lengths of the input batch.

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads anything that was, is, and will be a valid ocropus model and +instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN +configuration in the file.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing VGSL segmentation and recognition +networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (Union[os.PathLike, str]) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
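A brief sketch (hypothetical model path and device string):

>>> from kraken.lib import models

>>> rec = models.load_any('recognition.mlmodel')
>>> rec.kind          # source kind, e.g. 'vgsl'
>>> rec.to('cuda:0')  # move the network and subsequent inputs to a GPU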
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VSGL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel vector at each step along its time axis, i.e. either put the non-time-axis dimension into the channels dimension or use a summarizing RNN squashing the time axis to 1 and putting the output into the channels dimension.

+
+
Parameters:
+

spec (str)

+
+
+
+
+input
+

Expected input tensor as a 4-tuple.

+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and append layers spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+property aux_layers
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx)
+

Builds a reshape layer

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx)
+

Builds an LSTM/GRU layer returning number of outputs and layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_wav2vec2(input, blocks, idx)
+

Builds a Wav2Vec2 masking layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, os.PathLike]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException if the model data is invalid (not a

  • +
  • string, protobuf file, or without appropriate metadata).

  • +
  • FileNotFoundError if the path doesn't point to a file.

  • +
+
+
+
+ +
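A small sketch (hypothetical paths) loading a model file, switching it to inference mode, and saving a copy:

>>> from kraken.lib import vgsl

>>> net = vgsl.TorchVGSLModel.load_model('segmentation.mlmodel')
>>> net.one_channel_mode    # '1', 'L', or None
>>> net.eval()              # disable dropout and gradient calculation
>>> net.save_model('segmentation_copy.mlmodel')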
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+user_metadata: dict[str, Any]
+
+ +
+ +
+
+

kraken.lib.xml module

+
+
+kraken.lib.xml.parse_xml(filename)
+

Parses either a PageXML or ALTO file with autodetermination of the file +format.

+
+
Parameters:
+

filename (Union[str, os.PathLike]) – path to an XML file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
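For example (hypothetical path), the format auto-detection makes this the most convenient entry point when the input may be either ALTO or PageXML:

>>> from kraken.lib import xml

>>> doc = xml.parse_xml('transcription.xml')
>>> doc['image']
>>> len(doc['lines'])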
+
+kraken.lib.xml.parse_page(filename)
+

Parses a PageXML file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, os.PathLike]) – path to a PageXML file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+kraken.lib.xml.parse_alto(filename)
+

Parses an ALTO file, returns the baselines defined in it, and loads the +referenced image.

+
+
Parameters:
+

filename (Union[str, os.PathLike]) – path to an ALTO file.

+
+
Returns:
+

A dict:

+
{'image': impath,
+ 'lines': [{'boundary': [[x0, y0], ...],
+            'baseline': [[x0, y0], ...],
+            'text': 'apdjfqpf',
+            'tags': {'type': 'default', ...}},
+           ...
+           {...}],
+ 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
+
+
+

+
+
Return type:
+

Dict[str, Any]

+
+
+
+ +
+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.

+
+
+
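
A minimal sketch of constructing and using a codec with the methods documented in this class; the character set is illustrative only:

from kraken.lib.codec import PytorchCodec

codec = PytorchCodec('abcdefghijklmnopqrstuvwxyz ')   # one label per code point, 1-indexed
labels = codec.encode('hello world')                  # torch.IntTensor of labels
# decode() expects (label, start, end, confidence) tuples, e.g. from a CTC decoder
chars = codec.decode([(int(l), 0, 0, 1.0) for l in labels])
print(''.join(c for c, _, _, _ in chars))             # 'hello world'
print(codec.max_label, codec.is_valid)
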
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if a subsequence is not encodable and the codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
+
+

kraken.lib.train module

+
+

Training Schedulers

+
+
+

Training Stoppers

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, pb_ignored_metrics=('loss', 'val_metric'), move_metrics_to_cpu=True, freeze_backbone=-1, failed_sample_threshold=10, pl_logger=None, log_dir=None, *args, **kwargs)
+
+
Parameters:
+
    +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
  • min_epochs (int)

  • +
  • max_epochs (int)

  • +
  • pb_ignored_metrics (Sequence[str])

  • +
  • move_metrics_to_cpu (bool)

  • +
  • pl_logger (Optional[pytorch_lightning.loggers.logger.DummyLogger])

  • +
  • log_dir (Optional[os.PathLike])

  • +
+
+
+
+
+automatic_optimization = False
+
+ +
+
+fit(*args, **kwargs)
+
+ +
+ +
+
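
A hedged sketch of driving training from Python. KrakenTrainer and its parameters are documented above, but RecognitionModel and its keyword arguments are assumptions not covered in this excerpt; treat the exact names as illustrative:

from glob import glob
from kraken.lib.train import KrakenTrainer, RecognitionModel  # RecognitionModel assumed

ground_truth = glob('training/*.xml')                  # hypothetical XML ground truth
model = RecognitionModel(training_data=ground_truth,   # keyword names assumed
                         format_type='xml')
trainer = KrakenTrainer(min_epochs=5, max_epochs=50)   # parameters documented above
trainer.fit(model)
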
+
+

kraken.lib.dataset module

+
+

Datasets

+
+
+class kraken.lib.dataset.BaselineSet(imgs=None, suffix='.path', line_width=4, padding=(0, 0, 0, 0), im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • imgs (Sequence[Union[os.PathLike, str]])

  • +
  • suffix (str)

  • +
  • line_width (int)

  • +
  • padding (Tuple[int, int, int, int])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • mode (Optional[Literal['path', 'alto', 'page', 'xml']])

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(image, baselines=None, regions=None, *args, **kwargs)
+

Adds a page to the dataset.

+
+
Parameters:
+
    +
  • im – Path to the whole page image

  • +
  • baseline – A list containing dicts with a list of coordinates +and tags [{‘baseline’: [[x0, y0], …, +[xn, yn]], ‘tags’: (‘script_type’,)}, …]

  • +
  • regions (Dict[str, List[List[Tuple[int, int]]]]) – A dict containing lists of lists of coordinates {‘region_type_0’: [[[x0, y0], …, [xn, yn]]], ‘region_type_1’: …}.

  • +
  • image (Union[os.PathLike, str, PIL.Image.Image])

  • +
  • baselines (List[List[List[Tuple[int, int]]]])

  • +
+
+
+
+ +
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mode
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+pad
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
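
A hedged sketch of building a segmentation training set from XML ground truth; the glob pattern is a placeholder and only attributes listed above are inspected:

from glob import glob
from kraken.lib.dataset import BaselineSet

ds = BaselineSet(glob('training/*.xml'), mode='xml')   # hypothetical PageXML/ALTO files
print(ds.num_classes)                                  # classes derived from line/region types
print(ds.class_mapping)
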
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line to the dataset.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, text, baseline, boundary, *args, **kwargs)
+

Parses a sample for the dataset and returns it.

+

This function is mainly used for parallelized loading of training data.

+
+
Parameters:
+
    +
  • im (path) – Path to the whole page image

  • +
  • text (str) – Transcription of the line.

  • +
  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • +
  • boundary (list) – A polygon mask for the line.

  • +
  • image (Union[os.PathLike, str, PIL.Image.Image])

  • +
+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
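
A hedged sketch of adding a single polygonal line; the coordinates and transcription are made up and the keyword names simply mirror the parse() signature documented above:

from kraken.lib.dataset import PolygonGTDataset

ds = PolygonGTDataset()
ds.add(image='page.png',                                   # hypothetical page image
       text='an example line',
       baseline=[[100, 120], [800, 118]],
       boundary=[[90, 100], [810, 100], [810, 140], [90, 140]])
ds.encode()                                                # build a codec and encode the text
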
+
+class kraken.lib.dataset.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • split (Callable[[Union[os.PathLike, str]], str])

  • +
  • suffix (str)

  • +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(*args, **kwargs)
+

Adds a line-image-text pair to the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+parse(image, *args, **kwargs)
+

Parses a sample for this dataset.

+

This is mostly used to parallelize populating the dataset.

+
+
Parameters:
+

image (str) – Input image path

+
+
Return type:
+

Dict

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+skip_empty_lines
+
+ +
+
+split
+
+ +
+
+suffix
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
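
A hedged sketch of assembling a bounding-box line dataset; the image paths and the .gt.txt transcriptions assumed to sit next to them are placeholders:

from glob import glob
from kraken.lib.dataset import GroundTruthDataset

ds = GroundTruthDataset(suffix='.gt.txt')
for im in glob('lines/*.png'):          # hypothetical line images with *.gt.txt next to them
    ds.add(im)
print(ds.alphabet)                      # collections.Counter over all seen characters
ds.encode()                             # required before sampling from the dataset
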
+
+

Helpers

+
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (str)

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
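
A small illustrative call with two hand-made line slices:

from kraken.lib.segmentation import reading_order

lines = [(slice(0, 30), slice(0, 500)),     # top line (rows, columns)
         (slice(40, 70), slice(0, 500))]    # bottom line
order = reading_order(lines, text_direction='lr')
print(order)    # order[i, j] is true if line i comes before line j
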
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its +polygonization.

  • +
  • regions (Sequence) – List of region polygons.

  • +
  • text_direction (str) – Set principal text direction for column ordering. +Can be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

A reordered input.

+
+
Return type:
+

Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]

+
+
+
+ +
+
+kraken.lib.segmentation.denoising_hysteresis_thresh(im, low, high, sigma)
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5, text_direction='horizontal')
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
  • text_direction (str) – Base orientation of the text line (horizontal or +vertical).

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (sequence) – List of lists containing a single baseline per +entry.

  • +
  • suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.

  • +
  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. +Overrides data in im. The default map is +gaussian_filter(sobel(im), 2).

  • +
  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of +the input. Values of 0 are used for aspect-preserving +scaling. None skips input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are +assumed to be on the bottom of the text line and will +be offset upwards, if set to True, baselines are on the +top and will be offset downwards. If set to None, no +offset will be applied.

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

+
+
+
+ +
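
A hedged sketch with two hand-specified baselines on a grayscale page image; paths and coordinates are made up:

from PIL import Image
from kraken.lib.segmentation import calculate_polygonal_environment

im = Image.open('page.png').convert('L')          # hypothetical page image
baselines = [[(100, 120), (800, 118)],
             [(100, 180), (800, 182)]]
polygons = calculate_polygonal_environment(im, baselines)
# one list of (x, y) points per baseline, or None where polygonization failed
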
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (float or tuple of floats) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (list) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (list) – A bounding polygon around the baseline (same format as +baseline).

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

Tuple[Tuple[int, int]]

+
+
+
+ +
+
+kraken.lib.segmentation.extract_polygons(im, bounds)
+

Yields the subimages of image im defined in the list of bounding polygons +with baselines preserving order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (Dict[str, Any]) –

    A list of dicts in baseline:

    +
    {'type': 'baselines',
    + 'lines': [{'baseline': [[x_0, y_0], ... [x_n, y_n]],
    +            'boundary': [[x_0, y_0], ... [x_n, y_n]]},
    +           ....]
    +}
    +
    +
    +

    or bounding box format:

    +
    {'boxes': [[x_0, y_0, x_1, y_1], ...], 'text_direction': 'horizontal-lr'}
    +
    +
    +

  • +
+
+
Yields:
+

The extracted subimage

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
+
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates back the network output to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, prob). prob is the maximum value of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates back the network output to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
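
A toy example with a made-up (C, W) softmax matrix; class 0 is the CTC blank:

import numpy as np
from kraken.lib.ctc_decoder import greedy_decoder

# 3 classes (class 0 is the blank) over 4 time steps
outputs = np.array([[0.9, 0.1, 0.8, 0.1],
                    [0.05, 0.8, 0.1, 0.1],
                    [0.05, 0.1, 0.1, 0.8]])
for cls, start, end, prob in greedy_decoder(outputs):
    print(cls, start, end, prob)
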
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates back the network output to a label sequence in the same way as the original ocropy/clstm.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+
+ +
+
+message
+
+ +
+
+width
+
+ +
+ +
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren’t further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
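
A minimal usage sketch; the input path is a placeholder:

from PIL import Image
from kraken.binarization import nlbin

bw = nlbin(Image.open('page.jpg'))    # returns a bitonal PIL.Image.Image
bw.save('page.bw.png')
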
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None, records=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im (PIL.Image) – Input image

  • +
  • segmentation (dict) – Output of the segment method.

  • +
  • records (list) – A list of ocr_record objects.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[dict] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode=’wb’) to write to.

+
+
+
+ +
+ +
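
A hedged sketch of the interface; the image path and the hand-written legacy segmentation dict (in the bounding box format shown earlier) are placeholders:

from PIL import Image
from kraken.transcribe import TranscriptionInterface

im = Image.open('page.png')
seg = {'text_direction': 'horizontal-lr', 'boxes': [[10, 10, 400, 60]]}  # legacy bbox segmentation
ti = TranscriptionInterface()
ti.add_page(im, segmentation=seg)
with open('transcribe.html', 'wb') as fp:   # write() expects a binary file object
    ti.write(fp)
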
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/genindex.html b/4.3.0/genindex.html new file mode 100644 index 000000000..62e80ce27 --- /dev/null +++ b/4.3.0/genindex.html @@ -0,0 +1,685 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/gpu.html b/4.3.0/gpu.html new file mode 100644 index 000000000..b0b9c09d9 --- /dev/null +++ b/4.3.0/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/index.html b/4.3.0/index.html new file mode 100644 index 000000000..25af47322 --- /dev/null +++ b/4.3.0/index.html @@ -0,0 +1,1036 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through the on-board pip utility and the anaconda scientific computing environment are supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the pdf extras package for PyPi:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a recognition model to do the actual +recognition of characters. To download the default English text recognition +model and place it in the user’s kraken directory:

+
$ kraken get 10.5281/zenodo.2577813
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.2577813
+name: 10.5281/zenodo.2577813
+
+A generalized model for English printed text
+
+This model has been trained on a large corpus of modern printed English text\naugmented with ~10000 lines of historical p
+scripts: Latn
+alphabet: !"#$%&'()+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]`abcdefghijklmnopqrstuvwxyz{} SPACE
+accuracy: 99.95%
+license: Apache-2.0
+author(s): Kiessling, Benjamin
+date: 2019-02-26
+
+
+
+
+
+

Quickstart

+

The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or run separately:

[Diagram: kraken processing pipeline — an input Image is passed to Segmentation (using a Segmentation Model), which produces Baselines, Regions, and Order; Recognition (using a Recognition Model) turns these into OCR Records; Serialization (using an Output Template) writes the Output File.]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the default model:

+
$ kraken -i bw.tif image.txt segment -bl ocr
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded from +the European Union’s Horizon 2020 Framework Programme for Research and +Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la +Recherche au titre du Programme d’Investissements d’Avenir portant la référence +ANR-21-ESRE-0005.

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/ketos.html b/4.3.0/ketos.html new file mode 100644 index 000000000..2efa1f48e --- /dev/null +++ b/4.3.0/ketos.html @@ -0,0 +1,841 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

Both segmentation and recognition are trainable in kraken. The segmentation +model finds baselines and regions on a page image. Recognition models convert +text image lines found by the segmenter into digital text.

+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low-level format, the XML-based formats that are commonly used for archival of annotation and transcription data, and in the case of recognizer training a precompiled binary format. It is recommended to use the XML formats for segmentation training and the binary format for recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.2. An example showing the +attributes necessary for segmentation and recognition training follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523" 
+						   WIDTH="5234" 
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..." 
+							  WIDTH="..." 
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K" 
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +of a variety of tools.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a binary dataset format offering a couple of advantages is supported. Binary datasets drastically improve loading performance, allowing the saturation of most GPUs with minimal computational overhead, while also allowing training with datasets that are larger than the system's main memory. A minor drawback is a ~30% increase in dataset size in comparison to the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of individual files containing many lines this process can take a long time. It can easily be parallelized by specifying the number of separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The above invocation assigns 80% of the source lines to the training set, 10% to the validation set, and 10% to the test set. The training and validation sets in the dataset file are used automatically by ketos train (unless told otherwise) while the remaining 10% test set is selected by ketos test.

+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its most important command line options:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-o, --output

Output model file prefix. Defaults to model.

-s, --spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, --append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, --load

Load existing file to continue training

-F, --savefreq

Model save frequency in epochs during +training

-q, --quit

Stop condition for training. Set to early +for early stopping (default) or dumb for fixed +number of epochs.

-N, --epochs

Number of epochs to train for.

--min-epochs

Minimum number of epochs to train for when using early stopping.

--lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

-d, --device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

--optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, --lrate

Learning rate [default: 0.001]

-m, --momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, --weight-decay

Weight decay.

--schedule

Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or reduceonplateau. For 1cycle the cycle length is determined by the --epoch option.

-p, --partition

Ground truth data partition ratio between train/validation set

-u, --normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, --codec

Load a codec JSON definition (invalid if loading existing model)

--resize

Codec/output layer resizing option. If set +to add code points will be added, both +will set the layer to match exactly the +training data, fail will abort if training +data and model codec do not match. Only valid when refining an existing model.

-n, --reorder / --no-reorder

Reordering of code points to display order.

-t, --training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, --evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

-f, --format-type

Sets the training and evaluation data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

--augment / --no-augment

Enables/disables data augmentation.

--workers

Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.

+

In some cases, such as color inputs, changing the network architecture might be +useful:

+
$ ketos train -f page -s '[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal results such as stopping training too soon. Adjusting the minimum delta and/or lag can be useful:

+
$ ketos train --lag 10 --min-delta 0.001 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, add and both. +add resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. both +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize add -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize both -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In both mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The new model will behave exactly like a new one, except potentially training a +lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, different types of whitespace exist, and mixed bidirectional text +can be written differently depending on the base line direction.

+

Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for Unicode normalization and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+

Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further the behavior of the BiDi algorithm can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a codec) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
+

It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the label decoded from the raw network output and Unicode +code points (see this diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation.

+

The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual.

+

There are multiple approaches one could follow when constructing a custom codec: randomized block codes, i.e. producing random fixed-length labels for each code point, Huffman coding, i.e. variable-length label sequences depending on the frequency of each code point in some text (not necessarily the training set), or structural decomposition, i.e. describing each code point through a sequence of labels that describe the shape of the grapheme, similar to how some input systems for Chinese characters function.

+

While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
+
+
+

Unsupervised recognition pretraining

+

Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices.

+

All data sources accepted by the supervised trainer are valid for pretraining but for performance reasons it is recommended to use pre-compiled binary datasets. One thing to keep in mind is that compilation filters out empty (non-transcribed) text lines per default, which is undesirable for pretraining. With the --keep-empty-lines option all valid lines will be written to the dataset file:

+
$ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml
+
+
+

The basic pretraining call is very similar to a training one:

+
$ ketos pretrain -f binary foo.arrow
+
+
+

There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples.

+
$ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow
+
+
+

Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced:

+
$ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow
+
+
+

It is necessary to use learning rate warmup (warmup) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations.

+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text +recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+val check  [------------------------------------]  0/0
+
+
+

This takes all text lines and regions encoded in the XML files and trains a +model to recognize them.

+

Most other options available in transcription training are also available in +segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize both -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of all of +either baseline or region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options are combinable to massage the dataset into any typology you want.

+

Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Recognition Testing

+

Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the test command. It uses transcribed +lines, the test set, in the same format as the train command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them.

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

-f, --format-type

Sets the test set data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

-m, --model

Model(s) to evaluate.

-e, --evaluation-files

File(s) with paths to evaluation data.

-d, --device

Select device to use.

--pad

Left and right padding around lines.

+

Transcriptions are handed to the command in the same way as for the train command, either through a manifest with -e/--evaluation-files or by just adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain the character accuracy measured per script and a detailed list of confusions. When evaluating multiple models, the last line of the output shows the average accuracy and the standard deviation across all of them.
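For instance, comparing two checkpoints against the same evaluation manifest could look like the following sketch (model file names are placeholders); the run ends with the averaged figures described above:

$ ketos test -m model_25.mlmodel -m model_best.mlmodel -e test.txt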

+
+
+ + +
+ +
+
+ +
+
\ No newline at end of file diff --git a/4.3.0/models.html b/4.3.0/models.html new file mode 100644 index 000000000..feff8b54d --- /dev/null +++ b/4.3.0/models.html @@ -0,0 +1,126 @@ +Models — kraken documentation
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural networks doing all the character recognition supported by kraken: pronn files serializing old pickled pyrnn models as protobuf, clstm’s native serialization, and versatile Core ML models.
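All three formats are accepted wherever a recognition model is expected on the command line. A minimal sketch, assuming a page image and model file with placeholder names (the -m argument could equally point to a pronn or clstm file):

$ kraken -i page.tif page.txt segment -bl ocr -m model.mlmodel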

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with metadata. This is the default format in pytorch-based kraken.
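As a hedged example (the output prefix and file names below are illustrative, not taken from this page), the checkpoints written by a ketos training run are Core ML files, named after the prefix given with -o plus the epoch number, with the best-performing one also saved under a _best suffix:

$ ketos train -o model training_data/*.png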

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + +
+ +
+
+ +
+
\ No newline at end of file diff --git a/4.3.0/objects.inv b/4.3.0/objects.inv new file mode 100644 index 000000000..1d53a4a86 Binary files /dev/null and b/4.3.0/objects.inv differ diff --git a/4.3.0/search.html b/4.3.0/search.html new file mode 100644 index 000000000..27b26b591 --- /dev/null +++ b/4.3.0/search.html @@ -0,0 +1,113 @@ +Search — kraken documentation
+
+
+ + +
+ +

Search

+ + + + + + + \ No newline at end of file diff --git a/4.3.0/searchindex.js b/4.3.0/searchindex.js new file mode 100644 index 000000000..4d95587c5 --- /dev/null +++ b/4.3.0/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ALTO": [[5, "alto"]], "API Quickstart": [[1, null]], "API Reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline Segmentation": [[0, "baseline-segmentation"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Binarization": [[0, "binarization"]], "Binary Datasets": [[5, "binary-datasets"]], "Codecs": [[5, "codecs"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dataset Compilation": [[7, "dataset-compilation"]], "Datasets": [[2, "datasets"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Finding Recognition Models": [[4, "finding-recognition-models"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "Funding": [[4, "funding"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Helpers": [[2, "helpers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input and Outputs": [[0, "input-and-outputs"]], "Installation": [[4, "installation"]], "Installation using Conda": [[4, "installation-using-conda"]], "Installation using Pip": [[4, "installation-using-pip"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy Box Segmentation": [[0, "legacy-box-segmentation"]], "Legacy modules": [[2, "legacy-modules"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Loss and Evaluation Functions": [[2, "loss-and-evaluation-functions"]], "Masking": [[0, "masking"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[6, null]], "Output formats": [[0, "output-formats"]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation": [[0, "page-segmentation"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Principal Text Direction": [[0, "principal-text-direction"]], "Publishing": [[0, "publishing"]], "Querying and Model Retrieval": [[0, "querying-and-model-retrieval"]], "Quickstart": [[4, "quickstart"]], "Recognition": [[0, "recognition"], [1, "recognition"], [7, "recognition"]], "Recognition Models": [[6, "recognition-models"]], "Recognition Testing": [[5, "recognition-testing"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Related Software": [[4, "related-software"]], "Reshape": [[8, "reshape"]], "Segmentation Models": [[6, "segmentation-models"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"]], "Slicing": [[5, "slicing"]], "Text Normalization and Unicode": [[5, "text-normalization-and-unicode"]], "Trainer": [[2, "trainer"]], "Training": [[1, "training"], [5, null], [7, "compilation"]], "Training Schedulers": [[2, "training-schedulers"]], "Training Stoppers": [[2, "training-stoppers"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "Unsupervised recognition 
pretraining": [[5, "unsupervised-recognition-pretraining"]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.lib.codec module": [[2, "kraken-lib-codec-module"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.exceptions": [[2, "kraken-lib-exceptions"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"add() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.add", false]], "add() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add", false]], "add() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add", false]], "add_codec() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.add_codec", false]], "add_labels() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.add_labels", false]], "add_page() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.add_page", false]], "alphabet (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.alphabet", false]], "alphabet (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.alphabet", false]], "append() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.append", false]], "aug (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.aug", false]], "aug (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.aug", false]], "aug (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.aug", false]], "automatic_optimization (kraken.lib.train.krakentrainer attribute)": [[2, "kraken.lib.train.KrakenTrainer.automatic_optimization", false]], "aux_layers (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.aux_layers", false]], "base_dir (kraken.rpred.ocr_record attribute)": [[2, "kraken.rpred.ocr_record.base_dir", false]], "baselineset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.BaselineSet", false]], "beam_decoder() (in module 
kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.beam_decoder", false]], "bidi_reordering (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bidi_reordering", false]], "blank_threshold_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.blank_threshold_decoder", false]], "blocks (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.blocks", false]], "bounds (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bounds", false]], "build_addition() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_addition", false]], "build_conv() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_conv", false]], "build_dropout() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_dropout", false]], "build_groupnorm() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_groupnorm", false]], "build_identity() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_identity", false]], "build_maxpool() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_maxpool", false]], "build_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_output", false]], "build_parallel() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_parallel", false]], "build_reshape() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_reshape", false]], "build_rnn() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_rnn", false]], "build_series() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_series", false]], "build_wav2vec2() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_wav2vec2", false]], "c_sorted (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.c_sorted", false]], "calculate_polygonal_environment() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.calculate_polygonal_environment", false]], "class_mapping (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_mapping", false]], "class_stats (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_stats", false]], "codec (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.codec", false]], "codec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.codec", false]], "compute_polygon_section() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.compute_polygon_section", false]], "confidences (kraken.rpred.ocr_record property)": [[2, "kraken.rpred.ocr_record.confidences", false]], "criterion (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id0", false], [2, "kraken.lib.vgsl.TorchVGSLModel.criterion", false]], "cuts (kraken.rpred.ocr_record property)": [[2, "kraken.rpred.ocr_record.cuts", false]], "decode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.decode", false]], "decoder (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.decoder", false]], "denoising_hysteresis_thresh() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.denoising_hysteresis_thresh", false]], "device (kraken.lib.models.torchseqrecognizer 
attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.device", false]], "display_order() (kraken.rpred.ocr_record method)": [[2, "kraken.rpred.ocr_record.display_order", false]], "encode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.encode", false]], "encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.encode", false]], "encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.encode", false]], "env (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.env", false]], "eval() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.eval", false]], "extract_polygons() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.extract_polygons", false]], "failed_samples (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.failed_samples", false]], "filtered_tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.filtered_tags", false]], "fit() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.fit", false]], "font (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.font", false]], "forward() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.forward", false]], "greedy_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.greedy_decoder", false]], "groundtruthdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.GroundTruthDataset", false]], "height (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id5", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.height", false]], "hyper_params (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.hyper_params", false]], "idx (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.idx", false]], "im (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im", false]], "im_mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.im_mode", false]], "im_mode (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.im_mode", false]], "im_mode (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.im_mode", false]], "im_str (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im_str", false]], "imgs (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.imgs", false]], "init_weights() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.init_weights", false]], "input (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id1", false], [2, "kraken.lib.vgsl.TorchVGSLModel.input", false]], "is_valid (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.is_valid", false]], "kind (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.kind", false]], "krakencairosurfaceexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCairoSurfaceException", false]], 
"krakencodecexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCodecException", false]], "krakenencodeexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenEncodeException", false]], "krakeninputexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInputException", false]], "krakeninvalidmodelexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInvalidModelException", false]], "krakenrecordexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRecordException", false]], "krakenrepoexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRepoException", false]], "krakenstoptrainingexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenStopTrainingException", false]], "krakentrainer (class in kraken.lib.train)": [[2, "kraken.lib.train.KrakenTrainer", false]], "l2c (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c", false]], "line_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "logical_order() (kraken.rpred.ocr_record method)": [[2, "kraken.rpred.ocr_record.logical_order", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id6", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "miss (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.miss", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id2", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, 
"kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.rpred)": [[2, "kraken.rpred.ocr_record", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id3", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.pad", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "parse() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.parse", false]], "parse() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.parse", false]], "parse_alto() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_alto", false]], "parse_page() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_page", false]], "parse_xml() (in module kraken.lib.xml)": [[2, "kraken.lib.xml.parse_xml", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.rpred.ocr_record property)": [[2, "kraken.rpred.ocr_record.prediction", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", 
false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], "serialize_segmentation() (in module kraken.serialization)": [[2, "kraken.serialization.serialize_segmentation", false]], "set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "skip_empty_lines (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.skip_empty_lines", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "suffix (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.suffix", false]], "tags (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms 
(kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "ts (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.ts", false]], "type (kraken.rpred.ocr_record property)": [[2, "kraken.rpred.ocr_record.type", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id4", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id7", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 2, 1, "", "add_labels"], [2, 3, 1, "", "c_sorted"], [2, 2, 1, "", "decode"], [2, 2, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 3, 1, "", "l2c"], [2, 4, 1, "", "max_label"], [2, 2, 1, "", "merge"], [2, 3, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "PolygonGTDataset"]], "kraken.lib.dataset.BaselineSet": [[2, 2, 1, "", "add"], [2, 3, 1, "", "aug"], [2, 3, 1, "", "class_mapping"], [2, 3, 1, "", "class_stats"], [2, 3, 1, "", "failed_samples"], [2, 3, 1, "", "im_mode"], [2, 3, 1, "", "imgs"], [2, 3, 1, "", "line_width"], [2, 3, 1, "", "mbl_dict"], [2, 3, 1, "", "mode"], [2, 3, 1, "", "mreg_dict"], [2, 3, 1, "", "num_classes"], [2, 3, 1, "", "pad"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "targets"], [2, 2, 1, "", "transform"], [2, 3, 1, "", "transforms"], [2, 3, 1, "", "valid_baselines"], [2, 3, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "failed_samples"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "skip_empty_lines"], [2, 3, 1, "", "split"], [2, 3, 1, "", "suffix"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 2, 1, "", "add"], [2, 3, 1, "", "alphabet"], [2, 3, 1, "", "aug"], [2, 2, 1, "", "encode"], [2, 3, 1, "", "failed_samples"], [2, 3, 1, "", "im_mode"], [2, 2, 1, "", "no_encode"], [2, 2, 1, "", "parse"], [2, 3, 1, "", "seg_type"], [2, 3, 1, "", "skip_empty_lines"], [2, 3, 1, "", "text_transforms"], [2, 3, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], 
[2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 3, 1, "id5", "height"], [2, 3, 1, "id6", "message"], [2, 3, 1, "id7", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 3, 1, "", "codec"], [2, 3, 1, "", "decoder"], [2, 3, 1, "", "device"], [2, 2, 1, "", "forward"], [2, 3, 1, "", "kind"], [2, 3, 1, "", "nn"], [2, 3, 1, "", "one_channel_mode"], [2, 2, 1, "", "predict"], [2, 2, 1, "", "predict_labels"], [2, 2, 1, "", "predict_string"], [2, 3, 1, "", "seg_type"], [2, 2, 1, "", "to"], [2, 3, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "denoising_hysteresis_thresh"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", "vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], "kraken.lib.train.KrakenTrainer": [[2, 3, 1, "", "automatic_optimization"], [2, 2, 1, "", "fit"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 2, 1, "", "add_codec"], [2, 2, 1, "", "append"], [2, 4, 1, "", "aux_layers"], [2, 3, 1, "", "blocks"], [2, 2, 1, "", "build_addition"], [2, 2, 1, "", "build_conv"], [2, 2, 1, "", "build_dropout"], [2, 2, 1, "", "build_groupnorm"], [2, 2, 1, "", "build_identity"], [2, 2, 1, "", "build_maxpool"], [2, 2, 1, "", "build_output"], [2, 2, 1, "", "build_parallel"], [2, 2, 1, "", "build_reshape"], [2, 2, 1, "", "build_rnn"], [2, 2, 1, "", "build_series"], [2, 2, 1, "", "build_wav2vec2"], [2, 3, 1, "", "codec"], [2, 3, 1, "id0", "criterion"], [2, 2, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 3, 1, "", "idx"], [2, 2, 1, "", "init_weights"], [2, 3, 1, "id1", "input"], [2, 2, 1, "", "load_model"], [2, 3, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 3, 1, "", "named_spec"], [2, 3, 1, "id2", "nn"], [2, 4, 1, "id3", "one_channel_mode"], [2, 3, 1, "", "ops"], [2, 3, 1, "", "pattern"], [2, 2, 1, "", "resize_output"], [2, 2, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 2, 1, "", "set_num_threads"], [2, 3, 1, "", "spec"], [2, 2, 1, "", "to"], [2, 2, 1, "", "train"], [2, 3, 1, "id4", "user_metadata"]], "kraken.lib.xml": [[2, 0, 1, "", "parse_alto"], [2, 0, 1, "", "parse_page"], [2, 0, 1, "", "parse_xml"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 1, 1, "", "mm_rpred"], [2, 1, 1, "", "ocr_record"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 3, 1, "", "bidi_reordering"], [2, 3, 1, "", "bounds"], [2, 3, 1, "", "filtered_tags"], [2, 3, 1, "", "im"], [2, 3, 1, "", "im_str"], [2, 3, 1, "", "miss"], [2, 3, 1, "", "nets"], [2, 3, 1, "", "one_channel_modes"], [2, 3, 1, "", "pad"], [2, 3, 1, "", "seg_types"], [2, 3, 1, "", "tags"], [2, 3, 1, "", "tags_ignore"], [2, 3, 1, "", "ts"]], "kraken.rpred.ocr_record": [[2, 3, 1, "", "base_dir"], [2, 4, 1, "", "confidences"], [2, 4, 1, "", "cuts"], [2, 2, 1, "", "display_order"], [2, 2, 1, "", "logical_order"], [2, 4, 1, "", "prediction"], [2, 4, 1, "", "type"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"], [2, 0, 1, "", "serialize_segmentation"]], "kraken.transcribe": [[2, 1, 1, "", 
"TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 2, 1, "", "add_page"], [2, 3, 1, "", "env"], [2, 3, 1, "", "font"], [2, 3, 1, "", "line_idx"], [2, 3, 1, "", "page_idx"], [2, 3, 1, "", "pages"], [2, 3, 1, "", "seg_idx"], [2, 3, 1, "", "text_direction"], [2, 3, 1, "", "tmpl"], [2, 2, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "method", "Python method"], "3": ["py", "attribute", "Python attribute"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:method", "3": "py:attribute", "4": "py:property"}, "terms": {"": [0, 1, 2, 4, 5, 6, 7, 8], "0": [0, 1, 2, 4, 5, 7, 8], "00": [0, 5, 7], "0001": 5, "0005": 4, "001": [5, 7], "0123456789": [0, 4, 7], "01c59": 8, "02": 4, "0245": 7, "04": 7, "06": [0, 7], "07": [0, 5], "09": [0, 7], "0d": 7, "0xe8e5": 0, "0xf038": 0, "0xf128": 0, "1": [0, 1, 2, 5, 7, 8], "10": [0, 1, 2, 4, 5, 7], "100": [0, 2, 5, 7, 8], "1000": 5, "10000": 4, "1015": 1, "1020": 8, "10218": 5, "1024": 8, "103": 1, "105": 1, "106": 5, "108": 5, "11": 7, "1128": 5, "11346": 5, "1161": 1, "117": 1, "1184": 7, "119": 1, "1195": 1, "12": [5, 7, 8], "120": 5, "1200": 5, "121": 1, "122": 5, "124": 1, "125": 5, "126": 1, "128": [5, 8], "13": [5, 7], "131": 1, "132": 7, "1339": 7, "134": 5, "135": 5, "1359": 7, "136": 5, "1377": 1, "1385": 1, "1388": 1, "1397": 1, "14": [0, 5], "1408": [1, 2], "1410": 1, "1412": 1, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "16": [0, 2, 5, 8], "161": 7, "1623": 7, "1681": 7, "1697": 7, "17": [2, 5], "1708": 1, "1716": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1824": 1, "19": [1, 5], "192": 5, "198": 5, "199": 5, "1996": 7, "1bpp": 0, "1cycl": 5, "1d": 8, "1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [0, 2, 4, 5, 7, 8], "20": [1, 2, 5, 8], "200": 5, "2000": 1, "2001": 5, "2006": 2, "2014": 2, "2016": 1, "2017": 1, "2019": [4, 5], "2020": 4, "2021": 0, "204": 7, "2041": 1, "207": [1, 5], "2072": 1, "2077": 1, "2078": 1, "2096": 7, "21": 4, "210": 5, "215": 5, "216": 1, "22": [0, 5, 7], "228": 1, "23": [0, 5], "230": 1, "232": 1, "2334": 7, "2364": 7, "23rd": 2, "24": [0, 1, 7], "241": 5, "2426": 1, "246": 5, "2483": 1, "25": [1, 5, 7, 8], "250": 1, "2500": 7, "253": 1, "256": [5, 7, 8], "2577813": 4, "259": 7, "26": [4, 7], "266": 5, "27": 5, "270": 7, "27046": 7, "274": 5, "28": [1, 5], "2873": 2, "29": [0, 1, 5], "2d": [2, 8], "3": [2, 5, 7, 8], "30": [5, 7], "300": 5, "300dpi": 7, "307": 7, "31": 5, "32": [5, 8], "328": 5, "3292": 1, "336": 7, "3367": 1, "3398": 1, "3414": 1, "3418": 7, "3437": 1, "345": 1, "3455": 1, "35000": 7, "3504": 7, "3514": 1, "3519": 7, "35619": 7, "365": 7, "3680": 7, "38": 5, "384": 8, "39": 5, "4": [1, 2, 5, 7, 8], "40": 7, "400": 5, "4000": 5, "428": 7, "431": 7, "46": 5, "47": 7, "471": 1, "473": 1, "48": [5, 7, 8], "488": 7, "49": [0, 5, 7], "491": 1, "4d": 2, "5": [1, 2, 5, 7, 8], "50": [5, 7], "500": 5, "5000": 5, "509": 1, "512": 8, "515": 1, "52": [5, 7], "522": 1, "5226": 5, "523": 5, "5230": 5, "5234": 5, "524": 1, "5258": 7, "5281": [0, 4], "53": 5, "534": 1, "536": [1, 5], "53980": 5, "54": 1, "54114": 5, "5431": 5, "545": 7, "5468665": 0, "56": [0, 1, 7], "5617734": 0, "5617783": 0, "562": 1, "575": [1, 5], "577": 7, "59": [7, 8], "5951": 7, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "62": 5, "63": 5, "64": [5, 8], "646": 7, "6542744": 0, "66": [5, 
7], "668": 1, "69": 1, "7": [1, 5, 7, 8], "70": 1, "7012": 5, "7015": 7, "71": 5, "7272": 7, "7281": 7, "73": 1, "74": [1, 5], "7593": 5, "76": 1, "773": 5, "7857": 5, "788": [5, 7], "79": 1, "794": 5, "7943": 5, "8": [0, 5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": [1, 7], "8445": 7, "8479": 7, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [1, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "92": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [0, 4, 5], "9541": 7, "9550": 7, "96": [5, 7], "97": 7, "98": 7, "99": [4, 7], "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], "As": [0, 1, 2, 5], "BY": 0, "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7], "Its": 0, "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "With": [0, 5], "aaebv2": 0, "abbyyxml": [0, 4], "abcdefghijklmnopqrstuvwxyz": 4, "abcdefghijklmnopqrstuvxabcdefghijklmnopqrstuvwxyz": 0, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 5, 7], "absolut": [2, 5], "abstract": 2, "abugida": 5, "acceler": [4, 5, 7], "accent": 0, "accept": [0, 2, 5], "access": [0, 1], "access_token": 0, "accord": [0, 2, 5], "accordingli": 2, "account": [0, 7], "accur": 5, "accuraci": [0, 1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [0, 5, 7, 8], "actual": [2, 4, 5, 7], "acut": 0, "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [0, 2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "administr": 0, "advantag": 5, "advis": 7, "affect": 7, "after": [0, 1, 5, 7, 8], "afterward": [0, 1], "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": 4, "aim": 5, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algorithm": [0, 1, 2, 5], "all": [0, 1, 2, 4, 5, 6, 7], "allow": [5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [0, 2, 4, 5, 7, 8], "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 2, 4, 7], "alto_doc": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 7], "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analogu": 0, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [0, 2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [0, 5, 7], "anyth": 2, "apach": 4, "apart": [0, 3, 5], "apdjfqpf": 2, "api": 5, "append": [0, 2, 5, 7, 8], "appli": [0, 1, 2, 4, 7, 8], "applic": [1, 7], "approach": [5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approv": 0, "approxim": 1, "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": [0, 2], "aren": 2, "arg": 2, "arg0": 2, "argument": [1, 5], "argx": 2, "arian": 0, "arm": 4, "around": [0, 1, 2, 5, 7], "arrai": [1, 2], "arrow": 5, "arxiv": 2, "ask": 0, 
"aspect": 2, "assign": [2, 5, 7], "associ": 1, "assum": 2, "attach": [1, 5], "attribut": [1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "authorship": 0, "auto": [1, 2, 5], "autodetermin": 2, "automat": [0, 1, 2, 5, 7, 8], "automatic_optim": 2, "aux_lay": 2, "auxiliari": [0, 1], "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awesom": 0, "awni": 2, "axi": [2, 8], "b": [0, 1, 5, 7, 8], "back": 2, "backbon": 5, "backend": 3, "background": [0, 2], "bar": 2, "base": [1, 2, 5, 6, 7, 8], "base_dir": 2, "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 7, 8], "bayr\u016bt": 7, "bbox": 2, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 7], "becom": 0, "been": [0, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": 5, "being": [1, 2, 5, 8], "below": [0, 5, 7], "benjamin": 4, "best": [0, 2, 5, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "bidi": [4, 5], "bidi_reord": 2, "bidirect": [2, 5], "bidirection": 8, "binar": [1, 7], "binari": [0, 1, 2], "bind": 0, "bit": [1, 5], "biton": 2, "bl": [0, 4], "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [0, 1, 2, 5, 8], "block_i": 5, "block_n": 5, "board": 4, "boilerpl": 1, "book": 0, "bookhand": 0, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": [0, 1, 2, 5], "box": [1, 2, 4, 5], "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_seri": 2, "build_wav2vec2": 2, "buld\u0101n": 7, "bw": [0, 4], "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [0, 1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 2], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": 1, "can": [0, 1, 2, 3, 4, 5, 7, 8], "cannot": 0, "capabl": [0, 5], "case": [0, 1, 2, 5, 7], "cat": 0, "categori": 2, "caus": [1, 2], "caveat": 5, "cc": 0, "cd": 4, "ce": [4, 7], "cell": 8, "cent": 7, "centerlin": 5, "central": [4, 7], "certain": [0, 2, 7], "chain": [0, 4, 7], "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charset": 2, "check": [0, 5], "chines": [0, 5], "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumst": 7, "class": [0, 1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 5, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "clone": 4, "close": 4, "closer": 1, "clstm": [0, 2, 6], "code": [0, 1, 2, 4, 5, 7], "codec": 1, "coher": 0, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [4, 7], "combin": [0, 1, 5, 7, 8], "come": 2, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "commun": 0, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compos": 2, "composedblocktyp": 5, "composit": 0, "compound": 2, "compress": 7, "compris": 7, "comput": [0, 2, 3, 4, 5, 7], "computation": 7, "compute_polygon_sect": 2, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5], "conform": 5, "confus": 5, "connect": [2, 5, 7], "connectionist": 2, "consid": [0, 2], "consist": [0, 1, 4, 7, 8], 
"constant": 5, "construct": [5, 7], "contain": [0, 1, 2, 4, 5, 6, 7], "contemporari": 0, "content": 5, "continu": [0, 1, 2, 5, 7], "contrast": [5, 7], "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "converg": 5, "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [2, 4], "core": 6, "coreml": 2, "corpu": [4, 5], "correct": [0, 1, 2, 5, 7], "correspond": [0, 1, 2], "cosin": 5, "cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "cover": 0, "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "crate": 0, "creat": [0, 2, 4, 5, 7, 8], "creation": 0, "cremma": 0, "cremma_medieval_bicerin": 0, "criterion": 2, "css": 0, "ctc": [1, 2, 5], "ctc_decod": 1, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [0, 2, 5, 6], "curv": 0, "custom": [0, 1, 2, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataset": 1, "dataset_larg": 5, "date": [0, 4], "de": [4, 7], "deal": [0, 5], "debug": [1, 5, 7], "decai": 5, "decid": 0, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "def": 1, "default": [0, 1, 2, 4, 5, 6, 7, 8], "default_split": 2, "defin": [0, 1, 2, 4, 5, 8], "definit": [0, 5, 8], "degrad": 1, "degre": 7, "del_indic": 2, "delet": [0, 2, 5, 7], "delta": 5, "denoising_hysteresis_thresh": 2, "denot": 0, "depend": [0, 1, 4, 5, 7], "deposit": 0, "deprec": 0, "depth": [5, 7, 8], "describ": [2, 5], "descript": [0, 2, 5], "descriptor": 2, "deseri": 2, "desir": [1, 8], "desktop": 7, "destin": 2, "destroi": 5, "detail": [0, 5, 7], "detect": [0, 2], "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diacrit": 5, "diaeres": 7, "diaeresi": 7, "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [2, 5], "differ": [0, 1, 4, 5, 7, 8], "difficult": 5, "digit": 5, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": [2, 5], "direct": [1, 2, 4, 5, 7, 8], "directli": [0, 5], "directori": [1, 2, 4, 5, 7], "disabl": [0, 2, 5, 7], "discover": 0, "disk": 7, "displai": [2, 5], "display_ord": 2, "dist1": 2, "dist2": 2, "distanc": 2, "distinguish": 5, "distractor": 5, "distribut": 8, "dnn": 2, "do": [0, 1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "doc": 0, "document": [0, 1, 2, 4, 5, 7], "doe": [0, 1, 2, 5, 7], "doesn": [2, 5, 7], "doi": 0, "domain": [1, 5], "done": [0, 5, 7], "dot": 7, "down": 7, "download": [0, 4, 7], "downward": 2, "drastic": 5, "draw": 1, "drawback": [0, 5], "driver": 1, "drop": [1, 8], "dropout": [2, 5, 7], "du": 4, "dumb": 5, "dummylogg": 2, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": 7, "editor": 7, "edu": 7, "effect": 0, "either": [0, 2, 5, 7, 8], "element": 5, "emploi": [0, 7], "empti": [2, 5], "enabl": [1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2], "end_separ": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entir": 5, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escripta": 4, "escriptorium": [4, 7], "especi": 0, "esr": 4, "estim": [0, 2, 7], "et": 2, "etc": 0, "european": 4, "eval": 2, "evalu": 5, "evaluation_data": 1, "evaluation_fil": 1, "even": [0, 5, 7], "everi": 0, "everyth": 5, 
"exact": [5, 7], "exactli": [1, 5], "exampl": [0, 1, 5, 7], "except": [1, 5], "exchang": 0, "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 7], "experiment": 7, "explic": 0, "explicit": [1, 5], "explicitli": [5, 7], "exponenti": 5, "express": 0, "extend": 8, "extens": [0, 5], "extent": 7, "extern": 2, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "f": [0, 4, 5, 7, 8], "f_t": 2, "factor": [0, 2], "fail": 5, "failed_sampl": 2, "failed_sample_threshold": 2, "faint": 0, "fairli": 7, "fallback": 0, "fals": [1, 2, 5, 7, 8], "fame": 0, "fancy_model": 0, "faq\u012bh": 7, "fashion": 5, "faster": [5, 7, 8], "fd": 2, "featur": [1, 2, 5, 7, 8], "fed": [0, 1, 2, 5, 8], "feed": [0, 1], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [2, 5], "figur": 1, "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "fill": 2, "filter": [1, 2, 5, 8], "filtered_tag": 2, "final": [0, 2, 4, 5, 7, 8], "find": [0, 5, 7], "fine": [1, 7], "finish": 7, "first": [0, 1, 2, 5, 7, 8], "fit": [1, 2, 7], "fix": [0, 5, 7], "flag": [1, 2, 4], "float": [0, 2], "flow": 0, "flush": 2, "fname": 2, "follow": [0, 2, 5, 8], "font": 2, "font_styl": 2, "foo": [1, 2, 5], "forc": 0, "foreground": 0, "forg": 4, "form": [0, 2, 5], "format": [1, 2, 6, 7], "format_typ": 1, "formul": 8, "forward": [2, 8], "found": [0, 1, 2, 5, 7], "four": 0, "fp": 1, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "freez": 5, "freeze_backbon": 2, "french": 0, "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4, 5], "function": [1, 5], "fundament": 1, "further": [0, 1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gain": 1, "garantue": 2, "gaussian_filt": 2, "gener": [0, 1, 2, 4, 5, 7], "gentl": 5, "get": [0, 1, 4, 5, 7], "git": 4, "github": 4, "githubusercont": 7, "gitter": 4, "given": [1, 2, 5, 8], "glob": [0, 1], "glori": 0, "glyph": [5, 7], "gn": 8, "gn32": 5, "go": 7, "good": 5, "gov": 5, "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphic": 5, "grave": 2, "grayscal": [0, 1, 2, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 4, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [0, 1, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "heatmap": [0, 1], "hebrew": [0, 5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "help": [4, 7], "here": [0, 5], "heurist": 0, "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "histor": 4, "hline": 0, "hoc": 5, "hocr": [0, 2, 4, 7], "honor": 0, "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 5, 7], "hpo": 5, "html": 2, "http": [4, 5, 7], "huffmann": 5, "human": 5, "hundr": 7, "hyper_param": 2, "hyperparamet": 5, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": 5, "ident": 1, "identifi": 0, "idx": 2, "ignor": [0, 2, 5], "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_str": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_nam": [1, 2], "image_s": [1, 2], "imagefilenam": 5, "imaginari": 7, "img": 2, "immedi": 5, "impath": 2, "implement": [0, 1, 8], "implicitli": 5, 
"import": [0, 1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "inclus": 0, "incompat": 2, "incorrect": 7, "increas": [5, 7], "independ": 8, "index": [0, 2, 5], "indic": [2, 5, 7], "individu": [0, 5], "individualis": 0, "infer": [2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [0, 1, 2, 5, 7, 8], "inlin": 0, "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "insert": [2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": 7, "instal": 3, "instanc": [0, 1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interact": 0, "interchang": 2, "interfac": [2, 4], "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "irregular": 5, "is_valid": 2, "isn": [1, 2, 7, 8], "iter": [1, 2, 7], "its": [0, 2, 5, 7], "itself": 1, "j": 2, "jinja": 0, "jinja2": [1, 2], "jpeg": [0, 7], "jpeg2000": [0, 4], "jpg": [0, 5], "json": [0, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "kamil": 5, "keep": [0, 5], "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [0, 5, 7], "keyword": 0, "kiessl": 4, "kind": [0, 2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "la": 4, "label": [0, 1, 2, 5], "lack": 7, "lag": 5, "languag": [2, 5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [0, 2, 5, 8], "lastli": 5, "later": [0, 7], "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": [5, 8], "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [5, 7], "leav": [5, 8], "lectaurep": 0, "left": [0, 2, 4, 5, 7], "leftward": 0, "legaci": [5, 7, 8], "leipzig": 7, "len": 2, "length": [2, 5], "less": [5, 7], "let": 7, "letter": 0, "level": [0, 1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": [5, 8], "lib": 1, "libr": 4, "librari": 1, "licens": 0, "light": 0, "lightn": 1, "lightningmodul": 1, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": [0, 5], "line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_idx": 2, "line_k": 5, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 2, 4, 5, 7], "liter": 2, "litteratur": 0, "ll": 4, "load": [0, 1, 2, 4, 5, 7], "load_ani": [1, 2], "load_model": [1, 2], "loadabl": 2, "loader": 1, "loc": 5, "locat": [1, 2, 5, 7], "log": [5, 7], "log_dir": 2, "logger": 2, "logic": 5, "logical_ord": 2, "logograph": 5, "long": [0, 5], "longest": 2, "look": [0, 1, 5, 7], "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": 0, "m": [0, 2, 5, 7, 8], "mac": [4, 7], "machin": 2, "macron": 0, "maddah": 7, "made": 7, "mai": [0, 2, 5, 7], "main": [0, 4, 5], "mainli": [1, 2], "major": 1, "make": [0, 5], "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [0, 1, 2, 7], "manuscript": [0, 7], "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 
2, 5], "massag": 5, "master": 7, "match": [2, 5], "materi": [0, 1, 4, 7], "matrix": 1, "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mb": 0, "mbl_dict": 2, "mean": [1, 2, 7], "measur": 5, "measurementunit": 5, "mediev": 0, "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [0, 1, 2], "might": [0, 5, 7], "min": [2, 5], "min_epoch": 2, "min_length": 2, "mind": 5, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [4, 7], "mix": [0, 5], "ml": 6, "mlmodel": [0, 5, 7], "mm_rpred": [1, 2], "mode": [0, 1, 2, 5], "model": [1, 5, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [0, 4, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "mono": 0, "more": [0, 1, 2, 4, 5, 7, 8], "most": [0, 1, 2, 5, 7], "mostli": [0, 1, 2, 4, 5, 7, 8], "move": [2, 7, 8], "move_metrics_to_cpu": 2, "mp": 8, "mp2": [5, 8], "mp3": [5, 8], "mreg_dict": 2, "much": [1, 2, 4, 5], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 4, 5, 7], "my": 0, "myprintingcallback": 1, "n": [0, 2, 5, 8], "name": [0, 2, 4, 7, 8], "named_spec": 2, "national": 4, "nativ": [0, 2, 6], "natur": [2, 7], "naugment": 4, "nchw": 2, "ndarrai": 2, "necessari": [0, 2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 7], "neg": 5, "net": [1, 2, 7], "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "never": 7, "nevertheless": [1, 5], "new": [0, 2, 3, 5, 7, 8], "next": [1, 7], "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": 5, "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [0, 2, 5, 7, 8], "nonlinear": 8, "nop": 1, "normal": 2, "notabl": 0, "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": [2, 5], "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 7], "numpi": [1, 2], "nvidia": 3, "o": [0, 1, 2, 4, 5, 7], "o1c103": 8, "object": [0, 1, 2], "obtain": 7, "obvious": 7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [0, 1, 5, 7], "old": [0, 6], "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "onc": [0, 5], "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": 5, "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 2, "open": 1, "openmp": [2, 5, 7], "oper": [1, 2, 8], "optic": [0, 7], "optim": [0, 4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 1, 2, 4, 5, 8], "org": 5, "orient": [0, 1, 2], "origin": [1, 2, 5], "orthogon": 2, "other": [0, 5, 7, 8], "otherwis": [2, 5], "out": [0, 5, 7, 8], "output": [1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "outsid": 2, "over": 2, "overfit": 7, "overhead": 5, "overlap": 5, "overrid": [2, 5], "overwritten": 2, "p": [0, 4, 5], "packag": [2, 4, 7], "pad": [0, 2, 5], "padding_left": 2, "padding_right": 2, "page": [1, 2, 4, 7], "page_doc": 1, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagexml": [0, 1, 2, 4, 7], "paint": 5, "pair": [0, 2], "paper": 0, "par": [1, 4], "paradigm": 0, "paragraph": 5, "parallel": [2, 5], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "parchment": 0, "pars": [2, 5], "parse_alto": [1, 2], "parse_pag": [1, 2], "parse_xml": 2, 
"parser": [1, 2, 5], "part": [0, 1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlik": 2, "pattern": [2, 7], "pb_ignored_metr": 2, "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [0, 1, 2, 5, 7], "perc": [0, 2], "percentag": 2, "percentil": 2, "perform": [1, 2, 4, 5, 7], "period": 7, "persist": 0, "person": 0, "pick": 5, "pickl": 6, "pil": [1, 2], "pillow": 1, "pinch": 0, "pinpoint": 7, "pipelin": 1, "pixel": [0, 1, 5, 8], "pl_logger": 2, "pl_modul": 1, "place": [0, 4, 7], "placement": 7, "plain": 0, "platform": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [0, 1, 2, 5, 7], "polygon": [0, 1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "popul": 2, "porson": 0, "portant": 4, "portion": 0, "posit": [2, 5], "possibl": [0, 1, 2, 5, 7], "postprocess": [1, 5], "potenti": 5, "power": 7, "practic": 5, "pratiqu": 4, "pre": [0, 5], "precis": 5, "precompil": 5, "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preliminari": 0, "preload": 7, "prematur": 5, "prepar": 7, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "preserv": 2, "pretrain_best": 5, "prevent": [2, 7], "previou": 4, "previous": 5, "primaresearch": 5, "primari": [0, 1, 5], "primarili": 4, "princip": [1, 2, 5], "print": [0, 1, 4, 5, 7], "printspac": 5, "privat": 0, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "processing_step": 2, "produc": [0, 1, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": 6, "proper": 1, "properli": 7, "properti": 2, "proport": 5, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": [0, 4], "pull": 4, "purpos": [0, 1, 2, 7, 8], "put": [2, 7], "py": 1, "pypi": 4, "pyrnn": 6, "python": 4, "pytorch": [0, 1, 3, 6], "pytorch_lightn": [1, 2], "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [0, 1, 7], "queryabl": 0, "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "ran": 4, "random": [5, 7], "randomli": 5, "rang": [0, 2], "rapidli": 7, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [0, 1, 5, 7], "rb": 2, "reach": 7, "read": [0, 2, 4, 5], "reader": 5, "reading_ord": 2, "reading_order_fn": 2, "real": 7, "realiz": 5, "reason": [0, 2, 5], "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [2, 3, 8], "recognitionmodel": 1, "recommend": [0, 1, 5, 7], "record": [1, 2, 4], "rectangl": 2, "rectangular": 0, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "refer": [0, 1, 5, 7], "referenc": 2, "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_typ": 5, "region_type_0": 2, "region_type_1": 2, "regular": 5, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reliabl": 7, "relu": 8, "remain": [0, 5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "repolygon": 1, "report": [2, 5, 7], "repositori": [4, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolv": 5, "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 5, 7, 8], "resum": 5, "retain": [2, 5], "retrain": 7, 
"retriev": [4, 5, 7], "return": [0, 1, 2, 8], "reus": 2, "revers": 8, "rgb": [1, 8], "right": [0, 2, 4, 5, 7], "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "romanov": 7, "rotat": 0, "rough": 7, "roughli": 0, "routin": 1, "rpred": 1, "rtl": 0, "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "sa": 0, "same": [0, 1, 2, 4, 5, 7], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": 5, "schemaloc": 5, "scientif": 4, "scratch": 0, "script": [0, 1, 2, 4, 5, 7], "script_detect": 1, "script_typ": 2, "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": [0, 2], "second": [0, 2], "section": [1, 7], "see": [0, 1, 5, 7], "seen": [0, 1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, "segresult": 2, "segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "sephardi": 0, "seqrecogn": 2, "sequenc": [1, 2, 5, 7, 8], "serial": [0, 4, 6], "serialize_segment": [1, 2], "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 2, 7], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "side": 0, "sigma": 2, "sigmoid": 8, "similar": [1, 5, 7], "simpl": [0, 1, 5, 7, 8], "simplifi": 0, "singl": [0, 1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "skip_empty_lin": 2, "slice": 2, "slightli": [5, 7, 8], "slow": 5, "slower": 5, "small": [0, 1, 2, 5, 7, 8], "so": [0, 1, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": [0, 7], "some": [0, 1, 2, 4, 5, 7], "someon": 0, "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [2, 5, 7, 8], "sourceimageinform": 5, "sp": 5, "space": [0, 1, 2, 4, 5, 7], "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [0, 2, 5, 7], "specifi": [0, 5], "speckl": 7, "speech": 2, "speedup": 5, "split": [2, 5, 7, 8], "spot": 4, "squash": [2, 8], "stabl": [1, 4, 5], "stack": [2, 5, 8], "stage": [0, 1], "standard": [0, 1, 4, 5, 7], "start": [0, 1, 2, 5, 7], "start_separ": 2, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1], "stop": [5, 7], "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": 8, "structur": [1, 4, 5], "stub": 5, "sub": 1, "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsampl": 5, "subsequ": [1, 2], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], "suffix": [0, 2], "suggest": [0, 1], "suit": 7, "suitabl": [0, 7], "summar": [2, 5, 7, 8], "superflu": 7, "supervis": 5, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [0, 1, 4, 5, 6], "suppos": 1, "suppress": [0, 5], "sure": 0, "surfac": [0, 2], "surrog": 5, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [0, 4, 5, 7], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [2, 5], "tags_ignor": 2, "take": 
[1, 4, 5, 7], "tanh": 8, "target": 2, "task": [5, 7], "tb": 2, "technic": 4, "tei": 0, "tell": 5, "templat": [0, 1, 2, 4], "template_sourc": 2, "tempor": 2, "tensor": [1, 2, 8], "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [1, 2, 4, 7], "text_direct": [1, 2], "text_transform": 2, "textblock": 5, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": 5, "textregion": 5, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 2, 5], "therefor": [0, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "thing": 5, "third": 1, "those": 5, "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": 6, "threshold": [0, 2], "through": [0, 1, 2, 4, 5, 7], "thrown": 0, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "tild": 0, "time": [0, 1, 2, 5, 7, 8], "tip": 1, "titl": 0, "titr": 4, "tmpl": [0, 2], "token": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], "toplin": [2, 5], "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": 7, "train": [0, 3, 8], "trainabl": [0, 1, 2, 4, 5], "trainer": [1, 5], "training_data": [1, 5], "training_fil": 1, "transcrib": [5, 7], "transcript": [1, 2, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4], "transformt": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "trial": 5, "true": [1, 2, 8], "truli": 0, "truth": [5, 7], "try": 2, "tupl": 2, "turn": 4, "tutori": [1, 5], "tweak": 0, "two": [0, 1, 2, 5, 8], "txt": [0, 2, 4, 5], "type": [0, 1, 2, 5, 7, 8], "typefac": [5, 7], "typograph": 7, "typologi": 5, "u": [0, 1, 5], "u1f05": 5, "un": 4, "unclean": 7, "unclear": 5, "undecod": 1, "undegrad": 0, "under": [0, 2, 4], "undesir": [5, 8], "unencod": 2, "uneven": 0, "uni": [0, 7], "unicod": [1, 2, 7], "uniformli": 2, "union": [2, 4], "uniqu": [0, 7], "univers": 0, "universit\u00e9": 4, "unless": 5, "unnecessarili": 1, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "updat": 0, "upload": 0, "upon": 0, "upward": [2, 5, 7], "ur": 0, "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "user": [0, 2, 4, 5, 7], "user_metadata": 2, "usual": [0, 1, 5, 7], "utf": 5, "util": [1, 4, 5, 7], "v": [5, 7], "v4": 5, "val": 5, "val_metr": 2, "valid": [0, 2, 5], "valid_baselin": 2, "valid_region": 2, "valu": [0, 1, 2, 5, 8], "variabl": [2, 4, 5, 8], "variant": 5, "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [0, 1, 2], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": [0, 5], "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": [0, 5], "visual": 0, "vocabulari": 2, "vocal": 7, "vpo": 5, "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": 5, "wa": [2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warmup": 5, "warn": [0, 1, 7], "warp": 7, "wav2vec2": 2, "we": [2, 5, 7], "weak": [1, 7], "websit": 7, "weight": [2, 5], "welcom": 4, "well": [0, 5, 7], "were": [2, 5], "western": 7, "wget": 7, "what": [1, 7], "when": [0, 1, 2, 5, 7, 8], "where": [0, 2, 7], "whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "wider": 0, "width": [1, 2, 5, 7, 8], "wildli": 7, "without": [0, 2, 5, 7], "word": [4, 5], "word_text": 5, "work": [0, 1, 2, 5, 7], "workabl": 5, "worker": 5, "world": [0, 7], "worsen": 0, "would": [0, 5], "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], 
"www": 5, "x": [0, 2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x2": 2, "x64": 4, "x_0": 2, "x_1": 2, "x_m": 2, "x_n": 2, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xml": [0, 7], "xmln": 5, "xmlschema": 5, "xn": 2, "xsd": 5, "xsi": 5, "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_0": 2, "y_1": 2, "y_m": 2, "y_n": 2, "y_stride": 8, "yield": 2, "yk": 2, "ym": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "y\u016bsuf": 7, "zenodo": [0, 4], "zero": [2, 7, 8], "zigzag": 0, "zoom": [0, 2], "\u00e3\u00ed\u00f1\u00f5": 0, "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u0127\u0129\u0142\u0169\u01ba\u1d49\u1ebd": 0, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7, "\u2079\ua751\ua753\ua76f\ua770": 0}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"acceler": 3, "acquisit": 7, "advanc": 0, "alto": 5, "annot": 7, "api": [1, 2], "baselin": [0, 1], "basic": [1, 8], "binar": [0, 2], "binari": 5, "blla": 2, "box": 0, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "direct": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": [0, 5], "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [0, 1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "mask": 0, "max": 8, "model": [0, 2, 4, 6], "modul": 2, "network": 8, "normal": [5, 8], "output": 0, "page": [0, 5], "pageseg": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "preprocess": [1, 7], "pretrain": 5, "princip": 0, "publish": 0, "queri": 0, "quickstart": [1, 4], "recognit": [0, 1, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "retriev": 0, "rpred": 2, "schedul": 2, "scratch": 5, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": 8, "stopper": 2, "test": 5, "text": [0, 5], "train": [1, 2, 4, 5, 7], "trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "unsupervis": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/4.3.0/training.html b/4.3.0/training.html new file mode 100644 index 000000000..8f64d6eab --- /dev/null +++ b/4.3.0/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly easily for a large number of scripts. In contrast to other systems requiring segmentation down to glyph level before classification, it is uniquely suited for the recognition of connected scripts, because the neural network is trained to assign the correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and recognition, the conversion of line images into text, can be trained in kraken. To train models for either we require training data, i.e. examples of page segmentations and transcriptions that are similar to what we want to be able to recognize. For segmentation the examples are the locations of baselines, i.e. the imaginary lines the text is written on, and the polygons of regions. For recognition the examples are the texts contained in the lines. There are multiple ways to supply training data but the easiest is through PageXML or ALTO files.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works on both Linux and Mac OS X. After installing conda, download the environment file and create the environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First, a number of high-quality scans, preferably color or grayscale and at least 300dpi, is required. Scans should be in a lossless image format such as TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such as pdftocairo or pdfimages. While each of these requirements can be relaxed to a degree, the final accuracy will suffer to some extent. For example, JPEG scans are generally suitable for training and recognition only if they are lightly compressed.

+
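For instance, page images could be extracted from a PDF at 300dpi with poppler's pdftocairo, producing one PNG file per page (a sketch only; document.pdf and the page prefix are placeholders):
$ pdftocairo -png -r 300 document.pdf page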

Depending on the source of the scans some preprocessing, such as splitting scans into pages, correcting skew and warp, and removing speckles, can be advisable, although it isn't strictly necessary as the segmenter can be trained to handle noisy material with high accuracy. A fairly user-friendly software package for semi-automatic batch processing of image scans is Scantailor, although most of the work can also be done using a standard image editor.

+

The total number of scans required depends on the kind of model to train (segmentation or recognition), the complexity of the layout, and the nature of the script to recognize. Only features that are found in the training data can later be recognized, so it is important that the coverage of typographic features is exhaustive. Training a small segmentation model for a particular kind of material might require less than a few hundred samples, while a general model can well go into the thousands of pages. Likewise, a specific recognition model for a printed script with a small grapheme inventory such as Arabic or Hebrew requires around 800 lines, with manuscripts, complex scripts (such as polytonic Greek), and general models for multiple typefaces and hands needing more training data for the same accuracy.

+

There is no hard rule for the amount of training data, and it may be necessary to retrain a model if the initial training data proves insufficient. Most Western texts contain between 25 and 40 lines per page, so upward of 30 pages have to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of baselines, regions, and text. There are a number of tools available that can create ALTO and PageXML files containing the requisite information for either segmentation or recognition training: escriptorium integrates kraken tightly, including training and inference, while Aletheia is a powerful desktop application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models.

+

The training data in output_dir may now be used to train a new model by +invoking the ketos train command. Just hand a list of images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used to estimate the actual recognition accuracy achieved in the real world. These are never shown to the network during training but will be recognized periodically to evaluate the accuracy of the model. Per default the validation set will comprise 10% of the training data.

+

Basic model training is mostly automatic, although there are multiple parameters that can be adjusted; an example invocation combining some of them is shown after the list:

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an integer equal to or larger than 1, with 1 meaning a report is created each time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with +--load. To continue training from a base model with another +training set refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for accelerated training. The default setting preloads data sets with fewer than 2500 lines; explicitly adding --preload will preload arbitrarily sized sets. --no-preload disables preloading in all circumstances.

+
+
+
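As an illustration, the options described above can be combined in a single invocation; the prefix mymodel and all file names here are placeholders, not prescribed values:
$ ketos train --output mymodel --report 1 --savefreq 2 output_dir/*.png
and training can later be continued from one of the saved intermediate models:
$ ketos train --load mymodel_5.mlmodel output_dir/*.png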

Training a network will take some time on a modern computer, even with the default parameters. While the exact time required is unpredictable, as training is a somewhat random process, a rough guide is that accuracy seldom improves after 50 epochs, which are typically reached after 8 to 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as early stopping that stops training as soon as +the error rate on the validation set doesn’t improve anymore. This will +prevent overfitting, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models, model_name-1.mlmodel, model_name-2.mlmodel, …, in the directory the script was executed in. Let's take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This might take a while as preprocessing the whole set and putting it into memory is computationally intensive. Loading can be sped up by disabling preloading, at the cost of performing preprocessing repeatedly during the training process.

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

this line shows the results of the validation set evaluation. The error after 2 epochs is 545 incorrect characters out of 3504 characters in the validation set, i.e. a character accuracy of roughly 84.4% ((3504 - 545)/3504 ≈ 0.8445). The error should decrease fairly rapidly. If accuracy remains around 0.30 something is amiss, e.g. non-reordered right-to-left text or wildly incorrect transcriptions. Abort training, correct the error(s), and start again.

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+

ketos can also produce more verbose output with training set and network +information by appending one or more -v to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a validation set of 88 lines. 49 different classes, i.e. Unicode code points, were found in these 788 lines. These affect the output size of the network; obviously only these 49 different classes/code points can later be output by the network. Importantly, we can see that certain characters occur markedly less often than others. Characters like the Syriac feminine dot and numerals that occur fewer than 10 times will most likely not be recognized well by the trained net.

+
+
+

Evaluation and Validation

+

While the output during training is detailed enough to know when to stop training, one usually wants to know the specific kinds of errors to expect. More in-depth error analysis also makes it possible to pinpoint weaknesses in the training data; e.g. above-average error rates for numerals indicate either a lack of representation of numerals in the training data or erroneous transcriptions in the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number of characters in the ground truth, the number of errors in the recognition output, and the resulting accuracy in percent.

+

The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report is a list of errors sorted by frequency and a per-character accuracy report. In this example most errors are incorrect recognition of combining marks such as dots and diaereses. These may have several sources: different dot placement in the training and validation sets, inconsistent (non-systematic) transcription, or unclean, speckled scans. Depending on the error source, correction most often involves adding more training data and fixing transcriptions. Sometimes it may even be advisable to remove unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training related tasks. Optical character recognition is a multi-step process consisting of binarization (conversion of input images to black and white), page segmentation (extracting lines from the image), and recognition (converting line images into character sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/4.3.0/vgsl.html b/4.3.0/vgsl.html new file mode 100644 index 000000000..eeb27e722 --- /dev/null +++ b/4.3.0/vgsl.html @@ -0,0 +1,288 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension.

+

When channels is set to 1, grayscale or B/W inputs are expected; 3 expects RGB color images. Higher values in combination with a height of 1 result in the network being fed 1-pixel-wide grayscale strips scaled to the size of the channel dimension.

+

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do O1c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The extended dropout layer syntax is used to reduce the drop probability on the depth dimension, as the default is too high for convolutional layers. The remainder of the height dimension (12) is reshaped into the depth dimension before applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrarily sized color image input, an initial summarizing recurrent layer to squash the height to 64, followed by 2 bi-directional recurrent layers and a linear projection.

+
+
+

Convolutional Layers

+
C[{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying +the selected nonlinearity. The stride can be adjusted with the optional last +two parameters.

+
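Following the syntax above, a 3x3 ReLU convolution with 64 output channels and an (optional) stride of 2 in both dimensions would, for example, be written as:
Cr3,3,64,2,2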
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension and the non-time-axis dimension (height/width) is treated as another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors, resulting in an output of shape 1, 16, 906, 25. If this isn't desired, either run a summarizing layer in the other direction, e.g. Lfys20 yielding a 1, 1, 906, 20 output, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimensions into a 1, 1, 906, 512 input to the recurrent layer.

+
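A minimal sketch of the reshape-then-recurrent pattern described above, assuming a 32-channel input of height 16 and an arbitrarily chosen output size of 59:
[1,16,0,32 S1(1x16)1,3 Lbx100 O1c59]
Here S1(1x16)1,3 folds the height into the channel dimension so that the bidirectional LSTM Lbx100 runs a single pass along the width axis.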
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a max pooling layer with kernel size (y, x) and stride (y_stride, x_stride).

+
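The stride may also be given explicitly; for example, a 2x2 pooling window moving 2 pixels in each direction can be written as:
Mp2,2,2,2
which is equivalent to the Mp2,2 form used in the examples above, where the stride defaults to the kernel size.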
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d to a,b and distributes a into dimension e, respectively b into f. Either e or f has to be equal to d. So S1(1x48)1,3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48-sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.

+
+

Note

+

This S layer is equivalent to the one implemented in the tensorflow +implementation of VGSL, i.e. behaves differently from tesseract.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to a drop probability of 0.5 and 1D dropout. Set dim to 2 after convolutional layers.

+
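For instance, the extended form already used in the convolutional example above attaches a 2D dropout with a drop probability of 0.1 to a convolution:
Cr3,3,32 Do0.1,2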
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, +normalizing each separately.

+
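As an illustrative sketch (the group count of 8 is arbitrary), a convolution followed by a group normalization splitting its 32 output channels into 8 groups could be written as:
Cr3,3,32 Gn8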
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/.buildinfo b/5.0.0/.buildinfo new file mode 100644 index 000000000..474acac1e --- /dev/null +++ b/5.0.0/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 7fdc1e3f9b7f03681badcd730cd3ed0c +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/5.0.0/.doctrees/advanced.doctree b/5.0.0/.doctrees/advanced.doctree new file mode 100644 index 000000000..8a138ab7c Binary files /dev/null and b/5.0.0/.doctrees/advanced.doctree differ diff --git a/5.0.0/.doctrees/api.doctree b/5.0.0/.doctrees/api.doctree new file mode 100644 index 000000000..3f30f9c22 Binary files /dev/null and b/5.0.0/.doctrees/api.doctree differ diff --git a/5.0.0/.doctrees/api_docs.doctree b/5.0.0/.doctrees/api_docs.doctree new file mode 100644 index 000000000..fcb5db135 Binary files /dev/null and b/5.0.0/.doctrees/api_docs.doctree differ diff --git a/5.0.0/.doctrees/environment.pickle b/5.0.0/.doctrees/environment.pickle new file mode 100644 index 000000000..a79561c3b Binary files /dev/null and b/5.0.0/.doctrees/environment.pickle differ diff --git a/5.0.0/.doctrees/gpu.doctree b/5.0.0/.doctrees/gpu.doctree new file mode 100644 index 000000000..e6aca5d55 Binary files /dev/null and b/5.0.0/.doctrees/gpu.doctree differ diff --git a/5.0.0/.doctrees/index.doctree b/5.0.0/.doctrees/index.doctree new file mode 100644 index 000000000..2c96ad49a Binary files /dev/null and b/5.0.0/.doctrees/index.doctree differ diff --git a/5.0.0/.doctrees/ketos.doctree b/5.0.0/.doctrees/ketos.doctree new file mode 100644 index 000000000..f6c10956a Binary files /dev/null and b/5.0.0/.doctrees/ketos.doctree differ diff --git a/5.0.0/.doctrees/models.doctree b/5.0.0/.doctrees/models.doctree new file mode 100644 index 000000000..0a01c2c02 Binary files /dev/null and b/5.0.0/.doctrees/models.doctree differ diff --git a/5.0.0/.doctrees/training.doctree b/5.0.0/.doctrees/training.doctree new file mode 100644 index 000000000..c3bdddfde Binary files /dev/null and b/5.0.0/.doctrees/training.doctree differ diff --git a/5.0.0/.doctrees/vgsl.doctree b/5.0.0/.doctrees/vgsl.doctree new file mode 100644 index 000000000..b4cbb8672 Binary files /dev/null and b/5.0.0/.doctrees/vgsl.doctree differ diff --git a/5.0.0/.nojekyll b/5.0.0/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/5.0.0/_images/blla_heatmap.jpg b/5.0.0/_images/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/5.0.0/_images/blla_heatmap.jpg differ diff --git a/5.0.0/_images/blla_output.jpg b/5.0.0/_images/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/5.0.0/_images/blla_output.jpg differ diff --git a/5.0.0/_images/bw.png b/5.0.0/_images/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/5.0.0/_images/bw.png differ diff --git a/5.0.0/_images/normal-reproduction-low-resolution.jpg b/5.0.0/_images/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/5.0.0/_images/normal-reproduction-low-resolution.jpg differ diff --git a/5.0.0/_images/pat.png b/5.0.0/_images/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/5.0.0/_images/pat.png differ diff --git a/5.0.0/_sources/advanced.rst.txt b/5.0.0/_sources/advanced.rst.txt new file mode 100644 index 000000000..533e1280f --- /dev/null +++ 
b/5.0.0/_sources/advanced.rst.txt @@ -0,0 +1,466 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text lines images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML. + +Input and Outputs +----------------- + +Kraken inputs and their outputs can be defined in multiple ways. The most +simple are input-output pairs, i.e. producing one output document for one input +document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. + +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. `-I` batch +inputs can also be specified multiple times: + +.. code-block:: console + + $ kraken -I '*.png' -I '*.jpg' -I '*.tif' -o ocr.txt segment ... + +A second way is to input multi-image files directly. These can be either in +PDF, TIFF, or JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Output formats +^^^^^^^^^^^^^^ + +All commands have a default output format such as raw text for `ocr`, a plain +image for `binarize`, or a JSON definition of the the segmentation for +`segment`. These are specific to kraken and generally not suitable for further +processing by other software but a number of standardized data exchange formats +can be selected. Per default `ALTO `_, +`PageXML `_, `hOCR +`_, and abbyyXML containing additional metadata such as +bounding boxes and confidences are implemented. In addition, custom `jinja +`_ templates can be loaded to create +individualised output such as TEI. + +Output formats are selected on the main `kraken` command and apply to the last +subcommand defined in the subcommand chain. For example: + +.. code-block:: console + + $ kraken --alto -i ... segment -bl + +will serialize a plain segmentation in ALTO into the specified output file. + +The currently available format switches are: + +.. code-block:: console + + $ kraken -n -i ... ... # native output + $ kraken -a -i ... ... # ALTO output + $ kraken -x -i ... ... # PageXML output + $ kraken -h -i ... ... 
# hOCR output + $ kraken -y -i ... ... # abbyyXML output + +Custom templates can be loaded with the ``--template`` option: + +.. code-block:: console + + $ kraken --template /my/awesome/template.tmpl -i ... ... + +The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates `here +`_. + +Binarization +------------ + +.. _binarization: + +.. note:: + + Binarization is deprecated and mostly not necessary anymore. It can often + worsen text recognition results especially for documents with uneven + lighting, faint writing, etc. + +The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +============ ==== +option type +============ ==== +\--threshold FLOAT +\--zoom FLOAT +\--escale FLOAT +\--border FLOAT +\--perc INTEGER RANGE +\--range INTEGER +\--low INTEGER RANGE +\--high INTEGER RANGE +============ ==== + +To binarize an image: + +.. code-block:: console + + $ kraken -i input.jpg bw.png binarize + +.. note:: + + Some image formats, notably JPEG, do not support a black and white + image mode. Per default the output format according to the output file + name extension will be honored. If this is not possible, a warning will + be printed and the output forced to PNG: + + .. code-block:: console + + $ kraken -i input.jpg bw.jpg binarize + Binarizing [06/24/22 09:56:23] WARNING jpeg does not support 1bpp images. Forcing to png. + ✓ + +Page Segmentation +----------------- + +The `segment` subcommand accesses page segmentation into lines and regions with +the two layout analysis methods implemented: the trainable baseline segmenter +that is capable of detecting both lines of different types and regions and a +legacy non-trainable segmenter that produces bounding boxes. + +Universal parameters of either segmenter are: + +=============================================== ====== +option action +=============================================== ====== +-d, \--text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +-m, \--mask Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes. +=============================================== ====== + +Baseline Segmentation +^^^^^^^^^^^^^^^^^^^^^ + +The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below: + +.. 
image:: _static/blla_heatmap.jpg + :width: 800 + :alt: BLLA output heatmap + +In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as: + +.. image:: _static/blla_output.jpg + :width: 800 + :alt: BLLA final output + +The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segment is activated with the `-bl` +option: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl + +New models optimized for other kinds of documents can be trained (see +:ref:`here `). These can be applied with the `-i` option of the +`segment` subcommand: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel + +Legacy Box Segmentation +^^^^^^^^^^^^^^^^^^^^^^^ + +The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left). + +Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply :ref:`binarization ` first or supply only +pre-binarized inputs. + +The legacy segmenter can be applied on some input image with: + +.. code-block:: console + + $ kraken -i 14.tif lines.json segment -x + $ cat lines.json + +Available specific parameters are: + +=============================================== ====== +option action +=============================================== ====== +\--scale FLOAT Estimate of the average line height on the page +-m, \--maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, \--black-colseps / -w, \--white-colseps Switch to black column separators. +-r, \--remove-hlines / -l, \--hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +-p, \--pad Adds left and right padding around lines in the output. +=============================================== ====== + +Principal Text Direction +^^^^^^^^^^^^^^^^^^^^^^^^ + +The principal text direction selected with the ``-d/--text-direction`` is a +switch used in the reading order heuristic to determine the order of text +blocks (regions) and individual lines. It roughly corresponds to the `block +flow direction +`_ in CSS with +an additional option. Valid options consist of two parts, an initial principal +line orientation (`horizontal` or `vertical`) followed by a block order (`lr` +for left-to-right or `rl` for right-to-left). + +.. warning: + + The principal text direction is independent of the direction of the + *inline text direction* (which is left-to-right for writing systems like + Latin and right-to-left for ones like Hebrew or Arabic). 
Kraken deals + automatically with the inline text direction through the BiDi algorithm + but can't infer the principal text direction automatically as it is + determined by factors like layout, type of document, primary script in + the document, and other factors. The different types of text + directionality and their relation can be confusing, the `W3C writing + mode `_ document explains + the fundamentals, although the model used in Kraken differs slightly. + +The first part is usually `horizontal` for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom: + +.. image:: _static/bw.png + :width: 800 + :alt: Horizontal Latin script text + +Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left: + +.. image:: https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg/577px-Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg + :width: 800 + :alt: Vertical Chinese text + +The second part is dependent on a number of factors as the order in which text +blocks are read is not fixed for every writing system. In mono-script texts it +is usually determined by the inline text direction, i.e. Latin script texts +columns are read starting with the top-left column followed by the column to +its right and so on, continuing with the left-most column below if none remain +to the right (inverse for right-to-left scripts like Arabic which start on the +top right-most columns, continuing leftward, and returning to the right-most +column just below when none remain). + +In multi-script documents the order is determined by the primary writing +system employed in the document, e.g. for a modern book containing both Latin +and Arabic script text it would be set to `lr` when Latin is primary, e.g. when +the binding is on the left side of the book seen from the title cover, and +vice-versa (`rl` if binding is on the right on the title cover). The analogue +applies to text written with vertical lines. + +With these explications there are four different text directions available: + +=============================================== ====== +Text Direction Examples +=============================================== ====== +horizontal-lr Latin script texts, Mixed LTR/RTL docs with principal LTR script +horizontal-rl Arabic script texts, Mixed LTR/RTL docs with principal RTL script +vertical-lr Vertical script texts read from left-to-right. +vertical-rl Vertical script texts read from right-to-left. +=============================================== ====== + +Masking +^^^^^^^ + +It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -m mask.png + +Model Repository +---------------- + +.. _repo: + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands. + +Querying and Model Retrieval +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. 
code-block:: console + + $ kraken list + Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07 + 10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration) + 10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature) + 10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0 + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.5617783 + name: 10.5281/zenodo.5617783 + + Cremma-Medieval Old French Model (Litterature) + + .... + scripts: Latn + alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128 + accuracy: 95.49% + license: CC-BY-SA-2.0 + author(s): Pinche, Ariane + date: 2021-10-29 + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.5617783 + Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10 + Model name: cremma_medieval_bicerin.mlmodel + +Models will be placed in ``$XDG_BASE_DIR`` and can be accessed using their name as +printed in the last line of the ``kraken get`` output. + +.. code-block:: console + + $ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel + +Publishing +^^^^^^^^^^ + +When one would like to share a model with the wider world (for fame and glory!) +it is possible (and recommended) to upload them to repository. The process +consists of 2 stages: the creation of the deposit on the Zenodo platform +followed by approval of the model in the community making it discoverable for +other kraken users. + +For uploading model a Zenodo account and a personal access token is required. +After account creation tokens can be created under the account settings: + +.. image:: _static/pat.png + :width: 800 + :alt: Zenodo token creation dialogue + +With the token models can then be uploaded: + +.. code-block:: console + + $ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617783 + +A number of important metadata will be asked for such as a short description of +the model, long form description, recognized scripts, and authorship. +Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. +can't be changed or deleted so it is important to make sure that all the +information is correct. Each deposit also has a unique persistent identifier, a +DOI, that can be used to refer to it, e.g. in publications or when pointing +someone to a particular model. + +Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users. + +It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with `kraken get` +and its DOI. 
It is mostly suggested for preliminary models that might get +updated later: + +.. code-block:: console + + $ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617734 + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:antiqua.mlmodel + +All polytonic Greek text portions will be recognized using the `porson.mlmodel` +model while Latin text will be fed into the `antiqua.mlmodel` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.mlmodel + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. diff --git a/5.0.0/_sources/api.rst.txt b/5.0.0/_sources/api.rst.txt new file mode 100644 index 000000000..703829f3a --- /dev/null +++ b/5.0.0/_sources/api.rst.txt @@ -0,0 +1,406 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. 
code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + {'text_direction': 'horizontal-lr', + 'boxes': [[0, 29, 232, 56], + [28, 54, 121, 84], + [9, 73, 92, 117], + [103, 76, 145, 131], + [7, 105, 119, 230], + [10, 228, 126, 345], + ... + ], + 'script_detection': False} + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + {'text_direction': 'horizontal-lr', + 'type': 'baselines', + 'script_detection': False, + 'lines': [{'script': 'default', + 'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]], + 'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]}, + ...], + 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...] + '$par': ... + '$nop': ...}} + >>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking. + +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. 
+ +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as the if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch for binary input models. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(model, im, baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but an +:class:`kraken.rpred.ocr_record` record object containing the character +prediction, cuts (approximate locations), and confidences. 
+ +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'box' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformtion by the functional blocks: + +.. code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> xml.parse_alto(alto_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + } + + >>> page_doc = '/path/to/page' + >>> xml.parse_page(page_doc) + {'image': '/path/to/image/file', + 'type': 'baselines', + 'lines': [{'baseline': [(24, 2017), (25, 2078)], + 'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)], + 'text': '', + 'script': 'default'}, + {'baseline': [(79, 2016), (79, 2041)], + 'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)], + 'text': '', + 'script': 'default'}, ...], + 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)], + [(253, 3292), (668, 3292), (668, 3455), (253, 3455)], + [(216, -4), (1015, -4), (1015, 534), (216, 534)]], + 'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)], + [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]], + ...} + + +Serialization +------------- + +The serialization module can be used to transform the :class:`ocr_records +` returned by the prediction iterator into a text +based (most often XML) format for 
archival. The module renders `jinja2 +`_ templates in `kraken/templates` through +the :func:`kraken.serialization.serialize` function. + +.. code-block:: python + + >>> from kraken.lib import serialization + + >>> records = [record for record in pred_it] + >>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/5.0.0/_sources/api_docs.rst.txt b/5.0.0/_sources/api_docs.rst.txt new file mode 100644 index 000000000..cb85ff91f --- /dev/null +++ b/5.0.0/_sources/api_docs.rst.txt @@ -0,0 +1,284 @@ +************* +API Reference +************* + +Segmentation +============ + +kraken.blla module +------------------ + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. 
Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +--------------------- + +.. note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +Recognition +=========== + +kraken.rpred module +------------------- + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapifunction:: kraken.rpred.rpred + +Serialization +============= + +kraken.serialization module +--------------------------- + +.. autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +Default templates +----------------- + +ALTO 4.4 +^^^^^^^^ + +.. literalinclude:: ../../templates/alto + :language: xml+jinja + +PageXML +^^^^^^^ + +.. literalinclude:: ../../templates/alto + :language: xml+jinja + +hOCR +^^^^ + +.. literalinclude:: ../../templates/alto + :language: xml+jinja + +ABBYY XML +^^^^^^^^^ + +.. literalinclude:: ../../templates/abbyyxml + :language: xml+jinja + +Containers and Helpers +====================== + +kraken.lib.codec module +----------------------- + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.containers module +------------------------ + +.. autoapiclass:: kraken.containers.Segmentation + :members: + +.. autoapiclass:: kraken.containers.BaselineLine + :members: + +.. autoapiclass:: kraken.containers.BBoxLine + :members: + +.. autoapiclass:: kraken.containers.ocr_record + :members: + +.. autoapiclass:: kraken.containers.BaselineOCRRecord + :members: + +.. autoapiclass:: kraken.containers.BBoxOCRRecord + :members: + +.. autoapiclass:: kraken.containers.ProcessingStep + :members: + +kraken.lib.ctc_decoder +---------------------- + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +--------------------- + +.. autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + +kraken.lib.models module +------------------------ + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.neural_reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. 
autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + +kraken.lib.vgsl module +---------------------- + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +--------------------- + +.. autoapiclass:: kraken.lib.xml.XMLPage + +Training +======== + +kraken.lib.train module +----------------------- + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +------------------------- + +Recognition datasets +^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.ArrowIPCRecognitionDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Segmentation datasets +^^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +Reading order datasets +^^^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.PairWiseROSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PageWiseROSet + :members: + +Helpers +^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.ImageInputTransforms + :members: + +.. autoapifunction:: kraken.lib.dataset.collate_sequences + +.. autoapifunction:: kraken.lib.dataset.global_align + +.. autoapifunction:: kraken.lib.dataset.compute_confusions + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/5.0.0/_sources/gpu.rst.txt b/5.0.0/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/5.0.0/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/5.0.0/_sources/index.rst.txt b/5.0.0/_sources/index.rst.txt new file mode 100644 index 000000000..59e096265 --- /dev/null +++ b/5.0.0/_sources/index.rst.txt @@ -0,0 +1,247 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. 
+ +Features +======== + +kraken's main features are: + + - Fully trainable :ref:`layout analysis `, :ref:`reading order `, and :ref:`character recognition ` + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - :ref:`Public repository ` of model files + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. + +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a model to do the actual recognition of +characters. To download the default model for printed French text and place it +in the kraken directory for the current user: + +:: + + $ kraken get 10.5281/zenodo.10592716 + + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.10592716 + name: 10.5281/zenodo.10592716 + + CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages + +

CATMuS-Print (Large) - Diachronic model for French prints and other West European languages

+

CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, dealing with different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian…) and different centuries (from the first prints of the 16th c. to digital documents of the 21st century).

+

Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligature (except those that still exist), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies might be present, because transcriptions have been done over several years and the norms have slightly evolved.

+

The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.

+

This model is the result of the collaboration from researchers from the University of Geneva and Inria Paris and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.

+ scripts: Latn + alphabet: !"#$%&'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz|}~¡£¥§«¬°¶·»¿ÆßæđłŒœƀǝɇΑΒΓΔΕΖΘΙΚΛΜΝΟΠΡΣΤΥΦΧΩαβγδεζηθικλμνξοπρςστυφχωϛחלרᑕᗅᗞᚠẞ–—‘’‚“”„‟†•⁄⁊⁋℟←▽◊★☙✠✺✻⟦⟧⬪ꝑꝓꝗꝙꝟꝯꝵ SPACE, COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING DOT ABOVE, COMBINING DIAERESIS, COMBINING RING ABOVE, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING CEDILLA, COMBINING OGONEK, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER U, 0xe682, 0xe68b, 0xe8bf, 0xf1a7 + accuracy: 98.56% + license: cc-by-4.0 + author(s): Gabay, Simon; Clérice, Thibault + date: 2024-01-30 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the previously downloaded model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `escriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: _static/normal-reproduction-low-resolution.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. 
image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png + :width: 300 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005 (Biblissima+). + + diff --git a/5.0.0/_sources/ketos.rst.txt b/5.0.0/_sources/ketos.rst.txt new file mode 100644 index 000000000..b1c23ae30 --- /dev/null +++ b/5.0.0/_sources/ketos.rst.txt @@ -0,0 +1,823 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +There are currently three trainable components in the kraken processing pipeline: +* Segmentation: finding lines and regions in images +* Reading Order: ordering lines found in the previous segmentation step. Reading order models are closely linked to segmentation models and both are usually trained on the same dataset. +* Recognition: recognition models transform images of lines into text. + +Depending on the use case it is not necessary to manually train new models for +each material. The default segmentation model works well on quite a variety of +handwritten and printed documents, a reading order model might not perform +better than the default heuristic for simple text flows, and there are +recognition models for some types of material available in the repository. + +Best practices +-------------- + +Recognition model training +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* The default architecture works well for decently sized datasets. +* Use precompiled binary datasets and put them in a place where they can be memory mapped during training (local storage, not NFS or similar). +* Use the ``--logger`` flag to track your training metrics across experiments using Tensorboard. +* If the network doesn't converge before the early stopping aborts training, increase ``--min-epochs`` or ``--lag``. Use the ``--logger`` option to inspect your training loss. +* Use the flag ``--augment`` to activate data augmentation. +* Increase the amount of ``--workers`` to speedup data loading. This is essential when you use the ``--augment`` option. +* When using an Nvidia GPU, set the ``--precision`` option to 16 to use automatic mixed precision (AMP). This can provide significant speedup without any loss in accuracy. +* Use option -B to scale batch size until GPU utilization reaches 100%. When using a larger batch size, it is recommended to use option -r to scale the learning rate by the square root of the batch size (1e-3 * sqrt(batch_size)). +* When fine-tuning, it is recommended to use `new` mode not `union` as the network will rapidly unlearn missing labels in the new dataset. +* If the new dataset is fairly dissimilar or your base model has been pretrained with ketos pretrain, use ``--warmup`` in conjunction with ``--freeze-backbone`` for one 1 or 2 epochs. +* Upload your models to the model repository. + +Segmentation model training +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* The segmenter is fairly robust when it comes to hyperparameter choice. +* Start by finetuning from the default model for a fixed number of epochs (50 for reasonably sized datasets) with a cosine schedule. +* Segmentation models' performance is difficult to evaluate. 
Pixel accuracy doesn't mean much because there are many more pixels that aren't part of a line or region than just background. Frequency-weighted IoU is good for overall performance, while mean IoU overrepresents rare classes. The best way to evaluate segmentation models is to look at the output on unlabelled data. +* If you don't have rare classes you can use a fairly small validation set to make sure everything is converging and just visually validate on unlabelled data. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation and reading order training and the binary format for +recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.3. An example showing the +attributes necessary for segmentation, recognition, and reading order training +follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +from a variety of tools. As with ALTO, PAGE XML files can be used to train +segmentation, reading order, and recognition models. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported for +recognition training. Binary datasets drastically improve loading performance +allowing the saturation of most GPUs with minimal computational overhead while +also allowing training with datasets that are larger than the systems main +memory. A minor drawback is a ~30% increase in dataset size in comparison to +the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the ``--workers`` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... + +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... 
+ +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +Recognition training +-------------------- + +.. _predtrain: + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, \--output Output model file prefix. Defaults to model. +-s, \--spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, \--append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, \--load Load existing file to continue training +-F, \--savefreq Model save frequency in epochs during + training +-q, \--quit Stop condition for training. Set to `early` + for early stopping (default) or `fixed` for fixed + number of epochs. +-N, \--epochs Number of epochs to train for. +\--min-epochs Minimum number of epochs to train for when using early stopping. +\--lag Number of epochs to wait before stopping + training without improvement. Only used when using early stopping. +-d, \--device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +\--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, \--lrate Learning rate [default: 0.001] +-m, \--momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, \--weight-decay Weight decay. +\--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, \--partition Ground truth data partition ratio between train/validation set +-u, \--normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, \--codec Load a codec JSON definition (invalid if loading existing model) +\--resize Codec/output layer resizing option. If set + to `union` code points will be added, `new` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, \--reorder / \--no-reorder Reordering of code points to display order. +-t, \--training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, \--evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, \--format-type Sets the training and evaluation data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +\--augment / \--no-augment Enables/disables data augmentation. 
+\--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases changing the network architecture might be useful. One such +example would be material that is not well recognized in the grayscale domain, +as the default architecture definition converts images into grayscale. The +input definition can be changed quite easily to train on color data (RGB) instead: + +.. code-block:: console + + $ ketos train -f page -s '[1,120,0,3 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do]]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. 
code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``union`` and ``new``. +``union`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``new`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize union -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize new -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``new`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. 
code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. 
code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Unsupervised recognition pretraining +------------------------------------ + +Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices. + +All data sources accepted by the supervised trainer are valid for pretraining +but for performance reasons it is recommended to use pre-compiled binary +datasets. One thing to keep in mind is that compilation filters out empty +(non-transcribed) text lines per default which is undesirable for pretraining. +With the ``--keep-empty-lines`` option all valid lines will be written to the +dataset file: + +.. code-block:: console + + $ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml + + +The basic pretraining call is very similar to a training one: + +.. 
code-block:: console + + $ ketos pretrain -f binary foo.arrow + +There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples. + +.. code-block:: console + + $ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow + +Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced: + +.. code-block:: console + + $ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow + +It is necessary to use learning rate warmup (`warmup`) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations. + +Segmentation training +--------------------- + +.. _segtrain: + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize new -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. 
+.. code-block:: console
+
+   $ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+   Training line types:
+     default    2  53980
+   Training region types:
+     graphic    3  135
+     text       4  1128
+     separator  5  5431
+     paragraph  6  10218
+     table      7  16
+   $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+   Training line types:
+     default    2  53980
+   Training region types:
+     graphic    3  135
+     paragraph  6  10218
+
+Finally, we can merge baselines and regions into each other:
+
+.. code-block:: console
+
+   $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+   Training line types:
+     default  2  54114
+   ...
+   $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+   ...
+   Training region types:
+     graphic    3  151
+     text       4  11346
+     separator  5  5431
+   ...
+
+These options can be combined to massage the dataset into any typology you
+want. Tags containing the separator character `:` can be specified by escaping
+them with a backslash.
+
+Then there are some options that set metadata fields controlling the
+postprocessing. When computing the bounding polygons the recognized baselines
+are offset slightly to ensure overlap with the line corpus. This offset is
+upwards by default, as is appropriate for baselines, but since it is also
+possible to annotate toplines (for scripts like Hebrew) and centerlines (for
+baseline-free scripts like Chinese), the appropriate offset can be selected
+with an option:
+
+.. code-block:: console
+
+   $ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+   $ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+   $ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+Lastly, there are some regions that are absolute boundaries for text line
+content. When these regions are marked as such the polygonization can sometimes
+be improved:
+
+.. code-block:: console
+
+   $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+   ...
+
+Reading order training
+----------------------
+
+.. _rotrain:
+
+Reading order models work slightly differently from segmentation and
+recognition models. They are closely linked to the typology used in the dataset
+they were trained on as they use type information on lines and regions to make
+ordering decisions. As the same typology was probably used to train a specific
+segmentation model, reading order models are trained separately but bundled
+with their segmentation model in a subsequent step. The general sequence is
+therefore:
+
+.. code-block:: console
+
+   $ ketos segtrain -o fr_manu_seg.mlmodel -f xml french/*.xml
+   ...
+   $ ketos rotrain -o fr_manu_ro.mlmodel -f xml french/*.xml
+   ...
+   $ ketos roadd -o fr_manu_seg_with_ro.mlmodel -i fr_manu_seg_best.mlmodel -r fr_manu_ro_best.mlmodel
+
+Only the `fr_manu_seg_with_ro.mlmodel` file will contain the trained reading
+order model. Segmentation models can exist with or without reading order
+models. If one is added, the neural reading order will be computed *in
+addition* to the one produced by the default heuristic during segmentation and
+serialized in the final XML output (in ALTO/PAGE XML).
+
+.. note::
+
+   Reading order models work purely on the typology and geometric features
+   of the lines and regions. They construct an approximate ordering matrix
+   by feeding feature vectors of two lines (or regions) into the network
+   to decide which of those two lines precedes the other.
+
+   These feature vectors are quite simple; just the lines' types, and
+   their start, center, and end points. Therefore they can *not* reliably
+   learn any ordering relying on graphical features of the input page such
+   as line color, typeface, or writing system.
+
+Reading order models are extremely simple and do not require a lot of memory or
+computational power to train. In fact, the default parameters are extremely
+conservative and it is recommended to increase the batch size for improved
+training speed. Large batch sizes above 128k are easily possible with
+sufficiently large training datasets:
+
+.. code-block:: console
+
+   $ ketos rotrain -o fr_manu_ro.mlmodel -B 128000 -f xml french/*.xml
+   Training RO on following baselines types:
+     DefaultLine      1
+     DropCapitalLine  2
+     HeadingLine      3
+     InterlinearLine  4
+   GPU available: False, used: False
+   TPU available: False, using: 0 TPU cores
+   IPU available: False, using: 0 IPUs
+   HPU available: False, using: 0 HPUs
+   ┏━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
+   ┃   ┃ Name        ┃ Type              ┃ Params ┃
+   ┡━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
+   │ 0 │ criterion   │ BCEWithLogitsLoss │      0 │
+   │ 1 │ ro_net      │ MLP               │  1.1 K │
+   │ 2 │ ro_net.fc1  │ Linear            │  1.0 K │
+   │ 3 │ ro_net.relu │ ReLU              │      0 │
+   │ 4 │ ro_net.fc2  │ Linear            │     45 │
+   └───┴─────────────┴───────────────────┴────────┘
+   Trainable params: 1.1 K
+   Non-trainable params: 0
+   Total params: 1.1 K
+   Total estimated model params size (MB): 0
+   stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/35 0:00:00 • -:--:-- 0.00it/s  val_spearman: 0.912 val_loss: 0.701 early_stopping: 0/300 inf
+
+During validation a metric called Spearman's footrule is computed. To calculate
+Spearman's footrule, the ranks of the lines of text in the ground truth reading
+order and the predicted reading order are compared. The footrule is then
+calculated as the sum of the absolute differences between the ranks of pairs of
+lines. The score increases by 1 for each line between the correct and predicted
+positions of a line.
+
+A lower footrule score indicates a better alignment between the two orders. A
+score of 0 implies perfect alignment of line ranks.
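+
+The following minimal sketch, which is an illustration and not kraken's
+internal implementation, computes the footrule as described above from two
+orderings of the same set of line identifiers:
+
+.. code-block:: python
+
+   # Hypothetical example: the line ids and orderings are made up.
+   def spearman_footrule(gt_order, pred_order):
+       """Sum of absolute rank differences between two orderings."""
+       gt_rank = {line: i for i, line in enumerate(gt_order)}
+       pred_rank = {line: i for i, line in enumerate(pred_order)}
+       return sum(abs(gt_rank[line] - pred_rank[line]) for line in gt_order)
+
+   # Swapping two adjacent lines displaces each by one rank, giving a score of 2.
+   print(spearman_footrule(['l0', 'l1', 'l2', 'l3'], ['l1', 'l0', 'l2', 'l3']))  # 2
+   # Identical orderings score 0.
+   print(spearman_footrule(['l0', 'l1', 'l2', 'l3'], ['l0', 'l1', 'l2', 'l3']))  # 0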
+
+Recognition testing
+-------------------
+
+Picking a particular model from a pool or getting a more detailed look at the
+recognition accuracy can be done with the `test` command. It uses transcribed
+lines, the test set, in the same format as the `train` command, recognizes the
+line images with one or more models, and creates a detailed report of the
+differences from the ground truth for each of them.
+
+======================================================= ======
+option                                                  action
+======================================================= ======
+-f, \--format-type                                      Sets the test set data format.
+                                                        Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary.
+                                                        In `alto`, `page`, and xml mode all data is extracted from XML files
+                                                        containing both baselines and a link to source images.
+                                                        In `path` mode arguments are image files sharing a prefix up to the last
+                                                        extension with JSON `.path` files containing the baseline information.
+                                                        In `binary` mode arguments are precompiled binary dataset files.
+-m, \--model                                            Model(s) to evaluate.
+-e, \--evaluation-files                                 File(s) with paths to evaluation data.
+-d, \--device                                           Select device to use.
+\--pad                                                  Left and right padding around lines.
+======================================================= ======
+
+Transcriptions are handed to the command in the same way as for the `train`
+command, either through a manifest with ``-e/--evaluation-files`` or by just
+adding a number of image files as the final argument:
+
+.. code-block:: console
+
+   $ ketos test -m $model -e test.txt test/*.png
+   Evaluating $model
+   Evaluating  [####################################]  100%
+   === report test_model.mlmodel ===
+
+   7012     Characters
+   6022     Errors
+   14.12%   Accuracy
+
+   5226     Insertions
+   2        Deletions
+   794      Substitutions
+
+   Count    Missed  %Right
+   1567     575     63.31%  Common
+   5230     5230    0.00%   Arabic
+   215      215     0.00%   Inherited
+
+   Errors   Correct-Generated
+   773  { ا } - {  }
+   536  { ل } - {  }
+   328  { و } - {  }
+   274  { ي } - {  }
+   266  { م } - {  }
+   256  { ب } - {  }
+   246  { ن } - {  }
+   241  { SPACE } - {  }
+   207  { ر } - {  }
+   199  { ف } - {  }
+   192  { ه } - {  }
+   174  { ع } - {  }
+   172  { ARABIC HAMZA ABOVE } - {  }
+   144  { ت } - {  }
+   136  { ق } - {  }
+   122  { س } - {  }
+   108  { ، } - {  }
+   106  { د } - {  }
+   82  { ك } - {  }
+   81  { ح } - {  }
+   71  { ج } - {  }
+   66  { خ } - {  }
+   62  { ة } - {  }
+   60  { ص } - {  }
+   39  { ، } - { - }
+   38  { ش } - {  }
+   30  { ا } - { - }
+   30  { ن } - { - }
+   29  { ى } - {  }
+   28  { ذ } - {  }
+   27  { ه } - { - }
+   27  { ARABIC HAMZA BELOW } - {  }
+   25  { ز } - {  }
+   23  { ث } - {  }
+   22  { غ } - {  }
+   20  { م } - { - }
+   20  { ي } - { - }
+   20  { ) } - {  }
+   19  { : } - {  }
+   19  { ط } - {  }
+   19  { ل } - { - }
+   18  { ، } - { . }
+   17  { ة } - { - }
+   16  { ض } - {  }
+   ...
+   Average accuracy: 14.12%, (stddev: 0.00)
+
+The report(s) contains character accuracy measured per script and a detailed
+list of confusions. When evaluating multiple models the last line of the output
+will be the average accuracy and the standard deviation across all of them.
diff --git a/5.0.0/_sources/models.rst.txt b/5.0.0/_sources/models.rst.txt
new file mode 100644
index 000000000..b393f0738
--- /dev/null
+++ b/5.0.0/_sources/models.rst.txt
@@ -0,0 +1,24 @@
+.. _models:
+
+Models
+======
+
+There are currently three kinds of models containing the recurrent neural
+networks doing all the character recognition supported by kraken: ``pronn``
+files serializing old pickled ``pyrnn`` models as protobuf, clstm's native
+serialization, and versatile `Core ML
+`_ models.
+
+CoreML
+------
+
+Core ML allows arbitrary network architectures in a compact serialization with
+metadata. This is the default format in pytorch-based kraken.
+
+Segmentation Models
+-------------------
+
+Recognition Models
+------------------
+
+
diff --git a/5.0.0/_sources/training.rst.txt b/5.0.0/_sources/training.rst.txt
new file mode 100644
index 000000000..aa63338f5
--- /dev/null
+++ b/5.0.0/_sources/training.rst.txt
@@ -0,0 +1,463 @@
+.. _training:
+
+Training kraken
+===============
+
+kraken is an optical character recognition package that can be trained fairly
+easily for a large number of scripts. In contrast to other systems requiring
+segmentation down to glyph level before classification, it is uniquely suited
+for the recognition of connected scripts, because the neural network is trained
+to assign the correct characters to unsegmented training data.
+
+Both segmentation, the process of finding lines and regions on a page image,
+and recognition, the conversion of line images into text, can be trained in
+kraken. To train models for either we require training data, i.e. examples of
+page segmentations and transcriptions that are similar to what we want to be
+able to recognize. For segmentation the examples are the locations of
+baselines, i.e. the imaginary lines the text is written on, and polygons of
+regions. For recognition these are transcriptions of the text contained in a
+line. There are multiple ways to supply training data but the easiest is
+through PageXML or ALTO files.
+
+Installing kraken
+-----------------
+
+The easiest way to install and use kraken is through `conda
+`_. kraken works both on Linux and Mac OS
+X. After installing conda, download the environment file and create the
+environment for kraken:
+
+.. code-block:: console
+
+   $ wget https://raw.githubusercontent.com/mittagessen/kraken/main/environment.yml
+   $ conda env create -f environment.yml
+
+Each time you want to use the kraken environment in a shell it has to be
+activated first:
+
+.. code-block:: console
+
+   $ conda activate kraken
+
+Image acquisition and preprocessing
+-----------------------------------
+
+First, a number of high quality scans, preferably color or grayscale and at
+least 300dpi, are required. Scans should be in a lossless image format such as
+TIFF or PNG; images in PDF files have to be extracted beforehand using a tool
+such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can
+be relaxed to a degree, the final accuracy will suffer to some extent. For
+example, only slightly compressed JPEG scans are generally suitable for
+training and recognition.
+
+Depending on the source of the scans some preprocessing such as splitting scans
+into pages, correcting skew and warp, and removing speckles can be advisable,
+although it isn't strictly necessary as the segmenter can be trained to handle
+noisy material with high accuracy. A fairly user-friendly tool for
+semi-automatic batch processing of image scans is `Scantailor
+`_, although most work can be done using a standard image
+editor.
+
+The total number of scans required depends on the kind of model to train
+(segmentation or recognition), the complexity of the layout, and the nature of
+the script to recognize. Only features that are found in the training data can
+later be recognized, so it is important that the coverage of typographic
+features is exhaustive. Training a small segmentation model for a particular
+kind of material might require fewer than a few hundred samples while a general
+model can well go into the thousands of pages. Likewise, a specific recognition
+model for a printed script with a small grapheme inventory such as Arabic or
+Hebrew requires around 800 lines, with manuscripts, complex scripts (such as
+polytonic Greek), and general models for multiple typefaces and hands needing
+more training data for the same accuracy.
+
+There is no hard rule for the amount of training data and it may be necessary
+to retrain a model if the initial training data proves insufficient. Most
+``western`` texts contain between 25 and 40 lines per page, therefore upward of
+30 pages have to be preprocessed and later transcribed.
+
+Annotation and transcription
+----------------------------
+
+kraken does not provide internal tools for the annotation and transcription of
+baselines, regions, and text. There are a number of tools available that can
+create ALTO and PageXML files containing the requisite information for either
+segmentation or recognition training: `escriptorium
+`_ integrates kraken tightly including
+training and inference, `Aletheia
+`_ is a powerful desktop
+application that can create fine-grained annotations.
+
+Dataset Compilation
+-------------------
+
+.. _compilation:
+
+Training
+--------
+
+.. _training_step:
+
+The training data, e.g. a collection of PAGE XML documents, obtained through
+annotation and transcription, may now be used to train segmentation and/or
+transcription models.
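+
+Both trainers accept such XML files directly, for example (the
+``training_data`` path is purely illustrative; the commands are described in
+detail below and in the ketos documentation):
+
+.. code-block:: console
+
+   $ ketos train -f xml training_data/*.xml
+   $ ketos segtrain -f xml training_data/*.xml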
+
+The training data in ``output_dir`` may now be used to train a new model by
+invoking the ``ketos train`` command. Just hand a list of images to the command
+such as:
+
+.. code-block:: console
+
+   $ ketos train output_dir/*.png
+
+to start training.
+
+A number of lines will be split off into a separate held-out set that is used
+to estimate the actual recognition accuracy achieved in the real world. These
+are never shown to the network during training but will be recognized
+periodically to evaluate the accuracy of the model. By default the validation
+set will comprise 10% of the training data.
+
+Basic model training is mostly automatic, although there are multiple
+parameters that can be adjusted:
+
+--output
+    Sets the prefix for models generated during training. They will be saved as
+    ``prefix_epochs.mlmodel``.
+--report
+    How often evaluation passes are run on the validation set. It is an
+    integer equal to or larger than 1, with 1 meaning a report is created each
+    time the complete training set has been seen by the network.
+--savefreq
+    How often intermediate models are saved to disk. It is an integer with
+    the same semantics as ``--report``.
+--load
+    Continuing training is possible by loading an existing model file with
+    ``--load``. To continue training from a base model with another
+    training set refer to the full :ref:`ketos ` documentation.
+--preload
+    Enables/disables preloading of the training set into memory for
+    accelerated training. The default setting preloads data sets with fewer
+    than 2500 lines; explicitly adding ``--preload`` will preload arbitrarily
+    sized sets. ``--no-preload`` disables preloading in all circumstances.
+
+Training a network will take some time on a modern computer, even with the
+default parameters. While the exact time required is unpredictable, as training
+is a somewhat random process, a rough guide is that accuracy seldom improves
+after 50 epochs, which are reached after between 8 and 24 hours of training.
+
+When to stop training is a matter of experience; the default setting employs a
+fairly reliable approach known as `early stopping
+`_ that stops training as soon as
+the error rate on the validation set doesn't improve anymore. This will
+prevent `overfitting `_, i.e.
+fitting the model to recognize only the training data properly instead of the
+general patterns contained therein.
+
+.. code-block:: console
+
+   $ ketos train output_dir/*.png
+   Building training set  [####################################]  100%
+   Building validation set  [####################################]  100%
+   [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+   Initializing model ✓
+   Accuracy report (0) -1.5951 3680 9550
+   epoch 0/-1  [####################################]  788/788
+   Accuracy report (1) 0.0245 3504 3418
+   epoch 1/-1  [####################################]  788/788
+   Accuracy report (2) 0.8445 3504 545
+   epoch 2/-1  [####################################]  788/788
+   Accuracy report (3) 0.9541 3504 161
+   epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+   ...
+
+By now there should be a couple of models, model_name-1.mlmodel,
+model_name-2.mlmodel, ..., in the directory the script was executed in. Let's
+take a look at each part of the output.
+
+.. code-block:: console
+
+   Building training set  [####################################]  100%
+   Building validation set  [####################################]  100%
+
+shows the progress of loading the training and validation set into memory.
This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... + [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. 
Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . } + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. 
code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/5.0.0/_sources/vgsl.rst.txt b/5.0.0/_sources/vgsl.rst.txt new file mode 100644 index 000000000..8a956b213 --- /dev/null +++ b/5.0.0/_sources/vgsl.rst.txt @@ -0,0 +1,233 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. 
Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +.. 
code-block:: console
+
+   [1,1800,0,3 Cr3,3,32 Gn8 (I [Cr3,3,64,2,2 Gn8 CTr3,3,32,2,2]) Cr3,3,32 O2l8]
+
+   layer type  params
+   0     conv  kernel 3 x 3 filters 32 activation r
+   1     groupnorm  8 groups
+   2     parallel  execute 2.0 and 2.1 in parallel
+   2.0   identity
+   2.1   serial  execute 2.1.0 to 2.1.2 in sequence
+   2.1.0 conv  kernel 3 x 3 stride 2 x 2 filters 64 activation r
+   2.1.1 groupnorm  8 groups
+   2.1.2 transposed convolution  kernel 3 x 3 stride 2 x 2 filters 2 activation r
+   3     conv  kernel 3 x 3 stride 1 x 1 filters 32 activation r
+   4     linear  activation sigmoid
+
+A model that outputs heatmaps with 8 feature dimensions, taking color images
+with height normalized to 1800 pixels as its input. It uses a strided
+convolution to first scale the image down, and then a transposed convolution to
+transform the image back to its original size. This is done in a parallel
+block, where the other branch simply passes through the output of the first
+convolution layer. The input of the last convolutional layer is then the output
+of the two branches of the parallel block concatenated, i.e. the output of the
+first convolutional layer together with the output of the transposed
+convolutional layer, giving `32 + 32 = 64` feature dimensions.
+
+Convolutional Layers
+--------------------
+
+.. code-block:: console
+
+   C[T][{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<y_stride>,<x_stride>][,<dilation_y>,<dilation_x>]
+   s = sigmoid
+   t = tanh
+   r = relu
+   l = linear
+   m = softmax
+
+Adds a 2D convolution with kernel size `(y, x)` and `d` output channels,
+applying the selected nonlinearity. Stride and dilation can be adjusted with
+the optional last two parameters. `T` gives a transposed convolution. For
+transposed convolutions, several output sizes are possible for the same
+configuration. The system will try to match the output size of the different
+branches of parallel blocks; however, this will only work if the transposed
+convolution directly precedes the confluence of the parallel branches, and if
+the branches with fixed output size come first in the definition of the
+parallel block. Hence, out of `(I [Cr3,3,8,2,2 CTr3,3,8,2,2])`,
+`([Cr3,3,8,2,2 CTr3,3,8,2,2] I)` and `(I [Cr3,3,8,2,2 CTr3,3,8,2,2 Gn8])` only
+the first variant will behave correctly.
+
+Recurrent Layers
+----------------
+
+.. code-block:: console
+
+   L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+   G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+   f runs the RNN forward only.
+   r runs the RNN reversed only.
+   b runs the RNN bidirectionally.
+   s (optional) summarizes the output in the requested dimension, returning the last step.
+
+Adds either an LSTM or GRU recurrent layer to the network using either the `x`
+(width) or `y` (height) dimension as the time axis. Input features are the
+channel dimension and the non-time-axis dimension (height/width) is treated as
+another batch dimension. For example, a `Lfx25` layer on a `1, 16, 906, 32`
+input will execute 16 independent forward passes on `906x32` tensors resulting
+in an output of shape `1, 16, 906, 25`. If this isn't desired, either run a
+summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1,
+906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and
+channel dimension for a `1, 1, 906, 512` input to the recurrent layer.
+
+Helper and Plumbing Layers
+--------------------------
+
+Max Pool
+^^^^^^^^
+.. code-block:: console
+
+   Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+Adds a maximum pooling with `(y, x)` kernel size and `(y_stride, x_stride)` stride.
+
+Reshape
+^^^^^^^
+
+.. 
code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. + +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +Dropout +^^^^^^^ + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/5.0.0/_static/alabaster.css b/5.0.0/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/5.0.0/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + 
+div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + 
display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px 
-30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. */ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/5.0.0/_static/basic.css b/5.0.0/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/5.0.0/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + 
+div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 
8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + 
+dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} 
+ +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/5.0.0/_static/blla_heatmap.jpg b/5.0.0/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/5.0.0/_static/blla_heatmap.jpg differ diff --git a/5.0.0/_static/blla_output.jpg b/5.0.0/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/5.0.0/_static/blla_output.jpg differ diff --git a/5.0.0/_static/bw.png b/5.0.0/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/5.0.0/_static/bw.png differ diff --git a/5.0.0/_static/custom.css b/5.0.0/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/5.0.0/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/5.0.0/_static/doctools.js b/5.0.0/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/5.0.0/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/5.0.0/_static/documentation_options.js b/5.0.0/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/5.0.0/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/5.0.0/_static/file.png b/5.0.0/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/5.0.0/_static/file.png differ diff --git a/5.0.0/_static/graphviz.css b/5.0.0/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/5.0.0/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ 
+ * + * Sphinx stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/5.0.0/_static/kraken.png b/5.0.0/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/5.0.0/_static/kraken.png differ diff --git a/5.0.0/_static/kraken_recognition.svg b/5.0.0/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/5.0.0/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... + + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/5.0.0/_static/kraken_segmentation.svg b/5.0.0/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/5.0.0/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/5.0.0/_static/kraken_segmodel.svg b/5.0.0/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/5.0.0/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/5.0.0/_static/kraken_torchseqrecognizer.svg b/5.0.0/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/5.0.0/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/5.0.0/_static/kraken_workflow.svg b/5.0.0/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/5.0.0/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + 
Segmentation + + + + + + + + + + + Recognition + + + + + + + + + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/5.0.0/_static/language_data.js b/5.0.0/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/5.0.0/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" 
+ v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/5.0.0/_static/minus.png b/5.0.0/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/5.0.0/_static/minus.png differ diff --git a/5.0.0/_static/normal-reproduction-low-resolution.jpg b/5.0.0/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/5.0.0/_static/normal-reproduction-low-resolution.jpg differ diff --git a/5.0.0/_static/pat.png b/5.0.0/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/5.0.0/_static/pat.png differ diff --git a/5.0.0/_static/plus.png b/5.0.0/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/5.0.0/_static/plus.png differ diff --git a/5.0.0/_static/pygments.css b/5.0.0/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- 
/dev/null +++ b/5.0.0/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } /* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* 
Literal.Number.Float */ +.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/5.0.0/_static/searchtools.js b/5.0.0/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/5.0.0/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. + /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. 
+ objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." + ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. 
+// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/5.0.0/_static/sphinx_highlight.js b/5.0.0/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/5.0.0/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/5.0.0/advanced.html b/5.0.0/advanced.html new file mode 100644 index 000000000..70aa68df3 --- /dev/null +++ b/5.0.0/advanced.html @@ -0,0 +1,538 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text line images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML.

+
+

Input and Outputs

+

Kraken inputs and their outputs can be defined in multiple ways. The simplest +are input-output pairs, i.e. producing one output document for one input +document, following the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular, subcommands may be chained.

+

There are other ways to define inputs and outputs, as the syntax shown above can +become rather cumbersome for large numbers of files.

+

As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. -I batch +inputs can also be specified multiple times:

+
$ kraken -I '*.png' -I '*.jpg' -I '*.tif' -o ocr.txt segment ...
+
+
+

A second way is to input multi-image files directly. These can be either in +PDF, TIFF, or JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write +one output file per page image, named with an index (which can be changed using the +-p option) and the suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 +files but also from various XML formats. In these cases the appropriate data is +automatically selected from the inputs, i.e. image data for segmentation or line and +region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

Kraken automatically determines whether a file is in PageXML or ALTO format.
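
The batch input mechanism works for XML files as well; as a sketch, the -I glob +syntax can be combined with -f xml (recognition options elided as in the examples above):

+
$ kraken -I '*.xml' -o ocr.txt -f xml ocr ...
+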

+
+

Output formats

+

All commands have a default output format such as raw text for ocr, a plain +image for binarize, or a JSON definition of the segmentation for +segment. These are specific to kraken and generally not suitable for further +processing by other software, but a number of standardized data exchange formats +can be selected instead. ALTO, +PageXML, hOCR, and abbyyXML, which contain additional metadata such as +bounding boxes and confidences, are supported out of the box. In addition, custom jinja templates can be loaded to create +individualised output such as TEI.

+

Output formats are selected on the main kraken command and apply to the last +subcommand defined in the subcommand chain. For example:

+
$ kraken --alto -i ... segment -bl
+
+
+

will serialize a plain segmentation in ALTO into the specified output file.
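
Because the format switch applies to the last subcommand in the chain, a combined +segmentation and recognition run serializes the recognition results instead; a sketch +(output file name and recognition options are illustrative):

+
$ kraken --alto -i input.jpg output.xml segment -bl ocr ...
+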

+

The currently available format switches are:

+
$ kraken -n -i ... ... # native output
+$ kraken -a -i ... ... # ALTO output
+$ kraken -x -i ... ... # PageXML output
+$ kraken -h -i ... ... # hOCR output
+$ kraken -y -i ... ... # abbyyXML output
+
+
+

Custom templates can be loaded with the --template option:

+
$ kraken --template /my/awesome/template.tmpl -i ... ...
+
+
+

The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates here.

+
+
+
+

Binarization

+
+

Note

+

Binarization is deprecated and mostly not necessary anymore. It can often +worsen text recognition results, especially for documents with uneven +lighting, faint writing, etc.

+
+

The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +ocropus-nlbin. Only options not related to binarization, e.g. skew +detection, are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it.

+

Available parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

type

--threshold

FLOAT

--zoom

FLOAT

--escale

FLOAT

--border

FLOAT

--perc

INTEGER RANGE

--range

INTEGER

--low

INTEGER RANGE

--high

INTEGER RANGE

+

To binarize an image:

+
$ kraken -i input.jpg bw.png binarize
+
+
+
+
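
The parameters listed above are appended to the subcommand; for example, a sketch +raising the global threshold (the value 0.6 is purely illustrative):
+
$ kraken -i input.jpg bw.png binarize --threshold 0.6
+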

Note

+

Some image formats, notably JPEG, do not support a black and white +image mode. By default the output format implied by the output file +name extension will be honored. If this is not possible, a warning will +be printed and the output forced to PNG:

+
$ kraken -i input.jpg bw.jpg binarize
+Binarizing      [06/24/22 09:56:23] WARNING  jpeg does not support 1bpp images. Forcing to png.
+
+
+
+
+
+
+

Page Segmentation

+

The segment subcommand performs page segmentation into lines and regions with +one of the two layout analysis methods implemented: the trainable baseline segmenter, +which is capable of detecting both lines of different types and regions, and a +legacy non-trainable segmenter that produces bounding boxes.

+

Universal parameters of either segmenter are:

+ + + + + + + + + + + + + + +

option

action

-d, --text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

-m, --mask

Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes (see the example after this table).

+
+
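
A mask can, for instance, be used to exclude marginalia or restrict line detection to a +single column; a sketch with the baseline segmenter (-bl, described below) and an +illustrative mask file name:
+
$ kraken -i input.jpg segmentation.json segment -bl -m mask.png
+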

Baseline Segmentation

+

The baseline segmenter works by applying a segmentation model to a page image +which labels each pixel on the image with one or more classes, each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below:

+BLLA output heatmap + +

In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as:

+BLLA final output + +

The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segmenter is activated with the -bl +option:

+
$ kraken -i input.jpg segmentation.json segment -bl
+
+
+

New models optimized for other kinds of documents can be trained (see +here). These can be applied with the -i option of the +segment subcommand:

+
$ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel
+
+
+
+
+

Legacy Box Segmentation

+

The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left).

+

Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply binarization first or supply only +pre-binarized inputs.

+

The legacy segmenter can be applied on some input image with:

+
$ kraken -i 14.tif lines.json segment -x
+$ cat lines.json
+
+
+

Available specific parameters are:

option                                       action
--scale FLOAT                                Estimate of the average line height on the page
-m, --maxcolseps                             Maximum number of columns in the input document. Set to 0 for uni-column layouts.
-b, --black-colseps / -w, --white-colseps    Switch to black column separators.
-r, --remove-hlines / -l, --hlines           Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.
-p, --pad                                    Adds left and right padding around lines in the output.

+
+
+

Principal Text Direction

+

The principal text direction selected with the -d/--text-direction option is used in the reading order heuristic to determine the order of text blocks (regions) and individual lines. It roughly corresponds to the block flow direction in CSS, with an additional option. Valid options consist of two parts: an initial principal line orientation (horizontal or vertical) followed by a block order (lr for left-to-right or rl for right-to-left).

+

The first part is usually horizontal for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom:

[Image: Horizontal Latin script text]

Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left:

[Image: Vertical Chinese text]

The second part depends on a number of factors, as the order in which text blocks are read is not fixed for every writing system. In mono-script texts it is usually determined by the inline text direction, i.e. columns in Latin script texts are read starting with the top-left column, followed by the column to its right, and so on, continuing with the left-most column below if none remain to the right (and inversely for right-to-left scripts like Arabic, which start with the top right-most column, continue leftward, and return to the right-most column just below when none remain).

+

In multi-script documents the order is determined by the primary writing system employed in the document. For a modern book containing both Latin and Arabic script text it would be set to lr when Latin is primary, i.e. when the binding is on the left side of the book seen from the title cover, and vice versa (rl if the binding is on the right when seen from the title cover). The analogue applies to text written with vertical lines.

+

Combining these two parts yields four different text directions:

Text Direction    Examples
horizontal-lr     Latin script texts, Mixed LTR/RTL docs with principal LTR script
horizontal-rl     Arabic script texts, Mixed LTR/RTL docs with principal RTL script
vertical-lr       Vertical script texts read from left-to-right.
vertical-rl       Vertical script texts read from right-to-left.

+
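For instance, when segmenting an Arabic book whose columns are read from right to left, the direction can be passed explicitly to the segmenter (a sketch reusing the baseline segmenter invocation shown above):

$ kraken -i input.jpg segmentation.json segment -bl -d horizontal-rl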
+
+

Masking

+

It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white:

+
$ kraken -i input.jpg segmentation.json segment -bl -m mask.png
+
+
+
+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands.

+
+

Querying and Model Retrieval

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07
+10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration)
+10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature)
+10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show 10.5281/zenodo.5617783
+name: 10.5281/zenodo.5617783
+
+Cremma-Medieval Old French Model (Litterature)
+
+....
+scripts: Latn
+alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128
+accuracy: 95.49%
+license: CC-BY-SA-2.0
+author(s): Pinche, Ariane
+date: 2021-10-29
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get 10.5281/zenodo.5617783
+Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10
+Model name: cremma_medieval_bicerin.mlmodel
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +printed in the last line of the kraken get output.

+
$ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel
+
+
+
+
+

Publishing

+

When one would like to share a model with the wider world (for fame and glory!) it is possible (and recommended) to upload it to the model repository. The process consists of two stages: the creation of the deposit on the Zenodo platform, followed by approval of the model in the community, making it discoverable for other kraken users.

+

Uploading models requires a Zenodo account and a personal access token. After account creation, tokens can be created under the account settings:

[Image: Zenodo token creation dialogue]

With the token models can then be uploaded:

+
$ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617783
+
+
+

A number of important metadata fields will be requested, such as a short description of the model, a long form description, recognized scripts, and authorship. Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. it can’t be changed or deleted, so it is important to make sure that all the information is correct. Each deposit also has a unique persistent identifier, a DOI, that can be used to refer to it, e.g. in publications or when pointing someone to a particular model.

+

Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users.

+

It is possible to deposit models without including them in the queryable repository. Models uploaded this way are not truly private and can still be found through the standard Zenodo search and be downloaded with kraken get and their DOI. This is mostly intended for preliminary models that might get updated later:

+
$ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617734
+
+
+
+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+
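A complete invocation chaining baseline segmentation and recognition with a single model might look like the following sketch (reusing the model retrieved from the repository above; the default plain text serialization is assumed):

$ kraken -i input.tif output.txt segment -bl ocr -m cremma_medieval_bicerin.mlmodel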

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:antiqua.mlmodel
+
+
+

All polytonic Greek text portions will be recognized using the porson.mlmodel +model while Latin text will be fed into the antiqua.mlmodel model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.mlmodel
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
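For example, to recognize only the Greek runs while skipping recognition of Latin runs (reusing the script-to-model mapping syntax from above):

$ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:ignore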
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/api.html b/5.0.0/api.html new file mode 100644 index 000000000..46fbbd93f --- /dev/null +++ b/5.0.0/api.html @@ -0,0 +1,3056 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all functionality of the OCR engine. Most functional blocks (binarization, segmentation, recognition, and serialization) are encapsulated in one high-level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
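For orientation, the functional blocks used throughout this quickstart live in the following modules (only those appearing below are listed):

>>> from kraken import binarization, blla, pageseg, rpred, serialization
>>> from kraken.lib import models, vgsl, xml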
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The only required parameter of the legacy segmenter is a b/w image object, although some additional parameters exist, largely to change the principal text direction (important for column ordering and top-to-bottom scripts) and to explicitly mask non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+{'text_direction': 'horizontal-lr',
+ 'boxes': [[0, 29, 232, 56],
+           [28, 54, 121, 84],
+           [9, 73, 92, 117],
+           [103, 76, 145, 131],
+           [7, 105, 119, 230],
+           [10, 228, 126, 345],
+           ...
+          ],
+ 'script_detection': False}
+
+
+
+
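The optional parameters mentioned above can be passed to the same call; a minimal sketch (the mask image is hypothetical and must have the same size as the page):

>>> import numpy as np
>>> mask = np.array(Image.open('mask.png'))   # hypothetical b/w mask image
>>> seg = pageseg.segment(bw_im, text_direction='horizontal-lr', mask=mask)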
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

[Diagram: a segmentation model (TorchVGSLModel) bundles the neural network with metadata describing line and region types, bounding regions, and the baseline location flag.]

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+{'text_direction': 'horizontal-lr',
+ 'type': 'baselines',
+ 'script_detection': False,
+ 'lines': [{'script': 'default',
+            'baseline': [[471, 1408], [524, 1412], [509, 1397], [1161, 1412], [1195, 1412]],
+            'boundary': [[471, 1408], [491, 1408], [515, 1385], [562, 1388], [575, 1377], ... [473, 1410]]},
+           ...],
+ 'regions': {'$tip':[[[536, 1716], ... [522, 1708], [524, 1716], [536, 1716], ...]
+             '$par': ...
+             '$nop':  ...}}
+>>> alto = serialization.serialize_segmentation(baseline_seg, image_name=im.filename, image_size=im.size, template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

Optional parameters are largely the same as for the legacy segmenter, i.e. text +direction and masking.

+

Images are automatically converted into the proper mode for recognition, except in the case of models trained on binary images, as there is a plethora of different binarization algorithms available, each with strengths and weaknesses. For most material the kraken-provided binarization should be sufficient, though. This does not mean that a segmentation model trained on RGB images will have equal accuracy for B/W, grayscale, and RGB inputs; nevertheless the drop in quality will often be modest or non-existent for color models, while non-binarized inputs to a binary model will cause severe degradation (and a warning to that effect).

+

By default segmentation is performed on the CPU, although the neural network can be run on a GPU with the device argument. As the vast majority of the processing required is postprocessing, the performance gain will most likely be modest.

+
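Both the masking described above and the device selection are plain keyword arguments of kraken.blla.segment(); a minimal sketch (the CUDA device string and the mask image are hypothetical):

>>> import numpy as np
>>> mask = np.array(Image.open('mask.png'))   # hypothetical b/w mask, same size as im
>>> baseline_seg = blla.segment(im, model=model, mask=mask, device='cuda:0')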

The above API is the simplest way to perform a complete segmentation. The process consists of multiple steps such as pixel labelling, separate region and baseline vectorization, and bounding polygon calculation:

[Diagram: the trainable segmentation pipeline. Pixel labelling produces line/separator heatmaps and region heatmaps; baseline vectorization and orientation yields oriented baselines, region vectorization yields region boundaries; bounding polygon calculation and line ordering then produce the final bounding polygons.]

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

[Diagram: the neural net transforms the input line into an output matrix over 'time' steps (width); the CTC decoder collapses it into a label sequence (e.g. 15, 10, 1, …) which the codec maps to a character sequence (e.g. o, c, u, …).]

As the customization of this two-stage decoding process is usually reserved for specialized use cases, sensible defaults are provided: codecs are part of the model file and do not have to be supplied manually; the preferred CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

The sequence recognizer wrapper combines the neural network itself, a codec, metadata such as whether the input is supposed to be grayscale or binarized, and an instance of a CTC decoder that performs the conversion of the raw output tensor of the network into a sequence of labels:

[Diagram: the transcription model (TorchSeqRecognizer) bundles the neural network, codec, metadata, and CTC decoder.]

Afterwards, given an image, a segmentation, and the model, one can perform text recognition. The code is identical for both legacy and baseline segmentations. As with segmentation, input images are auto-converted to the correct color mode, except in the case of binary models, for which a warning will be raised if there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken import rpred
+# single model recognition
+>>> pred_it = rpred(model, im, baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+

The output is not just a sequence of characters but a kraken.rpred.ocr_record object containing the character prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'box'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The C × W softmax output matrix is accessible as the outputs attribute on the kraken.lib.models.TorchSeqRecognizer after each step of the kraken.rpred.rpred() iterator. To get a mapping from the label space C the network operates in to Unicode code points a codec is used. An arbitrary sequence of labels can generate an arbitrary number of Unicode code points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.output
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.

+
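As an illustration, the raw matrix can also be decoded manually with one of the alternative decoders (a sketch; outputs stands for the (C, W) softmax matrix described above):

>>> from kraken.lib import ctc_decoder
>>> labels = ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)  # alternative to the default greedy decoder
>>> model.codec.decode(labels)                                            # map labels back to code points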
+
+

XML Parsing

+

Sometimes it is desired to take the data in an existing XML serialization format like PageXML or ALTO and apply an OCR function to it. The kraken.lib.xml module includes parsers extracting information into data structures processable with minimal transformation by the functional blocks:

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> xml.parse_alto(alto_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+}
+
+>>> page_doc = '/path/to/page'
+>>> xml.parse_page(page_doc)
+{'image': '/path/to/image/file',
+ 'type': 'baselines',
+ 'lines': [{'baseline': [(24, 2017), (25, 2078)],
+            'boundary': [(69, 2016), (70, 2077), (20, 2078), (19, 2017)],
+            'text': '',
+            'script': 'default'},
+           {'baseline': [(79, 2016), (79, 2041)],
+            'boundary': [(124, 2016), (124, 2041), (74, 2041), (74, 2016)],
+            'text': '',
+            'script': 'default'}, ...],
+ 'regions': {'Image/Drawing/Figure': [[(-5, 3398), (207, 3398), (207, 2000), (-5, 2000)],
+                                      [(253, 3292), (668, 3292), (668, 3455), (253, 3455)],
+                                      [(216, -4), (1015, -4), (1015, 534), (216, 534)]],
+             'Handwritten text': [[(2426, 3367), (2483, 3367), (2483, 3414), (2426, 3414)],
+                                  [(1824, 3437), (2072, 3437), (2072, 3514), (1824, 3514)]],
+             ...}
+
+
+
+
+

Serialization

+

The serialization module can be used to transform the ocr_records returned by the prediction iterator into a text +based (most often XML) format for archival. The module renders jinja2 templates in kraken/templates through +the kraken.serialization.serialize() function.

+
>>> from kraken.lib import serialization
+
+>>> records = [record for record in pred_it]
+>>> alto = serialization.serialize(records, image_name='path/to/image', image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+
+
+

Training

+

Training is largely implemented with the PyTorch Lightning framework. There are separate LightningModules for recognition and segmentation training and a small wrapper around Lightning's Trainer class that mainly sets up model handling and verbosity options for the CLI.

+
>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to have a closer look at the command line parameters for features such as transfer learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/api_docs.html b/5.0.0/api_docs.html new file mode 100644 index 000000000..c24372914 --- /dev/null +++ b/5.0.0/api_docs.html @@ -0,0 +1,3857 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

Segmentation

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu', raise_on_error=False, autocast=False)
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
  • raise_on_error (bool) – Raises errors instead of logging them when they are non-blocking

  • +
  • autocast (bool) – Runs the model with automatic mixed precision

  • +
+
+
Returns:
+

A kraken.containers.Segmentation class containing reading +order sorted baselines (polylines) and their respective polygonal +boundaries as kraken.containers.BaselineLine records. The +last and first point of each boundary polygon are connected.

+
+
Raises:
+
+
+
Return type:
+

kraken.containers.Segmentation

+
+
+

Notes

+

Multi-model operation is most useful for combining one or more region detection models and one text line model. Detected lines from all models are simply combined without any merging or duplicate detection, so the chance of the same line appearing multiple times in the output is high. In addition, neural reading order determination is disabled when more than one model outputs lines.

+
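A hedged sketch of such a multi-model call (the model paths are hypothetical):

>>> from kraken.blla import segment
>>> from kraken.lib.vgsl import TorchVGSLModel
>>> line_model = TorchVGSLModel.load_model('lines.mlmodel')      # hypothetical line segmentation model
>>> region_model = TorchVGSLModel.load_model('regions.mlmodel')  # hypothetical region detection model
>>> seg = segment(im, model=[line_model, region_model])          # outputs are combined without deduplication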
+ +
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A kraken.containers.Segmentation class containing reading +order sorted bounding box-type lines as +kraken.containers.BBoxLine records.

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

kraken.containers.Segmentation

+
+
+
+ +
+
+
+

Recognition

+
+

kraken.rpred module

+
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None, no_legacy_polygons=False)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+im
+
+ +
+
+len
+
+ +
+
+line_iter
+
+ +
+
+nets
+
+ +
+
+no_legacy_polygons
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags_ignore
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True, no_legacy_polygons=False)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (kraken.containers.Segmentation) – A Segmentation class instance containing either a baseline or +bbox segmentation.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible with +padding.

  • +
  • bidi_reordering (Union[bool, str]) – Reorder classes in the ocr_record according to the +Unicode bidirectional algorithm for correct display. +Set to L|R to change base text direction.

  • +
  • no_legacy_polygons (bool)

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[kraken.containers.ocr_record, None, None]

+
+
+
+ +
+
+
+

Serialization

+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize(results, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, template='alto', template_source='native', processing_steps=None)
+

Serializes recognition and segmentation results into an output document.

+

Serializes a Segmentation container object containing either segmentation +or recognition results into an output document. The rendering is performed +with jinja2 templates that can either be shipped with kraken +(template_source == ‘native’) or custom (template_source == ‘custom’).

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • segmentation – Segmentation container object

  • +
  • image_size (Tuple[int, int]) – Dimensions of the source image

  • +
  • writing_mode (Literal['horizontal-tb', 'vertical-lr', 'vertical-rl']) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values are +horizontal-tb, vertical-rl, and vertical-lr.

  • +
  • scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records

  • +
  • template ([os.PathLike, str]) – Selector for the serialization format. May be ‘hocr’, +‘alto’, ‘page’ or any template found in the template +directory. If template_source is set to custom a path to a +template is expected.

  • +
  • template_source (Literal['native', 'custom']) – Switch to enable loading of custom templates from +outside the kraken package.

  • +
  • processing_steps (Optional[List[kraken.containers.ProcessingStep]]) – A list of ProcessingStep container classes describing +the processing kraken performed on the inputs.

  • +
  • results (kraken.containers.Segmentation)

  • +
+
+
Returns:
+

The rendered template

+
+
Return type:
+

str

+
+
+
+ +
+
+

Default templates

+
+

ALTO 4.4

+
+
+

PageXML

+
+
+

hOCR

+
+
+

ABBYY XML

+
+
+
+
+

Containers and Helpers

+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.

+
+
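A short construction sketch (the character set is illustrative):

>>> from kraken.lib.codec import PytorchCodec
>>> codec = PytorchCodec('abc ')                 # each code point is assigned a 1-indexed label
>>> labels = codec.encode('abc')                 # torch.IntTensor of integer labels
>>> codec.decode([(l, 0, 0, 1.0) for l in labels.tolist()])   # [(code point, start, end, confidence), ...]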
+
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if a subsequence is not encodable and the codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+l2c_single
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
+
+

kraken.containers module

+
+
+class kraken.containers.Segmentation
+

A container class for segmentation or recognition results.

+

In order to allow easy JSON de-/serialization, nested classes for lines +(BaselineLine/BBoxLine) and regions (Region) are reinstantiated from their +dictionaries.

+
+
+type
+

Field indicating if baselines +(kraken.containers.BaselineLine) or bbox +(kraken.containers.BBoxLine) line records are in the +segmentation.

+
+ +
+
+imagename
+

Path to the image associated with the segmentation.

+
+ +
+
+text_direction
+

Sets the principal orientation (of the line), i.e. +horizontal/vertical, and reading direction (of the +document), i.e. lr/rl.

+
+ +
+
+script_detection
+

Flag indicating if the line records have tags.

+
+ +
+
+lines
+

List of line records. Records are expected to be in a valid +reading order.

+
+ +
+
+regions
+

Dict mapping types to lists of regions.

+
+ +
+
+line_orders
+

List of alternative reading orders for the segmentation. +Each reading order is a list of line indices.

+
+ +
+
+imagename: str | os.PathLike
+
+ +
+
+line_orders: List[List[int]] | None = None
+
+ +
+
+lines: List[BaselineLine | BBoxLine] | None = None
+
+ +
+
+regions: Dict[str, List[Region]] | None = None
+
+ +
+
+script_detection: bool
+
+ +
+
+text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']
+
+ +
+
+type: Literal['baselines', 'bbox']
+
+ +
+ +
+
+class kraken.containers.BaselineLine
+

Baseline-type line record.

+

A container class for a single line in baseline + bounding polygon format, +optionally containing a transcription, tags, or associated regions.

+
+
+id
+

Unique identifier

+
+ +
+
+baseline
+

List of tuples (x_n, y_n) defining the baseline.

+
+ +
+
+boundary
+

List of tuples (x_n, y_n) defining the bounding polygon of +the line. The first and last points should be identical.

+
+ +
+
+text
+

Transcription of this line.

+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+imagename
+

Path to the image associated with the line.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+split
+

Defines whether this line is in the train, validation, or +test set during training.

+
+ +
+
+regions
+

A list of identifiers of regions the line is associated with.

+
+ +
+
+base_dir: Literal['L', 'R'] | None = None
+
+ +
+
+baseline: List[Tuple[int, int]]
+
+ +
+
+boundary: List[Tuple[int, int]]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+regions: List[str] | None = None
+
+ +
+
+split: Literal['train', 'validation', 'test'] | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+
+text: str | None = None
+
+ +
+
+type: str = 'baselines'
+
+ +
+ +
+
+class kraken.containers.BBoxLine
+

Bounding box-type line record.

+

A container class for a single line in axis-aligned bounding box format, +optionally containing a transcription, tags, or associated regions.

+
+
+id
+

Unique identifier

+
+ +
+
+bbox
+

Tuple in form (xmin, ymin, xmax, ymax) defining +the bounding box.

+
+ +
+
+text
+

Transcription of this line.

+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+imagename
+

Path to the image associated with the line.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+split
+

Defines whether this line is in the train, validation, or +test set during training.

+
+ +
+
+regions
+

A list of identifiers of regions the line is associated with.

+
+ +
+
+text_direction
+

Sets the principal orientation (of the line) and +reading direction (of the document).

+
+ +
+
+base_dir: Literal['L', 'R'] | None = None
+
+ +
+
+bbox: Tuple[int, int, int, int]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+regions: List[str] | None = None
+
+ +
+
+split: Literal['train', 'validation', 'test'] | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+
+text: str | None = None
+
+ +
+
+text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl'] = 'horizontal-lr'
+
+ +
+
+type: str = 'bbox'
+
+ +
+ +
+
+class kraken.containers.ocr_record(prediction, cuts, confidences, display_order=True)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Union[Tuple[int, int], Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]]])

  • +
  • confidences (List[float])

  • +
  • display_order (bool)

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+property confidences: List[float]
+
+
Return type:
+

List[float]

+
+
+
+ +
+
+property cuts: List
+
+
Return type:
+

List

+
+
+
+ +
+
+abstract display_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+abstract logical_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+property prediction: str
+
+
Return type:
+

str

+
+
+
+ +
+
+abstract property type
+
+ +
+ +
+
+class kraken.containers.BaselineOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)
+

A record object containing the recognition result of a single line in +baseline format.

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Tuple[int, int]])

  • +
  • confidences (List[float])

  • +
  • line (Union[BaselineLine, Dict[str, Any]])

  • +
  • base_dir (Optional[Literal['L', 'R']])

  • +
  • display_order (bool)

  • +
+
+
+
+
+type
+

‘baselines’ to indicate a baseline record

+
+ +
+
+prediction
+

The text predicted by the network as one continuous string.

+
+
Return type:
+

str

+
+
+
+ +
+
+cuts
+

The absolute bounding polygons for each code point in prediction +as a list of tuples [(x0, y0), (x1, y2), …].

+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+confidences
+

A list of floats indicating the confidence value of each +code point.

+
+
Return type:
+

List[float]

+
+
+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+display_order
+

Flag indicating the order of the code points in the +prediction. In display order (True) the n-th code +point in the string corresponds to the n-th leftmost +code point, in logical order (False) the n-th code +point corresponds to the n-th read code point. See [UAX +#9](https://unicode.org/reports/tr9) for more details.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']])

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +

Notes

+

When slicing the record the behavior of the cuts is changed from +earlier versions of kraken. Instead of returning per-character bounding +polygons a single polygons section of the line bounding polygon +starting at the first and extending to the last code point emitted by +the network is returned. This aids numerical stability when computing +aggregated bounding polygons such as for words. Individual code point +bounding polygons are still accessible through the cuts attribute or +by iterating over the record code point by code point.

+
+
+base_dir
+
+ +
+
+property cuts: List[Tuple[int, int]]
+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+display_order(base_dir=None)
+

Returns the OCR record in Unicode display order, i.e. ordered from left +to right inside the line.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +
+
+logical_order(base_dir=None)
+

Returns the OCR record in Unicode logical order, i.e. in the order the +characters in the line would be read by a human.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +
+
+type = 'baselines'
+
+ +
+ +
+
+class kraken.containers.BBoxOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)
+

A record object containing the recognition result of a single line in +bbox format.

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]])

  • +
  • confidences (List[float])

  • +
  • line (Union[BBoxLine, Dict[str, Any]])

  • +
  • base_dir (Optional[Literal['L', 'R']])

  • +
  • display_order (bool)

  • +
+
+
+
+
+type
+

‘bbox’ to indicate a bounding box record

+
+ +
+
+prediction
+

The text predicted by the network as one continuous string.

+
+
Return type:
+

str

+
+
+
+ +
+
+cuts
+

The absolute bounding polygons for each code point in prediction +as a list of 4-tuples ((x0, y0), (x1, y0), (x1, y1), (x0, y1)).

+
+
Return type:
+

List

+
+
+
+ +
+
+confidences
+

A list of floats indicating the confidence value of each +code point.

+
+
Return type:
+

List[float]

+
+
+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+display_order
+

Flag indicating the order of the code points in the +prediction. In display order (True) the n-th code +point in the string corresponds to the n-th leftmost +code point, in logical order (False) the n-th code +point corresponds to the n-th read code point. See [UAX +#9](https://unicode.org/reports/tr9) for more details.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']])

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +

Notes

+

When slicing the record the behavior of the cuts is changed from +earlier versions of kraken. Instead of returning per-character bounding +polygons a single polygons section of the line bounding polygon +starting at the first and extending to the last code point emitted by +the network is returned. This aids numerical stability when computing +aggregated bounding polygons such as for words. Individual code point +bounding polygons are still accessible through the cuts attribute or +by iterating over the record code point by code point.

+
+
+base_dir
+
+ +
+
+display_order(base_dir=None)
+

Returns the OCR record in Unicode display order, i.e. ordered from left +to right inside the line.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +
+
+logical_order(base_dir=None)
+

Returns the OCR record in Unicode logical order, i.e. in the order the +characters in the line would be read by a human.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +
+
+type = 'bbox'
+
+ +
+ +
+
+class kraken.containers.ProcessingStep
+

A processing step in the recognition pipeline.

+
+
+id
+

Unique identifier

+
+ +
+
+category
+

Category of processing step that has been performed.

+
+ +
+
+description
+

Natural-language description of the process.

+
+ +
+
+settings
+

Dict describing the parameters of the processing step.

+
+ +
+
+category: Literal['preprocessing', 'processing', 'postprocessing']
+
+ +
+
+description: str
+
+ +
+
+id: str
+
+ +
+
+settings: Dict[str, Dict | str | float | int | bool]
+
+ +
+ +
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates back the network output to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, prob). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates back the network output to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates the network output back into a label sequence, as in the original ocropy/clstm.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+
+ +
+
+message
+
+ +
+
+width
+
+ +
+ +
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing the sequence lengths of the input batch.

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads any valid ocropus or kraken model file and instantiates a kraken.lib.models.TorchSeqRecognizer from the configuration contained in it.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing VGSL segmentation and recognition +networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (Union[os.PathLike, str]) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
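As a quick illustration of the recognizer API, the following minimal sketch loads a model with load_any and decodes a dummy line tensor with predict_string. The model path and tensor dimensions are assumptions; the input has to match the height and channel count of the loaded model's input definition.

import torch
from kraken.lib import models

# Hypothetical model file; load_any returns a TorchSeqRecognizer.
rec = models.load_any('model.mlmodel', device='cpu')

# Dummy batch of one line image in NCHW order. Channel count and height
# must match the model's input spec (often 1 channel and height 48).
line = torch.rand(1, 1, 48, 800)

# Decoded text, one string per line in the batch.
print(rec.predict_string(line))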
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (Literal['lr', 'rl'])

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
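A minimal sketch of computing the partial reading order for three bounding boxes given as (vertical slice, horizontal slice) pairs; the coordinates are invented for illustration.

from kraken.lib.segmentation import reading_order

# Three line bounding boxes as (row slice, column slice) pairs.
lines = [(slice(10, 40), slice(5, 500)),
         (slice(50, 80), slice(5, 450)),
         (slice(50, 80), slice(470, 900))]

order = reading_order(lines, text_direction='lr')
# order[i, j] is non-zero if line i comes before line j in reading order.
print(order.astype(int))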
+
+kraken.lib.segmentation.neural_reading_order(lines, text_direction='lr', regions=None, im_size=None, model=None, class_mapping=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.

  • +
  • model (kraken.lib.vgsl.TorchVGSLModel) – torch Module for

  • +
  • text_direction (str)

  • +
  • regions (Optional[Sequence[shapely.geometry.Polygon]])

  • +
  • im_size (Tuple[int, int])

  • +
  • class_mapping (Dict[str, int])

  • +
+
+
Returns:
+

The indices of the ordered input.

+
+
Return type:
+

Sequence[int]

+
+
+
+ +
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.

  • +
  • regions (Optional[Sequence[shapely.geometry.Polygon]]) – List of region polygons.

  • +
  • text_direction (Literal['lr', 'rl']) – Set principal text direction for column ordering. Can +be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

The indices of the ordered input.

+
+
Return type:
+

Sequence[int]

+
+
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5, text_direction='horizontal')
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
  • text_direction (str) – Base orientation of the text line (horizontal or +vertical).

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False, raise_on_error=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing a single baseline per entry.

  • +
  • suppl_obj (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.

  • +
  • im_feats (numpy.ndarray) – An optional precomputed seamcarve energy map. Overrides data +in im. The default map is gaussian_filter(sobel(im), 2).

  • +
  • scale (Tuple[int, int]) – A 2-tuple (h, w) containing optional scale factors of the input. +Values of 0 are used for aspect-preserving scaling. None skips +input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are assumed +to be on the bottom of the text line and will be offset +upwards, if set to True, baselines are on the top and will be +offset downwards. If set to None, no offset will be applied.

  • +
  • raise_on_error (bool) – Raises errors instead of logging them when they are non-blocking.

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

+
+
+
+ +
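A sketch of typical usage on a grayscale page image; the file name and baseline coordinates are placeholders.

from PIL import Image
from kraken.lib.segmentation import calculate_polygonal_environment

im = Image.open('page.png').convert('L')   # grayscale input image (placeholder path)
baselines = [[(100, 240), (1200, 245)],    # one baseline per entry as (x, y) points
             [(100, 320), (1180, 330)]]

polygons = calculate_polygonal_environment(im, baselines)
# polygons[i] is the bounding polygon computed for baselines[i],
# or None if polygonization failed for that baseline.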
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[List, List]]) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (Union[float, Tuple[float, float]]) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines – List of tuples containing the baseline and its polygonization.

  • +
  • scale (Union[float, Tuple[float, float]]) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, a polygonal boundary, and two points on the baseline, return the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (Sequence[Tuple[int, int]]) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (Sequence[Tuple[int, int]]) – A bounding polygon around the baseline (same format as +baseline). Last and first point are automatically connected.

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

Tuple[Tuple[int, int]]

+
+
+
+ +
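For example, cutting the rectangle between two points along a baseline out of its bounding polygon (coordinates invented for illustration):

from kraken.lib.segmentation import compute_polygon_section

baseline = [(10, 60), (400, 60)]
boundary = [(5, 20), (410, 20), (410, 100), (5, 100)]

# Polygon section covering the baseline between 50 and 150 pixels
# measured along the baseline.
section = compute_polygon_section(baseline, boundary, 50, 150)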
+
+kraken.lib.segmentation.extract_polygons(im, bounds, legacy=False)
+

Yields the subimages of image im defined in the list of bounding polygons +with baselines preserving order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (kraken.containers.Segmentation) – A Segmentation class containing a bounding box or baseline +segmentation.

  • +
  • legacy (bool) – Use the old, slow, and deprecated path

  • +
+
+
Yields:
+

The extracted subimage, and the corresponding bounding box or baseline

+
+
Return type:
+

Generator[Tuple[PIL.Image.Image, Union[kraken.containers.BBoxLine, kraken.containers.BaselineLine]], None, None]

+
+
+
+ +
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VGSL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel +vector at each step along its time axis, i.e. either put the non-time-axis +dimension into the channels dimension or use a summarizing RNN squashing +the time axis to 1 and putting the output into the channels dimension +respectively.

+
+
Parameters:
+

spec (str)

+
+
+
+
+input
+

Expected input tensor as a 4-tuple.

+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and appends the layers defined in spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+property aux_layers
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx, target_output_shape=None)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx, target_output_shape=None)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx, target_output_shape=None)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx, target_output_shape=None)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx, target_output_shape=None)
+

Builds a reshape layer

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx, target_output_shape=None)
+

Builds an LSTM/GRU layer returning number of outputs and layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_ro(input, blocks, idx)
+

Builds a RO determination layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx, target_output_shape=None)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_wav2vec2(input, blocks, idx, target_output_shape=None)
+

Builds a Wav2Vec2 masking layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, os.PathLike]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException – if the model data is invalid (not a string, protobuf file, or without appropriate metadata).

  • FileNotFoundError – if the path doesn't point to a file.
+
+
+
+ +
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+property use_legacy_polygons
+
+ +
+
+user_metadata: Dict[str, Any]
+
+ +
+ +
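A minimal sketch of building a network from a spec and deserializing an existing one. The spec is purely illustrative (the default recognition body with a 57-class CTC output layer appended) and the CoreML path is a placeholder.

from kraken.lib.vgsl import TorchVGSLModel

# Construct a network from a VGSL spec and initialize its weights.
net = TorchVGSLModel('[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c57]')
net.init_weights()
print(net.input)           # expected input geometry as a 4-tuple (N, C, H, W)

# Deserialize an existing model from a CoreML file and switch it to inference mode.
old = TorchVGSLModel.load_model('model.mlmodel')
old.eval()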
+
+

kraken.lib.xml module

+
+
+class kraken.lib.xml.XMLPage(filename, filetype='xml')
+
+
Parameters:
+
    +
  • filename (Union[str, os.PathLike])

  • +
  • filetype (Literal['xml', 'alto', 'page'])

  • +
+
+
+
+ +
+
+
+

Training

+
+

kraken.lib.train module

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, freeze_backbone=-1, pl_logger=None, log_dir=None, *args, **kwargs)
+
+
Parameters:
+
    +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
  • min_epochs (int)

  • +
  • max_epochs (int)

  • +
  • pl_logger (Union[pytorch_lightning.loggers.logger.Logger, str, None])

  • +
  • log_dir (Optional[os.PathLike])

  • +
+
+
+
+
+automatic_optimization = False
+
+ +
+
+fit(*args, **kwargs)
+
+ +
+ +
+
+

kraken.lib.dataset module

+
+

Recognition datasets

+
+
+class kraken.lib.dataset.ArrowIPCRecognitionDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, split_filter=None)
+

Dataset for training a recognition model from a precompiled dataset in +Arrow IPC format.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, Literal['L', 'R']])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • split_filter (Optional[str])

  • +
+
+
+
+
+add(file)
+

Adds an Arrow IPC file to the dataset.

+
+
Parameters:
+

file (Union[str, os.PathLike]) – Location of the precompiled dataset file.

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+arrow_table = None
+
+ +
+
+aug = None
+
+ +
+
+codec = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+im_mode
+
+ +
+
+legacy_polygons_status = None
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+rebuild_alphabet()
+

Recomputes the alphabet depending on the given text transformation.

+
+ +
+
+seg_type = None
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
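A rough sketch of loading a precompiled Arrow dataset, e.g. the output of ketos compile; the file name is a placeholder and the transforms mirror the default recognition input definition.

from kraken.lib.dataset import ArrowIPCRecognitionDataset, ImageInputTransforms

ds = ArrowIPCRecognitionDataset(im_transforms=ImageInputTransforms(1, 48, 0, 1, 0))
ds.add('dataset.arrow')   # precompiled binary dataset file
ds.encode()               # without an explicit codec one is derived from the dataset's alphabet
print(len(ds))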
+
+class kraken.lib.dataset.BaselineSet(line_width=4, padding=(0, 0, 0, 0), im_transforms=transforms.Compose([]), augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • line_width (int)

  • +
  • padding (Tuple[int, int, int, int])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(doc)
+

Adds a page to the dataset.

+
+
Parameters:
+

doc (kraken.containers.Segmentation) – A Segmentation container class.

+
+
+
+ +
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs = []
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+pad
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
+
+class kraken.lib.dataset.GroundTruthDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(line=None, page=None)
+

Adds an individual line or all lines on a page to the dataset.

+
+
Parameters:
+
+
+
+
+ +
+
+add_line(line)
+

Adds a line to the dataset.

+
+
Parameters:
+

line (kraken.containers.BBoxLine) – BBoxLine container object for a line.

+
+
Raises:
+
    +
  • ValueError – if the transcription of the line is empty after transformation, or if either baseline or bounding polygon are missing.
+
+
+
+ +
+
+add_page(page)
+

Adds all lines on a page to the dataset.

+

Invalid lines will be skipped and a warning will be printed.

+
+
Parameters:
+

page (kraken.containers.Segmentation) – Segmentation container object for a page.

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+

Segmentation datasets

+
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, legacy_polygons=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, Literal['L', 'R']])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • legacy_polygons (bool)

  • +
+
+
+
+
+add(line=None, page=None)
+

Adds an individual line or all lines on a page to the dataset.

+
+
Parameters:
+
+
+
+
+ +
+
+add_line(line)
+

Adds a line to the dataset.

+
+
Parameters:
+

line (kraken.containers.BaselineLine) – BaselineLine container object for a line.

+
+
Raises:
+
    +
  • ValueError – if the transcription of the line is empty after transformation, or if either baseline or bounding polygon are missing.
+
+
+
+ +
+
+add_page(page)
+

Adds all lines on a page to the dataset.

+

Invalid lines will be skipped and a warning will be printed.

+
+
Parameters:
+

page (kraken.containers.Segmentation) – Segmentation container object for a page.

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+legacy_polygons
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+

Reading order datasets

+
+
+class kraken.lib.dataset.PairWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)
+

Dataset for training a reading order determination model.

+

Returns random pairs of lines from the same page.

+
+
Parameters:
+
    +
  • files (Sequence[Union[os.PathLike, str]])

  • +
  • mode (Optional[Literal['alto', 'page', 'xml']])

  • +
  • level (Literal['regions', 'baselines'])

  • +
  • ro_id (Optional[str])

  • +
  • class_mapping (Optional[Dict[str, int]])

  • +
+
+
+
+
+data = []
+
+ +
+
+failed_samples = []
+
+ +
+
+get_feature_dim()
+
+ +
+ +
+
+class kraken.lib.dataset.PageWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)
+

Dataset for training a reading order determination model.

+

Returns all lines from the same page.

+
+
Parameters:
+
    +
  • files (Sequence[Union[os.PathLike, str]])

  • +
  • mode (Optional[Literal['alto', 'page', 'xml']])

  • +
  • level (Literal['regions', 'baselines'])

  • +
  • ro_id (Optional[str])

  • +
  • class_mapping (Optional[Dict[str, int]])

  • +
+
+
+
+
+data = []
+
+ +
+
+failed_samples = []
+
+ +
+
+get_feature_dim()
+
+ +
+ +
+
+

Helpers

+
+
+class kraken.lib.dataset.ImageInputTransforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)
+
+
Parameters:
+
    +
  • batch (int)

  • +
  • height (int)

  • +
  • width (int)

  • +
  • channels (int)

  • +
  • pad (Union[int, Tuple[int, int], Tuple[int, int, int, int]])

  • +
  • valid_norm (bool)

  • +
  • force_binarization (bool)

  • +
+
+
+
+
+property batch: int
+

Batch size attribute. Ignored.

+
+
Return type:
+

int

+
+
+
+ +
+
+property centerline_norm: bool
+

Attribute indicating if centerline normalization will be applied to +input images.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property channels: int
+

Channels attribute. Can be either 1 (binary/grayscale), 3 (RGB).

+
+
Return type:
+

int

+
+
+
+ +
+
+property force_binarization: bool
+

Switch enabling/disabling forced binarization.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property height: int
+

Desired output image height. If set to 0, image will be rescaled +proportionally with width, if 1 and channels is larger than 3 output +will be grayscale and of the height set with the channels attribute.

+
+
Return type:
+

int

+
+
+
+ +
+
+property mode: str
+

Imaginary PIL.Image.Image mode of the output tensor. Possible values +are RGB, L, and 1.

+
+
Return type:
+

str

+
+
+
+ +
+
+property pad: int
+

Amount of padding around left/right end of image.

+
+
Return type:
+

int

+
+
+
+ +
+
+property scale: Tuple[int, int]
+

Desired output shape (height, width) of the image. If any value is set +to 0, image will be rescaled proportionally with height, width, if 1 +and channels is larger than 3 output will be grayscale and of the +height set with the channels attribute.

+
+
Return type:
+

Tuple[int, int]

+
+
+
+ +
+
+property valid_norm: bool
+

Switch allowing/disallowing centerline normalization. Even if enabled +won’t be applied to 3-channel images.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property width: int
+

Desired output image width. If set to 0, image will be rescaled +proportionally with height.

+
+
Return type:
+

int

+
+
+
+ +
+ +
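For instance, transforms matching the default recognition input definition [1,48,0,1 ...] can be set up as below (a sketch; the instance is intended to be passed as im_transforms to the dataset classes above).

from kraken.lib.dataset import ImageInputTransforms

# batch 1, height 48, variable width (0), 1 channel, no padding
xforms = ImageInputTransforms(1, 48, 0, 1, 0)
print(xforms.mode, xforms.scale, xforms.centerline_norm)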
+
+kraken.lib.dataset.collate_sequences(batch)
+

Sorts and pads sequences.

+
+ +
+
+kraken.lib.dataset.global_align(seq1, seq2)
+

Computes a global alignment of two strings.

+
+
Parameters:
+
    +
  • seq1 (Sequence[Any])

  • +
  • seq2 (Sequence[Any])

  • +
+
+
Return type:
+

Tuple[int, List[str], List[str]]

+
+
+

Returns a tuple (distance, list(algn1), list(algn2))

+
+ +
+
+kraken.lib.dataset.compute_confusions(algn1, algn2)
+

Compute confusion matrices from two globally aligned strings.

+
+
Parameters:
+
    +
  • algn1 (Sequence[str]) – globally aligned sequence 1

  • algn2 (Sequence[str]) – globally aligned sequence 2
+
+
Returns:
+

A tuple (counts, scripts, ins, dels, subs) with counts being per-character +confusions, scripts per-script counts, ins a dict with per script +insertions, del an integer of the number of deletions, subs per +script substitutions.

+
+
+
+ +
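A small sketch combining both helpers to align a ground truth string with a prediction and tally the confusions:

from kraken.lib.dataset import global_align, compute_confusions

dist, algn_gt, algn_pred = global_align('kraken', 'krahen')
counts, scripts, ins, dels, subs = compute_confusions(algn_gt, algn_pred)
print(dist, counts)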
+
+
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren’t further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
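Usage is a one-liner; a minimal sketch with placeholder file names:

from PIL import Image
from kraken.binarization import nlbin

bw = nlbin(Image.open('page.jpg'))   # returns a bitonal PIL.Image.Image
bw.save('page_bw.png')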
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None, records=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im (PIL.Image) – Input image

  • +
  • segmentation (dict) – Output of the segment method.

  • +
  • records (list) – A list of ocr_record objects.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[Dict[Any, Any]] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode=’rb’) to write to.

+
+
+
+ +
+ +
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/genindex.html b/5.0.0/genindex.html new file mode 100644 index 000000000..5c01e339e --- /dev/null +++ b/5.0.0/genindex.html @@ -0,0 +1,914 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+ + + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/gpu.html b/5.0.0/gpu.html new file mode 100644 index 000000000..ff9de72b5 --- /dev/null +++ b/5.0.0/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/index.html b/5.0.0/index.html new file mode 100644 index 000000000..ca327e2cd --- /dev/null +++ b/5.0.0/index.html @@ -0,0 +1,1040 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through the on-board pip utility and the anaconda scientific computing environment is supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support, it is necessary to install the pdf extras package from PyPI:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a model to do the actual recognition of +characters. To download the default model for printed French text and place it +in the kraken directory for the current user:

+
$ kraken get 10.5281/zenodo.10592716
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.10592716
+name: 10.5281/zenodo.10592716
+
+CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages
+
+<p><strong>CATMuS-Print (Large) - Diachronic model for French prints and other West European languages</strong></p>
+<p>CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, dealing with different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian&hellip;) and different centuries (from the first prints of the 16th c. to digital documents of the 21st century).</p>
+<p>Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligature (except those that still exist), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies might be present, because transcriptions have been done over several years and the norms have slightly evolved.</p>
+<p>The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.</p>
+<p>This model is the result of the collaboration from researchers from the University of Geneva and Inria Paris and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.</p>
+scripts: Latn
+alphabet: !"#$%&'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz|}~¡£¥§«¬°¶·»¿ÆßæđłŒœƀǝɇΑΒΓΔΕΖΘΙΚΛΜΝΟΠΡΣΤΥΦΧΩαβγδεζηθικλμνξοπρςστυφχωϛחלרᑕᗅᗞᚠẞ–—‘’‚“”„‟†•⁄⁊⁋℟←▽◊★☙✠✺✻⟦⟧⬪ꝑꝓꝗꝙꝟꝯꝵ SPACE, COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING DOT ABOVE, COMBINING DIAERESIS, COMBINING RING ABOVE, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING CEDILLA, COMBINING OGONEK, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER U, 0xe682, 0xe68b, 0xe8bf, 0xf1a7
+accuracy: 98.56%
+license: cc-by-4.0
+author(s): Gabay, Simon; Clérice, Thibault
+date: 2024-01-30
+
+
+
+
+
+

Quickstart

+

An OCR system consists of multiple steps, primarily preprocessing, segmentation, and recognition, each of which takes the output of the previous step and sometimes additional files such as models and templates that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or run separately:

[Flowchart: an input Image is fed into Segmentation (using a Segmentation Model), producing Baselines, Regions, and Order; Recognition (using a Recognition Model) turns these into OCR Records; Serialization combines the records with an Output Template into the Output File.]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the previously downloaded model:

+
$ kraken -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded from +the European Union’s Horizon 2020 Framework Programme for Research and +Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la +Recherche au titre du Programme d’Investissements d’Avenir portant la référence +ANR-21-ESRE-0005 (Biblissima+).

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/ketos.html b/5.0.0/ketos.html new file mode 100644 index 000000000..aefd2707d --- /dev/null +++ b/5.0.0/ketos.html @@ -0,0 +1,950 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

There are currently three trainable components in the kraken processing pipeline:

  • Segmentation: finding lines and regions in images

  • Reading Order: ordering lines found in the previous segmentation step. Reading order models are closely linked to segmentation models and both are usually trained on the same dataset.

  • Recognition: recognition models transform images of lines into text.

+

Depending on the use case it is not necessary to manually train new models for +each material. The default segmentation model works well on quite a variety of +handwritten and printed documents, a reading order model might not perform +better than the default heuristic for simple text flows, and there are +recognition models for some types of material available in the repository.

+
+

Best practices

+
+

Recognition model training

+
    +
  • The default architecture works well for decently sized datasets.

  • +
  • Use precompiled binary datasets and put them in a place where they can be memory mapped during training (local storage, not NFS or similar).

  • +
  • Use the --logger flag to track your training metrics across experiments using Tensorboard.

  • +
  • If the network doesn’t converge before the early stopping aborts training, increase --min-epochs or --lag. Use the --logger option to inspect your training loss.

  • +
  • Use the flag --augment to activate data augmentation.

  • +
  • Increase the amount of --workers to speedup data loading. This is essential when you use the --augment option.

  • +
  • When using an Nvidia GPU, set the --precision option to 16 to use automatic mixed precision (AMP). This can provide significant speedup without any loss in accuracy.

  • +
  • Use option -B to scale batch size until GPU utilization reaches 100%. When using a larger batch size, it is recommended to use option -r to scale the learning rate by the square root of the batch size (1e-3 * sqrt(batch_size)).

  • +
  • When fine-tuning, it is recommended to use new mode not union as the network will rapidly unlearn missing labels in the new dataset.

  • +
  • If the new dataset is fairly dissimilar or your base model has been pretrained with ketos pretrain, use --warmup in conjunction with --freeze-backbone for 1 or 2 epochs.

  • +
  • Upload your models to the model repository.

  • +
+
+
+

Segmentation model training

+
    +
  • The segmenter is fairly robust when it comes to hyperparameter choice.

  • +
  • Start by finetuning from the default model for a fixed number of epochs (50 for reasonably sized datasets) with a cosine schedule.

  • +
  • Segmentation models’ performance is difficult to evaluate. Pixel accuracy doesn’t mean much because there are many more pixels that aren’t part of a line or region than just background. Frequency-weighted IoU is good for overall performance, while mean IoU overrepresents rare classes. The best way to evaluate segmentation models is to look at the output on unlabelled data.

  • +
  • If you don’t have rare classes you can use a fairly small validation set to make sure everything is converging and just visually validate on unlabelled data.

  • +
+
+
+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low-level format, the XML-based formats that are commonly used for archival of annotation and transcription data, and in the case of recognizer training a precompiled binary format. It is recommended to use the XML formats for segmentation and reading order training and the binary format for recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.3. An example showing the +attributes necessary for segmentation, recognition, and reading order training +follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523" 
+						   WIDTH="5234" 
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..." 
+							  WIDTH="..." 
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K" 
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +from a variety of tools. As with ALTO, PAGE XML files can be used to train +segmentation, reading order, and recognition models.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a binary dataset format offering a couple of advantages is supported for recognition training. Binary datasets drastically improve loading performance, allowing the saturation of most GPUs with minimal computational overhead, while also allowing training with datasets that are larger than the system's main memory. A minor drawback is a ~30% increase in dataset size in comparison to the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of individual files containing many lines this process can take a long time. It can easily be parallelized by specifying the number of separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The above command assigns 80% of the source lines to the training set, 10% to the validation set, and 10% to the test set. The training and validation sets in the dataset file are used automatically by ketos train (unless told otherwise) while the remaining 10% test set is selected by ketos test.

+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its most important command line options:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-o, --output

Output model file prefix. Defaults to model.

-s, --spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, --append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, --load

Load existing file to continue training

-F, --savefreq

Model save frequency in epochs during +training

-q, --quit

Stop condition for training. Set to early +for early stopping (default) or fixed for fixed +number of epochs.

-N, --epochs

Number of epochs to train for.

--min-epochs

Minimum number of epochs to train for when using early stopping.

--lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

-d, --device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

--optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, --lrate

Learning rate [default: 0.001]

-m, --momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, --weight-decay

Weight decay.

--schedule

Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or +reduceonplateau. For 1cycle the cycle length is determined by the –epoch option.

-p, --partition

Ground truth data partition ratio between train/validation set

-u, --normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, --codec

Load a codec JSON definition (invalid if loading existing model)

--resize

Codec/output layer resizing option. If set +to union code points will be added, new +will set the layer to match exactly the +training data, fail will abort if training +data and model codec do not match. Only valid when refining an existing model.

-n, --reorder / --no-reorder

Reordering of code points to display order.

-t, --training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, --evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

-f, --format-type

Sets the training and evaluation data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

--augment / --no-augment

Enables/disables data augmentation.

--workers

Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.

+

In some cases changing the network architecture might be useful. One such +example would be material that is not well recognized in the grayscale domain, +as the default architecture definition converts images into grayscale. The +input definition can be changed quite easily to train on color data (RGB) instead:

+
$ ketos train -f page -s '[1,120,0,3 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do]]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the lag can be useful:

+
$ ketos train --lag 10 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, union and new. +union resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. new +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize union -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize new -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In new mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different, the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The resulting model will behave exactly like one trained from scratch, except that it will potentially train a lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many scripts, characters with diacritics can be encoded either as a single code point or as a base character followed by the combining diacritic; different types of whitespace exist; and mixed bidirectional text can be written differently depending on the base line direction.
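To illustrate the first point, the following short snippet (Python standard library only, not part of ketos) shows how the same visible character can be represented by different code point sequences and how normalization reconciles them:

import unicodedata

# 'é' as a single precomposed code point vs. 'e' followed by a combining acute accent
precomposed = "\u00e9"
decomposed = "e\u0301"

print(precomposed == decomposed)                                # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: composed form
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True: decomposed form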

+

Ketos provides options to normalize input into consistent forms that make processing of data from multiple sources possible. Principally, two options are available: one for Unicode normalization and one for whitespace normalization. The Unicode normalization switch (disabled per default) allows one to select one of the four normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+

Whitespace normalization is enabled per default and converts all Unicode whitespace characters into a simple space. It is highly recommended to leave this function enabled, as the variation of space width, resulting either from text justification or the irregularity of handwriting, is difficult for a recognition model to accurately model and map onto the different space code points. Nevertheless, it can be disabled with:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further, the behavior of the BiDi algorithm can be influenced through two options. The configuration of the algorithm is important as the recognition network is trained to output characters (or rather labels which are mapped to code points by a codec) in the order a line is fed into the network, i.e. left-to-right, also called display order. Unicode text is encoded as a stream of code points in logical order, i.e. the order the characters in a line are read in by a human reader, for example (mostly) right-to-left for a text in Hebrew. The BiDi algorithm resolves this logical order to the display order expected by the network and vice versa. The primary parameter of the algorithm is the base direction, which is just the default direction of the input fields used when the ground truth was initially transcribed. The base direction will be determined automatically by kraken when using PAGE XML or ALTO files that contain it; otherwise it has to be supplied if it differs from the default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
+
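The distinction between logical and display order can be illustrated with the python-bidi package that kraken relies on for this reordering; the snippet below is purely illustrative and not what ketos calls internally:

from bidi.algorithm import get_display

# Hebrew mixed with Latin, stored in logical order (the order a reader reads it)
logical = "שלום world"

# Convert to display order with a right-to-left base direction, i.e. the
# left-to-right visual sequence that the recognition network actually sees.
display = get_display(logical, base_dir='R')
print(display)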

It is also possible to disable BiDi processing completely, e.g. when the text has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the labels decoded from the raw network output and Unicode code points (see this diagram for the precise steps involved in text line recognition). Codecs are attached to a recognition model and are usually defined once at initial training time, although they can be adapted either explicitly (with the API) or implicitly through domain adaptation.
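A minimal sketch of explicit adaptation through the API, assuming that PytorchCodec’s constructor and add_labels accept a plain string alphabet (one label per character); consult the API reference for the authoritative signatures:

from kraken.lib.codec import PytorchCodec

# build a codec mapping each character to a single integer label
codec = PytorchCodec('abcdefghijklmnopqrstuvwxyz ')
labels = codec.encode('hello world')   # integer label tensor for the line text

# extend the codec with additional characters without disturbing existing labels
extended = codec.add_labels('äöü')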

+

The default behavior of kraken is to auto-infer this mapping from all the characters in the training set and map each code point to one separate label. This is usually sufficient for alphabetic scripts, abjads, and abugidas apart from very specialised use cases. Logographic writing systems with a very large number of different graphemes, such as all the variants of Han characters or Cuneiform, can be more problematic as their large inventory makes recognition both slow and error-prone. In such cases it can be advantageous to decompose each code point into multiple labels to reduce the output dimensionality of the network. During decoding valid sequences of labels will be mapped to their respective code points as usual.

+

There are multiple approaches one could follow when constructing a custom codec: randomized block codes, i.e. producing random fixed-length labels for each code point; Huffman coding, i.e. variable-length label sequences depending on the frequency of each code point in some text (not necessarily the training set); or structural decomposition, i.e. describing each code point through a sequence of labels that describe the shape of the grapheme, similar to how some input systems for Chinese characters function.
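As a purely illustrative sketch of the first approach, the following snippet produces a randomized fixed-length block code for a small character inventory and writes it in the JSON format accepted by ketos; inventory size, label alphabet, and block length are arbitrary example values:

import json
import random

chars = ['人', '山', '水', '火']   # example character inventory
num_labels = 16                    # size of the reduced label alphabet
block_len = 3                      # fixed length of each label sequence

random.seed(0)
codec, seen = {}, set()
for c in chars:
    while True:
        block = tuple(random.randrange(1, num_labels + 1) for _ in range(block_len))
        if block not in seen:      # every code point needs a unique label sequence
            seen.add(block)
            codec[c] = list(block)
            break

with open('sample.codec', 'w') as fp:
    json.dump(codec, fp, ensure_ascii=False)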

+

While the system is functional, it is not well tested in practice and it is unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
+
+
+

Unsupervised recognition pretraining

+

Text recognition models can be pretrained in an unsupervised fashion from text line images, both in bounding box and baseline format. The pretraining is performed through a contrastive surrogate task aiming to distinguish in-painted parts of the input image features from randomly sampled distractor slices.

+

All data sources accepted by the supervised trainer are valid for pretraining, but for performance reasons it is recommended to use pre-compiled binary datasets. One thing to keep in mind is that compilation filters out empty (non-transcribed) text lines per default, which is undesirable for pretraining. With the --keep-empty-lines option all valid lines will be written to the dataset file:

+
$ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml
+
+
+

The basic pretraining call is very similar to a training one:

+
$ ketos pretrain -f binary foo.arrow
+
+
+

There are a couple of hyperparameters that are specific to pretraining: the mask width (at the subsampling level of the last convolutional layer), the probability of a particular position being the start position of a mask, and the number of negative distractor samples.

+
$ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow
+
+
+

Once a model has been pretrained, it has to be adapted to perform actual recognition with a standard labelled dataset, although training data requirements will usually be much reduced:

+
$ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow
+
+
+

It is necessary to use learning rate warmup (warmup) for at least a couple of epochs in addition to freezing the backbone (all but the last fully connected layer performing the classification) to have the model converge during fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less stable than training from scratch or fine-tuning an existing model. As such, it can be necessary to run a couple of trials with different hyperparameters (principally the learning rate) to find workable ones. It is entirely possible that pretrained models do not converge at all even with reasonable hyperparameter configurations.

+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+
+
+

This takes all text lines and regions encoded in the XML files and trains a model to recognize them.

+

Most other options available in transcription training are also available in segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize new -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition, there are a couple of specific options that allow filtering of baseline and region types. Datasets are often annotated to a level that is too detailed or contains undesirable types, e.g. when combining segmentation data from different sources. The most basic option is the suppression of all of either baseline or region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options are combinable to massage the dataset into any typology you want. Tags containing the separator character : can be specified by escaping them with a backslash.

+

Then there are some options that set metadata fields controlling the postprocessing. When computing the bounding polygons, the recognized baselines are offset slightly to ensure overlap with the line corpus. This offset is upwards per default for baselines, but as it is possible to annotate toplines (for scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese), the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that act as absolute boundaries for text line content. When these regions are marked as such, the polygonization can sometimes be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Reading order training

+

Reading order models work slightly differently from segmentation and recognition models. They are closely linked to the typology used in the dataset they were trained on, as they use type information on lines and regions to make ordering decisions. As the same typology was probably used to train a specific segmentation model, reading order models are trained separately but bundled with their segmentation model in a subsequent step. The general sequence is therefore:

+
$ ketos segtrain -o fr_manu_seg.mlmodel -f xml french/*.xml
+...
+$ ketos rotrain -o fr_manu_ro.mlmodel -f xml french/*.xml
+...
+$ ketos roadd -o fr_manu_seg_with_ro.mlmodel -i fr_manu_seg_best.mlmodel  -r fr_manu_ro_best.mlmodel
+
+
+

Only the fr_manu_seg_with_ro.mlmodel file will contain the trained reading order model. Segmentation models can exist with or without reading order models. If one is added, the neural reading order will be computed in addition to the one produced by the default heuristic during segmentation and serialized in the final XML output (in ALTO/PAGE XML).

+
+

Note

+

Reading order models work purely on the typology and geometric features of the lines and regions. They construct an approximate ordering matrix by feeding feature vectors of two lines (or regions) into the network to decide which of those two lines precedes the other.

+

These feature vectors are quite simple: just the lines’ types and their start, center, and end points. Therefore they cannot reliably learn any ordering relying on graphical features of the input page such as line color, typeface, or writing system.
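A rough sketch of how such pairwise decisions could be assembled into an ordering; all names here are hypothetical and the actual feature extraction and network live inside kraken:

import numpy as np

def feature_vector(line):
    # hypothetical: one-hot line type plus start, center, and end coordinates
    return np.concatenate([line['type_onehot'],
                           line['start'], line['center'], line['end']])

def order_lines(lines, ro_net):
    """Build an approximate ordering matrix and sort lines by how many
    other lines the network believes they precede."""
    n = len(lines)
    order = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = np.concatenate([feature_vector(lines[i]),
                                       feature_vector(lines[j])])
                order[i, j] = ro_net(pair)   # P(line i precedes line j)
    scores = order.sum(axis=1)
    return list(np.argsort(-scores))         # highest score = earliest in reading order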

+
+

Reading order models are extremely simple and do not require a lot of memory or computational power to train. In fact, the default parameters are extremely conservative and it is recommended to increase the batch size for improved training speed. Batch sizes above 128k are easily possible with sufficiently large training datasets:

+
$ ketos rotrain -o fr_manu_ro.mlmodel -B 128000 -f french/*.xml
+Training RO on following baselines types:
+  DefaultLine   1
+  DropCapitalLine       2
+  HeadingLine   3
+  InterlinearLine       4
+GPU available: False, used: False
+TPU available: False, using: 0 TPU cores
+IPU available: False, using: 0 IPUs
+HPU available: False, using: 0 HPUs
+┏━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
+┃   ┃ Name        ┃ Type              ┃ Params ┃
+┡━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
+│ 0 │ criterion   │ BCEWithLogitsLoss │      0 │
+│ 1 │ ro_net      │ MLP               │  1.1 K │
+│ 2 │ ro_net.fc1  │ Linear            │  1.0 K │
+│ 3 │ ro_net.relu │ ReLU              │      0 │
+│ 4 │ ro_net.fc2  │ Linear            │     45 │
+└───┴─────────────┴───────────────────┴────────┘
+Trainable params: 1.1 K
+Non-trainable params: 0
+Total params: 1.1 K
+Total estimated model params size (MB): 0
+stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/35 0:00:00 • -:--:-- 0.00it/s val_spearman: 0.912 val_loss: 0.701 early_stopping: 0/300 inf
+
+
+

During validation a metric called Spearman’s footrule is computed. To calculate Spearman’s footrule, the ranks of the lines of text in the ground truth reading order and the predicted reading order are compared. The footrule is then calculated as the sum of the absolute differences between each line’s rank in the two orders, i.e. the score increases by 1 for each line lying between the correct and the predicted position of a line.

+

A lower footrule score indicates a better alignment between the two orders. A score of 0 implies perfect alignment of line ranks.
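A short worked example of the metric in plain Python (this is not the internal implementation, just an illustration of the definition):

def spearman_footrule(gt_order, pred_order):
    # rank of each line id in both orders
    gt_rank = {line: i for i, line in enumerate(gt_order)}
    pred_rank = {line: i for i, line in enumerate(pred_order)}
    return sum(abs(gt_rank[line] - pred_rank[line]) for line in gt_order)

# 'c' is predicted two positions too early, 'a' and 'b' one position too late each
print(spearman_footrule(['a', 'b', 'c', 'd'], ['c', 'a', 'b', 'd']))  # 4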

+
+
+

Recognition testing

+

Picking a particular model from a pool or getting a more detailed look at the recognition accuracy can be done with the test command. It uses transcribed lines, the test set, in the same format as the train command, recognizes the line images with one or more models, and creates a detailed report of the differences from the ground truth for each of them.

======================= ======
option                  action
======================= ======
-f, --format-type       Sets the test set data format. Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. In alto, page, and xml mode all data is extracted from XML files containing both baselines and a link to source images. In path mode arguments are image files sharing a prefix up to the last extension with JSON .path files containing the baseline information. In binary mode arguments are precompiled binary dataset files.
-m, --model             Model(s) to evaluate.
-e, --evaluation-files  File(s) with paths to evaluation data.
-d, --device            Select device to use.
--pad                   Left and right padding around lines.
======================= ======

+

Transcriptions are handed to the command in the same way as for the train command, either through a manifest with -e/--evaluation-files or by just adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain the character accuracy measured per script and a detailed list of confusions. When evaluating multiple models, the last line of the output will contain the average accuracy and the standard deviation across all of them.
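The headline accuracy is simply the proportion of ground truth characters remaining after subtracting all errors; the figures from the report above are consistent with this:

chars = 7012
errors = 5226 + 2 + 794        # insertions + deletions + substitutions = 6022
accuracy = (chars - errors) / chars
print(f"{accuracy:.2%}")       # 14.12%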

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/models.html b/5.0.0/models.html new file mode 100644 index 000000000..81c9238e4 --- /dev/null +++ b/5.0.0/models.html @@ -0,0 +1,126 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural networks doing all the character recognition supported by kraken: pronn files serializing old pickled pyrnn models as protobuf, clstm’s native serialization, and versatile Core ML models.

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with metadata. This is the default format in pytorch-based kraken.
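A minimal sketch of loading a recognition model programmatically; the path is a placeholder and the attribute access follows the API reference (load_any transparently handles the supported serializations):

from kraken.lib import models

net = models.load_any('model.mlmodel')   # returns a recognizer wrapping the network
print(net.nn.spec)                       # VGSL specification of the loaded network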

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/objects.inv b/5.0.0/objects.inv new file mode 100644 index 000000000..ec4b7477d Binary files /dev/null and b/5.0.0/objects.inv differ diff --git a/5.0.0/search.html b/5.0.0/search.html new file mode 100644 index 000000000..59f976870 --- /dev/null +++ b/5.0.0/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +


+ + +
+ + + +
+ + +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/searchindex.js b/5.0.0/searchindex.js new file mode 100644 index 000000000..00074b8c3 --- /dev/null +++ b/5.0.0/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ABBYY XML": [[2, "abbyy-xml"]], "ALTO": [[5, "alto"]], "ALTO 4.4": [[2, "alto-4-4"]], "API Quickstart": [[1, null]], "API Reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline Segmentation": [[0, "baseline-segmentation"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Best practices": [[5, "best-practices"]], "Binarization": [[0, "binarization"]], "Binary Datasets": [[5, "binary-datasets"]], "Codecs": [[5, "codecs"]], "Containers and Helpers": [[2, "containers-and-helpers"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dataset Compilation": [[7, "dataset-compilation"]], "Default templates": [[2, "default-templates"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Finding Recognition Models": [[4, "finding-recognition-models"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "Funding": [[4, "funding"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Helpers": [[2, "helpers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input and Outputs": [[0, "input-and-outputs"]], "Installation": [[4, "installation"]], "Installation using Conda": [[4, "installation-using-conda"]], "Installation using Pip": [[4, "installation-using-pip"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy Box Segmentation": [[0, "legacy-box-segmentation"]], "Legacy modules": [[2, "legacy-modules"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Loss and Evaluation Functions": [[2, "loss-and-evaluation-functions"]], "Masking": [[0, "masking"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[6, null]], "Output formats": [[0, "output-formats"]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation": [[0, "page-segmentation"]], "PageXML": [[2, "pagexml"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Principal Text Direction": [[0, "principal-text-direction"]], "Publishing": [[0, "publishing"]], "Querying and Model Retrieval": [[0, "querying-and-model-retrieval"]], "Quickstart": [[4, "quickstart"]], "Reading order datasets": [[2, "reading-order-datasets"]], "Reading order training": [[5, "reading-order-training"]], "Recognition": [[0, "recognition"], [1, "recognition"], [2, "recognition"], [7, "recognition"]], "Recognition Models": [[6, "recognition-models"]], "Recognition datasets": [[2, "recognition-datasets"]], "Recognition model training": [[5, "recognition-model-training"]], "Recognition testing": [[5, "recognition-testing"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Related Software": [[4, "related-software"]], "Reshape": [[8, "reshape"]], "Segmentation": [[2, "segmentation"]], "Segmentation Models": [[6, "segmentation-models"]], "Segmentation datasets": [[2, 
"segmentation-datasets"]], "Segmentation model training": [[5, "segmentation-model-training"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"], [2, "serialization"]], "Slicing": [[5, "slicing"]], "Text Normalization and Unicode": [[5, "text-normalization-and-unicode"]], "Trainer": [[2, "trainer"]], "Training": [[1, "training"], [2, "training"], [5, null], [7, "compilation"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "Unsupervised recognition pretraining": [[5, "unsupervised-recognition-pretraining"]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "hOCR": [[2, "hocr"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.containers module": [[2, "kraken-containers-module"]], "kraken.lib.codec module": [[2, "kraken-lib-codec-module"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.exceptions": [[2, "kraken-lib-exceptions"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"add() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.add", false]], "add() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.add", false]], "add() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add", false]], "add() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add", false]], "add_codec() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.add_codec", false]], "add_labels() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.add_labels", false]], "add_line() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add_line", false]], "add_line() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add_line", false]], "add_page() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add_page", false]], "add_page() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add_page", false]], 
"add_page() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.add_page", false]], "alphabet (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.alphabet", false]], "alphabet (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.alphabet", false]], "alphabet (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.alphabet", false]], "append() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.append", false]], "arrow_table (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.arrow_table", false]], "arrowipcrecognitiondataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset", false]], "aug (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.aug", false]], "aug (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.aug", false]], "aug (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.aug", false]], "aug (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.aug", false]], "automatic_optimization (kraken.lib.train.krakentrainer attribute)": [[2, "kraken.lib.train.KrakenTrainer.automatic_optimization", false]], "aux_layers (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.aux_layers", false]], "base_dir (kraken.containers.baselineline attribute)": [[2, "id7", false], [2, "kraken.containers.BaselineLine.base_dir", false]], "base_dir (kraken.containers.baselineocrrecord attribute)": [[2, "id25", false], [2, "kraken.containers.BaselineOCRRecord.base_dir", false]], "base_dir (kraken.containers.bboxline attribute)": [[2, "id16", false], [2, "kraken.containers.BBoxLine.base_dir", false]], "base_dir (kraken.containers.bboxocrrecord attribute)": [[2, "id29", false], [2, "kraken.containers.BBoxOCRRecord.base_dir", false]], "base_dir (kraken.containers.ocr_record attribute)": [[2, "kraken.containers.ocr_record.base_dir", false]], "baseline (kraken.containers.baselineline attribute)": [[2, "id8", false], [2, "kraken.containers.BaselineLine.baseline", false]], "baselineline (class in kraken.containers)": [[2, "kraken.containers.BaselineLine", false]], "baselineocrrecord (class in kraken.containers)": [[2, "kraken.containers.BaselineOCRRecord", false]], "baselineset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.BaselineSet", false]], "batch (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.batch", false]], "bbox (kraken.containers.bboxline attribute)": [[2, "id17", false], [2, "kraken.containers.BBoxLine.bbox", false]], "bboxline (class in kraken.containers)": [[2, "kraken.containers.BBoxLine", false]], "bboxocrrecord (class in kraken.containers)": [[2, "kraken.containers.BBoxOCRRecord", false]], "beam_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.beam_decoder", false]], "bidi_reordering (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bidi_reordering", false]], "blank_threshold_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.blank_threshold_decoder", false]], "blocks (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.blocks", false]], "boundary 
(kraken.containers.baselineline attribute)": [[2, "id9", false], [2, "kraken.containers.BaselineLine.boundary", false]], "bounds (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bounds", false]], "build_addition() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_addition", false]], "build_conv() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_conv", false]], "build_dropout() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_dropout", false]], "build_groupnorm() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_groupnorm", false]], "build_identity() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_identity", false]], "build_maxpool() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_maxpool", false]], "build_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_output", false]], "build_parallel() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_parallel", false]], "build_reshape() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_reshape", false]], "build_rnn() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_rnn", false]], "build_ro() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_ro", false]], "build_series() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_series", false]], "build_wav2vec2() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_wav2vec2", false]], "c_sorted (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.c_sorted", false]], "calculate_polygonal_environment() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.calculate_polygonal_environment", false]], "category (kraken.containers.processingstep attribute)": [[2, "id32", false], [2, "kraken.containers.ProcessingStep.category", false]], "centerline_norm (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.centerline_norm", false]], "channels (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.channels", false]], "class_mapping (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_mapping", false]], "class_stats (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_stats", false]], "codec (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.codec", false]], "codec (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.codec", false]], "codec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.codec", false]], "collate_sequences() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.collate_sequences", false]], "compute_confusions() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.compute_confusions", false]], "compute_polygon_section() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.compute_polygon_section", false]], "confidences (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.confidences", false]], "confidences (kraken.containers.bboxocrrecord 
attribute)": [[2, "kraken.containers.BBoxOCRRecord.confidences", false]], "confidences (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.confidences", false]], "criterion (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id39", false], [2, "kraken.lib.vgsl.TorchVGSLModel.criterion", false]], "cuts (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.cuts", false]], "cuts (kraken.containers.baselineocrrecord property)": [[2, "id26", false]], "cuts (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.cuts", false]], "cuts (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.cuts", false]], "data (kraken.lib.dataset.pagewiseroset attribute)": [[2, "kraken.lib.dataset.PageWiseROSet.data", false]], "data (kraken.lib.dataset.pairwiseroset attribute)": [[2, "kraken.lib.dataset.PairWiseROSet.data", false]], "decode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.decode", false]], "decoder (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.decoder", false]], "description (kraken.containers.processingstep attribute)": [[2, "id33", false], [2, "kraken.containers.ProcessingStep.description", false]], "device (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.device", false]], "display_order (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.display_order", false]], "display_order (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.display_order", false]], "display_order() (kraken.containers.baselineocrrecord method)": [[2, "id27", false]], "display_order() (kraken.containers.bboxocrrecord method)": [[2, "id30", false]], "display_order() (kraken.containers.ocr_record method)": [[2, "kraken.containers.ocr_record.display_order", false]], "encode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.encode", false]], "encode() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.encode", false]], "encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.encode", false]], "encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.encode", false]], "env (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.env", false]], "eval() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.eval", false]], "extract_polygons() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.extract_polygons", false]], "failed_samples (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.pagewiseroset attribute)": [[2, "kraken.lib.dataset.PageWiseROSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.pairwiseroset attribute)": [[2, "kraken.lib.dataset.PairWiseROSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.polygongtdataset attribute)": [[2, 
"kraken.lib.dataset.PolygonGTDataset.failed_samples", false]], "fit() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.fit", false]], "font (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.font", false]], "force_binarization (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.force_binarization", false]], "forward() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.forward", false]], "get_feature_dim() (kraken.lib.dataset.pagewiseroset method)": [[2, "kraken.lib.dataset.PageWiseROSet.get_feature_dim", false]], "get_feature_dim() (kraken.lib.dataset.pairwiseroset method)": [[2, "kraken.lib.dataset.PairWiseROSet.get_feature_dim", false]], "global_align() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.global_align", false]], "greedy_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.greedy_decoder", false]], "groundtruthdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.GroundTruthDataset", false]], "height (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.height", false]], "height (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id36", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.height", false]], "hyper_params (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.hyper_params", false]], "id (kraken.containers.baselineline attribute)": [[2, "id10", false], [2, "kraken.containers.BaselineLine.id", false]], "id (kraken.containers.bboxline attribute)": [[2, "id18", false], [2, "kraken.containers.BBoxLine.id", false]], "id (kraken.containers.processingstep attribute)": [[2, "id34", false], [2, "kraken.containers.ProcessingStep.id", false]], "idx (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.idx", false]], "im (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im", false]], "im_mode (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.im_mode", false]], "im_mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.im_mode", false]], "im_mode (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.im_mode", false]], "im_mode (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.im_mode", false]], "imageinputtransforms (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.ImageInputTransforms", false]], "imagename (kraken.containers.baselineline attribute)": [[2, "id11", false], [2, "kraken.containers.BaselineLine.imagename", false]], "imagename (kraken.containers.bboxline attribute)": [[2, "id19", false], [2, "kraken.containers.BBoxLine.imagename", false]], "imagename (kraken.containers.segmentation attribute)": [[2, "id0", false], [2, "kraken.containers.Segmentation.imagename", false]], "imgs (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.imgs", false]], "init_weights() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.init_weights", false]], "input (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id40", false], [2, "kraken.lib.vgsl.TorchVGSLModel.input", false]], "is_valid (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.is_valid", false]], "kind 
(kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.kind", false]], "krakencairosurfaceexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCairoSurfaceException", false]], "krakencodecexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCodecException", false]], "krakenencodeexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenEncodeException", false]], "krakeninputexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInputException", false]], "krakeninvalidmodelexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInvalidModelException", false]], "krakenrecordexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRecordException", false]], "krakenrepoexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRepoException", false]], "krakenstoptrainingexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenStopTrainingException", false]], "krakentrainer (class in kraken.lib.train)": [[2, "kraken.lib.train.KrakenTrainer", false]], "l2c (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c", false]], "l2c_single (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c_single", false]], "legacy_polygons (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.legacy_polygons", false]], "legacy_polygons_status (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.legacy_polygons_status", false]], "len (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.len", false]], "line_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_iter (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.line_iter", false]], "line_orders (kraken.containers.segmentation attribute)": [[2, "id1", false], [2, "kraken.containers.Segmentation.line_orders", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "lines (kraken.containers.segmentation attribute)": [[2, "id2", false], [2, "kraken.containers.Segmentation.lines", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "logical_order() (kraken.containers.baselineocrrecord method)": [[2, "kraken.containers.BaselineOCRRecord.logical_order", false]], "logical_order() (kraken.containers.bboxocrrecord method)": [[2, "kraken.containers.BBoxOCRRecord.logical_order", false]], "logical_order() (kraken.containers.ocr_record method)": [[2, "kraken.containers.ocr_record.logical_order", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id37", false], [2, 
"kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "neural_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.neural_reading_order", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id41", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "no_legacy_polygons (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.no_legacy_polygons", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.containers)": [[2, "kraken.containers.ocr_record", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id42", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.pad", false]], "pad (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.pad", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "pagewiseroset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PageWiseROSet", false]], "pairwiseroset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PairWiseROSet", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() 
(kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.prediction", false]], "prediction (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.prediction", false]], "prediction (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.prediction", false]], "processingstep (class in kraken.containers)": [[2, "kraken.containers.ProcessingStep", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "rebuild_alphabet() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.rebuild_alphabet", false]], "regions (kraken.containers.baselineline attribute)": [[2, "id12", false], [2, "kraken.containers.BaselineLine.regions", false]], "regions (kraken.containers.bboxline attribute)": [[2, "id20", false], [2, "kraken.containers.BBoxLine.regions", false]], "regions (kraken.containers.segmentation attribute)": [[2, "id3", false], [2, "kraken.containers.Segmentation.regions", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.scale", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "script_detection (kraken.containers.segmentation attribute)": [[2, "id4", false], [2, "kraken.containers.Segmentation.script_detection", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.seg_type", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module 
kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "segmentation (class in kraken.containers)": [[2, "kraken.containers.Segmentation", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], "set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "settings (kraken.containers.processingstep attribute)": [[2, "id35", false], [2, "kraken.containers.ProcessingStep.settings", false]], "skip_empty_lines (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.skip_empty_lines", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.containers.baselineline attribute)": [[2, "id13", false], [2, "kraken.containers.BaselineLine.split", false]], "split (kraken.containers.bboxline attribute)": [[2, "id21", false], [2, "kraken.containers.BBoxLine.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "tags (kraken.containers.baselineline attribute)": [[2, "id14", false], [2, "kraken.containers.BaselineLine.tags", false]], "tags (kraken.containers.bboxline attribute)": [[2, "id22", false], [2, "kraken.containers.BBoxLine.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text (kraken.containers.baselineline attribute)": [[2, "id15", false], [2, "kraken.containers.BaselineLine.text", false]], "text (kraken.containers.bboxline attribute)": [[2, "id23", false], [2, "kraken.containers.BBoxLine.text", false]], "text_direction (kraken.containers.bboxline attribute)": [[2, "id24", false], [2, "kraken.containers.BBoxLine.text_direction", false]], "text_direction (kraken.containers.segmentation attribute)": [[2, "id5", false], [2, "kraken.containers.Segmentation.text_direction", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, 
"kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.transforms", false]], "transforms (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "type (kraken.containers.baselineline attribute)": [[2, "kraken.containers.BaselineLine.type", false]], "type (kraken.containers.baselineocrrecord attribute)": [[2, "id28", false], [2, "kraken.containers.BaselineOCRRecord.type", false]], "type (kraken.containers.bboxline attribute)": [[2, "kraken.containers.BBoxLine.type", false]], "type (kraken.containers.bboxocrrecord attribute)": [[2, "id31", false], [2, "kraken.containers.BBoxOCRRecord.type", false]], "type (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.type", false]], "type (kraken.containers.segmentation attribute)": [[2, "id6", false], [2, "kraken.containers.Segmentation.type", false]], "use_legacy_polygons (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.use_legacy_polygons", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id43", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_norm (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.valid_norm", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.width", false]], "width (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id38", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]], "xmlpage (class in kraken.lib.xml)": [[2, "kraken.lib.xml.XMLPage", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.containers": [[2, 1, 1, "", "BBoxLine"], [2, 1, 1, "", "BBoxOCRRecord"], [2, 1, 1, "", "BaselineLine"], [2, 1, 1, "", "BaselineOCRRecord"], [2, 1, 1, "", "ProcessingStep"], [2, 1, 1, "", "Segmentation"], [2, 1, 1, "", "ocr_record"]], "kraken.containers.BBoxLine": [[2, 2, 1, "id16", "base_dir"], [2, 2, 1, "id17", "bbox"], [2, 2, 1, "id18", "id"], [2, 2, 1, "id19", "imagename"], [2, 2, 1, "id20", "regions"], [2, 2, 1, "id21", "split"], [2, 2, 1, "id22", "tags"], [2, 2, 1, "id23", "text"], [2, 2, 1, "id24", "text_direction"], [2, 2, 1, "", "type"]], "kraken.containers.BBoxOCRRecord": [[2, 2, 
1, "id29", "base_dir"], [2, 2, 1, "", "confidences"], [2, 2, 1, "", "cuts"], [2, 3, 1, "id30", "display_order"], [2, 3, 1, "", "logical_order"], [2, 2, 1, "", "prediction"], [2, 2, 1, "id31", "type"]], "kraken.containers.BaselineLine": [[2, 2, 1, "id7", "base_dir"], [2, 2, 1, "id8", "baseline"], [2, 2, 1, "id9", "boundary"], [2, 2, 1, "id10", "id"], [2, 2, 1, "id11", "imagename"], [2, 2, 1, "id12", "regions"], [2, 2, 1, "id13", "split"], [2, 2, 1, "id14", "tags"], [2, 2, 1, "id15", "text"], [2, 2, 1, "", "type"]], "kraken.containers.BaselineOCRRecord": [[2, 2, 1, "id25", "base_dir"], [2, 2, 1, "", "confidences"], [2, 4, 1, "id26", "cuts"], [2, 3, 1, "id27", "display_order"], [2, 3, 1, "", "logical_order"], [2, 2, 1, "", "prediction"], [2, 2, 1, "id28", "type"]], "kraken.containers.ProcessingStep": [[2, 2, 1, "id32", "category"], [2, 2, 1, "id33", "description"], [2, 2, 1, "id34", "id"], [2, 2, 1, "id35", "settings"]], "kraken.containers.Segmentation": [[2, 2, 1, "id0", "imagename"], [2, 2, 1, "id1", "line_orders"], [2, 2, 1, "id2", "lines"], [2, 2, 1, "id3", "regions"], [2, 2, 1, "id4", "script_detection"], [2, 2, 1, "id5", "text_direction"], [2, 2, 1, "id6", "type"]], "kraken.containers.ocr_record": [[2, 2, 1, "", "base_dir"], [2, 4, 1, "", "confidences"], [2, 4, 1, "", "cuts"], [2, 3, 1, "", "display_order"], [2, 3, 1, "", "logical_order"], [2, 4, 1, "", "prediction"], [2, 4, 1, "", "type"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 3, 1, "", "add_labels"], [2, 2, 1, "", "c_sorted"], [2, 3, 1, "", "decode"], [2, 3, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 2, 1, "", "l2c"], [2, 2, 1, "", "l2c_single"], [2, 4, 1, "", "max_label"], [2, 3, 1, "", "merge"], [2, 2, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "ArrowIPCRecognitionDataset"], [2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "ImageInputTransforms"], [2, 1, 1, "", "PageWiseROSet"], [2, 1, 1, "", "PairWiseROSet"], [2, 1, 1, "", "PolygonGTDataset"], [2, 0, 1, "", "collate_sequences"], [2, 0, 1, "", "compute_confusions"], [2, 0, 1, "", "global_align"]], "kraken.lib.dataset.ArrowIPCRecognitionDataset": [[2, 3, 1, "", "add"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "arrow_table"], [2, 2, 1, "", "aug"], [2, 2, 1, "", "codec"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "legacy_polygons_status"], [2, 3, 1, "", "no_encode"], [2, 3, 1, "", "rebuild_alphabet"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.dataset.BaselineSet": [[2, 3, 1, "", "add"], [2, 2, 1, "", "aug"], [2, 2, 1, "", "class_mapping"], [2, 2, 1, "", "class_stats"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "imgs"], [2, 2, 1, "", "line_width"], [2, 2, 1, "", "mbl_dict"], [2, 2, 1, "", "mreg_dict"], [2, 2, 1, "", "num_classes"], [2, 2, 1, "", "pad"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "targets"], [2, 3, 1, "", "transform"], [2, 2, 1, "", "transforms"], [2, 2, 1, "", "valid_baselines"], [2, 2, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 3, 1, "", "add"], [2, 3, 1, "", "add_line"], [2, 3, 1, "", "add_page"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "aug"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], 
[2, 3, 1, "", "no_encode"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.dataset.ImageInputTransforms": [[2, 4, 1, "", "batch"], [2, 4, 1, "", "centerline_norm"], [2, 4, 1, "", "channels"], [2, 4, 1, "", "force_binarization"], [2, 4, 1, "", "height"], [2, 4, 1, "", "mode"], [2, 4, 1, "", "pad"], [2, 4, 1, "", "scale"], [2, 4, 1, "", "valid_norm"], [2, 4, 1, "", "width"]], "kraken.lib.dataset.PageWiseROSet": [[2, 2, 1, "", "data"], [2, 2, 1, "", "failed_samples"], [2, 3, 1, "", "get_feature_dim"]], "kraken.lib.dataset.PairWiseROSet": [[2, 2, 1, "", "data"], [2, 2, 1, "", "failed_samples"], [2, 3, 1, "", "get_feature_dim"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 3, 1, "", "add"], [2, 3, 1, "", "add_line"], [2, 3, 1, "", "add_page"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "aug"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "legacy_polygons"], [2, 3, 1, "", "no_encode"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], [2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 2, 1, "id36", "height"], [2, 2, 1, "id37", "message"], [2, 2, 1, "id38", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 2, 1, "", "codec"], [2, 2, 1, "", "decoder"], [2, 2, 1, "", "device"], [2, 3, 1, "", "forward"], [2, 2, 1, "", "kind"], [2, 2, 1, "", "nn"], [2, 2, 1, "", "one_channel_mode"], [2, 3, 1, "", "predict"], [2, 3, 1, "", "predict_labels"], [2, 3, 1, "", "predict_string"], [2, 2, 1, "", "seg_type"], [2, 3, 1, "", "to"], [2, 2, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "neural_reading_order"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", "vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], "kraken.lib.train.KrakenTrainer": [[2, 2, 1, "", "automatic_optimization"], [2, 3, 1, "", "fit"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 3, 1, "", "add_codec"], [2, 3, 1, "", "append"], [2, 4, 1, "", "aux_layers"], [2, 2, 1, "", "blocks"], [2, 3, 1, "", "build_addition"], [2, 3, 1, "", "build_conv"], [2, 3, 1, "", "build_dropout"], [2, 3, 1, "", "build_groupnorm"], [2, 3, 1, "", "build_identity"], [2, 3, 1, "", "build_maxpool"], [2, 3, 1, "", "build_output"], [2, 3, 1, "", "build_parallel"], [2, 3, 1, "", "build_reshape"], [2, 3, 1, "", "build_rnn"], [2, 3, 1, "", "build_ro"], [2, 3, 1, "", "build_series"], [2, 3, 1, "", "build_wav2vec2"], [2, 2, 1, "", "codec"], [2, 2, 1, "id39", "criterion"], [2, 3, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 2, 1, "", "idx"], [2, 3, 1, "", "init_weights"], [2, 2, 1, "id40", "input"], [2, 3, 1, "", "load_model"], [2, 2, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 2, 1, "", 
"named_spec"], [2, 2, 1, "id41", "nn"], [2, 4, 1, "id42", "one_channel_mode"], [2, 2, 1, "", "ops"], [2, 2, 1, "", "pattern"], [2, 3, 1, "", "resize_output"], [2, 3, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 3, 1, "", "set_num_threads"], [2, 2, 1, "", "spec"], [2, 3, 1, "", "to"], [2, 3, 1, "", "train"], [2, 4, 1, "", "use_legacy_polygons"], [2, 2, 1, "id43", "user_metadata"]], "kraken.lib.xml": [[2, 1, 1, "", "XMLPage"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 1, 1, "", "mm_rpred"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 2, 1, "", "bidi_reordering"], [2, 2, 1, "", "bounds"], [2, 2, 1, "", "im"], [2, 2, 1, "", "len"], [2, 2, 1, "", "line_iter"], [2, 2, 1, "", "nets"], [2, 2, 1, "", "no_legacy_polygons"], [2, 2, 1, "", "one_channel_modes"], [2, 2, 1, "", "pad"], [2, 2, 1, "", "seg_types"], [2, 2, 1, "", "tags_ignore"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"]], "kraken.transcribe": [[2, 1, 1, "", "TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 3, 1, "", "add_page"], [2, 2, 1, "", "env"], [2, 2, 1, "", "font"], [2, 2, 1, "", "line_idx"], [2, 2, 1, "", "page_idx"], [2, 2, 1, "", "pages"], [2, 2, 1, "", "seg_idx"], [2, 2, 1, "", "text_direction"], [2, 2, 1, "", "tmpl"], [2, 3, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "attribute", "Python attribute"], "3": ["py", "method", "Python method"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:attribute", "3": "py:method", "4": "py:property"}, "terms": {"": [0, 1, 2, 4, 5, 6, 7, 8], "0": [0, 1, 2, 4, 5, 7, 8], "00": [0, 5, 7], "0001": 5, "0005": 4, "001": [5, 7], "00it": 5, "01": 4, "0123456789": [0, 4, 7], "01c59": 8, "0245": 7, "04": 7, "06": [0, 7], "07": [0, 5], "09": [0, 7], "0d": 7, "0xe682": 4, "0xe68b": 4, "0xe8bf": 4, "0xe8e5": 0, "0xf038": 0, "0xf128": 0, "0xf1a7": 4, "1": [0, 1, 2, 5, 7, 8], "10": [0, 1, 4, 5, 7], "100": [0, 2, 5, 7, 8], "1000": 5, "1015": 1, "1020": 8, "10218": 5, "1024": 8, "103": 1, "105": 1, "10592716": 4, "106": 5, "108": 5, "11": 7, "1128": 5, "11346": 5, "1161": 1, "117": 1, "1184": 7, "119": 1, "1195": 1, "12": [5, 7, 8], "120": 5, "1200": 5, "121": 1, "122": 5, "124": 1, "125": 5, "126": 1, "128": [5, 8], "128000": 5, "128k": 5, "13": [5, 7], "131": 1, "132": 7, "1339": 7, "134": 5, "135": 5, "1359": 7, "136": 5, "1377": 1, "1385": 1, "1388": 1, "1397": 1, "14": [0, 5], "1408": [1, 2], "1410": 1, "1412": 1, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "16": [0, 2, 5, 8], "161": 7, "1623": 7, "1681": 7, "1697": 7, "16th": 4, "17": [2, 5], "1708": 1, "1716": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1800": 8, "1824": 1, "19": [1, 5], "192": 5, "198": 5, "199": 5, "1996": 7, "1bpp": 0, "1cycl": 5, "1d": 8, "1e": 5, "1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [0, 2, 4, 5, 7, 8], "20": [1, 2, 5, 8], "200": 5, "2000": 1, "2001": 5, "2006": 2, "2014": 2, "2016": 1, "2017": 1, "2019": 5, "2020": 4, "2021": 0, "2024": 4, "204": 7, "2041": 1, "207": [1, 5], "2072": 1, "2077": 1, "2078": 1, "2096": 7, "21": 4, "210": 5, "215": 5, "216": 1, "21st": 4, "22": [0, 5, 7], "228": 1, "23": [0, 5], "230": 1, "232": 1, "2334": 7, "2364": 7, "23rd": 2, "24": [0, 1, 7], "241": 5, "2426": 1, "246": 5, "2483": 1, "25": [1, 5, 7, 8], "250": 1, "2500": 7, "253": 1, "256": [5, 
7, 8], "259": 7, "26": 7, "266": 5, "27": 5, "270": 7, "27046": 7, "274": 5, "28": [1, 5], "2873": 2, "29": [0, 1, 5], "2d": [2, 8], "3": [2, 5, 7, 8], "30": [4, 5, 7], "300": 5, "300dpi": 7, "307": 7, "31": 5, "32": [5, 8], "328": 5, "3292": 1, "336": 7, "3367": 1, "3398": 1, "3414": 1, "3418": 7, "3437": 1, "345": 1, "3455": 1, "35": 5, "35000": 7, "3504": 7, "3514": 1, "3519": 7, "35619": 7, "365": 7, "3680": 7, "38": 5, "384": 8, "39": 5, "4": [1, 4, 5, 7, 8], "40": 7, "400": 5, "4000": 5, "428": 7, "431": 7, "45": 5, "46": 5, "47": 7, "471": 1, "473": 1, "48": [5, 7, 8], "488": 7, "49": [0, 5, 7], "491": 1, "4d": 2, "5": [1, 2, 5, 7, 8], "50": [5, 7], "500": 5, "5000": 5, "509": 1, "512": 8, "515": 1, "52": [5, 7], "522": 1, "5226": 5, "523": 5, "5230": 5, "5234": 5, "524": 1, "5258": 7, "5281": [0, 4], "53": 5, "534": 1, "536": [1, 5], "53980": 5, "54": 1, "54114": 5, "5431": 5, "545": 7, "5468665": 0, "56": [0, 1, 4, 7], "5617734": 0, "5617783": 0, "562": 1, "575": [1, 5], "577": 7, "59": [7, 8], "5951": 7, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "62": 5, "63": 5, "64": [5, 8], "646": 7, "6542744": 0, "66": [5, 7], "668": 1, "69": 1, "7": [1, 5, 7, 8], "70": 1, "701": 5, "7012": 5, "7015": 7, "71": 5, "7272": 7, "7281": 7, "73": 1, "74": [1, 5], "7593": 5, "76": 1, "773": 5, "7857": 5, "788": [5, 7], "79": 1, "794": 5, "7943": 5, "8": [0, 5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": [1, 7], "8445": 7, "8479": 7, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [1, 2, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "912": 5, "92": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [0, 5], "9541": 7, "9550": 7, "96": [5, 7], "97": 7, "98": [4, 7], "99": 7, "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], "As": [0, 1, 2, 5], "BY": 0, "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7, 8], "Its": 0, "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "With": [0, 5], "_abcdefghijklmnopqrstuvwxyz": 4, "aaebv2": 0, "abbrevi": 4, "abbyyxml": [0, 4], "abcdefghijklmnopqrstuvwxyz": 4, "abcdefghijklmnopqrstuvxabcdefghijklmnopqrstuvwxyz": 0, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 4, 5, 7], "absolut": [2, 5], "abstract": 2, "abugida": 5, "acceler": [4, 5, 7], "accent": [0, 4], "accept": [0, 2, 5], "access": [0, 1, 2], "access_token": 0, "accord": [0, 2, 5], "accordingli": 2, "account": [0, 7], "accur": 5, "accuraci": [0, 1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [0, 5, 7, 8], "actual": [2, 4, 5, 7], "acut": [0, 4], "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [0, 2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_lin": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "administr": 0, "advantag": 5, "advis": 7, "affect": 7, 
"after": [0, 1, 2, 5, 7, 8], "afterward": [0, 1], "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": [2, 4], "aim": 5, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algn1": 2, "algn2": 2, "algorithm": [0, 1, 2, 5], "align": [2, 5], "align1": 2, "align2": 2, "all": [0, 1, 2, 4, 5, 6, 7], "allographet": 4, "allow": [2, 5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [0, 2, 4, 5, 7, 8], "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [2, 5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 4, 7], "alto_doc": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 2, 5, 7], "amp": 5, "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analogu": 0, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [0, 2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [0, 5, 7], "anyth": 2, "apach": 4, "apart": [0, 3, 5], "api": 5, "appear": 2, "append": [0, 2, 5, 7, 8], "appli": [0, 1, 2, 4, 7, 8], "applic": [1, 7], "approach": [4, 5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approv": 0, "approxim": [1, 5], "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": [0, 2], "aren": [2, 5], "arg": 2, "argument": [1, 5], "arian": 0, "arm": 4, "around": [0, 1, 2, 5, 7], "arrai": [1, 2], "arrow": [2, 5], "arrow_t": 2, "arrowipcrecognitiondataset": 2, "arxiv": 2, "ask": 0, "aspect": 2, "assign": [2, 5, 7], "associ": [1, 2], "assum": 2, "attach": [1, 5], "attribut": [1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "authorship": 0, "auto": [1, 2, 5], "autocast": 2, "automat": [0, 1, 2, 5, 7, 8], "automatic_optim": 2, "aux_lay": 2, "auxiliari": [0, 1], "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awesom": 0, "awni": 2, "axi": [2, 8], "b": [0, 1, 5, 7, 8], "back": [2, 8], "backbon": 5, "backend": 3, "background": [0, 2, 5], "backslash": 5, "base": [1, 2, 5, 6, 7, 8], "base_dir": 2, "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselinelin": 2, "baselineocrrecord": 2, "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 5, 7, 8], "batch_siz": 5, "bayr\u016bt": 7, "bbox": 2, "bboxlin": 2, "bboxocrrecord": 2, "bcewithlogitsloss": 5, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 4, 5, 7], "becom": 0, "been": [0, 2, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": [2, 5], "being": [1, 2, 5, 8], "below": [0, 5, 7], "best": [0, 2, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "biblissima": 4, "bidi": [2, 4, 5], "bidi_reord": 2, "bidirect": [2, 5], "bidirection": 8, "binar": [1, 7], "binari": [0, 1, 2], "bind": 0, "bit": [1, 5], "biton": 2, "bl": [0, 4], "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [0, 1, 2, 5, 8], "block_i": 5, "block_n": 5, "board": 4, "boilerpl": 1, "book": 0, "bookhand": 0, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": [0, 1, 2, 5], "box": [1, 2, 4, 5], "branch": 8, "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_ro": 2, "build_seri": 2, "build_wav2vec2": 2, "buld\u0101n": 7, "bundl": 5, "bw": [0, 4], "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [0, 1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 
2, 5], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": 1, "can": [0, 1, 2, 3, 4, 5, 7, 8], "cannot": 0, "capabl": [0, 5], "case": [0, 1, 2, 5, 7], "cat": 0, "catalan": 4, "categori": 2, "catmu": 4, "caus": [1, 2], "caveat": 5, "cc": [0, 4], "cd": 4, "ce": [4, 7], "cedilla": 4, "cell": 8, "cent": 7, "center": 5, "centerlin": [2, 5], "centerline_norm": 2, "central": [4, 7], "centuri": 4, "certain": [0, 2, 7], "chain": [0, 4, 7], "chanc": 2, "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charset": 2, "check": 0, "chines": [0, 5], "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumflex": 4, "circumst": 7, "class": [0, 1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 5, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "clone": 4, "close": [4, 5], "closer": 1, "clstm": [2, 6], "cl\u00e9rice": 4, "code": [0, 1, 2, 4, 5, 7], "codec": 1, "coher": 0, "collabor": 4, "collate_sequ": 2, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [4, 7], "combin": [0, 1, 2, 4, 5, 7, 8], "come": [2, 5, 8], "comma": 4, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "commun": 0, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 4, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compon": 5, "compos": 2, "composedblocktyp": 5, "composit": 0, "compound": 2, "compress": 7, "compris": 7, "comput": [0, 2, 3, 4, 5, 7], "computation": 7, "compute_confus": 2, "compute_polygon_sect": 2, "concaten": 8, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5, 8], "confluenc": 8, "conform": 5, "confus": [2, 5], "conjunct": 5, "connect": [2, 5, 7], "connectionist": 2, "conserv": 5, "consid": [0, 2], "consist": [0, 1, 4, 7, 8], "consolid": 4, "constant": 5, "construct": [5, 7], "contain": [0, 1, 4, 5, 6, 7], "contemporari": 0, "content": 5, "continu": [0, 1, 2, 5, 7], "contrast": [5, 7], "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "converg": 5, "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [2, 4], "core": [5, 6], "coreml": 2, "corpu": 5, "correct": [0, 1, 2, 5, 7], "correctli": 8, "correspond": [0, 1, 2], "corsican": 4, "cosin": 5, "cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "cover": 0, "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "creat": [0, 2, 4, 5, 7, 8], "creation": 0, "cremma": 0, "cremma_medieval_bicerin": 0, "criterion": [2, 5], "css": 0, "ctc": [1, 2, 5], "ctc_decod": 1, "ctr3": 8, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [0, 2, 4, 5, 6], "curv": 0, "custom": [0, 1, 2, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataset": 1, "dataset_larg": 5, "date": [0, 4], "de": [2, 4, 7], "deal": [0, 4, 5], "debug": [1, 5, 7], "decai": 5, "decent": 5, "decid": [0, 5], "decis": 5, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "def": 1, "default": [0, 1, 4, 5, 6, 7, 8], "defaultlin": 5, "defin": [0, 1, 2, 4, 5, 8], "definit": [0, 5, 8], "degrad": 1, "degre": 7, "del": 2, "del_indic": 2, "delet": [0, 2, 5, 7], "denot": 0, "depend": [0, 1, 2, 4, 5, 7], "deposit": 0, "deprec": [0, 2], "depth": [5, 7, 8], "describ": [2, 5], "descript": [0, 2, 5], "descriptor": 2, "deseri": 2, "desir": [1, 2, 8], "desktop": 7, "destin": 2, 
"destroi": 5, "detail": [0, 2, 5, 7], "detect": [0, 2], "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diachron": 4, "diacrit": [4, 5], "diaeres": 7, "diaeresi": [4, 7], "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [2, 5], "differ": [0, 1, 4, 5, 7, 8], "difficult": 5, "digit": 4, "dilat": 8, "dilation_i": 8, "dilation_x": 8, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": 5, "direct": [1, 2, 4, 5, 7, 8], "directli": [0, 5, 8], "directori": [1, 2, 4, 5, 7], "disabl": [0, 2, 5, 7], "disallow": 2, "discover": 0, "disk": 7, "displai": [2, 5], "display_ord": 2, "dissimilar": 5, "dist1": 2, "dist2": 2, "distanc": 2, "distinguish": 5, "distractor": 5, "distribut": 8, "dnn": 2, "do": [0, 1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "doc": [0, 2], "document": [0, 1, 2, 4, 5, 7], "doe": [0, 1, 2, 5, 7], "doesn": [2, 5, 7], "doi": 0, "domain": [1, 5], "don": 5, "done": [0, 4, 5, 7, 8], "dot": [4, 7], "down": [7, 8], "download": [0, 4, 7], "downward": 2, "drastic": 5, "draw": 1, "drawback": [0, 5], "driver": 1, "drop": [1, 8], "dropcapitallin": 5, "dropout": [2, 5, 7], "du": 4, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "earlier": 2, "early_stop": 5, "easi": 2, "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": 7, "editor": 7, "edu": 7, "effect": 0, "either": [0, 2, 5, 7, 8], "element": 5, "emit": 2, "emploi": [0, 7], "empti": [2, 5], "enabl": [1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2, 5], "end_separ": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entir": 5, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escap": 5, "escripta": 4, "escriptorium": [4, 7], "especi": 0, "esr": 4, "essenti": 5, "estim": [0, 2, 5, 7], "et": 2, "etc": 0, "european": 4, "eval": 2, "evalu": 5, "evaluation_data": 1, "evaluation_fil": 1, "even": [0, 2, 5, 7], "everi": 0, "everyth": 5, "evolv": 4, "exact": [5, 7], "exactli": [1, 5], "exampl": [0, 1, 5, 7], "except": [1, 4, 5], "exchang": 0, "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 4, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 5, 7], "experiment": 7, "explic": 0, "explicit": [1, 5], "explicitli": [5, 7], "exponenti": 5, "express": 0, "extend": [2, 8], "extens": [0, 5], "extent": 7, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "extrem": 5, "f": [0, 4, 5, 7, 8], "fact": 5, "factor": [0, 2], "fail": 5, "failed_sampl": 2, "faint": 0, "fairli": [5, 7], "fallback": 0, "fals": [1, 2, 5, 7, 8], "fame": 0, "fancy_model": 0, "faq\u012bh": 7, "fashion": 5, "faster": [5, 7, 8], "fc1": 5, "fc2": 5, "fd": 2, "featur": [1, 2, 5, 7, 8], "fed": [0, 1, 2, 5, 8], "feed": [0, 1, 5], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [2, 5], "figur": 1, "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "filetyp": 2, "fill": 2, "filter": [1, 2, 5, 8], "final": [0, 2, 4, 5, 7, 8], "find": [0, 5, 7], "fine": [1, 7], "finetun": 5, "finish": 7, "first": [0, 1, 2, 4, 5, 7, 8], "fit": [1, 2, 7], "fix": [0, 5, 7, 8], "flag": [1, 2, 4, 5], "float": [0, 2], "flow": [0, 5], "flush": 2, "fname": 2, "follow": [0, 2, 4, 5, 8], "fondu": 4, "font": 2, "font_styl": 2, "foo": [1, 5], "footrul": 5, "forc": 
[0, 2], "force_binar": 2, "foreground": 0, "forg": 4, "form": [0, 2, 5], "format": [1, 2, 6, 7], "format_typ": 1, "formul": 8, "forward": [2, 8], "found": [0, 1, 2, 5, 7], "four": 0, "fp": 1, "fr_manu_ro": 5, "fr_manu_ro_best": 5, "fr_manu_seg": 5, "fr_manu_seg_best": 5, "fr_manu_seg_with_ro": 5, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "freez": 5, "freeze_backbon": 2, "french": [0, 4, 5], "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4, 5], "function": [1, 5], "fundament": 1, "further": [0, 1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gabai": 4, "gain": 1, "garantue": 2, "gaussian_filt": 2, "gener": [0, 1, 2, 5, 7], "geneva": 4, "gentl": 5, "geometr": 5, "geometri": 2, "german": 4, "get": [0, 1, 4, 5, 7], "get_feature_dim": 2, "git": 4, "github": 4, "githubusercont": 7, "gitter": 4, "give": 8, "given": [1, 2, 5, 8], "glob": [0, 1], "global": 2, "global_align": 2, "glori": 0, "glyph": [5, 7], "gn": 8, "gn32": 5, "gn8": 8, "go": 7, "good": 5, "gov": 5, "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphemat": 4, "graphic": 5, "grave": [2, 4], "grayscal": [0, 1, 2, 5, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 4, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "groupnorm": 8, "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "guidelin": 4, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [0, 1, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "headinglin": 5, "heatmap": [0, 1, 8], "hebrew": [0, 5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "hellip": 4, "help": [4, 7], "henc": 8, "here": [0, 5], "heurist": [0, 5], "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "histor": 4, "hline": 0, "hoc": 5, "hocr": [0, 4, 7], "honor": 0, "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 5, 7], "howev": 8, "hpo": 5, "hpu": 5, "html": 2, "htr": 4, "http": [2, 4, 5, 7], "huffmann": 5, "human": [2, 5], "hundr": 7, "hyper_param": 2, "hyperparamet": 5, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": [2, 5], "ident": [1, 2, 8], "identifi": [0, 2], "idx": 2, "ignor": [0, 2, 5], "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_siz": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_nam": 1, "image_s": [1, 2], "imagefilenam": 5, "imageinputtransform": 2, "imagenam": 2, "imaginari": [2, 7], "img": 2, "immedi": 5, "implement": [0, 1, 8], "impli": 5, "implicitli": 5, "import": [0, 1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "inclus": 0, "incompat": 2, "inconsist": 4, "incorrect": 7, "increas": [5, 7], "independ": 8, "index": [0, 2, 5], "indic": [2, 5, 7], "individu": [0, 2, 5], "individualis": 0, "inf": 5, "infer": [2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [0, 1, 2, 5, 7, 8], "inlin": 0, "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "inria": 4, "ins": 2, "insert": [2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": [5, 7], "instal": 3, "instanc": [0, 1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interact": 0, "interchang": 2, 
"interfac": [2, 4], "interlinearlin": 5, "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "iou": 5, "ipc": 2, "ipu": 5, "irregular": 5, "is_valid": 2, "isn": [1, 2, 7, 8], "italian": 4, "iter": [1, 2, 7], "its": [0, 2, 5, 7, 8], "itself": 1, "j": [2, 4], "jinja": 0, "jinja2": [1, 2], "jpeg": [0, 7], "jpeg2000": [0, 4], "jpg": [0, 5], "json": [0, 2, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "k": 5, "kamil": 5, "keep": [0, 5], "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [0, 5, 7], "keyword": 0, "kind": [0, 2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "l2c_singl": 2, "la": 4, "label": [0, 1, 2, 5], "lack": 7, "lag": 5, "languag": [2, 4, 5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [0, 2, 5, 8], "lastli": 5, "later": [0, 7], "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": 8, "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [5, 7], "leav": [5, 8], "lectaurep": 0, "left": [0, 2, 4, 5, 7], "leftmost": 2, "leftward": 0, "legaci": [5, 7, 8], "legacy_polygon": 2, "legacy_polygons_statu": 2, "leipzig": 7, "len": 2, "length": [2, 5], "less": [5, 7], "let": 7, "letter": [0, 4], "level": [0, 1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": 8, "lib": 1, "libr": 4, "librari": 1, "licens": 0, "ligatur": 4, "light": 0, "lightn": 1, "lightningmodul": 1, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": [0, 5], "line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_idx": 2, "line_it": 2, "line_k": 5, "line_ord": 2, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 2, 4, 5, 7], "liter": 2, "litteratur": 0, "ll": 4, "load": [0, 1, 2, 4, 5, 7], "load_ani": [1, 2], "load_model": [1, 2], "loadabl": 2, "loader": 1, "loc": 5, "local": 5, "locat": [1, 2, 5, 7], "log": [2, 5, 7], "log_dir": 2, "logger": [2, 5], "logic": [2, 5], "logical_ord": 2, "logograph": 5, "long": [0, 4, 5], "longest": 2, "look": [0, 1, 5, 7], "loss": 5, "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": 0, "m": [0, 2, 4, 5, 7, 8], "mac": [4, 7], "machin": 2, "macron": [0, 4], "maddah": 7, "made": 7, "mai": [0, 2, 5, 7], "main": [0, 4, 5, 7], "mainli": 1, "major": 1, "make": [0, 5], "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [0, 1, 2, 5, 7], "manuscript": [0, 4, 7], "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 2, 5], "massag": 5, "match": [2, 5, 8], "materi": [0, 1, 4, 5, 7], "matric": 2, "matrix": [1, 5], "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mb": [0, 5], "mbl_dict": 2, "mean": [1, 2, 5, 7], "measur": 5, "measurementunit": 5, "mediev": [0, 4], "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [0, 1, 2], "metric": 5, "might": [0, 4, 5, 7], "min": [2, 5], 
"min_epoch": 2, "min_length": 2, "mind": 5, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [4, 7], "mix": [0, 2, 5], "ml": 6, "mlmodel": [0, 4, 5, 7], "mlp": 5, "mm_rpred": [1, 2], "mode": [0, 1, 2, 5], "model": [1, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [0, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "mono": 0, "more": [0, 1, 2, 4, 5, 7, 8], "most": [0, 1, 2, 5, 7], "mostli": [0, 1, 4, 5, 7, 8], "move": [2, 7, 8], "mp": 8, "mp2": [5, 8], "mp3": 8, "mreg_dict": 2, "much": [1, 2, 4, 5], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 2, 4, 5, 7], "my": 0, "myprintingcallback": 1, "n": [0, 2, 5, 8], "name": [0, 2, 4, 5, 7, 8], "named_spec": 2, "national": 4, "nativ": [0, 2, 6], "natur": [2, 7], "nchw": 2, "ndarrai": 2, "necessari": [0, 2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 7], "neg": 5, "nest": 2, "net": [1, 2, 7], "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "neural_reading_ord": 2, "never": 7, "nevertheless": [1, 5], "new": [0, 2, 3, 5, 7, 8], "next": [1, 7], "nf": 5, "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": [4, 5], "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "no_legacy_polygon": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [0, 2, 5, 7, 8], "nonlinear": 8, "nop": 1, "norm": 4, "normal": [2, 4], "notabl": 0, "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": [2, 5], "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 2, 7], "numpi": [1, 2], "nvidia": [3, 5], "o": [0, 1, 2, 4, 5, 7], "o1c103": 8, "o2l8": 8, "object": [0, 1, 2], "obtain": 7, "obvious": 7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [0, 1, 5, 7], "ogonek": 4, "old": [0, 2, 6], "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "onc": [0, 5], "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": 5, "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 2, "open": 1, "openmp": [2, 5, 7], "oper": [1, 2, 8], "optic": [0, 7], "optim": [0, 4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 1, 4, 8], "org": [2, 5], "orient": [0, 1, 2], "origin": [1, 2, 5, 8], "orthogon": 2, "other": [0, 4, 5, 7, 8], "otherwis": [2, 5], "out": [0, 5, 7, 8], "output": [1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "outsid": 2, "over": [2, 4], "overal": 5, "overfit": 7, "overhead": 5, "overlap": 5, "overrepres": 5, "overrid": [2, 5], "overwritten": 2, "own": 4, "p": [0, 4, 5], "packag": [2, 4, 7], "pad": [0, 2, 5], "padding_left": 2, "padding_right": 2, "page": [1, 2, 4, 7], "page_doc": 1, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagewiseroset": 2, "pagexml": [0, 1, 4, 7], "paint": 5, "pair": [0, 2, 5], "pairwiseroset": 2, "paper": [0, 4], "par": [1, 4], "paradigm": 0, "paragraph": [2, 5], "parallel": [2, 5, 8], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "parchment": 0, "pari": 4, "pars": [2, 5], "parse_alto": 1, "parse_pag": 1, "parser": [1, 2, 5], "part": [0, 1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlik": 2, "pattern": [2, 7], "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [0, 1, 2, 5, 7], "perc": [0, 2], 
"percentag": 2, "percentil": 2, "perfect": 5, "perform": [1, 2, 4, 5, 7], "period": 7, "perispomeni": 4, "persist": 0, "person": 0, "pick": 5, "pickl": 6, "pil": [1, 2], "pillow": 1, "pinch": 0, "pinpoint": 7, "pipelin": [1, 2, 5], "pixel": [0, 1, 5, 8], "pl_logger": 2, "pl_modul": 1, "place": [0, 4, 5, 7], "placement": 7, "plain": 0, "platform": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [0, 1, 2, 5, 7], "polygon": [0, 1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "porson": 0, "portant": 4, "portion": 0, "posit": [2, 5], "possibl": [0, 1, 2, 4, 5, 7, 8], "postprocess": [1, 2, 5], "potenti": 5, "power": [5, 7], "pratiqu": 4, "pre": [0, 5], "preced": 5, "precis": [2, 5], "precompil": [2, 5], "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2, 5], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preliminari": 0, "preload": 7, "prematur": 5, "prepar": 7, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "present": 4, "preserv": [2, 4], "pretrain_best": 5, "prevent": [2, 7], "previou": [4, 5], "previous": [4, 5], "primaresearch": 5, "primari": [0, 1, 5], "primarili": 4, "princip": [1, 2, 5], "principl": 4, "print": [0, 1, 2, 4, 5, 7], "printspac": 5, "privat": 0, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proce": 8, "proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "processing_step": 2, "processingstep": 2, "produc": [0, 1, 4, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": 6, "proper": 1, "properli": 7, "properti": 2, "proport": 5, "proportion": 2, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": [0, 4], "publish": 4, "pull": 4, "pure": 5, "purpos": [0, 1, 2, 7, 8], "put": [2, 5, 7], "py": 1, "pypi": 4, "pyrnn": 6, "python": 4, "pytorch": [0, 1, 3, 6], "pytorch_lightn": [1, 2], "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [0, 1, 7], "queryabl": 0, "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "raise_on_error": 2, "ran": 4, "random": [2, 5, 7], "randomli": 5, "rang": [0, 2], "rank": 5, "rapidli": [5, 7], "rare": 5, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [0, 1, 5, 7], "rb": 2, "reach": [5, 7], "read": [0, 4], "reader": 5, "reading_ord": 2, "reading_order_fn": 2, "real": 7, "realiz": 5, "reason": [0, 2, 5], "rebuild_alphabet": 2, "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [3, 8], "recognitionmodel": 1, "recommend": [0, 1, 5, 7], "recomput": 2, "record": [1, 2, 4], "rectangl": 2, "rectangular": 0, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "refer": [0, 1, 5, 7], "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_typ": 5, "regular": 5, "reinstanti": 2, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reli": 5, "reliabl": [5, 7], "relu": [5, 8], "remain": [0, 5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "repolygon": 1, "report": [2, 5, 7], "repositori": [4, 5, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "rescal": 2, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolut": 2, "resolv": [4, 5], "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 4, 5, 7, 8], "resum": 5, "retain": [2, 5], "retrain": 7, "retriev": [4, 5, 7], "return": [0, 1, 2, 8], 
"reus": 2, "revers": [4, 8], "rgb": [1, 2, 5, 8], "right": [0, 2, 4, 5, 7], "ring": 4, "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "ro": [2, 5], "ro_id": 2, "ro_net": 5, "roadd": 5, "robust": 5, "romanov": 7, "root": 5, "rotat": 0, "rotrain": 5, "rough": 7, "roughli": 0, "routin": 1, "rpred": 1, "rtl": 0, "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "sa": 0, "same": [0, 1, 2, 4, 5, 7, 8], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 5, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": 5, "schemaloc": 5, "scientif": 4, "score": 5, "scratch": 0, "script": [0, 1, 2, 4, 5, 7], "script_detect": [1, 2], "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": [0, 2], "second": [0, 2], "section": [1, 2, 7], "see": [0, 1, 2, 5, 7], "seen": [0, 1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, "segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "sephardi": 0, "seq1": 2, "seq2": 2, "seqrecogn": 2, "sequenc": [1, 2, 5, 7, 8], "serial": [0, 4, 5, 6, 8], "serialize_segment": 1, "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 4, 7, 8], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "ship": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "side": 0, "sigmoid": 8, "signific": 5, "similar": [1, 5, 7], "simon": 4, "simpl": [0, 1, 5, 7, 8], "simpli": [2, 8], "simplifi": 0, "singl": [0, 1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "skip_empty_lin": 2, "slice": 2, "slightli": [4, 5, 7, 8], "slow": [2, 5], "slower": 5, "small": [0, 1, 2, 4, 5, 7, 8], "so": [0, 1, 2, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": [0, 7], "some": [0, 1, 4, 5, 7], "someon": 0, "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [2, 5, 7, 8], "sourceimageinform": 5, "sp": 5, "space": [0, 1, 2, 4, 5, 7], "spanish": 4, "spearman": 5, "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [0, 5, 7], "specifi": [0, 5], "speckl": 7, "speech": 2, "speed": 5, "speedup": 5, "split": [2, 5, 7, 8], "split_filt": 2, "spot": 4, "sqrt": 5, "squar": 5, "squash": [2, 8], "stabil": 2, "stabl": [1, 4, 5], "stack": [2, 5, 8], "stage": [0, 1, 5], "standard": [0, 1, 4, 5, 7], "start": [0, 1, 2, 5, 7], "start_separ": 2, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1, 2, 4], "stop": [5, 7], "storag": 5, "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": 8, "strong": 4, "structur": [1, 4, 5], "stub": 5, "sub": [1, 2], "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsampl": 5, "subsequ": [1, 2, 5], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], "suffix": 0, "suggest": [0, 1], "suit": 7, "suitabl": [0, 7], "sum": 5, "summar": [2, 5, 7, 8], "superflu": 7, "superscript": 4, "supervis": 5, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [0, 1, 4, 5, 6], 
"suppos": 1, "suppress": [0, 5], "sure": [0, 5], "surfac": [0, 2], "surrog": 5, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [0, 4, 5, 7, 8], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [2, 5], "tags_ignor": 2, "take": [1, 4, 5, 7, 8], "tanh": 8, "target": 2, "target_output_shap": 2, "task": [5, 7], "tb": 2, "technic": 4, "tei": 0, "tell": 5, "templat": [0, 1, 4], "template_sourc": 2, "tempor": 2, "tensor": [1, 2, 8], "tensorboard": 5, "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [1, 2, 4, 7], "text_direct": [1, 2], "text_transform": 2, "textblock": 5, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": 5, "textregion": 5, "th": 2, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 2, 5], "therefor": [0, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "thibault": 4, "thing": 5, "third": 1, "those": [4, 5], "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": [5, 6], "threshold": [0, 2], "through": [0, 1, 2, 4, 5, 7, 8], "thrown": 0, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "tild": [0, 4], "time": [0, 1, 2, 5, 7, 8], "tip": 1, "titl": 0, "titr": 4, "tmpl": [0, 2], "togeth": 8, "token": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], "toplin": [2, 5], "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": [5, 7], "tpu": 5, "tr9": 2, "track": 5, "train": [0, 3, 8], "trainabl": [0, 1, 2, 4, 5], "trainer": [1, 5], "training_data": [1, 5], "training_fil": 1, "transcrib": [4, 5, 7], "transcript": [1, 2, 4, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4, 5, 8], "transformt": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "trial": 5, "true": [1, 2, 8], "truli": 0, "truth": [5, 7], "try": [2, 4, 8], "tupl": 2, "turn": 4, "tutori": [1, 5], "tweak": 0, "two": [0, 1, 2, 5, 8], "txt": [0, 4, 5], "type": [0, 1, 2, 5, 7, 8], "typefac": [5, 7], "typograph": 7, "typologi": 5, "u": [0, 1, 4, 5], "u1f05": 5, "uax": 2, "un": 4, "unclean": 7, "unclear": 5, "undecod": 1, "undegrad": 0, "under": [0, 4], "undesir": [5, 8], "unencod": 2, "uneven": 0, "uni": [0, 7], "unicod": [1, 2, 4, 7], "uniformli": 2, "union": [2, 4, 5], "uniqu": [0, 2, 7], "univers": [0, 4], "universit\u00e9": 4, "unlabel": 5, "unlearn": 5, "unless": 5, "unnecessarili": 1, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "upcom": 4, "updat": 0, "upload": [0, 5], "upon": 0, "upward": [2, 5, 7], "ur": 0, "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "use_legacy_polygon": 2, "user": [0, 2, 4, 5, 7], "user_metadata": 2, "usual": [0, 1, 5, 7], "utf": 5, "util": [1, 4, 5, 7], "v": [4, 5, 7], "v4": 5, "val_loss": 5, "val_spearman": 5, "valid": [0, 2, 5], "valid_baselin": 2, "valid_norm": 2, "valid_region": 2, "valu": [0, 1, 2, 5, 8], "valueerror": 2, "variabl": [2, 4, 5, 8], "variant": [4, 5, 8], "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [0, 1, 2, 5], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": [0, 5], "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": [0, 5], "visual": [0, 5], "vocabulari": 2, "vocal": 7, "vpo": 5, "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": 5, "wa": [2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warmup": 5, "warn": [0, 1, 2, 7], "warp": 7, "wav2vec2": 2, "we": [2, 5, 7], "weak": [1, 7], 
"websit": 7, "weight": [2, 5], "welcom": 4, "well": [0, 5, 7], "were": [2, 5], "west": 4, "western": 7, "wget": 7, "what": [1, 7], "when": [0, 1, 2, 5, 7, 8], "where": [0, 2, 5, 7, 8], "whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "wider": 0, "width": [1, 2, 5, 7, 8], "wildli": 7, "without": [0, 2, 5, 7], "won": 2, "word": [2, 4, 5], "word_text": 5, "work": [0, 1, 2, 5, 7, 8], "workabl": 5, "worker": 5, "world": [0, 7], "worsen": 0, "would": [0, 2, 5], "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], "www": 5, "x": [0, 2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x64": 4, "x_n": 2, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xmax": 2, "xmin": 2, "xml": [0, 7], "xmln": 5, "xmlpage": 2, "xmlschema": 5, "xn": 2, "xsd": 5, "xsi": 5, "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_n": 2, "y_stride": 8, "year": 4, "yield": 2, "yk": 2, "ym": 2, "ymax": 2, "ymin": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "your": 5, "ypogegrammeni": 4, "y\u016bsuf": 7, "zenodo": [0, 4], "zero": [2, 7, 8], "zigzag": 0, "zoom": [0, 2], "\u00e3\u00ed\u00f1\u00f5": 0, "\u00e6\u00df\u00e6\u0111\u0142\u0153\u0153\u0180\u01dd\u0247\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c2\u03c3\u03c4\u03c5\u03c6\u03c7\u03c9\u03db\u05d7\u05dc\u05e8\u1455\u15c5\u15de\u16a0\u00df": 4, "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u0127\u0129\u0142\u0169\u01ba\u1d49\u1ebd": 0, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7, "\u2079\ua751\ua753\ua76f\ua770": 0, "\ua751\ua753\ua757\ua759\ua75f\ua76f\ua775": 4}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"4": 2, "abbyi": 2, "acceler": 3, "acquisit": 7, "advanc": 0, "alto": [2, 5], "annot": 7, "api": [1, 2], "baselin": [0, 1], "basic": [1, 8], "best": 5, "binar": [0, 2], "binari": 5, "blla": 2, "box": 0, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "contain": 2, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "default": 2, "direct": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": [0, 5], "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "hocr": 2, 
"imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [0, 1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "mask": 0, "max": 8, "model": [0, 2, 4, 5, 6], "modul": 2, "network": 8, "normal": [5, 8], "order": [2, 5], "output": 0, "page": [0, 5], "pageseg": 2, "pagexml": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "practic": 5, "preprocess": [1, 7], "pretrain": 5, "princip": 0, "publish": 0, "queri": 0, "quickstart": [1, 4], "read": [2, 5], "recognit": [0, 1, 2, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "retriev": 0, "rpred": 2, "scratch": 5, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": 8, "templat": 2, "test": 5, "text": [0, 5], "train": [1, 2, 4, 5, 7], "trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "unsupervis": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/5.0.0/training.html b/5.0.0/training.html new file mode 100644 index 000000000..316fa422e --- /dev/null +++ b/5.0.0/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly easily for a large number of scripts. In contrast to other systems that require segmentation down to the glyph level before classification, it is uniquely suited to the recognition of connected scripts, because the neural network is trained to assign the correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and recognition, the conversion of line images into text, can be trained in kraken. To train models for either task we require training data, i.e. examples of page segmentations and transcriptions that are similar to what we want to be able to recognize. For segmentation the examples are the locations of baselines, i.e. the imaginary lines the text is written on, and the polygons of regions. For recognition they are the text contained in each line. There are multiple ways to supply training data, but the easiest is through PageXML or ALTO files; a sketch of pointing the training tools directly at such files follows below.

+
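As a minimal sketch, assuming the installed ketos accepts a format switch that auto-detects ALTO/PageXML (the exact option names are an assumption; check ketos segtrain --help and ketos train --help), segmentation and recognition training can be pointed directly at the XML files. The glob training/*.xml is purely illustrative and the XML files must reference the corresponding page images:
$ ketos segtrain -f xml training/*.xml
$ ketos train -f xml training/*.xml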
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works on both Linux and Mac OS X. After installing conda, download the environment file and create the environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/main/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be activated first:

+
$ conda activate kraken
+
+
+
+
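kraken is also published on PyPI, so where conda is not available it can usually be installed into an existing Python environment with pip instead (a sketch only; the conda environment above remains the route described in this guide):
$ pip install kraken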
+

Image acquisition and preprocessing

+

First, a number of high-quality scans, preferably color or grayscale and at least 300dpi, are required. Scans should be in a lossless image format such as TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such as pdftocairo or pdfimages. While each of these requirements can be relaxed to a degree, the final accuracy will suffer to some extent. For example, JPEG scans are generally suitable for training and recognition only when they are just slightly compressed.

+
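As a sketch of the PDF extraction step (the file name scans.pdf and the output prefix page are illustrative), pdftocairo can render every page of a PDF to a 300dpi PNG, producing one numbered image file per page:
$ pdftocairo -png -r 300 scans.pdf page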

Depending on the source of the scans, some preprocessing such as splitting scans into pages, correcting skew and warp, and removing speckles can be advisable, although it isn't strictly necessary, as the segmenter can be trained to handle noisy material with high accuracy. A fairly user-friendly tool for semi-automatic batch processing of image scans is Scantailor, although most of the work can also be done with a standard image editor.

+

The total number of scans required depends on the kind of model to train (segmentation or recognition), the complexity of the layout, and the nature of the script to recognize. Only features that are found in the training data can later be recognized, so it is important that the coverage of typographic features is exhaustive. Training a small segmentation model for a particular kind of material might require fewer than a few hundred samples, while a general model can easily go into the thousands of pages. Likewise, a specific recognition model for a printed script with a small grapheme inventory such as Arabic or Hebrew requires around 800 lines, while manuscripts, complex scripts (such as polytonic Greek), and general models for multiple typefaces and hands need more training data for the same accuracy.

+

There is no hard rule for the amount of training data, and it may be necessary to retrain a model after the initial training data proves insufficient. Most western texts contain between 25 and 40 lines per page, so to reach the roughly 800 lines mentioned above, upward of 30 pages have to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of baselines, regions, and text. There are a number of tools available that can create ALTO and PageXML files containing the requisite information for either segmentation or recognition training: escriptorium integrates kraken tightly, including training and inference, while Aletheia is a powerful desktop application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data obtained through annotation and transcription, e.g. a collection of PAGE XML documents, may now be used to train segmentation and/or transcription models.

+

The training data in output_dir may now be used to train a new model by invoking the ketos train command. Just hand a list of images to the command, such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used to estimate the recognition accuracy achieved in the real world. These lines are never shown to the network during training but are recognized periodically to evaluate the accuracy of the model. By default the validation set comprises 10% of the training data.

+
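The split ratio can usually be changed as well. Assuming the installed ketos exposes a partition option under -p (this flag name is an assumption; verify with ketos train --help), holding out 20% of the lines instead would look like:
$ ketos train -p 0.8 output_dir/*.png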

Basic model training is mostly automatic, although there are multiple parameters that can be adjusted (an example invocation combining several of them follows the list):

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an integer equal to or larger than 1, with 1 meaning a report is created each time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with --load. To continue training from a base model with another training set, refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for accelerated training. The default setting preloads data sets with fewer than 2500 lines; explicitly adding --preload will preload arbitrarily sized sets, while --no-preload disables preloading in all circumstances.

+
+
+
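A sketch combining several of the options described above; the model prefix and the checkpoint file name are placeholders:

$ ketos train --output mymodel --report 1 --savefreq 1 --load mymodel_10.mlmodel output_dir/*.png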

Training a network will take some time on a modern computer, even with the default parameters. While the exact time required is unpredictable, as training is a somewhat random process, a rough guide is that accuracy seldom improves after 50 epochs, which are typically reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a fairly reliable approach known as early stopping, which terminates training as soon as the error rate on the validation set stops improving. This prevents overfitting, i.e. fitting the model to recognize only the training data properly instead of the general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be several models, model_name-1.mlmodel, model_name-2.mlmodel, …, in the directory the command was executed in. Let’s take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This might take a while as preprocessing the whole set and putting it into memory is computationally intensive. Loading can be made faster by disabling preloading, at the cost of performing preprocessing repeatedly during the training process.
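If memory is at a premium, preloading can be disabled explicitly with the switch described above, trading repeated preprocessing for a smaller memory footprint:

$ ketos train --no-preload output_dir/*.png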

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

This line shows the results of the validation set evaluation. The error after 2 epochs is 545 incorrect characters out of 3504 characters in the validation set, for a character accuracy of 84.4%. The error should decrease fairly rapidly. If accuracy remains around 0.30 something is amiss, e.g. non-reordered right-to-left text or wildly incorrect transcriptions. Abort training, correct the error(s), and start again.
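The accuracy figure is simply one minus the ratio of errors to validation set characters, which can be verified quickly on the shell (bc is used here purely for illustration):

$ echo 'scale=4; 1 - 545/3504' | bc
.8445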

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+

ketos can also produce more verbose output with training set and network information by appending one or more -v switches to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a validation set of 88 lines. 49 different classes, i.e. Unicode code points, were found in these 788 lines. These determine the output size of the network; obviously only these 49 classes/code points can later be output by the network. Importantly, we can see that certain characters occur markedly less often than others. Characters like the Syriac feminine dot and numerals that occur fewer than 10 times will most likely not be recognized well by the trained net.

+
+
+

Evaluation and Validation

+

While the output during training is detailed enough to know when to stop training, one usually wants to know the specific kinds of errors to expect. Doing more in-depth error analysis also makes it possible to pinpoint weaknesses in the training data, e.g. above-average error rates for numerals indicate either a lack of representation of numerals in the training data or erroneous transcriptions in the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number of characters in the ground truth, the errors in the recognition output, and the resulting accuracy in percent.

+

The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report consists of the errors sorted by frequency and a per-character accuracy report. Importantly, most errors are incorrect recognitions of combining marks such as dots and diaereses. These may have several sources: different dot placement in the training and validation sets, incorrect or non-systematic transcription, or unclean, speckled scans. Depending on the error source, correction most often involves adding more training data and fixing transcriptions. Sometimes it may even be advisable to remove unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training-related tasks. Optical character recognition is a multi-step process consisting of binarization (conversion of input images to black and white), page segmentation (extracting lines from the image), and recognition (converting line images to character sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.0.0/vgsl.html b/5.0.0/vgsl.html new file mode 100644 index 000000000..0938141c5 --- /dev/null +++ b/5.0.0/vgsl.html @@ -0,0 +1,320 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension.

+

When channels are set to 1, grayscale or B/W inputs are expected; 3 expects RGB color images. Higher values in combination with a height of 1 result in the network being fed 1-pixel-wide grayscale strips scaled to the size of the channel dimension.

+

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.
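To train with a custom architecture, the VGSL string is handed to the training tool. A sketch, assuming the installed ketos version exposes the network specification as -s/--spec (check ketos train --help); the output block is appended automatically as described above:

$ ketos train -s '[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do]' output_dir/*.png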

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do O1c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The extended dropout layer syntax is used to reduce the drop probability on the depth dimension, as the default is too high for convolutional layers. The remainder of the height dimension (12) is reshaped into the depth dimension before applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do O1c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrarily sized color image input, an initial summarizing recurrent layer to squash the height to 64, followed by 2 bidirectional recurrent layers and a linear projection.

+
[1,1800,0,3 Cr3,3,32 Gn8 (I [Cr3,3,64,2,2 Gn8 CTr3,3,32,2,2]) Cr3,3,32 O2l8]
+
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               groupnorm       8 groups
+2               parallel        execute 2.0 and 2.1 in parallel
+2.0             identity
+2.1             serial  execute 2.1.0 to 2.1.2 in sequence
+2.1.0           conv    kernel 3 x 3 stride 2 x 2 filters 64 activation r
+2.1.1           groupnorm       8 groups
+2.1.2           transposed convolution  kernel 3 x 3 stride 2 x 2 filters 2 activation r
+3               conv    kernel 3 x 3 stride 1 x 1 filters 32 activation r
+4               linear  activation sigmoid
+
+
+

A model that outputs heatmaps with 8 feature dimensions, taking color images with +height normalized to 1800 pixels as its input. It uses a strided convolution +to first scale the image down, and then a transposed convolution to transform +the image back to its original size. This is done in a parallel block, where the +other branch simply passes through the output of the first convolution layer. +The input of the last convolutional layer is then the output of the two branches +of the parallel block concatenated, i.e. the output of the first +convolutional layer together with the output of the transposed convolutional layer, +giving 32 + 32 = 64 feature dimensions.

+
+
+

Convolutional Layers

+
C[T][{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>][,<dilation_y>,<dilation_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying the selected nonlinearity. Stride and dilation can be adjusted with the optional last two parameters. T gives a transposed convolution. For transposed convolutions, several output sizes are possible for the same configuration. The system will try to match the output size of the different branches of parallel blocks; however, this will only work if the transposed convolution directly precedes the confluence of the parallel branches, and if the branches with fixed output size come first in the definition of the parallel block. Hence, out of (I [Cr3,3,8,2,2 CTr3,3,8,2,2]), ([Cr3,3,8,2,2 CTr3,3,8,2,2] I) and (I [Cr3,3,8,2,2 CTr3,3,8,2,2 Gn8]) only the first variant will behave correctly.
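As a constructed example of the syntax above (not taken from a shipped model), a 3x3 ReLU convolution with 32 filters, unit stride, and a dilation of 2 in both directions would be written as:

Cr3,3,32,1,1,2,2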

+
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension and the non-time-axis dimension (height/width) is treated as another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors, resulting in an output of shape 1, 16, 906, 25. If this isn’t desired, either run a summarizing layer in the other direction, e.g. Lfys20 resulting in a 1, 1, 906, 20 input to the recurrent layer, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimensions for a 1, 1, 906, 512 input to the recurrent layer.
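A sketch of the two alternatives described above for a 1, 16, 906, 32 input:

Lfys20 Lfx25 summarizes the height dimension first, then runs a forward LSTM along the width on 20 channels.
S1(1x16)1,3 Lfx25 folds the height into the channel dimension before running a forward LSTM on 512 channels.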

+
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a maximum pooling with (y, x) kernel_size and (y_stride, x_stride) stride.

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d into a and b and distributes a into dimension e and b into f. Either e or f has to be equal to d. So S1(1x48)1,3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48-sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.

+
+

Note

+

This S layer is equivalent to the one implemented in the TensorFlow implementation of VGSL, i.e. it behaves differently from the Tesseract one.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to 0.5 drop probability and 1D dropout. Set dim to 2 after convolutional layers.
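For example, as used in the convolutional model above:

Do0.1,2 Inserts 2D dropout with a drop probability of 0.1, suitable for placement directly after a convolution.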

+
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, +normalizing each separately.

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/.buildinfo b/5.2/.buildinfo new file mode 100644 index 000000000..1e7467e74 --- /dev/null +++ b/5.2/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 2a29d0a014577df4956a1f01aa069898 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/5.2/.doctrees/advanced.doctree b/5.2/.doctrees/advanced.doctree new file mode 100644 index 000000000..cc1fdda4c Binary files /dev/null and b/5.2/.doctrees/advanced.doctree differ diff --git a/5.2/.doctrees/api.doctree b/5.2/.doctrees/api.doctree new file mode 100644 index 000000000..c3fac3423 Binary files /dev/null and b/5.2/.doctrees/api.doctree differ diff --git a/5.2/.doctrees/api_docs.doctree b/5.2/.doctrees/api_docs.doctree new file mode 100644 index 000000000..0cb786d97 Binary files /dev/null and b/5.2/.doctrees/api_docs.doctree differ diff --git a/5.2/.doctrees/environment.pickle b/5.2/.doctrees/environment.pickle new file mode 100644 index 000000000..f42ddf8c9 Binary files /dev/null and b/5.2/.doctrees/environment.pickle differ diff --git a/5.2/.doctrees/gpu.doctree b/5.2/.doctrees/gpu.doctree new file mode 100644 index 000000000..55a04ede4 Binary files /dev/null and b/5.2/.doctrees/gpu.doctree differ diff --git a/5.2/.doctrees/index.doctree b/5.2/.doctrees/index.doctree new file mode 100644 index 000000000..2f09cc463 Binary files /dev/null and b/5.2/.doctrees/index.doctree differ diff --git a/5.2/.doctrees/ketos.doctree b/5.2/.doctrees/ketos.doctree new file mode 100644 index 000000000..5d8cb3dec Binary files /dev/null and b/5.2/.doctrees/ketos.doctree differ diff --git a/5.2/.doctrees/models.doctree b/5.2/.doctrees/models.doctree new file mode 100644 index 000000000..4f4f97461 Binary files /dev/null and b/5.2/.doctrees/models.doctree differ diff --git a/5.2/.doctrees/training.doctree b/5.2/.doctrees/training.doctree new file mode 100644 index 000000000..e2f020a73 Binary files /dev/null and b/5.2/.doctrees/training.doctree differ diff --git a/5.2/.doctrees/vgsl.doctree b/5.2/.doctrees/vgsl.doctree new file mode 100644 index 000000000..c5f896e0f Binary files /dev/null and b/5.2/.doctrees/vgsl.doctree differ diff --git a/5.2/.nojekyll b/5.2/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/5.2/_images/blla_heatmap.jpg b/5.2/_images/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/5.2/_images/blla_heatmap.jpg differ diff --git a/5.2/_images/blla_output.jpg b/5.2/_images/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/5.2/_images/blla_output.jpg differ diff --git a/5.2/_images/bw.png b/5.2/_images/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/5.2/_images/bw.png differ diff --git a/5.2/_images/normal-reproduction-low-resolution.jpg b/5.2/_images/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/5.2/_images/normal-reproduction-low-resolution.jpg differ diff --git a/5.2/_images/pat.png b/5.2/_images/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/5.2/_images/pat.png differ diff --git a/5.2/_sources/advanced.rst.txt b/5.2/_sources/advanced.rst.txt new file mode 100644 index 000000000..533e1280f --- /dev/null +++ b/5.2/_sources/advanced.rst.txt @@ -0,0 +1,466 @@ +.. 
_advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text lines images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML. + +Input and Outputs +----------------- + +Kraken inputs and their outputs can be defined in multiple ways. The most +simple are input-output pairs, i.e. producing one output document for one input +document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. + +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. `-I` batch +inputs can also be specified multiple times: + +.. code-block:: console + + $ kraken -I '*.png' -I '*.jpg' -I '*.tif' -o ocr.txt segment ... + +A second way is to input multi-image files directly. These can be either in +PDF, TIFF, or JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Output formats +^^^^^^^^^^^^^^ + +All commands have a default output format such as raw text for `ocr`, a plain +image for `binarize`, or a JSON definition of the the segmentation for +`segment`. These are specific to kraken and generally not suitable for further +processing by other software but a number of standardized data exchange formats +can be selected. Per default `ALTO `_, +`PageXML `_, `hOCR +`_, and abbyyXML containing additional metadata such as +bounding boxes and confidences are implemented. In addition, custom `jinja +`_ templates can be loaded to create +individualised output such as TEI. + +Output formats are selected on the main `kraken` command and apply to the last +subcommand defined in the subcommand chain. For example: + +.. code-block:: console + + $ kraken --alto -i ... segment -bl + +will serialize a plain segmentation in ALTO into the specified output file. + +The currently available format switches are: + +.. code-block:: console + + $ kraken -n -i ... ... # native output + $ kraken -a -i ... ... # ALTO output + $ kraken -x -i ... ... # PageXML output + $ kraken -h -i ... ... # hOCR output + $ kraken -y -i ... ... 
# abbyyXML output + +Custom templates can be loaded with the ``--template`` option: + +.. code-block:: console + + $ kraken --template /my/awesome/template.tmpl -i ... ... + +The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates `here +`_. + +Binarization +------------ + +.. _binarization: + +.. note:: + + Binarization is deprecated and mostly not necessary anymore. It can often + worsen text recognition results especially for documents with uneven + lighting, faint writing, etc. + +The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +============ ==== +option type +============ ==== +\--threshold FLOAT +\--zoom FLOAT +\--escale FLOAT +\--border FLOAT +\--perc INTEGER RANGE +\--range INTEGER +\--low INTEGER RANGE +\--high INTEGER RANGE +============ ==== + +To binarize an image: + +.. code-block:: console + + $ kraken -i input.jpg bw.png binarize + +.. note:: + + Some image formats, notably JPEG, do not support a black and white + image mode. Per default the output format according to the output file + name extension will be honored. If this is not possible, a warning will + be printed and the output forced to PNG: + + .. code-block:: console + + $ kraken -i input.jpg bw.jpg binarize + Binarizing [06/24/22 09:56:23] WARNING jpeg does not support 1bpp images. Forcing to png. + ✓ + +Page Segmentation +----------------- + +The `segment` subcommand accesses page segmentation into lines and regions with +the two layout analysis methods implemented: the trainable baseline segmenter +that is capable of detecting both lines of different types and regions and a +legacy non-trainable segmenter that produces bounding boxes. + +Universal parameters of either segmenter are: + +=============================================== ====== +option action +=============================================== ====== +-d, \--text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +-m, \--mask Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes. +=============================================== ====== + +Baseline Segmentation +^^^^^^^^^^^^^^^^^^^^^ + +The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below: + +.. 
image:: _static/blla_heatmap.jpg + :width: 800 + :alt: BLLA output heatmap + +In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as: + +.. image:: _static/blla_output.jpg + :width: 800 + :alt: BLLA final output + +The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segment is activated with the `-bl` +option: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl + +New models optimized for other kinds of documents can be trained (see +:ref:`here `). These can be applied with the `-i` option of the +`segment` subcommand: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel + +Legacy Box Segmentation +^^^^^^^^^^^^^^^^^^^^^^^ + +The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left). + +Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply :ref:`binarization ` first or supply only +pre-binarized inputs. + +The legacy segmenter can be applied on some input image with: + +.. code-block:: console + + $ kraken -i 14.tif lines.json segment -x + $ cat lines.json + +Available specific parameters are: + +=============================================== ====== +option action +=============================================== ====== +\--scale FLOAT Estimate of the average line height on the page +-m, \--maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, \--black-colseps / -w, \--white-colseps Switch to black column separators. +-r, \--remove-hlines / -l, \--hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +-p, \--pad Adds left and right padding around lines in the output. +=============================================== ====== + +Principal Text Direction +^^^^^^^^^^^^^^^^^^^^^^^^ + +The principal text direction selected with the ``-d/--text-direction`` is a +switch used in the reading order heuristic to determine the order of text +blocks (regions) and individual lines. It roughly corresponds to the `block +flow direction +`_ in CSS with +an additional option. Valid options consist of two parts, an initial principal +line orientation (`horizontal` or `vertical`) followed by a block order (`lr` +for left-to-right or `rl` for right-to-left). + +.. warning: + + The principal text direction is independent of the direction of the + *inline text direction* (which is left-to-right for writing systems like + Latin and right-to-left for ones like Hebrew or Arabic). 
Kraken deals + automatically with the inline text direction through the BiDi algorithm + but can't infer the principal text direction automatically as it is + determined by factors like layout, type of document, primary script in + the document, and other factors. The different types of text + directionality and their relation can be confusing, the `W3C writing + mode `_ document explains + the fundamentals, although the model used in Kraken differs slightly. + +The first part is usually `horizontal` for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom: + +.. image:: _static/bw.png + :width: 800 + :alt: Horizontal Latin script text + +Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left: + +.. image:: https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg/577px-Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg + :width: 800 + :alt: Vertical Chinese text + +The second part is dependent on a number of factors as the order in which text +blocks are read is not fixed for every writing system. In mono-script texts it +is usually determined by the inline text direction, i.e. Latin script texts +columns are read starting with the top-left column followed by the column to +its right and so on, continuing with the left-most column below if none remain +to the right (inverse for right-to-left scripts like Arabic which start on the +top right-most columns, continuing leftward, and returning to the right-most +column just below when none remain). + +In multi-script documents the order is determined by the primary writing +system employed in the document, e.g. for a modern book containing both Latin +and Arabic script text it would be set to `lr` when Latin is primary, e.g. when +the binding is on the left side of the book seen from the title cover, and +vice-versa (`rl` if binding is on the right on the title cover). The analogue +applies to text written with vertical lines. + +With these explications there are four different text directions available: + +=============================================== ====== +Text Direction Examples +=============================================== ====== +horizontal-lr Latin script texts, Mixed LTR/RTL docs with principal LTR script +horizontal-rl Arabic script texts, Mixed LTR/RTL docs with principal RTL script +vertical-lr Vertical script texts read from left-to-right. +vertical-rl Vertical script texts read from right-to-left. +=============================================== ====== + +Masking +^^^^^^^ + +It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -m mask.png + +Model Repository +---------------- + +.. _repo: + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands. + +Querying and Model Retrieval +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. 
code-block:: console + + $ kraken list + Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07 + 10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration) + 10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature) + 10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0 + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.5617783 + name: 10.5281/zenodo.5617783 + + Cremma-Medieval Old French Model (Litterature) + + .... + scripts: Latn + alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128 + accuracy: 95.49% + license: CC-BY-SA-2.0 + author(s): Pinche, Ariane + date: 2021-10-29 + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.5617783 + Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10 + Model name: cremma_medieval_bicerin.mlmodel + +Models will be placed in ``$XDG_BASE_DIR`` and can be accessed using their name as +printed in the last line of the ``kraken get`` output. + +.. code-block:: console + + $ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel + +Publishing +^^^^^^^^^^ + +When one would like to share a model with the wider world (for fame and glory!) +it is possible (and recommended) to upload them to repository. The process +consists of 2 stages: the creation of the deposit on the Zenodo platform +followed by approval of the model in the community making it discoverable for +other kraken users. + +For uploading model a Zenodo account and a personal access token is required. +After account creation tokens can be created under the account settings: + +.. image:: _static/pat.png + :width: 800 + :alt: Zenodo token creation dialogue + +With the token models can then be uploaded: + +.. code-block:: console + + $ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617783 + +A number of important metadata will be asked for such as a short description of +the model, long form description, recognized scripts, and authorship. +Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. +can't be changed or deleted so it is important to make sure that all the +information is correct. Each deposit also has a unique persistent identifier, a +DOI, that can be used to refer to it, e.g. in publications or when pointing +someone to a particular model. + +Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users. + +It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with `kraken get` +and its DOI. 
It is mostly suggested for preliminary models that might get +updated later: + +.. code-block:: console + + $ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617734 + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:antiqua.mlmodel + +All polytonic Greek text portions will be recognized using the `porson.mlmodel` +model while Latin text will be fed into the `antiqua.mlmodel` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.mlmodel + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. diff --git a/5.2/_sources/api.rst.txt b/5.2/_sources/api.rst.txt new file mode 100644 index 000000000..56d0fca81 --- /dev/null +++ b/5.2/_sources/api.rst.txt @@ -0,0 +1,546 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. 
code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + Segmentation(type='bbox', + imagename='foo.png', + text_direction='horizontal-lr', + script_detection=False, + lines=[BBoxLine(id='0ce11ad6-1f3b-4f7d-a8c8-0178e411df69', + bbox=[74, 61, 136, 101], + text=None, + base_dir=None, + type='bbox', + imagename=None, + tags=None, + split=None, + regions=None, + text_direction='horizontal-lr'), + BBoxLine(id='c4a751dc-6731-4eea-a287-d4b57683f5b0', ...), + ....], + regions={}, + line_orders=[]) + +All segmentation methods return a :class:`kraken.containers.Segmentation` +object that contains all elements of the segmentation: its type, a list of +lines (either :class:`kraken.containers.BBoxLine` or +:class:`kraken.containers.BaselineLine`), a dictionary mapping region types to +lists of regions (:class:`kraken.containers.Region`), and one or more line +reading orders. + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + Segmentation(type='baselines', + imagename='foo.png', + text_direction='horizontal-lr', + script_detection=False, + lines=[BaselineLine(id='22fee3d1-377e-4130-b9e5-5983a0c50ce8', + baseline=[[71, 93], [145, 92]], + boundary=[[71, 93], ..., [71, 93]], + text=None, + base_dir=None, + type='baselines', + imagename=None, + tags={'type': 'default'}, + split=None, + regions=['f17d03e0-50bb-4a35-b247-cb910c0aaf2b']), + BaselineLine(id='539eadce-f795-4bba-a785-c7767d10c407', ...), ...], + regions={'text': [Region(id='f17d03e0-50bb-4a35-b247-cb910c0aaf2b', + boundary=[[277, 54], ..., [277, 54]], + imagename=None, + tags={'type': 'text'})]}, + line_orders=[]) + >>> alto = serialization.serialize(baseline_seg, + image_size=im.size, + template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +A default segmentation model is supplied and will be used if none is specified +explicitly as an argument. Optional parameters are largely the same as for the +legacy segmenter, i.e. text direction and masking. 
+ +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. + +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. 
The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(network=model, + im=im, + segmentation=baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but, depending on the type of +segmentation supplied, a :class:`kraken.containers.BaselineOCRRecord` or +:class:`kraken.containers.BBoxOCRRecord` record object containing the character +prediction, cuts (approximate locations), and confidences. + +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'bbox' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformation by the functional blocks: + +Parsing is accessed is through the :class:`kraken.lib.xml.XMLPage` class. + +.. 
code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> parsed_doc = xml.XMLPage(alto_doc) + >>> parsed_doc + XMLPage(filename='/path/to/alto', filetype=alto) + >>> parsed_doc.lines + {'line_1469098625593_463': BaselineLine(id='line_1469098625593_463', + baseline=[(2337, 226), (2421, 239)], + boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)], + text='$pag:39', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pag'}, + split=None, + regions=['region_1469098609000_462']), + + 'line_1469098649515_464': BaselineLine(id='line_1469098649515_464', + baseline=[(789, 269), (2397, 304)], + boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)], + text='$-nor su hijo, De todos sus bienes, con los pactos', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pac'}, + split=None, + regions=['region_1469098557906_461']), + ....} + >>> parsed_doc.regions + {'$pag': [Region(id='region_1469098609000_462', + boundary=[(2324, 171), (2437, 171), (2436, 258), (2326, 237)], + imagename=None, + tags={'type': '$pag'})], + '$pac': [Region(id='region_1469098557906_461', + boundary=[(738, 203), (2339, 245), (2398, 294), (2446, 345), (2574, 469), (2539, 1873), (2523, 2053), (2477, 2182), (738, 2243)], + imagename=None, + tags={'type': '$pac'})], + '$tip': [Region(id='TextRegion_1520586482298_194', + boundary=[(687, 2428), (688, 2422), (107, 2420), (106, 2264), (789, 2256), (758, 2404)], + imagename=None, + tags={'type': '$tip'})], + '$par': [Region(id='TextRegion_1520586482298_193', + boundary=[(675, 3772), (687, 2428), (758, 2404), (789, 2256), (2542, 2236), (2581, 3748)], + imagename=None, + tags={'type': '$par'})] + } + +The parser is aware of reading order(s), thus the basic properties accessing +lines and regions are unordered dictionaries. Reading orders can be accessed +separately through the `reading_orders` property: + +.. code-block:: python + + >>> parsed_doc.region_orders + {'line_implicit': {'order': ['line_1469098625593_463', + 'line_1469098649515_464', + ... + 'line_1469099255968_508'], + 'is_total': True, + 'description': 'Implicit line order derived from element sequence'}, + 'region_implicit': {'order': ['region_1469098609000_462', + ... + 'TextRegion_1520586482298_193'], + 'is_total': True, + 'description': 'Implicit region order derived from element sequence'}, + 'region_transkribus': {'order': ['region_1469098609000_462', + ... + 'TextRegion_1520586482298_193'], + 'is_total': True, + 'description': 'Explicit region order from `custom` attribute'}, + 'line_transkribus': {'order': ['line_1469098625593_463', + ... + 'line_1469099255968_508'], + 'is_total': True, + 'description': 'Explicit line order from `custom` attribute'}, + 'o_1530717944451': {'order': ['region_1469098609000_462', + ... + 'TextRegion_1520586482298_193'], + 'is_total': True, + 'description': 'Regions reading order'}} + +Reading orders are created from different sources, depending on the content of +the XML file. Every document will contain at least implicit orders for lines +and regions (`line_implicit` and `region_implicit`) sourced from the sequence +of line and region elements. There can also be explicit additional orders +defined by the standard reading order elements, for example `o_1530717944451` +in the above example. In Page XML files reading orders defined with the +Transkribus style custom attribute are also recognized. + +To access the lines or regions of a document in a particular order: + +.. 
code-block:: python + + >>> parsed_doc.get_sorted_lines(ro='line_implicit') + [BaselineLine(id='line_1469098625593_463', + baseline=[(2337, 226), (2421, 239)], + boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)], + text='$pag:39', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pag'}, + split=None, + regions=['region_1469098609000_462']), + BaselineLine(id='line_1469098649515_464', + baseline=[(789, 269), (2397, 304)], + boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)], + text='$-nor su hijo, De todos sus bienes, con los pactos', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pac'}, + split=None, + regions=['region_1469098557906_461']) + ...] + +The recognizer functions do not accept :class:`kraken.lib.xml.XMLPage` objects +directly which means that for most practical purposes these need to be +converted into :class:`container ` objects: + +.. code-block:: python + + >>> segmentation = parsed_doc.to_container() + >>> pred_it = rpred(network=model, + im=im, + segmentation=segmentation) + >>> for record in pred_it: + print(record) + + +Serialization +------------- + + +The serialization module can be used to transform results returned by the +segmenter or recognizer into a text based (most often XML) format for archival. +The module renders `jinja2 `_ templates, +either ones :ref:`packaged ` with kraken or supplied externally, +through the :func:`kraken.serialization.serialize` function. + +.. code-block:: python + + >>> import dataclasses + >>> from kraken.lib import serialization + + >>> alto_seg_only = serialization.serialize(baseline_seg, image_size=im.size, template='alto') + + >>> records = [record for record in pred_it] + >>> results = dataclasses.replace(pred_it.bounds, lines=records) + >>> alto = serialization.serialize(results, image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + +The serialization function accepts arbitrary +:class:`kraken.containers.Segmentation` objects, which may contain textual or +only segmentation information. As the recognizer returns +:class:`ocr_records ` which cannot be serialized +directly it is necessary to either construct a new +:class:`kraken.containers.Segmentation` from scratch or insert them into the +segmentation fed into the recognizer (:class:`ocr_records +` subclass :class:`BaselineLine +`/:class:`BBoxLine +` The container classes are immutable data classes, +therefore it is necessary for simple insertion of the records to use +`dataclasses.replace` to create a new segmentation with a changed lines +attribute. + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. 
code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/5.2/_sources/api_docs.rst.txt b/5.2/_sources/api_docs.rst.txt new file mode 100644 index 000000000..494232c09 --- /dev/null +++ b/5.2/_sources/api_docs.rst.txt @@ -0,0 +1,289 @@ +************* +API Reference +************* + +Segmentation +============ + +kraken.blla module +------------------ + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +--------------------- + +.. note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +Recognition +=========== + +kraken.rpred module +------------------- + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapifunction:: kraken.rpred.rpred + +Serialization +============= + +kraken.serialization module +--------------------------- + +.. autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +Default templates +----------------- + +.. _templates: + +ALTO 4.4 +^^^^^^^^ + +.. literalinclude:: ../kraken/templates/alto + :language: xml+jinja + +PageXML +^^^^^^^ + +.. literalinclude:: ../kraken/templates/alto + :language: xml+jinja + +hOCR +^^^^ + +.. literalinclude:: ../kraken/templates/alto + :language: xml+jinja + +ABBYY XML +^^^^^^^^^ + +.. 
literalinclude:: ../kraken/templates/abbyyxml + :language: xml+jinja + +Containers and Helpers +====================== + +kraken.lib.codec module +----------------------- + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.containers module +------------------------ + +.. autoapiclass:: kraken.containers.Segmentation + :members: + +.. autoapiclass:: kraken.containers.BaselineLine + :members: + +.. autoapiclass:: kraken.containers.BBoxLine + :members: + +.. autoapiclass:: kraken.containers.Region + :members: + +.. autoapiclass:: kraken.containers.ocr_record + :members: + +.. autoapiclass:: kraken.containers.BaselineOCRRecord + :members: + +.. autoapiclass:: kraken.containers.BBoxOCRRecord + :members: + +.. autoapiclass:: kraken.containers.ProcessingStep + :members: + +kraken.lib.ctc_decoder +---------------------- + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +--------------------- + +.. autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + +kraken.lib.models module +------------------------ + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.neural_reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + +kraken.lib.vgsl module +---------------------- + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +--------------------- + +.. autoapiclass:: kraken.lib.xml.XMLPage + +Training +======== + +kraken.lib.train module +----------------------- + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +------------------------- + +Recognition datasets +^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.ArrowIPCRecognitionDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. 
autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Segmentation datasets +^^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +Reading order datasets +^^^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.PairWiseROSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PageWiseROSet + :members: + +Helpers +^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.ImageInputTransforms + :members: + +.. autoapifunction:: kraken.lib.dataset.collate_sequences + +.. autoapifunction:: kraken.lib.dataset.global_align + +.. autoapifunction:: kraken.lib.dataset.compute_confusions + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/5.2/_sources/gpu.rst.txt b/5.2/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/5.2/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/5.2/_sources/index.rst.txt b/5.2/_sources/index.rst.txt new file mode 100644 index 000000000..dda41e35e --- /dev/null +++ b/5.2/_sources/index.rst.txt @@ -0,0 +1,247 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. + +Features +======== + +kraken's main features are: + + - Fully trainable :ref:`layout analysis `, :ref:`reading order `, and :ref:`character recognition ` + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - :ref:`Public repository ` of model files + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. 
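+
+A quick way to check that the package was picked up by the environment you
+expect is to query pip and import the module; this is a generic check using
+only standard ``pip`` and ``python`` invocations, nothing kraken-specific:
+
+.. code-block:: console
+
+ $ pip show kraken
+ $ python -c "import kraken; print(kraken.__file__)"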
+ +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a model to do the actual recognition of +characters. To download the default model for printed French text and place it +in the kraken directory for the current user: + +:: + + $ kraken get 10.5281/zenodo.10592716 + + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.10592716 + name: 10.5281/zenodo.10592716 + + CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages + +

+ CATMuS-Print (Large) - Diachronic model for French prints and other West European languages
+
+ CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, dealing with different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian…) and different centuries (from the first prints of the 16th c. to digital documents of the 21st century).
+
+ Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligature (except those that still exist), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies might be present, because transcriptions have been done over several years and the norms have slightly evolved.
+
+ The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.
+
+ This model is the result of the collaboration from researchers from the University of Geneva and Inria Paris and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.
+
+ scripts: Latn + alphabet: !"#$%&'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz|}~¡£¥§«¬°¶·»¿ÆßæđłŒœƀǝɇΑΒΓΔΕΖΘΙΚΛΜΝΟΠΡΣΤΥΦΧΩαβγδεζηθικλμνξοπρςστυφχωϛחלרᑕᗅᗞᚠẞ–—‘’‚“”„‟†•⁄⁊⁋℟←▽◊★☙✠✺✻⟦⟧⬪ꝑꝓꝗꝙꝟꝯꝵ SPACE, COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING DOT ABOVE, COMBINING DIAERESIS, COMBINING RING ABOVE, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING CEDILLA, COMBINING OGONEK, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER U, 0xe682, 0xe68b, 0xe8bf, 0xf1a7 + accuracy: 98.56% + license: cc-by-4.0 + author(s): Gabay, Simon; Clérice, Thibault + date: 2024-01-30 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the previously downloaded model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `eScriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: _static/normal-reproduction-low-resolution.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. 
image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png + :width: 300 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005 (Biblissima+). + + diff --git a/5.2/_sources/ketos.rst.txt b/5.2/_sources/ketos.rst.txt new file mode 100644 index 000000000..b2b2b00e8 --- /dev/null +++ b/5.2/_sources/ketos.rst.txt @@ -0,0 +1,823 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +There are currently three trainable components in the kraken processing pipeline: +* Segmentation: finding lines and regions in images +* Reading Order: ordering lines found in the previous segmentation step. Reading order models are closely linked to segmentation models and both are usually trained on the same dataset. +* Recognition: recognition models transform images of lines into text. + +Depending on the use case it is not necessary to manually train new models for +each material. The default segmentation model works well on quite a variety of +handwritten and printed documents, a reading order model might not perform +better than the default heuristic for simple text flows, and there are +recognition models for some types of material available in the repository. + +Best practices +-------------- + +Recognition model training +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* The default architecture works well for decently sized datasets. +* Use precompiled binary datasets and put them in a place where they can be memory mapped during training (local storage, not NFS or similar). +* Use the ``--logger`` flag to track your training metrics across experiments using Tensorboard. +* If the network doesn't converge before the early stopping aborts training, increase ``--min-epochs`` or ``--lag``. Use the ``--logger`` option to inspect your training loss. +* Use the flag ``--augment`` to activate data augmentation. +* Increase the amount of ``--workers`` to speedup data loading. This is essential when you use the ``--augment`` option. +* When using an Nvidia GPU, set the ``--precision`` option to 16 to use automatic mixed precision (AMP). This can provide significant speedup without any loss in accuracy. +* Use option -B to scale batch size until GPU utilization reaches 100%. When using a larger batch size, it is recommended to use option -r to scale the learning rate by the square root of the batch size (1e-3 * sqrt(batch_size)). +* When fine-tuning, it is recommended to use `new` mode not `union` as the network will rapidly unlearn missing labels in the new dataset. +* If the new dataset is fairly dissimilar or your base model has been pretrained with ketos pretrain, use ``--warmup`` in conjunction with ``--freeze-backbone`` for one 1 or 2 epochs. +* Upload your models to the model repository. + +Segmentation model training +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* The segmenter is fairly robust when it comes to hyperparameter choice. +* Start by finetuning from the default model for a fixed number of epochs (50 for reasonably sized datasets) with a cosine schedule. +* Segmentation models' performance is difficult to evaluate. 
Pixel accuracy doesn't mean much because there are many more pixels that aren't part of a line or region than just background. Frequency-weighted IoU is good for overall performance, while mean IoU overrepresents rare classes. The best way to evaluate segmentation models is to look at the output on unlabelled data. +* If you don't have rare classes you can use a fairly small validation set to make sure everything is converging and just visually validate on unlabelled data. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation and reading order training and the binary format for +recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.3. An example showing the +attributes necessary for segmentation, recognition, and reading order training +follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +from a variety of tools. As with ALTO, PAGE XML files can be used to train +segmentation, reading order, and recognition models. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported for +recognition training. Binary datasets drastically improve loading performance +allowing the saturation of most GPUs with minimal computational overhead while +also allowing training with datasets that are larger than the systems main +memory. A minor drawback is a ~30% increase in dataset size in comparison to +the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the ``--workers`` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... + +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... 
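+ # a sketch combining the random split with the other compile options shown
+ # above; the file names are placeholders
+ $ ketos compile --random-split 0.8 0.1 0.1 --workers 8 -f xml -o dataset.arrow file_1.xml file_2.xml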
+ +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +Recognition training +-------------------- + +.. _predtrain: + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, \--output Output model file prefix. Defaults to model. +-s, \--spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, \--append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, \--load Load existing file to continue training +-F, \--savefreq Model save frequency in epochs during + training +-q, \--quit Stop condition for training. Set to `early` + for early stopping (default) or `fixed` for fixed + number of epochs. +-N, \--epochs Number of epochs to train for. +\--min-epochs Minimum number of epochs to train for when using early stopping. +\--lag Number of epochs to wait before stopping + training without improvement. Only used when using early stopping. +-d, \--device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +\--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, \--lrate Learning rate [default: 0.001] +-m, \--momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, \--weight-decay Weight decay. +\--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, \--partition Ground truth data partition ratio between train/validation set +-u, \--normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, \--codec Load a codec JSON definition (invalid if loading existing model) +\--resize Codec/output layer resizing option. If set + to `union` code points will be added, `new` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, \--reorder / \--no-reorder Reordering of code points to display order. +-t, \--training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, \--evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, \--format-type Sets the training and evaluation data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +\--augment / \--no-augment Enables/disables data augmentation. 
+\--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases changing the network architecture might be useful. One such +example would be material that is not well recognized in the grayscale domain, +as the default architecture definition converts images into grayscale. The +input definition can be changed quite easily to train on color data (RGB) instead: + +.. code-block:: console + + $ ketos train -f page -s '[1,120,0,3 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do]]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. 
code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``union`` and ``new``. +``union`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``new`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize union -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize new -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``new`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. 
code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. 
code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Unsupervised recognition pretraining +------------------------------------ + +Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices. + +All data sources accepted by the supervised trainer are valid for pretraining +but for performance reasons it is recommended to use pre-compiled binary +datasets. One thing to keep in mind is that compilation filters out empty +(non-transcribed) text lines per default which is undesirable for pretraining. +With the ``--keep-empty-lines`` option all valid lines will be written to the +dataset file: + +.. code-block:: console + + $ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml + + +The basic pretraining call is very similar to a training one: + +.. 
code-block:: console + + $ ketos pretrain -f binary foo.arrow + +There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples. + +.. code-block:: console + + $ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow + +Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced: + +.. code-block:: console + + $ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow + +It is necessary to use learning rate warmup (`warmup`) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations. + +Segmentation training +--------------------- + +.. _segtrain: + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize new -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. 
code-block:: console + + $ ketos segtrain -f xml --valid-baselines default training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + paragraph 6 10218 + +Finally, we can merge baselines and regions into each other: + +.. code-block:: console + + $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml + Training line types: + default 2 54114 + ... + $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml + ... + Training region types: + graphic 3 151 + text 4 11346 + separator 5 5431 + ... + +These options are combinable to massage the dataset into any typology you want. +Tags containing the separator character `:` can be specified by escaping them +with backslash. + +Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option: + +.. code-block:: console + + $ ketos segtrain --topline -f xml hebrew_training_data/*.xml + $ ketos segtrain --centerline -f xml chinese_training_data/*.xml + $ ketos segtrain --baseline -f xml latin_training_data/*.xml + +Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved: + +.. code-block:: console + + $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml + ... + +Reading order training +---------------------- + +.. _rotrain: + +Reading order models work slightly differently from segmentation and reading +order models. They are closely linked to the typology used in the dataset they +were trained on as they use type information on lines and regions to make +ordering decisions. As the same typology was probably used to train a specific +segmentation model, reading order models are trained separately but bundled +with their segmentation model in a subsequent step. The general sequence is +therefore: + +.. code-block:: console + + $ ketos segtrain -o fr_manu_seg.mlmodel -f xml french/*.xml + ... + $ ketos rotrain -o fr_manu_ro.mlmodel -f xml french/*.xml + ... + $ ketos roadd -o fr_manu_seg_with_ro.mlmodel -i fr_manu_seg_best.mlmodel -r fr_manu_ro_best.mlmodel + +Only the `fr_manu_seg_with_ro.mlmodel` file will contain the trained reading +order model. Segmentation models can exist with or without reading order +models. If one is added, the neural reading order will be computed *in +addition* to the one produced by the default heuristic during segmentation and +serialized in the final XML output (in ALTO/PAGE XML). + +.. note:: + + Reading order models work purely on the typology and geometric features + of the lines and regions. They construct an approximate ordering matrix + by feeding feature vectors of two lines (or regions) into the network + to decide which of those two lines precedes the other. + + These feature vectors are quite simple; just the lines' types, and + their start, center, and end points. 
Therefore they can *not* reliably + learn any ordering relying on graphical features of the input page such + as: line color, typeface, or writing system. + +Reading order models are extremely simple and do not require a lot of memory or +computational power to train. In fact, the default parameters are extremely +conservative and it is recommended to increase the batch size for improved +training speed. Large batch size above 128k are easily possible with +sufficiently large training datasets: + +.. code-block:: console + + $ ketos rotrain -o fr_manu_ro.mlmodel -B 128000 -f french/*.xml + Training RO on following baselines types: + DefaultLine 1 + DropCapitalLine 2 + HeadingLine 3 + InterlinearLine 4 + GPU available: False, used: False + TPU available: False, using: 0 TPU cores + IPU available: False, using: 0 IPUs + HPU available: False, using: 0 HPUs + ┏━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ + ┃ ┃ Name ┃ Type ┃ Params ┃ + ┡━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ + │ 0 │ criterion │ BCEWithLogitsLoss │ 0 │ + │ 1 │ ro_net │ MLP │ 1.1 K │ + │ 2 │ ro_net.fc1 │ Linear │ 1.0 K │ + │ 3 │ ro_net.relu │ ReLU │ 0 │ + │ 4 │ ro_net.fc2 │ Linear │ 45 │ + └───┴─────────────┴───────────────────┴────────┘ + Trainable params: 1.1 K + Non-trainable params: 0 + Total params: 1.1 K + Total estimated model params size (MB): 0 + stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/35 0:00:00 • -:--:-- 0.00it/s val_spearman: 0.912 val_loss: 0.701 early_stopping: 0/300 inf + +During validation a metric called Spearman's footrule is computed. To calculate +Spearman's footrule, the ranks of the lines of text in the ground truth reading +order and the predicted reading order are compared. The footrule is then +calculated as the sum of the absolute differences between the ranks of pairs of +lines. The score increases by 1 for each line between the correct and predicted +positions of a line. + +A lower footrule score indicates a better alignment between the two orders. A +score of 0 implies perfect alignment of line ranks. + +Recognition testing +------------------- + +Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-f, \--format-type Sets the test set data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +-m, \--model Model(s) to evaluate. +-e, \--evaluation-files File(s) with paths to evaluation data. +-d, \--device Select device to use. +\--pad Left and right padding around lines. +======================================================= ====== + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with ``-e/--evaluation-files`` or by just +adding a number of image files as the final argument: + +.. 
code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. diff --git a/5.2/_sources/models.rst.txt b/5.2/_sources/models.rst.txt new file mode 100644 index 000000000..b393f0738 --- /dev/null +++ b/5.2/_sources/models.rst.txt @@ -0,0 +1,24 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Segmentation Models +------------------- + +Recognition Models +------------------ + + diff --git a/5.2/_sources/training.rst.txt b/5.2/_sources/training.rst.txt new file mode 100644 index 000000000..704727aa5 --- /dev/null +++ b/5.2/_sources/training.rst.txt @@ -0,0 +1,463 @@ +.. _training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. 
+ +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/main/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. + +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Dataset Compilation +------------------- + +.. _compilation: + +Training +-------- + +.. _training_step: + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. 
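+Segmentation training uses the same annotated files. A minimal sketch,
+assuming the PageXML documents are collected in ``output_dir`` and that the
+``ketos segtrain`` subcommand of your kraken version accepts the ``-f xml``
+format switch and the ``-o`` output prefix (see ``ketos segtrain --help``):
+
+.. code-block:: console
+
+   $ ketos segtrain -f xml -o seg_model output_dir/*.xml
+
+The remainder of this section walks through recognition training with
+``ketos train``.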
+ +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. ``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldom improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. 
This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... + [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. 
Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . } + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. 
code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/5.2/_sources/vgsl.rst.txt b/5.2/_sources/vgsl.rst.txt new file mode 100644 index 000000000..6a0c42de4 --- /dev/null +++ b/5.2/_sources/vgsl.rst.txt @@ -0,0 +1,233 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. 
Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +.. 
code-block:: console + + [1,1800,0,3 Cr3,3,32 Gn8 (I [Cr3,3,64,2,2 Gn8 CTr3,3,32,2,2]) Cr3,3,32 O2l8] + + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 groupnorm 8 groups + 2 parallel execute 2.0 and 2.1 in parallel + 2.0 identity + 2.1 serial execute 2.1.0 to 2.1.2 in sequence + 2.1.0 conv kernel 3 x 3 stride 2 x 2 filters 64 activation r + 2.1.1 groupnorm 8 groups + 2.1.2 transposed convolution kernel 3 x 3 stride 2 x 2 filters 2 activation r + 3 conv kernel 3 x 3 stride 1 x 1 filters 32 activation r + 4 linear activation sigmoid + +A model that outputs heatmaps with 8 feature dimensions, taking color images with +height normalized to 1800 pixels as its input. It uses a strided convolution +to first scale the image down, and then a transposed convolution to transform +the image back to its original size. This is done in a parallel block, where the +other branch simply passes through the output of the first convolution layer. +The input of the last convolutional layer is then the output of the two branches +of the parallel block concatenated, i.e. the output of the first +convolutional layer together with the output of the transposed convolutional layer, +giving `32 + 32 = 64` feature dimensions. + +Convolutional Layers +-------------------- + +.. code-block:: console + + C[T][{name}](s|t|r|l|m)[{name}],,[,,][,,] + s = sigmoid + t = tanh + r = relu + l = linear + m = softmax + +Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying +the selected nonlinearity. Stride and dilation can be adjusted with the optional last +two parameters. `T` gives a transposed convolution. For transposed convolutions, +several output sizes are possible for the same configuration. The system +will try to match the output size of the different branches of parallel +blocks, however, this will only work if the transposed convolution directly +proceeds the confluence of the parallel branches, and if the branches with +fixed output size come first in the definition of the parallel block. Hence, +out of `(I [Cr3,3,8,2,2 CTr3,3,8,2,2])`, `([Cr3,3,8,2,2 CTr3,3,8,2,2] I)` +and `(I [Cr3,3,8,2,2 CTr3,3,8,2,2 Gn8])` only the first variant will +behave correctly. + +Recurrent Layers +---------------- + +.. code-block:: console + + L[{name}](f|r|b)(x|y)[s][{name}] LSTM cell with n outputs. + G[{name}](f|r|b)(x|y)[s][{name}] GRU cell with n outputs. + f runs the RNN forward only. + r runs the RNN reversed only. + b runs the RNN bidirectionally. + s (optional) summarizes the output in the requested dimension, return the last step. + +Adds either an LSTM or GRU recurrent layer to the network using either the `x` +(width) or `y` (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32` +input will execute 16 independent forward passes on `906x32` tensors resulting +in an output of shape `1, 16, 906, 25`. If this isn't desired either run a +summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1, +906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and +channel dimension for an `1, 1, 906, 512` input to the recurrent layer. + +Helper and Plumbing Layers +-------------------------- + +Max Pool +^^^^^^^^ +.. code-block:: console + + Mp[{name}],[,,] + +Adds a maximum pooling with `(y, x)` kernel_size and `(y_stride, x_stride)` stride. + +Reshape +^^^^^^^ + +.. 
code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. + +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +Dropout +^^^^^^^ + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/5.2/_static/alabaster.css b/5.2/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/5.2/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + 
+div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + 
display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px 
-30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. */ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/5.2/_static/basic.css b/5.2/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/5.2/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + 
+div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 
8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + 
+dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} 
+ +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/5.2/_static/blla_heatmap.jpg b/5.2/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/5.2/_static/blla_heatmap.jpg differ diff --git a/5.2/_static/blla_output.jpg b/5.2/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/5.2/_static/blla_output.jpg differ diff --git a/5.2/_static/bw.png b/5.2/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/5.2/_static/bw.png differ diff --git a/5.2/_static/custom.css b/5.2/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/5.2/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/5.2/_static/doctools.js b/5.2/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/5.2/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/5.2/_static/documentation_options.js b/5.2/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/5.2/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/5.2/_static/file.png b/5.2/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/5.2/_static/file.png differ diff --git a/5.2/_static/graphviz.css b/5.2/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/5.2/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * Sphinx 
stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/5.2/_static/kraken.png b/5.2/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/5.2/_static/kraken.png differ diff --git a/5.2/_static/kraken_recognition.svg b/5.2/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/5.2/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... + + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/5.2/_static/kraken_segmentation.svg b/5.2/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/5.2/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/5.2/_static/kraken_segmodel.svg b/5.2/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/5.2/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/5.2/_static/kraken_torchseqrecognizer.svg b/5.2/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/5.2/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/5.2/_static/kraken_workflow.svg b/5.2/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/5.2/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + Recognition + + + + + + + 
+ + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/5.2/_static/language_data.js b/5.2/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/5.2/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" 
+ v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/5.2/_static/minus.png b/5.2/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/5.2/_static/minus.png differ diff --git a/5.2/_static/normal-reproduction-low-resolution.jpg b/5.2/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/5.2/_static/normal-reproduction-low-resolution.jpg differ diff --git a/5.2/_static/pat.png b/5.2/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/5.2/_static/pat.png differ diff --git a/5.2/_static/plus.png b/5.2/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/5.2/_static/plus.png differ diff --git a/5.2/_static/pygments.css b/5.2/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ 
b/5.2/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } /* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ 
+.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/5.2/_static/searchtools.js b/5.2/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/5.2/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. + /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. 
+ objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." + ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. 
+// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/5.2/_static/sphinx_highlight.js b/5.2/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/5.2/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/5.2/advanced.html b/5.2/advanced.html new file mode 100644 index 000000000..7b0d2e623 --- /dev/null +++ b/5.2/advanced.html @@ -0,0 +1,538 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the case of kraken: layout analysis/page segmentation (extracting topological text lines from an image), recognition (feeding text line images into a classifier), and finally serialization of the results into an appropriate format such as ALTO or PageXML.

+
+

Input and Outputs

+

Kraken inputs and their outputs can be defined in multiple ways. The simplest are input-output pairs, i.e. producing one output document for one input document, following the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular, subcommands may be chained.

+

There are other ways to define inputs and outputs, as the syntax shown above can become rather cumbersome for large numbers of files.

+

As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. -I batch +inputs can also be specified multiple times:

+
$ kraken -I '*.png' -I '*.jpg' -I '*.tif' -o ocr.txt segment ...
+
+
+

A second way is to input multi-image files directly. These can be either in +PDF, TIFF, or JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write one output file per page, named with a page index (which can be changed using the -p option) and the suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 files but also from various XML formats. In these cases the appropriate data is automatically selected from the inputs: image data for segmentation, or line and region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

Kraken automatically determines whether a file is in PageXML or ALTO format.

+
+

Output formats

+

All commands have a default output format such as raw text for ocr, a plain image for binarize, or a JSON definition of the segmentation for segment. These defaults are specific to kraken and generally not suitable for further processing by other software, but a number of standardized data exchange formats can be selected instead. ALTO, PageXML, hOCR, and abbyyXML, which contain additional metadata such as bounding boxes and confidences, are supported out of the box. In addition, custom jinja templates can be loaded to create individualised output such as TEI.

+

Output formats are selected on the main kraken command and apply to the last +subcommand defined in the subcommand chain. For example:

+
$ kraken --alto -i ... segment -bl
+
+
+

will serialize a plain segmentation in ALTO into the specified output file.

+
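To illustrate, the following chained invocation (file and model names are placeholders) runs segmentation and recognition in one pass; because ocr is the last subcommand in the chain, the --alto switch serializes the recognition results as ALTO:

$ kraken --alto -i input.jpg output.xml segment -bl ocr -m model.mlmodel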

The currently available format switches are:

+
$ kraken -n -i ... ... # native output
+$ kraken -a -i ... ... # ALTO output
+$ kraken -x -i ... ... # PageXML output
+$ kraken -h -i ... ... # hOCR output
+$ kraken -y -i ... ... # abbyyXML output
+
+
+

Custom templates can be loaded with the --template option:

+
$ kraken --template /my/awesome/template.tmpl -i ... ...
+
+
+

The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates here.

+
+
+
+

Binarization

+
+

Note

+

Binarization is deprecated and mostly not necessary anymore. It can often +worsen text recognition results especially for documents with uneven +lighting, faint writing, etc.

+
+

The binarization subcommand converts a color or grayscale input image into an image containing only two color levels: white (background) and black (foreground, i.e. text). It accepts almost the same parameters as ocropus-nlbin. Only options not related to binarization, e.g. skew detection, are missing. In addition, error checking (image sizes, inversion detection, grayscale enforcement) is always disabled and kraken will happily binarize any image that is thrown at it.

+

Available parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

=========== =============
option      type
=========== =============
--threshold FLOAT
--zoom      FLOAT
--escale    FLOAT
--border    FLOAT
--perc      INTEGER RANGE
--range     INTEGER
--low       INTEGER RANGE
--high      INTEGER RANGE
=========== =============

+

To binarize an image:

+
$ kraken -i input.jpg bw.png binarize
+
+
+
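The options listed in the table above are appended to the binarize subcommand itself. As a sketch, the global threshold can be overridden explicitly (the value 0.6 is purely illustrative, not a recommendation):

$ kraken -i input.jpg bw.png binarize --threshold 0.6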
+

Note

+

Some image formats, notably JPEG, do not support a black and white image mode. By default, the output format implied by the output file name extension will be honored. If this is not possible, a warning is printed and the output forced to PNG:

+
$ kraken -i input.jpg bw.jpg binarize
+Binarizing      [06/24/22 09:56:23] WARNING  jpeg does not support 1bpp images. Forcing to png.
+
+
+
+
+
+
+

Page Segmentation

+

The segment subcommand performs page segmentation into lines and regions using one of the two layout analysis methods implemented: the trainable baseline segmenter, which is capable of detecting both lines of different types and regions, and a legacy non-trainable segmenter that produces bounding boxes.

+

Universal parameters of either segmenter are:

+ + + + + + + + + + + + + + +

option

action

-d, --text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

-m, --mask

Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes.

+
+

Baseline Segmentation

+

The baseline segmenter works by applying a segmentation model to a page image, labelling each pixel on the image with one or more classes, with each class corresponding to a line or region of a specific type. In addition there are two auxiliary classes that are used to determine the line orientation. A simplified example of a composite image of the auxiliary classes and a single line type without regions can be seen below:

+BLLA output heatmap + +

In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as:

+BLLA final output + +

The primary determinant of segmentation quality is the segmentation model employed. There is a default model that works reasonably well on printed and handwritten material on undegraded, even writing surfaces such as paper or parchment. The output of this model consists of a single line type and a generic text region class that denotes coherent blocks of text. This model is employed automatically when the baseline segmenter is activated with the -bl option:

+
$ kraken -i input.jpg segmentation.json segment -bl
+
+
+

New models optimized for other kinds of documents can be trained (see +here). These can be applied with the -i option of the +segment subcommand:

+
$ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel
+
+
+
+
+

Legacy Box Segmentation

+

The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left).

+

Apart from the limitations of the bounding box paradigm (rotated and curved lines cannot be effectively extracted), another important drawback of the legacy segmenter is the requirement for binarized input images. It is therefore necessary to apply binarization first or supply only pre-binarized inputs, for example by chaining the binarize subcommand as shown below.

+
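One way to satisfy this requirement is to chain the binarize subcommand before the legacy segmenter in a single invocation (a sketch with placeholder file names):

$ kraken -i input.jpg lines.json binarize segment -x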

The legacy segmenter can be applied to an input image with:

+
$ kraken -i 14.tif lines.json segment -x
+$ cat lines.json
+
+
+

Available specific parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

--scale FLOAT

Estimate of the average line height on the page

-m, --maxcolseps

Maximum number of columns in the input document. Set to 0 for uni-column layouts.

-b, --black-colseps / -w, --white-colseps

Switch to black column separators.

-r, --remove-hlines / -l, --hlines

Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

-p, --pad

Adds left and right padding around lines in the output.

+
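These options are likewise appended to the segment subcommand. As a sketch, an explicit line height estimate can be supplied (the value 40 is purely illustrative):

$ kraken -i input.png lines.json segment -x --scale 40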
+
+

Principal Text Direction

+

The principal text direction selected with the -d/--text-direction option is used in the reading order heuristic to determine the order of text blocks (regions) and individual lines. It roughly corresponds to the block flow direction in CSS with an additional option. Valid options consist of two parts: an initial principal line orientation (horizontal or vertical) followed by a block order (lr for left-to-right or rl for right-to-left).

+

The first part is usually horizontal for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom:

+Horizontal Latin script text + +

Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left:

+Vertical Chinese text + +

The second part depends on a number of factors, as the order in which text blocks are read is not fixed for every writing system. In mono-script texts it is usually determined by the inline text direction, i.e. columns in Latin-script texts are read starting with the top-left column, followed by the column to its right, and so on, continuing with the left-most column below if none remain to the right (inverse for right-to-left scripts like Arabic, which start at the top right-most column, continue leftward, and return to the right-most column just below when none remain).

+

In multi-script documents the order is determined by the primary writing system employed in the document. For a modern book containing both Latin and Arabic script text, for example, it would be set to lr when Latin is primary, i.e. when the binding is on the left side of the book seen from the title cover, and vice versa (rl if the binding is on the right when looking at the title cover). The analogue applies to text written with vertical lines.

+

With these considerations in mind, there are four different text directions available:

+ + + + + + + + + + + + + + + + + + + + +

============== =================================================================
Text Direction Examples
============== =================================================================
horizontal-lr  Latin script texts, Mixed LTR/RTL docs with principal LTR script
horizontal-rl  Arabic script texts, Mixed LTR/RTL docs with principal RTL script
vertical-lr    Vertical script texts read from left-to-right.
vertical-rl    Vertical script texts read from right-to-left.
============== =================================================================

+
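As a sketch, the principal text direction for an Arabic-script page with right-to-left block order could be set like this (file names are placeholders):

$ kraken -i input.jpg segmentation.json segment -bl -d horizontal-rl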
+
+

Masking

+

It is possible to keep the segmenter from finding text lines and regions in certain areas of the input image. This is done by providing a binary mask image that has the same size as the input image, where blocked out regions are black and valid regions white:

+
$ kraken -i input.jpg segmentation.json segment -bl -m mask.png
+
+
+
+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands.

+
+

Querying and Model Retrieval

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07
+10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration)
+10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature)
+10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show 10.5281/zenodo.5617783
+name: 10.5281/zenodo.5617783
+
+Cremma-Medieval Old French Model (Litterature)
+
+....
+scripts: Latn
+alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128
+accuracy: 95.49%
+license: CC-BY-SA-2.0
+author(s): Pinche, Ariane
+date: 2021-10-29
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get 10.5281/zenodo.5617783
+Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10
+Model name: cremma_medieval_bicerin.mlmodel
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +printed in the last line of the kraken get output.

+
$ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel
+
+
+
+
+

Publishing

+

When one would like to share a model with the wider world (for fame and glory!) it is possible (and recommended) to upload it to the repository. The process consists of two stages: the creation of the deposit on the Zenodo platform, followed by approval of the model in the community, making it discoverable for other kraken users.

+

To upload a model, a Zenodo account and a personal access token are required. After account creation, tokens can be created under the account settings:

+Zenodo token creation dialogue + +

With the token, models can then be uploaded:

+
$ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617783
+
+
+

A number of important metadata fields will be requested, such as a short description of the model, a long-form description, recognized scripts, and authorship. Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. it can’t be changed or deleted, so it is important to make sure that all the information is correct. Each deposit also has a unique persistent identifier, a DOI, that can be used to refer to it, e.g. in publications or when pointing someone to a particular model.

+

Once the deposit has been created, a request for inclusion in the repository (requiring manual approval) will automatically be created, which makes the model discoverable by other users.

+

It is possible to deposit models without including them in the queryable repository. Models uploaded this way are not truly private and can still be found through the standard Zenodo search and downloaded with kraken get using their DOI. This is mostly suggested for preliminary models that might get updated later:

+
$ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617734
+
+
+
+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+
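For example, a segmentation stored in an ALTO or PageXML file can be reused by selecting XML input as described above; the model name below is a placeholder for any installed recognition model:

$ kraken -i page.xml page.txt -f xml ocr -m model.mlmodel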

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:antiqua.mlmodel
+
+
+

All polytonic Greek text portions will be recognized using the porson.mlmodel +model while Latin text will be fed into the antiqua.mlmodel model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.mlmodel
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
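As a sketch, Latin lines could be excluded from recognition while Greek lines are still recognized with the porson.mlmodel model mentioned above:

$ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:ignore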
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/api.html b/5.2/api.html new file mode 100644 index 000000000..481ad79e1 --- /dev/null +++ b/5.2/api.html @@ -0,0 +1,3185 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all functionality of the OCR engine. Most functional blocks (binarization, segmentation, recognition, and serialization) are encapsulated in one high-level method each.

+

Simple use cases of the API, which are mostly useful for debugging purposes, are contained in the contrib directory. In general it is recommended to look at this tutorial, these scripts, or the API reference. The command line drivers are unnecessarily complex for straightforward applications as they contain lots of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The legacy segmenter requires only a b/w image object as its basic parameter, although some additional parameters exist, largely to change the principal text direction (important for column ordering and top-to-bottom scripts) and to explicitly mask non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+Segmentation(type='bbox',
+             imagename='foo.png',
+             text_direction='horizontal-lr',
+             script_detection=False,
+             lines=[BBoxLine(id='0ce11ad6-1f3b-4f7d-a8c8-0178e411df69',
+                             bbox=[74, 61, 136, 101],
+                             text=None,
+                             base_dir=None,
+                             type='bbox',
+                             imagename=None,
+                             tags=None,
+                             split=None,
+                             regions=None,
+                             text_direction='horizontal-lr'),
+                    BBoxLine(id='c4a751dc-6731-4eea-a287-d4b57683f5b0', ...),
+                    ....],
+             regions={},
+             line_orders=[])
+
+
+

All segmentation methods return a kraken.containers.Segmentation +object that contains all elements of the segmentation: its type, a list of +lines (either kraken.containers.BBoxLine or +kraken.containers.BaselineLine), a dictionary mapping region types to +lists of regions (kraken.containers.Region), and one or more line +reading orders.
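As a short illustration, the individual fields can be inspected directly on the seg object from the example above (values as shown in that output):

>>> seg.type
'bbox'
>>> seg.lines[0].bbox
[74, 61, 136, 101]
>>> seg.regions
{}
>>> seg.line_orders
[]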

+
+
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

[Diagram: a segmentation model (TorchVGSLModel) bundles a neural network with metadata comprising the line and region types, a baseline location flag, and bounding regions.]

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+Segmentation(type='baselines',
+             imagename='foo.png',
+             text_direction='horizontal-lr',
+             script_detection=False,
+             lines=[BaselineLine(id='22fee3d1-377e-4130-b9e5-5983a0c50ce8',
+                                 baseline=[[71, 93], [145, 92]],
+                                 boundary=[[71, 93], ..., [71, 93]],
+                                 text=None,
+                                 base_dir=None,
+                                 type='baselines',
+                                 imagename=None,
+                                 tags={'type': 'default'},
+                                 split=None,
+                                 regions=['f17d03e0-50bb-4a35-b247-cb910c0aaf2b']),
+                    BaselineLine(id='539eadce-f795-4bba-a785-c7767d10c407', ...), ...],
+             regions={'text': [Region(id='f17d03e0-50bb-4a35-b247-cb910c0aaf2b',
+                                      boundary=[[277, 54], ..., [277, 54]],
+                                      imagename=None,
+                                      tags={'type': 'text'})]},
+             line_orders=[])
+>>> alto = serialization.serialize(baseline_seg,
+                                   image_size=im.size,
+                                   template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

A default segmentation model is supplied and will be used if none is specified +explicitly as an argument. Optional parameters are largely the same as for the +legacy segmenter, i.e. text direction and masking.

+

Images are automatically converted into the proper mode for recognition, except in the case of models trained on binary images, as there is a plethora of different binarization algorithms available, each with its own strengths and weaknesses. For most material the kraken-provided binarization should be sufficient, though. This does not mean that a segmentation model trained on RGB images will have equal accuracy for B/W, grayscale, and RGB inputs. Nevertheless, the drop in quality will often be modest or non-existent for color models, while non-binarized inputs to a binary model will cause severe degradation (and a warning to that effect).

+

By default segmentation is performed on the CPU, although the neural network can be run on a GPU with the device argument. As the vast majority of the processing required is postprocessing, the performance gain will most likely be modest, though.
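As a brief sketch (model is the segmentation model loaded above; the device string is only an example and any torch device specifier can be used):

>>> from kraken import blla
>>> baseline_seg = blla.segment(im)                                # packaged default model on the CPU
>>> baseline_seg = blla.segment(im, model=model, device='cuda:0')  # explicit model, run on a GPU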

+

The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation:

[Diagram: the trainable pixel labelling step produces line/separator heatmaps and region heatmaps; baseline vectorization and orientation yield oriented baselines while region vectorization yields region boundaries; line ordering and bounding polygon calculation then produce the bounding polygons and the final segmentation.]

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

[Diagram: the neural net maps a text line to an output matrix over 'time' steps (width); the CTC decoder collapses it into a label sequence (e.g. 15, 10, 1, ...) which the codec maps to a character sequence (e.g. o, c, u, ...).]

As the customization of this two-stage decoding process is usually reserved for specialized use cases, sensible defaults are provided: codecs are part of the model file and do not have to be supplied manually; the preferred CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

The sequence recognizer wrapper combines the neural network itself, a +codec, metadata such as if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels:

[Diagram: a transcription model (TorchSeqRecognizer) bundles a neural network with a codec, metadata, and a CTC decoder.]

Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken.rpred import rpred
+# single model recognition
+>>> pred_it = rpred(network=model,
+                    im=im,
+                    bounds=baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+

The output isn’t just a sequence of characters but, depending on the type of +segmentation supplied, a kraken.containers.BaselineOCRRecord or +kraken.containers.BBoxOCRRecord record object containing the character +prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'bbox'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The \(C +\times W\) softmax output matrix is accessible as the outputs attribute on the +kraken.lib.models.TorchSeqRecognizer after each step of the +kraken.rpred.rpred() iterator. To get a mapping from the label space +\(C\) the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.output
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.
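A short sketch of running a decoder manually; it assumes outputs holds the raw softmax matrix obtained as described above and that the decoders in kraken.lib.ctc_decoder are plain functions operating on such a matrix:

>>> from kraken.lib import ctc_decoder
>>> labels = ctc_decoder.greedy_decoder(outputs)
>>> labels[:3]   # [(label, start, end, confidence), ...]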

+
+
+

XML Parsing

+

Sometimes it is desirable to take the data in an existing XML serialization format like PageXML or ALTO and apply an OCR function to it. The kraken.lib.xml module includes parsers that extract the information into data structures which the functional blocks can process with minimal transformation:

+

Parsing is accessed through the kraken.lib.xml.XMLPage class.

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> parsed_doc = xml.XMLPage(alto_doc)
+>>> parsed_doc
+XMLPage(filename='/path/to/alto', filetype=alto)
+>>> parsed_doc.lines
+{'line_1469098625593_463': BaselineLine(id='line_1469098625593_463',
+                                        baseline=[(2337, 226), (2421, 239)],
+                                        boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)],
+                                        text='$pag:39',
+                                        base_dir=None,
+                                        type='baselines',
+                                        imagename=None,
+                                        tags={'type': '$pag'},
+                                        split=None,
+                                        regions=['region_1469098609000_462']),
+
+ 'line_1469098649515_464': BaselineLine(id='line_1469098649515_464',
+                                        baseline=[(789, 269), (2397, 304)],
+                                        boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)],
+                                        text='$-nor su hijo, De todos sus bienes, con los pactos',
+                                        base_dir=None,
+                                        type='baselines',
+                                        imagename=None,
+                                        tags={'type': '$pac'},
+                                        split=None,
+                                        regions=['region_1469098557906_461']),
+ ....}
+>>> parsed_doc.regions
+{'$pag': [Region(id='region_1469098609000_462',
+                 boundary=[(2324, 171), (2437, 171), (2436, 258), (2326, 237)],
+                 imagename=None,
+                 tags={'type': '$pag'})],
+ '$pac': [Region(id='region_1469098557906_461',
+                 boundary=[(738, 203), (2339, 245), (2398, 294), (2446, 345), (2574, 469), (2539, 1873), (2523, 2053), (2477, 2182), (738, 2243)],
+                 imagename=None,
+                 tags={'type': '$pac'})],
+ '$tip': [Region(id='TextRegion_1520586482298_194',
+                 boundary=[(687, 2428), (688, 2422), (107, 2420), (106, 2264), (789, 2256), (758, 2404)],
+                 imagename=None,
+                 tags={'type': '$tip'})],
+ '$par': [Region(id='TextRegion_1520586482298_193',
+                 boundary=[(675, 3772), (687, 2428), (758, 2404), (789, 2256), (2542, 2236), (2581, 3748)],
+                 imagename=None,
+                 tags={'type': '$par'})]
+}
+
+
+

The parser is aware of reading order(s); the basic properties accessing lines and regions therefore return unordered dictionaries. Reading orders can be accessed separately through the reading_orders property:

+
>>> parsed_doc.region_orders
+{'line_implicit': {'order': ['line_1469098625593_463',
+                             'line_1469098649515_464',
+                             ...
+                            'line_1469099255968_508'],
+                   'is_total': True,
+                   'description': 'Implicit line order derived from element sequence'},
+'region_implicit': {'order': ['region_1469098609000_462',
+                              ...
+                             'TextRegion_1520586482298_193'],
+                    'is_total': True,
+                    'description': 'Implicit region order derived from element sequence'},
+'region_transkribus': {'order': ['region_1469098609000_462',
+                                 ...
+                                'TextRegion_1520586482298_193'],
+                    'is_total': True,
+                    'description': 'Explicit region order from `custom` attribute'},
+'line_transkribus': {'order': ['line_1469098625593_463',
+                               ...
+                               'line_1469099255968_508'],
+                     'is_total': True,
+                     'description': 'Explicit line order from `custom` attribute'},
+'o_1530717944451': {'order': ['region_1469098609000_462',
+                              ...
+                              'TextRegion_1520586482298_193'],
+                   'is_total': True,
+                   'description': 'Regions reading order'}}
+
+
+

Reading orders are created from different sources, depending on the content of the XML file. Every document will contain at least implicit orders for lines and regions (line_implicit and region_implicit) sourced from the sequence of line and region elements. There can also be additional explicit orders defined by the standard reading order elements, for example o_1530717944451 in the above example. In PageXML files, reading orders defined with the Transkribus-style custom attribute are also recognized.

+

To access the lines or regions of a document in a particular order:

+
>>> parsed_doc.get_sorted_lines(ro='line_implicit')
+[BaselineLine(id='line_1469098625593_463',
+              baseline=[(2337, 226), (2421, 239)],
+              boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)],
+              text='$pag:39',
+              base_dir=None,
+              type='baselines',
+              imagename=None,
+              tags={'type': '$pag'},
+              split=None,
+              regions=['region_1469098609000_462']),
+ BaselineLine(id='line_1469098649515_464',
+              baseline=[(789, 269), (2397, 304)],
+              boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)],
+              text='$-nor su hijo, De todos sus bienes, con los pactos',
+              base_dir=None,
+              type='baselines',
+              imagename=None,
+              tags={'type': '$pac'},
+              split=None,
+              regions=['region_1469098557906_461'])
+...]
+
+
+

The recognizer functions do not accept kraken.lib.xml.XMLPage objects +directly which means that for most practical purposes these need to be +converted into container objects:

+
>>> segmentation = parsed_doc.to_container()
+>>> pred_it = rpred(network=model,
+                    im=im,
+                    bounds=segmentation)
+>>> for record in pred_it:
+        print(record)
+
+
+
+
+

Serialization

+

The serialization module can be used to transform results returned by the +segmenter or recognizer into a text based (most often XML) format for archival. +The module renders jinja2 templates, +either ones packaged with kraken or supplied externally, +through the kraken.serialization.serialize() function.

+
>>> import dataclasses
+>>> from kraken import serialization
+
+>>> alto_seg_only = serialization.serialize(baseline_seg, image_size=im.size, template='alto')
+
+>>> records = [record for record in pred_it]
+>>> results = dataclasses.replace(pred_it.bounds, lines=records)
+>>> alto = serialization.serialize(results, image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

The serialization function accepts arbitrary kraken.containers.Segmentation objects, which may contain textual or only segmentation information. As the recognizer returns ocr_records which cannot be serialized directly, it is necessary to either construct a new kraken.containers.Segmentation from scratch or to insert them into the segmentation fed into the recognizer (ocr_records subclass BaselineLine/BBoxLine). The container classes are immutable data classes, therefore the simplest way to insert the records is to use dataclasses.replace to create a new segmentation with a changed lines attribute.

+
+
+

Training

+

Training is largely implemented with the pytorch lightning framework. There are separate LightningModules for recognition and segmentation training and a small wrapper around lightning's Trainer class that mainly sets up model handling and verbosity options for the CLI.

+
>>> import glob
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> import glob
+>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
>>> import glob
+>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback()])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to have a closer look at the command line parameters for features such as transfer learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/api_docs.html b/5.2/api_docs.html new file mode 100644 index 000000000..df27c7dc1 --- /dev/null +++ b/5.2/api_docs.html @@ -0,0 +1,4360 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

Segmentation

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu', raise_on_error=False, autocast=False)
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
  • raise_on_error (bool) – Raises errors instead of logging them when they are non-blocking

  • +
  • autocast (bool) – Runs the model with automatic mixed precision

  • +
+
+
Returns:
+

A kraken.containers.Segmentation class containing reading +order sorted baselines (polylines) and their respective polygonal +boundaries as kraken.containers.BaselineLine records. The +last and first point of each boundary polygon are connected.

+
+
Raises:
+
+
+
Return type:
+

kraken.containers.Segmentation

+
+
+

Notes

+

Multi-model operation is most useful for combining one or more region detection models and one text line model. Detected lines from all models are simply combined without any merging or duplicate detection, so the chance of the same line appearing multiple times in the output is high. In addition, neural reading order determination is disabled when more than one model outputs lines.
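A sketch of such a multi-model call; the model paths are placeholders and the assumption is that one model produces regions and the other text lines:

>>> from kraken import blla
>>> from kraken.lib import vgsl
>>> line_model = vgsl.TorchVGSLModel.load_model('path/to/line_model.mlmodel')
>>> region_model = vgsl.TorchVGSLModel.load_model('path/to/region_model.mlmodel')
>>> seg = blla.segment(im, model=[line_model, region_model])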

+
+ +
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A kraken.containers.Segmentation class containing reading +order sorted bounding box-type lines as +kraken.containers.BBoxLine records.

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

kraken.containers.Segmentation

+
+
+
+ +
+
+
+

Recognition

+
+

kraken.rpred module

+
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None, no_legacy_polygons=False)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+im
+
+ +
+
+len
+
+ +
+
+line_iter
+
+ +
+
+nets
+
+ +
+
+no_legacy_polygons
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags_ignore
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True, no_legacy_polygons=False)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (kraken.containers.Segmentation) – A Segmentation class instance containing either a baseline or +bbox segmentation.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible with +padding.

  • +
  • bidi_reordering (Union[bool, str]) – Reorder classes in the ocr_record according to the +Unicode bidirectional algorithm for correct display. +Set to L|R to change base text direction.

  • +
  • no_legacy_polygons (bool)

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[kraken.containers.ocr_record, None, None]
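A hedged usage sketch, assuming im is a PIL image and seg a matching Segmentation; per the parameter description above, setting bidi_reordering to 'R' forces a right-to-left base direction:

>>> from kraken.lib import models
>>> from kraken.rpred import rpred
>>> net = models.load_any('/path/to/recognition/model')
>>> records = list(rpred(net, im, seg, pad=0, bidi_reordering='R'))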

+
+
+
+ +
+
+
+

Serialization

+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
+
+ +
+
+kraken.serialization.serialize(results, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, template='alto', template_source='native', processing_steps=None)
+

Serializes recognition and segmentation results into an output document.

+

Serializes a Segmentation container object containing either segmentation +or recognition results into an output document. The rendering is performed +with jinja2 templates that can either be shipped with kraken +(template_source == ‘native’) or custom (template_source == ‘custom’).

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • segmentation – Segmentation container object

  • +
  • image_size (Tuple[int, int]) – Dimensions of the source image

  • +
  • writing_mode (Literal['horizontal-tb', 'vertical-lr', 'vertical-rl']) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values are +horizontal-tb, vertical-rl, and vertical-lr.

  • +
  • scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records

  • +
  • template ([os.PathLike, str]) – Selector for the serialization format. May be ‘hocr’, +‘alto’, ‘page’ or any template found in the template +directory. If template_source is set to custom a path to a +template is expected.

  • +
  • template_source (Literal['native', 'custom']) – Switch to enable loading of custom templates from +outside the kraken package.

  • +
  • processing_steps (Optional[List[kraken.containers.ProcessingStep]]) – A list of ProcessingStep container classes describing +the processing kraken performed on the inputs.

  • +
  • results (kraken.containers.Segmentation)

  • +
+
+
Returns:
+

The rendered template

+
+
Return type:
+

str
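For example, rendering with an external template (the template path and image size are placeholders; results is assumed to be a Segmentation as described above):

>>> from kraken import serialization
>>> doc = serialization.serialize(results,
                                  image_size=(2000, 3000),
                                  template='/path/to/custom_template.xml',
                                  template_source='custom')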

+
+
+
+ +
+
+

Default templates

+
+

ALTO 4.4

+
{% set proc_type_table = {'processing': 'contentGeneration',
+              'preprocessing': 'preOperation',
+              'postprocessing': 'postOperation'}
+%}
+{%+ macro render_line(page, line) +%}
+                    <TextLine ID="{{ line.id }}" HPOS="{{ line.bbox[0] }}" VPOS="{{ line.bbox[1] }}" WIDTH="{{ line.bbox[2] - line.bbox[0] }}" HEIGHT="{{ line.bbox[3] - line.bbox[1] }}" {% if line.baseline %}BASELINE="{{ line.baseline|sum(start=[])|join(' ') }}"{% endif %} {% if line.tags %}TAGREFS="{% for type in page.line_types %}{% if type[0] in line.tags and line.tags[type[0]] == type[1] %}LINE_TYPE_{{ loop.index }}{% endif %}{% endfor %}"{% endif %}>
+                        {% if line.boundary %}
+                        <Shape>
+                            <Polygon POINTS="{{ line.boundary|sum(start=[])|join(' ') }}"/>
+                        </Shape>
+                        {% endif %}
+                            {% if line.recognition|length() == 0 %}
+                        <String CONTENT=""/>
+                        {% else %}
+                        {% for segment in line.recognition %}
+                        {# ALTO forbids encoding whitespace before any String/Shape tags #}
+                        {% if segment.text is whitespace and loop.index > 1 %}
+                        <SP ID="segment_{{ segment.index }}" HPOS="{{ segment.bbox[0]}}"  VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}"  HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}"/>
+                        {% else %}
+                        <String ID="segment_{{ segment.index }}" CONTENT="{{ segment.text|e }}" HPOS="{{ segment.bbox[0] }}" VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}" HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}" WC="{{ (segment.confidences|sum / segment.confidences|length)|round(4) }}">
+                            {% if segment.boundary %}
+                            <Shape>
+                                <Polygon POINTS="{{ segment.boundary|sum(start=[])|join(' ') }}"/>
+                            </Shape>
+                            {% endif %}
+                            {% for char in segment.recognition %}
+                            <Glyph ID="char_{{ char.index }}" CONTENT="{{ char.text|e }}" HPOS="{{ char.bbox[0] }}" VPOS="{{ char.bbox[1] }}" WIDTH="{{ char.bbox[2] - char.bbox[0] }}" HEIGHT="{{ char.bbox[3] - char.bbox[1] }}" GC="{{ char.confidence|round(4) }}">
+                                {% if char.boundary %}
+                                <Shape>
+                                    <Polygon POINTS="{{ char.boundary|sum(start=[])|join(' ') }}"/>
+                                </Shape>
+                                {% endif %}
+                            </Glyph>
+                            {% endfor %}
+                        </String>
+                        {% endif %}
+                        {% endfor %}
+                        {% endif %}
+                    </TextLine>
+{%+ endmacro %}
+<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+    xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
+    <Description>
+        <MeasurementUnit>pixel</MeasurementUnit>
+        <sourceImageInformation>
+            <fileName>{{ page.name }}</fileName>
+        </sourceImageInformation>
+        {% if metadata.processing_steps %}
+        {% for step in metadata.processing_steps %}
+        <Processing ID="OCR_{{ step.id }}">
+            <processingCategory>{{ proc_type_table[step.category] }}</processingCategory>
+            <processingStepDescription>{{ step.description }}</processingStepDescription>
+            <processingStepSettings>{% for k, v in step.settings.items() %}{{k}}: {{v}}{% if not loop.last %}; {% endif %}{% endfor %}</processingStepSettings>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endfor %}
+        {% else %}
+        <Processing ID="OCR_0">
+            <processingCategory>other</processingCategory>
+            <processingStepDescription>unknown</processingStepDescription>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endif %}
+    </Description>
+    <Tags>
+    {% for type, label in page.line_types %}
+        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_{{ loop.index }}" TYPE="{{ type }}" LABEL="{{ label }}"/>
+    {% endfor %}
+    {% for label in page.region_types %}
+        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_{{ loop.index }}" TYPE="region" LABEL="{{ label }}"/>
+    {% endfor %}
+    </Tags>
+    {% if page.line_orders|length() > 0 %}
+    <ReadingOrder>
+        {% if page.line_orders | length == 1 %}
+        <OrderedGroup ID="ro_0">
+           {% for id in page.line_orders[0] %}
+           <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+        </OrderedGroup>
+        {% else %}
+        <UnorderedGroup>
+        {% for ro in page.line_orders %}
+           <OrderedGroup ID="ro_{{ loop.index }}">
+           {% for id in ro %}
+               <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+           </OrderedGroup>
+	{% endfor %}
+        </UnorderedGroup>
+        {% endif %}
+    </ReadingOrder>
+    {% endif %}
+    <Layout>
+        <Page WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}" PHYSICAL_IMG_NR="0" ID="page_0">
+            <PrintSpace HPOS="0" VPOS="0" WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}">
+            {% for entity in page.entities %}
+                {% if entity.type == "region" %}
+                {% if loop.previtem and loop.previtem.type == 'line' %}
+                </TextBlock>
+                {% endif %}
+                <TextBlock ID="{{ entity.id }}" HPOS="{{ entity.bbox[0] }}" VPOS="{{ entity.bbox[1] }}" WIDTH="{{ entity.bbox[2] - entity.bbox[0] }}" HEIGHT="{{ entity.bbox[3] - entity.bbox[1] }}" {% if entity.tags %}{% for type in page.region_types %}{% if type in entity.tags.values() %}TAGREFS="REGION_TYPE_{{ loop.index }}"{% endif %}{% endfor %}{% endif %}>
+                    <Shape>
+                        <Polygon POINTS="{{ entity.boundary|sum(start=[])|join(' ') }}"/>
+                    </Shape>
+                    {%- for line in entity.lines -%}
+                    {{ render_line(page, line) }}
+                    {%- endfor -%}
+                </TextBlock>
+                {% else %}
+                {% if not loop.previtem or loop.previtem.type != 'line' %}
+                <TextBlock ID="textblock_{{ loop.index }}">
+                {% endif %}
+                    {{ render_line(page, entity) }}
+                {% if loop.last %}
+                </TextBlock>
+                {% endif %}
+            {% endif %}
+            {% endfor %}
+            </PrintSpace>
+        </Page>
+    </Layout>
+</alto>
+
+
+
+
+

PageXML

+
{% set proc_type_table = {'processing': 'contentGeneration',
+              'preprocessing': 'preOperation',
+              'postprocessing': 'postOperation'}
+%}
+{%+ macro render_line(page, line) +%}
+                    <TextLine ID="{{ line.id }}" HPOS="{{ line.bbox[0] }}" VPOS="{{ line.bbox[1] }}" WIDTH="{{ line.bbox[2] - line.bbox[0] }}" HEIGHT="{{ line.bbox[3] - line.bbox[1] }}" {% if line.baseline %}BASELINE="{{ line.baseline|sum(start=[])|join(' ') }}"{% endif %} {% if line.tags %}TAGREFS="{% for type in page.line_types %}{% if type[0] in line.tags and line.tags[type[0]] == type[1] %}LINE_TYPE_{{ loop.index }}{% endif %}{% endfor %}"{% endif %}>
+                        {% if line.boundary %}
+                        <Shape>
+                            <Polygon POINTS="{{ line.boundary|sum(start=[])|join(' ') }}"/>
+                        </Shape>
+                        {% endif %}
+                            {% if line.recognition|length() == 0 %}
+                        <String CONTENT=""/>
+                        {% else %}
+                        {% for segment in line.recognition %}
+                        {# ALTO forbids encoding whitespace before any String/Shape tags #}
+                        {% if segment.text is whitespace and loop.index > 1 %}
+                        <SP ID="segment_{{ segment.index }}" HPOS="{{ segment.bbox[0]}}"  VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}"  HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}"/>
+                        {% else %}
+                        <String ID="segment_{{ segment.index }}" CONTENT="{{ segment.text|e }}" HPOS="{{ segment.bbox[0] }}" VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}" HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}" WC="{{ (segment.confidences|sum / segment.confidences|length)|round(4) }}">
+                            {% if segment.boundary %}
+                            <Shape>
+                                <Polygon POINTS="{{ segment.boundary|sum(start=[])|join(' ') }}"/>
+                            </Shape>
+                            {% endif %}
+                            {% for char in segment.recognition %}
+                            <Glyph ID="char_{{ char.index }}" CONTENT="{{ char.text|e }}" HPOS="{{ char.bbox[0] }}" VPOS="{{ char.bbox[1] }}" WIDTH="{{ char.bbox[2] - char.bbox[0] }}" HEIGHT="{{ char.bbox[3] - char.bbox[1] }}" GC="{{ char.confidence|round(4) }}">
+                                {% if char.boundary %}
+                                <Shape>
+                                    <Polygon POINTS="{{ char.boundary|sum(start=[])|join(' ') }}"/>
+                                </Shape>
+                                {% endif %}
+                            </Glyph>
+                            {% endfor %}
+                        </String>
+                        {% endif %}
+                        {% endfor %}
+                        {% endif %}
+                    </TextLine>
+{%+ endmacro %}
+<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+    xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
+    <Description>
+        <MeasurementUnit>pixel</MeasurementUnit>
+        <sourceImageInformation>
+            <fileName>{{ page.name }}</fileName>
+        </sourceImageInformation>
+        {% if metadata.processing_steps %}
+        {% for step in metadata.processing_steps %}
+        <Processing ID="OCR_{{ step.id }}">
+            <processingCategory>{{ proc_type_table[step.category] }}</processingCategory>
+            <processingStepDescription>{{ step.description }}</processingStepDescription>
+            <processingStepSettings>{% for k, v in step.settings.items() %}{{k}}: {{v}}{% if not loop.last %}; {% endif %}{% endfor %}</processingStepSettings>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endfor %}
+        {% else %}
+        <Processing ID="OCR_0">
+            <processingCategory>other</processingCategory>
+            <processingStepDescription>unknown</processingStepDescription>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endif %}
+    </Description>
+    <Tags>
+    {% for type, label in page.line_types %}
+        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_{{ loop.index }}" TYPE="{{ type }}" LABEL="{{ label }}"/>
+    {% endfor %}
+    {% for label in page.region_types %}
+        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_{{ loop.index }}" TYPE="region" LABEL="{{ label }}"/>
+    {% endfor %}
+    </Tags>
+    {% if page.line_orders|length() > 0 %}
+    <ReadingOrder>
+        {% if page.line_orders | length == 1 %}
+        <OrderedGroup ID="ro_0">
+           {% for id in page.line_orders[0] %}
+           <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+        </OrderedGroup>
+        {% else %}
+        <UnorderedGroup>
+        {% for ro in page.line_orders %}
+           <OrderedGroup ID="ro_{{ loop.index }}">
+           {% for id in ro %}
+               <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+           </OrderedGroup>
+	{% endfor %}
+        </UnorderedGroup>
+        {% endif %}
+    </ReadingOrder>
+    {% endif %}
+    <Layout>
+        <Page WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}" PHYSICAL_IMG_NR="0" ID="page_0">
+            <PrintSpace HPOS="0" VPOS="0" WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}">
+            {% for entity in page.entities %}
+                {% if entity.type == "region" %}
+                {% if loop.previtem and loop.previtem.type == 'line' %}
+                </TextBlock>
+                {% endif %}
+                <TextBlock ID="{{ entity.id }}" HPOS="{{ entity.bbox[0] }}" VPOS="{{ entity.bbox[1] }}" WIDTH="{{ entity.bbox[2] - entity.bbox[0] }}" HEIGHT="{{ entity.bbox[3] - entity.bbox[1] }}" {% if entity.tags %}{% for type in page.region_types %}{% if type in entity.tags.values() %}TAGREFS="REGION_TYPE_{{ loop.index }}"{% endif %}{% endfor %}{% endif %}>
+                    <Shape>
+                        <Polygon POINTS="{{ entity.boundary|sum(start=[])|join(' ') }}"/>
+                    </Shape>
+                    {%- for line in entity.lines -%}
+                    {{ render_line(page, line) }}
+                    {%- endfor -%}
+                </TextBlock>
+                {% else %}
+                {% if not loop.previtem or loop.previtem.type != 'line' %}
+                <TextBlock ID="textblock_{{ loop.index }}">
+                {% endif %}
+                    {{ render_line(page, entity) }}
+                {% if loop.last %}
+                </TextBlock>
+                {% endif %}
+            {% endif %}
+            {% endfor %}
+            </PrintSpace>
+        </Page>
+    </Layout>
+</alto>
+
+
+
+
+

hOCR

+
{% set proc_type_table = {'processing': 'contentGeneration',
+              'preprocessing': 'preOperation',
+              'postprocessing': 'postOperation'}
+%}
+{%+ macro render_line(page, line) +%}
+                    <TextLine ID="{{ line.id }}" HPOS="{{ line.bbox[0] }}" VPOS="{{ line.bbox[1] }}" WIDTH="{{ line.bbox[2] - line.bbox[0] }}" HEIGHT="{{ line.bbox[3] - line.bbox[1] }}" {% if line.baseline %}BASELINE="{{ line.baseline|sum(start=[])|join(' ') }}"{% endif %} {% if line.tags %}TAGREFS="{% for type in page.line_types %}{% if type[0] in line.tags and line.tags[type[0]] == type[1] %}LINE_TYPE_{{ loop.index }}{% endif %}{% endfor %}"{% endif %}>
+                        {% if line.boundary %}
+                        <Shape>
+                            <Polygon POINTS="{{ line.boundary|sum(start=[])|join(' ') }}"/>
+                        </Shape>
+                        {% endif %}
+                            {% if line.recognition|length() == 0 %}
+                        <String CONTENT=""/>
+                        {% else %}
+                        {% for segment in line.recognition %}
+                        {# ALTO forbids encoding whitespace before any String/Shape tags #}
+                        {% if segment.text is whitespace and loop.index > 1 %}
+                        <SP ID="segment_{{ segment.index }}" HPOS="{{ segment.bbox[0]}}"  VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}"  HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}"/>
+                        {% else %}
+                        <String ID="segment_{{ segment.index }}" CONTENT="{{ segment.text|e }}" HPOS="{{ segment.bbox[0] }}" VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}" HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}" WC="{{ (segment.confidences|sum / segment.confidences|length)|round(4) }}">
+                            {% if segment.boundary %}
+                            <Shape>
+                                <Polygon POINTS="{{ segment.boundary|sum(start=[])|join(' ') }}"/>
+                            </Shape>
+                            {% endif %}
+                            {% for char in segment.recognition %}
+                            <Glyph ID="char_{{ char.index }}" CONTENT="{{ char.text|e }}" HPOS="{{ char.bbox[0] }}" VPOS="{{ char.bbox[1] }}" WIDTH="{{ char.bbox[2] - char.bbox[0] }}" HEIGHT="{{ char.bbox[3] - char.bbox[1] }}" GC="{{ char.confidence|round(4) }}">
+                                {% if char.boundary %}
+                                <Shape>
+                                    <Polygon POINTS="{{ char.boundary|sum(start=[])|join(' ') }}"/>
+                                </Shape>
+                                {% endif %}
+                            </Glyph>
+                            {% endfor %}
+                        </String>
+                        {% endif %}
+                        {% endfor %}
+                        {% endif %}
+                    </TextLine>
+{%+ endmacro %}
+<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+    xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
+    <Description>
+        <MeasurementUnit>pixel</MeasurementUnit>
+        <sourceImageInformation>
+            <fileName>{{ page.name }}</fileName>
+        </sourceImageInformation>
+        {% if metadata.processing_steps %}
+        {% for step in metadata.processing_steps %}
+        <Processing ID="OCR_{{ step.id }}">
+            <processingCategory>{{ proc_type_table[step.category] }}</processingCategory>
+            <processingStepDescription>{{ step.description }}</processingStepDescription>
+            <processingStepSettings>{% for k, v in step.settings.items() %}{{k}}: {{v}}{% if not loop.last %}; {% endif %}{% endfor %}</processingStepSettings>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endfor %}
+        {% else %}
+        <Processing ID="OCR_0">
+            <processingCategory>other</processingCategory>
+            <processingStepDescription>unknown</processingStepDescription>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endif %}
+    </Description>
+    <Tags>
+    {% for type, label in page.line_types %}
+        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_{{ loop.index }}" TYPE="{{ type }}" LABEL="{{ label }}"/>
+    {% endfor %}
+    {% for label in page.region_types %}
+        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_{{ loop.index }}" TYPE="region" LABEL="{{ label }}"/>
+    {% endfor %}
+    </Tags>
+    {% if page.line_orders|length() > 0 %}
+    <ReadingOrder>
+        {% if page.line_orders | length == 1 %}
+        <OrderedGroup ID="ro_0">
+           {% for id in page.line_orders[0] %}
+           <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+        </OrderedGroup>
+        {% else %}
+        <UnorderedGroup>
+        {% for ro in page.line_orders %}
+           <OrderedGroup ID="ro_{{ loop.index }}">
+           {% for id in ro %}
+               <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+           </OrderedGroup>
+	{% endfor %}
+        </UnorderedGroup>
+        {% endif %}
+    </ReadingOrder>
+    {% endif %}
+    <Layout>
+        <Page WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}" PHYSICAL_IMG_NR="0" ID="page_0">
+            <PrintSpace HPOS="0" VPOS="0" WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}">
+            {% for entity in page.entities %}
+                {% if entity.type == "region" %}
+                {% if loop.previtem and loop.previtem.type == 'line' %}
+                </TextBlock>
+                {% endif %}
+                <TextBlock ID="{{ entity.id }}" HPOS="{{ entity.bbox[0] }}" VPOS="{{ entity.bbox[1] }}" WIDTH="{{ entity.bbox[2] - entity.bbox[0] }}" HEIGHT="{{ entity.bbox[3] - entity.bbox[1] }}" {% if entity.tags %}{% for type in page.region_types %}{% if type in entity.tags.values() %}TAGREFS="REGION_TYPE_{{ loop.index }}"{% endif %}{% endfor %}{% endif %}>
+                    <Shape>
+                        <Polygon POINTS="{{ entity.boundary|sum(start=[])|join(' ') }}"/>
+                    </Shape>
+                    {%- for line in entity.lines -%}
+                    {{ render_line(page, line) }}
+                    {%- endfor -%}
+                </TextBlock>
+                {% else %}
+                {% if not loop.previtem or loop.previtem.type != 'line' %}
+                <TextBlock ID="textblock_{{ loop.index }}">
+                {% endif %}
+                    {{ render_line(page, entity) }}
+                {% if loop.last %}
+                </TextBlock>
+                {% endif %}
+            {% endif %}
+            {% endfor %}
+            </PrintSpace>
+        </Page>
+    </Layout>
+</alto>
+
+
+
+
+

ABBYY XML

+
{%+ macro render_line(page, line) +%}
+                    <line baseline="{{ ((line.bbox[1] + line.bbox[3]) / 2)|int }}" l="{{ line.bbox[0] }}" r="{{ line.bbox[2] }}" t="{{ line.bbox[1] }}" b="{{ line.bbox[3] }}"><formatting lang="">
+                        {% for segment in line.recognition %}
+                        {% for char in segment.recognition %}
+                        {% if loop.first %}
+                        <charParams l="{{ char.bbox[0] }}" r="{{ char.bbox[2] }}" t="{{ char.bbox[1] }}" b="{{ char.bbox[3] }}" wordStart="1" charConfidence="{{ [char.confidence]|rescale(0, 100)|int }}">{{ char.text }}</charParams>
+                        {% else %}
+                        <charParams l="{{ char.bbox[0] }}" r="{{ char.bbox[2] }}" t="{{ char.bbox[1] }}" b="{{ char.bbox[3] }}" wordStart="0" charConfidence="{{ [char.confidence]|rescale(0, 100)|int }}">{{ char.text }}</charParams>
+                        {% endif %}
+                        {% endfor %}
+                        {% endfor %}
+                    </formatting>
+                    </line>
+{%+ endmacro %}
+<?xml version="1.0" encoding="UTF-8"?>
+<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="kraken {{ metadata.version}}">
+    <page width="{{ page.size[0] }}" height="{{ page.size[1] }}" resolution="0" originalCoords="1">
+        {% for entity in page.entities %}
+        {% if entity.type == "region" %}
+        <block blockType="Text">
+            <text>
+                <par>
+                {%- for line in entity.lines -%}
+                    {{ render_line(page, line) }}
+                {%- endfor -%}
+                </par>
+            </text>
+        </block>
+        {% else %}
+        <block blockType="Text">
+            <text>
+                <par>
+                    {{ render_line(page, entity) }}
+                </par>
+            </text>
+        </block>
+        {% endif %}
+        {% endfor %}
+    </page>
+</document>
+
+
+
+
+
+
+

Containers and Helpers

+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.
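A small usage sketch with a toy character set; labels are assigned automatically and are 1-indexed as described above:

>>> from kraken.lib.codec import PytorchCodec
>>> codec = PytorchCodec('abc ')
>>> labels = codec.encode('abc')   # torch.IntTensor of labels
>>> codec.decode([(int(l), 0, 0, 1.0) for l in labels])   # [(code point, start, end, confidence), ...]
>>> codec.max_label
4
>>> codec.is_valid
True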

+
+
+
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if a subsequence is not encodable and the codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+l2c_single
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. It retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 that contain labels also in use in c1 are added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
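As a quick illustration of the codec API described above, the following sketch builds a codec from a plain character set and round-trips a short string; the character set and input text are made up for illustration:

    from kraken.lib.codec import PytorchCodec

    codec = PytorchCodec('abc ')                         # each code point gets a 1-indexed label
    labels = codec.encode('a cab')                       # torch.IntTensor of integer labels
    # decode() expects (label, start, end, confidence) tuples
    decoded = codec.decode([(int(l), 0, 0, 1.0) for l in labels])
    print(''.join(c for c, _, _, _ in decoded))          # 'a cab'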
+
+

kraken.containers module

+
+
+class kraken.containers.Segmentation
+

A container class for segmentation or recognition results.

+

In order to allow easy JSON de-/serialization, nested classes for lines +(BaselineLine/BBoxLine) and regions (Region) are reinstantiated from their +dictionaries.

+
+
+type
+

Field indicating if baselines +(kraken.containers.BaselineLine) or bbox +(kraken.containers.BBoxLine) line records are in the +segmentation.

+
+ +
+
+imagename
+

Path to the image associated with the segmentation.

+
+ +
+
+text_direction
+

Sets the principal orientation (of the line), i.e. +horizontal/vertical, and reading direction (of the +document), i.e. lr/rl.

+
+ +
+
+script_detection
+

Flag indicating if the line records have tags.

+
+ +
+
+lines
+

List of line records. Records are expected to be in a valid +reading order.

+
+ +
+
+regions
+

Dict mapping types to lists of regions.

+
+ +
+
+line_orders
+

List of alternative reading orders for the segmentation. +Each reading order is a list of line indices.

+
+ +
+
+imagename: str | os.PathLike
+
+ +
+
+line_orders: List[List[int]] | None = None
+
+ +
+
+lines: List[BaselineLine | BBoxLine] | None = None
+
+ +
+
+regions: Dict[str, List[Region]] | None = None
+
+ +
+
+script_detection: bool
+
+ +
+
+text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']
+
+ +
+
+type: Literal['baselines', 'bbox']
+
+ +
+ +
+
+class kraken.containers.BaselineLine
+

Baseline-type line record.

+

A container class for a single line in baseline + bounding polygon format, +optionally containing a transcription, tags, or associated regions.

+
+
+id
+

Unique identifier

+
+ +
+
+baseline
+

List of tuples (x_n, y_n) defining the baseline.

+
+ +
+
+boundary
+

List of tuples (x_n, y_n) defining the bounding polygon of +the line. The first and last points should be identical.

+
+ +
+
+text
+

Transcription of this line.

+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+imagename
+

Path to the image associated with the line.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+split
+

Defines whether this line is in the train, validation, or +test set during training.

+
+ +
+
+regions
+

A list of identifiers of regions the line is associated with.

+
+ +
+
+base_dir: Literal['L', 'R'] | None = None
+
+ +
+
+baseline: List[Tuple[int, int]]
+
+ +
+
+boundary: List[Tuple[int, int]]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+regions: List[str] | None = None
+
+ +
+
+split: Literal['train', 'validation', 'test'] | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+
+text: str | None = None
+
+ +
+
+type: str = 'baselines'
+
+ +
+ +
+
+class kraken.containers.BBoxLine
+

Bounding box-type line record.

+

A container class for a single line in axis-aligned bounding box format, +optionally containing a transcription, tags, or associated regions.

+
+
+id
+

Unique identifier

+
+ +
+
+bbox
+

Tuple in form (xmin, ymin, xmax, ymax) defining +the bounding box.

+
+ +
+
+text
+

Transcription of this line.

+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+imagename
+

Path to the image associated with the line.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+split
+

Defines whether this line is in the train, validation, or +test set during training.

+
+ +
+
+regions
+

A list of identifiers of regions the line is associated with.

+
+ +
+
+text_direction
+

Sets the principal orientation (of the line) and +reading direction (of the document).

+
+ +
+
+base_dir: Literal['L', 'R'] | None = None
+
+ +
+
+bbox: Tuple[int, int, int, int]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+regions: List[str] | None = None
+
+ +
+
+split: Literal['train', 'validation', 'test'] | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+
+text: str | None = None
+
+ +
+
+text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl'] = 'horizontal-lr'
+
+ +
+
+type: str = 'bbox'
+
+ +
+ +
+
+class kraken.containers.Region
+

Container class of a single polygonal region.

+
+
+id
+

Unique identifier

+
+ +
+
+boundary
+

List of tuples (x_n, y_n) defining the bounding polygon of +the region. The first and last points should be identical.

+
+ +
+
+imagename
+

Path to the image associated with the region.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+boundary: List[Tuple[int, int]]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+ +
+
+class kraken.containers.ocr_record(prediction, cuts, confidences, display_order=True)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Union[Tuple[int, int], Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]]])

  • +
  • confidences (List[float])

  • +
  • display_order (bool)

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+property confidences: List[float]
+
+
Return type:
+

List[float]

+
+
+
+ +
+
+property cuts: List
+
+
Return type:
+

List

+
+
+
+ +
+
+abstract display_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+abstract logical_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+property prediction: str
+
+
Return type:
+

str

+
+
+
+ +
+
+abstract property type
+
+ +
+ +
+
+class kraken.containers.BaselineOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)
+

A record object containing the recognition result of a single line in +baseline format.

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Tuple[int, int]])

  • +
  • confidences (List[float])

  • +
  • line (Union[BaselineLine, Dict[str, Any]])

  • +
  • base_dir (Optional[Literal['L', 'R']])

  • +
  • display_order (bool)

  • +
+
+
+
+
+type
+

‘baselines’ to indicate a baseline record

+
+ +
+
+prediction
+

The text predicted by the network as one continuous string.

+
+
Return type:
+

str

+
+
+
+ +
+
+cuts
+

The absolute bounding polygons for each code point in prediction +as a list of tuples [(x0, y0), (x1, y2), …].

+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+confidences
+

A list of floats indicating the confidence value of each +code point.

+
+
Return type:
+

List[float]

+
+
+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+display_order
+

Flag indicating the order of the code points in the +prediction. In display order (True) the n-th code +point in the string corresponds to the n-th leftmost +code point, in logical order (False) the n-th code +point corresponds to the n-th read code point. See [UAX +#9](https://unicode.org/reports/tr9) for more details.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']])

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +

Notes

+

When slicing the record the behavior of the cuts is changed from +earlier versions of kraken. Instead of returning per-character bounding +polygons a single polygons section of the line bounding polygon +starting at the first and extending to the last code point emitted by +the network is returned. This aids numerical stability when computing +aggregated bounding polygons such as for words. Individual code point +bounding polygons are still accessible through the cuts attribute or +by iterating over the record code point by code point.

+
+
+base_dir
+
+ +
+
+property cuts: List[Tuple[int, int]]
+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+display_order(base_dir=None)
+

Returns the OCR record in Unicode display order, i.e. ordered from left +to right inside the line.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +
+
+logical_order(base_dir=None)
+

Returns the OCR record in Unicode logical order, i.e. in the order the +characters in the line would be read by a human.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +
+
+type = 'baselines'
+
+ +
+ +
+
+class kraken.containers.BBoxOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)
+

A record object containing the recognition result of a single line in +bbox format.

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]])

  • +
  • confidences (List[float])

  • +
  • line (Union[BBoxLine, Dict[str, Any]])

  • +
  • base_dir (Optional[Literal['L', 'R']])

  • +
  • display_order (bool)

  • +
+
+
+
+
+type
+

‘bbox’ to indicate a bounding box record

+
+ +
+
+prediction
+

The text predicted by the network as one continuous string.

+
+
Return type:
+

str

+
+
+
+ +
+
+cuts
+

The absolute bounding polygons for each code point in prediction +as a list of 4-tuples ((x0, y0), (x1, y0), (x1, y1), (x0, y1)).

+
+
Return type:
+

List

+
+
+
+ +
+
+confidences
+

A list of floats indicating the confidence value of each +code point.

+
+
Return type:
+

List[float]

+
+
+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+display_order
+

Flag indicating the order of the code points in the +prediction. In display order (True) the n-th code +point in the string corresponds to the n-th leftmost +code point, in logical order (False) the n-th code +point corresponds to the n-th read code point. See [UAX +#9](https://unicode.org/reports/tr9) for more details.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']])

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +

Notes

+

When slicing the record the behavior of the cuts is changed from +earlier versions of kraken. Instead of returning per-character bounding +polygons a single polygons section of the line bounding polygon +starting at the first and extending to the last code point emitted by +the network is returned. This aids numerical stability when computing +aggregated bounding polygons such as for words. Individual code point +bounding polygons are still accessible through the cuts attribute or +by iterating over the record code point by code point.

+
+
+base_dir
+
+ +
+
+display_order(base_dir=None)
+

Returns the OCR record in Unicode display order, i.e. ordered from left +to right inside the line.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +
+
+logical_order(base_dir=None)
+

Returns the OCR record in Unicode logical order, i.e. in the order the +characters in the line would be read by a human.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +
+
+type = 'bbox'
+
+ +
+ +
+
+class kraken.containers.ProcessingStep
+

A processing step in the recognition pipeline.

+
+
+id
+

Unique identifier

+
+ +
+
+category
+

Category of processing step that has been performed.

+
+ +
+
+description
+

Natural-language description of the process.

+
+ +
+
+settings
+

Dict describing the parameters of the processing step.

+
+ +
+
+category: Literal['preprocessing', 'processing', 'postprocessing']
+
+ +
+
+description: str
+
+ +
+
+id: str
+
+ +
+
+settings: Dict[str, Dict | str | float | int | bool]
+
+ +
+ +
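The container classes above are lightweight dataclass-style records and can be constructed directly. A rough sketch, assuming they behave as the plain records listed above; the coordinates, identifiers, and image path are toy values:

    from kraken.containers import BBoxLine, Segmentation

    line = BBoxLine(id='line_0', bbox=(10, 10, 300, 58), text='example text')
    seg = Segmentation(type='bbox',
                       imagename='page.tif',
                       text_direction='horizontal-lr',
                       script_detection=False,
                       lines=[line],
                       regions=None,
                       line_orders=None)
    print(seg.lines[0].bbox, seg.lines[0].text_direction)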
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates back the network output to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list of tuples (class, start, end, prob), where prob is the probability assigned to the class over the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates back the network output to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates the network output back into a label sequence in the same way as the original ocropy/clstm implementations.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
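All three decoders operate on the same (C, W) shaped softmax matrix, so they can be swapped freely. A small sketch with a fabricated output matrix (class 0 is the CTC blank); the exact start/end values in the output depend on the decoder:

    import numpy as np
    from kraken.lib.ctc_decoder import beam_decoder, greedy_decoder

    # 3 classes (including the blank) over 4 time steps; columns sum to 1
    outputs = np.array([[0.9, 0.1, 0.8, 0.1],
                        [0.1, 0.8, 0.1, 0.1],
                        [0.0, 0.1, 0.1, 0.8]])
    print(greedy_decoder(outputs))               # e.g. [(1, 1, 1, 0.8), (2, 3, 3, 0.8)]
    print(beam_decoder(outputs, beam_size=3))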
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+
+ +
+
+message
+
+ +
+
+width
+
+ +
+ +
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing the sequence lengths of the input batch.

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads anything that was, is, and will be a valid ocropus model and +instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN +configuration in the file.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing VGSL segmentation and recognition +networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (Union[os.PathLike, str]) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
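A minimal sketch of loading a recognition model with load_any; the model file name is the one used in the quickstart examples elsewhere in this documentation, and any recognition model in CoreML format should work:

    from kraken.lib import models

    rec = models.load_any('catmus-print-fondue-large.mlmodel', device='cpu')
    print(rec.kind)          # source kind, e.g. 'vgsl'
    print(rec.nn.input)      # expected input shape of the wrapped TorchVGSLModel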
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (Literal['lr', 'rl'])

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
+
+kraken.lib.segmentation.neural_reading_order(lines, text_direction='lr', regions=None, im_size=None, model=None, class_mapping=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.

  • +
  • model (kraken.lib.vgsl.TorchVGSLModel) – torch Module for

  • +
  • text_direction (str)

  • +
  • regions (Optional[Sequence[shapely.geometry.Polygon]])

  • +
  • im_size (Tuple[int, int])

  • +
  • class_mapping (Dict[str, int])

  • +
+
+
Returns:
+

The indices of the ordered input.

+
+
Return type:
+

Sequence[int]

+
+
+
+ +
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.

  • +
  • regions (Optional[Sequence[shapely.geometry.Polygon]]) – List of region polygons.

  • +
  • text_direction (Literal['lr', 'rl']) – Set principal text direction for column ordering. Can +be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

The indices of the ordered input.

+
+
Return type:
+

Sequence[int]

+
+
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5, text_direction='horizontal')
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
  • text_direction (str) – Base orientation of the text line (horizontal or +vertical).

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False, raise_on_error=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing a single baseline per entry.

  • +
  • suppl_obj (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing additional polylines that should be +considered hard boundaries for polygonizaton purposes. Can +be used to prevent polygonization into non-text areas such +as illustrations or to compute the polygonization of a +subset of the lines in an image.

  • +
  • im_feats (numpy.ndarray) – An optional precomputed seamcarve energy map. Overrides data +in im. The default map is gaussian_filter(sobel(im), 2).

  • +
  • scale (Tuple[int, int]) – A 2-tuple (h, w) containing optional scale factors of the input. +Values of 0 are used for aspect-preserving scaling. None skips +input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are assumed +to be on the bottom of the text line and will be offset +upwards, if set to True, baselines are on the top and will be +offset downwards. If set to None, no offset will be applied.

  • +
  • raise_on_error (bool) – Raises error instead of logging them when they are +not-blocking

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

+
+
+
+ +
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[List, List]]) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (Union[float, Tuple[float, float]]) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines – List of tuples containing the baseline and its polygonization.

  • +
  • scale (Union[float, Tuple[float, float]]) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, a polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (Sequence[Tuple[int, int]]) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (Sequence[Tuple[int, int]]) – A bounding polygon around the baseline (same format as +baseline). Last and first point are automatically connected.

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

Tuple[Tuple[int, int]]

+
+
+
+ +
+
+kraken.lib.segmentation.extract_polygons(im, bounds, legacy=False)
+

Yields the subimages of image im defined in the list of bounding polygons +with baselines preserving order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (kraken.containers.Segmentation) – A Segmentation class containing a bounding box or baseline +segmentation.

  • +
  • legacy (bool) – Use the old, slow, and deprecated path

  • +
+
+
Yields:
+

The extracted subimage, and the corresponding bounding box or baseline

+
+
Return type:
+

Generator[Tuple[PIL.Image.Image, Union[kraken.containers.BBoxLine, kraken.containers.BaselineLine]], None, None]

+
+
+
+ +
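Most of these helpers work on plain coordinate lists, so they are easy to try in isolation. A toy sketch of scaling a baseline/boundary pair and cutting a polygon section out of it; all coordinates and distances are made up:

    from kraken.lib.segmentation import compute_polygon_section, scale_polygonal_lines

    baseline = [(10, 30), (110, 30)]
    boundary = [(5, 10), (115, 10), (115, 40), (5, 40)]
    print(scale_polygonal_lines([(baseline, boundary)], 2.0))   # scale by factor 2
    print(compute_polygon_section(baseline, boundary, 10, 60))  # section between two baseline distances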
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VGSL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel +vector at each step along its time axis, i.e. either put the non-time-axis +dimension into the channels dimension or use a summarizing RNN squashing +the time axis to 1 and putting the output into the channels dimension +respectively.

+
+
Parameters:
+

spec (str)

+
+
+
+
+input
+

Expected input tensor as a 4-tuple.

+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and append layers spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+property aux_layers
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx, target_output_shape=None)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx, target_output_shape=None)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx, target_output_shape=None)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx, target_output_shape=None)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx, target_output_shape=None)
+

Builds a reshape layer

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx, target_output_shape=None)
+

Builds an LSTM/GRU layer returning number of outputs and layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_ro(input, blocks, idx)
+

Builds a RO determination layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx, target_output_shape=None)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_wav2vec2(input, blocks, idx, target_output_shape=None)
+

Builds a Wav2Vec2 masking layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, os.PathLike]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException if the model data is invalid (not a

  • +
  • string, protobuf file, or without appropriate metadata).

  • +
  • FileNotFoundError if the path doesn't point to a file.

  • +
+
+
+
+ +
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+property use_legacy_polygons
+
+ +
+
+user_metadata: Dict[str, Any]
+
+ +
+ +
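A short sketch of deserializing a model and inspecting it through the attributes described above; the file name is the recognition model used in the quickstart examples:

    from kraken.lib.vgsl import TorchVGSLModel

    net = TorchVGSLModel.load_model('catmus-print-fondue-large.mlmodel')
    print(net.input)             # expected input as a 4-tuple (batch, channels, height, width)
    print(net.one_channel_mode)  # '1', 'L', or None
    print(net.seg_type)
    net.eval()                   # switch to inference mode before recognition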
+
+

kraken.lib.xml module

+
+
+class kraken.lib.xml.XMLPage(filename, filetype='xml')
+
+
Parameters:
+
    +
  • filename (Union[str, os.PathLike])

  • +
  • filetype (Literal['xml', 'alto', 'page'])

  • +
+
+
+
+ +
+
+
+

Training

+
+

kraken.lib.train module

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, freeze_backbone=-1, pl_logger=None, log_dir=None, *args, **kwargs)
+
+
Parameters:
+
    +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
  • min_epochs (int)

  • +
  • max_epochs (int)

  • +
  • pl_logger (Union[lightning.pytorch.loggers.logger.Logger, str, None])

  • +
  • log_dir (Optional[os.PathLike])

  • +
+
+
+
+
+automatic_optimization = False
+
+ +
+
+fit(*args, **kwargs)
+
+ +
+ +
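The trainer wraps a Lightning training loop; besides the arguments listed above the constructor accepts further arguments (*args, **kwargs). A rough sketch, where `model` stands in for one of kraken's training modules and is assumed to be set up elsewhere:

    from kraken.lib.train import KrakenTrainer

    trainer = KrakenTrainer(min_epochs=5,
                            max_epochs=50,
                            enable_progress_bar=True)
    # trainer.fit(model)   # `model` is a kraken LightningModule wrapper, not constructed here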
+
+

kraken.lib.dataset module

+
+

Recognition datasets

+
+
+class kraken.lib.dataset.ArrowIPCRecognitionDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, split_filter=None)
+

Dataset for training a recognition model from a precompiled dataset in +Arrow IPC format.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, Literal['L', 'R']])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • split_filter (Optional[str])

  • +
+
+
+
+
+add(file)
+

Adds an Arrow IPC file to the dataset.

+
+
Parameters:
+

file (Union[str, os.PathLike]) – Location of the precompiled dataset file.

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+arrow_table = None
+
+ +
+
+aug = None
+
+ +
+
+codec = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+im_mode
+
+ +
+
+legacy_polygons_status = None
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+rebuild_alphabet()
+

Recomputes the alphabet depending on the given text transformation.

+
+ +
+
+seg_type = None
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+class kraken.lib.dataset.BaselineSet(line_width=4, padding=(0, 0, 0, 0), im_transforms=transforms.Compose([]), augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • line_width (int)

  • +
  • padding (Tuple[int, int, int, int])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(doc)
+

Adds a page to the dataset.

+
+
Parameters:
+

doc (kraken.containers.Segmentation) – A Segmentation container class.

+
+
+
+ +
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs = []
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+pad
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
+
+class kraken.lib.dataset.GroundTruthDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(line=None, page=None)
+

Adds an individual line or all lines on a page to the dataset.

+
+
Parameters:
+
+
+
+
+ +
+
+add_line(line)
+

Adds a line to the dataset.

+
+
Parameters:
+

line (kraken.containers.BBoxLine) – BBoxLine container object for a line.

+
+
Raises:
+
    +
  • ValueError if the transcription of the line is empty after

  • +
  • transformation or either baseline or bounding polygon are missing.

  • +
+
+
+
+ +
+
+add_page(page)
+

Adds all lines on a page to the dataset.

+

Invalid lines will be skipped and a warning will be printed.

+
+
Parameters:
+

page (kraken.containers.Segmentation) – Segmentation container object for a page.

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+property im_mode
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
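A rough sketch of assembling a small bounding-box dataset by hand; the line content, coordinates, and image path are placeholders, and in practice lines usually come from parsed XML or JSON segmentations:

    from kraken.containers import BBoxLine
    from kraken.lib.dataset import GroundTruthDataset

    ds = GroundTruthDataset()
    ds.add_line(BBoxLine(id='l0',
                         bbox=(0, 0, 400, 48),
                         text='a line of ground truth',
                         imagename='line_0.png'))
    print(ds.alphabet)    # character frequencies accumulated from the added lines
    ds.encode()           # build/apply a codec; required before sampling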
+
+

Segmentation datasets

+
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, legacy_polygons=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, Literal['L', 'R']])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • legacy_polygons (bool)

  • +
+
+
+
+
+add(line=None, page=None)
+

Adds an individual line or all lines on a page to the dataset.

+
+
Parameters:
+
+
+
+
+ +
+
+add_line(line)
+

Adds a line to the dataset.

+
+
Parameters:
+

line (kraken.containers.BaselineLine) – BaselineLine container object for a line.

+
+
Raises:
+
    +
  • ValueError if the transcription of the line is empty after

  • +
  • transformation or either baseline or bounding polygon are missing.

  • +
+
+
+
+ +
+
+add_page(page)
+

Adds all lines on a page to the dataset.

+

Invalid lines will be skipped and a warning will be printed.

+
+
Parameters:
+

page (kraken.containers.Segmentation) – Segmentation container object for a page.

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+property im_mode
+
+ +
+
+legacy_polygons
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+

Reading order datasets

+
+
+class kraken.lib.dataset.PairWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)
+

Dataset for training a reading order determination model.

+

Returns random pairs of lines from the same page.

+
+
Parameters:
+
    +
  • files (Sequence[Union[os.PathLike, str]])

  • +
  • mode (Optional[Literal['alto', 'page', 'xml']])

  • +
  • level (Literal['regions', 'baselines'])

  • +
  • ro_id (Optional[str])

  • +
  • class_mapping (Optional[Dict[str, int]])

  • +
+
+
+
+
+data = []
+
+ +
+
+failed_samples = []
+
+ +
+
+get_feature_dim()
+
+ +
+ +
+
+class kraken.lib.dataset.PageWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)
+

Dataset for training a reading order determination model.

+

Returns all lines from the same page.

+
+
Parameters:
+
    +
  • files (Sequence[Union[os.PathLike, str]])

  • +
  • mode (Optional[Literal['alto', 'page', 'xml']])

  • +
  • level (Literal['regions', 'baselines'])

  • +
  • ro_id (Optional[str])

  • +
  • class_mapping (Optional[Dict[str, int]])

  • +
+
+
+
+
+data = []
+
+ +
+
+failed_samples = []
+
+ +
+
+get_feature_dim()
+
+ +
+ +
+
+

Helpers

+
+
+class kraken.lib.dataset.ImageInputTransforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)
+
+
Parameters:
+
    +
  • batch (int)

  • +
  • height (int)

  • +
  • width (int)

  • +
  • channels (int)

  • +
  • pad (Union[int, Tuple[int, int], Tuple[int, int, int, int]])

  • +
  • valid_norm (bool)

  • +
  • force_binarization (bool)

  • +
+
+
+
+
+property batch: int
+

Batch size attribute. Ignored.

+
+
Return type:
+

int

+
+
+
+ +
+
+property centerline_norm: bool
+

Attribute indicating if centerline normalization will be applied to +input images.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property channels: int
+

Channels attribute. Can be either 1 (binary/grayscale), 3 (RGB).

+
+
Return type:
+

int

+
+
+
+ +
+
+property force_binarization: bool
+

Switch enabling/disabling forced binarization.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property height: int
+

Desired output image height. If set to 0, image will be rescaled +proportionally with width, if 1 and channels is larger than 3 output +will be grayscale and of the height set with the channels attribute.

+
+
Return type:
+

int

+
+
+
+ +
+
+property mode: str
+

Imaginary PIL.Image.Image mode of the output tensor. Possible values +are RGB, L, and 1.

+
+
Return type:
+

str

+
+
+
+ +
+
+property pad: int
+

Amount of padding around left/right end of image.

+
+
Return type:
+

int

+
+
+
+ +
+
+property scale: Tuple[int, int]
+

Desired output shape (height, width) of the image. If any value is set +to 0, image will be rescaled proportionally with height, width, if 1 +and channels is larger than 3 output will be grayscale and of the +height set with the channels attribute.

+
+
Return type:
+

Tuple[int, int]

+
+
+
+ +
+
+property valid_norm: bool
+

Switch allowing/disallowing centerline normalization. Even if enabled +won’t be applied to 3-channel images.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property width: int
+

Desired output image width. If set to 0, image will be rescaled +proportionally with height.

+
+
Return type:
+

int

+
+
+
+ +
+ +
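A small sketch of instantiating the transform pipeline with typical recognition-model parameters; the values are illustrative only:

    from kraken.lib.dataset import ImageInputTransforms

    tf = ImageInputTransforms(batch=1, height=48, width=0, channels=1,
                              pad=16, valid_norm=True)
    print(tf.mode, tf.scale, tf.centerline_norm)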
+
+kraken.lib.dataset.collate_sequences(batch)
+

Sorts and pads sequences.

+
+ +
+
+kraken.lib.dataset.global_align(seq1, seq2)
+

Computes a global alignment of two strings.

+
+
Parameters:
+
    +
  • seq1 (Sequence[Any])

  • +
  • seq2 (Sequence[Any])

  • +
+
+
Return type:
+

Tuple[int, List[str], List[str]]

+
+
+

Returns a tuple (distance, list(algn1), list(algn2))

+
+ +
+
+kraken.lib.dataset.compute_confusions(algn1, algn2)
+

Compute confusion matrices from two globally aligned strings.

+
+
Parameters:
+
    +
  • align1 (Sequence[str]) – sequence 1

  • +
  • align2 (Sequence[str]) – sequence 2

  • +
  • algn1 (Sequence[str])

  • +
  • algn2 (Sequence[str])

  • +
+
+
Returns:
+

A tuple (counts, scripts, ins, dels, subs) with counts being per-character +confusions, scripts per-script counts, ins a dict with per script +insertions, del an integer of the number of deletions, subs per +script substitutions.

+
+
+
+ +
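These two helpers are typically used together when computing error reports: the aligned sequences produced by global_align are fed into compute_confusions. A quick sketch on two plain strings:

    from kraken.lib.dataset import compute_confusions, global_align

    dist, algn_gt, algn_pred = global_align('kraken', 'krahen')
    print(dist)                                 # alignment distance between the two sequences
    print(compute_confusions(algn_gt, algn_pred))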
+
+
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren’t further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
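A minimal usage sketch; the input file name is a placeholder:

    from PIL import Image
    from kraken.binarization import nlbin

    bw = nlbin(Image.open('page.tif'))   # returns a bitonal PIL.Image.Image
    bw.save('page.bw.png')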
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None, records=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im (PIL.Image) – Input image

  • +
  • segmentation (dict) – Output of the segment method.

  • +
  • records (list) – A list of ocr_record objects.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[Dict[Any, Any]] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode=’wb’) to write to.

+
+
+
+ +
+ +
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/genindex.html b/5.2/genindex.html new file mode 100644 index 000000000..52d9d9cab --- /dev/null +++ b/5.2/genindex.html @@ -0,0 +1,926 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/gpu.html b/5.2/gpu.html new file mode 100644 index 000000000..e3353bd04 --- /dev/null +++ b/5.2/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/index.html b/5.2/index.html new file mode 100644 index 000000000..ad4ea0f3a --- /dev/null +++ b/5.2/index.html @@ -0,0 +1,1040 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through the on-board pip utility and the anaconda scientific computing environment is supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the pdf extras package for PyPi:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a model to do the actual recognition of +characters. To download the default model for printed French text and place it +in the kraken directory for the current user:

+
$ kraken get 10.5281/zenodo.10592716
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.10592716
+name: 10.5281/zenodo.10592716
+
+CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages
+
+<p><strong>CATMuS-Print (Large) - Diachronic model for French prints and other West European languages</strong></p>
+<p>CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, dealing with different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian&hellip;) and different centuries (from the first prints of the 16th c. to digital documents of the 21st century).</p>
+<p>Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligature (except those that still exist), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies might be present, because transcriptions have been done over several years and the norms have slightly evolved.</p>
+<p>The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.</p>
+<p>This model is the result of the collaboration from researchers from the University of Geneva and Inria Paris and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.</p>
+scripts: Latn
+alphabet: !"#$%&'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz|}~¡£¥§«¬°¶·»¿ÆßæđłŒœƀǝɇΑΒΓΔΕΖΘΙΚΛΜΝΟΠΡΣΤΥΦΧΩαβγδεζηθικλμνξοπρςστυφχωϛחלרᑕᗅᗞᚠẞ–—‘’‚“”„‟†•⁄⁊⁋℟←▽◊★☙✠✺✻⟦⟧⬪ꝑꝓꝗꝙꝟꝯꝵ SPACE, COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING DOT ABOVE, COMBINING DIAERESIS, COMBINING RING ABOVE, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING CEDILLA, COMBINING OGONEK, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER U, 0xe682, 0xe68b, 0xe8bf, 0xf1a7
+accuracy: 98.56%
+license: cc-by-4.0
+author(s): Gabay, Simon; Clérice, Thibault
+date: 2024-01-30
+
+
+
+
+
+

Quickstart

+

An OCR pipeline consists of multiple steps, primarily preprocessing, segmentation, and recognition, each of which takes the output of the previous step and sometimes additional files such as models and templates that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or run separately:

[Pipeline diagram: Image → Segmentation (Segmentation Model) → Baselines, Regions, and Order → Recognition (Recognition Model) → OCR Records → Serialization (Output Template) → Output File]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the previously downloaded model:

+
$ kraken -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+
+
+

All commands and their parameters are documented; just add the standard --help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded by the European Union’s Horizon 2020 Framework Programme for Research and Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

This work benefited from state aid managed by the Agence Nationale de la Recherche under the Programme d’Investissements d’Avenir, reference ANR-21-ESRE-0005 (Biblissima+).

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/ketos.html b/5.2/ketos.html new file mode 100644 index 000000000..871e8edca --- /dev/null +++ b/5.2/ketos.html @@ -0,0 +1,950 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos command line utility in depth. For a gentle introduction to model training please refer to the tutorial.

+

There are currently three trainable components in the kraken processing pipeline:

  • Segmentation: finding lines and regions in images

  • Reading Order: ordering lines found in the previous segmentation step. Reading order models are closely linked to segmentation models and both are usually trained on the same dataset.

  • Recognition: recognition models transform images of lines into text.

+

Depending on the use case it is not necessary to manually train new models for +each material. The default segmentation model works well on quite a variety of +handwritten and printed documents, a reading order model might not perform +better than the default heuristic for simple text flows, and there are +recognition models for some types of material available in the repository.

+
+

Best practices

+
+

Recognition model training

+
    +
  • The default architecture works well for decently sized datasets.

  • +
  • Use precompiled binary datasets and put them in a place where they can be memory mapped during training (local storage, not NFS or similar).

  • +
  • Use the --logger flag to track your training metrics across experiments using Tensorboard.

  • +
  • If the network doesn’t converge before the early stopping aborts training, increase --min-epochs or --lag. Use the --logger option to inspect your training loss.

  • +
  • Use the flag --augment to activate data augmentation.

  • +
  • Increase the number of --workers to speed up data loading. This is essential when you use the --augment option.

  • +
  • When using an Nvidia GPU, set the --precision option to 16 to use automatic mixed precision (AMP). This can provide significant speedup without any loss in accuracy.

  • +
  • Use option -B to scale batch size until GPU utilization reaches 100%. When using a larger batch size, it is recommended to use option -r to scale the learning rate by the square root of the batch size (1e-3 * sqrt(batch_size)); see the short example after this list.

  • +
  • When fine-tuning, it is recommended to use new mode, not union, as the network will rapidly unlearn missing labels in the new dataset.

  • +
  • If the new dataset is fairly dissimilar or your base model has been pretrained with ketos pretrain, use --warmup in conjunction with --freeze-backbone for 1 or 2 epochs.

  • +
  • Upload your models to the model repository.

  • +
+
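A quick, hypothetical illustration of the learning rate scaling rule mentioned above (1e-3 is the default learning rate of ketos train; the batch sizes are made-up examples):

import math

base_lr = 1e-3                    # default -r learning rate
for batch_size in (16, 32, 64):   # example batch sizes
    # scale the base learning rate by the square root of the batch size
    print(batch_size, base_lr * math.sqrt(batch_size))
# 16 -> 0.004, 32 -> ~0.0057, 64 -> 0.008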
+
+

Segmentation model training

+
    +
  • The segmenter is fairly robust when it comes to hyperparameter choice.

  • +
  • Start by fine-tuning from the default model for a fixed number of epochs (50 for reasonably sized datasets) with a cosine schedule.

  • +
  • Segmentation models’ performance is difficult to evaluate. Pixel accuracy doesn’t mean much because background pixels that aren’t part of any line or region vastly outnumber the ones that are. Frequency-weighted IoU is good for overall performance, while mean IoU overrepresents rare classes. The best way to evaluate segmentation models is to look at the output on unlabelled data.

  • +
  • If you don’t have rare classes you can use a fairly small validation set to make sure everything is converging and just visually validate on unlabelled data.

  • +
+
+
+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind of custom low-level format, the XML-based formats that are commonly used for archival of annotation and transcription data, and in the case of recognizer training a precompiled binary format. It is recommended to use the XML formats for segmentation and reading order training and the binary format for recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.3. An example showing the +attributes necessary for segmentation, recognition, and reading order training +follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523"
+						   WIDTH="5234"
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..."
+							  WIDTH="..."
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K"
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +from a variety of tools. As with ALTO, PAGE XML files can be used to train +segmentation, reading order, and recognition models.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a binary dataset format offering a couple of advantages is supported for recognition training. Binary datasets drastically improve loading performance, allowing the saturation of most GPUs with minimal computational overhead, while also allowing training with datasets that are larger than the system’s main memory. A minor drawback is a ~30% increase in dataset size in comparison to the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of files containing many lines this process can take a long time. It can easily be parallelized by specifying the number of separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The above command assigns 80% of the source lines to the training set, 10% to the validation set, and 10% to the test set. The training and validation sets in the dataset file are used automatically by ketos train (unless told otherwise) while the remaining 10% in the test set is used by ketos test.

+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its most important command line options:


option

action

-o, --output

Output model file prefix. Defaults to model.

-s, --spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, --append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, --load

Load existing file to continue training

-F, --savefreq

Model save frequency in epochs during +training

-q, --quit

Stop condition for training. Set to early +for early stopping (default) or fixed for fixed +number of epochs.

-N, --epochs

Number of epochs to train for.

--min-epochs

Minimum number of epochs to train for when using early stopping.

--lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

-d, --device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

--optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, --lrate

Learning rate [default: 0.001]

-m, --momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, --weight-decay

Weight decay.

--schedule

Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or reduceonplateau. For 1cycle the cycle length is determined by the --epochs option.

-p, --partition

Ground truth data partition ratio between train/validation set

-u, --normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, --codec

Load a codec JSON definition (invalid if loading existing model)

--resize

Codec/output layer resizing option. If set to union, code points will be added; new will set the layer to match exactly the training data; fail will abort if training data and model codec do not match. Only valid when refining an existing model.

-n, --reorder / --no-reorder

Reordering of code points to display order.

-t, --training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, --evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

-f, --format-type

Sets the training and evaluation data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

--augment / --no-augment

Enables/disables data augmentation.

--workers

Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.

+

In some cases changing the network architecture might be useful. One such +example would be material that is not well recognized in the grayscale domain, +as the default architecture definition converts images into grayscale. The +input definition can be changed quite easily to train on color data (RGB) instead:

+
$ ketos train -f page -s '[1,120,0,3 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the lag can be useful:

+
$ ketos train --lag 10 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, union and new. +union resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. new +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize union -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize new -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In new mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly different the modification of the final linear layer to add/remove characters will destroy the inference capabilities of the network. In those cases it is faster to slice off the last few layers of the network and only train those instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The sliced model will behave exactly like a newly initialized one, except that it will potentially train a lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, different types of whitespace exist, and mixed bidirectional text +can be written differently depending on the base line direction.
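A concrete illustration of the diacritics point, as a short Python sketch using only the standard library (the example character is arbitrary):

import unicodedata

s = "é"                                  # LATIN SMALL LETTER E WITH ACUTE
nfc = unicodedata.normalize("NFC", s)    # one precomposed code point
nfd = unicodedata.normalize("NFD", s)    # base letter plus combining acute accent
print(len(nfc), len(nfd))                # 1 2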

+

Ketos provides options to normalize input into consistent forms that make it possible to process data from multiple sources. Principally, two options are available: one for Unicode normalization and one for whitespace normalization. The Unicode normalization switch (disabled per default) allows one to select one of the 4 normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+

Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further the behavior of the BiDi algorithm can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a codec) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
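As a purely conceptual sketch of the logical/display order distinction (the Hebrew sample text is arbitrary; this is only an illustration, not how kraken reorders text):

logical = "שלום עולם"     # stored in logical (reading) order
display = logical[::-1]    # for a line that is purely RTL, display order is the reverse
# mixed-direction lines (e.g. RTL text with embedded numbers) need the full Unicode BiDi algorithm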
+

It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the label decoded from the raw network output and Unicode +code points (see this diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation.

+

The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual.

+

There are multiple approaches one could follow when constructing a custom codec: randomized block codes, i.e. producing random fixed-length labels for each code point, Huffman coding, i.e. variable-length label sequences depending on the frequency of each code point in some text (not necessarily the training set), or structural decomposition, i.e. describing each code point through a sequence of labels that describe the shape of the grapheme similar to how some input systems for Chinese characters function.
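A hypothetical sketch of the first approach, generating a random fixed-length block code for a tiny character inventory and writing it in the JSON format shown further below (the alphabet, label-space size, and file name are invented for illustration; a real codec would additionally have to ensure the label sequences are uniquely decodable):

import json
import random

alphabet = ["S", "A", "B", "\u1f05"]   # example character inventory
num_labels = 100                       # size of the reduced label space
block_length = 4                       # fixed number of labels per code point

# label 0 is left unused here, assuming it is reserved (e.g. for the CTC blank)
codec = {c: [random.randrange(1, num_labels) for _ in range(block_length)]
         for c in alphabet}

with open("sample.codec", "w") as fp:
    json.dump(codec, fp)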

+

While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
+
+
+

Unsupervised recognition pretraining

+

Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices.

+

All data sources accepted by the supervised trainer are valid for pretraining +but for performance reasons it is recommended to use pre-compiled binary +datasets. One thing to keep in mind is that compilation filters out empty +(non-transcribed) text lines per default which is undesirable for pretraining. +With the --keep-empty-lines option all valid lines will be written to the +dataset file:

+
$ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml
+
+
+

The basic pretraining call is very similar to a training one:

+
$ ketos pretrain -f binary foo.arrow
+
+
+

There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples.

+
$ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow
+
+
+

Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced:

+
$ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow
+
+
+

It is necessary to use learning rate warmup (warmup) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations.

+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text +recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+
+
+

This takes all text lines and regions encoded in the XML files and trains a +model to recognize them.

+

Most other options available in transcription training are also available in +segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize new -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition there are a couple of specific options that allow filtering of baseline and region types. Datasets are often annotated in more detail than required or contain undesirable types, e.g. when combining segmentation data from different sources. The most basic option is the suppression of either all baseline or all region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options can be combined to massage the dataset into any typology you want. Tags containing the separator character : can be specified by escaping them with a backslash.

+

Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Reading order training

+

Reading order models work slightly differently from segmentation and recognition models. They are closely linked to the typology used in the dataset they were trained on as they use type information on lines and regions to make ordering decisions. As the same typology was probably used to train a specific segmentation model, reading order models are trained separately but bundled with their segmentation model in a subsequent step. The general sequence is therefore:

+
$ ketos segtrain -o fr_manu_seg.mlmodel -f xml french/*.xml
+...
+$ ketos rotrain -o fr_manu_ro.mlmodel -f xml french/*.xml
+...
+$ ketos roadd -o fr_manu_seg_with_ro.mlmodel -i fr_manu_seg_best.mlmodel  -r fr_manu_ro_best.mlmodel
+
+
+

Only the fr_manu_seg_with_ro.mlmodel file will contain the trained reading +order model. Segmentation models can exist with or without reading order +models. If one is added, the neural reading order will be computed in +addition to the one produced by the default heuristic during segmentation and +serialized in the final XML output (in ALTO/PAGE XML).

+
+

Note

+

Reading order models work purely on the typology and geometric features +of the lines and regions. They construct an approximate ordering matrix +by feeding feature vectors of two lines (or regions) into the network +to decide which of those two lines precedes the other.

+

These feature vectors are quite simple: just the lines’ types and their start, center, and end points. Therefore they cannot reliably learn any ordering relying on graphical features of the input page such as line color, typeface, or writing system. A rough sketch of how such pairwise decisions can be turned into an ordering follows this note.

+
+
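The following is only an illustration of the idea, not kraken’s actual decoding: given a hypothetical matrix of pairwise precedence probabilities, one simple way to obtain an order is to count, for each line, how many other lines it is predicted to precede:

import numpy as np

# P[i, j]: hypothetical probability that line i precedes line j
P = np.array([[0.0, 0.9, 0.8],
              [0.1, 0.0, 0.7],
              [0.2, 0.3, 0.0]])

# a line predicted to precede many others should come early in the reading order
scores = (P > 0.5).sum(axis=1)
reading_order = np.argsort(-scores)
print(reading_order)   # [0 1 2]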

Reading order models are extremely simple and do not require a lot of memory or computational power to train. In fact, the default parameters are extremely conservative and it is recommended to increase the batch size for improved training speed. Batch sizes above 128k are easily possible with sufficiently large training datasets:

+
$ ketos rotrain -o fr_manu_ro.mlmodel -B 128000 -f french/*.xml
+Training RO on following baselines types:
+  DefaultLine   1
+  DropCapitalLine       2
+  HeadingLine   3
+  InterlinearLine       4
+GPU available: False, used: False
+TPU available: False, using: 0 TPU cores
+IPU available: False, using: 0 IPUs
+HPU available: False, using: 0 HPUs
+┏━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
+┃   ┃ Name        ┃ Type              ┃ Params ┃
+┡━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
+│ 0 │ criterion   │ BCEWithLogitsLoss │      0 │
+│ 1 │ ro_net      │ MLP               │  1.1 K │
+│ 2 │ ro_net.fc1  │ Linear            │  1.0 K │
+│ 3 │ ro_net.relu │ ReLU              │      0 │
+│ 4 │ ro_net.fc2  │ Linear            │     45 │
+└───┴─────────────┴───────────────────┴────────┘
+Trainable params: 1.1 K
+Non-trainable params: 0
+Total params: 1.1 K
+Total estimated model params size (MB): 0
+stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/35 0:00:00 • -:--:-- 0.00it/s val_spearman: 0.912 val_loss: 0.701 early_stopping: 0/300 inf
+
+
+

During validation a metric called Spearman’s footrule is computed. To calculate +Spearman’s footrule, the ranks of the lines of text in the ground truth reading +order and the predicted reading order are compared. The footrule is then +calculated as the sum of the absolute differences between the ranks of pairs of +lines. The score increases by 1 for each line between the correct and predicted +positions of a line.

+

A lower footrule score indicates a better alignment between the two orders. A +score of 0 implies perfect alignment of line ranks.
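A minimal sketch of the footrule computation described above (the line identifiers and orders are made-up examples):

ground_truth = ["line_a", "line_b", "line_c", "line_d"]
predicted    = ["line_b", "line_a", "line_c", "line_d"]

gt_rank   = {line: i for i, line in enumerate(ground_truth)}
pred_rank = {line: i for i, line in enumerate(predicted)}

# sum of absolute rank differences over all lines
footrule = sum(abs(gt_rank[line] - pred_rank[line]) for line in ground_truth)
print(footrule)   # 2: line_a and line_b are each one position off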

+
+
+

Recognition testing

+

Picking a particular model from a pool or getting a more detailed look at the recognition accuracy can be done with the test command. It uses transcribed lines, the test set, in the same format as the train command, recognizes the line images with one or more models, and creates a detailed report of the differences from the ground truth for each of them.


option

action

-f, --format-type

Sets the test set data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

-m, --model

Model(s) to evaluate.

-e, --evaluation-files

File(s) with paths to evaluation data.

-d, --device

Select device to use.

--pad

Left and right padding around lines.

+

Transcriptions are handed to the command in the same way as for the train +command, either through a manifest with -e/--evaluation-files or by just +adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

The report(s) contain character accuracy measured per script and a detailed list of confusions. In the example above the overall accuracy corresponds to (7012 - 6022) / 7012 ≈ 14.12%. When evaluating multiple models the last line of the output will contain the average accuracy and the standard deviation across all of them.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/models.html b/5.2/models.html new file mode 100644 index 000000000..dcd35bdb9 --- /dev/null +++ b/5.2/models.html @@ -0,0 +1,126 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: pronn +files serializing old pickled pyrnn models as protobuf, clstm’s native +serialization, and versatile Core ML models.

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken.

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/objects.inv b/5.2/objects.inv new file mode 100644 index 000000000..1b1edb173 Binary files /dev/null and b/5.2/objects.inv differ diff --git a/5.2/search.html b/5.2/search.html new file mode 100644 index 000000000..bb59d9b5b --- /dev/null +++ b/5.2/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

Search

+ + + + +

+ Searching for multiple words only shows matches that contain + all words. +

+ + +
+ + + +
+ + +
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/searchindex.js b/5.2/searchindex.js new file mode 100644 index 000000000..04752b2ba --- /dev/null +++ b/5.2/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ABBYY XML": [[2, "abbyy-xml"]], "ALTO": [[5, "alto"]], "ALTO 4.4": [[2, "alto-4-4"]], "API Quickstart": [[1, null]], "API Reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline Segmentation": [[0, "baseline-segmentation"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Best practices": [[5, "best-practices"]], "Binarization": [[0, "binarization"]], "Binary Datasets": [[5, "binary-datasets"]], "Codecs": [[5, "codecs"]], "Containers and Helpers": [[2, "containers-and-helpers"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dataset Compilation": [[7, "dataset-compilation"]], "Default templates": [[2, "default-templates"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Finding Recognition Models": [[4, "finding-recognition-models"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "Funding": [[4, "funding"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Helpers": [[2, "helpers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input and Outputs": [[0, "input-and-outputs"]], "Installation": [[4, "installation"]], "Installation using Conda": [[4, "installation-using-conda"]], "Installation using Pip": [[4, "installation-using-pip"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy Box Segmentation": [[0, "legacy-box-segmentation"]], "Legacy modules": [[2, "legacy-modules"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Loss and Evaluation Functions": [[2, "loss-and-evaluation-functions"]], "Masking": [[0, "masking"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[6, null]], "Output formats": [[0, "output-formats"]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation": [[0, "page-segmentation"]], "PageXML": [[2, "pagexml"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Principal Text Direction": [[0, "principal-text-direction"]], "Publishing": [[0, "publishing"]], "Querying and Model Retrieval": [[0, "querying-and-model-retrieval"]], "Quickstart": [[4, "quickstart"]], "Reading order datasets": [[2, "reading-order-datasets"]], "Reading order training": [[5, "reading-order-training"]], "Recognition": [[0, "recognition"], [1, "recognition"], [2, "recognition"], [7, "recognition"]], "Recognition Models": [[6, "recognition-models"]], "Recognition datasets": [[2, "recognition-datasets"]], "Recognition model training": [[5, "recognition-model-training"]], "Recognition testing": [[5, "recognition-testing"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Related Software": [[4, "related-software"]], "Reshape": [[8, "reshape"]], "Segmentation": [[2, "segmentation"]], "Segmentation Models": [[6, "segmentation-models"]], "Segmentation datasets": [[2, 
"segmentation-datasets"]], "Segmentation model training": [[5, "segmentation-model-training"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"], [2, "serialization"]], "Slicing": [[5, "slicing"]], "Text Normalization and Unicode": [[5, "text-normalization-and-unicode"]], "Trainer": [[2, "trainer"]], "Training": [[1, "training"], [2, "training"], [5, null], [7, "compilation"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "Unsupervised recognition pretraining": [[5, "unsupervised-recognition-pretraining"]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "hOCR": [[2, "hocr"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.containers module": [[2, "kraken-containers-module"]], "kraken.lib.codec module": [[2, "kraken-lib-codec-module"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.exceptions": [[2, "kraken-lib-exceptions"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"add() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.add", false]], "add() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.add", false]], "add() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add", false]], "add() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add", false]], "add_codec() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.add_codec", false]], "add_labels() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.add_labels", false]], "add_line() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add_line", false]], "add_line() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add_line", false]], "add_page() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add_page", false]], "add_page() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add_page", false]], 
"add_page() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.add_page", false]], "alphabet (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.alphabet", false]], "alphabet (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.alphabet", false]], "alphabet (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.alphabet", false]], "append() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.append", false]], "arrow_table (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.arrow_table", false]], "arrowipcrecognitiondataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset", false]], "aug (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.aug", false]], "aug (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.aug", false]], "aug (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.aug", false]], "aug (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.aug", false]], "automatic_optimization (kraken.lib.train.krakentrainer attribute)": [[2, "kraken.lib.train.KrakenTrainer.automatic_optimization", false]], "aux_layers (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.aux_layers", false]], "base_dir (kraken.containers.baselineline attribute)": [[2, "id7", false], [2, "kraken.containers.BaselineLine.base_dir", false]], "base_dir (kraken.containers.baselineocrrecord attribute)": [[2, "id29", false], [2, "kraken.containers.BaselineOCRRecord.base_dir", false]], "base_dir (kraken.containers.bboxline attribute)": [[2, "id16", false], [2, "kraken.containers.BBoxLine.base_dir", false]], "base_dir (kraken.containers.bboxocrrecord attribute)": [[2, "id33", false], [2, "kraken.containers.BBoxOCRRecord.base_dir", false]], "base_dir (kraken.containers.ocr_record attribute)": [[2, "kraken.containers.ocr_record.base_dir", false]], "baseline (kraken.containers.baselineline attribute)": [[2, "id8", false], [2, "kraken.containers.BaselineLine.baseline", false]], "baselineline (class in kraken.containers)": [[2, "kraken.containers.BaselineLine", false]], "baselineocrrecord (class in kraken.containers)": [[2, "kraken.containers.BaselineOCRRecord", false]], "baselineset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.BaselineSet", false]], "batch (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.batch", false]], "bbox (kraken.containers.bboxline attribute)": [[2, "id17", false], [2, "kraken.containers.BBoxLine.bbox", false]], "bboxline (class in kraken.containers)": [[2, "kraken.containers.BBoxLine", false]], "bboxocrrecord (class in kraken.containers)": [[2, "kraken.containers.BBoxOCRRecord", false]], "beam_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.beam_decoder", false]], "bidi_reordering (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bidi_reordering", false]], "blank_threshold_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.blank_threshold_decoder", false]], "blocks (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.blocks", false]], "boundary 
(kraken.containers.baselineline attribute)": [[2, "id9", false], [2, "kraken.containers.BaselineLine.boundary", false]], "boundary (kraken.containers.region attribute)": [[2, "id25", false], [2, "kraken.containers.Region.boundary", false]], "bounds (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bounds", false]], "build_addition() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_addition", false]], "build_conv() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_conv", false]], "build_dropout() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_dropout", false]], "build_groupnorm() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_groupnorm", false]], "build_identity() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_identity", false]], "build_maxpool() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_maxpool", false]], "build_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_output", false]], "build_parallel() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_parallel", false]], "build_reshape() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_reshape", false]], "build_rnn() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_rnn", false]], "build_ro() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_ro", false]], "build_series() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_series", false]], "build_wav2vec2() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_wav2vec2", false]], "c_sorted (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.c_sorted", false]], "calculate_polygonal_environment() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.calculate_polygonal_environment", false]], "category (kraken.containers.processingstep attribute)": [[2, "id36", false], [2, "kraken.containers.ProcessingStep.category", false]], "centerline_norm (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.centerline_norm", false]], "channels (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.channels", false]], "class_mapping (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_mapping", false]], "class_stats (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_stats", false]], "codec (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.codec", false]], "codec (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.codec", false]], "codec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.codec", false]], "collate_sequences() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.collate_sequences", false]], "compute_confusions() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.compute_confusions", false]], "compute_polygon_section() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.compute_polygon_section", false]], "confidences (kraken.containers.baselineocrrecord 
attribute)": [[2, "kraken.containers.BaselineOCRRecord.confidences", false]], "confidences (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.confidences", false]], "confidences (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.confidences", false]], "criterion (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id43", false], [2, "kraken.lib.vgsl.TorchVGSLModel.criterion", false]], "cuts (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.cuts", false]], "cuts (kraken.containers.baselineocrrecord property)": [[2, "id30", false]], "cuts (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.cuts", false]], "cuts (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.cuts", false]], "data (kraken.lib.dataset.pagewiseroset attribute)": [[2, "kraken.lib.dataset.PageWiseROSet.data", false]], "data (kraken.lib.dataset.pairwiseroset attribute)": [[2, "kraken.lib.dataset.PairWiseROSet.data", false]], "decode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.decode", false]], "decoder (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.decoder", false]], "description (kraken.containers.processingstep attribute)": [[2, "id37", false], [2, "kraken.containers.ProcessingStep.description", false]], "device (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.device", false]], "display_order (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.display_order", false]], "display_order (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.display_order", false]], "display_order() (kraken.containers.baselineocrrecord method)": [[2, "id31", false]], "display_order() (kraken.containers.bboxocrrecord method)": [[2, "id34", false]], "display_order() (kraken.containers.ocr_record method)": [[2, "kraken.containers.ocr_record.display_order", false]], "encode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.encode", false]], "encode() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.encode", false]], "encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.encode", false]], "encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.encode", false]], "env (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.env", false]], "eval() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.eval", false]], "extract_polygons() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.extract_polygons", false]], "failed_samples (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.pagewiseroset attribute)": [[2, "kraken.lib.dataset.PageWiseROSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.pairwiseroset attribute)": [[2, 
"kraken.lib.dataset.PairWiseROSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.failed_samples", false]], "fit() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.fit", false]], "font (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.font", false]], "force_binarization (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.force_binarization", false]], "forward() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.forward", false]], "get_feature_dim() (kraken.lib.dataset.pagewiseroset method)": [[2, "kraken.lib.dataset.PageWiseROSet.get_feature_dim", false]], "get_feature_dim() (kraken.lib.dataset.pairwiseroset method)": [[2, "kraken.lib.dataset.PairWiseROSet.get_feature_dim", false]], "global_align() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.global_align", false]], "greedy_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.greedy_decoder", false]], "groundtruthdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.GroundTruthDataset", false]], "height (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.height", false]], "height (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id40", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.height", false]], "hyper_params (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.hyper_params", false]], "id (kraken.containers.baselineline attribute)": [[2, "id10", false], [2, "kraken.containers.BaselineLine.id", false]], "id (kraken.containers.bboxline attribute)": [[2, "id18", false], [2, "kraken.containers.BBoxLine.id", false]], "id (kraken.containers.processingstep attribute)": [[2, "id38", false], [2, "kraken.containers.ProcessingStep.id", false]], "id (kraken.containers.region attribute)": [[2, "id26", false], [2, "kraken.containers.Region.id", false]], "idx (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.idx", false]], "im (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im", false]], "im_mode (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.im_mode", false]], "im_mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.im_mode", false]], "im_mode (kraken.lib.dataset.groundtruthdataset property)": [[2, "kraken.lib.dataset.GroundTruthDataset.im_mode", false]], "im_mode (kraken.lib.dataset.polygongtdataset property)": [[2, "kraken.lib.dataset.PolygonGTDataset.im_mode", false]], "imageinputtransforms (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.ImageInputTransforms", false]], "imagename (kraken.containers.baselineline attribute)": [[2, "id11", false], [2, "kraken.containers.BaselineLine.imagename", false]], "imagename (kraken.containers.bboxline attribute)": [[2, "id19", false], [2, "kraken.containers.BBoxLine.imagename", false]], "imagename (kraken.containers.region attribute)": [[2, "id27", false], [2, "kraken.containers.Region.imagename", false]], "imagename (kraken.containers.segmentation attribute)": [[2, "id0", false], [2, "kraken.containers.Segmentation.imagename", false]], "imgs (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.imgs", false]], 
"init_weights() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.init_weights", false]], "input (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id44", false], [2, "kraken.lib.vgsl.TorchVGSLModel.input", false]], "is_valid (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.is_valid", false]], "kind (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.kind", false]], "krakencairosurfaceexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCairoSurfaceException", false]], "krakencodecexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCodecException", false]], "krakenencodeexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenEncodeException", false]], "krakeninputexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInputException", false]], "krakeninvalidmodelexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInvalidModelException", false]], "krakenrecordexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRecordException", false]], "krakenrepoexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRepoException", false]], "krakenstoptrainingexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenStopTrainingException", false]], "krakentrainer (class in kraken.lib.train)": [[2, "kraken.lib.train.KrakenTrainer", false]], "l2c (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c", false]], "l2c_single (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c_single", false]], "legacy_polygons (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.legacy_polygons", false]], "legacy_polygons_status (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.legacy_polygons_status", false]], "len (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.len", false]], "line_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_iter (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.line_iter", false]], "line_orders (kraken.containers.segmentation attribute)": [[2, "id1", false], [2, "kraken.containers.Segmentation.line_orders", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "lines (kraken.containers.segmentation attribute)": [[2, "id2", false], [2, "kraken.containers.Segmentation.lines", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "logical_order() (kraken.containers.baselineocrrecord method)": [[2, "kraken.containers.BaselineOCRRecord.logical_order", false]], "logical_order() (kraken.containers.bboxocrrecord method)": [[2, "kraken.containers.BBoxOCRRecord.logical_order", false]], "logical_order() (kraken.containers.ocr_record method)": [[2, "kraken.containers.ocr_record.logical_order", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict 
(kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id41", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "neural_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.neural_reading_order", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id45", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "no_legacy_polygons (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.no_legacy_polygons", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.containers)": [[2, "kraken.containers.ocr_record", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id46", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.pad", false]], "pad (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.pad", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "pagewiseroset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PageWiseROSet", false]], "pairwiseroset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PairWiseROSet", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, 
"kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.prediction", false]], "prediction (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.prediction", false]], "prediction (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.prediction", false]], "processingstep (class in kraken.containers)": [[2, "kraken.containers.ProcessingStep", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "rebuild_alphabet() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.rebuild_alphabet", false]], "region (class in kraken.containers)": [[2, "kraken.containers.Region", false]], "regions (kraken.containers.baselineline attribute)": [[2, "id12", false], [2, "kraken.containers.BaselineLine.regions", false]], "regions (kraken.containers.bboxline attribute)": [[2, "id20", false], [2, "kraken.containers.BBoxLine.regions", false]], "regions (kraken.containers.segmentation attribute)": [[2, "id3", false], [2, "kraken.containers.Segmentation.regions", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.scale", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "script_detection (kraken.containers.segmentation attribute)": [[2, "id4", false], [2, "kraken.containers.Segmentation.script_detection", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.seg_type", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type (kraken.lib.models.torchseqrecognizer 
attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "segmentation (class in kraken.containers)": [[2, "kraken.containers.Segmentation", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], "set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "settings (kraken.containers.processingstep attribute)": [[2, "id39", false], [2, "kraken.containers.ProcessingStep.settings", false]], "skip_empty_lines (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.skip_empty_lines", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.containers.baselineline attribute)": [[2, "id13", false], [2, "kraken.containers.BaselineLine.split", false]], "split (kraken.containers.bboxline attribute)": [[2, "id21", false], [2, "kraken.containers.BBoxLine.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "tags (kraken.containers.baselineline attribute)": [[2, "id14", false], [2, "kraken.containers.BaselineLine.tags", false]], "tags (kraken.containers.bboxline attribute)": [[2, "id22", false], [2, "kraken.containers.BBoxLine.tags", false]], "tags (kraken.containers.region attribute)": [[2, "id28", false], [2, "kraken.containers.Region.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text (kraken.containers.baselineline attribute)": [[2, "id15", false], [2, "kraken.containers.BaselineLine.text", false]], "text (kraken.containers.bboxline attribute)": [[2, "id23", false], [2, "kraken.containers.BBoxLine.text", false]], "text_direction (kraken.containers.bboxline attribute)": [[2, "id24", false], [2, "kraken.containers.BBoxLine.text_direction", false]], "text_direction (kraken.containers.segmentation attribute)": [[2, "id5", false], [2, "kraken.containers.Segmentation.text_direction", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", 
false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.transforms", false]], "transforms (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "type (kraken.containers.baselineline attribute)": [[2, "kraken.containers.BaselineLine.type", false]], "type (kraken.containers.baselineocrrecord attribute)": [[2, "id32", false], [2, "kraken.containers.BaselineOCRRecord.type", false]], "type (kraken.containers.bboxline attribute)": [[2, "kraken.containers.BBoxLine.type", false]], "type (kraken.containers.bboxocrrecord attribute)": [[2, "id35", false], [2, "kraken.containers.BBoxOCRRecord.type", false]], "type (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.type", false]], "type (kraken.containers.segmentation attribute)": [[2, "id6", false], [2, "kraken.containers.Segmentation.type", false]], "use_legacy_polygons (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.use_legacy_polygons", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id47", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_norm (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.valid_norm", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.width", false]], "width (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id42", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]], "xmlpage (class in kraken.lib.xml)": [[2, "kraken.lib.xml.XMLPage", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.containers": [[2, 1, 1, "", "BBoxLine"], [2, 1, 1, "", "BBoxOCRRecord"], [2, 1, 1, "", "BaselineLine"], [2, 1, 1, 
"", "BaselineOCRRecord"], [2, 1, 1, "", "ProcessingStep"], [2, 1, 1, "", "Region"], [2, 1, 1, "", "Segmentation"], [2, 1, 1, "", "ocr_record"]], "kraken.containers.BBoxLine": [[2, 2, 1, "id16", "base_dir"], [2, 2, 1, "id17", "bbox"], [2, 2, 1, "id18", "id"], [2, 2, 1, "id19", "imagename"], [2, 2, 1, "id20", "regions"], [2, 2, 1, "id21", "split"], [2, 2, 1, "id22", "tags"], [2, 2, 1, "id23", "text"], [2, 2, 1, "id24", "text_direction"], [2, 2, 1, "", "type"]], "kraken.containers.BBoxOCRRecord": [[2, 2, 1, "id33", "base_dir"], [2, 2, 1, "", "confidences"], [2, 2, 1, "", "cuts"], [2, 3, 1, "id34", "display_order"], [2, 3, 1, "", "logical_order"], [2, 2, 1, "", "prediction"], [2, 2, 1, "id35", "type"]], "kraken.containers.BaselineLine": [[2, 2, 1, "id7", "base_dir"], [2, 2, 1, "id8", "baseline"], [2, 2, 1, "id9", "boundary"], [2, 2, 1, "id10", "id"], [2, 2, 1, "id11", "imagename"], [2, 2, 1, "id12", "regions"], [2, 2, 1, "id13", "split"], [2, 2, 1, "id14", "tags"], [2, 2, 1, "id15", "text"], [2, 2, 1, "", "type"]], "kraken.containers.BaselineOCRRecord": [[2, 2, 1, "id29", "base_dir"], [2, 2, 1, "", "confidences"], [2, 4, 1, "id30", "cuts"], [2, 3, 1, "id31", "display_order"], [2, 3, 1, "", "logical_order"], [2, 2, 1, "", "prediction"], [2, 2, 1, "id32", "type"]], "kraken.containers.ProcessingStep": [[2, 2, 1, "id36", "category"], [2, 2, 1, "id37", "description"], [2, 2, 1, "id38", "id"], [2, 2, 1, "id39", "settings"]], "kraken.containers.Region": [[2, 2, 1, "id25", "boundary"], [2, 2, 1, "id26", "id"], [2, 2, 1, "id27", "imagename"], [2, 2, 1, "id28", "tags"]], "kraken.containers.Segmentation": [[2, 2, 1, "id0", "imagename"], [2, 2, 1, "id1", "line_orders"], [2, 2, 1, "id2", "lines"], [2, 2, 1, "id3", "regions"], [2, 2, 1, "id4", "script_detection"], [2, 2, 1, "id5", "text_direction"], [2, 2, 1, "id6", "type"]], "kraken.containers.ocr_record": [[2, 2, 1, "", "base_dir"], [2, 4, 1, "", "confidences"], [2, 4, 1, "", "cuts"], [2, 3, 1, "", "display_order"], [2, 3, 1, "", "logical_order"], [2, 4, 1, "", "prediction"], [2, 4, 1, "", "type"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 3, 1, "", "add_labels"], [2, 2, 1, "", "c_sorted"], [2, 3, 1, "", "decode"], [2, 3, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 2, 1, "", "l2c"], [2, 2, 1, "", "l2c_single"], [2, 4, 1, "", "max_label"], [2, 3, 1, "", "merge"], [2, 2, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "ArrowIPCRecognitionDataset"], [2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "ImageInputTransforms"], [2, 1, 1, "", "PageWiseROSet"], [2, 1, 1, "", "PairWiseROSet"], [2, 1, 1, "", "PolygonGTDataset"], [2, 0, 1, "", "collate_sequences"], [2, 0, 1, "", "compute_confusions"], [2, 0, 1, "", "global_align"]], "kraken.lib.dataset.ArrowIPCRecognitionDataset": [[2, 3, 1, "", "add"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "arrow_table"], [2, 2, 1, "", "aug"], [2, 2, 1, "", "codec"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "legacy_polygons_status"], [2, 3, 1, "", "no_encode"], [2, 3, 1, "", "rebuild_alphabet"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.dataset.BaselineSet": [[2, 3, 1, "", "add"], [2, 2, 1, "", "aug"], [2, 2, 1, "", "class_mapping"], [2, 2, 1, "", 
"class_stats"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "imgs"], [2, 2, 1, "", "line_width"], [2, 2, 1, "", "mbl_dict"], [2, 2, 1, "", "mreg_dict"], [2, 2, 1, "", "num_classes"], [2, 2, 1, "", "pad"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "targets"], [2, 3, 1, "", "transform"], [2, 2, 1, "", "transforms"], [2, 2, 1, "", "valid_baselines"], [2, 2, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 3, 1, "", "add"], [2, 3, 1, "", "add_line"], [2, 3, 1, "", "add_page"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "aug"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 4, 1, "", "im_mode"], [2, 3, 1, "", "no_encode"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.dataset.ImageInputTransforms": [[2, 4, 1, "", "batch"], [2, 4, 1, "", "centerline_norm"], [2, 4, 1, "", "channels"], [2, 4, 1, "", "force_binarization"], [2, 4, 1, "", "height"], [2, 4, 1, "", "mode"], [2, 4, 1, "", "pad"], [2, 4, 1, "", "scale"], [2, 4, 1, "", "valid_norm"], [2, 4, 1, "", "width"]], "kraken.lib.dataset.PageWiseROSet": [[2, 2, 1, "", "data"], [2, 2, 1, "", "failed_samples"], [2, 3, 1, "", "get_feature_dim"]], "kraken.lib.dataset.PairWiseROSet": [[2, 2, 1, "", "data"], [2, 2, 1, "", "failed_samples"], [2, 3, 1, "", "get_feature_dim"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 3, 1, "", "add"], [2, 3, 1, "", "add_line"], [2, 3, 1, "", "add_page"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "aug"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 4, 1, "", "im_mode"], [2, 2, 1, "", "legacy_polygons"], [2, 3, 1, "", "no_encode"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], [2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 2, 1, "id40", "height"], [2, 2, 1, "id41", "message"], [2, 2, 1, "id42", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 2, 1, "", "codec"], [2, 2, 1, "", "decoder"], [2, 2, 1, "", "device"], [2, 3, 1, "", "forward"], [2, 2, 1, "", "kind"], [2, 2, 1, "", "nn"], [2, 2, 1, "", "one_channel_mode"], [2, 3, 1, "", "predict"], [2, 3, 1, "", "predict_labels"], [2, 3, 1, "", "predict_string"], [2, 2, 1, "", "seg_type"], [2, 3, 1, "", "to"], [2, 2, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "neural_reading_order"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", "vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], "kraken.lib.train.KrakenTrainer": [[2, 2, 1, "", "automatic_optimization"], [2, 3, 1, "", "fit"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 3, 1, "", "add_codec"], [2, 3, 1, "", "append"], [2, 4, 1, "", "aux_layers"], [2, 2, 1, "", "blocks"], [2, 3, 1, "", 
"build_addition"], [2, 3, 1, "", "build_conv"], [2, 3, 1, "", "build_dropout"], [2, 3, 1, "", "build_groupnorm"], [2, 3, 1, "", "build_identity"], [2, 3, 1, "", "build_maxpool"], [2, 3, 1, "", "build_output"], [2, 3, 1, "", "build_parallel"], [2, 3, 1, "", "build_reshape"], [2, 3, 1, "", "build_rnn"], [2, 3, 1, "", "build_ro"], [2, 3, 1, "", "build_series"], [2, 3, 1, "", "build_wav2vec2"], [2, 2, 1, "", "codec"], [2, 2, 1, "id43", "criterion"], [2, 3, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 2, 1, "", "idx"], [2, 3, 1, "", "init_weights"], [2, 2, 1, "id44", "input"], [2, 3, 1, "", "load_model"], [2, 2, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 2, 1, "", "named_spec"], [2, 2, 1, "id45", "nn"], [2, 4, 1, "id46", "one_channel_mode"], [2, 2, 1, "", "ops"], [2, 2, 1, "", "pattern"], [2, 3, 1, "", "resize_output"], [2, 3, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 3, 1, "", "set_num_threads"], [2, 2, 1, "", "spec"], [2, 3, 1, "", "to"], [2, 3, 1, "", "train"], [2, 4, 1, "", "use_legacy_polygons"], [2, 2, 1, "id47", "user_metadata"]], "kraken.lib.xml": [[2, 1, 1, "", "XMLPage"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 1, 1, "", "mm_rpred"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 2, 1, "", "bidi_reordering"], [2, 2, 1, "", "bounds"], [2, 2, 1, "", "im"], [2, 2, 1, "", "len"], [2, 2, 1, "", "line_iter"], [2, 2, 1, "", "nets"], [2, 2, 1, "", "no_legacy_polygons"], [2, 2, 1, "", "one_channel_modes"], [2, 2, 1, "", "pad"], [2, 2, 1, "", "seg_types"], [2, 2, 1, "", "tags_ignore"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"]], "kraken.transcribe": [[2, 1, 1, "", "TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 3, 1, "", "add_page"], [2, 2, 1, "", "env"], [2, 2, 1, "", "font"], [2, 2, 1, "", "line_idx"], [2, 2, 1, "", "page_idx"], [2, 2, 1, "", "pages"], [2, 2, 1, "", "seg_idx"], [2, 2, 1, "", "text_direction"], [2, 2, 1, "", "tmpl"], [2, 3, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "attribute", "Python attribute"], "3": ["py", "method", "Python method"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:attribute", "3": "py:method", "4": "py:property"}, "terms": {"": [0, 1, 2, 4, 5, 6, 7, 8], "0": [0, 2, 4, 5, 7, 8], "00": [0, 5, 7], "0001": 5, "0005": 4, "001": [5, 7], "00it": 5, "01": 4, "0123456789": [0, 4, 7], "0178e411df69": 1, "01c59": 8, "0245": 7, "04": 7, "06": [0, 7], "07": [0, 5], "09": [0, 7], "0ce11ad6": 1, "0d": 7, "0xe682": 4, "0xe68b": 4, "0xe8bf": 4, "0xe8e5": 0, "0xf038": 0, "0xf128": 0, "0xf1a7": 4, "1": [0, 1, 2, 5, 7, 8], "10": [0, 1, 4, 5, 7], "100": [0, 2, 5, 7, 8], "1000": 5, "101": 1, "1020": 8, "10218": 5, "1024": 8, "10592716": 4, "106": [1, 5], "107": 1, "108": 5, "11": 7, "1128": 5, "11346": 5, "1184": 7, "12": [5, 7, 8], "120": 5, "1200": 5, "122": 5, "125": 5, "128": [5, 8], "128000": 5, "128k": 5, "13": [5, 7], "132": 7, "1339": 7, "134": 5, "135": 5, "1359": 7, "136": [1, 5], "14": [0, 5], "1408": 2, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "16": [0, 2, 5, 8], "161": 7, "1623": 7, "1681": 7, "1697": 7, "16th": 4, "17": [2, 5], "171": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1800": 8, "182": 1, "1873": 1, "19": 5, "192": 5, "195": 1, "198": 5, "199": 5, "1996": 7, "1bpp": 0, "1cycl": 5, "1d": 8, "1e": 5, "1f3b": 1, 
"1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [0, 2, 4, 5, 7, 8], "20": [2, 5, 8], "200": 5, "2001": [2, 5], "2006": 2, "2014": 2, "2019": 5, "2020": 4, "2021": 0, "2024": 4, "203": 1, "204": 7, "2053": 1, "207": 5, "2096": 7, "21": 4, "210": 5, "215": 5, "2182": 1, "21st": 4, "22": [0, 5, 7], "2236": 1, "224": 1, "2243": 1, "2256": 1, "226": 1, "2264": 1, "22fee3d1": 1, "23": [0, 5], "231": 1, "2324": 1, "2326": 1, "2334": 7, "2336": 1, "2337": 1, "2339": 1, "2344": 1, "2364": 7, "237": 1, "239": 1, "2397": 1, "2398": 1, "23rd": 2, "24": [0, 7], "2404": 1, "241": 5, "2420": 1, "2421": 1, "2422": 1, "2428": 1, "2436": 1, "2437": 1, "244": 1, "2446": 1, "245": 1, "246": 5, "2477": 1, "25": [5, 7, 8], "250": 1, "2500": 7, "2523": 1, "2539": 1, "2542": 1, "256": [5, 7, 8], "2574": 1, "258": 1, "2581": 1, "259": [1, 7], "26": 7, "266": 5, "269": 1, "27": 5, "270": 7, "27046": 7, "274": [1, 5], "277": 1, "28": 5, "2873": 2, "29": [0, 5], "294": 1, "2d": [2, 8], "3": [2, 5, 7, 8], "30": [4, 5, 7], "300": 5, "300dpi": 7, "304": 1, "307": 7, "309": 1, "31": 5, "32": [5, 8], "328": 5, "336": 7, "3418": 7, "345": 1, "35": 5, "35000": 7, "3504": 7, "3519": 7, "35619": 7, "365": 7, "3680": 7, "3748": 1, "3772": 1, "377e": 1, "38": 5, "384": 8, "39": [1, 5], "4": [4, 5, 7, 8], "40": 7, "400": 5, "4000": 5, "4130": 1, "428": 7, "431": 7, "45": 5, "46": 5, "469": 1, "47": 7, "48": [5, 7, 8], "488": 7, "49": [0, 5, 7], "4a35": 1, "4bba": 1, "4d": 2, "4eea": 1, "4f7d": 1, "5": [2, 5, 7, 8], "50": [5, 7], "500": 5, "5000": 5, "50bb": 1, "512": 8, "52": [5, 7], "5226": 5, "523": 5, "5230": 5, "5234": 5, "5258": 7, "5281": [0, 4], "53": 5, "536": 5, "53980": 5, "539eadc": 1, "54": 1, "54114": 5, "5431": 5, "545": 7, "5468665": 0, "56": [0, 4, 7], "5617734": 0, "5617783": 0, "575": 5, "577": 7, "59": [7, 8], "5951": 7, "5983a0c50ce8": 1, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "61": 1, "62": 5, "63": 5, "64": [5, 8], "646": 7, "6542744": 0, "66": [5, 7], "6731": 1, "675": 1, "687": 1, "688": 1, "7": [5, 7, 8], "701": 5, "7012": 5, "7015": 7, "71": [1, 5], "7272": 7, "7281": 7, "738": 1, "74": [1, 5], "758": 1, "7593": 5, "773": 5, "7857": 5, "788": [5, 7], "789": 1, "790": 1, "794": 5, "7943": 5, "8": [0, 2, 5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": 7, "8445": 7, "8479": 7, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [2, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "912": 5, "92": 1, "93": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [0, 5], "9541": 7, "9550": 7, "96": [5, 7], "97": 7, "98": [4, 7], "99": 7, "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], "As": [0, 1, 2, 5], "BY": 0, "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7, 8], "Its": 0, "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "With": [0, 
5], "_abcdefghijklmnopqrstuvwxyz": 4, "a287": 1, "a785": 1, "a8c8": 1, "aaebv2": 0, "abbrevi": 4, "abbyyxml": [0, 4], "abcdefghijklmnopqrstuvwxyz": 4, "abcdefghijklmnopqrstuvxabcdefghijklmnopqrstuvwxyz": 0, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 4, 5, 7], "absolut": [2, 5], "abstract": 2, "abugida": 5, "acceler": [4, 5, 7], "accent": [0, 4], "accept": [0, 1, 2, 5], "access": [0, 1, 2], "access_token": 0, "accord": [0, 2, 5], "accordingli": 2, "account": [0, 7], "accur": 5, "accuraci": [0, 1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [0, 5, 7, 8], "actual": [2, 4, 5, 7], "acut": [0, 4], "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [0, 2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_lin": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "administr": 0, "advantag": 5, "advis": 7, "affect": 7, "after": [0, 1, 2, 5, 7, 8], "afterward": [0, 1], "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": [2, 4], "aim": 5, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algn1": 2, "algn2": 2, "algorithm": [0, 1, 2, 5], "align": [2, 5], "align1": 2, "align2": 2, "all": [0, 1, 2, 4, 5, 6, 7], "allographet": 4, "allow": [2, 5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [0, 2, 4, 5, 7, 8], "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [2, 5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 4, 7], "alto_doc": 1, "alto_seg_onli": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 2, 5, 7], "amp": 5, "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analogu": 0, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [0, 2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [0, 5, 7], "anyth": 2, "apach": 4, "apart": [0, 3, 5], "api": 5, "appear": 2, "append": [0, 2, 5, 7, 8], "appli": [0, 1, 2, 4, 7, 8], "applic": [1, 7], "approach": [4, 5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approv": 0, "approxim": [1, 5], "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": [0, 2], "aren": [2, 5], "arg": 2, "argument": [1, 5], "arian": 0, "arm": 4, "around": [0, 1, 2, 5, 7], "arrai": [1, 2], "arrow": [2, 5], "arrow_t": 2, "arrowipcrecognitiondataset": 2, "arxiv": 2, "ask": 0, "aspect": 2, "assign": [2, 5, 7], "associ": [1, 2], "assum": 2, "attach": [1, 5], "attribut": [1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "authorship": 0, "auto": [1, 2, 5], "autocast": 2, "automat": [0, 1, 2, 5, 7, 8], "automatic_optim": 2, "aux_lay": 2, "auxiliari": [0, 1], "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awar": 1, "awesom": 0, "awni": 2, "axi": [2, 8], "b": [0, 1, 2, 5, 7, 8], "b247": 1, "b9e5": 1, "back": [2, 8], "backbon": 5, "backend": 3, "background": [0, 2, 5], "backslash": 5, "base": [1, 2, 5, 6, 7, 8], "base_dir": [1, 2], "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselinelin": [1, 2], "baselineocrrecord": [1, 2], "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 5, 7, 8], "batch_siz": 5, "bayr\u016bt": 7, "bbox": [1, 2], "bboxlin": [1, 2], "bboxocrrecord": [1, 2], "bcewithlogitsloss": 5, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 4, 5, 7], "becom": 0, "been": [0, 2, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": [2, 5], "being": [1, 2, 5, 8], "below": [0, 5, 7], "best": [0, 2, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "biblissima": 4, "bidi": [2, 4, 5], 
"bidi_reord": 2, "bidirect": [2, 5], "bidirection": 8, "bien": 1, "binar": [1, 7], "binari": [0, 1, 2], "bind": 0, "bit": [1, 5], "biton": 2, "bl": [0, 4], "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [0, 1, 2, 5, 8], "block_i": 5, "block_n": 5, "blocktyp": 2, "board": 4, "boilerpl": 1, "book": 0, "bookhand": 0, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": [0, 1, 2, 5], "box": [1, 2, 4, 5], "branch": 8, "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_ro": 2, "build_seri": 2, "build_wav2vec2": 2, "buld\u0101n": 7, "bundl": 5, "bw": [0, 4], "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [0, 1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c4a751dc": 1, "c7767d10c407": 1, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 2, 5], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": 1, "can": [0, 1, 2, 3, 4, 5, 7, 8], "cannot": [0, 1], "capabl": [0, 5], "case": [0, 1, 2, 5, 7], "cat": 0, "catalan": 4, "categori": 2, "catmu": 4, "caus": [1, 2], "caveat": 5, "cb910c0aaf2b": 1, "cc": [0, 4], "cd": 4, "ce": [4, 7], "cedilla": 4, "cell": 8, "cent": 7, "center": 5, "centerlin": [2, 5], "centerline_norm": 2, "central": [4, 7], "centuri": 4, "certain": [0, 2, 7], "chain": [0, 4, 7], "chanc": 2, "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charconfid": 2, "charparam": 2, "charset": 2, "check": 0, "chines": [0, 5], "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumflex": 4, "circumst": 7, "class": [0, 1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 5, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "clone": 4, "close": [4, 5], "closer": 1, "clstm": [2, 6], "cl\u00e9rice": 4, "code": [0, 1, 2, 4, 5, 7], "codec": 1, "coher": 0, "collabor": 4, "collate_sequ": 2, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [2, 4, 7], "combin": [0, 1, 2, 4, 5, 7, 8], "come": [2, 5, 8], "comma": 4, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "commun": 0, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 4, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compon": 5, "compos": 2, "composedblocktyp": 5, "composit": 0, "compound": 2, "compress": 7, "compris": 7, "comput": [0, 2, 3, 4, 5, 7], "computation": 7, "compute_confus": 2, "compute_polygon_sect": 2, "con": 1, "concaten": 8, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5, 8], "confluenc": 8, "conform": 5, "confus": [2, 5], "conjunct": 5, "connect": [2, 5, 7], "connectionist": 2, "conserv": 5, "consid": [0, 2], "consist": [0, 1, 4, 7, 8], "consolid": 4, "constant": 5, "construct": [1, 5, 7], "contain": [0, 1, 4, 5, 6, 7], "contemporari": 0, "content": [1, 2, 5], "contentgener": 2, "continu": [0, 1, 2, 5, 7], "contrast": [5, 7], "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "converg": 5, "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [2, 4], "core": [5, 6], "coreml": 2, "corpu": 5, "correct": [0, 1, 2, 5, 7], "correctli": 8, "correspond": [0, 1, 2], "corsican": 4, "cosin": 5, 
"cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "cover": 0, "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "creat": [0, 1, 2, 4, 5, 7, 8], "creation": 0, "cremma": 0, "cremma_medieval_bicerin": 0, "criterion": [2, 5], "css": 0, "ctc": [1, 2, 5], "ctc_decod": 1, "ctr3": 8, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [0, 2, 4, 5, 6], "curv": 0, "custom": [0, 1, 2, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "d4b57683f5b0": 1, "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataclass": 1, "dataset": 1, "dataset_larg": 5, "date": [0, 4], "de": [1, 2, 4, 7], "deal": [0, 4, 5], "debug": [1, 5, 7], "decai": 5, "decent": 5, "decid": [0, 5], "decis": 5, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "def": 1, "default": [0, 1, 4, 5, 6, 7, 8], "defaultlin": 5, "defin": [0, 1, 2, 4, 5, 8], "definit": [0, 5, 8], "degrad": 1, "degre": 7, "del": 2, "del_indic": 2, "delet": [0, 2, 5, 7], "denot": 0, "depend": [0, 1, 2, 4, 5, 7], "deposit": 0, "deprec": [0, 2], "depth": [5, 7, 8], "deriv": 1, "describ": [2, 5], "descript": [0, 1, 2, 5], "descriptor": 2, "deseri": 2, "desir": [1, 2, 8], "desktop": 7, "destin": 2, "destroi": 5, "detail": [0, 2, 5, 7], "detect": [0, 2], "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diachron": 4, "diacrit": [4, 5], "diaeres": 7, "diaeresi": [4, 7], "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [1, 2, 5], "differ": [0, 1, 4, 5, 7, 8], "difficult": 5, "digit": 4, "dilat": 8, "dilation_i": 8, "dilation_x": 8, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": 5, "direct": [1, 2, 4, 5, 7, 8], "directli": [0, 1, 5, 8], "directori": [1, 2, 4, 5, 7], "disabl": [0, 2, 5, 7], "disallow": 2, "discover": 0, "disk": 7, "displai": [2, 5], "display_ord": 2, "dissimilar": 5, "dist1": 2, "dist2": 2, "distanc": 2, "distinguish": 5, "distractor": 5, "distribut": 8, "dnn": 2, "do": [0, 1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "doc": [0, 2], "document": [0, 1, 2, 4, 5, 7], "doe": [0, 1, 2, 5, 7], "doesn": [2, 5, 7], "doi": 0, "domain": [1, 5], "don": 5, "done": [0, 4, 5, 7, 8], "dot": [4, 7], "down": [7, 8], "download": [0, 4, 7], "downward": 2, "drastic": 5, "drawback": [0, 5], "driver": 1, "drop": [1, 8], "dropcapitallin": 5, "dropout": [2, 5, 7], "du": 4, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "earlier": 2, "early_stop": 5, "easi": 2, "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": 7, "editor": 7, "edu": 7, "effect": 0, "either": [0, 1, 2, 5, 7, 8], "element": [1, 5], "elementref": 2, "els": 2, "emit": 2, "emploi": [0, 7], "empti": [2, 5], "enabl": [1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2, 5], "end_separ": 2, "endfor": 2, "endif": 2, "endmacro": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entir": 5, "entiti": 2, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escap": 5, "escripta": 4, "escriptorium": [4, 7], "especi": 0, "esr": 4, "essenti": 5, "estim": [0, 2, 5, 7], "et": 2, "etc": 0, "european": 4, "eval": 2, "evalu": 5, "evaluation_data": 1, "evaluation_fil": 1, "even": [0, 2, 5, 7], "everi": [0, 1], "everyth": 5, "evolv": 4, "exact": [5, 7], "exactli": [1, 5], "exampl": [0, 
1, 5, 7], "except": [1, 4, 5], "exchang": 0, "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 4, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 5, 7], "experiment": 7, "explic": 0, "explicit": [1, 5], "explicitli": [1, 5, 7], "exponenti": 5, "express": 0, "extend": [2, 8], "extens": [0, 5], "extent": 7, "extern": 1, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "extrem": 5, "f": [0, 4, 5, 7, 8], "f17d03e0": 1, "f795": 1, "fact": 5, "factor": [0, 2], "fail": 5, "failed_sampl": 2, "faint": 0, "fairli": [5, 7], "fallback": 0, "fals": [1, 2, 5, 7, 8], "fame": 0, "fancy_model": 0, "faq\u012bh": 7, "fashion": 5, "faster": [5, 7, 8], "fc1": 5, "fc2": 5, "fd": 2, "featur": [1, 2, 5, 7, 8], "fed": [0, 1, 2, 5, 8], "feed": [0, 1, 5], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [2, 5], "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "filetyp": [1, 2], "fill": 2, "filter": [1, 2, 5, 8], "final": [0, 2, 4, 5, 7, 8], "find": [0, 5, 7], "fine": [1, 7], "finereader10": 2, "finereader_xml": 2, "finetun": 5, "finish": 7, "first": [0, 1, 2, 4, 5, 7, 8], "fit": [1, 2, 7], "fix": [0, 5, 7, 8], "flag": [1, 2, 4, 5], "float": [0, 2], "flow": [0, 5], "flush": 2, "fname": 2, "follow": [0, 2, 4, 5, 8], "fondu": 4, "font": 2, "font_styl": 2, "foo": [1, 5], "footrul": 5, "forbid": 2, "forc": [0, 2], "force_binar": 2, "foreground": 0, "forg": 4, "form": [0, 2, 5], "format": [1, 2, 6, 7], "format_typ": 1, "formul": 8, "forward": [2, 8], "found": [0, 1, 2, 5, 7], "four": 0, "fp": 1, "fr_manu_ro": 5, "fr_manu_ro_best": 5, "fr_manu_seg": 5, "fr_manu_seg_best": 5, "fr_manu_seg_with_ro": 5, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "freez": 5, "freeze_backbon": 2, "french": [0, 4, 5], "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4, 5], "function": [1, 5], "fundament": 1, "further": [0, 1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gabai": 4, "gain": 1, "garantue": 2, "gaussian_filt": 2, "gc": 2, "gener": [0, 1, 2, 5, 7], "geneva": 4, "gentl": 5, "geometr": 5, "geometri": 2, "german": 4, "get": [0, 1, 4, 5, 7], "get_feature_dim": 2, "get_sorted_lin": 1, "git": 4, "github": 4, "githubusercont": 7, "gitter": 4, "give": 8, "given": [1, 2, 5, 8], "glob": [0, 1], "global": 2, "global_align": 2, "glori": 0, "glyph": [2, 5, 7], "gn": 8, "gn32": 5, "gn8": 8, "go": 7, "good": 5, "gov": [2, 5], "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphemat": 4, "graphic": 5, "grave": [2, 4], "grayscal": [0, 1, 2, 5, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 4, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "groupnorm": 8, "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "guidelin": 4, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [0, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "headinglin": 5, "heatmap": [0, 1, 8], "hebrew": [0, 5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "hellip": 4, "help": [4, 7], "henc": 8, "here": [0, 5], "heurist": [0, 5], "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "hijo": 1, "histor": 4, "hline": 0, "hoc": 5, "hocr": [0, 4, 7], "honor": 0, "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 
5, 7], "howev": 8, "hpo": [2, 5], "hpu": 5, "html": 2, "htr": 4, "http": [2, 4, 5, 7], "huffmann": 5, "human": [2, 5], "hundr": 7, "hyper_param": 2, "hyperparamet": 5, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": [1, 2, 5], "ident": [1, 2, 8], "identifi": [0, 2], "idx": 2, "ignor": [0, 2, 5], "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_siz": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_s": [1, 2], "imagefilenam": 5, "imageinputtransform": 2, "imagenam": [1, 2], "imaginari": [2, 7], "img": 2, "immedi": 5, "immut": 1, "implement": [0, 1, 8], "impli": 5, "implicit": 1, "implicitli": 5, "import": [0, 1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "inclus": 0, "incompat": 2, "inconsist": 4, "incorrect": 7, "increas": [5, 7], "independ": 8, "index": [0, 2, 5], "indic": [2, 5, 7], "individu": [0, 2, 5], "individualis": 0, "inf": 5, "infer": [2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [0, 1, 2, 5, 7, 8], "inlin": 0, "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "inria": 4, "ins": 2, "insert": [1, 2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": [5, 7], "instal": 3, "instanc": [0, 1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interact": 0, "interchang": 2, "interfac": [2, 4], "interlinearlin": 5, "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "iou": 5, "ipc": 2, "ipu": 5, "irregular": 5, "is_tot": 1, "is_valid": 2, "isn": [1, 2, 7, 8], "italian": 4, "item": 2, "iter": [1, 2, 7], "its": [0, 1, 2, 5, 7, 8], "itself": 1, "j": [2, 4], "jinja": 0, "jinja2": [1, 2], "join": 2, "jpeg": [0, 7], "jpeg2000": [0, 4], "jpg": [0, 5], "json": [0, 2, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "k": [2, 5], "kamil": 5, "keep": [0, 5], "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [0, 5, 7], "keyword": 0, "kind": [0, 2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "l2c_singl": 2, "la": 4, "label": [0, 1, 2, 5], "lack": 7, "lag": 5, "lang": 2, "languag": [2, 4, 5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [0, 2, 5, 8], "lastli": 5, "later": [0, 7], "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": 8, "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [1, 5, 7], "leav": [5, 8], "lectaurep": 0, "left": [0, 2, 4, 5, 7], "leftmost": 2, "leftward": 0, "legaci": [5, 7, 8], "legacy_polygon": 2, "legacy_polygons_statu": 2, "leipzig": 7, "len": 2, "length": [2, 5], "less": [5, 7], "let": 7, "letter": [0, 4], "level": [0, 1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": 8, "lib": 1, "libr": 4, "librari": 1, "licens": 0, "ligatur": 4, "light": 0, "lightn": [1, 2], "lightningmodul": 1, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": [0, 5], 
"line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_1469098625593_463": 1, "line_1469098649515_464": 1, "line_1469099255968_508": 1, "line_idx": 2, "line_implicit": 1, "line_it": 2, "line_k": 5, "line_ord": [1, 2], "line_transkribu": 1, "line_typ": 2, "line_type_": 2, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 1, 2, 4, 5, 7], "liter": 2, "litteratur": 0, "ll": 4, "lo": 1, "load": [0, 1, 2, 4, 5, 7], "load_ani": [1, 2], "load_model": [1, 2], "loadabl": 2, "loader": 1, "loc": [2, 5], "local": 5, "locat": [1, 2, 5, 7], "log": [2, 5, 7], "log_dir": 2, "logger": [2, 5], "logic": [2, 5], "logical_ord": 2, "logograph": 5, "long": [0, 4, 5], "longest": 2, "look": [0, 1, 5, 7], "loop": 2, "loss": 5, "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": 0, "m": [0, 2, 4, 5, 7, 8], "mac": [4, 7], "machin": 2, "macro": 2, "macron": [0, 4], "maddah": 7, "made": 7, "mai": [0, 1, 2, 5, 7], "main": [0, 4, 5, 7], "mainli": 1, "major": 1, "make": [0, 5], "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [0, 1, 2, 5, 7], "manuscript": [0, 4, 7], "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 2, 5], "massag": 5, "match": [2, 5, 8], "materi": [0, 1, 4, 5, 7], "matric": 2, "matrix": [1, 5], "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mb": [0, 5], "mbl_dict": 2, "mean": [1, 2, 5, 7], "measur": 5, "measurementunit": [2, 5], "mediev": [0, 4], "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [0, 1, 2], "metric": 5, "might": [0, 4, 5, 7], "min": [2, 5], "min_epoch": 2, "min_length": 2, "mind": 5, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [4, 7], "mix": [0, 2, 5], "ml": 6, "mlmodel": [0, 4, 5, 7], "mlp": 5, "mm_rpred": [1, 2], "mode": [0, 1, 2, 5], "model": [1, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [0, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "mono": 0, "more": [0, 1, 2, 4, 5, 7, 8], "most": [0, 1, 2, 5, 7], "mostli": [0, 1, 4, 5, 7, 8], "move": [2, 7, 8], "mp": 8, "mp2": [5, 8], "mp3": 8, "mreg_dict": 2, "much": [1, 2, 4, 5], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 2, 4, 5, 7], "my": 0, "myprintingcallback": 1, "n": [0, 2, 5, 8], "name": [0, 2, 4, 5, 7, 8], "named_spec": 2, "national": 4, "nativ": [0, 2, 6], "natur": [2, 7], "nchw": 2, "ndarrai": 2, "necessari": [0, 1, 2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 7], "neg": 5, "nest": 2, "net": [1, 2, 7], "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "neural_reading_ord": 2, "never": 7, "nevertheless": [1, 5], "new": [0, 1, 2, 3, 5, 7, 8], "next": [1, 7], "nf": 5, "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": [4, 5], "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "no_legacy_polygon": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [0, 1, 2, 5, 7, 8], "nonlinear": 8, "nor": 1, "norm": 4, "normal": [2, 4], "notabl": 0, "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": [2, 5], "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 2, 7], "numpi": [1, 2], "nvidia": [3, 5], "o": [0, 1, 2, 4, 5, 7], "o1c103": 8, "o2l8": 8, "o_": 2, "o_1530717944451": 1, "object": [0, 1, 2], "obtain": 7, "obvious": 
7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_": 2, "ocr_0": 2, "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [0, 1, 5, 7], "ogonek": 4, "old": [0, 2, 6], "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "onc": [0, 5], "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": [1, 5], "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 2, "open": 1, "openmp": [2, 5, 7], "oper": [1, 2, 8], "optic": [0, 7], "optim": [0, 4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 1, 4, 8], "orderedgroup": 2, "org": [2, 5], "orient": [0, 1, 2], "origin": [1, 2, 5, 8], "originalcoord": 2, "orthogon": 2, "other": [0, 2, 4, 5, 7, 8], "othertag": 2, "otherwis": [2, 5], "out": [0, 5, 7, 8], "output": [1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "outsid": 2, "over": [2, 4], "overal": 5, "overfit": 7, "overhead": 5, "overlap": 5, "overrepres": 5, "overrid": [2, 5], "overwritten": 2, "own": 4, "p": [0, 4, 5], "pac": 1, "packag": [1, 2, 4, 7], "pacto": 1, "pad": [0, 2, 5], "padding_left": 2, "padding_right": 2, "pag": 1, "page": [1, 2, 4, 7], "page_0": 2, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagewiseroset": 2, "pagexml": [0, 1, 4, 7], "paint": 5, "pair": [0, 2, 5], "pairwiseroset": 2, "paper": [0, 4], "par": [1, 2, 4], "paradigm": 0, "paragraph": [2, 5], "parallel": [2, 5, 8], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "parchment": 0, "pari": 4, "pars": [2, 5], "parsed_doc": 1, "parser": [1, 2, 5], "part": [0, 1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlik": 2, "pattern": [2, 7], "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [0, 1, 2, 5, 7], "perc": [0, 2], "percentag": 2, "percentil": 2, "perfect": 5, "perform": [1, 2, 4, 5, 7], "period": 7, "perispomeni": 4, "persist": 0, "person": 0, "physical_img_nr": 2, "pick": 5, "pickl": 6, "pil": [1, 2], "pillow": 1, "pinch": 0, "pinpoint": 7, "pipelin": [1, 2, 5], "pixel": [0, 1, 2, 5, 8], "pl_logger": 2, "pl_modul": 1, "place": [0, 4, 5, 7], "placement": 7, "plain": 0, "platform": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [0, 1, 2, 5, 7], "polygon": [0, 1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "porson": 0, "portant": 4, "portion": 0, "posit": [2, 5], "possibl": [0, 1, 2, 4, 5, 7, 8], "postoper": 2, "postprocess": [1, 2, 5], "potenti": 5, "power": [5, 7], "practic": 1, "pratiqu": 4, "pre": [0, 5], "preced": 5, "precis": [2, 5], "precompil": [2, 5], "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2, 5], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preliminari": 0, "preload": 7, "prematur": 5, "preoper": 2, "prepar": 7, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "present": 4, "preserv": [2, 4], "pretrain_best": 5, "prevent": [2, 7], "previou": [4, 5], "previous": [4, 5], "previtem": 2, "primaresearch": 5, "primari": [0, 1, 5], "primarili": 4, "princip": [1, 2, 5], "principl": 4, "print": [0, 1, 2, 4, 5, 7], "printspac": [2, 5], "privat": 0, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proc_type_t": 2, "proce": 8, "proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "processing_step": 2, "processingcategori": 2, "processingsoftwar": 2, 
"processingstep": 2, "processingstepdescript": 2, "processingstepset": 2, "produc": [0, 1, 2, 4, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": 6, "proper": 1, "properli": 7, "properti": [1, 2], "proport": 5, "proportion": 2, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": [0, 4], "publish": 4, "pull": 4, "pure": 5, "purpos": [0, 1, 2, 7, 8], "put": [2, 5, 7], "py": 1, "pypi": 4, "pyrnn": 6, "python": 4, "pytorch": [0, 1, 2, 3, 6], "pytorch_lightn": 1, "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [0, 1, 7], "queryabl": 0, "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "raise_on_error": 2, "ran": 4, "random": [2, 5, 7], "randomli": 5, "rang": [0, 2], "rank": 5, "rapidli": [5, 7], "rare": 5, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [0, 1, 5, 7], "rb": 2, "reach": [5, 7], "read": [0, 1, 4], "reader": 5, "reading_ord": [1, 2], "reading_order_fn": 2, "readingord": 2, "real": 7, "realiz": 5, "reason": [0, 2, 5], "rebuild_alphabet": 2, "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [3, 8], "recognitionmodel": 1, "recommend": [0, 1, 5, 7], "recomput": 2, "record": [1, 2, 4], "rectangl": 2, "rectangular": 0, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "ref": 2, "refer": [0, 1, 5, 7], "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_1469098557906_461": 1, "region_1469098609000_462": 1, "region_implicit": 1, "region_ord": 1, "region_transkribu": 1, "region_typ": [2, 5], "region_type_": 2, "regular": 5, "reinstanti": 2, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reli": 5, "reliabl": [5, 7], "relu": [5, 8], "remain": [0, 5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_lin": 2, "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "replac": 1, "repolygon": 1, "report": [2, 5, 7], "repositori": [4, 5, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "rescal": 2, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolut": 2, "resolv": [4, 5], "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 4, 5, 7, 8], "resum": 5, "retain": [2, 5], "retrain": 7, "retriev": [4, 5, 7], "return": [0, 1, 2, 8], "reus": 2, "revers": [4, 8], "rgb": [1, 2, 5, 8], "right": [0, 2, 4, 5, 7], "ring": 4, "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "ro": [1, 2, 5], "ro_": 2, "ro_0": 2, "ro_id": 2, "ro_net": 5, "roadd": 5, "robust": 5, "romanov": 7, "root": 5, "rotat": 0, "rotrain": 5, "rough": 7, "roughli": 0, "round": 2, "routin": 1, "rpred": 1, "rtl": 0, "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "sa": 0, "same": [0, 1, 2, 4, 5, 7, 8], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 5, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": [2, 5], "schemaloc": [2, 5], "scientif": 4, "score": 5, "scratch": [0, 1], "script": [0, 1, 2, 4, 5, 7], "script_detect": [1, 2], "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": [0, 2], "second": [0, 2], "section": [1, 2, 7], "see": [0, 1, 2, 5, 7], "seen": [0, 1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_": 2, "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, 
"segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "sephardi": 0, "seq1": 2, "seq2": 2, "seqrecogn": 2, "sequenc": [1, 2, 5, 7, 8], "serial": [0, 4, 5, 6, 8], "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 4, 7, 8], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "ship": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "side": 0, "sigmoid": 8, "signific": 5, "similar": [1, 5, 7], "simon": 4, "simpl": [0, 1, 5, 7, 8], "simpli": [2, 8], "simplifi": 0, "singl": [0, 1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "skip_empty_lin": 2, "slice": 2, "slightli": [4, 5, 7, 8], "slow": [2, 5], "slower": 5, "small": [0, 1, 2, 4, 5, 7, 8], "so": [0, 1, 2, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": [0, 7], "softwarenam": 2, "softwarevers": 2, "some": [0, 1, 4, 5, 7], "someon": 0, "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [1, 2, 5, 7, 8], "sourceimageinform": [2, 5], "sp": [2, 5], "space": [0, 1, 2, 4, 5, 7], "spanish": 4, "spearman": 5, "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [0, 5, 7], "specifi": [0, 1, 5], "speckl": 7, "speech": 2, "speed": 5, "speedup": 5, "split": [1, 2, 5, 7, 8], "split_filt": 2, "spot": 4, "sqrt": 5, "squar": 5, "squash": [2, 8], "stabil": 2, "stabl": [1, 4, 5], "stack": [2, 5, 8], "stage": [0, 1, 5], "standard": [0, 1, 2, 4, 5, 7], "start": [0, 1, 2, 5, 7], "start_separ": 2, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1, 2, 4], "stop": [5, 7], "storag": 5, "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": 8, "strong": 4, "structur": [1, 4, 5], "stub": 5, "style": 1, "su": 1, "sub": [1, 2], "subclass": 1, "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsampl": 5, "subsequ": [1, 2, 5], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], "suffix": 0, "suggest": [0, 1], "suit": 7, "suitabl": [0, 7], "sum": [2, 5], "summar": [2, 5, 7, 8], "superflu": 7, "superscript": 4, "supervis": 5, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [0, 1, 4, 5, 6], "suppos": 1, "suppress": [0, 5], "sure": [0, 5], "surfac": [0, 2], "surrog": 5, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [0, 4, 5, 7, 8], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [1, 2, 5], "tagref": 2, "tags_ignor": 2, "take": [1, 4, 5, 7, 8], "tanh": 8, "target": 2, "target_output_shap": 2, "task": [5, 7], "tb": 2, "technic": 4, "tei": 0, "tell": 5, "templat": [0, 1, 4], "template_sourc": 2, "tempor": 2, "tensor": [1, 2, 8], "tensorboard": 5, "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [1, 2, 4, 7], "text_direct": [1, 2], "text_transform": 2, "textblock": [2, 5], "textblock_": 2, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": [2, 5], "textregion": 5, "textregion_1520586482298_193": 1, "textregion_1520586482298_194": 1, "textual": 1, "th": 2, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 1, 2, 5], "therefor": [0, 1, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "thibault": 4, "thing": 5, "third": 1, "those": 
[4, 5], "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": [5, 6], "threshold": [0, 2], "through": [0, 1, 2, 4, 5, 7, 8], "thrown": 0, "thu": 1, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "tild": [0, 4], "time": [0, 1, 2, 5, 7, 8], "tip": 1, "titl": 0, "titr": 4, "tmpl": [0, 2], "to_contain": 1, "todo": 1, "togeth": 8, "token": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], "toplin": [2, 5], "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": [5, 7], "tpu": 5, "tr9": 2, "track": 5, "train": [0, 3, 8], "trainabl": [0, 1, 2, 4, 5], "trainer": [1, 5], "training_data": [1, 5], "training_fil": 1, "transcrib": [4, 5, 7], "transcript": [1, 2, 4, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4, 5, 8], "transkribu": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "trial": 5, "true": [1, 2, 8], "truli": 0, "truth": [5, 7], "try": [2, 4, 8], "tupl": 2, "turn": 4, "tutori": [1, 5], "tweak": 0, "two": [0, 1, 2, 5, 8], "txt": [0, 4, 5], "type": [0, 1, 2, 5, 7, 8], "typefac": [5, 7], "typograph": 7, "typologi": 5, "u": [0, 1, 4, 5], "u1f05": 5, "uax": 2, "un": 4, "unclean": 7, "unclear": 5, "undecod": 1, "undegrad": 0, "under": [0, 4], "undesir": [5, 8], "unencod": 2, "uneven": 0, "uni": [0, 7], "unicod": [1, 2, 4, 7], "uniformli": 2, "union": [2, 4, 5], "uniqu": [0, 2, 7], "univers": [0, 4], "universit\u00e9": 4, "unknown": 2, "unlabel": 5, "unlearn": 5, "unless": 5, "unnecessarili": 1, "unord": 1, "unorderedgroup": 2, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "upcom": 4, "updat": 0, "upload": [0, 5], "upon": 0, "upward": [2, 5, 7], "ur": 0, "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "use_legacy_polygon": 2, "user": [0, 2, 4, 5, 7], "user_metadata": 2, "usual": [0, 1, 5, 7], "utf": [2, 5], "util": [1, 4, 5, 7], "v": [2, 4, 5, 7], "v1": 2, "v4": [2, 5], "val_loss": 5, "val_spearman": 5, "valid": [0, 2, 5], "valid_baselin": 2, "valid_norm": 2, "valid_region": 2, "valu": [0, 1, 2, 5, 8], "valueerror": 2, "variabl": [2, 4, 5, 8], "variant": [4, 5, 8], "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [0, 1, 2, 5], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": [0, 5], "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": [0, 5], "visual": [0, 5], "vocabulari": 2, "vocal": 7, "vpo": [2, 5], "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": [2, 5], "wa": [2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warmup": 5, "warn": [0, 1, 2, 7], "warp": 7, "wav2vec2": 2, "wc": 2, "we": [2, 5, 7], "weak": [1, 7], "websit": 7, "weight": [2, 5], "welcom": 4, "well": [0, 5, 7], "were": [2, 5], "west": 4, "western": 7, "wget": 7, "what": [1, 7], "when": [0, 1, 2, 5, 7, 8], "where": [0, 2, 5, 7, 8], "whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "wider": 0, "width": [1, 2, 5, 7, 8], "wildli": 7, "without": [0, 2, 5, 7], "won": 2, "word": [2, 4, 5], "word_text": 5, "wordstart": 2, "work": [0, 1, 2, 5, 7, 8], "workabl": 5, "worker": 5, "world": [0, 7], "worsen": 0, "would": [0, 2, 5], "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], "www": [2, 5], "x": [0, 2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x64": 4, "x_n": 2, "x_stride": 8, 
"xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xmax": 2, "xmin": 2, "xml": [0, 7], "xmln": [2, 5], "xmlpage": [1, 2], "xmlschema": [2, 5], "xn": 2, "xsd": [2, 5], "xsi": [2, 5], "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_n": 2, "y_stride": 8, "year": 4, "yield": 2, "yk": 2, "ym": 2, "ymax": 2, "ymin": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "your": 5, "ypogegrammeni": 4, "y\u016bsuf": 7, "zenodo": [0, 4], "zero": [2, 7, 8], "zigzag": 0, "zoom": [0, 2], "\u00e3\u00ed\u00f1\u00f5": 0, "\u00e6\u00df\u00e6\u0111\u0142\u0153\u0153\u0180\u01dd\u0247\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c2\u03c3\u03c4\u03c5\u03c6\u03c7\u03c9\u03db\u05d7\u05dc\u05e8\u1455\u15c5\u15de\u16a0\u00df": 4, "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u0127\u0129\u0142\u0169\u01ba\u1d49\u1ebd": 0, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7, "\u2079\ua751\ua753\ua76f\ua770": 0, "\ua751\ua753\ua757\ua759\ua75f\ua76f\ua775": 4}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"4": 2, "abbyi": 2, "acceler": 3, "acquisit": 7, "advanc": 0, "alto": [2, 5], "annot": 7, "api": [1, 2], "baselin": [0, 1], "basic": [1, 8], "best": 5, "binar": [0, 2], "binari": 5, "blla": 2, "box": 0, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "contain": 2, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "default": 2, "direct": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": [0, 5], "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "hocr": 2, "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [0, 1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "mask": 0, "max": 8, "model": [0, 2, 4, 5, 6], "modul": 2, "network": 8, "normal": [5, 8], "order": [2, 5], "output": 0, "page": [0, 5], "pageseg": 2, "pagexml": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "practic": 5, "preprocess": [1, 7], "pretrain": 5, "princip": 0, "publish": 0, "queri": 0, "quickstart": [1, 4], "read": [2, 5], "recognit": [0, 1, 2, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "retriev": 0, "rpred": 2, "scratch": 5, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": 8, "templat": 2, "test": 5, "text": [0, 5], "train": [1, 2, 4, 5, 7], 
"trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "unsupervis": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/5.2/training.html b/5.2/training.html new file mode 100644 index 000000000..31b766804 --- /dev/null +++ b/5.2/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training kraken

+

kraken is an optical character recognition package that can be trained fairly easily for a large number of scripts. In contrast to other systems requiring segmentation down to glyph level before classification, it is uniquely suited for the recognition of connected scripts, because the neural network is trained to assign the correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and recognition, the conversion of line images into text, can be trained in kraken. To train models for either we require training data, i.e. examples of page segmentations and transcriptions that are similar to what we want to be able to recognize. For segmentation the examples are the locations of baselines, i.e. the imaginary lines the text is written on, and the polygons of regions. For recognition it is the text contained in a line. There are multiple ways to supply training data but the easiest is through PageXML or ALTO files.

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/main/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First a number of high quality scans, preferably color or grayscale and at least 300dpi, are required. Scans should be in a lossless image format such as TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such as pdftocairo or pdfimages. While each of these requirements can be relaxed to a degree, the final accuracy will suffer to some extent. For example, JPEG scans with only slight compression are generally still suitable for training and recognition.

+
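If the source material is only available as PDF, the page images can be extracted on the command line before any further processing. A minimal sketch using pdftocairo; the 300dpi resolution and grayscale conversion are reasonable defaults rather than requirements:

$ pdftocairo -r 300 -gray -png document.pdf page

This renders every page of document.pdf as a numbered 300dpi grayscale PNG with the prefix page, ready for annotation and training.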

Depending on the source of the scans some preprocessing such as splitting scans into pages, correcting skew and warp, and removing speckles can be advisable, although it isn't strictly necessary as the segmenter can be trained to handle noisy material with high accuracy. A fairly user-friendly tool for semi-automatic batch processing of image scans is Scantailor, although most of this work can also be done using a standard image editor.

+

The total number of scans required depends on the kind of model to train (segmentation or recognition), the complexity of the layout, and the nature of the script to recognize. Only features that are present in the training data can later be recognized, so it is important that the coverage of typographic features is exhaustive. Training a small segmentation model for a particular kind of material might require less than a few hundred samples while a general model can well go into the thousands of pages. Likewise a specific recognition model for a printed script with a small grapheme inventory such as Arabic or Hebrew requires around 800 lines, while manuscripts, complex scripts (such as polytonic Greek), and general models for multiple typefaces and hands need more training data for the same accuracy.

+

There is no hard rule for the amount of training data and it may be necessary to retrain a model after the initial training data proves insufficient. Most western texts contain between 25 and 40 lines per page, so upward of 30 pages have to be preprocessed and later transcribed to reach the roughly 800 lines mentioned above.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of baselines, regions, and text. There are a number of tools available that can create ALTO and PageXML files containing the requisite information for either segmentation or recognition training: escriptorium integrates kraken tightly, including training and inference, while Aletheia is a powerful desktop application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models.

+
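Segmentation models are trained with the analogous ketos segtrain subcommand directly on the annotated XML files; the remainder of this section deals with recognition training. A hypothetical sketch, assuming the subcommand accepts XML ground truth by default and that the output prefix option mirrors ketos train:

$ ketos segtrain -o segmodel output_dir/*.xml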

The training data in output_dir may now be used to train a new model by +invoking the ketos train command. Just hand a list of images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used to estimate the actual recognition accuracy achieved in the real world. These are never shown to the network during training but will be recognized periodically to evaluate the accuracy of the model. By default the validation set comprises 10% of the training data.

+

Basic model training is mostly automatic, although there are multiple parameters that can be adjusted:

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an integer equal to or larger than 1, with 1 meaning a report is created each time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with +--load. To continue training from a base model with another +training set refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for accelerated training. The default setting preloads data sets with fewer than 2500 lines; explicitly adding --preload will preload arbitrarily sized sets, while --no-preload disables preloading in all circumstances. A combined example invocation is sketched below this option list.

+
+
+
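As a combined, purely illustrative example (file names and numeric values are placeholders), training with a custom prefix, a report after every epoch, a model saved every five epochs, and a second run continuing from one of the saved checkpoints could look like:

$ ketos train --output mymodel --report 1 --savefreq 5 output_dir/*.png
$ ketos train --load mymodel_5.mlmodel --output mymodel_cont output_dir/*.png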

Training a network will take some time on a modern computer, even with the default parameters. While the exact time required is unpredictable, as training is a somewhat random process, a rough guide is that accuracy seldom improves after 50 epochs, which are typically reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as early stopping that stops training as soon as +the error rate on the validation set doesn’t improve anymore. This will +prevent overfitting, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models model_name-1.mlmodel, model_name-2.mlmodel, … in the directory the script was executed in. Let's take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This might take a while as preprocessing the whole set and putting it into memory is computationally intensive. Loading can be made faster by disabling preloading, at the cost of performing the preprocessing repeatedly during the training process.

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

This line shows the results of the validation set evaluation. The error after 2 epochs is 545 incorrect characters out of 3504 characters in the validation set, i.e. a character accuracy of 1 - 545/3504 ≈ 84.4%. It should decrease fairly rapidly. If accuracy remains around 0.30 something is amiss, e.g. non-reordered right-to-left text or wildly incorrect transcriptions. Abort training, correct the error(s) and start again.

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.

+

ketos can also produce more verbose output with training set and network +information by appending one or more -v to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a validation set of 88 lines. 49 different classes, i.e. Unicode code points, were found in these 788 lines. These determine the output size of the network; obviously only these 49 classes/code points can later be output by the network. Importantly, we can see that certain characters occur markedly less often than others. Characters like the Syriac feminine dot and numerals that occur fewer than 10 times will most likely not be recognized well by the trained net.

+
+
+

Evaluation and Validation

+

While the output during training is detailed enough to know when to stop training, one usually wants to know the specific kinds of errors to expect. Doing more in-depth error analysis also makes it possible to pinpoint weaknesses in the training data, e.g. above-average error rates for numerals indicate either a lack of representation of numerals in the training data or erroneous transcription in the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report consists of a simple accounting of the number of characters in the ground truth, the errors in the recognition output, and the resulting accuracy in per cent (here 1 - 336/35619 ≈ 99.06%).

+

The next table lists the number of insertions (characters occurring in the ground truth but not in the recognition output), substitutions (misrecognized characters), and deletions (superfluous characters recognized by the model); together they add up to the total error count (157 + 81 + 98 = 336).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report is a list of errors sorted by frequency and a per-character accuracy report. Importantly, most errors are incorrect recognitions of combining marks such as dots and diaereses. These may have several sources: different dot placement in the training and validation set, incorrect transcription such as non-systematic transcription, or unclean, speckled scans. Depending on the error source, correction most often involves adding more training data and fixing transcriptions. Sometimes it may even be advisable to remove unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/5.2/vgsl.html b/5.2/vgsl.html new file mode 100644 index 000000000..d15c90193 --- /dev/null +++ b/5.2/vgsl.html @@ -0,0 +1,320 @@ + + + + + + + + VGSL network specification — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in the order [batch, height, width, channels], with zero-valued dimensions being variable. Integer-valued height or width specifications will result in the input images being automatically scaled in the respective dimension.

+

When channels are set to 1, grayscale or B/W inputs are expected, while 3 expects RGB color images. Higher values in combination with a height of 1 result in the network being fed 1 pixel wide grayscale strips scaled to the size of the channel dimension.

+
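Some input block examples, drawn from the specifications used elsewhere in this document:

[1,48,0,1]  batch 1, fixed height of 48 pixels, variable width, a single grayscale/B/W channel
[1,0,0,3]   variable height and width, RGB color input
[1,1,0,48]  height 1 with 48 channels, i.e. lines scaled to 48 pixels fed as 1 pixel wide strips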

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do 01c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The extended dropout layer syntax is used to reduce the drop probability on the depth dimension, as the default is too high for convolutional layers. The remainder of the height dimension (12, i.e. the original 48 pixels halved twice by the two Mp2,2 layers) is reshaped into the depth dimension before applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection.

+
[1,1800,0,3 Cr3,3,32 Gn8 (I [Cr3,3,64,2,2 Gn8 CTr3,3,32,2,2]) Cr3,3,32 O2l8]
+
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               groupnorm       8 groups
+2               parallel        execute 2.0 and 2.1 in parallel
+2.0             identity
+2.1             serial  execute 2.1.0 to 2.1.2 in sequence
+2.1.0           conv    kernel 3 x 3 stride 2 x 2 filters 64 activation r
+2.1.1           groupnorm       8 groups
+2.1.2           transposed convolution  kernel 3 x 3 stride 2 x 2 filters 2 activation r
+3               conv    kernel 3 x 3 stride 1 x 1 filters 32 activation r
+4               linear  activation sigmoid
+
+
+

A model that outputs heatmaps with 8 feature dimensions, taking color images with +height normalized to 1800 pixels as its input. It uses a strided convolution +to first scale the image down, and then a transposed convolution to transform +the image back to its original size. This is done in a parallel block, where the +other branch simply passes through the output of the first convolution layer. +The input of the last convolutional layer is then the output of the two branches +of the parallel block concatenated, i.e. the output of the first +convolutional layer together with the output of the transposed convolutional layer, +giving 32 + 32 = 64 feature dimensions.

+
+
+

Convolutional Layers

+
C[T][{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>][,<dilation_y>,<dilation_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying the selected nonlinearity. Stride and dilation can be adjusted with the optional last two parameters. T gives a transposed convolution. For transposed convolutions, several output sizes are possible for the same configuration. The system will try to match the output size of the different branches of parallel blocks; however, this will only work if the transposed convolution directly precedes the confluence of the parallel branches, and if the branches with fixed output size come first in the definition of the parallel block. Hence, out of (I [Cr3,3,8,2,2 CTr3,3,8,2,2]), ([Cr3,3,8,2,2 CTr3,3,8,2,2] I) and (I [Cr3,3,8,2,2 CTr3,3,8,2,2 Gn8]) only the first variant will behave correctly.

+
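Some example convolution specifications, consistent with the layer listings shown above:

Cr3,3,32        3x3 convolution with 32 filters and ReLU activation
Cr3,3,64,2,2    3x3 ReLU convolution with 64 filters and a stride of 2 in both dimensions
CTr3,3,32,2,2   transposed 3x3 ReLU convolution with 32 filters and stride 2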
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, return the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension and the non-time-axis dimension (height/width) is treated as another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors resulting in an output of shape 1, 16, 906, 25. If this isn't desired either run a summarizing layer in the other direction, e.g. Lfys20 for an output of 1, 1, 906, 20, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimensions into a 1, 1, 906, 512 input to the recurrent layer.

+
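Two examples taken from the networks above:

Lbx100    bidirectional LSTM along the width axis with 100 output features
Lfys64    forward LSTM along the height axis, summarizing that dimension, with 64 outputs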
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a max pooling layer with kernel size (y, x) and stride (y_stride, x_stride).

+
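For example, the Mp2,2 layer used in the models above pools with a 2x2 kernel; as the layer listings show, the stride defaults to the kernel size. An explicit stride can be appended (the values below are purely illustrative):

Mp2,2        2x2 max pooling with the default 2x2 stride
Mp2,2,1,1    2x2 max pooling with an explicit 1x1 stride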
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d into a x b and distributes a into dimension e and b into dimension f. Either e or f has to be equal to d. So S1(1x48)1,3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48-sized tensor into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.

+
+

Note

+

This S layer is equivalent to the one implemented in the TensorFlow implementation of VGSL, i.e. it behaves differently from the one in Tesseract.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to a 0.5 drop probability and 1D dropout. Set dim to 2 after convolutional layers.

+
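For example:

Do        1D dropout with the default probability of 0.5
Do0.1,2   2D dropout with probability 0.1, as used after the convolutional layers in the examples above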
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, +normalizing each separately.

+
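For example, the Gn8 layer used in the heatmap model above normalizes its input over 8 groups.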
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/CNAME b/CNAME new file mode 100644 index 000000000..d57c5ef3f --- /dev/null +++ b/CNAME @@ -0,0 +1 @@ +kraken.re \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 000000000..10575d18a --- /dev/null +++ b/index.html @@ -0,0 +1,9 @@ + + + + Redirecting to main branch + + + + + diff --git a/main/.buildinfo b/main/.buildinfo new file mode 100644 index 000000000..e39df1bf4 --- /dev/null +++ b/main/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 532fe6c24dd01cc0ea9ff45316b91906 +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/main/.doctrees/advanced.doctree b/main/.doctrees/advanced.doctree new file mode 100644 index 000000000..956d6b9f1 Binary files /dev/null and b/main/.doctrees/advanced.doctree differ diff --git a/main/.doctrees/api.doctree b/main/.doctrees/api.doctree new file mode 100644 index 000000000..a317ecc1a Binary files /dev/null and b/main/.doctrees/api.doctree differ diff --git a/main/.doctrees/api_docs.doctree b/main/.doctrees/api_docs.doctree new file mode 100644 index 000000000..422c2a845 Binary files /dev/null and b/main/.doctrees/api_docs.doctree differ diff --git a/main/.doctrees/environment.pickle b/main/.doctrees/environment.pickle new file mode 100644 index 000000000..a5b36c949 Binary files /dev/null and b/main/.doctrees/environment.pickle differ diff --git a/main/.doctrees/gpu.doctree b/main/.doctrees/gpu.doctree new file mode 100644 index 000000000..2f8f44200 Binary files /dev/null and b/main/.doctrees/gpu.doctree differ diff --git a/main/.doctrees/index.doctree b/main/.doctrees/index.doctree new file mode 100644 index 000000000..be96a905e Binary files /dev/null and b/main/.doctrees/index.doctree differ diff --git a/main/.doctrees/ketos.doctree b/main/.doctrees/ketos.doctree new file mode 100644 index 000000000..76d5dc1ab Binary files /dev/null and b/main/.doctrees/ketos.doctree differ diff --git a/main/.doctrees/models.doctree b/main/.doctrees/models.doctree new file mode 100644 index 000000000..ce6b6c321 Binary files /dev/null and b/main/.doctrees/models.doctree differ diff --git a/main/.doctrees/training.doctree b/main/.doctrees/training.doctree new file mode 100644 index 000000000..d2afe0623 Binary files /dev/null and b/main/.doctrees/training.doctree differ diff --git a/main/.doctrees/vgsl.doctree b/main/.doctrees/vgsl.doctree new file mode 100644 index 000000000..0fc0ebd1b Binary files /dev/null and b/main/.doctrees/vgsl.doctree differ diff --git a/main/.nojekyll b/main/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/main/_images/blla_heatmap.jpg b/main/_images/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/main/_images/blla_heatmap.jpg differ diff --git a/main/_images/blla_output.jpg b/main/_images/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/main/_images/blla_output.jpg differ diff --git a/main/_images/bw.png b/main/_images/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/main/_images/bw.png differ diff --git a/main/_images/normal-reproduction-low-resolution.jpg b/main/_images/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/main/_images/normal-reproduction-low-resolution.jpg differ diff --git 
a/main/_images/pat.png b/main/_images/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/main/_images/pat.png differ diff --git a/main/_sources/advanced.rst.txt b/main/_sources/advanced.rst.txt new file mode 100644 index 000000000..93822e472 --- /dev/null +++ b/main/_sources/advanced.rst.txt @@ -0,0 +1,466 @@ +.. _advanced: + +Advanced Usage +============== + +Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text lines images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML. + +Input and Outputs +----------------- + +Kraken inputs and their outputs can be defined in multiple ways. The most +simple are input-output pairs, i.e. producing one output document for one input +document follow the basic syntax: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n + +In particular subcommands may be chained. + +There are other ways to define inputs and outputs as the syntax shown above can +become rather cumbersome for large amounts of files. + +As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing: + +.. code-block:: console + + $ kraken -I '*.png' -o ocr.txt segment ... + +which expands the `glob expression +`_ in kraken internally and +appends the suffix defined with `-o` to each output file. An input file +`xyz.png` will therefore produce an output file `xyz.png.ocr.txt`. `-I` batch +inputs can also be specified multiple times: + +.. code-block:: console + + $ kraken -I '*.png' -I '*.jpg' -I '*.tif' -o ocr.txt segment ... + +A second way is to input multi-image files directly. These can be either in +PDF, TIFF, or JPEG2000 format and are specified like: + +.. code-block:: console + + $ kraken -I some.pdf -o ocr.txt -f pdf segment ... + +This will internally extract all page images from the input PDF file and write +one output file with an index (can be changed using the `-p` option) and the +suffix defined with `-o`. + +The `-f` option can not only be used to extract data from PDF/TIFF/JPEG2000 +files but also various XML formats. In these cases the appropriate data is +automatically selected from the inputs, image data for segmentation or line and +region segmentation for recognition: + +.. code-block:: console + + $ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ... + +The code is able to automatically determine if a file is in PageXML or ALTO format. + +Output formats +^^^^^^^^^^^^^^ + +All commands have a default output format such as raw text for `ocr`, a plain +image for `binarize`, or a JSON definition of the the segmentation for +`segment`. These are specific to kraken and generally not suitable for further +processing by other software but a number of standardized data exchange formats +can be selected. Per default `ALTO `_, +`PageXML `_, `hOCR +`_, and abbyyXML containing additional metadata such as +bounding boxes and confidences are implemented. In addition, custom `jinja +`_ templates can be loaded to create +individualised output such as TEI. + +Output formats are selected on the main `kraken` command and apply to the last +subcommand defined in the subcommand chain. For example: + +.. code-block:: console + + $ kraken --alto -i ... 
segment -bl + +will serialize a plain segmentation in ALTO into the specified output file. + +The currently available format switches are: + +.. code-block:: console + + $ kraken -n -i ... ... # native output + $ kraken -a -i ... ... # ALTO output + $ kraken -x -i ... ... # PageXML output + $ kraken -h -i ... ... # hOCR output + $ kraken -y -i ... ... # abbyyXML output + +Custom templates can be loaded with the ``--template`` option: + +.. code-block:: console + + $ kraken --template /my/awesome/template.tmpl -i ... ... + +The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates `here +`_. + +Binarization +------------ + +.. _binarization: + +.. note:: + + Binarization is deprecated and mostly not necessary anymore. It can often + worsen text recognition results especially for documents with uneven + lighting, faint writing, etc. + +The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +``ocropus-nlbin``. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it. + +Available parameters are: + +============ ==== +option type +============ ==== +\--threshold FLOAT +\--zoom FLOAT +\--escale FLOAT +\--border FLOAT +\--perc INTEGER RANGE +\--range INTEGER +\--low INTEGER RANGE +\--high INTEGER RANGE +============ ==== + +To binarize an image: + +.. code-block:: console + + $ kraken -i input.jpg bw.png binarize + +.. note:: + + Some image formats, notably JPEG, do not support a black and white + image mode. Per default the output format according to the output file + name extension will be honored. If this is not possible, a warning will + be printed and the output forced to PNG: + + .. code-block:: console + + $ kraken -i input.jpg bw.jpg binarize + Binarizing [06/24/22 09:56:23] WARNING jpeg does not support 1bpp images. Forcing to png. + ✓ + +Page Segmentation +----------------- + +The `segment` subcommand accesses page segmentation into lines and regions with +the two layout analysis methods implemented: the trainable baseline segmenter +that is capable of detecting both lines of different types and regions and a +legacy non-trainable segmenter that produces bounding boxes. + +Universal parameters of either segmenter are: + +=============================================== ====== +option action +=============================================== ====== +-d, \--text-direction Sets principal text direction. Valid values are `horizontal-lr`, `horizontal-rl`, `vertical-lr`, and `vertical-rl`. +-m, \--mask Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes. +=============================================== ====== + +Baseline Segmentation +^^^^^^^^^^^^^^^^^^^^^ + +The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. 
A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below: + +.. image:: _static/blla_heatmap.jpg + :width: 800 + :alt: BLLA output heatmap + +In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as: + +.. image:: _static/blla_output.jpg + :width: 800 + :alt: BLLA final output + +The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segment is activated with the `-bl` +option: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl + +New models optimized for other kinds of documents can be trained (see +:ref:`here `). These can be applied with the `-i` option of the +`segment` subcommand: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel + +Legacy Box Segmentation +^^^^^^^^^^^^^^^^^^^^^^^ + +The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left). + +Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply :ref:`binarization ` first or supply only +pre-binarized inputs. + +The legacy segmenter can be applied on some input image with: + +.. code-block:: console + + $ kraken -i 14.tif lines.json segment -x + $ cat lines.json + +Available specific parameters are: + +=============================================== ====== +option action +=============================================== ====== +\--scale FLOAT Estimate of the average line height on the page +-m, \--maxcolseps Maximum number of columns in the input document. Set to `0` for uni-column layouts. +-b, \--black-colseps / -w, \--white-colseps Switch to black column separators. +-r, \--remove-hlines / -l, \--hlines Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts. +-p, \--pad Adds left and right padding around lines in the output. +=============================================== ====== + +Principal Text Direction +^^^^^^^^^^^^^^^^^^^^^^^^ + +The principal text direction selected with the ``-d/--text-direction`` is a +switch used in the reading order heuristic to determine the order of text +blocks (regions) and individual lines. It roughly corresponds to the `block +flow direction +`_ in CSS with +an additional option. Valid options consist of two parts, an initial principal +line orientation (`horizontal` or `vertical`) followed by a block order (`lr` +for left-to-right or `rl` for right-to-left). + +.. 
warning:: + + The principal text direction is independent of the direction of the + *inline text direction* (which is left-to-right for writing systems like + Latin and right-to-left for ones like Hebrew or Arabic). Kraken deals + automatically with the inline text direction through the BiDi algorithm + but can't infer the principal text direction automatically as it is + determined by factors like layout, type of document, primary script in + the document, and other factors. The different types of text + directionality and their relation can be confusing, the `W3C writing + mode `_ document explains + the fundamentals, although the model used in Kraken differs slightly. + +The first part is usually `horizontal` for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom: + +.. image:: _static/bw.png + :width: 800 + :alt: Horizontal Latin script text + +Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left: + +.. image:: https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg/577px-Chinese_manuscript_Ti-i_ch%27i-shu._Wellcome_L0020843.jpg + :width: 800 + :alt: Vertical Chinese text + +The second part is dependent on a number of factors as the order in which text +blocks are read is not fixed for every writing system. In mono-script texts it +is usually determined by the inline text direction, i.e. Latin script texts +columns are read starting with the top-left column followed by the column to +its right and so on, continuing with the left-most column below if none remain +to the right (inverse for right-to-left scripts like Arabic which start on the +top right-most columns, continuing leftward, and returning to the right-most +column just below when none remain). + +In multi-script documents the order is determined by the primary writing +system employed in the document, e.g. for a modern book containing both Latin +and Arabic script text it would be set to `lr` when Latin is primary, e.g. when +the binding is on the left side of the book seen from the title cover, and +vice-versa (`rl` if binding is on the right on the title cover). The analogue +applies to text written with vertical lines. + +With these explications there are four different text directions available: + +=============================================== ====== +Text Direction Examples +=============================================== ====== +horizontal-lr Latin script texts, Mixed LTR/RTL docs with principal LTR script +horizontal-rl Arabic script texts, Mixed LTR/RTL docs with principal RTL script +vertical-lr Vertical script texts read from left-to-right. +vertical-rl Vertical script texts read from right-to-left. +=============================================== ====== + +Masking +^^^^^^^ + +It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white: + +.. code-block:: console + + $ kraken -i input.jpg segmentation.json segment -bl -m mask.png + +Model Repository +---------------- + +.. _repo: + +There is a semi-curated `repository +`_ of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands. 
+ +Querying and Model Retrieval +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``list`` subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description): + +.. code-block:: console + + $ kraken list + Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07 + 10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration) + 10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature) + 10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0 + ... + +To access more detailed information the ``show`` subcommand may be used: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.5617783 + name: 10.5281/zenodo.5617783 + + Cremma-Medieval Old French Model (Litterature) + + .... + scripts: Latn + alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128 + accuracy: 95.49% + license: CC-BY-SA-2.0 + author(s): Pinche, Ariane + date: 2021-10-29 + +If a suitable model has been decided upon it can be retrieved using the ``get`` +subcommand: + +.. code-block:: console + + $ kraken get 10.5281/zenodo.5617783 + Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10 + Model name: cremma_medieval_bicerin.mlmodel + +Models will be placed in ``$XDG_BASE_DIR`` and can be accessed using their name as +printed in the last line of the ``kraken get`` output. + +.. code-block:: console + + $ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel + +Publishing +^^^^^^^^^^ + +When one would like to share a model with the wider world (for fame and glory!) +it is possible (and recommended) to upload them to repository. The process +consists of 2 stages: the creation of the deposit on the Zenodo platform +followed by approval of the model in the community making it discoverable for +other kraken users. + +For uploading model a Zenodo account and a personal access token is required. +After account creation tokens can be created under the account settings: + +.. image:: _static/pat.png + :width: 800 + :alt: Zenodo token creation dialogue + +With the token models can then be uploaded: + +.. code-block:: console + + $ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617783 + +A number of important metadata will be asked for such as a short description of +the model, long form description, recognized scripts, and authorship. +Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. +can't be changed or deleted so it is important to make sure that all the +information is correct. Each deposit also has a unique persistent identifier, a +DOI, that can be used to refer to it, e.g. in publications or when pointing +someone to a particular model. + +Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users. 
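+
+The DOI printed by ``ketos publish`` can afterwards be used with the query and
+retrieval subcommands shown above, e.g. to double-check the deposited metadata
+(here with the example DOI from the previous section):
+
+.. code-block:: console
+
+   $ kraken show 10.5281/zenodo.5617783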
+ +It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with `kraken get` +and its DOI. It is mostly suggested for preliminary models that might get +updated later: + +.. code-block:: console + + $ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel + DOI: 10.5281/zenodo.5617734 + +Recognition +----------- + +Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the ``segment`` subcommand or the +binarization provided by kraken. + +Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models: + +.. code-block:: console + + $ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:antiqua.mlmodel + +All polytonic Greek text portions will be recognized using the `porson.mlmodel` +model while Latin text will be fed into the `antiqua.mlmodel` model. It is +possible to define a fallback model that other text will be fed to: + +.. code-block:: console + + $ kraken -i ... ... ocr -m ... -m ... -m default:porson.mlmodel + +It is also possible to disable recognition on a particular script by mapping to +the special model keyword `ignore`. Ignored lines will still be serialized but +will not contain any recognition results. diff --git a/main/_sources/api.rst.txt b/main/_sources/api.rst.txt new file mode 100644 index 000000000..56d0fca81 --- /dev/null +++ b/main/_sources/api.rst.txt @@ -0,0 +1,546 @@ +API Quickstart +============== + +Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks, binarization, +segmentation, recognition, and serialization are encapsulated in one high +level method each. + +Simple use cases of the API which are mostly useful for debugging purposes are +contained in the `contrib` directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases. + +Basic Concepts +-------------- + +The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally `Pillow `_ +objects and numerical outputs numpy arrays. + +Top-level modules implement high level functionality while :mod:`kraken.lib` +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required. + +Preprocessing and Segmentation +------------------------------ + +The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases. + +.. 
code-block:: python + + >>> from PIL import Image + + >>> from kraken import binarization + + # can be any supported image format and mode + >>> im = Image.open('foo.png') + >>> bw_im = binarization.nlbin(im) + +Legacy segmentation +~~~~~~~~~~~~~~~~~~~ + +The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions: + +.. code-block:: python + + >>> from kraken import pageseg + + >>> seg = pageseg.segment(bw_im) + >>> seg + Segmentation(type='bbox', + imagename='foo.png', + text_direction='horizontal-lr', + script_detection=False, + lines=[BBoxLine(id='0ce11ad6-1f3b-4f7d-a8c8-0178e411df69', + bbox=[74, 61, 136, 101], + text=None, + base_dir=None, + type='bbox', + imagename=None, + tags=None, + split=None, + regions=None, + text_direction='horizontal-lr'), + BBoxLine(id='c4a751dc-6731-4eea-a287-d4b57683f5b0', ...), + ....], + regions={}, + line_orders=[]) + +All segmentation methods return a :class:`kraken.containers.Segmentation` +object that contains all elements of the segmentation: its type, a list of +lines (either :class:`kraken.containers.BBoxLine` or +:class:`kraken.containers.BaselineLine`), a dictionary mapping region types to +lists of regions (:class:`kraken.containers.Region`), and one or more line +reading orders. + +Baseline segmentation +~~~~~~~~~~~~~~~~~~~~~ + +The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken.lib import vgsl + + >>> model_path = 'path/to/model/file' + >>> model = vgsl.TorchVGSLModel.load_model(model_path) + +A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer: + +.. raw:: html + :file: _static/kraken_segmodel.svg + +Afterwards they can be fed into the segmentation method +:func:`kraken.blla.segment` with image objects: + +.. code-block:: python + + >>> from kraken import blla + >>> from kraken import serialization + + >>> baseline_seg = blla.segment(im, model=model) + >>> baseline_seg + Segmentation(type='baselines', + imagename='foo.png', + text_direction='horizontal-lr', + script_detection=False, + lines=[BaselineLine(id='22fee3d1-377e-4130-b9e5-5983a0c50ce8', + baseline=[[71, 93], [145, 92]], + boundary=[[71, 93], ..., [71, 93]], + text=None, + base_dir=None, + type='baselines', + imagename=None, + tags={'type': 'default'}, + split=None, + regions=['f17d03e0-50bb-4a35-b247-cb910c0aaf2b']), + BaselineLine(id='539eadce-f795-4bba-a785-c7767d10c407', ...), ...], + regions={'text': [Region(id='f17d03e0-50bb-4a35-b247-cb910c0aaf2b', + boundary=[[277, 54], ..., [277, 54]], + imagename=None, + tags={'type': 'text'})]}, + line_orders=[]) + >>> alto = serialization.serialize(baseline_seg, + image_size=im.size, + template='alto') + >>> with open('segmentation_output.xml', 'w') as fp: + fp.write(alto) + +A default segmentation model is supplied and will be used if none is specified +explicitly as an argument. Optional parameters are largely the same as for the +legacy segmenter, i.e. text direction and masking. 
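+
+A minimal sketch of these optional parameters, assuming the default model and
+illustrative file names (keyword argument names may differ slightly between
+kraken versions):
+
+.. code-block:: python
+
+   >>> from PIL import Image
+   >>> from kraken import blla
+
+   >>> im = Image.open('foo.png')
+   # binary mask of the same size as the input; blacked-out areas are ignored
+   >>> mask = Image.open('mask.png')
+   # no model argument: the default segmentation model is used
+   >>> baseline_seg = blla.segment(im, text_direction='horizontal-lr', mask=mask)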
+ +Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +notion). + +Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the `device` argument. As the vast majority of the +processing required is postprocessing the performance gain will most likely +modest though. + +The above API is the most simple way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation: + +.. raw:: html + :file: _static/kraken_segmentation.svg + +It is possible to only run a subset of the functionality depending on one's +needs by calling the respective functions in :mod:`kraken.lib.segmentation`. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in `contrib/repolygonize.py +`_ +and `contrib/segmentation_overlay.py +`_. + +Recognition +----------- + +Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (*label domain*) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the :ref:`Codec ` section for further +information). + +.. _recognition_steps: + +.. raw:: html + :file: _static/kraken_recognition.svg + +As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen by default: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object. + +To perform text line recognition a neural network has to be loaded first. A +:class:`kraken.lib.models.TorchSeqRecognizer` is returned which is a wrapper +around the :class:`kraken.lib.vgsl.TorchVGSLModel` class seen above for +segmentation model loading. + +.. code-block:: python + + >>> from kraken.lib import models + + >>> rec_model_path = '/path/to/recognition/model' + >>> model = models.load_any(rec_model_path) + +The sequence recognizer wrapper combines the neural network itself, a +:ref:`codec `, metadata such as if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels: + +.. raw:: html + :file: _static/kraken_torchseqrecognizer.svg + +Afterwards, given an image, a segmentation and the model one can perform text +recognition. 
The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch. + +There are two methods for recognition, a basic single model call +:func:`kraken.rpred.rpred` and a multi-model recognizer +:func:`kraken.rpred.mm_rpred`. The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document. + +.. code-block:: python + + >>> from kraken import rpred + # single model recognition + >>> pred_it = rpred(network=model, + im=im, + segmentation=baseline_seg) + >>> for record in pred_it: + print(record) + +The output isn't just a sequence of characters but, depending on the type of +segmentation supplied, a :class:`kraken.containers.BaselineOCRRecord` or +:class:`kraken.containers.BBoxOCRRecord` record object containing the character +prediction, cuts (approximate locations), and confidences. + +.. code-block:: python + + >>> record.cuts + >>> record.prediction + >>> record.confidences + +it is also possible to access the original line information: + +.. code-block:: python + + # for baselines + >>> record.type + 'baselines' + >>> record.line + >>> record.baseline + >>> record.script + + # for box lines + >>> record.type + 'bbox' + >>> record.line + >>> record.script + +Sometimes the undecoded raw output of the network is required. The :math:`C +\times W` softmax output matrix is accessible as the `outputs` attribute on the +:class:`kraken.lib.models.TorchSeqRecognizer` after each step of the +:func:`kraken.rpred.rpred` iterator. To get a mapping from the label space +:math:`C` the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one. + +.. code-block:: python + + >>> pred_it = rpred(model, im, baseline_seg) + >>> next(pred_it) + >>> model.output + >>> model.codec.l2c + {'\x01': ' ', + '\x02': '"', + '\x03': "'", + '\x04': '(', + '\x05': ')', + '\x06': '-', + '\x07': '/', + ... + } + +There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +:mod:`kraken.lib.ctc_decoder` with +:func:`kraken.lib.ctc_decoder.greedy_decoder` being the default. + +XML Parsing +----------- + +Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +:mod:`kraken.lib.xml` module includes parsers extracting information into data +structures processable with minimal transformation by the functional blocks: + +Parsing is accessed is through the :class:`kraken.lib.xml.XMLPage` class. + +.. 
code-block:: python + + >>> from kraken.lib import xml + + >>> alto_doc = '/path/to/alto' + >>> parsed_doc = xml.XMLPage(alto_doc) + >>> parsed_doc + XMLPage(filename='/path/to/alto', filetype=alto) + >>> parsed_doc.lines + {'line_1469098625593_463': BaselineLine(id='line_1469098625593_463', + baseline=[(2337, 226), (2421, 239)], + boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)], + text='$pag:39', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pag'}, + split=None, + regions=['region_1469098609000_462']), + + 'line_1469098649515_464': BaselineLine(id='line_1469098649515_464', + baseline=[(789, 269), (2397, 304)], + boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)], + text='$-nor su hijo, De todos sus bienes, con los pactos', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pac'}, + split=None, + regions=['region_1469098557906_461']), + ....} + >>> parsed_doc.regions + {'$pag': [Region(id='region_1469098609000_462', + boundary=[(2324, 171), (2437, 171), (2436, 258), (2326, 237)], + imagename=None, + tags={'type': '$pag'})], + '$pac': [Region(id='region_1469098557906_461', + boundary=[(738, 203), (2339, 245), (2398, 294), (2446, 345), (2574, 469), (2539, 1873), (2523, 2053), (2477, 2182), (738, 2243)], + imagename=None, + tags={'type': '$pac'})], + '$tip': [Region(id='TextRegion_1520586482298_194', + boundary=[(687, 2428), (688, 2422), (107, 2420), (106, 2264), (789, 2256), (758, 2404)], + imagename=None, + tags={'type': '$tip'})], + '$par': [Region(id='TextRegion_1520586482298_193', + boundary=[(675, 3772), (687, 2428), (758, 2404), (789, 2256), (2542, 2236), (2581, 3748)], + imagename=None, + tags={'type': '$par'})] + } + +The parser is aware of reading order(s), thus the basic properties accessing +lines and regions are unordered dictionaries. Reading orders can be accessed +separately through the `reading_orders` property: + +.. code-block:: python + + >>> parsed_doc.region_orders + {'line_implicit': {'order': ['line_1469098625593_463', + 'line_1469098649515_464', + ... + 'line_1469099255968_508'], + 'is_total': True, + 'description': 'Implicit line order derived from element sequence'}, + 'region_implicit': {'order': ['region_1469098609000_462', + ... + 'TextRegion_1520586482298_193'], + 'is_total': True, + 'description': 'Implicit region order derived from element sequence'}, + 'region_transkribus': {'order': ['region_1469098609000_462', + ... + 'TextRegion_1520586482298_193'], + 'is_total': True, + 'description': 'Explicit region order from `custom` attribute'}, + 'line_transkribus': {'order': ['line_1469098625593_463', + ... + 'line_1469099255968_508'], + 'is_total': True, + 'description': 'Explicit line order from `custom` attribute'}, + 'o_1530717944451': {'order': ['region_1469098609000_462', + ... + 'TextRegion_1520586482298_193'], + 'is_total': True, + 'description': 'Regions reading order'}} + +Reading orders are created from different sources, depending on the content of +the XML file. Every document will contain at least implicit orders for lines +and regions (`line_implicit` and `region_implicit`) sourced from the sequence +of line and region elements. There can also be explicit additional orders +defined by the standard reading order elements, for example `o_1530717944451` +in the above example. In Page XML files reading orders defined with the +Transkribus style custom attribute are also recognized. + +To access the lines or regions of a document in a particular order: + +.. 
code-block:: python + + >>> parsed_doc.get_sorted_lines(ro='line_implicit') + [BaselineLine(id='line_1469098625593_463', + baseline=[(2337, 226), (2421, 239)], + boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)], + text='$pag:39', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pag'}, + split=None, + regions=['region_1469098609000_462']), + BaselineLine(id='line_1469098649515_464', + baseline=[(789, 269), (2397, 304)], + boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)], + text='$-nor su hijo, De todos sus bienes, con los pactos', + base_dir=None, + type='baselines', + imagename=None, + tags={'type': '$pac'}, + split=None, + regions=['region_1469098557906_461']) + ...] + +The recognizer functions do not accept :class:`kraken.lib.xml.XMLPage` objects +directly which means that for most practical purposes these need to be +converted into :class:`container ` objects: + +.. code-block:: python + + >>> segmentation = parsed_doc.to_container() + >>> pred_it = rpred(network=model, + im=im, + segmentation=segmentation) + >>> for record in pred_it: + print(record) + + +Serialization +------------- + + +The serialization module can be used to transform results returned by the +segmenter or recognizer into a text based (most often XML) format for archival. +The module renders `jinja2 `_ templates, +either ones :ref:`packaged ` with kraken or supplied externally, +through the :func:`kraken.serialization.serialize` function. + +.. code-block:: python + + >>> import dataclasses + >>> from kraken.lib import serialization + + >>> alto_seg_only = serialization.serialize(baseline_seg, image_size=im.size, template='alto') + + >>> records = [record for record in pred_it] + >>> results = dataclasses.replace(pred_it.bounds, lines=records) + >>> alto = serialization.serialize(results, image_size=im.size, template='alto') + >>> with open('output.xml', 'w') as fp: + fp.write(alto) + +The serialization function accepts arbitrary +:class:`kraken.containers.Segmentation` objects, which may contain textual or +only segmentation information. As the recognizer returns +:class:`ocr_records ` which cannot be serialized +directly it is necessary to either construct a new +:class:`kraken.containers.Segmentation` from scratch or insert them into the +segmentation fed into the recognizer (:class:`ocr_records +` subclass :class:`BaselineLine +`/:class:`BBoxLine +` The container classes are immutable data classes, +therefore it is necessary for simple insertion of the records to use +`dataclasses.replace` to create a new segmentation with a changed lines +attribute. + +Training +-------- + +Training is largely implemented with the `pytorch lightning +`_ framework. There are separate +`LightningModule`s for recognition and segmentation training and a small +wrapper around the lightning's `Trainer` class that mainly sets up model +handling and verbosity options for the CLI. + + +.. code-block:: python + + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +Likewise for a baseline and region segmentation model: + +.. 
code-block:: python + + >>> from kraken.lib.train import SegmentationModel, KrakenTrainer + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer() + >>> trainer.fit(model) + +When the `fit()` method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard `lightning callbacks +`_ +can be attached to the trainer object: + +.. code-block:: python + + >>> from pytorch_lightning.callbacks import Callback + >>> from kraken.lib.train import RecognitionModel, KrakenTrainer + >>> class MyPrintingCallback(Callback): + def on_init_start(self, trainer): + print("Starting to init trainer!") + + def on_init_end(self, trainer): + print("trainer is init now") + + def on_train_end(self, trainer, pl_module): + print("do something when training ends") + >>> ground_truth = glob.glob('training/*.xml') + >>> training_files = ground_truth[:250] # training data is shuffled internally + >>> evaluation_files = ground_truth[250:] + >>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True) + >>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback]) + >>> trainer.fit(model) + Starting to init trainer! + trainer is init now + +This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features as transfer +learning, region and baseline filtering, training continuation, and so on. diff --git a/main/_sources/api_docs.rst.txt b/main/_sources/api_docs.rst.txt new file mode 100644 index 000000000..494232c09 --- /dev/null +++ b/main/_sources/api_docs.rst.txt @@ -0,0 +1,289 @@ +************* +API Reference +************* + +Segmentation +============ + +kraken.blla module +------------------ + +.. note:: + + `blla` provides the interface to the fully trainable segmenter. For the + legacy segmenter interface refer to the `pageseg` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.blla.segment + +kraken.pageseg module +--------------------- + +.. note:: + + `pageseg` is the legacy bounding box-based segmenter. For the trainable + baseline segmenter interface refer to the `blla` module. Note that + recognition models are not interchangeable between segmenters. + +.. autoapifunction:: kraken.pageseg.segment + +Recognition +=========== + +kraken.rpred module +------------------- + +.. autoapiclass:: kraken.rpred.mm_rpred + :members: + +.. autoapifunction:: kraken.rpred.rpred + +Serialization +============= + +kraken.serialization module +--------------------------- + +.. autoapifunction:: kraken.serialization.render_report + +.. autoapifunction:: kraken.serialization.serialize + +.. autoapifunction:: kraken.serialization.serialize_segmentation + +Default templates +----------------- + +.. _templates: + +ALTO 4.4 +^^^^^^^^ + +.. literalinclude:: ../kraken/templates/alto + :language: xml+jinja + +PageXML +^^^^^^^ + +.. literalinclude:: ../kraken/templates/alto + :language: xml+jinja + +hOCR +^^^^ + +.. literalinclude:: ../kraken/templates/alto + :language: xml+jinja + +ABBYY XML +^^^^^^^^^ + +.. 
literalinclude:: ../kraken/templates/abbyyxml + :language: xml+jinja + +Containers and Helpers +====================== + +kraken.lib.codec module +----------------------- + +.. autoapiclass:: kraken.lib.codec.PytorchCodec + :members: + +kraken.containers module +------------------------ + +.. autoapiclass:: kraken.containers.Segmentation + :members: + +.. autoapiclass:: kraken.containers.BaselineLine + :members: + +.. autoapiclass:: kraken.containers.BBoxLine + :members: + +.. autoapiclass:: kraken.containers.Region + :members: + +.. autoapiclass:: kraken.containers.ocr_record + :members: + +.. autoapiclass:: kraken.containers.BaselineOCRRecord + :members: + +.. autoapiclass:: kraken.containers.BBoxOCRRecord + :members: + +.. autoapiclass:: kraken.containers.ProcessingStep + :members: + +kraken.lib.ctc_decoder +---------------------- + +.. autoapifunction:: kraken.lib.ctc_decoder.beam_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.greedy_decoder + +.. autoapifunction:: kraken.lib.ctc_decoder.blank_threshold_decoder + +kraken.lib.exceptions +--------------------- + +.. autoapiclass:: kraken.lib.exceptions.KrakenCodecException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenStopTrainingException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenEncodeException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRecordException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInvalidModelException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenInputException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenRepoException + :members: + +.. autoapiclass:: kraken.lib.exceptions.KrakenCairoSurfaceException + :members: + +kraken.lib.models module +------------------------ + +.. autoapiclass:: kraken.lib.models.TorchSeqRecognizer + :members: + +.. autoapifunction:: kraken.lib.models.load_any + +kraken.lib.segmentation module +------------------------------ + +.. autoapifunction:: kraken.lib.segmentation.reading_order + +.. autoapifunction:: kraken.lib.segmentation.neural_reading_order + +.. autoapifunction:: kraken.lib.segmentation.polygonal_reading_order + +.. autoapifunction:: kraken.lib.segmentation.vectorize_lines + +.. autoapifunction:: kraken.lib.segmentation.calculate_polygonal_environment + +.. autoapifunction:: kraken.lib.segmentation.scale_polygonal_lines + +.. autoapifunction:: kraken.lib.segmentation.scale_regions + +.. autoapifunction:: kraken.lib.segmentation.compute_polygon_section + +.. autoapifunction:: kraken.lib.segmentation.extract_polygons + +kraken.lib.vgsl module +---------------------- + +.. autoapiclass:: kraken.lib.vgsl.TorchVGSLModel + :members: + +kraken.lib.xml module +--------------------- + +.. autoapiclass:: kraken.lib.xml.XMLPage + +Training +======== + +kraken.lib.train module +----------------------- + +Loss and Evaluation Functions +----------------------------- + +.. autoapifunction:: kraken.lib.train.recognition_loss_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_loss_fn + +.. autoapifunction:: kraken.lib.train.recognition_evaluator_fn + +.. autoapifunction:: kraken.lib.train.baseline_label_evaluator_fn + +Trainer +------- + +.. autoapiclass:: kraken.lib.train.KrakenTrainer + :members: + + +kraken.lib.dataset module +------------------------- + +Recognition datasets +^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.ArrowIPCRecognitionDataset + :members: + +.. autoapiclass:: kraken.lib.dataset.BaselineSet + :members: + +.. 
autoapiclass:: kraken.lib.dataset.GroundTruthDataset + :members: + +Segmentation datasets +^^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.PolygonGTDataset + :members: + +Reading order datasets +^^^^^^^^^^^^^^^^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.PairWiseROSet + :members: + +.. autoapiclass:: kraken.lib.dataset.PageWiseROSet + :members: + +Helpers +^^^^^^^ + +.. autoapiclass:: kraken.lib.dataset.ImageInputTransforms + :members: + +.. autoapifunction:: kraken.lib.dataset.collate_sequences + +.. autoapifunction:: kraken.lib.dataset.global_align + +.. autoapifunction:: kraken.lib.dataset.compute_confusions + +Legacy modules +============== + +These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren't further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter. + +kraken.binarization module +-------------------------- + +.. autoapifunction:: kraken.binarization.nlbin + +kraken.transcribe module +------------------------ + +.. autoapiclass:: kraken.transcribe.TranscriptionInterface + :members: + +kraken.linegen module +--------------------- + +.. autoapiclass:: kraken.transcribe.LineGenerator + :members: + +.. autoapifunction:: kraken.transcribe.ocropy_degrade + +.. autoapifunction:: kraken.transcribe.degrade_line + +.. autoapifunction:: kraken.transcribe.distort_line diff --git a/main/_sources/gpu.rst.txt b/main/_sources/gpu.rst.txt new file mode 100644 index 000000000..fbb66ba76 --- /dev/null +++ b/main/_sources/gpu.rst.txt @@ -0,0 +1,10 @@ +.. _gpu: + +GPU Acceleration +================ + +The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it. + + diff --git a/main/_sources/index.rst.txt b/main/_sources/index.rst.txt new file mode 100644 index 000000000..dda41e35e --- /dev/null +++ b/main/_sources/index.rst.txt @@ -0,0 +1,247 @@ +kraken +====== + +.. toctree:: + :hidden: + :maxdepth: 2 + + advanced + Training + API Tutorial + API Reference + Models + +kraken is a turn-key OCR system optimized for historical and non-Latin script +material. + +Features +======== + +kraken's main features are: + + - Fully trainable :ref:`layout analysis `, :ref:`reading order `, and :ref:`character recognition ` + - `Right-to-Left `_, `BiDi + `_, and Top-to-Bottom + script support + - `ALTO `_, PageXML, abbyyXML, and hOCR + output + - Word bounding boxes and character cuts + - Multi-script recognition support + - :ref:`Public repository ` of model files + - :ref:`Variable recognition network architectures ` + +Pull requests and code contributions are always welcome. + +Installation +============ + +Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through +the on-board *pip* utility and the `anaconda `_ +scientific computing python are supported. + +Installation using Pip +---------------------- + +.. code-block:: console + + $ pip install kraken + +or by running pip in the git repository: + +.. code-block:: console + + $ pip install . + +If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the `pdf` extras package for PyPi: + +.. code-block:: console + + $ pip install kraken[pdf] + +or + +.. code-block:: console + + $ pip install .[pdf] + +respectively. 
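+
+If the system Python installation is shared with other software it can be
+cleaner to install kraken into a dedicated virtual environment first. A
+generic sketch using the standard library ``venv`` module (any environment
+manager works; the environment name is illustrative):
+
+.. code-block:: console
+
+   $ python -m venv kraken_venv
+   $ source kraken_venv/bin/activate
+   $ pip install kraken[pdf]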
+ +Installation using Conda +------------------------ + +To install the stable version through `conda `_: + +.. code-block:: console + + $ conda install -c conda-forge -c mittagessen kraken + +Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies: + +.. code-block:: console + + $ conda install -c conda-forge pyvips + +The git repository contains some environment files that aid in setting up the latest development version: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment.yml + +or: + +.. code-block:: console + + $ git clone https://github.com/mittagessen/kraken.git + $ cd kraken + $ conda env create -f environment_cuda.yml + +for CUDA acceleration with the appropriate hardware. + +Finding Recognition Models +-------------------------- + +Finally you'll have to scrounge up a model to do the actual recognition of +characters. To download the default model for printed French text and place it +in the kraken directory for the current user: + +:: + + $ kraken get 10.5281/zenodo.10592716 + + +A list of libre models available in the central repository can be retrieved by +running: + +.. code-block:: console + + $ kraken list + +Model metadata can be extracted using: + +.. code-block:: console + + $ kraken show 10.5281/zenodo.10592716 + name: 10.5281/zenodo.10592716 + + CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages + +

CATMuS-Print (Large) - Diachronic model for French prints and other West European languages

+

CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, dealing with different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian…) and different centuries (from the first prints of the 16th c. to digital documents of the 21st century).

+

Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligature (except those that still exist), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies might be present, because transcriptions have been done over several years and the norms have slightly evolved.

+

The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.

+

This model is the result of the collaboration from researchers from the University of Geneva and Inria Paris and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.

+ scripts: Latn + alphabet: !"#$%&'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz|}~¡£¥§«¬°¶·»¿ÆßæđłŒœƀǝɇΑΒΓΔΕΖΘΙΚΛΜΝΟΠΡΣΤΥΦΧΩαβγδεζηθικλμνξοπρςστυφχωϛחלרᑕᗅᗞᚠẞ–—‘’‚“”„‟†•⁄⁊⁋℟←▽◊★☙✠✺✻⟦⟧⬪ꝑꝓꝗꝙꝟꝯꝵ SPACE, COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING DOT ABOVE, COMBINING DIAERESIS, COMBINING RING ABOVE, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING CEDILLA, COMBINING OGONEK, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER U, 0xe682, 0xe68b, 0xe8bf, 0xf1a7 + accuracy: 98.56% + license: cc-by-4.0 + author(s): Gabay, Simon; Clérice, Thibault + date: 2024-01-30 + +Quickstart +========== + +The structure of an OCR software consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed. + +In kraken these are separated into different subcommands that can be chained or +ran separately: + +.. raw:: html + :file: _static/kraken_workflow.svg + +Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation: + +.. code-block:: console + + $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + Loading RNN ✓ + Processing ⣻ + +To segment an image into reading-order sorted baselines and regions: + +.. code-block:: console + + $ kraken -i bw.tif lines.json segment -bl + +To OCR an image using the previously downloaded model: + +.. code-block:: console + + $ kraken -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + +To OCR an image using the default model and serialize the output using the ALTO +template: + +.. code-block:: console + + $ kraken -a -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel + +All commands and their parameters are documented, just add the standard +``--help`` flag for further information. + +Training Tutorial +================= + +There is a training tutorial at :doc:`training`. + +Related Software +================ + +These days kraken is quite closely linked to the `eScriptorium +`_ project developed in the same eScripta research +group. eScriptorium provides a user-friendly interface for annotating data, +training models, and inference (but also much more). There is a `gitter channel +`_ that is mostly intended for +coordinating technical development but is also a spot to find people with +experience on applying kraken on a wide variety of material. + +.. _license: + +License +======= + +``Kraken`` is provided under the terms and conditions of the `Apache 2.0 +License `_. + +Funding +======= + +kraken is developed at the `École Pratique des Hautes Études `_, `Université PSL `_. + + +.. container:: twocol + + .. container:: leftside + + .. image:: _static/normal-reproduction-low-resolution.jpg + :width: 100 + :alt: Co-financed by the European Union + + .. container:: rightside + + This project was partially funded through the RESILIENCE project, funded from + the European Union’s Horizon 2020 Framework Programme for Research and + Innovation. + + +.. container:: twocol + + .. container:: leftside + + .. 
image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png + :width: 300 + :alt: Received funding from the Programme d’investissements d’Avenir + + .. container:: rightside + + Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la + Recherche au titre du Programme d’Investissements d’Avenir portant la référence + ANR-21-ESRE-0005 (Biblissima+). + + diff --git a/main/_sources/ketos.rst.txt b/main/_sources/ketos.rst.txt new file mode 100644 index 000000000..11f4d2a90 --- /dev/null +++ b/main/_sources/ketos.rst.txt @@ -0,0 +1,832 @@ +.. _ketos: + +Training +======== + +This page describes the training utilities available through the ``ketos`` +command line utility in depth. For a gentle introduction on model training +please refer to the :ref:`tutorial `. + +There are currently three trainable components in the kraken processing pipeline: +* Segmentation: finding lines and regions in images +* Reading Order: ordering lines found in the previous segmentation step. Reading order models are closely linked to segmentation models and both are usually trained on the same dataset. +* Recognition: recognition models transform images of lines into text. + +Depending on the use case it is not necessary to manually train new models for +each material. The default segmentation model works well on quite a variety of +handwritten and printed documents, a reading order model might not perform +better than the default heuristic for simple text flows, and there are +recognition models for some types of material available in the repository. + +Best practices +-------------- + +Recognition model training +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* The default architecture works well for decently sized datasets. +* Use precompiled binary datasets and put them in a place where they can be memory mapped during training (local storage, not NFS or similar). +* Fixed splits in precompiled datasets increase memory use and slow down startup as the dataset needs to be loaded once into the dataset. It is recommended to create explicit splits by compiling source XML files into separate datasets. +* Use the ``--logger`` flag to track your training metrics across experiments using Tensorboard. +* If the network doesn't converge before the early stopping aborts training, increase ``--min-epochs`` or ``--lag``. Use the ``--logger`` option to inspect your training loss. +* Use the flag ``--augment`` to activate data augmentation. +* Increase the amount of ``--workers`` to speedup data loading. This is essential when you use the ``--augment`` option. +* When using an Nvidia GPU, set the ``--precision`` option to 16 to use automatic mixed precision (AMP). This can provide significant speedup without any loss in accuracy. +* Use option -B to scale batch size until GPU utilization reaches 100%. When using a larger batch size, it is recommended to use option -r to scale the learning rate by the square root of the batch size (1e-3 * sqrt(batch_size)). +* When fine-tuning, it is recommended to use `new` mode not `union` as the network will rapidly unlearn missing labels in the new dataset. +* If the new dataset is fairly dissimilar or your base model has been pretrained with ketos pretrain, use ``--warmup`` in conjunction with ``--freeze-backbone`` for one 1 or 2 epochs. +* Upload your models to the model repository. + +Segmentation model training +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* The segmenter is fairly robust when it comes to hyperparameter choice. 
+* Start by finetuning from the default model for a fixed number of epochs (50 for reasonably sized datasets) with a cosine schedule. +* Segmentation models' performance is difficult to evaluate. Pixel accuracy doesn't mean much because there are many more pixels that aren't part of a line or region than just background. Frequency-weighted IoU is good for overall performance, while mean IoU overrepresents rare classes. The best way to evaluate segmentation models is to look at the output on unlabelled data. +* If you don't have rare classes you can use a fairly small validation set to make sure everything is converging and just visually validate on unlabelled data. + +Training data formats +--------------------- + +The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commony used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation and reading order training and the binary format for +recognition training. + +ALTO +~~~~ + +Kraken parses and produces files according to ALTO 4.3. An example showing the +attributes necessary for segmentation, recognition, and reading order training +follows: + +.. literalinclude:: alto.xml + :language: xml + :force: + +Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset `MeasurementUnit` or one with an element value of `pixel`. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box. + +PAGE XML +~~~~~~~~ + +PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +from a variety of tools. As with ALTO, PAGE XML files can be used to train +segmentation, reading order, and recognition models. + +.. literalinclude:: pagexml.xml + :language: xml + :force: + +Binary Datasets +~~~~~~~~~~~~~~~ + +.. _binary_datasets: + +In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported for +recognition training. Binary datasets drastically improve loading performance +allowing the saturation of most GPUs with minimal computational overhead while +also allowing training with datasets that are larger than the systems main +memory. A minor drawback is a ~30% increase in dataset size in comparison to +the raw images + XML approach. + +To realize this speedup the dataset has to be compiled first: + +.. code-block:: console + + $ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ... + +if there are a lot of individual lines containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the ``--workers`` option: + +.. code-block:: console + + $ ketos compile --workers 8 -f xml ... + +In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set: + +.. code-block:: console + + $ ketos compile --ignore-splits -f xml ... 
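+
+When the source files do carry split annotations, compiling them with the
+default settings preserves those assignments in the dataset. A sketch
+combining the options shown above (file names are illustrative):
+
+.. code-block:: console
+
+   $ ketos compile --workers 8 -f xml -o dataset_with_splits.arrow train/*.xml val/*.xml test/*.xml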
+ +Alternatively fixed-proportion random splits can be created ad-hoc during +compile time: + +.. code-block:: console + + $ ketos compile --random-split 0.8 0.1 0.1 ... + + +The above line splits assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by `ketos train` (unless told +otherwise) while the remaining 10% of the test set is selected by `ketos test`. + +.. warning:: + Fixed splits in datasets are ignored during training and testing per + default as they require loading the entire dataset into main memory at + once, drastically increasing memory consumption and causing initial delays. + Use the `\-\-fixed-splits` option in `ketos train` and `ketos test` to + respect fixed splits. + +Recognition training +-------------------- + +.. _predtrain: + +The training utility allows training of :ref:`VGSL ` specified models +both from scratch and from existing models. Here are its most important command line options: + +======================================================= ====== +option action +======================================================= ====== +-o, \--output Output model file prefix. Defaults to model. +-s, \--spec VGSL spec of the network to train. CTC layer + will be added automatically. default: + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 + Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] +-a, \--append Removes layers before argument and then + appends spec. Only works when loading an + existing model +-i, \--load Load existing file to continue training +-F, \--savefreq Model save frequency in epochs during + training +-q, \--quit Stop condition for training. Set to `early` + for early stopping (default) or `fixed` for fixed + number of epochs. +-N, \--epochs Number of epochs to train for. +\--min-epochs Minimum number of epochs to train for when using early stopping. +\--lag Number of epochs to wait before stopping + training without improvement. Only used when using early stopping. +-d, \--device Select device to use (cpu, cuda:0, cuda:1,...). GPU acceleration requires CUDA. +\--optimizer Select optimizer (Adam, SGD, RMSprop). +-r, \--lrate Learning rate [default: 0.001] +-m, \--momentum Momentum used with SGD optimizer. Ignored otherwise. +-w, \--weight-decay Weight decay. +\--schedule Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or + reduceonplateau. For 1cycle the cycle length is determined by the `--epoch` option. +-p, \--partition Ground truth data partition ratio between train/validation set +-u, \--normalization Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD. +-c, \--codec Load a codec JSON definition (invalid if loading existing model) +\--resize Codec/output layer resizing option. If set + to `union` code points will be added, `new` + will set the layer to match exactly the + training data, `fail` will abort if training + data and model codec do not match. Only valid when refining an existing model. +-n, \--reorder / \--no-reorder Reordering of code points to display order. +-t, \--training-files File(s) with additional paths to training data. Used to + enforce an explicit train/validation set split and deal with + training sets with more lines than the command line can process. Can be used more than once. +-e, \--evaluation-files File(s) with paths to evaluation data. Overrides the `-p` parameter. +-f, \--format-type Sets the training and evaluation data format. 
+ Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +\--augment / \--no-augment Enables/disables data augmentation. +\--workers Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset. +======================================================= ====== + +From Scratch +~~~~~~~~~~~~ + +The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training: + +.. code-block:: console + + $ ketos train -f xml training_data/*.xml + +Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping. + +In some cases changing the network architecture might be useful. One such +example would be material that is not well recognized in the grayscale domain, +as the default architecture definition converts images into grayscale. The +input definition can be changed quite easily to train on color data (RGB) instead: + +.. code-block:: console + + $ ketos train -f page -s '[1,120,0,3 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do]]' syr/*.xml + +Complete documentation for the network description language can be found on the +:ref:`VGSL ` page. + +Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the lag can be useful: + +.. code-block:: console + + $ ketos train --lag 10 syr/*.png + +To switch optimizers from Adam to SGD or RMSprop just set the option: + +.. code-block:: console + + $ ketos train --optimizer SGD syr/*.png + +It is possible to resume training from a previously saved model: + +.. code-block:: console + + $ ketos train -i model_25.mlmodel syr/*.png + +A good configuration for a small precompiled print dataset and GPU acceleration +would be: + +.. code-block:: console + + $ ketos train -d cuda -f binary dataset.arrow + +A better configuration for large and complicated datasets such as handwritten texts: + +.. code-block:: console + + $ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow + +This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn't prematurely interrupt the +training process. + +Fine Tuning +~~~~~~~~~~~ + +Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training: + +.. code-block:: console + + $ ketos train -f page -i model_best.mlmodel syr/*.xml + +The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised: + +.. 
code-block:: console + + $ ketos train -i model_5.mlmodel kamil/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'} + Network codec not compatible with training set + [0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'} + +There are two modes dealing with mismatching alphabets, ``union`` and ``new``. +``union`` resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. ``new`` +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones. + +.. code-block:: console + + $ ketos -v train --resize union -i model_5.mlmodel syr/*.png + ... + [0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols + ... + [0.8337] Resizing codec to include 3 new code points + [0.8374] Resizing last layer in network to 52 outputs + ... + +In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training. + +.. code-block:: console + + $ ketos -v train --resize new -i model_5.mlmodel syr/*.png + ... + [0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols + ... + [0.7857] Resizing network or given codec to 49 code sequences + [0.8344] Deleting 2 output classes from network (46 retained) + ... + +In ``new`` mode 2 of the original characters were removed and 3 new ones were added. + +Slicing +~~~~~~~ + +Refining on mismatched alphabets has its limits. If the alphabets are highly +different the modification of the final linear layer to add/remove character +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch. + +Taking the default network definition as printed in the debug log we can see +the layer indices of the model: + +.. code-block:: console + + [0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs + [0.8762] layer type params + [0.8790] 0 conv kernel 3 x 3 filters 32 activation r + [0.8795] 1 dropout probability 0.1 dims 2 + [0.8797] 2 maxpool kernel 2 x 2 stride 2 x 2 + [0.8802] 3 conv kernel 3 x 3 filters 64 activation r + [0.8804] 4 dropout probability 0.1 dims 2 + [0.8806] 5 maxpool kernel 2 x 2 stride 2 x 2 + [0.8813] 6 reshape from 1 1 x 12 to 1/3 + [0.8876] 7 rnn direction b transposed False summarize False out 100 legacy None + [0.8878] 8 dropout probability 0.5 dims 1 + [0.8883] 9 linear augmented False out 48 + +To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending: + +.. 
code-block:: console + + $ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'} + Slicing and dicing model ✓ + +The new model will behave exactly like a new one, except potentially training a +lot faster. + +Text Normalization and Unicode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note: + + The description of the different behaviors of Unicode text below are highly + abbreviated. If confusion arrises it is recommended to take a look at the + linked documents which are more exhaustive and include visual examples. + +Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or a base character and the diacritic, `different types of whitespace +`_ exist, and mixed bidirectional text +can be written differently depending on the `base line direction +`_. + +Ketos provides options to largely normalize input into normalized forms that +make processing of data from multiple sources possible. Principally, two +options are available: one for `Unicode normalization +`_ and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms: + +.. code-block:: console + + $ ketos train --normalization NFD -f xml training_data/*.xml + $ ketos train --normalization NFC -f xml training_data/*.xml + $ ketos train --normalization NFKD -f xml training_data/*.xml + $ ketos train --normalization NFKC -f xml training_data/*.xml + +Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through: + +.. code-block:: console + + $ ketos train --no-normalize-whitespace -f xml training_data/*.xml + +Further the behavior of the `BiDi algorithm +`_ can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a :ref:`codec `) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model: + +.. code-block:: console + + $ ketos train --base-dir R -f xml rtl_training_data/*.xml + +It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already: + +.. 
code-block:: console + + $ ketos train --no-reorder -f xml rtl_display_data/*.xml + +Codecs +~~~~~~ + +.. _codecs: + +Codecs map between the label decoded from the raw network output and Unicode +code points (see :ref:`this ` diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation. + +The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual. + +There are multiple approaches one could follow constructing a custom codec: +*randomized block codes*, i.e. producing random fixed-length labels for each code +point, *Huffmann coding*, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or *structural decomposition*, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function. + +While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs. + +Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.: + +.. code-block:: console + + $ ketos train -c sample.codec -f xml training_data/*.xml + +with `sample.codec` containing: + +.. code-block:: json + + {"S": [50, 53, 74, 23], + "A": [95, 60, 19, 95], + "B": [2, 96, 28, 29], + "\u1f05": [91, 14, 95, 90]} + +Unsupervised recognition pretraining +------------------------------------ + +Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices. + +All data sources accepted by the supervised trainer are valid for pretraining +but for performance reasons it is recommended to use pre-compiled binary +datasets. One thing to keep in mind is that compilation filters out empty +(non-transcribed) text lines per default which is undesirable for pretraining. +With the ``--keep-empty-lines`` option all valid lines will be written to the +dataset file: + +.. code-block:: console + + $ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml + + +The basic pretraining call is very similar to a training one: + +.. 
code-block:: console + + $ ketos pretrain -f binary foo.arrow + +There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples. + +.. code-block:: console + + $ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow + +Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced: + +.. code-block:: console + + $ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow + +It is necessary to use learning rate warmup (`warmup`) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations. + +Segmentation training +--------------------- + +.. _segtrain: + +Training a segmentation model is very similar to training models for text +recognition. The basic invocation is: + +.. code-block:: console + + $ ketos segtrain -f xml training_data/*.xml + +This takes all text lines and regions encoded in the XML files and trains a +model to recognize them. + +Most other options available in transcription training are also available in +segmentation training. CUDA acceleration: + +.. code-block:: console + + $ ketos segtrain -d cuda -f xml training_data/*.xml + +Defining custom architectures: + +.. code-block:: console + + $ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml + +Fine tuning/transfer learning with last layer adaptation and slicing: + +.. code-block:: console + + $ ketos segtrain --resize new -i segmodel_best.mlmodel training_data/*.xml + $ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml + +In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of *all* of +either baseline or region data contained in the dataset: + +.. code-block:: console + + $ ketos segtrain --suppress-baselines -f xml training_data/*.xml + Training line types: + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + ... + $ ketos segtrain --suppress-regions -f xml training-data/*.xml + Training line types: + default 2 53980 + foo 8 134 + ... + +It is also possible to filter out baselines/regions selectively: + +.. 
code-block:: console + + $ ketos segtrain -f xml --valid-baselines default training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + text 4 1128 + separator 5 5431 + paragraph 6 10218 + table 7 16 + $ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml + Training line types: + default 2 53980 + Training region types: + graphic 3 135 + paragraph 6 10218 + +Finally, we can merge baselines and regions into each other: + +.. code-block:: console + + $ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml + Training line types: + default 2 54114 + ... + $ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml + ... + Training region types: + graphic 3 151 + text 4 11346 + separator 5 5431 + ... + +These options are combinable to massage the dataset into any typology you want. +Tags containing the separator character `:` can be specified by escaping them +with backslash. + +Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option: + +.. code-block:: console + + $ ketos segtrain --topline -f xml hebrew_training_data/*.xml + $ ketos segtrain --centerline -f xml chinese_training_data/*.xml + $ ketos segtrain --baseline -f xml latin_training_data/*.xml + +Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved: + +.. code-block:: console + + $ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml + ... + +Reading order training +---------------------- + +.. _rotrain: + +Reading order models work slightly differently from segmentation and reading +order models. They are closely linked to the typology used in the dataset they +were trained on as they use type information on lines and regions to make +ordering decisions. As the same typology was probably used to train a specific +segmentation model, reading order models are trained separately but bundled +with their segmentation model in a subsequent step. The general sequence is +therefore: + +.. code-block:: console + + $ ketos segtrain -o fr_manu_seg.mlmodel -f xml french/*.xml + ... + $ ketos rotrain -o fr_manu_ro.mlmodel -f xml french/*.xml + ... + $ ketos roadd -o fr_manu_seg_with_ro.mlmodel -i fr_manu_seg_best.mlmodel -r fr_manu_ro_best.mlmodel + +Only the `fr_manu_seg_with_ro.mlmodel` file will contain the trained reading +order model. Segmentation models can exist with or without reading order +models. If one is added, the neural reading order will be computed *in +addition* to the one produced by the default heuristic during segmentation and +serialized in the final XML output (in ALTO/PAGE XML). + +.. note:: + + Reading order models work purely on the typology and geometric features + of the lines and regions. They construct an approximate ordering matrix + by feeding feature vectors of two lines (or regions) into the network + to decide which of those two lines precedes the other. + + These feature vectors are quite simple; just the lines' types, and + their start, center, and end points. 
Therefore they can *not* reliably + learn any ordering relying on graphical features of the input page such + as: line color, typeface, or writing system. + +Reading order models are extremely simple and do not require a lot of memory or +computational power to train. In fact, the default parameters are extremely +conservative and it is recommended to increase the batch size for improved +training speed. Large batch size above 128k are easily possible with +sufficiently large training datasets: + +.. code-block:: console + + $ ketos rotrain -o fr_manu_ro.mlmodel -B 128000 -f french/*.xml + Training RO on following baselines types: + DefaultLine 1 + DropCapitalLine 2 + HeadingLine 3 + InterlinearLine 4 + GPU available: False, used: False + TPU available: False, using: 0 TPU cores + IPU available: False, using: 0 IPUs + HPU available: False, using: 0 HPUs + ┏━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ + ┃ ┃ Name ┃ Type ┃ Params ┃ + ┡━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ + │ 0 │ criterion │ BCEWithLogitsLoss │ 0 │ + │ 1 │ ro_net │ MLP │ 1.1 K │ + │ 2 │ ro_net.fc1 │ Linear │ 1.0 K │ + │ 3 │ ro_net.relu │ ReLU │ 0 │ + │ 4 │ ro_net.fc2 │ Linear │ 45 │ + └───┴─────────────┴───────────────────┴────────┘ + Trainable params: 1.1 K + Non-trainable params: 0 + Total params: 1.1 K + Total estimated model params size (MB): 0 + stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/35 0:00:00 • -:--:-- 0.00it/s val_spearman: 0.912 val_loss: 0.701 early_stopping: 0/300 inf + +During validation a metric called Spearman's footrule is computed. To calculate +Spearman's footrule, the ranks of the lines of text in the ground truth reading +order and the predicted reading order are compared. The footrule is then +calculated as the sum of the absolute differences between the ranks of pairs of +lines. The score increases by 1 for each line between the correct and predicted +positions of a line. + +A lower footrule score indicates a better alignment between the two orders. A +score of 0 implies perfect alignment of line ranks. + +Recognition testing +------------------- + +Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the `test` command. It uses transcribed +lines, the test set, in the same format as the `train` command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them. + +======================================================= ====== +option action +======================================================= ====== +-f, \--format-type Sets the test set data format. + Valid choices are 'path', 'xml' (default), 'alto', 'page', or binary. + In `alto`, `page`, and xml mode all data is extracted from XML files + containing both baselines and a link to source images. + In `path` mode arguments are image files sharing a prefix up to the last + extension with JSON `.path` files containing the baseline information. + In `binary` mode arguments are precompiled binary dataset files. +-m, \--model Model(s) to evaluate. +-e, \--evaluation-files File(s) with paths to evaluation data. +-d, \--device Select device to use. +\--pad Left and right padding around lines. +======================================================= ====== + +Transcriptions are handed to the command in the same way as for the `train` +command, either through a manifest with ``-e/--evaluation-files`` or by just +adding a number of image files as the final argument: + +.. 
code-block:: console + + $ ketos test -m $model -e test.txt test/*.png + Evaluating $model + Evaluating [####################################] 100% + === report test_model.mlmodel === + + 7012 Characters + 6022 Errors + 14.12% Accuracy + + 5226 Insertions + 2 Deletions + 794 Substitutions + + Count Missed %Right + 1567 575 63.31% Common + 5230 5230 0.00% Arabic + 215 215 0.00% Inherited + + Errors Correct-Generated + 773 { ا } - { } + 536 { ل } - { } + 328 { و } - { } + 274 { ي } - { } + 266 { م } - { } + 256 { ب } - { } + 246 { ن } - { } + 241 { SPACE } - { } + 207 { ر } - { } + 199 { ف } - { } + 192 { ه } - { } + 174 { ع } - { } + 172 { ARABIC HAMZA ABOVE } - { } + 144 { ت } - { } + 136 { ق } - { } + 122 { س } - { } + 108 { ، } - { } + 106 { د } - { } + 82 { ك } - { } + 81 { ح } - { } + 71 { ج } - { } + 66 { خ } - { } + 62 { ة } - { } + 60 { ص } - { } + 39 { ، } - { - } + 38 { ش } - { } + 30 { ا } - { - } + 30 { ن } - { - } + 29 { ى } - { } + 28 { ذ } - { } + 27 { ه } - { - } + 27 { ARABIC HAMZA BELOW } - { } + 25 { ز } - { } + 23 { ث } - { } + 22 { غ } - { } + 20 { م } - { - } + 20 { ي } - { - } + 20 { ) } - { } + 19 { : } - { } + 19 { ط } - { } + 19 { ل } - { - } + 18 { ، } - { . } + 17 { ة } - { - } + 16 { ض } - { } + ... + Average accuracy: 14.12%, (stddev: 0.00) + +The report(s) contains character accuracy measured per script and a detailed +list of confusions. When evaluating multiple models the last line of the output +will the average accuracy and the standard deviation across all of them. diff --git a/main/_sources/models.rst.txt b/main/_sources/models.rst.txt new file mode 100644 index 000000000..b393f0738 --- /dev/null +++ b/main/_sources/models.rst.txt @@ -0,0 +1,24 @@ +.. _models: + +Models +====== + +There are currently three kinds of models containing the recurrent neural +networks doing all the character recognition supported by kraken: ``pronn`` +files serializing old pickled ``pyrnn`` models as protobuf, clstm's native +serialization, and versatile `Core ML +`_ models. + +CoreML +------ + +Core ML allows arbitrary network architectures in a compact serialization with +metadata. This is the default format in pytorch-based kraken. + +Segmentation Models +------------------- + +Recognition Models +------------------ + + diff --git a/main/_sources/training.rst.txt b/main/_sources/training.rst.txt new file mode 100644 index 000000000..704727aa5 --- /dev/null +++ b/main/_sources/training.rst.txt @@ -0,0 +1,463 @@ +.. _training: + +Training kraken +=============== + +kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other system requiring +segmentation down to glyph level before classification, it is uniquely suited +for the recognition of connected scripts, because the neural network is trained +to assign correct character to unsegmented training data. + +Both segmentation, the process finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the location of baselines, i.e. +the imaginary lines the text is written on, and polygons of regions. For +recognition these are the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files. 
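+
+Concretely, kraken exposes these two training tasks as separate ``ketos``
+subcommands, both of which consume such PAGE or ALTO XML files directly. The
+following is only a minimal sketch of what the rest of this chapter walks
+through; ``gt_pages`` is a placeholder for a directory containing your own
+ground truth files:
+
+.. code-block:: console
+
+   $ ketos segtrain -f xml gt_pages/*.xml  # segmentation model from baselines and region polygons
+   $ ketos train -f xml gt_pages/*.xml     # recognition model from the transcribed text lines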
+ +Installing kraken +----------------- + +The easiest way to install and use kraken is through `conda +`_. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken: + +.. code-block:: console + + $ wget https://raw.githubusercontent.com/mittagessen/kraken/main/environment.yml + $ conda env create -f environment.yml + +Each time you want to use the kraken environment in a shell is has to be +activated first: + +.. code-block:: console + + $ conda activate kraken + +Image acquisition and preprocessing +----------------------------------- + +First a number of high quality scans, preferably color or grayscale and at +least 300dpi are required. Scans should be in a lossless image format such as +TIFF or PNG, images in PDF files have to be extracted beforehand using a tool +such as ``pdftocairo`` or ``pdfimages``. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition. + +Depending on the source of the scans some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable +although it isn't strictly necessary as the segmenter can be trained to treat +noisy material with a high accuracy. A fairly user-friendly software for +semi-automatic batch processing of image scans is `Scantailor +`_ albeit most work can be done using a standard image +editor. + +The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout or the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require less than a few hundred samples while a general +model can well go into the thousands of pages. Likewise a specific recognition +model for printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, with manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands needing +more training data for the same accuracy. + +There is no hard rule for the amount of training data and it may be required to +retrain a model after the initial training data proves insufficient. Most +``western`` texts contain between 25 and 40 lines per page, therefore upward of +30 pages have to be preprocessed and later transcribed. + +Annotation and transcription +---------------------------- + +kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: `escriptorium +`_ integrates kraken tightly including +training and inference, `Aletheia +`_ is a powerful desktop +application that can create fine grained annotations. + +Dataset Compilation +------------------- + +.. _compilation: + +Training +-------- + +.. _training_step: + +The training data, e.g. a collection of PAGE XML documents, obtained through +annotation and transcription may now be used to train segmentation and/or +transcription models. 
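+
+Besides training directly from the XML files or from image/text line pairs as
+shown below, larger collections can first be compiled into a binary dataset
+and trained from that, as the compiled format is generally faster to load. A
+minimal sketch, with the file names serving only as placeholders:
+
+.. code-block:: console
+
+   $ ketos compile -f xml -o dataset.arrow training_data/*.xml  # one-time compilation of the XML ground truth
+   $ ketos train -f binary dataset.arrow                        # train from the compiled dataset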
+ +The training data in ``output_dir`` may now be used to train a new model by +invoking the ``ketos train`` command. Just hand a list of images to the command +such as: + +.. code-block:: console + + $ ketos train output_dir/*.png + +to start training. + +A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. Per default the validation +set will comprise of 10% of the training data. + +Basic model training is mostly automatic albeit there are multiple parameters +that can be adjusted: + +--output + Sets the prefix for models generated during training. They will best as + ``prefix_epochs.mlmodel``. +--report + How often evaluation passes are run on the validation set. It is an + integer equal or larger than 1 with 1 meaning a report is created each + time the complete training set has been seen by the network. +--savefreq + How often intermediate models are saved to disk. It is an integer with + the same semantics as ``--report``. +--load + Continuing training is possible by loading an existing model file with + ``--load``. To continue training from a base model with another + training set refer to the full :ref:`ketos ` documentation. +--preload + Enables/disables preloading of the training set into memory for + accelerated training. The default setting preloads data sets with less + than 2500 lines, explicitly adding ``--preload`` will preload arbitrary + sized sets. ``--no-preload`` disables preloading in all circumstances. + +Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable as training +is a somewhat random process a rough guide is that accuracy seldom improves +after 50 epochs reached between 8 and 24 hours of training. + +When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as `early stopping +`_ that stops training as soon as +the error rate on the validation set doesn't improve anymore. This will +prevent `overfitting `_, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein. + +.. code-block:: console + + $ ketos train output_dir/*.png + Building training set [####################################] 100% + Building validation set [####################################] 100% + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + Initializing model ✓ + Accuracy report (0) -1.5951 3680 9550 + epoch 0/-1 [####################################] 788/788 + Accuracy report (1) 0.0245 3504 3418 + epoch 1/-1 [####################################] 788/788 + Accuracy report (2) 0.8445 3504 545 + epoch 2/-1 [####################################] 788/788 + Accuracy report (3) 0.9541 3504 161 + epoch 3/-1 [------------------------------------] 13/788 0d 00:22:09 + ... + +By now there should be a couple of models model_name-1.mlmodel, +model_name-2.mlmodel, ... in the directory the script was executed in. Lets +take a look at each part of the output. + +.. code-block:: console + + Building training set [####################################] 100% + Building validation set [####################################] 100% + +shows the progress of loading the training and validation set into memory. 
This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster without preloading at the +cost of performing preprocessing repeatedly during the training process. + +.. code-block:: console + + [270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'} + +is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning. + +.. code-block:: console + + Accuracy report (2) 0.8445 3504 545 + +this line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set +for a character accuracy of 84.4%. It should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left or wildly incorrect transcriptions. Abort training, correct the +error(s) and start again. + +After training is finished the best model is saved as +``model_name_best.mlmodel``. It is highly recommended to also archive the +training log and data for later reference. + +``ketos`` can also produce more verbose output with training set and network +information by appending one or more ``-v`` to the command: + +.. code-block:: console + + $ ketos -vv train syr/*.png + [0.7272] Building ground truth set from 876 line images + [0.7281] Taking 88 lines from training for evaluation + ... + [0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols + [0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'} + [0.8482] grapheme count + [0.8484] SPACE 5258 + [0.8484] ܐ 3519 + [0.8485] ܘ 2334 + [0.8486] ܝ 2096 + [0.8487] ܠ 1754 + [0.8487] ܢ 1724 + [0.8488] ܕ 1697 + [0.8489] ܗ 1681 + [0.8489] ܡ 1623 + [0.8490] ܪ 1359 + [0.8491] ܬ 1339 + [0.8491] ܒ 1184 + [0.8492] ܥ 824 + [0.8492] . 811 + [0.8493] COMBINING DOT BELOW 646 + [0.8493] ܟ 599 + [0.8494] ܫ 577 + [0.8495] COMBINING DIAERESIS 488 + [0.8495] ܚ 431 + [0.8496] ܦ 428 + [0.8496] ܩ 307 + [0.8497] COMBINING DOT ABOVE 259 + [0.8497] ܣ 256 + [0.8498] ܛ 204 + [0.8498] ܓ 176 + [0.8499] ܀ 132 + [0.8499] ܙ 81 + [0.8500] * 66 + [0.8501] ܨ 59 + [0.8501] ܆ 40 + [0.8502] [ 40 + [0.8503] ] 40 + [0.8503] 1 18 + [0.8504] 2 11 + [0.8504] ܇ 9 + [0.8505] 3 8 + [0.8505] 6 + [0.8506] 5 5 + [0.8506] NO-BREAK SPACE 4 + [0.8507] 0 4 + [0.8507] 6 4 + [0.8508] : 4 + [0.8508] 8 4 + [0.8509] 9 3 + [0.8510] 7 3 + [0.8510] 4 3 + [0.8511] SYRIAC FEMININE DOT 1 + [0.8511] SYRIAC RUKKAKHA 1 + [0.8512] Encoding training set + [0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs + [0.9318] layer type params + [0.9350] 0 rnn direction b transposed False summarize False out 100 legacy None + [0.9361] 1 dropout probability 0.5 dims 1 + [0.9381] 2 linear augmented False out 49 + [0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9) + [0.9920] Set OpenMP threads to 4 + [0.9920] Moving model to device cpu + [0.9924] Starting evaluation run + + +indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +where found in these 788 lines. These affect the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. 
Characters like the Syriac feminine dot and numerals +that occur less than 10 times will most likely not be recognized well by the +trained net. + + +Evaluation and Validation +------------------------- + +While output during training is detailed enough to know when to stop training +one usually wants to know the specific kinds of errors to expect. Doing more +in-depth error analysis also allows to pinpoint weaknesses in the training +data, e.g. above average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcription in +the first place. + +First the trained model has to be applied to some line transcriptions with the +`ketos test` command: + +.. code-block:: console + + $ ketos test -m syriac_best.mlmodel lines/*.png + Loading model syriac_best.mlmodel ✓ + Evaluating syriac_best.mlmodel + Evaluating [#-----------------------------------] 3% 00:04:56 + ... + +After all lines have been processed a evaluation report will be printed: + +.. code-block:: console + + === report === + + 35619 Characters + 336 Errors + 99.06% Accuracy + + 157 Insertions + 81 Deletions + 98 Substitutions + + Count Missed %Right + 27046 143 99.47% Syriac + 7015 52 99.26% Common + 1558 60 96.15% Inherited + + Errors Correct-Generated + 25 { } - { COMBINING DOT BELOW } + 25 { COMBINING DOT BELOW } - { } + 15 { . } - { } + 15 { COMBINING DIAERESIS } - { } + 12 { ܢ } - { } + 10 { } - { . } + 8 { COMBINING DOT ABOVE } - { } + 8 { ܝ } - { } + 7 { ZERO WIDTH NO-BREAK SPACE } - { } + 7 { ܆ } - { } + 7 { SPACE } - { } + 7 { ܣ } - { } + 6 { } - { ܝ } + 6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS } + 5 { ܙ } - { } + 5 { ܬ } - { } + 5 { } - { ܢ } + 4 { NO-BREAK SPACE } - { } + 4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE } + 4 { } - { ܒ } + 4 { } - { COMBINING DIAERESIS } + 4 { ܗ } - { } + 4 { } - { ܬ } + 4 { } - { ܘ } + 4 { ܕ } - { ܢ } + 3 { } - { ܕ } + 3 { ܐ } - { } + 3 { ܗ } - { ܐ } + 3 { ܝ } - { ܢ } + 3 { ܀ } - { . } + 3 { } - { ܗ } + + ..... + +The first section of the report consists of a simple accounting of the number +of characters in the ground truth, the errors in the recognition output and the +resulting accuracy in per cent. + +The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model). + +Next is a grouping of errors (insertions and substitutions) by Unicode script. + +The final part of the report are errors sorted by frequency and a per +character accuracy report. Importantly most errors are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in training and validation set, incorrect transcription +such as non-systematic transcription, or unclean speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set. + +Recognition +----------- + +The ``kraken`` utility is employed for all non-training related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line image to character +sequences). All of these may be run in a single call like this: + +.. 
code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE + +producing a text file from the input image. There are also `hocr +`_ and `ALTO `_ output +formats available through the appropriate switches: + +.. code-block:: console + + $ kraken -i ... ocr -h + $ kraken -i ... ocr -a + +For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE BW_IMAGE binarize + $ kraken -i BW_IMAGE LINES segment + $ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ... + +It is also possible to recognize more than one file at a time by just chaining +``-i ... ...`` clauses like this: + +.. code-block:: console + + $ kraken -i input_1 output_1 -i input_2 output_2 ... + +Finally, there is a central repository containing freely available models. +Getting a list of all available models: + +.. code-block:: console + + $ kraken list + +Retrieving model metadata for a particular model: + +.. code-block:: console + + $ kraken show arabic-alam-al-kutub + name: arabic-alam-al-kutub.mlmodel + + An experimental model for Classical Arabic texts. + + Network trained on 889 lines of [0] as a test case for a general Classical + Arabic model. Ground truth was prepared by Sarah Savant + and Maxim Romanov . + + Vocalization was omitted in the ground truth. Training was stopped at ~35000 + iterations with an accuracy of 97%. + + [0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st + edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE. + alphabet: !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC + MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW + +and actually fetching the model: + +.. code-block:: console + + $ kraken get arabic-alam-al-kutub + +The downloaded model can then be used for recognition by the name shown in its metadata, e.g.: + +.. code-block:: console + + $ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel + +For more documentation see the kraken `website `_. diff --git a/main/_sources/vgsl.rst.txt b/main/_sources/vgsl.rst.txt new file mode 100644 index 000000000..6a0c42de4 --- /dev/null +++ b/main/_sources/vgsl.rst.txt @@ -0,0 +1,233 @@ +.. _vgsl: + +VGSL network specification +========================== + +kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string. + +Basics +------ + +A VGSL specification consists of an input block, one or more layers, and an +output block. For example: + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103] + +The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension. + +When channels are set to 1 grayscale or B/W inputs are expected, 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1 pixel wide grayscale strips scaled to the size of the +channel dimension. + +After the input, a number of layers are defined. 
Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.: + +.. code-block:: console + + [1,48,0,1 S1(1x48)1,3 Lbx100 O1c103] + +or using the alternative slightly faster formulation: + +.. code-block:: console + + [1,1,0,48 Lbx100 O1c103] + +Finally an output definition is appended. When training sequence classification +networks with the provided tools the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data. + +Examples +-------- + +.. code-block:: console + + [1,1,0,48 Lbx100 Do 01c59] + + Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs + layer type params + 0 rnn direction b transposed False summarize False out 100 legacy None + 1 dropout probability 0.5 dims 1 + 2 linear augmented False out 59 + +A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height. + +.. code-block:: console + + [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59] + + Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 dropout probability 0.1 dims 2 + 2 maxpool kernel 2 x 2 stride 2 x 2 + 3 conv kernel 3 x 3 filters 64 activation r + 4 dropout probability 0.1 dims 2 + 5 maxpool kernel 2 x 2 stride 2 x 2 + 6 reshape from 1 1 x 12 to 1/3 + 7 rnn direction b transposed False summarize False out 100 legacy None + 8 dropout probability 0.5 dims 1 + 9 linear augmented False out 59 + +A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce drop probability on the depth +dimension as the default is too high for convolutional layers. The remainder of +the height dimension (`12`) is reshaped into the depth dimensions before +applying the final recurrent and linear layers. + +.. code-block:: console + + [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59] + + Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs + layer type params + 0 conv kernel 3 x 3 filters 16 activation r + 1 maxpool kernel 3 x 3 stride 3 x 3 + 2 rnn direction f transposed True summarize True out 64 legacy None + 3 rnn direction b transposed False summarize False out 128 legacy None + 4 rnn direction b transposed False summarize False out 256 legacy None + 5 dropout probability 0.5 dims 1 + 6 linear augmented False out 59 + +A model with arbitrary sized color image input, an initial summarizing +recurrent layer to squash the height to 64, followed by 2 bi-directional +recurrent layers and a linear projection. + +.. 
code-block:: console + + [1,1800,0,3 Cr3,3,32 Gn8 (I [Cr3,3,64,2,2 Gn8 CTr3,3,32,2,2]) Cr3,3,32 O2l8] + + layer type params + 0 conv kernel 3 x 3 filters 32 activation r + 1 groupnorm 8 groups + 2 parallel execute 2.0 and 2.1 in parallel + 2.0 identity + 2.1 serial execute 2.1.0 to 2.1.2 in sequence + 2.1.0 conv kernel 3 x 3 stride 2 x 2 filters 64 activation r + 2.1.1 groupnorm 8 groups + 2.1.2 transposed convolution kernel 3 x 3 stride 2 x 2 filters 2 activation r + 3 conv kernel 3 x 3 stride 1 x 1 filters 32 activation r + 4 linear activation sigmoid + +A model that outputs heatmaps with 8 feature dimensions, taking color images with +height normalized to 1800 pixels as its input. It uses a strided convolution +to first scale the image down, and then a transposed convolution to transform +the image back to its original size. This is done in a parallel block, where the +other branch simply passes through the output of the first convolution layer. +The input of the last convolutional layer is then the output of the two branches +of the parallel block concatenated, i.e. the output of the first +convolutional layer together with the output of the transposed convolutional layer, +giving `32 + 32 = 64` feature dimensions. + +Convolutional Layers +-------------------- + +.. code-block:: console + + C[T][{name}](s|t|r|l|m)[{name}],,[,,][,,] + s = sigmoid + t = tanh + r = relu + l = linear + m = softmax + +Adds a 2D convolution with kernel size `(y, x)` and `d` output channels, applying +the selected nonlinearity. Stride and dilation can be adjusted with the optional last +two parameters. `T` gives a transposed convolution. For transposed convolutions, +several output sizes are possible for the same configuration. The system +will try to match the output size of the different branches of parallel +blocks, however, this will only work if the transposed convolution directly +proceeds the confluence of the parallel branches, and if the branches with +fixed output size come first in the definition of the parallel block. Hence, +out of `(I [Cr3,3,8,2,2 CTr3,3,8,2,2])`, `([Cr3,3,8,2,2 CTr3,3,8,2,2] I)` +and `(I [Cr3,3,8,2,2 CTr3,3,8,2,2 Gn8])` only the first variant will +behave correctly. + +Recurrent Layers +---------------- + +.. code-block:: console + + L[{name}](f|r|b)(x|y)[s][{name}] LSTM cell with n outputs. + G[{name}](f|r|b)(x|y)[s][{name}] GRU cell with n outputs. + f runs the RNN forward only. + r runs the RNN reversed only. + b runs the RNN bidirectionally. + s (optional) summarizes the output in the requested dimension, return the last step. + +Adds either an LSTM or GRU recurrent layer to the network using either the `x` +(width) or `y` (height) dimension as the time axis. Input features are the +channel dimension and the non-time-axis dimension (height/width) is treated as +another batch dimension. For example, a `Lfx25` layer on an `1, 16, 906, 32` +input will execute 16 independent forward passes on `906x32` tensors resulting +in an output of shape `1, 16, 906, 25`. If this isn't desired either run a +summarizing layer in the other direction, e.g. `Lfys20` for an input `1, 1, +906, 20`, or prepend a reshape layer `S1(1x16)1,3` combining the height and +channel dimension for an `1, 1, 906, 512` input to the recurrent layer. + +Helper and Plumbing Layers +-------------------------- + +Max Pool +^^^^^^^^ +.. code-block:: console + + Mp[{name}],[,,] + +Adds a maximum pooling with `(y, x)` kernel_size and `(y_stride, x_stride)` stride. + +Reshape +^^^^^^^ + +.. 
code-block:: console + + S[{name}](x), Splits one dimension, moves one part to another + dimension. + +The `S` layer reshapes a source dimension `d` to `a,b` and distributes `a` into +dimension `e`, respectively `b` into `f`. Either `e` or `f` has to be equal to +`d`. So `S1(1, 48)1, 3` on an `1, 48, 1020, 8` input will first reshape into +`1, 1, 48, 1020, 8`, leave the `1` part in the height dimension and distribute +the `48` sized tensor into the channel dimension resulting in a `1, 1, 1024, +48*8=384` sized output. `S` layers are mostly used to remove undesirable non-1 +height before a recurrent layer. + +.. note:: + + This `S` layer is equivalent to the one implemented in the tensorflow + implementation of VGSL, i.e. behaves differently from tesseract. + +Regularization Layers +--------------------- + +Dropout +^^^^^^^ + +.. code-block:: console + + Do[{name}][],[] Insert a 1D or 2D dropout layer + +Adds an 1D or 2D dropout layer with a given probability. Defaults to `0.5` drop +probability and 1D dropout. Set to `dim` to `2` after convolutional layers. + +Group Normalization +^^^^^^^^^^^^^^^^^^^ + +.. code-block:: console + + Gn Inserts a group normalization layer + +Adds a group normalization layer separating the input into `` groups, +normalizing each separately. diff --git a/main/_static/alabaster.css b/main/_static/alabaster.css new file mode 100644 index 000000000..e3174bf93 --- /dev/null +++ b/main/_static/alabaster.css @@ -0,0 +1,708 @@ +@import url("basic.css"); + +/* -- page layout ----------------------------------------------------------- */ + +body { + font-family: Georgia, serif; + font-size: 17px; + background-color: #fff; + color: #000; + margin: 0; + padding: 0; +} + + +div.document { + width: 940px; + margin: 30px auto 0 auto; +} + +div.documentwrapper { + float: left; + width: 100%; +} + +div.bodywrapper { + margin: 0 0 0 220px; +} + +div.sphinxsidebar { + width: 220px; + font-size: 14px; + line-height: 1.5; +} + +hr { + border: 1px solid #B1B4B6; +} + +div.body { + background-color: #fff; + color: #3E4349; + padding: 0 30px 0 30px; +} + +div.body > .section { + text-align: left; +} + +div.footer { + width: 940px; + margin: 20px auto 30px auto; + font-size: 14px; + color: #888; + text-align: right; +} + +div.footer a { + color: #888; +} + +p.caption { + font-family: inherit; + font-size: inherit; +} + + +div.relations { + display: none; +} + + +div.sphinxsidebar { + max-height: 100%; + overflow-y: auto; +} + +div.sphinxsidebar a { + color: #444; + text-decoration: none; + border-bottom: 1px dotted #999; +} + +div.sphinxsidebar a:hover { + border-bottom: 1px solid #999; +} + +div.sphinxsidebarwrapper { + padding: 18px 10px; +} + +div.sphinxsidebarwrapper p.logo { + padding: 0; + margin: -10px 0 0 0px; + text-align: center; +} + +div.sphinxsidebarwrapper h1.logo { + margin-top: -10px; + text-align: center; + margin-bottom: 5px; + text-align: left; +} + +div.sphinxsidebarwrapper h1.logo-name { + margin-top: 0px; +} + +div.sphinxsidebarwrapper p.blurb { + margin-top: 0; + font-style: normal; +} + +div.sphinxsidebar h3, +div.sphinxsidebar h4 { + font-family: Georgia, serif; + color: #444; + font-size: 24px; + font-weight: normal; + margin: 0 0 5px 0; + padding: 0; +} + +div.sphinxsidebar h4 { + font-size: 20px; +} + +div.sphinxsidebar h3 a { + color: #444; +} + +div.sphinxsidebar p.logo a, +div.sphinxsidebar h3 a, +div.sphinxsidebar p.logo a:hover, +div.sphinxsidebar h3 a:hover { + border: none; +} + +div.sphinxsidebar p { + color: #555; + margin: 10px 0; +} + 
+div.sphinxsidebar ul { + margin: 10px 0; + padding: 0; + color: #000; +} + +div.sphinxsidebar ul li.toctree-l1 > a { + font-size: 120%; +} + +div.sphinxsidebar ul li.toctree-l2 > a { + font-size: 110%; +} + +div.sphinxsidebar input { + border: 1px solid #CCC; + font-family: Georgia, serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox input[type="text"] { + width: 160px; +} + +div.sphinxsidebar .search > div { + display: table-cell; +} + +div.sphinxsidebar hr { + border: none; + height: 1px; + color: #AAA; + background: #AAA; + + text-align: left; + margin-left: 0; + width: 50%; +} + +div.sphinxsidebar .badge { + border-bottom: none; +} + +div.sphinxsidebar .badge:hover { + border-bottom: none; +} + +/* To address an issue with donation coming after search */ +div.sphinxsidebar h3.donation { + margin-top: 10px; +} + +/* -- body styles ----------------------------------------------------------- */ + +a { + color: #004B6B; + text-decoration: underline; +} + +a:hover { + color: #6D4100; + text-decoration: underline; +} + +div.body h1, +div.body h2, +div.body h3, +div.body h4, +div.body h5, +div.body h6 { + font-family: Georgia, serif; + font-weight: normal; + margin: 30px 0px 10px 0px; + padding: 0; +} + +div.body h1 { margin-top: 0; padding-top: 0; font-size: 240%; } +div.body h2 { font-size: 180%; } +div.body h3 { font-size: 150%; } +div.body h4 { font-size: 130%; } +div.body h5 { font-size: 100%; } +div.body h6 { font-size: 100%; } + +a.headerlink { + color: #DDD; + padding: 0 4px; + text-decoration: none; +} + +a.headerlink:hover { + color: #444; + background: #EAEAEA; +} + +div.body p, div.body dd, div.body li { + line-height: 1.4em; +} + +div.admonition { + margin: 20px 0px; + padding: 10px 30px; + background-color: #EEE; + border: 1px solid #CCC; +} + +div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fafafa; +} + +div.admonition p.admonition-title { + font-family: Georgia, serif; + font-weight: normal; + font-size: 24px; + margin: 0 0 10px 0; + padding: 0; + line-height: 1; +} + +div.admonition p.last { + margin-bottom: 0; +} + +div.highlight { + background-color: #fff; +} + +dt:target, .highlight { + background: #FAF3E8; +} + +div.warning { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.danger { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.error { + background-color: #FCC; + border: 1px solid #FAA; + -moz-box-shadow: 2px 2px 4px #D52C2C; + -webkit-box-shadow: 2px 2px 4px #D52C2C; + box-shadow: 2px 2px 4px #D52C2C; +} + +div.caution { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.attention { + background-color: #FCC; + border: 1px solid #FAA; +} + +div.important { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.note { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.tip { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.hint { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.seealso { + background-color: #EEE; + border: 1px solid #CCC; +} + +div.topic { + background-color: #EEE; +} + +p.admonition-title { + display: inline; +} + +p.admonition-title:after { + content: ":"; +} + +pre, tt, code { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; + font-size: 0.9em; +} + +.hll { + background-color: #FFC; + margin: 0 -12px; + padding: 0 12px; + 
display: block; +} + +img.screenshot { +} + +tt.descname, tt.descclassname, code.descname, code.descclassname { + font-size: 0.95em; +} + +tt.descname, code.descname { + padding-right: 0.08em; +} + +img.screenshot { + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils { + border: 1px solid #888; + -moz-box-shadow: 2px 2px 4px #EEE; + -webkit-box-shadow: 2px 2px 4px #EEE; + box-shadow: 2px 2px 4px #EEE; +} + +table.docutils td, table.docutils th { + border: 1px solid #888; + padding: 0.25em 0.7em; +} + +table.field-list, table.footnote { + border: none; + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + +table.footnote { + margin: 15px 0; + width: 100%; + border: 1px solid #EEE; + background: #FDFDFD; + font-size: 0.9em; +} + +table.footnote + table.footnote { + margin-top: -15px; + border-top: none; +} + +table.field-list th { + padding: 0 0.8em 0 0; +} + +table.field-list td { + padding: 0; +} + +table.field-list p { + margin-bottom: 0.8em; +} + +/* Cloned from + * https://github.com/sphinx-doc/sphinx/commit/ef60dbfce09286b20b7385333d63a60321784e68 + */ +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +table.footnote td.label { + width: .1px; + padding: 0.3em 0 0.3em 0.5em; +} + +table.footnote td { + padding: 0.3em 0.5em; +} + +dl { + margin-left: 0; + margin-right: 0; + margin-top: 0; + padding: 0; +} + +dl dd { + margin-left: 30px; +} + +blockquote { + margin: 0 0 0 30px; + padding: 0; +} + +ul, ol { + /* Matches the 30px from the narrow-screen "li > ul" selector below */ + margin: 10px 0 10px 30px; + padding: 0; +} + +pre { + background: #EEE; + padding: 7px 30px; + margin: 15px 0px; + line-height: 1.3em; +} + +div.viewcode-block:target { + background: #ffd; +} + +dl pre, blockquote pre, li pre { + margin-left: 0; + padding-left: 30px; +} + +tt, code { + background-color: #ecf0f3; + color: #222; + /* padding: 1px 2px; */ +} + +tt.xref, code.xref, a tt { + background-color: #FBFBFB; + border-bottom: 1px solid #fff; +} + +a.reference { + text-decoration: none; + border-bottom: 1px dotted #004B6B; +} + +/* Don't put an underline on images */ +a.image-reference, a.image-reference:hover { + border-bottom: none; +} + +a.reference:hover { + border-bottom: 1px solid #6D4100; +} + +a.footnote-reference { + text-decoration: none; + font-size: 0.7em; + vertical-align: top; + border-bottom: 1px dotted #004B6B; +} + +a.footnote-reference:hover { + border-bottom: 1px solid #6D4100; +} + +a:hover tt, a:hover code { + background: #EEE; +} + + +@media screen and (max-width: 870px) { + + div.sphinxsidebar { + display: none; + } + + div.document { + width: 100%; + + } + + div.documentwrapper { + margin-left: 0; + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + } + + div.bodywrapper { + margin-top: 0; + margin-right: 0; + margin-bottom: 0; + margin-left: 0; + } + + ul { + margin-left: 0; + } + + li > ul { + /* Matches the 30px from the "ul, ol" selector above */ + margin-left: 30px; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .bodywrapper { + margin: 0; + } + + .footer { + width: auto; + } + + .github { + display: none; + } + + + +} + + + +@media screen and (max-width: 875px) { + + body { + margin: 0; + padding: 20px 30px; + } + + div.documentwrapper { + float: none; + background: #fff; + } + + div.sphinxsidebar { + display: block; + float: none; + width: 102.5%; + margin: 50px -30px -20px 
-30px; + padding: 10px 20px; + background: #333; + color: #FFF; + } + + div.sphinxsidebar h3, div.sphinxsidebar h4, div.sphinxsidebar p, + div.sphinxsidebar h3 a { + color: #fff; + } + + div.sphinxsidebar a { + color: #AAA; + } + + div.sphinxsidebar p.logo { + display: none; + } + + div.document { + width: 100%; + margin: 0; + } + + div.footer { + display: none; + } + + div.bodywrapper { + margin: 0; + } + + div.body { + min-height: 0; + padding: 0; + } + + .rtd_doc_footer { + display: none; + } + + .document { + width: auto; + } + + .footer { + width: auto; + } + + .footer { + width: auto; + } + + .github { + display: none; + } +} + + +/* misc. */ + +.revsys-inline { + display: none!important; +} + +/* Hide ugly table cell borders in ..bibliography:: directive output */ +table.docutils.citation, table.docutils.citation td, table.docutils.citation th { + border: none; + /* Below needed in some edge cases; if not applied, bottom shadows appear */ + -moz-box-shadow: none; + -webkit-box-shadow: none; + box-shadow: none; +} + + +/* relbar */ + +.related { + line-height: 30px; + width: 100%; + font-size: 0.9rem; +} + +.related.top { + border-bottom: 1px solid #EEE; + margin-bottom: 20px; +} + +.related.bottom { + border-top: 1px solid #EEE; +} + +.related ul { + padding: 0; + margin: 0; + list-style: none; +} + +.related li { + display: inline; +} + +nav#rellinks { + float: right; +} + +nav#rellinks li+li:before { + content: "|"; +} + +nav#breadcrumbs li+li:before { + content: "\00BB"; +} + +/* Hide certain items when printing */ +@media print { + div.related { + display: none; + } +} \ No newline at end of file diff --git a/main/_static/basic.css b/main/_static/basic.css new file mode 100644 index 000000000..e5179b7a9 --- /dev/null +++ b/main/_static/basic.css @@ -0,0 +1,925 @@ +/* + * basic.css + * ~~~~~~~~~ + * + * Sphinx stylesheet -- basic theme. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. 
+ * + */ + +/* -- main layout ----------------------------------------------------------- */ + +div.clearer { + clear: both; +} + +div.section::after { + display: block; + content: ''; + clear: left; +} + +/* -- relbar ---------------------------------------------------------------- */ + +div.related { + width: 100%; + font-size: 90%; +} + +div.related h3 { + display: none; +} + +div.related ul { + margin: 0; + padding: 0 0 0 10px; + list-style: none; +} + +div.related li { + display: inline; +} + +div.related li.right { + float: right; + margin-right: 5px; +} + +/* -- sidebar --------------------------------------------------------------- */ + +div.sphinxsidebarwrapper { + padding: 10px 5px 0 10px; +} + +div.sphinxsidebar { + float: left; + width: 230px; + margin-left: -100%; + font-size: 90%; + word-wrap: break-word; + overflow-wrap : break-word; +} + +div.sphinxsidebar ul { + list-style: none; +} + +div.sphinxsidebar ul ul, +div.sphinxsidebar ul.want-points { + margin-left: 20px; + list-style: square; +} + +div.sphinxsidebar ul ul { + margin-top: 0; + margin-bottom: 0; +} + +div.sphinxsidebar form { + margin-top: 10px; +} + +div.sphinxsidebar input { + border: 1px solid #98dbcc; + font-family: sans-serif; + font-size: 1em; +} + +div.sphinxsidebar #searchbox form.search { + overflow: hidden; +} + +div.sphinxsidebar #searchbox input[type="text"] { + float: left; + width: 80%; + padding: 0.25em; + box-sizing: border-box; +} + +div.sphinxsidebar #searchbox input[type="submit"] { + float: left; + width: 20%; + border-left: none; + padding: 0.25em; + box-sizing: border-box; +} + + +img { + border: 0; + max-width: 100%; +} + +/* -- search page ----------------------------------------------------------- */ + +ul.search { + margin: 10px 0 0 20px; + padding: 0; +} + +ul.search li { + padding: 5px 0 5px 20px; + background-image: url(file.png); + background-repeat: no-repeat; + background-position: 0 7px; +} + +ul.search li a { + font-weight: bold; +} + +ul.search li p.context { + color: #888; + margin: 2px 0 0 30px; + text-align: left; +} + +ul.keywordmatches li.goodmatch a { + font-weight: bold; +} + +/* -- index page ------------------------------------------------------------ */ + +table.contentstable { + width: 90%; + margin-left: auto; + margin-right: auto; +} + +table.contentstable p.biglink { + line-height: 150%; +} + +a.biglink { + font-size: 1.3em; +} + +span.linkdescr { + font-style: italic; + padding-top: 5px; + font-size: 90%; +} + +/* -- general index --------------------------------------------------------- */ + +table.indextable { + width: 100%; +} + +table.indextable td { + text-align: left; + vertical-align: top; +} + +table.indextable ul { + margin-top: 0; + margin-bottom: 0; + list-style-type: none; +} + +table.indextable > tbody > tr > td > ul { + padding-left: 0em; +} + +table.indextable tr.pcap { + height: 10px; +} + +table.indextable tr.cap { + margin-top: 10px; + background-color: #f2f2f2; +} + +img.toggler { + margin-right: 3px; + margin-top: 3px; + cursor: pointer; +} + +div.modindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +div.genindex-jumpbox { + border-top: 1px solid #ddd; + border-bottom: 1px solid #ddd; + margin: 1em 0 1em 0; + padding: 0.4em; +} + +/* -- domain module index --------------------------------------------------- */ + +table.modindextable td { + padding: 2px; + border-collapse: collapse; +} + +/* -- general body styles --------------------------------------------------- */ + 
+div.body { + min-width: inherit; + max-width: 800px; +} + +div.body p, div.body dd, div.body li, div.body blockquote { + -moz-hyphens: auto; + -ms-hyphens: auto; + -webkit-hyphens: auto; + hyphens: auto; +} + +a.headerlink { + visibility: hidden; +} + +a:visited { + color: #551A8B; +} + +h1:hover > a.headerlink, +h2:hover > a.headerlink, +h3:hover > a.headerlink, +h4:hover > a.headerlink, +h5:hover > a.headerlink, +h6:hover > a.headerlink, +dt:hover > a.headerlink, +caption:hover > a.headerlink, +p.caption:hover > a.headerlink, +div.code-block-caption:hover > a.headerlink { + visibility: visible; +} + +div.body p.caption { + text-align: inherit; +} + +div.body td { + text-align: left; +} + +.first { + margin-top: 0 !important; +} + +p.rubric { + margin-top: 30px; + font-weight: bold; +} + +img.align-left, figure.align-left, .figure.align-left, object.align-left { + clear: left; + float: left; + margin-right: 1em; +} + +img.align-right, figure.align-right, .figure.align-right, object.align-right { + clear: right; + float: right; + margin-left: 1em; +} + +img.align-center, figure.align-center, .figure.align-center, object.align-center { + display: block; + margin-left: auto; + margin-right: auto; +} + +img.align-default, figure.align-default, .figure.align-default { + display: block; + margin-left: auto; + margin-right: auto; +} + +.align-left { + text-align: left; +} + +.align-center { + text-align: center; +} + +.align-default { + text-align: center; +} + +.align-right { + text-align: right; +} + +/* -- sidebars -------------------------------------------------------------- */ + +div.sidebar, +aside.sidebar { + margin: 0 0 0.5em 1em; + border: 1px solid #ddb; + padding: 7px; + background-color: #ffe; + width: 40%; + float: right; + clear: right; + overflow-x: auto; +} + +p.sidebar-title { + font-weight: bold; +} + +nav.contents, +aside.topic, +div.admonition, div.topic, blockquote { + clear: left; +} + +/* -- topics ---------------------------------------------------------------- */ + +nav.contents, +aside.topic, +div.topic { + border: 1px solid #ccc; + padding: 7px; + margin: 10px 0 10px 0; +} + +p.topic-title { + font-size: 1.1em; + font-weight: bold; + margin-top: 10px; +} + +/* -- admonitions ----------------------------------------------------------- */ + +div.admonition { + margin-top: 10px; + margin-bottom: 10px; + padding: 7px; +} + +div.admonition dt { + font-weight: bold; +} + +p.admonition-title { + margin: 0px 10px 5px 0px; + font-weight: bold; +} + +div.body p.centered { + text-align: center; + margin-top: 25px; +} + +/* -- content of sidebars/topics/admonitions -------------------------------- */ + +div.sidebar > :last-child, +aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, +div.topic > :last-child, +div.admonition > :last-child { + margin-bottom: 0; +} + +div.sidebar::after, +aside.sidebar::after, +nav.contents::after, +aside.topic::after, +div.topic::after, +div.admonition::after, +blockquote::after { + display: block; + content: ''; + clear: both; +} + +/* -- tables ---------------------------------------------------------------- */ + +table.docutils { + margin-top: 10px; + margin-bottom: 10px; + border: 0; + border-collapse: collapse; +} + +table.align-center { + margin-left: auto; + margin-right: auto; +} + +table.align-default { + margin-left: auto; + margin-right: auto; +} + +table caption span.caption-number { + font-style: italic; +} + +table caption span.caption-text { +} + +table.docutils td, table.docutils th { + padding: 1px 
8px 1px 5px; + border-top: 0; + border-left: 0; + border-right: 0; + border-bottom: 1px solid #aaa; +} + +th { + text-align: left; + padding-right: 5px; +} + +table.citation { + border-left: solid 1px gray; + margin-left: 1px; +} + +table.citation td { + border-bottom: none; +} + +th > :first-child, +td > :first-child { + margin-top: 0px; +} + +th > :last-child, +td > :last-child { + margin-bottom: 0px; +} + +/* -- figures --------------------------------------------------------------- */ + +div.figure, figure { + margin: 0.5em; + padding: 0.5em; +} + +div.figure p.caption, figcaption { + padding: 0.3em; +} + +div.figure p.caption span.caption-number, +figcaption span.caption-number { + font-style: italic; +} + +div.figure p.caption span.caption-text, +figcaption span.caption-text { +} + +/* -- field list styles ----------------------------------------------------- */ + +table.field-list td, table.field-list th { + border: 0 !important; +} + +.field-list ul { + margin: 0; + padding-left: 1em; +} + +.field-list p { + margin: 0; +} + +.field-name { + -moz-hyphens: manual; + -ms-hyphens: manual; + -webkit-hyphens: manual; + hyphens: manual; +} + +/* -- hlist styles ---------------------------------------------------------- */ + +table.hlist { + margin: 1em 0; +} + +table.hlist td { + vertical-align: top; +} + +/* -- object description styles --------------------------------------------- */ + +.sig { + font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; +} + +.sig-name, code.descname { + background-color: transparent; + font-weight: bold; +} + +.sig-name { + font-size: 1.1em; +} + +code.descname { + font-size: 1.2em; +} + +.sig-prename, code.descclassname { + background-color: transparent; +} + +.optional { + font-size: 1.3em; +} + +.sig-paren { + font-size: larger; +} + +.sig-param.n { + font-style: italic; +} + +/* C++ specific styling */ + +.sig-inline.c-texpr, +.sig-inline.cpp-texpr { + font-family: unset; +} + +.sig.c .k, .sig.c .kt, +.sig.cpp .k, .sig.cpp .kt { + color: #0033B3; +} + +.sig.c .m, +.sig.cpp .m { + color: #1750EB; +} + +.sig.c .s, .sig.c .sc, +.sig.cpp .s, .sig.cpp .sc { + color: #067D17; +} + + +/* -- other body styles ----------------------------------------------------- */ + +ol.arabic { + list-style: decimal; +} + +ol.loweralpha { + list-style: lower-alpha; +} + +ol.upperalpha { + list-style: upper-alpha; +} + +ol.lowerroman { + list-style: lower-roman; +} + +ol.upperroman { + list-style: upper-roman; +} + +:not(li) > ol > li:first-child > :first-child, +:not(li) > ul > li:first-child > :first-child { + margin-top: 0px; +} + +:not(li) > ol > li:last-child > :last-child, +:not(li) > ul > li:last-child > :last-child { + margin-bottom: 0px; +} + +ol.simple ol p, +ol.simple ul p, +ul.simple ol p, +ul.simple ul p { + margin-top: 0; +} + +ol.simple > li:not(:first-child) > p, +ul.simple > li:not(:first-child) > p { + margin-top: 0; +} + +ol.simple p, +ul.simple p { + margin-bottom: 0; +} + +aside.footnote > span, +div.citation > span { + float: left; +} +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { + margin-bottom: 0em; +} +aside.footnote > p:last-of-type:after, +div.citation > p:last-of-type:after { + content: ""; + clear: both; +} + +dl.field-list { + display: grid; + grid-template-columns: fit-content(30%) auto; +} + 
+dl.field-list > dt { + font-weight: bold; + word-break: break-word; + padding-left: 0.5em; + padding-right: 5px; +} + +dl.field-list > dd { + padding-left: 0.5em; + margin-top: 0em; + margin-left: 0em; + margin-bottom: 0em; +} + +dl { + margin-bottom: 15px; +} + +dd > :first-child { + margin-top: 0px; +} + +dd ul, dd table { + margin-bottom: 10px; +} + +dd { + margin-top: 3px; + margin-bottom: 10px; + margin-left: 30px; +} + +.sig dd { + margin-top: 0px; + margin-bottom: 0px; +} + +.sig dl { + margin-top: 0px; + margin-bottom: 0px; +} + +dl > dd:last-child, +dl > dd:last-child > :last-child { + margin-bottom: 0; +} + +dt:target, span.highlighted { + background-color: #fbe54e; +} + +rect.highlighted { + fill: #fbe54e; +} + +dl.glossary dt { + font-weight: bold; + font-size: 1.1em; +} + +.versionmodified { + font-style: italic; +} + +.system-message { + background-color: #fda; + padding: 5px; + border: 3px solid red; +} + +.footnote:target { + background-color: #ffa; +} + +.line-block { + display: block; + margin-top: 1em; + margin-bottom: 1em; +} + +.line-block .line-block { + margin-top: 0; + margin-bottom: 0; + margin-left: 1.5em; +} + +.guilabel, .menuselection { + font-family: sans-serif; +} + +.accelerator { + text-decoration: underline; +} + +.classifier { + font-style: oblique; +} + +.classifier:before { + font-style: normal; + margin: 0 0.5em; + content: ":"; + display: inline-block; +} + +abbr, acronym { + border-bottom: dotted 1px; + cursor: help; +} + +.translated { + background-color: rgba(207, 255, 207, 0.2) +} + +.untranslated { + background-color: rgba(255, 207, 207, 0.2) +} + +/* -- code displays --------------------------------------------------------- */ + +pre { + overflow: auto; + overflow-y: hidden; /* fixes display issues on Chrome browsers */ +} + +pre, div[class*="highlight-"] { + clear: both; +} + +span.pre { + -moz-hyphens: none; + -ms-hyphens: none; + -webkit-hyphens: none; + hyphens: none; + white-space: nowrap; +} + +div[class*="highlight-"] { + margin: 1em 0; +} + +td.linenos pre { + border: 0; + background-color: transparent; + color: #aaa; +} + +table.highlighttable { + display: block; +} + +table.highlighttable tbody { + display: block; +} + +table.highlighttable tr { + display: flex; +} + +table.highlighttable td { + margin: 0; + padding: 0; +} + +table.highlighttable td.linenos { + padding-right: 0.5em; +} + +table.highlighttable td.code { + flex: 1; + overflow: hidden; +} + +.highlight .hll { + display: block; +} + +div.highlight pre, +table.highlighttable pre { + margin: 0; +} + +div.code-block-caption + div { + margin-top: 0; +} + +div.code-block-caption { + margin-top: 1em; + padding: 2px 5px; + font-size: small; +} + +div.code-block-caption code { + background-color: transparent; +} + +table.highlighttable td.linenos, +span.linenos, +div.highlight span.gp { /* gp: Generic.Prompt */ + user-select: none; + -webkit-user-select: text; /* Safari fallback only */ + -webkit-user-select: none; /* Chrome/Safari */ + -moz-user-select: none; /* Firefox */ + -ms-user-select: none; /* IE10+ */ +} + +div.code-block-caption span.caption-number { + padding: 0.1em 0.3em; + font-style: italic; +} + +div.code-block-caption span.caption-text { +} + +div.literal-block-wrapper { + margin: 1em 0; +} + +code.xref, a code { + background-color: transparent; + font-weight: bold; +} + +h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { + background-color: transparent; +} + +.viewcode-link { + float: right; +} + +.viewcode-back { + float: right; + font-family: sans-serif; +} 
+ +div.viewcode-block:target { + margin: -1px -10px; + padding: 0 10px; +} + +/* -- math display ---------------------------------------------------------- */ + +img.math { + vertical-align: middle; +} + +div.body div.math p { + text-align: center; +} + +span.eqno { + float: right; +} + +span.eqno a.headerlink { + position: absolute; + z-index: 1; +} + +div.math:hover a.headerlink { + visibility: visible; +} + +/* -- printout stylesheet --------------------------------------------------- */ + +@media print { + div.document, + div.documentwrapper, + div.bodywrapper { + margin: 0 !important; + width: 100%; + } + + div.sphinxsidebar, + div.related, + div.footer, + #top-link { + display: none; + } +} \ No newline at end of file diff --git a/main/_static/blla_heatmap.jpg b/main/_static/blla_heatmap.jpg new file mode 100644 index 000000000..3f3381096 Binary files /dev/null and b/main/_static/blla_heatmap.jpg differ diff --git a/main/_static/blla_output.jpg b/main/_static/blla_output.jpg new file mode 100644 index 000000000..72c652fa4 Binary files /dev/null and b/main/_static/blla_output.jpg differ diff --git a/main/_static/bw.png b/main/_static/bw.png new file mode 100644 index 000000000..e7e5054eb Binary files /dev/null and b/main/_static/bw.png differ diff --git a/main/_static/custom.css b/main/_static/custom.css new file mode 100644 index 000000000..c41f90af5 --- /dev/null +++ b/main/_static/custom.css @@ -0,0 +1,24 @@ +pre { + white-space: pre-wrap; +} +svg { + width: 100%; +} +.highlight .err { + border: inherit; + box-sizing: inherit; +} + +div.leftside { + width: 110px; + padding: 0px 3px 0px 0px; + float: left; +} + +div.rightside { + margin-left: 125px; +} + +dl.py { + margin-top: 25px; +} diff --git a/main/_static/doctools.js b/main/_static/doctools.js new file mode 100644 index 000000000..4d67807d1 --- /dev/null +++ b/main/_static/doctools.js @@ -0,0 +1,156 @@ +/* + * doctools.js + * ~~~~~~~~~~~ + * + * Base JavaScript utilities for all Sphinx HTML documentation. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); + } +}; + +/** + * Small JavaScript module for the documentation. + */ +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); + }, + + /** + * i18n support + */ + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", + + // gettext and ngettext don't access this so that the functions + // can safely bound to a different name (_ = Documentation.gettext) + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists + } + }, + + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; + }, + + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; + }, + + /** + * helper function to focus on search bar + */ + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); + }, + + /** + * Initialise the domain index toggle buttons + */ + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; + + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) => (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); + }, + + initOnKeyListeners: () => { + // only install a listener if it is really needed + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { + switch (event.key) { + case "ArrowLeft": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; + } + } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } + }); + }, +}; + +// quick alias for translations +const _ = Documentation.gettext; + +_ready(Documentation.init); diff --git a/main/_static/documentation_options.js b/main/_static/documentation_options.js new file mode 100644 index 000000000..7e4c114f2 --- /dev/null +++ b/main/_static/documentation_options.js @@ -0,0 +1,13 @@ +const DOCUMENTATION_OPTIONS = { + VERSION: '', + LANGUAGE: 'en', + COLLAPSE_INDEX: false, + BUILDER: 'html', + FILE_SUFFIX: '.html', + LINK_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt', + NAVIGATION_WITH_KEYS: false, + SHOW_SEARCH_SUMMARY: true, + ENABLE_SEARCH_SHORTCUTS: true, +}; \ No newline at end of file diff --git a/main/_static/file.png b/main/_static/file.png new file mode 100644 index 000000000..a858a410e Binary files /dev/null and b/main/_static/file.png differ diff --git a/main/_static/graphviz.css b/main/_static/graphviz.css new file mode 100644 index 000000000..027576e34 --- /dev/null +++ b/main/_static/graphviz.css @@ -0,0 +1,19 @@ +/* + * graphviz.css + * ~~~~~~~~~~~~ + * + * 
Sphinx stylesheet -- graphviz extension. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +img.graphviz { + border: 0; + max-width: 100%; +} + +object.graphviz { + max-width: 100%; +} diff --git a/main/_static/kraken.png b/main/_static/kraken.png new file mode 100644 index 000000000..8f25dd8be Binary files /dev/null and b/main/_static/kraken.png differ diff --git a/main/_static/kraken_recognition.svg b/main/_static/kraken_recognition.svg new file mode 100644 index 000000000..129b2c67a --- /dev/null +++ b/main/_static/kraken_recognition.svg @@ -0,0 +1,948 @@ + + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... + + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + diff --git a/main/_static/kraken_segmentation.svg b/main/_static/kraken_segmentation.svg new file mode 100644 index 000000000..4b9c860ce --- /dev/null +++ b/main/_static/kraken_segmentation.svg @@ -0,0 +1,1161 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/main/_static/kraken_segmodel.svg b/main/_static/kraken_segmodel.svg new file mode 100644 index 000000000..e722a9707 --- /dev/null +++ b/main/_static/kraken_segmodel.svg @@ -0,0 +1,250 @@ + + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + diff --git a/main/_static/kraken_torchseqrecognizer.svg b/main/_static/kraken_torchseqrecognizer.svg new file mode 100644 index 000000000..c9a2f1135 --- /dev/null +++ b/main/_static/kraken_torchseqrecognizer.svg @@ -0,0 +1,239 @@ + + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + diff --git a/main/_static/kraken_workflow.svg b/main/_static/kraken_workflow.svg new file mode 100644 index 000000000..5a50b51d6 --- /dev/null +++ b/main/_static/kraken_workflow.svg @@ -0,0 +1,753 @@ + + + + + + + + + + + + + + + Segmentation + + + + + + + + + + + 
Recognition + + + + + + + + + + + Serialization + + + + + + + + + + + + + + + + + + + + + + Recognition Model + + + + + + + + + + + + + + + + + + + + + + Segmentation Model + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OCR Records + + + + + + + + + + + + + + + + + + Baselines, + Regions, + and Order + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output File + + + + + + + + + + + + + + + + + + Output Template + + + + + + + + + + + + + + + + + + Image + + diff --git a/main/_static/language_data.js b/main/_static/language_data.js new file mode 100644 index 000000000..367b8ed81 --- /dev/null +++ b/main/_static/language_data.js @@ -0,0 +1,199 @@ +/* + * language_data.js + * ~~~~~~~~~~~~~~~~ + * + * This script contains the language-specific data used by searchtools.js, + * namely the list of stopwords, stemmer, scorer and splitter. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ + +var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]; + + +/* Non-minified version is copied as a separate JS file, if available */ + +/** + * Porter Stemmer + */ +var Stemmer = function() { + + var step2list = { + ational: 'ate', + tional: 'tion', + enci: 'ence', + anci: 'ance', + izer: 'ize', + bli: 'ble', + alli: 'al', + entli: 'ent', + eli: 'e', + ousli: 'ous', + ization: 'ize', + ation: 'ate', + ator: 'ate', + alism: 'al', + iveness: 'ive', + fulness: 'ful', + ousness: 'ous', + aliti: 'al', + iviti: 'ive', + biliti: 'ble', + logi: 'log' + }; + + var step3list = { + icate: 'ic', + ative: '', + alize: 'al', + iciti: 'ic', + ical: 'ic', + ful: '', + ness: '' + }; + + var c = "[^aeiou]"; // consonant + var v = "[aeiouy]"; // vowel + var C = c + "[^aeiouy]*"; // consonant sequence + var V = v + "[aeiou]*"; // vowel sequence + + var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0 + var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1 + var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1 + var s_v = "^(" + C + ")?" 
+ v; // vowel in stem + + this.stemWord = function (w) { + var stem; + var suffix; + var firstch; + var origword = w; + + if (w.length < 3) + return w; + + var re; + var re2; + var re3; + var re4; + + firstch = w.substr(0,1); + if (firstch == "y") + w = firstch.toUpperCase() + w.substr(1); + + // Step 1a + re = /^(.+?)(ss|i)es$/; + re2 = /^(.+?)([^s])s$/; + + if (re.test(w)) + w = w.replace(re,"$1$2"); + else if (re2.test(w)) + w = w.replace(re2,"$1$2"); + + // Step 1b + re = /^(.+?)eed$/; + re2 = /^(.+?)(ed|ing)$/; + if (re.test(w)) { + var fp = re.exec(w); + re = new RegExp(mgr0); + if (re.test(fp[1])) { + re = /.$/; + w = w.replace(re,""); + } + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = new RegExp(s_v); + if (re2.test(stem)) { + w = stem; + re2 = /(at|bl|iz)$/; + re3 = new RegExp("([^aeiouylsz])\\1$"); + re4 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re2.test(w)) + w = w + "e"; + else if (re3.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + else if (re4.test(w)) + w = w + "e"; + } + } + + // Step 1c + re = /^(.+?)y$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(s_v); + if (re.test(stem)) + w = stem + "i"; + } + + // Step 2 + re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step2list[suffix]; + } + + // Step 3 + re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = new RegExp(mgr0); + if (re.test(stem)) + w = stem + step3list[suffix]; + } + + // Step 4 + re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + re2 = /^(.+?)(s|t)(ion)$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + if (re.test(stem)) + w = stem; + } + else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = new RegExp(mgr1); + if (re2.test(stem)) + w = stem; + } + + // Step 5 + re = /^(.+?)e$/; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = new RegExp(mgr1); + re2 = new RegExp(meq1); + re3 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) + w = stem; + } + re = /ll$/; + re2 = new RegExp(mgr1); + if (re.test(w) && re2.test(w)) { + re = /.$/; + w = w.replace(re,""); + } + + // and turn initial Y back to y + if (firstch == "y") + w = firstch.toLowerCase() + w.substr(1); + return w; + } +} + diff --git a/main/_static/minus.png b/main/_static/minus.png new file mode 100644 index 000000000..d96755fda Binary files /dev/null and b/main/_static/minus.png differ diff --git a/main/_static/normal-reproduction-low-resolution.jpg b/main/_static/normal-reproduction-low-resolution.jpg new file mode 100644 index 000000000..673be92ae Binary files /dev/null and b/main/_static/normal-reproduction-low-resolution.jpg differ diff --git a/main/_static/pat.png b/main/_static/pat.png new file mode 100644 index 000000000..52f7a8995 Binary files /dev/null and b/main/_static/pat.png differ diff --git a/main/_static/plus.png b/main/_static/plus.png new file mode 100644 index 000000000..7107cec93 Binary files /dev/null and b/main/_static/plus.png differ diff --git a/main/_static/pygments.css b/main/_static/pygments.css new file mode 100644 index 000000000..0d49244ed --- /dev/null +++ 
b/main/_static/pygments.css @@ -0,0 +1,75 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #eeffcc; } +.highlight .c { color: #408090; font-style: italic } /* Comment */ +.highlight .err { border: 1px solid #FF0000 } /* Error */ +.highlight .k { color: #007020; font-weight: bold } /* Keyword */ +.highlight .o { color: #666666 } /* Operator */ +.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #007020 } /* Comment.Preproc */ +.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ +.highlight .gd { color: #A00000 } /* Generic.Deleted */ +.highlight .ge { font-style: italic } /* Generic.Emph */ +.highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #FF0000 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #333333 } /* Generic.Output */ +.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ +.highlight .gs { font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #0044DD } /* Generic.Traceback */ +.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #007020 } /* Keyword.Pseudo */ +.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #902000 } /* Keyword.Type */ +.highlight .m { color: #208050 } /* Literal.Number */ +.highlight .s { color: #4070a0 } /* Literal.String */ +.highlight .na { color: #4070a0 } /* Name.Attribute */ +.highlight .nb { color: #007020 } /* Name.Builtin */ +.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ +.highlight .no { color: #60add5 } /* Name.Constant */ +.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ +.highlight .ne { color: #007020 } /* Name.Exception */ +.highlight .nf { color: #06287e } /* Name.Function */ +.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ +.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ +.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #bb60d5 } /* Name.Variable */ +.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ +.highlight .w { color: #bbbbbb } /* Text.Whitespace */ +.highlight .mb { color: #208050 } /* Literal.Number.Bin */ +.highlight .mf { color: #208050 } /* Literal.Number.Float */ 
+.highlight .mh { color: #208050 } /* Literal.Number.Hex */ +.highlight .mi { color: #208050 } /* Literal.Number.Integer */ +.highlight .mo { color: #208050 } /* Literal.Number.Oct */ +.highlight .sa { color: #4070a0 } /* Literal.String.Affix */ +.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ +.highlight .sc { color: #4070a0 } /* Literal.String.Char */ +.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ +.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4070a0 } /* Literal.String.Double */ +.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ +.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ +.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ +.highlight .sx { color: #c65d09 } /* Literal.String.Other */ +.highlight .sr { color: #235388 } /* Literal.String.Regex */ +.highlight .s1 { color: #4070a0 } /* Literal.String.Single */ +.highlight .ss { color: #517918 } /* Literal.String.Symbol */ +.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #06287e } /* Name.Function.Magic */ +.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ +.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ +.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ +.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ +.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/main/_static/searchtools.js b/main/_static/searchtools.js new file mode 100644 index 000000000..b08d58c9b --- /dev/null +++ b/main/_static/searchtools.js @@ -0,0 +1,620 @@ +/* + * searchtools.js + * ~~~~~~~~~~~~~~~~ + * + * Sphinx JavaScript utilities for the full-text search. + * + * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS. + * :license: BSD, see LICENSE for details. + * + */ +"use strict"; + +/** + * Simple result scoring code. + */ +if (typeof Scorer === "undefined") { + var Scorer = { + // Implement the following function to further tweak the score for each result + // The function takes a result array [docname, title, anchor, descr, score, filename] + // and returns the new score. + /* + score: result => { + const [docname, title, anchor, descr, score, filename] = result + return score + }, + */ + + // query matches the full name of an object + objNameMatch: 11, + // or matches in the last dotted part of the object name + objPartialMatch: 6, + // Additive scores depending on the priority of the object + objPrio: { + 0: 15, // used to be importantResults + 1: 5, // used to be objectResults + 2: -5, // used to be unimportantResults + }, + // Used when the priority is not in the mapping. 
+ objPrioDefault: 0, + + // query found in title + title: 15, + partialTitle: 7, + // query found in terms + term: 5, + partialTerm: 2, + }; +} + +const _removeChildren = (element) => { + while (element && element.lastChild) element.removeChild(element.lastChild); +}; + +/** + * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping + */ +const _escapeRegExp = (string) => + string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string + +const _displayItem = (item, searchTerms, highlightTerms) => { + const docBuilder = DOCUMENTATION_OPTIONS.BUILDER; + const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX; + const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX; + const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY; + const contentRoot = document.documentElement.dataset.content_root; + + const [docName, title, anchor, descr, score, _filename] = item; + + let listItem = document.createElement("li"); + let requestUrl; + let linkUrl; + if (docBuilder === "dirhtml") { + // dirhtml builder + let dirname = docName + "/"; + if (dirname.match(/\/index\/$/)) + dirname = dirname.substring(0, dirname.length - 6); + else if (dirname === "index/") dirname = ""; + requestUrl = contentRoot + dirname; + linkUrl = requestUrl; + } else { + // normal html builders + requestUrl = contentRoot + docName + docFileSuffix; + linkUrl = docName + docLinkSuffix; + } + let linkEl = listItem.appendChild(document.createElement("a")); + linkEl.href = linkUrl + anchor; + linkEl.dataset.score = score; + linkEl.innerHTML = title; + if (descr) { + listItem.appendChild(document.createElement("span")).innerHTML = + " (" + descr + ")"; + // highlight search terms in the description + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + } + else if (showSearchSummary) + fetch(requestUrl) + .then((responseData) => responseData.text()) + .then((data) => { + if (data) + listItem.appendChild( + Search.makeSearchSummary(data, searchTerms, anchor) + ); + // highlight search terms in the summary + if (SPHINX_HIGHLIGHT_ENABLED) // set in sphinx_highlight.js + highlightTerms.forEach((term) => _highlightText(listItem, term, "highlighted")); + }); + Search.output.appendChild(listItem); +}; +const _finishSearch = (resultCount) => { + Search.stopPulse(); + Search.title.innerText = _("Search Results"); + if (!resultCount) + Search.status.innerText = Documentation.gettext( + "Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories." + ); + else + Search.status.innerText = _( + "Search finished, found ${resultCount} page(s) matching the search query." + ).replace('${resultCount}', resultCount); +}; +const _displayNextItem = ( + results, + resultCount, + searchTerms, + highlightTerms, +) => { + // results left, load the summary and display it + // this is intended to be dynamic (don't sub resultsCount) + if (results.length) { + _displayItem(results.pop(), searchTerms, highlightTerms); + setTimeout( + () => _displayNextItem(results, resultCount, searchTerms, highlightTerms), + 5 + ); + } + // search finished, update title and status message + else _finishSearch(resultCount); +}; +// Helper function used by query() to order search results. +// Each input is an array of [docname, title, anchor, descr, score, filename]. 
+// Order the results by score (in opposite order of appearance, since the +// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically. +const _orderResultsByScoreThenName = (a, b) => { + const leftScore = a[4]; + const rightScore = b[4]; + if (leftScore === rightScore) { + // same score: sort alphabetically + const leftTitle = a[1].toLowerCase(); + const rightTitle = b[1].toLowerCase(); + if (leftTitle === rightTitle) return 0; + return leftTitle > rightTitle ? -1 : 1; // inverted is intentional + } + return leftScore > rightScore ? 1 : -1; +}; + +/** + * Default splitQuery function. Can be overridden in ``sphinx.search`` with a + * custom function per language. + * + * The regular expression works by splitting the string on consecutive characters + * that are not Unicode letters, numbers, underscores, or emoji characters. + * This is the same as ``\W+`` in Python, preserving the surrogate pair area. + */ +if (typeof splitQuery === "undefined") { + var splitQuery = (query) => query + .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu) + .filter(term => term) // remove remaining empty strings +} + +/** + * Search Module + */ +const Search = { + _index: null, + _queued_query: null, + _pulse_status: -1, + + htmlToText: (htmlString, anchor) => { + const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html'); + for (const removalQuery of [".headerlink", "script", "style"]) { + htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() }); + } + if (anchor) { + const anchorContent = htmlElement.querySelector(`[role="main"] ${anchor}`); + if (anchorContent) return anchorContent.textContent; + + console.warn( + `Anchored content block not found. Sphinx search tries to obtain it via DOM query '[role=main] ${anchor}'. Check your theme or template.` + ); + } + + // if anchor not specified or not found, fall back to main content + const docContent = htmlElement.querySelector('[role="main"]'); + if (docContent) return docContent.textContent; + + console.warn( + "Content block not found. Sphinx search tries to obtain it via DOM query '[role=main]'. Check your theme or template." 
+ ); + return ""; + }, + + init: () => { + const query = new URLSearchParams(window.location.search).get("q"); + document + .querySelectorAll('input[name="q"]') + .forEach((el) => (el.value = query)); + if (query) Search.performSearch(query); + }, + + loadIndex: (url) => + (document.body.appendChild(document.createElement("script")).src = url), + + setIndex: (index) => { + Search._index = index; + if (Search._queued_query !== null) { + const query = Search._queued_query; + Search._queued_query = null; + Search.query(query); + } + }, + + hasIndex: () => Search._index !== null, + + deferQuery: (query) => (Search._queued_query = query), + + stopPulse: () => (Search._pulse_status = -1), + + startPulse: () => { + if (Search._pulse_status >= 0) return; + + const pulse = () => { + Search._pulse_status = (Search._pulse_status + 1) % 4; + Search.dots.innerText = ".".repeat(Search._pulse_status); + if (Search._pulse_status >= 0) window.setTimeout(pulse, 500); + }; + pulse(); + }, + + /** + * perform a search for something (or wait until index is loaded) + */ + performSearch: (query) => { + // create the required interface elements + const searchText = document.createElement("h2"); + searchText.textContent = _("Searching"); + const searchSummary = document.createElement("p"); + searchSummary.classList.add("search-summary"); + searchSummary.innerText = ""; + const searchList = document.createElement("ul"); + searchList.classList.add("search"); + + const out = document.getElementById("search-results"); + Search.title = out.appendChild(searchText); + Search.dots = Search.title.appendChild(document.createElement("span")); + Search.status = out.appendChild(searchSummary); + Search.output = out.appendChild(searchList); + + const searchProgress = document.getElementById("search-progress"); + // Some themes don't use the search progress node + if (searchProgress) { + searchProgress.innerText = _("Preparing search..."); + } + Search.startPulse(); + + // index already loaded, the browser was quick! 
+ if (Search.hasIndex()) Search.query(query); + else Search.deferQuery(query); + }, + + _parseQuery: (query) => { + // stem the search terms and add them to the correct list + const stemmer = new Stemmer(); + const searchTerms = new Set(); + const excludedTerms = new Set(); + const highlightTerms = new Set(); + const objectTerms = new Set(splitQuery(query.toLowerCase().trim())); + splitQuery(query.trim()).forEach((queryTerm) => { + const queryTermLower = queryTerm.toLowerCase(); + + // maybe skip this "word" + // stopwords array is from language_data.js + if ( + stopwords.indexOf(queryTermLower) !== -1 || + queryTerm.match(/^\d+$/) + ) + return; + + // stem the word + let word = stemmer.stemWord(queryTermLower); + // select the correct list + if (word[0] === "-") excludedTerms.add(word.substr(1)); + else { + searchTerms.add(word); + highlightTerms.add(queryTermLower); + } + }); + + if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js + localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" ")) + } + + // console.debug("SEARCH: searching for:"); + // console.info("required: ", [...searchTerms]); + // console.info("excluded: ", [...excludedTerms]); + + return [query, searchTerms, excludedTerms, highlightTerms, objectTerms]; + }, + + /** + * execute search (requires search index to be loaded) + */ + _performSearch: (query, searchTerms, excludedTerms, highlightTerms, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + const allTitles = Search._index.alltitles; + const indexEntries = Search._index.indexentries; + + // Collect multiple result groups to be sorted separately and then ordered. + // Each is an array of [docname, title, anchor, descr, score, filename]. + const normalResults = []; + const nonMainIndexResults = []; + + _removeChildren(document.getElementById("search-progress")); + + const queryLower = query.toLowerCase().trim(); + for (const [title, foundTitles] of Object.entries(allTitles)) { + if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) { + for (const [file, id] of foundTitles) { + const score = Math.round(Scorer.title * queryLower.length / title.length); + const boost = titles[file] === title ? 1 : 0; // add a boost for document titles + normalResults.push([ + docNames[file], + titles[file] !== title ? `${titles[file]} > ${title}` : title, + id !== null ? "#" + id : "", + null, + score + boost, + filenames[file], + ]); + } + } + } + + // search for explicit entries in index directives + for (const [entry, foundEntries] of Object.entries(indexEntries)) { + if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) { + for (const [file, id, isMain] of foundEntries) { + const score = Math.round(100 * queryLower.length / entry.length); + const result = [ + docNames[file], + titles[file], + id ? 
"#" + id : "", + null, + score, + filenames[file], + ]; + if (isMain) { + normalResults.push(result); + } else { + nonMainIndexResults.push(result); + } + } + } + } + + // lookup as object + objectTerms.forEach((term) => + normalResults.push(...Search.performObjectSearch(term, objectTerms)) + ); + + // lookup as search terms in fulltext + normalResults.push(...Search.performTermsSearch(searchTerms, excludedTerms)); + + // let the scorer override scores with a custom scoring function + if (Scorer.score) { + normalResults.forEach((item) => (item[4] = Scorer.score(item))); + nonMainIndexResults.forEach((item) => (item[4] = Scorer.score(item))); + } + + // Sort each group of results by score and then alphabetically by name. + normalResults.sort(_orderResultsByScoreThenName); + nonMainIndexResults.sort(_orderResultsByScoreThenName); + + // Combine the result groups in (reverse) order. + // Non-main index entries are typically arbitrary cross-references, + // so display them after other results. + let results = [...nonMainIndexResults, ...normalResults]; + + // remove duplicate search results + // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept + let seen = new Set(); + results = results.reverse().reduce((acc, result) => { + let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(','); + if (!seen.has(resultStr)) { + acc.push(result); + seen.add(resultStr); + } + return acc; + }, []); + + return results.reverse(); + }, + + query: (query) => { + const [searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms] = Search._parseQuery(query); + const results = Search._performSearch(searchQuery, searchTerms, excludedTerms, highlightTerms, objectTerms); + + // for debugging + //Search.lastresults = results.slice(); // a copy + // console.info("search results:", Search.lastresults); + + // print the results + _displayNextItem(results, results.length, searchTerms, highlightTerms); + }, + + /** + * search for object names + */ + performObjectSearch: (object, objectTerms) => { + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const objects = Search._index.objects; + const objNames = Search._index.objnames; + const titles = Search._index.titles; + + const results = []; + + const objectSearchCallback = (prefix, match) => { + const name = match[4] + const fullname = (prefix ? prefix + "." : "") + name; + const fullnameLower = fullname.toLowerCase(); + if (fullnameLower.indexOf(object) < 0) return; + + let score = 0; + const parts = fullnameLower.split("."); + + // check for different match types: exact matches of full name or + // "last name" (i.e. 
last dotted part) + if (fullnameLower === object || parts.slice(-1)[0] === object) + score += Scorer.objNameMatch; + else if (parts.slice(-1)[0].indexOf(object) > -1) + score += Scorer.objPartialMatch; // matches in last name + + const objName = objNames[match[1]][2]; + const title = titles[match[0]]; + + // If more than one term searched for, we require other words to be + // found in the name/title/description + const otherTerms = new Set(objectTerms); + otherTerms.delete(object); + if (otherTerms.size > 0) { + const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase(); + if ( + [...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0) + ) + return; + } + + let anchor = match[3]; + if (anchor === "") anchor = fullname; + else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname; + + const descr = objName + _(", in ") + title; + + // add custom score for some objects according to scorer + if (Scorer.objPrio.hasOwnProperty(match[2])) + score += Scorer.objPrio[match[2]]; + else score += Scorer.objPrioDefault; + + results.push([ + docNames[match[0]], + fullname, + "#" + anchor, + descr, + score, + filenames[match[0]], + ]); + }; + Object.keys(objects).forEach((prefix) => + objects[prefix].forEach((array) => + objectSearchCallback(prefix, array) + ) + ); + return results; + }, + + /** + * search for full-text terms in the index + */ + performTermsSearch: (searchTerms, excludedTerms) => { + // prepare search + const terms = Search._index.terms; + const titleTerms = Search._index.titleterms; + const filenames = Search._index.filenames; + const docNames = Search._index.docnames; + const titles = Search._index.titles; + + const scoreMap = new Map(); + const fileMap = new Map(); + + // perform the search on the required terms + searchTerms.forEach((word) => { + const files = []; + const arr = [ + { files: terms[word], score: Scorer.term }, + { files: titleTerms[word], score: Scorer.title }, + ]; + // add support for partial matches + if (word.length > 2) { + const escapedWord = _escapeRegExp(word); + if (!terms.hasOwnProperty(word)) { + Object.keys(terms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: terms[term], score: Scorer.partialTerm }); + }); + } + if (!titleTerms.hasOwnProperty(word)) { + Object.keys(titleTerms).forEach((term) => { + if (term.match(escapedWord)) + arr.push({ files: titleTerms[term], score: Scorer.partialTitle }); + }); + } + } + + // no match but word was a required one + if (arr.every((record) => record.files === undefined)) return; + + // found search word in contents + arr.forEach((record) => { + if (record.files === undefined) return; + + let recordFiles = record.files; + if (recordFiles.length === undefined) recordFiles = [recordFiles]; + files.push(...recordFiles); + + // set score for the word in each file + recordFiles.forEach((file) => { + if (!scoreMap.has(file)) scoreMap.set(file, {}); + scoreMap.get(file)[word] = record.score; + }); + }); + + // create the mapping + files.forEach((file) => { + if (!fileMap.has(file)) fileMap.set(file, [word]); + else if (fileMap.get(file).indexOf(word) === -1) fileMap.get(file).push(word); + }); + }); + + // now check if the files don't contain excluded terms + const results = []; + for (const [file, wordList] of fileMap) { + // check if all requirements are matched + + // as search terms with length < 3 are discarded + const filteredTermCount = [...searchTerms].filter( + (term) => term.length > 2 + ).length; + if ( + wordList.length !== searchTerms.size && + 
wordList.length !== filteredTermCount + ) + continue; + + // ensure that none of the excluded terms is in the search result + if ( + [...excludedTerms].some( + (term) => + terms[term] === file || + titleTerms[term] === file || + (terms[term] || []).includes(file) || + (titleTerms[term] || []).includes(file) + ) + ) + break; + + // select one (max) score for the file. + const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w])); + // add result to the result list + results.push([ + docNames[file], + titles[file], + "", + null, + score, + filenames[file], + ]); + } + return results; + }, + + /** + * helper function to return a node containing the + * search summary for a given text. keywords is a list + * of stemmed words. + */ + makeSearchSummary: (htmlText, keywords, anchor) => { + const text = Search.htmlToText(htmlText, anchor); + if (text === "") return null; + + const textLower = text.toLowerCase(); + const actualStartPosition = [...keywords] + .map((k) => textLower.indexOf(k.toLowerCase())) + .filter((i) => i > -1) + .slice(-1)[0]; + const startWithContext = Math.max(actualStartPosition - 120, 0); + + const top = startWithContext === 0 ? "" : "..."; + const tail = startWithContext + 240 < text.length ? "..." : ""; + + let summary = document.createElement("p"); + summary.classList.add("context"); + summary.textContent = top + text.substr(startWithContext, 240).trim() + tail; + + return summary; + }, +}; + +_ready(Search.init); diff --git a/main/_static/sphinx_highlight.js b/main/_static/sphinx_highlight.js new file mode 100644 index 000000000..8a96c69a1 --- /dev/null +++ b/main/_static/sphinx_highlight.js @@ -0,0 +1,154 @@ +/* Highlighting utilities for Sphinx HTML documentation. */ +"use strict"; + +const SPHINX_HIGHLIGHT_ENABLED = true + +/** + * highlight a given string on a node by wrapping it in + * span elements with the given class name. + */ +const _highlight = (node, addItems, text, className) => { + if (node.nodeType === Node.TEXT_NODE) { + const val = node.nodeValue; + const parent = node.parentNode; + const pos = val.toLowerCase().indexOf(text); + if ( + pos >= 0 && + !parent.classList.contains(className) && + !parent.classList.contains("nohighlight") + ) { + let span; + + const closestNode = parent.closest("body, svg, foreignObject"); + const isInSVG = closestNode && closestNode.matches("svg"); + if (isInSVG) { + span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); + } else { + span = document.createElement("span"); + span.classList.add(className); + } + + span.appendChild(document.createTextNode(val.substr(pos, text.length))); + const rest = document.createTextNode(val.substr(pos + text.length)); + parent.insertBefore( + span, + parent.insertBefore( + rest, + node.nextSibling + ) + ); + node.nodeValue = val.substr(0, pos); + /* There may be more occurrences of search term in this node. So call this + * function recursively on the remaining fragment. 
+ */ + _highlight(rest, addItems, text, className); + + if (isInSVG) { + const rect = document.createElementNS( + "http://www.w3.org/2000/svg", + "rect" + ); + const bbox = parent.getBBox(); + rect.x.baseVal.value = bbox.x; + rect.y.baseVal.value = bbox.y; + rect.width.baseVal.value = bbox.width; + rect.height.baseVal.value = bbox.height; + rect.setAttribute("class", className); + addItems.push({ parent: parent, target: rect }); + } + } + } else if (node.matches && !node.matches("button, select, textarea")) { + node.childNodes.forEach((el) => _highlight(el, addItems, text, className)); + } +}; +const _highlightText = (thisNode, text, className) => { + let addItems = []; + _highlight(thisNode, addItems, text, className); + addItems.forEach((obj) => + obj.parent.insertAdjacentElement("beforebegin", obj.target) + ); +}; + +/** + * Small JavaScript module for the documentation. + */ +const SphinxHighlight = { + + /** + * highlight the search words provided in localstorage in the text + */ + highlightSearchWords: () => { + if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight + + // get and clear terms from localstorage + const url = new URL(window.location); + const highlight = + localStorage.getItem("sphinx_highlight_terms") + || url.searchParams.get("highlight") + || ""; + localStorage.removeItem("sphinx_highlight_terms") + url.searchParams.delete("highlight"); + window.history.replaceState({}, "", url); + + // get individual terms from highlight string + const terms = highlight.toLowerCase().split(/\s+/).filter(x => x); + if (terms.length === 0) return; // nothing to do + + // There should never be more than one element matching "div.body" + const divBody = document.querySelectorAll("div.body"); + const body = divBody.length ? divBody[0] : document.querySelector("body"); + window.setTimeout(() => { + terms.forEach((term) => _highlightText(body, term, "highlighted")); + }, 10); + + const searchBox = document.getElementById("searchbox"); + if (searchBox === null) return; + searchBox.appendChild( + document + .createRange() + .createContextualFragment( + '" + ) + ); + }, + + /** + * helper function to hide the search marks again + */ + hideSearchWords: () => { + document + .querySelectorAll("#searchbox .highlight-link") + .forEach((el) => el.remove()); + document + .querySelectorAll("span.highlighted") + .forEach((el) => el.classList.remove("highlighted")); + localStorage.removeItem("sphinx_highlight_terms") + }, + + initEscapeListener: () => { + // only install a listener if it is really needed + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return; + if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) { + SphinxHighlight.hideSearchWords(); + event.preventDefault(); + } + }); + }, +}; + +_ready(() => { + /* Do not call highlightSearchWords() when we are on the search page. + * It will highlight words from the *previous* search query. 
+ */ + if (typeof Search === "undefined") SphinxHighlight.highlightSearchWords(); + SphinxHighlight.initEscapeListener(); +}); diff --git a/main/advanced.html b/main/advanced.html new file mode 100644 index 000000000..8ce727d59 --- /dev/null +++ b/main/advanced.html @@ -0,0 +1,551 @@ + + + + + + + + Advanced Usage — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Advanced Usage

+

Optical character recognition is the serial execution of multiple steps, in the +case of kraken, layout analysis/page segmentation (extracting topological text +lines from an image), recognition (feeding text line images into a +classifier), and finally serialization of results into an appropriate format +such as ALTO or PageXML.

+
+

Input and Outputs

+

Kraken inputs and their outputs can be defined in multiple ways. The simplest +are input-output pairs, i.e. producing one output document for one input +document, following the basic syntax:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ... subcommand_1 subcommand_2 ... subcommand_n
+
+
+

In particular subcommands may be chained.

+

There are other ways to define inputs and outputs, as the syntax shown above can +become rather cumbersome for large numbers of files.

+

As such there are a couple of ways to deal with multiple files in a compact +way. The first is batch processing:

+
$ kraken -I '*.png' -o ocr.txt segment ...
+
+
+

which expands the glob expression in kraken internally and +appends the suffix defined with -o to each output file. An input file +xyz.png will therefore produce an output file xyz.png.ocr.txt. -I batch +inputs can also be specified multiple times:

+
$ kraken -I '*.png' -I '*.jpg' -I '*.tif' -o ocr.txt segment ...
+
+
+

A second way is to input multi-image files directly. These can be either in +PDF, TIFF, or JPEG2000 format and are specified like:

+
$ kraken -I some.pdf -o ocr.txt -f pdf segment ...
+
+
+

This will internally extract all page images from the input PDF file and write +one output file per page, named with an index (which can be adjusted using the -p option) and the +suffix defined with -o.

+

The -f option can be used not only to extract data from PDF/TIFF/JPEG2000 +files but also from various XML formats. In these cases the appropriate data is +automatically selected from the inputs: image data for segmentation, or line and +region segmentation for recognition:

+
$ kraken -i alto.xml alto.ocr.txt -i page.xml page.ocr.txt -f xml ocr ...
+
+
+

The code is able to automatically determine if a file is in PageXML or ALTO format.

+
+

Output formats

+

All commands have a default output format such as raw text for ocr, a plain +image for binarize, or a JSON definition of the segmentation for +segment. These are specific to kraken and generally not suitable for further +processing by other software, but a number of standardized data exchange formats +can be selected. Per default ALTO, +PageXML, hOCR, and abbyyXML, all containing additional metadata such as +bounding boxes and confidences, are implemented. In addition, custom jinja templates can be loaded to create +individualised output such as TEI.

+

Output formats are selected on the main kraken command and apply to the last +subcommand defined in the subcommand chain. For example:

+
$ kraken --alto -i ... segment -bl
+
+
+

will serialize a plain segmentation in ALTO into the specified output file.

+

The currently available format switches are:

+
$ kraken -n -i ... ... # native output
+$ kraken -a -i ... ... # ALTO output
+$ kraken -x -i ... ... # PageXML output
+$ kraken -h -i ... ... # hOCR output
+$ kraken -y -i ... ... # abbyyXML output
+
+
+

Custom templates can be loaded with the --template option:

+
$ kraken --template /my/awesome/template.tmpl -i ... ...
+
+
+

The data objects used by the templates are considered internal to kraken and +can change from time to time. The best way to get some orientation when writing +a new template from scratch is to have a look at the existing templates here.

+
+
+
+

Binarization

+
+

Note

+

Binarization is deprecated and mostly not necessary anymore. It can often +worsen text recognition results especially for documents with uneven +lighting, faint writing, etc.

+
+

The binarization subcommand converts a color or grayscale input image into an +image containing only two color levels: white (background) and black +(foreground, i.e. text). It accepts almost the same parameters as +ocropus-nlbin. Only options not related to binarization, e.g. skew +detection are missing. In addition, error checking (image sizes, inversion +detection, grayscale enforcement) is always disabled and kraken will happily +binarize any image that is thrown at it.

+

Available parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

type

--threshold

FLOAT

--zoom

FLOAT

--escale

FLOAT

--border

FLOAT

--perc

INTEGER RANGE

--range

INTEGER

--low

INTEGER RANGE

--high

INTEGER RANGE

+

To binarize an image:

+
$ kraken -i input.jpg bw.png binarize
+
+
+
+

Note

+

Some image formats, notably JPEG, do not support a black and white +image mode. Per default the output format according to the output file +name extension will be honored. If this is not possible, a warning will +be printed and the output forced to PNG:

+
$ kraken -i input.jpg bw.jpg binarize
+Binarizing      [06/24/22 09:56:23] WARNING  jpeg does not support 1bpp images. Forcing to png.
+
+
+
+
+
+
+

Page Segmentation

+

The segment subcommand performs page segmentation into lines and regions with +one of the two layout analysis methods implemented: the trainable baseline segmenter, +which is capable of detecting both lines of different types and regions, and a +legacy non-trainable segmenter that produces bounding boxes.

+

Universal parameters of either segmenter are:

+ + + + + + + + + + + + + + +

option

action

-d, --text-direction

Sets principal text direction. Valid values are horizontal-lr, horizontal-rl, vertical-lr, and vertical-rl.

-m, --mask

Segmentation mask suppressing page areas for line detection. A simple black and white mask image where 0-valued (black) areas are ignored for segmentation purposes.

+
+

Baseline Segmentation

+

The baseline segmenter works by applying a segmentation model on a page image +which labels each pixel on the image with one or more classes with each class +corresponding to a line or region of a specific type. In addition there are two +auxiliary classes that are used to determine the line orientation. A simplified +example of a composite image of the auxiliary classes and a single line type +without regions can be seen below:

+BLLA output heatmap + +

In a second step the raw heatmap is vectorized to extract line instances and +region boundaries, followed by bounding polygon computation for the baselines, +and text line ordering. The final output can be visualized as:

+BLLA final output + +

The primary determinant of segmentation quality is the segmentation model +employed. There is a default model that works reasonably well on printed and +handwritten material on undegraded, even writing surfaces such as paper or +parchment. The output of this model consists of a single line type and a +generic text region class that denotes coherent blocks of text. This model is +employed automatically when the baseline segmenter is activated with the -bl +option:

+
$ kraken -i input.jpg segmentation.json segment -bl
+
+
+

New models optimized for other kinds of documents can be trained (see +here). These can be applied with the -i option of the +segment subcommand:

+
$ kraken -i input.jpg segmentation.json segment -bl -i fancy_model.mlmodel
+
+
+
+
+

Legacy Box Segmentation

+

The legacy page segmentation is mostly parameterless, although a couple of +switches exist to tweak it for particular inputs. Its output consists of +rectangular bounding boxes in reading order and the general text direction +(horizontal, i.e. LTR or RTL text in top-to-bottom reading order or +vertical-ltr/rtl for vertical lines read from left-to-right or right-to-left).

+

Apart from the limitations of the bounding box paradigm (rotated and curved +lines cannot be effectively extracted) another important drawback of the legacy +segmenter is the requirement for binarized input images. It is therefore +necessary to apply binarization first or supply only +pre-binarized inputs.

+

The legacy segmenter can be applied on some input image with:

+
$ kraken -i 14.tif lines.json segment -x
+$ cat lines.json
+
+
+

Available specific parameters are:

+ + + + + + + + + + + + + + + + + + + + + + + +

option

action

--scale FLOAT

Estimate of the average line height on the page

-m, --maxcolseps

Maximum number of columns in the input document. Set to 0 for uni-column layouts.

-b, --black-colseps / -w, --white-colseps

Switch to black column separators.

-r, --remove-hlines / -l, --hlines

Disables prefiltering of small horizontal lines. Improves segmenter output on some Arabic texts.

-p, --pad

Adds left and right padding around lines in the output.

+
+
+

Principal Text Direction

+

The principal text direction selected with the -d/--text-direction is a +switch used in the reading order heuristic to determine the order of text +blocks (regions) and individual lines. It roughly corresponds to the block +flow direction in CSS with +an additional option. Valid options consist of two parts, an initial principal +line orientation (horizontal or vertical) followed by a block order (lr +for left-to-right or rl for right-to-left).

+
+

Warning

+

The principal text direction is independent of the +inline text direction (which is left-to-right for writing systems like +Latin and right-to-left for ones like Hebrew or Arabic). Kraken deals +automatically with the inline text direction through the BiDi algorithm +but can’t infer the principal text direction automatically as it is +determined by factors like layout, type of document, and primary script in +the document. The different types of text +directionality and their relation can be confusing; the W3C writing +mode document explains +the fundamentals, although the model used in Kraken differs slightly.

+
+

The first part is usually horizontal for scripts like Latin, Arabic, or +Hebrew where the lines are horizontally oriented on the page and are written/read from +top to bottom:

+Horizontal Latin script text + +

Other scripts like Chinese can be written with vertical lines that are +written/read from left to right or right to left:

+Vertical Chinese text + +

The second part is dependent on a number of factors, as the order in which text +blocks are read is not fixed for every writing system. In mono-script texts it +is usually determined by the inline text direction, i.e. in Latin script texts +columns are read starting with the top-left column, followed by the column to +its right and so on, continuing with the left-most column below if none remain +to the right (inverse for right-to-left scripts like Arabic, which start with the +top right-most column, continue leftward, and return to the right-most +column just below when none remain).

+

In multi-script documents the order is determined by the primary writing +system employed in the document, e.g. for a modern book containing both Latin +and Arabic script text it would be set to lr when Latin is primary, e.g. when +the binding is on the left side of the book seen from the title cover, and +vice-versa (rl if binding is on the right on the title cover). The analogue +applies to text written with vertical lines.

+

With these explications there are four different text directions available:

+ + + + + + + + + + + + + + + + + + + + +

Text Direction

Examples

horizontal-lr

Latin script texts, Mixed LTR/RTL docs with principal LTR script

horizontal-rl

Arabic script texts, Mixed LTR/RTL docs with principal RTL script

vertical-lr

Vertical script texts read from left-to-right.

vertical-rl

Vertical script texts read from right-to-left.

+
+
+

Masking

+

It is possible to keep the segmenter from finding text lines and regions on +certain areas of the input image. This is done through providing a binary mask +image that has the same size as the input image where blocked out regions are +black and valid regions white:

+
$ kraken -i input.jpg segmentation.json segment -bl -m mask.png
+
+
+
+
+
+

Model Repository

+

There is a semi-curated repository of freely licensed recognition +models that can be interacted with from the command line using a few +subcommands.

+
+

Querying and Model Retrieval

+

The list subcommand retrieves a list of all models available and prints +them including some additional information (identifier, type, and a short +description):

+
$ kraken list
+Retrieving model list ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 8/8 0:00:00 0:00:07
+10.5281/zenodo.6542744 (pytorch) - LECTAUREP Contemporary French Model (Administration)
+10.5281/zenodo.5617783 (pytorch) - Cremma-Medieval Old French Model (Litterature)
+10.5281/zenodo.5468665 (pytorch) - Medieval Hebrew manuscripts in Sephardi bookhand version 1.0
+...
+
+
+

To access more detailed information the show subcommand may be used:

+
$ kraken show 10.5281/zenodo.5617783
+name: 10.5281/zenodo.5617783
+
+Cremma-Medieval Old French Model (Litterature)
+
+....
+scripts: Latn
+alphabet: &'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVXabcdefghijklmnopqrstuvwxyz¶ãíñõ÷ħĩłũƺᵉẽ’•⁊⁹ꝑꝓꝯꝰ SPACE, COMBINING ACUTE ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING ZIGZAG ABOVE, COMBINING LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER C, COMBINING LATIN SMALL LETTER R, COMBINING LATIN SMALL LETTER T, COMBINING UR ABOVE, COMBINING US ABOVE, COMBINING LATIN SMALL LETTER S, 0xe8e5, 0xf038, 0xf128
+accuracy: 95.49%
+license: CC-BY-SA-2.0
+author(s): Pinche, Ariane
+date: 2021-10-29
+
+
+

If a suitable model has been decided upon it can be retrieved using the get +subcommand:

+
$ kraken get 10.5281/zenodo.5617783
+Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 16.1/16.1 MB 0:00:00 0:00:10
+Model name: cremma_medieval_bicerin.mlmodel
+
+
+

Models will be placed in $XDG_BASE_DIR and can be accessed using their name as +printed in the last line of the kraken get output.

+
$ kraken -i ... ... ocr -m cremma_medieval_bicerin.mlmodel
+
+
+
+
+

Publishing

+

When one would like to share a model with the wider world (for fame and glory!) +it is possible (and recommended) to upload it to the repository. The process +consists of two stages: the creation of the deposit on the Zenodo platform, +followed by approval of the model in the community, making it discoverable for +other kraken users.

+

For uploading models a Zenodo account and a personal access token are required. +After account creation, tokens can be created under the account settings:

+Zenodo token creation dialogue + +

With the token models can then be uploaded:

+
$ ketos publish -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617783
+
+
+

A number of important metadata fields will be asked for, such as a short description of +the model, a long form description, recognized scripts, and authorship. +Afterwards the model is deposited at Zenodo. This deposit is persistent, i.e. it +can’t be changed or deleted, so it is important to make sure that all the +information is correct. Each deposit also has a unique persistent identifier, a +DOI, that can be used to refer to it, e.g. in publications or when pointing +someone to a particular model.

+

Once the deposit has been created a request (requiring manual approval) for +inclusion in the repository will automatically be created which will make it +discoverable by other users.

+

It is possible to deposit models without including them in the queryable +repository. Models uploaded this way are not truly private and can still be +found through the standard Zenodo search and be downloaded with kraken get +and its DOI. It is mostly suggested for preliminary models that might get +updated later:

+
$ ketos publish --private -a $ACCESS_TOKEN aaebv2-2.mlmodel
+DOI: 10.5281/zenodo.5617734
+
+
+
+
+
+

Recognition

+

Recognition requires a grey-scale or binarized image, a page segmentation for +that image, and a model file. In particular there is no requirement to use the +page segmentation algorithm contained in the segment subcommand or the +binarization provided by kraken.

+

Multi-script recognition is possible by supplying a script-annotated +segmentation and a mapping between scripts and models:

+
$ kraken -i ... ... ocr -m Grek:porson.mlmodel -m Latn:antiqua.mlmodel
+
+
+

All polytonic Greek text portions will be recognized using the porson.mlmodel +model while Latin text will be fed into the antiqua.mlmodel model. It is +possible to define a fallback model that other text will be fed to:

+
$ kraken -i ... ... ocr -m ... -m ... -m default:porson.mlmodel
+
+
+

It is also possible to disable recognition on a particular script by mapping to +the special model keyword ignore. Ignored lines will still be serialized but +will not contain any recognition results.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/main/api.html b/main/api.html new file mode 100644 index 000000000..3353f612d --- /dev/null +++ b/main/api.html @@ -0,0 +1,3185 @@ + + + + + + + + API Quickstart — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Quickstart

+

Kraken provides routines which are usable by third party tools to access all +functionality of the OCR engine. Most functional blocks (binarization, +segmentation, recognition, and serialization) are encapsulated in one high +level method each.

+

Simple use cases of the API which are mostly useful for debugging purposes are +contained in the contrib directory. In general it is recommended to look at +this tutorial, these scripts, or the API reference. The command line drivers +are unnecessarily complex for straightforward applications as they contain lots +of boilerplate to enable all use cases.

+
+

Basic Concepts

+

The fundamental modules of the API are similar to the command line drivers. +Image inputs and outputs are generally Pillow +objects and numerical outputs numpy arrays.

+

Top-level modules implement high level functionality while kraken.lib +contains loaders and low level methods that usually should not be used if +access to intermediate results is not required.

+
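For orientation, the top-level and kraken.lib modules referenced throughout this tutorial can be imported in one go; only the ones needed for a given pipeline are actually required (a sketch, the individual imports are all shown in the sections below):
+
>>> from kraken import binarization, blla, pageseg, rpred, serialization
+>>> from kraken.lib import models, vgsl, xml
+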
+
+

Preprocessing and Segmentation

+

The primary preprocessing function is binarization although depending on the +particular setup of the pipeline and the models utilized it can be optional. +For the non-trainable legacy bounding box segmenter binarization is mandatory +although it is still possible to feed color and grayscale images to the +recognizer. The trainable baseline segmenter can work with black and white, +grayscale, and color images, depending on the training data and network +configuration utilized; though grayscale and color data are used in almost all +cases.

+
>>> from PIL import Image
+
+>>> from kraken import binarization
+
+# can be any supported image format and mode
+>>> im = Image.open('foo.png')
+>>> bw_im = binarization.nlbin(im)
+
+
+
+

Legacy segmentation

+

The basic parameter of the legacy segmenter consists just of a b/w image +object, although some additional parameters exist, largely to change the +principal text direction (important for column ordering and top-to-bottom +scripts) and explicit masking of non-text image regions:

+
>>> from kraken import pageseg
+
+>>> seg = pageseg.segment(bw_im)
+>>> seg
+Segmentation(type='bbox',
+             imagename='foo.png',
+             text_direction='horizontal-lr',
+             script_detection=False,
+             lines=[BBoxLine(id='0ce11ad6-1f3b-4f7d-a8c8-0178e411df69',
+                             bbox=[74, 61, 136, 101],
+                             text=None,
+                             base_dir=None,
+                             type='bbox',
+                             imagename=None,
+                             tags=None,
+                             split=None,
+                             regions=None,
+                             text_direction='horizontal-lr'),
+                    BBoxLine(id='c4a751dc-6731-4eea-a287-d4b57683f5b0', ...),
+                    ....],
+             regions={},
+             line_orders=[])
+
+
+

All segmentation methods return a kraken.containers.Segmentation +object that contains all elements of the segmentation: its type, a list of +lines (either kraken.containers.BBoxLine or +kraken.containers.BaselineLine), a dictionary mapping region types to +lists of regions (kraken.containers.Region), and one or more line +reading orders.

+
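The returned container can be inspected directly; a short sketch using the bbox segmentation from above (the line count shown is illustrative):
+
>>> seg.type
+'bbox'
+>>> len(seg.lines)   # number of detected lines; the value here is illustrative
+24
+>>> seg.lines[0].bbox
+[74, 61, 136, 101]
+>>> seg.regions
+{}
+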
+
+

Baseline segmentation

+

The baseline segmentation method is based on a neural network that classifies +image pixels into baselines and regions. Because it is trainable, a +segmentation model is required in addition to the image to be segmented and +it has to be loaded first:

+
>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model_path = 'path/to/model/file'
+>>> model = vgsl.TorchVGSLModel.load_model(model_path)
+
+
+

A segmentation model contains a basic neural network and associated metadata +defining the available line and region types, bounding regions, and an +auxiliary baseline location flag for the polygonizer:

+ + + + + + + + + + + + + Segmentation Model + (TorchVGSLModel) + + + + + + + + + Metadata + + + + + + + Line and Region Types + + + + + + + Baseline location flag + + + + + + + Bounding Regions + + + + + + + + + + + Neural Network + + + + +

Afterwards they can be fed into the segmentation method +kraken.blla.segment() with image objects:

+
>>> from kraken import blla
+>>> from kraken import serialization
+
+>>> baseline_seg = blla.segment(im, model=model)
+>>> baseline_seg
+Segmentation(type='baselines',
+             imagename='foo.png',
+             text_direction='horizontal-lr',
+             script_detection=False,
+             lines=[BaselineLine(id='22fee3d1-377e-4130-b9e5-5983a0c50ce8',
+                                 baseline=[[71, 93], [145, 92]],
+                                 boundary=[[71, 93], ..., [71, 93]],
+                                 text=None,
+                                 base_dir=None,
+                                 type='baselines',
+                                 imagename=None,
+                                 tags={'type': 'default'},
+                                 split=None,
+                                 regions=['f17d03e0-50bb-4a35-b247-cb910c0aaf2b']),
+                    BaselineLine(id='539eadce-f795-4bba-a785-c7767d10c407', ...), ...],
+             regions={'text': [Region(id='f17d03e0-50bb-4a35-b247-cb910c0aaf2b',
+                                      boundary=[[277, 54], ..., [277, 54]],
+                                      imagename=None,
+                                      tags={'type': 'text'})]},
+             line_orders=[])
+>>> alto = serialization.serialize(baseline_seg,
+                                   image_size=im.size,
+                                   template='alto')
+>>> with open('segmentation_output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

A default segmentation model is supplied and will be used if none is specified +explicitly as an argument. Optional parameters are largely the same as for the +legacy segmenter, i.e. text direction and masking.

+

Images are automatically converted into the proper mode for recognition, except +in the case of models trained on binary images, as there is a plethora of +different algorithms available, each with strengths and weaknesses. For most +material the kraken-provided binarization should be sufficient, though. This +does not mean that a segmentation model trained on RGB images will have equal +accuracy for B/W, grayscale, and RGB inputs. Nevertheless the drop in quality +will often be modest or non-existent for color models while non-binarized +inputs to a binary model will cause severe degradation (and a warning to that +effect).

+

Per default segmentation is performed on the CPU although the neural network +can be run on a GPU with the device argument. As the vast majority of the +processing required is postprocessing, the performance gain will most likely be +modest, though.

+
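For instance, to run the network part of the segmenter on a CUDA device (a sketch assuming a GPU is available; the device string follows the usual torch naming):
+
>>> baseline_seg = blla.segment(im, model=model, device='cuda:0')
+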

The above API is the simplest way to perform a complete segmentation. The +process consists of multiple steps such as pixel labelling, separate region and +baseline vectorization, and bounding polygon calculation:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Pixel Labelling + + + + + + + + Line and Separator + Heatmaps + + + + + + + + + Bounding Polygon + Calculation + + + + + + + + + + + Baseline + Vectorization + and Orientation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Oriented + Baselines + + + + + + + + + Line + Ordering + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bounding + Polygons + + + + + + + Trainable + + + + + + + + + + + + Segmentation + + + + + + + + + + Region Heatmaps + + + + + + + + + + Region + Vectorization + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Region + Boundaries + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

It is possible to only run a subset of the functionality depending on one’s +needs by calling the respective functions in kraken.lib.segmentation. As +part of the sub-library the API is not guaranteed to be stable but it generally +does not change much. Examples of more fine-grained use of the segmentation API +can be found in contrib/repolygonize.py +and contrib/segmentation_overlay.py.

+
+
+
+

Recognition

+

Recognition itself is a multi-step process with a neural network producing a +matrix with a confidence value for possible outputs at each time step. This +matrix is decoded into a sequence of integer labels (label domain) which are +subsequently mapped into Unicode code points using a codec. Labels and code +points usually correspond one-to-one, i.e. each label is mapped to exactly one +Unicode code point, but if desired more complex codecs can map single labels to +multiple code points, multiple labels to single code points, or multiple labels +to multiple code points (see the Codec section for further +information).

+ + + + + + + + + + + + Output Matrix + + + Labels + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Label + Sequence + + + 15, 10, 1, ... + + + + 'Time' Steps + + + + + + + + + + + + + + 'Time' Steps + (Width) + + + + + + + + + + + + + + + + + + + + + + + + + + Neural + Net + + + + Character + Sequence + + + o, c, u, ... + + + + + + + + + + + + + + + CTC + decoder + + + + + Codec + + + + + + + + + + + + + + +

As the customization of this two-stage decoding process is usually reserved +for specialized use cases, sensible defaults are chosen: codecs are +part of the model file and do not have to be supplied manually; the preferred +CTC decoder is an optional parameter of the recognition model object.

+

To perform text line recognition a neural network has to be loaded first. A +kraken.lib.models.TorchSeqRecognizer is returned which is a wrapper +around the kraken.lib.vgsl.TorchVGSLModel class seen above for +segmentation model loading.

+
>>> from kraken.lib import models
+
+>>> rec_model_path = '/path/to/recognition/model'
+>>> model = models.load_any(rec_model_path)
+
+
+

The sequence recognizer wrapper combines the neural network itself, a +codec, metadata such as if the input is supposed to be +grayscale or binarized, and an instance of a CTC decoder that performs the +conversion of the raw output tensor of the network into a sequence of labels:

+ + + + + + + + + + + + + Transcription Model + (TorchSeqRecognizer) + + + + + + + + + + Codec + + + + + + + + + + + Metadata + + + + + + + + + + + CTC Decoder + + + + + + + + + + + Neural Network + + + + +

Afterwards, given an image, a segmentation and the model one can perform text +recognition. The code is identical for both legacy and baseline segmentations. +Like for segmentation input images are auto-converted to the correct color +mode, except in the case of binary models for which a warning will be raised if +there is a mismatch.

+

There are two methods for recognition, a basic single model call +kraken.rpred.rpred() and a multi-model recognizer +kraken.rpred.mm_rpred(). The latter is useful for recognizing +multi-scriptal documents, i.e. applying different models to different parts of +a document.

+
>>> from kraken import rpred
+# single model recognition
+>>> pred_it = rpred(network=model,
+                    im=im,
+                    segmentation=baseline_seg)
+>>> for record in pred_it:
+        print(record)
+
+
+

The output isn’t just a sequence of characters but, depending on the type of +segmentation supplied, a kraken.containers.BaselineOCRRecord or +kraken.containers.BBoxOCRRecord record object containing the character +prediction, cuts (approximate locations), and confidences.

+
>>> record.cuts
+>>> record.prediction
+>>> record.confidences
+
+
+

It is also possible to access the original line information:

+
# for baselines
+>>> record.type
+'baselines'
+>>> record.line
+>>> record.baseline
+>>> record.script
+
+# for box lines
+>>> record.type
+'bbox'
+>>> record.line
+>>> record.script
+
+
+

Sometimes the undecoded raw output of the network is required. The \(C +\times W\) softmax output matrix is accessible as the outputs attribute on the +kraken.lib.models.TorchSeqRecognizer after each step of the +kraken.rpred.rpred() iterator. To get a mapping from the label space +\(C\) the network operates in to Unicode code points a codec is used. An +arbitrary sequence of labels can generate an arbitrary number of Unicode code +points although usually the relation is one-to-one.

+
>>> pred_it = rpred(model, im, baseline_seg)
+>>> next(pred_it)
+>>> model.output
+>>> model.codec.l2c
+{'\x01': ' ',
+ '\x02': '"',
+ '\x03': "'",
+ '\x04': '(',
+ '\x05': ')',
+ '\x06': '-',
+ '\x07': '/',
+ ...
+}
+
+
+

There are several different ways to convert the output matrix to a sequence of +labels that can be decoded into a character sequence. These are contained in +kraken.lib.ctc_decoder with +kraken.lib.ctc_decoder.greedy_decoder() being the default.

+
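A sketch of applying the default decoder by hand to the raw output of the previous example (the outputs attribute and greedy_decoder are referenced in the prose above; the exact shape of the returned label tuples may vary between versions):
+
>>> from kraken.lib import ctc_decoder
+# decode the C x W softmax matrix into a label sequence; this mirrors
+# what rpred applies per default
+>>> labels = ctc_decoder.greedy_decoder(model.outputs)
+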
+
+

XML Parsing

+

Sometimes it is desired to take the data in an existing XML serialization +format like PageXML or ALTO and apply an OCR function on it. The +kraken.lib.xml module includes parsers extracting information into data +structures processable with minimal transformation by the functional blocks:

+

Parsing is accessed through the kraken.lib.xml.XMLPage class.

+
>>> from kraken.lib import xml
+
+>>> alto_doc = '/path/to/alto'
+>>> parsed_doc = xml.XMLPage(alto_doc)
+>>> parsed_doc
+XMLPage(filename='/path/to/alto', filetype=alto)
+>>> parsed_doc.lines
+{'line_1469098625593_463': BaselineLine(id='line_1469098625593_463',
+                                        baseline=[(2337, 226), (2421, 239)],
+                                        boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)],
+                                        text='$pag:39',
+                                        base_dir=None,
+                                        type='baselines',
+                                        imagename=None,
+                                        tags={'type': '$pag'},
+                                        split=None,
+                                        regions=['region_1469098609000_462']),
+
+ 'line_1469098649515_464': BaselineLine(id='line_1469098649515_464',
+                                        baseline=[(789, 269), (2397, 304)],
+                                        boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)],
+                                        text='$-nor su hijo, De todos sus bienes, con los pactos',
+                                        base_dir=None,
+                                        type='baselines',
+                                        imagename=None,
+                                        tags={'type': '$pac'},
+                                        split=None,
+                                        regions=['region_1469098557906_461']),
+ ....}
+>>> parsed_doc.regions
+{'$pag': [Region(id='region_1469098609000_462',
+                 boundary=[(2324, 171), (2437, 171), (2436, 258), (2326, 237)],
+                 imagename=None,
+                 tags={'type': '$pag'})],
+ '$pac': [Region(id='region_1469098557906_461',
+                 boundary=[(738, 203), (2339, 245), (2398, 294), (2446, 345), (2574, 469), (2539, 1873), (2523, 2053), (2477, 2182), (738, 2243)],
+                 imagename=None,
+                 tags={'type': '$pac'})],
+ '$tip': [Region(id='TextRegion_1520586482298_194',
+                 boundary=[(687, 2428), (688, 2422), (107, 2420), (106, 2264), (789, 2256), (758, 2404)],
+                 imagename=None,
+                 tags={'type': '$tip'})],
+ '$par': [Region(id='TextRegion_1520586482298_193',
+                 boundary=[(675, 3772), (687, 2428), (758, 2404), (789, 2256), (2542, 2236), (2581, 3748)],
+                 imagename=None,
+                 tags={'type': '$par'})]
+}
+
+
+

The parser is aware of reading order(s), thus the basic properties accessing +lines and regions are unordered dictionaries. Reading orders can be accessed +separately through the reading_orders property:

+
>>> parsed_doc.region_orders
+{'line_implicit': {'order': ['line_1469098625593_463',
+                             'line_1469098649515_464',
+                             ...
+                            'line_1469099255968_508'],
+                   'is_total': True,
+                   'description': 'Implicit line order derived from element sequence'},
+'region_implicit': {'order': ['region_1469098609000_462',
+                              ...
+                             'TextRegion_1520586482298_193'],
+                    'is_total': True,
+                    'description': 'Implicit region order derived from element sequence'},
+'region_transkribus': {'order': ['region_1469098609000_462',
+                                 ...
+                                'TextRegion_1520586482298_193'],
+                    'is_total': True,
+                    'description': 'Explicit region order from `custom` attribute'},
+'line_transkribus': {'order': ['line_1469098625593_463',
+                               ...
+                               'line_1469099255968_508'],
+                     'is_total': True,
+                     'description': 'Explicit line order from `custom` attribute'},
+'o_1530717944451': {'order': ['region_1469098609000_462',
+                              ...
+                              'TextRegion_1520586482298_193'],
+                   'is_total': True,
+                   'description': 'Regions reading order'}}
+
+
+

Reading orders are created from different sources, depending on the content of +the XML file. Every document will contain at least implicit orders for lines +and regions (line_implicit and region_implicit) sourced from the sequence +of line and region elements. There can also be explicit additional orders +defined by the standard reading order elements, for example o_1530717944451 +in the above example. In Page XML files reading orders defined with the +Transkribus style custom attribute are also recognized.

+

To access the lines or regions of a document in a particular order:

+
>>> parsed_doc.get_sorted_lines(ro='line_implicit')
+[BaselineLine(id='line_1469098625593_463',
+              baseline=[(2337, 226), (2421, 239)],
+              boundary=[(2344, 182), (2428, 195), (2420, 244), (2336, 231)],
+              text='$pag:39',
+              base_dir=None,
+              type='baselines',
+              imagename=None,
+              tags={'type': '$pag'},
+              split=None,
+              regions=['region_1469098609000_462']),
+ BaselineLine(id='line_1469098649515_464',
+              baseline=[(789, 269), (2397, 304)],
+              boundary=[(790, 224), (2398, 259), (2397, 309), (789, 274)],
+              text='$-nor su hijo, De todos sus bienes, con los pactos',
+              base_dir=None,
+              type='baselines',
+              imagename=None,
+              tags={'type': '$pac'},
+              split=None,
+              regions=['region_1469098557906_461'])
+...]
+
+
+

The recognizer functions do not accept kraken.lib.xml.XMLPage objects +directly which means that for most practical purposes these need to be +converted into container objects:

+
>>> segmentation = parsed_doc.to_container()
+>>> pred_it = rpred(network=model,
+                    im=im,
+                    segmentation=segmentation)
+>>> for record in pred_it:
+        print(record)
+
+
+
+
+

Serialization

+

The serialization module can be used to transform results returned by the +segmenter or recognizer into a text based (most often XML) format for archival. +The module renders jinja2 templates, +either ones packaged with kraken or supplied externally, +through the kraken.serialization.serialize() function.

+
>>> import dataclasses
+>>> from kraken.lib import serialization
+
+>>> alto_seg_only = serialization.serialize(baseline_seg, image_size=im.size, template='alto')
+
+>>> records = [record for record in pred_it]
+>>> results = dataclasses.replace(pred_it.bounds, lines=records)
+>>> alto = serialization.serialize(results, image_size=im.size, template='alto')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+
+
+

The serialization function accepts arbitrary +kraken.containers.Segmentation objects, which may contain textual or +only segmentation information. As the recognizer returns +ocr_records which cannot be serialized +directly, it is necessary to either construct a new +kraken.containers.Segmentation from scratch or insert them into the +segmentation fed into the recognizer (ocr_records subclass BaselineLine/BBoxLine). The container classes are immutable data classes, +therefore it is necessary for simple insertion of the records to use +dataclasses.replace to create a new segmentation with a changed lines +attribute.

+
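A sketch of the other route, building a fresh Segmentation from the fields visible in the reprs above (assuming the dataclass accepts exactly these keyword arguments):
+
>>> from kraken.containers import Segmentation
+
+>>> records = [record for record in pred_it]
+>>> results = Segmentation(type='baselines',
+                           imagename='foo.png',
+                           text_direction='horizontal-lr',
+                           script_detection=False,
+                           lines=records,    # ocr_records subclass BaselineLine/BBoxLine
+                           regions=baseline_seg.regions,
+                           line_orders=[])
+>>> alto = serialization.serialize(results, image_size=im.size, template='alto')
+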
+
+

Training

+

Training is largely implemented with the pytorch lightning framework. There are separate +LightningModule classes for recognition and segmentation training and a small +wrapper around lightning’s Trainer class that mainly sets up model +handling and verbosity options for the CLI.

+
>>> import glob
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

Likewise for a baseline and region segmentation model:

+
>>> import glob
+>>> from kraken.lib.train import SegmentationModel, KrakenTrainer
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = SegmentationModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer()
+>>> trainer.fit(model)
+
+
+

When the fit() method is called the dataset is initialized and the training +commences. Both can take quite a bit of time. To get insight into what exactly +is happening the standard lightning callbacks +can be attached to the trainer object:

+
>>> import glob
+>>> from pytorch_lightning.callbacks import Callback
+>>> from kraken.lib.train import RecognitionModel, KrakenTrainer
+>>> class MyPrintingCallback(Callback):
+    def on_init_start(self, trainer):
+        print("Starting to init trainer!")
+
+    def on_init_end(self, trainer):
+        print("trainer is init now")
+
+    def on_train_end(self, trainer, pl_module):
+        print("do something when training ends")
+>>> ground_truth = glob.glob('training/*.xml')
+>>> training_files = ground_truth[:250] # training data is shuffled internally
+>>> evaluation_files = ground_truth[250:]
+>>> model = RecognitionModel(training_data=training_files, evaluation_data=evaluation_files, format_type='xml', augment=True)
+>>> trainer = KrakenTrainer(enable_progress_bar=False, callbacks=[MyPrintingCallback])
+>>> trainer.fit(model)
+Starting to init trainer!
+trainer is init now
+
+
+

This is only a small subset of the training functionality. It is suggested to +have a closer look at the command line parameters for features such as transfer +learning, region and baseline filtering, training continuation, and so on.

+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/main/api_docs.html b/main/api_docs.html new file mode 100644 index 000000000..273fd9430 --- /dev/null +++ b/main/api_docs.html @@ -0,0 +1,4361 @@ + + + + + + + + API Reference — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

API Reference

+
+

Segmentation

+
+

kraken.blla module

+
+

Note

+

blla provides the interface to the fully trainable segmenter. For the +legacy segmenter interface refer to the pageseg module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu', raise_on_error=False, autocast=False)
+

Segments a page into text lines using the baseline segmenter.

+

Segments a page into text lines and returns the polyline formed by each +baseline and their estimated environment.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible +to supply a binarized-input-only model which requires accordingly +treated images.

  • +
  • text_direction (Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']) – Passed-through value for serialization.serialize.

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to determine the reading order. Has to +accept a list of tuples (baselines, polygon) and a +text direction (lr or rl).

  • +
  • model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If +none is given a default model will be loaded.

  • +
  • device (str) – The target device to run the neural network on.

  • +
  • raise_on_error (bool) – Raises error instead of logging them when they are +not-blocking

  • +
  • autocast (bool) – Runs the model with automatic mixed precision

  • +
+
+
Returns:
+

A kraken.containers.Segmentation class containing reading +order sorted baselines (polylines) and their respective polygonal +boundaries as kraken.containers.BaselineLine records. The +last and first point of each boundary polygon are connected.

+
+
Raises:
+
+
+
Return type:
+

kraken.containers.Segmentation

+
+
+

Notes

+

Multi-model operation is most useful for combining one or more region +detection models and one text line model. Detected lines from all +models are simply combined without any merging or duplicate detection +so the chance of the same line appearing multiple times in the output +are high. In addition, neural reading order determination is disabled +when more than one model outputs lines.

+
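A minimal call sketch mirroring the quickstart above (paths are placeholders):
+
>>> from PIL import Image
+>>> from kraken import blla
+>>> from kraken.lib import vgsl
+
+>>> model = vgsl.TorchVGSLModel.load_model('path/to/model/file')
+>>> seg = blla.segment(Image.open('foo.png'), model=model, device='cpu')
+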
+ +
+
+

kraken.pageseg module

+
+

Note

+

pageseg is the legacy bounding box-based segmenter. For the trainable +baseline segmenter interface refer to the blla module. Note that +recognition models are not interchangeable between segmenters.

+
+
+
+kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
+

Segments a page into text lines.

+

Segments a page into text lines and returns the absolute coordinates of +each line in reading order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – A bi-level page of mode ‘1’ or ‘L’

  • +
  • text_direction (str) – Principal direction of the text +(horizontal-lr/rl/vertical-lr/rl)

  • +
  • scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.

  • +
  • maxcolseps (float) – Maximum number of whitespace column separators

  • +
  • black_colseps (bool) – Whether column separators are assumed to be vertical +black lines or not

  • +
  • no_hlines (bool) – Switch for small horizontal line removal.

  • +
  • pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is +used both left and right. If a 2-tuple, uses (padding_left, +padding_right).

  • +
  • mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued +regions are ignored for segmentation purposes. Disables column +detection.

  • +
  • reading_order_fn (Callable) – Function to call to order line output. Callable +accepting a list of slices (y, x) and a text +direction in (rl, lr).

  • +
+
+
Returns:
+

A kraken.containers.Segmentation class containing reading +order sorted bounding box-type lines as +kraken.containers.BBoxLine records.

+
+
Raises:
+

KrakenInputException – if the input image is not binarized or the text +direction is invalid.

+
+
Return type:
+

kraken.containers.Segmentation

+
+
+
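A minimal call sketch (the legacy segmenter requires a bi-level input, hence the binarization step; file names are placeholders):
+
>>> from PIL import Image
+>>> from kraken import binarization, pageseg
+
+>>> bw_im = binarization.nlbin(Image.open('foo.png'))
+>>> seg = pageseg.segment(bw_im, text_direction='horizontal-lr', maxcolseps=2)
+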
+ +
+
+
+

Recognition

+
+

kraken.rpred module

+
+
+class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None, no_legacy_polygons=False)
+

Multi-model version of kraken.rpred.rpred

+
+
Parameters:
+
+
+
+
+
+bidi_reordering
+
+ +
+
+bounds
+
+ +
+
+im
+
+ +
+
+len
+
+ +
+
+line_iter
+
+ +
+
+nets
+
+ +
+
+no_legacy_polygons
+
+ +
+
+one_channel_modes
+
+ +
+
+pad
+
+ +
+
+seg_types
+
+ +
+
+tags_ignore
+
+ +
+ +
+
+kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True, no_legacy_polygons=False)
+

Uses a TorchSeqRecognizer and a segmentation to recognize text

+
+
Parameters:
+
    +
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object

  • +
  • im (PIL.Image.Image) – Image to extract text from

  • +
  • bounds (kraken.containers.Segmentation) – A Segmentation class instance containing either a baseline or +bbox segmentation.

  • +
  • pad (int) – Extra blank padding to the left and right of text line. +Auto-disabled when expected network inputs are incompatible with +padding.

  • +
  • bidi_reordering (Union[bool, str]) – Reorder classes in the ocr_record according to the +Unicode bidirectional algorithm for correct display. +Set to L|R to change base text direction.

  • +
  • no_legacy_polygons (bool)

  • +
+
+
Yields:
+

An ocr_record containing the recognized text, absolute character +positions, and confidence values for each character.

+
+
Return type:
+

Generator[kraken.containers.ocr_record, None, None]

+
+
+
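A minimal call sketch combining a loaded recognition model with a previously computed segmentation (the path and the im/seg variables are placeholders):
+
>>> from kraken import rpred
+>>> from kraken.lib import models
+
+>>> net = models.load_any('/path/to/recognition/model')
+>>> for record in rpred.rpred(network=net, im=im, bounds=seg, pad=16):
+        print(record.prediction)
+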
+ +
+
+
+

Serialization

+
+

kraken.serialization module

+
+
+kraken.serialization.render_report(model, chars, errors, char_accuracy, word_accuracy, char_confusions, scripts, insertions, deletions, substitutions)
+

Renders an accuracy report.

+
+
Parameters:
+
    +
  • model (str) – Model name.

  • +
  • errors (int) – Number of errors on test set.

  • +
  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a +number of occurrences.

  • +
  • scripts (dict) – Dictionary counting character per script.

  • +
  • insertions (dict) – Dictionary counting insertion operations per Unicode +script

  • +
  • deletions (int) – Number of deletions

  • +
  • substitutions (dict) – Dictionary counting substitution operations per +Unicode script.

  • +
  • chars (int)

  • +
  • char_accuracy (float)

  • +
  • word_accuracy (float)

  • +
+
+
Returns:
+

A string containing the rendered report.

+
+
Return type:
+

str

+
+
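A call sketch with purely illustrative values for the counts and dictionaries:
+
>>> from kraken import serialization
+
+>>> report = serialization.render_report(model='foo.mlmodel',
+                                         chars=1000, errors=30,
+                                         char_accuracy=0.97, word_accuracy=0.92,
+                                         char_confusions={('e', 'c'): 5},
+                                         scripts={'Latin': 1000},
+                                         insertions={'Latin': 10},
+                                         deletions=5,
+                                         substitutions={'Latin': 15})
+>>> print(report)
+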
+
+ +
+
+kraken.serialization.serialize(results, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, template='alto', template_source='native', processing_steps=None)
+

Serializes recognition and segmentation results into an output document.

+

Serializes a Segmentation container object containing either segmentation +or recognition results into an output document. The rendering is performed +with jinja2 templates that can either be shipped with kraken +(template_source == ‘native’) or custom (template_source == ‘custom’).

+

Note: Empty records are ignored for serialization purposes.

+
+
Parameters:
+
    +
  • segmentation – Segmentation container object

  • +
  • image_size (Tuple[int, int]) – Dimensions of the source image

  • +
  • writing_mode (Literal['horizontal-tb', 'vertical-lr', 'vertical-rl']) – Sets the principal layout of lines and the +direction in which blocks progress. Valid values are +horizontal-tb, vertical-rl, and vertical-lr.

  • +
  • scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records

  • +
  • template ([os.PathLike, str]) – Selector for the serialization format. May be ‘hocr’, +‘alto’, ‘page’ or any template found in the template +directory. If template_source is set to custom a path to a +template is expected.

  • +
  • template_source (Literal['native', 'custom']) – Switch to enable loading of custom templates from +outside the kraken package.

  • +
  • processing_steps (Optional[List[kraken.containers.ProcessingStep]]) – A list of ProcessingStep container classes describing +the processing kraken performed on the inputs.

  • +
  • results (kraken.containers.Segmentation)

  • +
+
+
Returns:
+

The rendered template

+
+
Return type:
+

str

+
+
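A minimal call sketch serializing a recognition result into ALTO (the results and im variables are carried over from the examples in the API quickstart):
+
>>> from kraken import serialization
+
+>>> alto = serialization.serialize(results,
+                                   image_size=im.size,
+                                   writing_mode='horizontal-tb',
+                                   template='alto',
+                                   template_source='native')
+>>> with open('output.xml', 'w') as fp:
+        fp.write(alto)
+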
+
+ +
+
+

Default templates

+
+

ALTO 4.4

+
{% set proc_type_table = {'processing': 'contentGeneration',
+              'preprocessing': 'preOperation',
+              'postprocessing': 'postOperation'}
+%}
+{%+ macro render_line(page, line) +%}
+                    <TextLine ID="{{ line.id }}" HPOS="{{ line.bbox[0] }}" VPOS="{{ line.bbox[1] }}" WIDTH="{{ line.bbox[2] - line.bbox[0] }}" HEIGHT="{{ line.bbox[3] - line.bbox[1] }}" {% if line.baseline %}BASELINE="{{ line.baseline|sum(start=[])|join(' ') }}"{% endif %} {% if line.tags %}TAGREFS="{% for type in page.line_types %}{% if type[0] in line.tags and line.tags[type[0]] == type[1] %}LINE_TYPE_{{ loop.index }}{% endif %}{% endfor %}"{% endif %}>
+                        {% if line.boundary %}
+                        <Shape>
+                            <Polygon POINTS="{{ line.boundary|sum(start=[])|join(' ') }}"/>
+                        </Shape>
+                        {% endif %}
+                            {% if line.recognition|length() == 0 %}
+                        <String CONTENT=""/>
+                        {% else %}
+                        {% for segment in line.recognition %}
+                        {# ALTO forbids encoding whitespace before any String/Shape tags #}
+                        {% if segment.text is whitespace and loop.index > 1 %}
+                        <SP ID="segment_{{ segment.index }}" HPOS="{{ segment.bbox[0]}}"  VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}"  HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}"/>
+                        {% else %}
+                        <String ID="segment_{{ segment.index }}" CONTENT="{{ segment.text|e }}" HPOS="{{ segment.bbox[0] }}" VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}" HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}" WC="{{ (segment.confidences|sum / segment.confidences|length)|round(4) }}">
+                            {% if segment.boundary %}
+                            <Shape>
+                                <Polygon POINTS="{{ segment.boundary|sum(start=[])|join(' ') }}"/>
+                            </Shape>
+                            {% endif %}
+                            {% for char in segment.recognition %}
+                            <Glyph ID="char_{{ char.index }}" CONTENT="{{ char.text|e }}" HPOS="{{ char.bbox[0] }}" VPOS="{{ char.bbox[1] }}" WIDTH="{{ char.bbox[2] - char.bbox[0] }}" HEIGHT="{{ char.bbox[3] - char.bbox[1] }}" GC="{{ char.confidence|round(4) }}">
+                                {% if char.boundary %}
+                                <Shape>
+                                    <Polygon POINTS="{{ char.boundary|sum(start=[])|join(' ') }}"/>
+                                </Shape>
+                                {% endif %}
+                            </Glyph>
+                            {% endfor %}
+                        </String>
+                        {% endif %}
+                        {% endfor %}
+                        {% endif %}
+                    </TextLine>
+{%+ endmacro %}
+<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+    xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
+    <Description>
+        <MeasurementUnit>pixel</MeasurementUnit>
+        <sourceImageInformation>
+            <fileName>{{ page.name }}</fileName>
+        </sourceImageInformation>
+        {% if metadata.processing_steps %}
+        {% for step in metadata.processing_steps %}
+        <Processing ID="OCR_{{ step.id }}">
+            <processingCategory>{{ proc_type_table[step.category] }}</processingCategory>
+            <processingStepDescription>{{ step.description }}</processingStepDescription>
+            <processingStepSettings>{% for k, v in step.settings.items() %}{{k}}: {{v}}{% if not loop.last %}; {% endif %}{% endfor %}</processingStepSettings>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endfor %}
+        {% else %}
+        <Processing ID="OCR_0">
+            <processingCategory>other</processingCategory>
+            <processingStepDescription>unknown</processingStepDescription>
+            <processingSoftware>
+                <softwareName>kraken</softwareName>
+                <softwareVersion>{{ metadata.version }}</softwareVersion>
+            </processingSoftware>
+        </Processing>
+        {% endif %}
+    </Description>
+    <Tags>
+    {% for type, label in page.line_types %}
+        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_{{ loop.index }}" TYPE="{{ type }}" LABEL="{{ label }}"/>
+    {% endfor %}
+    {% for label in page.region_types %}
+        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_{{ loop.index }}" TYPE="region" LABEL="{{ label }}"/>
+    {% endfor %}
+    </Tags>
+    {% if page.line_orders|length() > 0 %}
+    <ReadingOrder>
+        {% if page.line_orders | length == 1 %}
+        <OrderedGroup ID="ro_0">
+           {% for id in page.line_orders[0] %}
+           <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+        </OrderedGroup>
+        {% else %}
+        <UnorderedGroup>
+        {% for ro in page.line_orders %}
+           <OrderedGroup ID="ro_{{ loop.index }}">
+           {% for id in ro %}
+               <ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
+           {% endfor %}
+           </OrderedGroup>
+	{% endfor %}
+        </UnorderedGroup>
+        {% endif %}
+    </ReadingOrder>
+    {% endif %}
+    <Layout>
+        <Page WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}" PHYSICAL_IMG_NR="0" ID="page_0">
+            <PrintSpace HPOS="0" VPOS="0" WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}">
+            {% for entity in page.entities %}
+                {% if entity.type == "region" %}
+                {% if loop.previtem and loop.previtem.type == 'line' %}
+                </TextBlock>
+                {% endif %}
+                <TextBlock ID="{{ entity.id }}" {% if entity.bbox %}HPOS="{{ entity.bbox[0] }}" VPOS="{{ entity.bbox[1] }}" WIDTH="{{ entity.bbox[2] - entity.bbox[0] }}" HEIGHT="{{ entity.bbox[3] - entity.bbox[1] }}"{% endif %} {% if entity.tags %}{% for type in page.region_types %}{% if type in entity.tags.values() %}TAGREFS="REGION_TYPE_{{ loop.index }}"{% endif %}{% endfor %}{% endif %}>
+                    {% if entity.bbox %}<Shape>
+                        <Polygon POINTS="{{ entity.boundary|sum(start=[])|join(' ') }}"/>
+                    </Shape>{% endif %}
+                    {%- for line in entity.lines -%}
+                    {{ render_line(page, line) }}
+                    {%- endfor -%}
+                </TextBlock>
+                {% else %}
+                {% if not loop.previtem or loop.previtem.type != 'line' %}
+                <TextBlock ID="textblock_{{ loop.index }}">
+                {% endif %}
+                    {{ render_line(page, entity) }}
+                {% if loop.last %}
+                </TextBlock>
+                {% endif %}
+            {% endif %}
+            {% endfor %}
+            </PrintSpace>
+        </Page>
+    </Layout>
+</alto>
+
+
+
+
+

PageXML

+

hOCR

+

ABBYY XML

+
{%+ macro render_line(page, line) +%}
+                    <line baseline="{{ ((line.bbox[1] + line.bbox[3]) / 2)|int }}" l="{{ line.bbox[0] }}" r="{{ line.bbox[2] }}" t="{{ line.bbox[1] }}" b="{{ line.bbox[3] }}"><formatting lang="">
+                        {% for segment in line.recognition %}
+                        {% for char in segment.recognition %}
+                        {% if loop.first %}
+                        <charParams l="{{ char.bbox[0] }}" r="{{ char.bbox[2] }}" t="{{ char.bbox[1] }}" b="{{ char.bbox[3] }}" wordStart="1" charConfidence="{{ [char.confidence]|rescale(0, 100)|int }}">{{ char.text }}</charParams>
+                        {% else %}
+                        <charParams l="{{ char.bbox[0] }}" r="{{ char.bbox[2] }}" t="{{ char.bbox[1] }}" b="{{ char.bbox[3] }}" wordStart="0" charConfidence="{{ [char.confidence]|rescale(0, 100)|int }}">{{ char.text }}</charParams>
+                        {% endif %}
+                        {% endfor %}
+                        {% endfor %}
+                    </formatting>
+                    </line>
+{%+ endmacro %}
+<?xml version="1.0" encoding="UTF-8"?>
+<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="kraken {{ metadata.version}}">
+    <page width="{{ page.size[0] }}" height="{{ page.size[1] }}" resolution="0" originalCoords="1">
+        {% for entity in page.entities %}
+        {% if entity.type == "region" %}
+        <block blockType="Text">
+            <text>
+                <par>
+                {%- for line in entity.lines -%}
+                    {{ render_line(page, line) }}
+                {%- endfor -%}
+                </par>
+            </text>
+        </block>
+        {% else %}
+        <block blockType="Text">
+            <text>
+                <par>
+                    {{ render_line(page, entity) }}
+                </par>
+            </text>
+        </block>
+        {% endif %}
+        {% endfor %}
+    </page>
+</document>
+
+
+
+
+
+
+

Containers and Helpers

+
+

kraken.lib.codec module

+
+
+class kraken.lib.codec.PytorchCodec(charset, strict=False)
+

Builds a codec converting between graphemes/code points and integer +label sequences.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically. When a mapping +is manually provided the label codes need to be a prefix-free code.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+
    +
  • charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

  • +
  • strict – Flag indicating if encoding/decoding errors should be ignored +or cause an exception.

  • +
+
+
Raises:
+

KrakenCodecException – If the character set contains duplicate +entries or the mapping is non-singular or +non-prefix-free.

+
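A minimal sketch of constructing a codec from the charset forms described above; the character inventory and label assignments are made up for illustration and only use the behaviour documented here.

.. code-block:: python

    from kraken.lib.codec import PytorchCodec

    # From a plain string: each code point gets an automatically assigned label
    # (labels start at 1; label 0 is reserved for the CTC blank).
    codec = PytorchCodec('abc')
    print(codec.max_label)   # 3

    # From an explicit mapping: each key maps to a sequence of integer labels.
    # A manually provided mapping has to be a prefix-free code.
    codec = PytorchCodec({'a': [1], 'b': [2], 'ch': [3]})
    print(codec.is_valid)    # True if prefix-free and non-singular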
+
+
+
+add_labels(charset)
+

Adds additional characters/labels to the codec.

+

charset may either be a string, a list or a dict. In the first case +each code point will be assigned a label, in the second case each +string in the list will be assigned a label, and in the final case each +key string will be mapped to the value sequence of integers. In the +first two cases labels will be assigned automatically.

+

As 0 is the blank label in a CTC output layer, output labels and input +dictionaries are/should be 1-indexed.

+
+
Parameters:
+

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.

+
+
Return type:
+

PytorchCodec

+
+
+
+ +
+
+c_sorted
+
+ +
+
+decode(labels)
+

Decodes a labelling.

+

Given a labelling with cuts and confidences returns a string with the +cuts and confidences aggregated across label-code point +correspondences. When decoding multilabels to code points the resulting +cuts are min/max, confidences are averaged.

+
+
Parameters:
+

labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, +confidence).

+
+
Returns:
+

A list of tuples (code point, start, end, confidence)

+
+
Return type:
+

List[Tuple[str, int, int, float]]

+
+
+
+ +
+
+encode(s)
+

Encodes a string into a sequence of labels.

+

If the code is non-singular we greedily encode the longest sequence first.

+
+
Parameters:
+

s (str) – Input unicode string

+
+
Returns:
+

Encoded label sequence

+
+
Raises:
+

KrakenEncodeException – if a subsequence is not encodable and the +codec is set to strict mode.

+
+
Return type:
+

torch.IntTensor

+
+
+
+ +
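A short round trip through encode() and decode(), assuming automatically assigned labels from a string charset; the zero cut positions and unit confidences are dummy values for illustration.

.. code-block:: python

    from kraken.lib.codec import PytorchCodec

    codec = PytorchCodec('abc')
    labels = codec.encode('abba')                          # torch.IntTensor of labels
    # decode() expects (label, start, end, confidence) tuples.
    decoded = codec.decode([(int(l), 0, 0, 1.0) for l in labels])
    print(''.join(c for c, _, _, _ in decoded))            # 'abba'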
+
+property is_valid: bool
+

Returns True if the codec is prefix-free (in label space) and +non-singular (in both directions).

+
+
Return type:
+

bool

+
+
+
+ +
+
+l2c: Dict[Tuple[int], str]
+
+ +
+
+l2c_single
+
+ +
+
+property max_label: int
+

Returns the maximum label value.

+
+
Return type:
+

int

+
+
+
+ +
+
+merge(codec)
+

Transforms this codec (c1) into another (c2) reusing as many labels as +possible.

+

The resulting codec is able to encode the same code point sequences +while not necessarily having the same labels for them as c2. +Retains matching character -> label mappings from both codecs, removes +mappings not in c2, and adds mappings not in c1. Compound labels in c2 for +code point sequences not in c1 containing labels also in use in c1 are +added as separate labels.

+
+
Parameters:
+

codec (PytorchCodec) – PytorchCodec to merge with

+
+
Returns:
+

A merged codec and a list of labels that were removed from the +original codec.

+
+
Return type:
+

Tuple[PytorchCodec, Set]

+
+
+
+ +
+
+strict
+
+ +
+ +
+
+

kraken.containers module

+
+
+class kraken.containers.Segmentation
+

A container class for segmentation or recognition results.

+

In order to allow easy JSON de-/serialization, nested classes for lines +(BaselineLine/BBoxLine) and regions (Region) are reinstantiated from their +dictionaries.

+
+
+type
+

Field indicating if baselines +(kraken.containers.BaselineLine) or bbox +(kraken.containers.BBoxLine) line records are in the +segmentation.

+
+ +
+
+imagename
+

Path to the image associated with the segmentation.

+
+ +
+
+text_direction
+

Sets the principal orientation (of the line), i.e. +horizontal/vertical, and reading direction (of the +document), i.e. lr/rl.

+
+ +
+
+script_detection
+

Flag indicating if the line records have tags.

+
+ +
+
+lines
+

List of line records. Records are expected to be in a valid +reading order.

+
+ +
+
+regions
+

Dict mapping types to lists of regions.

+
+ +
+
+line_orders
+

List of alternative reading orders for the segmentation. +Each reading order is a list of line indices.

+
+ +
+
+imagename: str | os.PathLike
+
+ +
+
+line_orders: List[List[int]] | None = None
+
+ +
+
+lines: List[BaselineLine | BBoxLine] | None = None
+
+ +
+
+regions: Dict[str, List[Region]] | None = None
+
+ +
+
+script_detection: bool
+
+ +
+
+text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']
+
+ +
+
+type: Literal['baselines', 'bbox']
+
+ +
+ +
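A hand-built example of a baseline-type Segmentation with a single line; the coordinates, identifiers, and image name are placeholders (see the BaselineLine class below for the line record fields).

.. code-block:: python

    from kraken.containers import BaselineLine, Segmentation

    line = BaselineLine(id='line_0',
                        baseline=[(10, 40), (400, 40)],
                        boundary=[(10, 20), (400, 20), (400, 60), (10, 60), (10, 20)],
                        text='a transcribed line')

    seg = Segmentation(type='baselines',
                       imagename='page_0.png',
                       text_direction='horizontal-lr',
                       script_detection=False,
                       lines=[line],
                       regions={},
                       line_orders=[])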
+
+class kraken.containers.BaselineLine
+

Baseline-type line record.

+

A container class for a single line in baseline + bounding polygon format, +optionally containing a transcription, tags, or associated regions.

+
+
+id
+

Unique identifier

+
+ +
+
+baseline
+

List of tuples (x_n, y_n) defining the baseline.

+
+ +
+
+boundary
+

List of tuples (x_n, y_n) defining the bounding polygon of +the line. The first and last points should be identical.

+
+ +
+
+text
+

Transcription of this line.

+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+imagename
+

Path to the image associated with the line.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+split
+

Defines whether this line is in the train, validation, or +test set during training.

+
+ +
+
+regions
+

A list of identifiers of regions the line is associated with.

+
+ +
+
+base_dir: Literal['L', 'R'] | None = None
+
+ +
+
+baseline: List[Tuple[int, int]]
+
+ +
+
+boundary: List[Tuple[int, int]]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+regions: List[str] | None = None
+
+ +
+
+split: Literal['train', 'validation', 'test'] | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+
+text: str | None = None
+
+ +
+
+type: str = 'baselines'
+
+ +
+ +
+
+class kraken.containers.BBoxLine
+

Bounding box-type line record.

+

A container class for a single line in axis-aligned bounding box format, +optionally containing a transcription, tags, or associated regions.

+
+
+id
+

Unique identifier

+
+ +
+
+bbox
+

Tuple in form (xmin, ymin, xmax, ymax) defining +the bounding box.

+
+ +
+
+text
+

Transcription of this line.

+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+imagename
+

Path to the image associated with the line.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+split
+

Defines whether this line is in the train, validation, or +test set during training.

+
+ +
+
+regions
+

A list of identifiers of regions the line is associated with.

+
+ +
+
+text_direction
+

Sets the principal orientation (of the line) and +reading direction (of the document).

+
+ +
+
+base_dir: Literal['L', 'R'] | None = None
+
+ +
+
+bbox: Tuple[int, int, int, int]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+regions: List[str] | None = None
+
+ +
+
+split: Literal['train', 'validation', 'test'] | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+
+text: str | None = None
+
+ +
+
+text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl'] = 'horizontal-lr'
+
+ +
+
+type: str = 'bbox'
+
+ +
+ +
+
+class kraken.containers.Region
+

Container class of a single polygonal region.

+
+
+id
+

Unique identifier

+
+ +
+
+boundary
+

List of tuples (x_n, y_n) defining the bounding polygon of +the region. The first and last points should be identical.

+
+ +
+
+imagename
+

Path to the image associated with the region.

+
+ +
+
+tags
+

A dict mapping types to values.

+
+ +
+
+boundary: List[Tuple[int, int]]
+
+ +
+
+id: str
+
+ +
+
+imagename: str | os.PathLike | None = None
+
+ +
+
+tags: Dict[str, str] | None = None
+
+ +
+ +
+
+class kraken.containers.ocr_record(prediction, cuts, confidences, display_order=True)
+

A record object containing the recognition result of a single line

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Union[Tuple[int, int], Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]]])

  • +
  • confidences (List[float])

  • +
  • display_order (bool)

  • +
+
+
+
+
+base_dir = None
+
+ +
+
+property confidences: List[float]
+
+
Return type:
+

List[float]

+
+
+
+ +
+
+property cuts: List
+
+
Return type:
+

List

+
+
+
+ +
+
+abstract display_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+abstract logical_order(base_dir)
+
+
Return type:
+

ocr_record

+
+
+
+ +
+
+property prediction: str
+
+
Return type:
+

str

+
+
+
+ +
+
+abstract property type
+
+ +
+ +
+
+class kraken.containers.BaselineOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)
+

A record object containing the recognition result of a single line in +baseline format.

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Tuple[int, int]])

  • +
  • confidences (List[float])

  • +
  • line (Union[BaselineLine, Dict[str, Any]])

  • +
  • base_dir (Optional[Literal['L', 'R']])

  • +
  • display_order (bool)

  • +
+
+
+
+
+type
+

‘baselines’ to indicate a baseline record

+
+ +
+
+prediction
+

The text predicted by the network as one continuous string.

+
+
Return type:
+

str

+
+
+
+ +
+
+cuts
+

The absolute bounding polygons for each code point in prediction +as a list of tuples [(x0, y0), (x1, y1), …].

+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+confidences
+

A list of floats indicating the confidence value of each +code point.

+
+
Return type:
+

List[float]

+
+
+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+display_order
+

Flag indicating the order of the code points in the +prediction. In display order (True) the n-th code +point in the string corresponds to the n-th leftmost +code point, in logical order (False) the n-th code +point corresponds to the n-th read code point. See [UAX +#9](https://unicode.org/reports/tr9) for more details.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']])

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +

Notes

+

When slicing the record the behavior of the cuts is changed from +earlier versions of kraken. Instead of returning per-character bounding +polygons, a single polygon section of the line bounding polygon +starting at the first and extending to the last code point emitted by +the network is returned. This aids numerical stability when computing +aggregated bounding polygons such as for words. Individual code point +bounding polygons are still accessible through the cuts attribute or +by iterating over the record code point by code point.

+
+
+base_dir
+
+ +
+
+property cuts: List[Tuple[int, int]]
+
+
Return type:
+

List[Tuple[int, int]]

+
+
+
+ +
+
+display_order(base_dir=None)
+

Returns the OCR record in Unicode display order, i.e. ordered from left +to right inside the line.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +
+
+logical_order(base_dir=None)
+

Returns the OCR record in Unicode logical order, i.e. in the order the +characters in the line would be read by a human.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BaselineOCRRecord

+
+
+
+ +
+
+type = 'baselines'
+
+ +
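A sketch of building a record by hand and converting it between display and logical order; the prediction, cuts, and confidences are dummy values standing in for actual recognizer output.

.. code-block:: python

    from kraken.containers import BaselineLine, BaselineOCRRecord

    line = BaselineLine(id='line_0',
                        baseline=[(0, 10), (300, 10)],
                        boundary=[(0, 0), (300, 0), (300, 20), (0, 20), (0, 0)])

    rec = BaselineOCRRecord(prediction='abc',
                            cuts=[(0, 100), (100, 200), (200, 300)],
                            confidences=[0.9, 0.8, 0.95],
                            line=line,
                            display_order=True)

    logical = rec.logical_order()   # code points reordered for reading order
    print(logical.prediction)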
+ +
+
+class kraken.containers.BBoxOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)
+

A record object containing the recognition result of a single line in +bbox format.

+
+
Parameters:
+
    +
  • prediction (str)

  • +
  • cuts (List[Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]])

  • +
  • confidences (List[float])

  • +
  • line (Union[BBoxLine, Dict[str, Any]])

  • +
  • base_dir (Optional[Literal['L', 'R']])

  • +
  • display_order (bool)

  • +
+
+
+
+
+type
+

‘bbox’ to indicate a bounding box record

+
+ +
+
+prediction
+

The text predicted by the network as one continuous string.

+
+
Return type:
+

str

+
+
+
+ +
+
+cuts
+

The absolute bounding polygons for each code point in prediction +as a list of 4-tuples ((x0, y0), (x1, y0), (x1, y1), (x0, y1)).

+
+
Return type:
+

List

+
+
+
+ +
+
+confidences
+

A list of floats indicating the confidence value of each +code point.

+
+
Return type:
+

List[float]

+
+
+
+ +
+
+base_dir
+

An optional string defining the base direction (also called +paragraph direction) for the BiDi algorithm. Valid values are +‘L’ or ‘R’. If None is given the default auto-resolution will +be used.

+
+ +
+
+display_order
+

Flag indicating the order of the code points in the +prediction. In display order (True) the n-th code +point in the string corresponds to the n-th leftmost +code point, in logical order (False) the n-th code +point corresponds to the n-th read code point. See [UAX +#9](https://unicode.org/reports/tr9) for more details.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']])

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +

Notes

+

When slicing the record the behavior of the cuts is changed from +earlier versions of kraken. Instead of returning per-character bounding +polygons, a single polygon section of the line bounding polygon +starting at the first and extending to the last code point emitted by +the network is returned. This aids numerical stability when computing +aggregated bounding polygons such as for words. Individual code point +bounding polygons are still accessible through the cuts attribute or +by iterating over the record code point by code point.

+
+
+base_dir
+
+ +
+
+display_order(base_dir=None)
+

Returns the OCR record in Unicode display order, i.e. ordered from left +to right inside the line.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +
+
+logical_order(base_dir=None)
+

Returns the OCR record in Unicode logical order, i.e. in the order the +characters in the line would be read by a human.

+
+
Parameters:
+

base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also +called paragraph direction) for the BiDi algorithm. Valid +values are ‘L’ or ‘R’. If None is given the default +auto-resolution will be used.

+
+
Return type:
+

BBoxOCRRecord

+
+
+
+ +
+
+type = 'bbox'
+
+ +
+ +
+
+class kraken.containers.ProcessingStep
+

A processing step in the recognition pipeline.

+
+
+id
+

Unique identifier

+
+ +
+
+category
+

Category of processing step that has been performed.

+
+ +
+
+description
+

Natural-language description of the process.

+
+ +
+
+settings
+

Dict describing the parameters of the processing step.

+
+ +
+
+category: Literal['preprocessing', 'processing', 'postprocessing']
+
+ +
+
+description: str
+
+ +
+
+id: str
+
+ +
+
+settings: Dict[str, Dict | str | float | int | bool]
+
+ +
+ +
+
+

kraken.lib.ctc_decoder

+
+
+kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)
+

Translates the network output back to a label sequence using +same-prefix-merge beam search decoding as described in [0].

+

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech +recognition using bi-directional recurrent DNNs.” arXiv preprint +arXiv:1408.2873 (2014).

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • beam_size (int) – Size of the beam

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, prob).

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+kraken.lib.ctc_decoder.greedy_decoder(outputs)
+

Translates the network output back to a label sequence using greedy/best +path decoding as described in [0].

+

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling +unsegmented sequence data with recurrent neural networks.” Proceedings of +the 23rd international conference on Machine learning. ACM, 2006.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
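A toy invocation of the greedy decoder on a hand-written (C, W) softmax matrix (three classes, class 0 being the CTC blank); the numbers are illustrative only.

.. code-block:: python

    import numpy as np
    from kraken.lib.ctc_decoder import greedy_decoder

    # (C, W) softmax output: 3 classes over 5 time steps.
    outputs = np.array([[0.90, 0.10, 0.80, 0.10, 0.90],
                        [0.05, 0.80, 0.10, 0.10, 0.05],
                        [0.05, 0.10, 0.10, 0.80, 0.05]])
    # Yields (class, start, end, max) tuples for the non-blank regions.
    print(greedy_decoder(outputs))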
+
+kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)
+

Translates the network output back to a label sequence in the same way as +the original ocropy/clstm.

+

Thresholds on class 0, then assigns the maximum (non-zero) class to each +region.

+
+
Parameters:
+
    +
  • output – (C, W) shaped softmax output tensor

  • +
  • threshold (float) – Threshold for 0 class when determining possible label +locations.

  • +
  • outputs (numpy.ndarray)

  • +
+
+
Returns:
+

A list with tuples (class, start, end, max). max is the maximum value +of the softmax layer in the region.

+
+
Return type:
+

List[Tuple[int, int, int, float]]

+
+
+
+ +
+
+

kraken.lib.exceptions

+
+
+class kraken.lib.exceptions.KrakenCodecException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenStopTrainingException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenEncodeException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRecordException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInvalidModelException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenInputException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenRepoException(message=None)
+

Common base class for all non-exit exceptions.

+
+ +
+
+class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)
+

Raised when the Cairo surface couldn’t be created.

+
+
Parameters:
+
    +
  • message (str)

  • +
  • width (int)

  • +
  • height (int)

  • +
+
+
+
+
+message
+

Error message

+
+
Type:
+

str

+
+
+
+ +
+
+width
+

Width of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+

Height of the surface

+
+
Type:
+

int

+
+
+
+ +
+
+height
+
+ +
+
+message
+
+ +
+
+width
+
+ +
+ +
+
+

kraken.lib.models module

+
+
+class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')
+

A wrapper class around a TorchVGSLModel for text recognition.

+
+
Parameters:
+
+
+
+
+
+codec
+
+ +
+
+decoder
+
+ +
+
+device
+
+ +
+
+forward(line, lens=None)
+

Performs a forward pass on a torch tensor of one or more lines with +shape (N, C, H, W) and returns a numpy array (N, W, C).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

Tuple with (N, W, C) shaped numpy array and final output sequence +lengths.

+
+
Raises:
+

KrakenInputException – Is raised if the channel dimension isn’t of +size 1 in the network output.

+
+
Return type:
+

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

+
+
+
+ +
+
+kind = ''
+
+ +
+
+nn
+
+ +
+
+one_channel_mode
+
+ +
+
+predict(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns the decoding as a list of tuples (string, start, end, +confidence).

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

  • +
+
+
Returns:
+

List of decoded sequences.

+
+
Return type:
+

List[List[Tuple[str, int, int, float]]]

+
+
+
+ +
+
+predict_labels(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a list of tuples (class, start, end, max). Max is the +maximum value of the softmax layer in the region.

+
+
Parameters:
+
    +
  • line (torch.tensor)

  • +
  • lens (torch.Tensor)

  • +
+
+
Return type:
+

List[List[Tuple[int, int, int, float]]]

+
+
+
+ +
+
+predict_string(line, lens=None)
+

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) +and returns a string of the results.

+
+
Parameters:
+
    +
  • line (torch.Tensor) – NCHW line tensor

  • +
  • lens (Optional[torch.Tensor]) – Optional tensor containing the sequence lengths of the input batch.

  • +
+
+
Return type:
+

List[str]

+
+
+
+ +
+
+seg_type
+
+ +
+
+to(device)
+

Moves model to device and automatically loads input tensors onto it.

+
+ +
+
+train
+
+ +
+ +
+
+kraken.lib.models.load_any(fname, train=False, device='cpu')
+

Loads anything that was, is, and will be a valid ocropus model and +instantiates a shiny new kraken.lib.models.TorchSeqRecognizer from the RNN +configuration in the file.

+

Currently it recognizes the following kinds of models:

+
+
    +
  • protobuf models containing VGSL segmentation and recognition +networks.

  • +
+
+

Additionally an attribute ‘kind’ will be added to the SeqRecognizer +containing a string representation of the source kind. Current known values +are:

+
+
    +
  • vgsl for VGSL models

  • +
+
+
+
Parameters:
+
    +
  • fname (Union[os.PathLike, str]) – Path to the model

  • +
  • train (bool) – Enables gradient calculation and dropout layers in model.

  • +
  • device (str) – Target device

  • +
+
+
Returns:
+

A kraken.lib.models.TorchSeqRecognizer object.

+
+
Raises:
+

KrakenInvalidModelException – if the model is not loadable by any parser.

+
+
Return type:
+

TorchSeqRecognizer

+
+
+
+ +
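A minimal sketch of loading a recognition model; the model path is a placeholder.

.. code-block:: python

    from kraken.lib import models

    rec = models.load_any('en_best.mlmodel', device='cpu')
    print(rec.kind)       # source kind of the model, e.g. 'vgsl'
    print(rec.seg_type)   # segmentation type the model expects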
+
+

kraken.lib.segmentation module

+
+
+kraken.lib.segmentation.reading_order(lines, text_direction='lr')
+

Given the list of lines (a list of 2D slices), computes +the partial reading order. The output is a binary 2D array +such that order[i,j] is true if line i comes before line j +in reading order.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[slice, slice]])

  • +
  • text_direction (Literal['lr', 'rl'])

  • +
+
+
Return type:
+

numpy.ndarray

+
+
+
+ +
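A small example of the partial reading order for two stacked lines given as (row slice, column slice) pairs; the coordinates are arbitrary.

.. code-block:: python

    from kraken.lib.segmentation import reading_order

    lines = [(slice(0, 20), slice(0, 200)),    # upper line
             (slice(30, 50), slice(0, 200))]   # lower line
    order = reading_order(lines, text_direction='lr')
    print(order)   # order[i, j] is true if line i comes before line j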
+
+kraken.lib.segmentation.neural_reading_order(lines, text_direction='lr', regions=None, im_size=None, model=None, class_mapping=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.

  • +
  • model (kraken.lib.vgsl.TorchVGSLModel) – torch Module for

  • +
  • text_direction (str)

  • +
  • regions (Optional[Sequence[shapely.geometry.Polygon]])

  • +
  • im_size (Tuple[int, int])

  • +
  • class_mapping (Dict[str, int])

  • +
+
+
Returns:
+

The indices of the ordered input.

+
+
Return type:
+

Sequence[int]

+
+
+
+ +
+
+kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)
+

Given a list of baselines and regions, calculates the correct reading order +and applies it to the input.

+
+
Parameters:
+
    +
  • lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.

  • +
  • regions (Optional[Sequence[shapely.geometry.Polygon]]) – List of region polygons.

  • +
  • text_direction (Literal['lr', 'rl']) – Set principal text direction for column ordering. Can +be ‘lr’ or ‘rl’

  • +
+
+
Returns:
+

The indices of the ordered input.

+
+
Return type:
+

Sequence[int]

+
+
+
+ +
+
+kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5, text_direction='horizontal')
+

Vectorizes lines from a binarized array.

+
+
Parameters:
+
    +
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension +being probabilities for (start_separators, +end_separators, baseline).

  • +
  • threshold (float) – Threshold for baseline blob detection.

  • +
  • min_length (int) – Minimal length of output baselines.

  • +
  • text_direction (str) – Base orientation of the text line (horizontal or +vertical).

  • +
+
+
Returns:
+

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] +A list of lists containing the points of all baseline polylines.

+
+
+
+ +
+
+kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False, raise_on_error=False)
+

Given a list of baselines and an input image, calculates a polygonal +environment around each baseline.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – grayscale input image (mode ‘L’)

  • +
  • baselines (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing a single baseline per entry.

  • +
  • suppl_obj (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing additional polylines that should be +considered hard boundaries for polygonizaton purposes. Can +be used to prevent polygonization into non-text areas such +as illustrations or to compute the polygonization of a +subset of the lines in an image.

  • +
  • im_feats (numpy.ndarray) – An optional precomputed seamcarve energy map. Overrides data +in im. The default map is gaussian_filter(sobel(im), 2).

  • +
  • scale (Tuple[int, int]) – A 2-tuple (h, w) containing optional scale factors of the input. +Values of 0 are used for aspect-preserving scaling. None skips +input scaling.

  • +
  • topline (bool) – Switch to change default baseline location for offset +calculation purposes. If set to False, baselines are assumed +to be on the bottom of the text line and will be offset +upwards, if set to True, baselines are on the top and will be +offset downwards. If set to None, no offset will be applied.

  • +
  • raise_on_error (bool) – Raises error instead of logging them when they are +not-blocking

  • +
+
+
Returns:
+

List of lists of coordinates. If no polygonization could be computed for +a baseline, None is returned instead.

+
+
+
+ +
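A sketch of polygonizing a single synthetic baseline on a grayscale page image; the file name and coordinates are placeholders.

.. code-block:: python

    from PIL import Image
    from kraken.lib.segmentation import calculate_polygonal_environment

    im = Image.open('page_0.png').convert('L')
    baselines = [[(10, 40), (400, 40)]]
    polygons = calculate_polygonal_environment(im, baselines)
    # polygons[0] is the bounding polygon computed around the first baseline (or None).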
+
+kraken.lib.segmentation.scale_polygonal_lines(lines, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines (Sequence[Tuple[List, List]]) – List of tuples containing the baseline and its polygonization.

  • +
  • scale (Union[float, Tuple[float, float]]) – Scaling factor

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.scale_regions(regions, scale)
+

Scales baselines/polygon coordinates by a certain factor.

+
+
Parameters:
+
    +
  • lines – List of tuples containing the baseline and its polygonization.

  • +
  • scale (Union[float, Tuple[float, float]]) – Scaling factor

  • +
  • regions (Sequence[Tuple[List[int], List[int]]])

  • +
+
+
Return type:
+

Sequence[Tuple[List, List]]

+
+
+
+ +
+
+kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)
+

Given a baseline, polygonal boundary, and two points on the baseline, returns +the rectangle formed by the orthogonal cuts on that baseline segment. The +resulting polygon is not guaranteed to have a non-zero area.

+

The distance can be larger than the actual length of the baseline if the +baseline endpoints are inside the bounding polygon. In that case the +baseline will be extrapolated to the polygon edge.

+
+
Parameters:
+
    +
  • baseline (Sequence[Tuple[int, int]]) – A polyline ((x1, y1), …, (xn, yn))

  • +
  • boundary (Sequence[Tuple[int, int]]) – A bounding polygon around the baseline (same format as +baseline). Last and first point are automatically connected.

  • +
  • dist1 (int) – Absolute distance along the baseline of the first point.

  • +
  • dist2 (int) – Absolute distance along the baseline of the second point.

  • +
+
+
Returns:
+

A sequence of polygon points.

+
+
Return type:
+

Tuple[Tuple[int, int]]

+
+
+
+ +
+
+kraken.lib.segmentation.extract_polygons(im, bounds, legacy=False)
+

Yields the subimages of image im defined by the line records in bounds, +preserving their order.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • bounds (kraken.containers.Segmentation) – A Segmentation class containing a bounding box or baseline +segmentation.

  • +
  • legacy (bool) – Use the old, slow, and deprecated path

  • +
+
+
Yields:
+

The extracted subimage, and the corresponding bounding box or baseline

+
+
Return type:
+

Generator[Tuple[PIL.Image.Image, Union[kraken.containers.BBoxLine, kraken.containers.BaselineLine]], None, None]

+
+
+
+ +
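A sketch of cutting line images out of a page; the image path is a placeholder and `seg` stands for a kraken.containers.Segmentation such as the one constructed in the containers example above.

.. code-block:: python

    from PIL import Image
    from kraken.lib.segmentation import extract_polygons

    im = Image.open('page_0.png')
    # `seg` is a kraken.containers.Segmentation, e.g. produced by the segmenter.
    for line_im, line in extract_polygons(im, seg):
        line_im.save(f'line_{line.id}.png')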
+
+

kraken.lib.vgsl module

+
+
+class kraken.lib.vgsl.TorchVGSLModel(spec)
+

Class building a torch module from a VGSL spec.

+

The initialized class will contain a variable number of layers and a loss +function. Inputs and outputs are always 4D tensors in order (batch, +channels, height, width) with channels always being the feature dimension.

+

Importantly this means that a recurrent network will be fed the channel +vector at each step along its time axis, i.e. either put the non-time-axis +dimension into the channels dimension or use a summarizing RNN that squashes +the time axis to 1 and puts the output into the channels dimension.

+
+
Parameters:
+

spec (str)

+
+
+
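An illustrative spec (not taken from this page) building a small recognition network: a 3x3 convolution, 2x2 max pooling, a reshape collapsing the remaining height into the channel dimension, a bidirectional LSTM, and a CTC output layer.

.. code-block:: python

    from kraken.lib.vgsl import TorchVGSLModel

    net = TorchVGSLModel('[1,48,0,1 Cr3,3,32 Mp2,2 S1(1x24)1,3 Lbx100 O1c58]')
    print(net.input)   # expected input tensor shape as a 4-tuple
    print(net.spec)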
+
+input
+

Expected input tensor as a 4-tuple.

+
+ +
+
+nn
+

Stack of layers parsed from the spec.

+
+ +
+
+criterion
+

Fully parametrized loss function.

+
+ +
+
+user_metadata
+

dict with user defined metadata. Is flushed into +model file during saving/overwritten by loading +operations.

+
+ +
+
+one_channel_mode
+

Field indicating the image type used during +training of one-channel images. Is ‘1’ for +models trained on binarized images, ‘L’ for +grayscale, and None otherwise.

+
+ +
+
+add_codec(codec)
+

Adds a PytorchCodec to the model.

+
+
Parameters:
+

codec (kraken.lib.codec.PytorchCodec)

+
+
Return type:
+

None

+
+
+
+ +
+
+append(idx, spec)
+

Splits a model at layer idx and appends the layers in spec.

+

New layers are initialized using the init_weights method.

+
+
Parameters:
+
    +
  • idx (int) – Index of layer to append spec to starting with 1. To +select the whole layer stack set idx to None.

  • +
  • spec (str) – VGSL spec without input block to append to model.

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+property aux_layers
+
+ +
+
+blocks
+
+ +
+
+build_addition(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_conv(input, blocks, idx, target_output_shape=None)
+

Builds a 2D convolution layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_dropout(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_groupnorm(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_identity(input, blocks, idx, target_output_shape=None)
+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_maxpool(input, blocks, idx, target_output_shape=None)
+

Builds a maxpool layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_output(input, blocks, idx, target_output_shape=None)
+

Builds an output layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_parallel(input, blocks, idx, target_output_shape=None)
+

Builds a block of parallel layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_reshape(input, blocks, idx, target_output_shape=None)
+

Builds a reshape layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_rnn(input, blocks, idx, target_output_shape=None)
+

Builds an LSTM/GRU layer and returns the number of outputs and the layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_ro(input, blocks, idx)
+

Builds a reading order (RO) determination layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_series(input, blocks, idx, target_output_shape=None)
+

Builds a serial block of layers.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+build_wav2vec2(input, blocks, idx, target_output_shape=None)
+

Builds a Wav2Vec2 masking layer.

+
+
Parameters:
+
    +
  • input (Tuple[int, int, int, int])

  • +
  • blocks (List[str])

  • +
  • idx (int)

  • +
  • target_output_shape (Optional[Tuple[int, int, int, int]])

  • +
+
+
Return type:
+

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

+
+
+
+ +
+
+codec: kraken.lib.codec.PytorchCodec | None = None
+
+ +
+
+criterion: Any = None
+
+ +
+
+eval()
+

Sets the model to evaluation/inference mode, disabling dropout and +gradient calculation.

+
+
Return type:
+

None

+
+
+
+ +
+
+property hyper_params
+
+ +
+
+idx
+
+ +
+
+init_weights(idx=slice(0, None))
+

Initializes weights for all or a subset of layers in the graph.

+

LSTM/GRU layers are orthogonally initialized, convolutional layers +uniformly from (-0.1,0.1).

+
+
Parameters:
+

idx (slice) – A slice object representing the indices of layers to +initialize.

+
+
Return type:
+

None

+
+
+
+ +
+
+input
+
+ +
+
+classmethod load_model(path)
+

Deserializes a VGSL model from a CoreML file.

+
+
Parameters:
+

path (Union[str, os.PathLike]) – CoreML file

+
+
Returns:
+

A TorchVGSLModel instance.

+
+
Raises:
+
    +
  • KrakenInvalidModelException – if the model data is invalid (not a string, protobuf file, or without appropriate metadata).

  • FileNotFoundError – if the path doesn't point to a file.

  • +
+
+
+
+ +
+
+m
+
+ +
+
+property model_type
+
+ +
+
+named_spec: List[str] = []
+
+ +
+
+nn
+
+ +
+
+property one_channel_mode
+
+ +
+
+ops
+
+ +
+
+pattern
+
+ +
+
+resize_output(output_size, del_indices=None)
+

Resizes an output layer.

+
+
Parameters:
+
    +
  • output_size (int) – New size/output channels of last layer

  • +
  • del_indices (list) – list of outputs to delete from layer

  • +
+
+
Return type:
+

None

+
+
+
+ +
+
+save_model(path)
+

Serializes the model into path.

+
+
Parameters:
+

path (str) – Target destination

+
+
+
+ +
+
+property seg_type
+
+ +
+
+set_num_threads(num)
+

Sets the number of OpenMP threads to use.

+
+
Parameters:
+

num (int)

+
+
Return type:
+

None

+
+
+
+ +
+
+spec
+
+ +
+
+to(device)
+
+
Parameters:
+

device (Union[str, torch.device])

+
+
Return type:
+

None

+
+
+
+ +
+
+train()
+

Sets the model to training mode (enables dropout layers and disables +softmax on CTC layers).

+
+
Return type:
+

None

+
+
+
+ +
+
+property use_legacy_polygons
+
+ +
+
+user_metadata: Dict[str, Any]
+
+ +
+ +
+
+

kraken.lib.xml module

+
+
+class kraken.lib.xml.XMLPage(filename, filetype='xml')
+
+
Parameters:
+
    +
  • filename (Union[str, os.PathLike])

  • +
  • filetype (Literal['xml', 'alto', 'page'])

  • +
+
+
+
+ +
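
A short sketch of parsing a document with XMLPage; the file name is hypothetical and the to_container() accessor returning a kraken.containers.Segmentation is an assumption not documented above.

+
from kraken.lib.xml import XMLPage
+
+page = XMLPage('0001.xml', filetype='xml')   # hypothetical file, format auto-detected
+seg = page.to_container()                    # assumed Segmentation accessor
+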
+
+
+

Training

+
+

kraken.lib.train module

+
+
+

Loss and Evaluation Functions

+
+
+

Trainer

+
+
+class kraken.lib.train.KrakenTrainer(enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, freeze_backbone=-1, pl_logger=None, log_dir=None, *args, **kwargs)
+
+
Parameters:
+
    +
  • enable_progress_bar (bool)

  • +
  • enable_summary (bool)

  • +
  • min_epochs (int)

  • +
  • max_epochs (int)

  • +
  • pl_logger (Union[lightning.pytorch.loggers.logger.Logger, str, None])

  • +
  • log_dir (Optional[os.PathLike])

  • +
+
+
+
+
+automatic_optimization = False
+
+ +
+
+fit(*args, **kwargs)
+
+ +
+ +
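
A hedged sketch of driving a training run through the trainer. The RecognitionModel task module and its keyword arguments are assumptions not documented above, and the XML paths are hypothetical.

+
from kraken.lib.train import KrakenTrainer, RecognitionModel   # RecognitionModel assumed
+
+model = RecognitionModel(format_type='xml',
+                         training_data=['train_0001.xml'],     # hypothetical files
+                         evaluation_data=['val_0001.xml'])
+trainer = KrakenTrainer(min_epochs=5, max_epochs=50)
+trainer.fit(model)
+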
+
+

kraken.lib.dataset module

+
+

Recognition datasets

+
+
+class kraken.lib.dataset.ArrowIPCRecognitionDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, split_filter=None)
+

Dataset for training a recognition model from a precompiled dataset in +Arrow IPC format.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, Literal['L', 'R']])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • split_filter (Optional[str])

  • +
+
+
+
+
+add(file)
+

Adds an Arrow IPC file to the dataset.

+
+
Parameters:
+

file (Union[str, os.PathLike]) – Location of the precompiled dataset file.

+
+
Return type:
+

None

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+arrow_table = None
+
+ +
+
+aug = None
+
+ +
+
+codec = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
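
A minimal sketch of assembling the dataset from a hypothetical precompiled file; encode() is called without an explicit codec here, which is assumed to derive one from the dataset alphabet.

+
from kraken.lib.dataset import ArrowIPCRecognitionDataset
+
+ds = ArrowIPCRecognitionDataset(split_filter='train')
+ds.add('dataset.arrow')   # hypothetical precompiled Arrow IPC file
+ds.encode()               # attach a codec before sampling
+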
+
+failed_samples
+
+ +
+
+im_mode
+
+ +
+
+legacy_polygons_status = None
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+rebuild_alphabet()
+

Recomputes the alphabet depending on the given text transformation.

+
+ +
+
+seg_type = None
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+class kraken.lib.dataset.BaselineSet(line_width=4, padding=(0, 0, 0, 0), im_transforms=transforms.Compose([]), augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)
+

Dataset for training a baseline/region segmentation model.

+
+
Parameters:
+
    +
  • line_width (int)

  • +
  • padding (Tuple[int, int, int, int])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • valid_baselines (Sequence[str])

  • +
  • merge_baselines (Dict[str, Sequence[str]])

  • +
  • valid_regions (Sequence[str])

  • +
  • merge_regions (Dict[str, Sequence[str]])

  • +
+
+
+
+
+add(doc)
+

Adds a page to the dataset.

+
+
Parameters:
+

doc (kraken.containers.Segmentation) – A Segmentation container class.

+
+
+
+ +
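
A hedged sketch of populating the segmentation training set from a parsed XML document; the file name is hypothetical and XMLPage.to_container() is an assumed accessor yielding the Segmentation container expected by add().

+
from kraken.lib.dataset import BaselineSet
+from kraken.lib.xml import XMLPage
+
+ds = BaselineSet(line_width=4)
+ds.add(XMLPage('0001.xml').to_container())   # hypothetical file, assumed accessor
+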
+
+aug = None
+
+ +
+
+class_mapping
+
+ +
+
+class_stats
+
+ +
+
+failed_samples
+
+ +
+
+im_mode = '1'
+
+ +
+
+imgs = []
+
+ +
+
+line_width
+
+ +
+
+mbl_dict
+
+ +
+
+mreg_dict
+
+ +
+
+num_classes = 2
+
+ +
+
+pad
+
+ +
+
+seg_type = None
+
+ +
+
+targets = []
+
+ +
+
+transform(image, target)
+
+ +
+
+transforms
+
+ +
+
+valid_baselines
+
+ +
+
+valid_regions
+
+ +
+ +
+
+class kraken.lib.dataset.GroundTruthDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)
+

Dataset for training a line recognition model.

+

All data is cached in memory.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, str])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
+
+
+
+
+add(line=None, page=None)
+

Adds an individual line or all lines on a page to the dataset.

+
+
Parameters:
+
+
+
+
+ +
+
+add_line(line)
+

Adds a line to the dataset.

+
+
Parameters:
+

line (kraken.containers.BBoxLine) – BBoxLine container object for a line.

+
+
Raises:
+
    +
  • ValueError if the transcription of the line is empty after

  • +
  • transformation or either baseline or bounding polygon are missing.

  • +
+
+
+
+ +
+
+add_page(page)
+

Adds all lines on a page to the dataset.

+

Invalid lines will be skipped and a warning will be printed.

+
+
Parameters:
+

page (kraken.containers.Segmentation) – Segmentation container object for a page.

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
+
+failed_samples
+
+ +
+
+property im_mode
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+seg_type = 'bbox'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+

Segmentation datasets

+
+
+class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, legacy_polygons=False)
+

Dataset for training a line recognition model from polygonal/baseline data.

+
+
Parameters:
+
    +
  • normalization (Optional[str])

  • +
  • whitespace_normalization (bool)

  • +
  • skip_empty_lines (bool)

  • +
  • reorder (Union[bool, Literal['L', 'R']])

  • +
  • im_transforms (Callable[[Any], torch.Tensor])

  • +
  • augmentation (bool)

  • +
  • legacy_polygons (bool)

  • +
+
+
+
+
+add(line=None, page=None)
+

Adds an individual line or all lines on a page to the dataset.

+
+
Parameters:
+
+
+
+
+ +
+
+add_line(line)
+

Adds a line to the dataset.

+
+
Parameters:
+

line (kraken.containers.BaselineLine) – BaselineLine container object for a line.

+
+
Raises:
+
    +
  • ValueError if the transcription of the line is empty after

  • +
  • transformation or either baseline or bounding polygon are missing.

  • +
+
+
+
+ +
+
+add_page(page)
+

Adds all lines on a page to the dataset.

+

Invalid lines will be skipped and a warning will be printed.

+
+
Parameters:
+

page (kraken.containers.Segmentation) – Segmentation container object for a page.

+
+
+
+ +
+
+alphabet: collections.Counter
+
+ +
+
+aug = None
+
+ +
+
+encode(codec=None)
+

Adds a codec to the dataset and encodes all text lines.

+

Has to be run before sampling from the dataset.

+
+
Parameters:
+

codec (Optional[kraken.lib.codec.PytorchCodec])

+
+
Return type:
+

None

+
+
+
+ +
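
A sketch of preparing recognition training data from baseline-annotated XML; the file names are hypothetical and XMLPage.to_container() is again assumed to yield a Segmentation container.

+
from kraken.lib.dataset import PolygonGTDataset
+from kraken.lib.xml import XMLPage
+
+ds = PolygonGTDataset()
+for f in ('0001.xml', '0002.xml'):            # hypothetical files
+    ds.add_page(XMLPage(f).to_container())    # assumed accessor
+ds.encode()                                   # has to be run before sampling
+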
+
+failed_samples
+
+ +
+
+property im_mode
+
+ +
+
+legacy_polygons
+
+ +
+
+no_encode()
+

Creates an unencoded dataset.

+
+
Return type:
+

None

+
+
+
+ +
+
+seg_type = 'baselines'
+
+ +
+
+skip_empty_lines
+
+ +
+
+text_transforms: List[Callable[[str], str]] = []
+
+ +
+
+transforms
+
+ +
+ +
+
+

Reading order datasets

+
+
+class kraken.lib.dataset.PairWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)
+

Dataset for training a reading order determination model.

+

Returns random pairs of lines from the same page.

+
+
Parameters:
+
    +
  • files (Sequence[Union[os.PathLike, str]])

  • +
  • mode (Optional[Literal['alto', 'page', 'xml']])

  • +
  • level (Literal['regions', 'baselines'])

  • +
  • ro_id (Optional[str])

  • +
  • class_mapping (Optional[Dict[str, int]])

  • +
+
+
+
+
+data = []
+
+ +
+
+failed_samples = []
+
+ +
+
+get_feature_dim()
+
+ +
+ +
+
+class kraken.lib.dataset.PageWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)
+

Dataset for training a reading order determination model.

+

Returns all lines from the same page.

+
+
Parameters:
+
    +
  • files (Sequence[Union[os.PathLike, str]])

  • +
  • mode (Optional[Literal['alto', 'page', 'xml']])

  • +
  • level (Literal['regions', 'baselines'])

  • +
  • ro_id (Optional[str])

  • +
  • class_mapping (Optional[Dict[str, int]])

  • +
+
+
+
+
+data = []
+
+ +
+
+failed_samples = []
+
+ +
+
+get_feature_dim()
+
+ +
+ +
+
+

Helpers

+
+
+class kraken.lib.dataset.ImageInputTransforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)
+
+
Parameters:
+
    +
  • batch (int)

  • +
  • height (int)

  • +
  • width (int)

  • +
  • channels (int)

  • +
  • pad (Union[int, Tuple[int, int], Tuple[int, int, int, int]])

  • +
  • valid_norm (bool)

  • +
  • force_binarization (bool)

  • +
+
+
+
+
+property batch: int
+

Batch size attribute. Ignored.

+
+
Return type:
+

int

+
+
+
+ +
+
+property centerline_norm: bool
+

Attribute indicating if centerline normalization will be applied to +input images.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property channels: int
+

Channels attribute. Can be either 1 (binary/grayscale) or 3 (RGB).

+
+
Return type:
+

int

+
+
+
+ +
+
+property force_binarization: bool
+

Switch enabling/disabling forced binarization.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property height: int
+

Desired output image height. If set to 0, image will be rescaled +proportionally with width, if 1 and channels is larger than 3 output +will be grayscale and of the height set with the channels attribute.

+
+
Return type:
+

int

+
+
+
+ +
+
+property mode: str
+

Imaginary PIL.Image.Image mode of the output tensor. Possible values +are RGB, L, and 1.

+
+
Return type:
+

str

+
+
+
+ +
+
+property pad: int
+

Amount of padding around left/right end of image.

+
+
Return type:
+

int

+
+
+
+ +
+
+property scale: Tuple[int, int]
+

Desired output shape (height, width) of the image. If any value is set +to 0, image will be rescaled proportionally with height, width, if 1 +and channels is larger than 3 output will be grayscale and of the +height set with the channels attribute.

+
+
Return type:
+

Tuple[int, int]

+
+
+
+ +
+
+property valid_norm: bool
+

Switch allowing/disallowing centerline normalization. Even if enabled +won’t be applied to 3-channel images.

+
+
Return type:
+

bool

+
+
+
+ +
+
+property width: int
+

Desired output image width. If set to 0, image will be rescaled +proportionally with height.

+
+
Return type:
+

int

+
+
+
+ +
+ +
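
A minimal sketch instantiating the transform pipeline with values matching a typical recognition input specification of [1,48,0,1] and 16 pixels of horizontal padding; wiring the object into a dataset is shown as an illustration only.

+
from kraken.lib.dataset import ImageInputTransforms, GroundTruthDataset
+
+ts = ImageInputTransforms(batch=1, height=48, width=0, channels=1,
+                          pad=16, valid_norm=True)
+ds = GroundTruthDataset(im_transforms=ts)   # illustrative wiring into a dataset
+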
+
+kraken.lib.dataset.collate_sequences(batch)
+

Sorts and pads sequences.

+
+ +
+
+kraken.lib.dataset.global_align(seq1, seq2)
+

Computes a global alignment of two strings.

+
+
Parameters:
+
    +
  • seq1 (Sequence[Any])

  • +
  • seq2 (Sequence[Any])

  • +
+
+
Return type:
+

Tuple[int, List[str], List[str]]

+
+
+

Returns a tuple (distance, list(algn1), list(algn2))

+
+ +
+
+kraken.lib.dataset.compute_confusions(algn1, algn2)
+

Compute confusion matrices from two globally aligned strings.

+
+
Parameters:
+
    +
  • align1 (Sequence[str]) – sequence 1

  • +
  • align2 (Sequence[str]) – sequence 2

  • +
  • algn1 (Sequence[str])

  • +
  • algn2 (Sequence[str])

  • +
+
+
Returns:
+

A tuple (counts, scripts, ins, dels, subs) with counts being per-character +confusions, scripts per-script counts, ins a dict with per script +insertions, del an integer of the number of deletions, subs per +script substitutions.

+
+
+
+ +
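
A small sketch chaining the two helpers on a pair of example strings; the return values follow the signatures documented above.

+
from kraken.lib.dataset import global_align, compute_confusions
+
+dist, algn_gt, algn_pred = global_align('ground truth', 'gruond trvth')
+counts, scripts, ins, dels, subs = compute_confusions(algn_gt, algn_pred)
+print(dist)   # edit distance between the two example strings
+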
+
+
+
+

Legacy modules

+

These modules are retained for compatibility reasons or highly specialized use +cases. In most cases their use is not necessary and they aren’t further +developed for interoperability with new functionality, e.g. the transcription +and line generation modules do not work with the baseline segmenter.

+
+

kraken.binarization module

+
+
+kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)
+

Performs binarization using non-linear processing.

+
+
Parameters:
+
    +
  • im (PIL.Image.Image) – Input image

  • +
  • threshold (float)

  • +
  • zoom (float) – Zoom for background page estimation

  • +
  • escale (float) – Scale for estimating a mask over the text region

  • +
  • border (float) – Ignore this much of the border

  • +
  • perc (int) – Percentage for filters

  • +
  • range (int) – Range for filters

  • +
  • low (int) – Percentile for black estimation

  • +
  • high (int) – Percentile for white estimation

  • +
+
+
Returns:
+

PIL.Image.Image containing the binarized image

+
+
Raises:
+

KrakenInputException – When trying to binarize an empty image.

+
+
Return type:
+

PIL.Image.Image

+
+
+
+ +
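
A minimal sketch binarizing a scan with the default parameters; the file names are hypothetical.

+
from PIL import Image
+from kraken.binarization import nlbin
+
+im = Image.open('input.tif')   # hypothetical grayscale or color scan
+bw = nlbin(im)                 # returns a bitonal PIL.Image.Image
+bw.save('bw.png')
+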
+
+

kraken.transcribe module

+
+
+class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
+
+
+add_page(im, segmentation=None)
+

Adds an image to the transcription interface, optionally filling in +information from a list of ocr_record objects.

+
+
Parameters:
+
    +
  • im – Input image

  • +
  • segmentation – Output of the segment method.

  • +
+
+
+
+ +
+
+env
+
+ +
+
+font
+
+ +
+
+line_idx = 1
+
+ +
+
+page_idx = 1
+
+ +
+
+pages: List[Dict[Any, Any]] = []
+
+ +
+
+seg_idx = 1
+
+ +
+
+text_direction = 'horizontal-tb'
+
+ +
+
+tmpl
+
+ +
+
+write(fd)
+

Writes the HTML file to a file descriptor.

+
+
Parameters:
+

fd (File) – File descriptor (mode=’wb’) to write to.

+
+
+
+ +
+ +
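
A hedged sketch of producing a transcription page; the image and output file names are hypothetical and the output file is opened in binary write mode, which is assumed to be what write() expects.

+
from PIL import Image
+from kraken.transcribe import TranscriptionInterface
+
+ti = TranscriptionInterface()
+ti.add_page(Image.open('page.tif'))        # hypothetical image, no prefilled segmentation
+with open('transcribe.html', 'wb') as fp:  # binary write mode assumed
+    ti.write(fp)
+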
+
+

kraken.linegen module

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/main/genindex.html b/main/genindex.html new file mode 100644 index 000000000..a6da23c29 --- /dev/null +++ b/main/genindex.html @@ -0,0 +1,926 @@ + + + + + + + Index — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ + +

Index

+ +
+ A + | B + | C + | D + | E + | F + | G + | H + | I + | K + | L + | M + | N + | O + | P + | R + | S + | T + | U + | V + | W + | X + +
+

A

+ + + +
+ +

B

+ + + +
+ +

C

+ + + +
+ +

D

+ + + +
+ +

E

+ + + +
+ +

F

+ + + +
+ +

G

+ + + +
+ +

H

+ + + +
+ +

I

+ + + +
+ +

K

+ + + +
+ +

L

+ + + +
+ +

M

+ + + +
+ +

N

+ + + +
+ +

O

+ + + +
+ +

P

+ + + +
+ +

R

+ + + +
+ +

S

+ + + +
+ +

T

+ + + +
+ +

U

+ + + +
+ +

V

+ + + +
+ +

W

+ + + +
+ +

X

+ + +
+ + + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/main/gpu.html b/main/gpu.html new file mode 100644 index 000000000..c3079237f --- /dev/null +++ b/main/gpu.html @@ -0,0 +1,100 @@ + + + + + + + + GPU Acceleration — kraken documentation + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

GPU Acceleration

+

The latest version of kraken uses a new pytorch backend which enables GPU +acceleration both for training and recognition. Apart from a compatible Nvidia +GPU, CUDA and cuDNN have to be installed so pytorch can run computation on it.

+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/main/index.html b/main/index.html new file mode 100644 index 000000000..7da3bb57b --- /dev/null +++ b/main/index.html @@ -0,0 +1,1040 @@ + + + + + + + + kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

kraken

+
+
+

kraken is a turn-key OCR system optimized for historical and non-Latin script +material.

+
+
+

Features

+

kraken’s main features are:

+
+
+
+

Pull requests and code contributions are always welcome.

+
+
+

Installation

+

Kraken can be run on Linux or Mac OS X (both x64 and ARM). Installation through +the on-board pip utility and the anaconda +scientific computing Python distribution is supported.

+
+

Installation using Pip

+
$ pip install kraken
+
+
+

or by running pip in the git repository:

+
$ pip install .
+
+
+

If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to +install the pdf extras package for PyPi:

+
$ pip install kraken[pdf]
+
+
+

or

+
$ pip install .[pdf]
+
+
+

respectively.

+
+
+

Installation using Conda

+

To install the stable version through conda:

+
$ conda install -c conda-forge -c mittagessen kraken
+
+
+

Again PDF/multi-page TIFF/JPEG2000 support requires some additional dependencies:

+
$ conda install -c conda-forge pyvips
+
+
+

The git repository contains some environment files that aid in setting up the latest development version:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment.yml
+
+
+

or:

+
$ git clone https://github.com/mittagessen/kraken.git
+$ cd kraken
+$ conda env create -f environment_cuda.yml
+
+
+

for CUDA acceleration with the appropriate hardware.

+
+
+

Finding Recognition Models

+

Finally you’ll have to scrounge up a model to do the actual recognition of +characters. To download the default model for printed French text and place it +in the kraken directory for the current user:

+
$ kraken get 10.5281/zenodo.10592716
+
+
+

A list of libre models available in the central repository can be retrieved by +running:

+
$ kraken list
+
+
+

Model metadata can be extracted using:

+
$ kraken show 10.5281/zenodo.10592716
+name: 10.5281/zenodo.10592716
+
+CATMuS-Print (Large, 2024-01-30) - Diachronic model for French prints and other languages
+
+<p><strong>CATMuS-Print (Large) - Diachronic model for French prints and other West European languages</strong></p>
+<p>CATMuS (Consistent Approach to Transcribing ManuScript) Print is a Kraken HTR model trained on data produced by several projects, dealing with different languages (French, Spanish, German, English, Corsican, Catalan, Latin, Italian&hellip;) and different centuries (from the first prints of the 16th c. to digital documents of the 21st century).</p>
+<p>Transcriptions follow graphematic principles and try to be as compatible as possible with guidelines previously published for French: no ligature (except those that still exist), no allographetic variants (except the long s), and preservation of the historical use of some letters (u/v, i/j). Abbreviations are not resolved. Inconsistencies might be present, because transcriptions have been done over several years and the norms have slightly evolved.</p>
+<p>The model is trained with NFKD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.</p>
+<p>This model is the result of the collaboration from researchers from the University of Geneva and Inria Paris and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.</p>
+scripts: Latn
+alphabet: !"#$%&'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz|}~¡£¥§«¬°¶·»¿ÆßæđłŒœƀǝɇΑΒΓΔΕΖΘΙΚΛΜΝΟΠΡΣΤΥΦΧΩαβγδεζηθικλμνξοπρςστυφχωϛחלרᑕᗅᗞᚠẞ–—‘’‚“”„‟†•⁄⁊⁋℟←▽◊★☙✠✺✻⟦⟧⬪ꝑꝓꝗꝙꝟꝯꝵ SPACE, COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT, COMBINING TILDE, COMBINING MACRON, COMBINING DOT ABOVE, COMBINING DIAERESIS, COMBINING RING ABOVE, COMBINING COMMA ABOVE, COMBINING REVERSED COMMA ABOVE, COMBINING CEDILLA, COMBINING OGONEK, COMBINING GREEK PERISPOMENI, COMBINING GREEK YPOGEGRAMMENI, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER U, 0xe682, 0xe68b, 0xe8bf, 0xf1a7
+accuracy: 98.56%
+license: cc-by-4.0
+author(s): Gabay, Simon; Clérice, Thibault
+date: 2024-01-30
+
+
+
+
+
+

Quickstart

+

An OCR system consists of multiple steps, primarily +preprocessing, segmentation, and recognition, each of which takes the output of +the previous step and sometimes additional files such as models and templates +that define how a particular transformation is to be performed.

+

In kraken these are separated into different subcommands that can be chained or +run separately:

[Figure: kraken processing flowchart — an Image is fed into Segmentation (using a Segmentation Model), producing Baselines, Regions, and Order; Recognition (using a Recognition Model) turns these into OCR Records; Serialization (using an Output Template) writes the Output File.]

Recognizing text on an image using the default parameters including the +prerequisite step of page segmentation:

+
$ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+Loading RNN     ✓
+Processing      ⣻
+
+
+

To segment an image into reading-order sorted baselines and regions:

+
$ kraken -i bw.tif lines.json segment -bl
+
+
+

To OCR an image using the previously downloaded model:

+
$ kraken -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+
+
+

To OCR an image using the default model and serialize the output using the ALTO +template:

+
$ kraken -a -i bw.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
+
+
+

All commands and their parameters are documented, just add the standard +--help flag for further information.

+
+
+

Training Tutorial

+

There is a training tutorial at Training kraken.

+
+ +
+

License

+

Kraken is provided under the terms and conditions of the Apache 2.0 +License.

+
+
+

Funding

+

kraken is developed at the École Pratique des Hautes Études, Université PSL.

+
+
+Co-financed by the European Union + +
+
+

This project was partially funded through the RESILIENCE project, funded from +the European Union’s Horizon 2020 Framework Programme for Research and +Innovation.

+
+
+
+
+Received funding from the Programme d’investissements d’Avenir + +
+
+

Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la +Recherche au titre du Programme d’Investissements d’Avenir portant la référence +ANR-21-ESRE-0005 (Biblissima+).

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file diff --git a/main/ketos.html b/main/ketos.html new file mode 100644 index 000000000..e87b04442 --- /dev/null +++ b/main/ketos.html @@ -0,0 +1,959 @@ + + + + + + + + Training — kraken documentation + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Training

+

This page describes the training utilities available through the ketos +command line utility in depth. For a gentle introduction on model training +please refer to the tutorial.

+

There are currently three trainable components in the kraken processing pipeline:
  • Segmentation: finding lines and regions in images
  • Reading Order: ordering lines found in the previous segmentation step. Reading order models are closely linked to segmentation models and both are usually trained on the same dataset.
  • Recognition: recognition models transform images of lines into text.

+

Depending on the use case it is not necessary to manually train new models for +each material. The default segmentation model works well on quite a variety of +handwritten and printed documents, a reading order model might not perform +better than the default heuristic for simple text flows, and there are +recognition models for some types of material available in the repository.

+
+

Best practices

+
+

Recognition model training

+
    +
  • The default architecture works well for decently sized datasets.

  • +
  • Use precompiled binary datasets and put them in a place where they can be memory mapped during training (local storage, not NFS or similar).

  • +
Fixed splits in precompiled datasets increase memory use and slow down startup as the whole dataset needs to be loaded into main memory once. It is recommended to create explicit splits by compiling source XML files into separate datasets.

  • +
  • Use the --logger flag to track your training metrics across experiments using Tensorboard.

  • +
  • If the network doesn’t converge before the early stopping aborts training, increase --min-epochs or --lag. Use the --logger option to inspect your training loss.

  • +
  • Use the flag --augment to activate data augmentation.

  • +
Increase the number of --workers to speed up data loading. This is essential when you use the --augment option.

  • +
  • When using an Nvidia GPU, set the --precision option to 16 to use automatic mixed precision (AMP). This can provide significant speedup without any loss in accuracy.

  • +
  • Use option -B to scale batch size until GPU utilization reaches 100%. When using a larger batch size, it is recommended to use option -r to scale the learning rate by the square root of the batch size (1e-3 * sqrt(batch_size)).

  • +
  • When fine-tuning, it is recommended to use new mode not union as the network will rapidly unlearn missing labels in the new dataset.

  • +
If the new dataset is fairly dissimilar or your base model has been pretrained with ketos pretrain, use --warmup in conjunction with --freeze-backbone for 1 or 2 epochs.

  • +
  • Upload your models to the model repository.

  • +
+
+
+

Segmentation model training

+
    +
  • The segmenter is fairly robust when it comes to hyperparameter choice.

  • +
  • Start by finetuning from the default model for a fixed number of epochs (50 for reasonably sized datasets) with a cosine schedule.

  • +
Segmentation models’ performance is difficult to evaluate. Pixel accuracy doesn’t mean much because background pixels vastly outnumber the pixels belonging to lines or regions. Frequency-weighted IoU is good for overall performance, while mean IoU overrepresents rare classes. The best way to evaluate segmentation models is to look at the output on unlabelled data.

  • +
  • If you don’t have rare classes you can use a fairly small validation set to make sure everything is converging and just visually validate on unlabelled data.

  • +
+
+
+
+

Training data formats

+

The training tools accept a variety of training data formats, usually some kind +of custom low level format, the XML-based formats that are commonly used for +archival of annotation and transcription data, and in the case of recognizer +training a precompiled binary format. It is recommended to use the XML formats +for segmentation and reading order training and the binary format for +recognition training.

+
+

ALTO

+

Kraken parses and produces files according to ALTO 4.3. An example showing the +attributes necessary for segmentation, recognition, and reading order training +follows:

+
<?xml version="1.0" encoding="UTF-8"?>
+<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xmlns="http://www.loc.gov/standards/alto/ns-v4#"
+	xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd">
+	<Description>
+		<sourceImageInformation>
+			<fileName>filename.jpg</fileName><!-- relative path in relation to XML location of the image file-->
+		</sourceImageInformation>
+		....
+	</Description>
+	<Layout>
+		<Page...>
+			<PrintSpace...>
+				<ComposedBlockType ID="block_I"
+						   HPOS="125"
+						   VPOS="523"
+						   WIDTH="5234"
+						   HEIGHT="4000"
+						   TYPE="region_type"><!-- for textlines part of a semantic region -->
+					<TextBlock ID="textblock_N">
+						<TextLine ID="line_0"
+							  HPOS="..."
+							  VPOS="..."
+							  WIDTH="..."
+							  HEIGHT="..."
+							  BASELINE="10 20 15 20 400 20"><!-- necessary for segmentation training -->
+							<String ID="segment_K"
+								CONTENT="word_text"><!-- necessary for recognition training. Text is retrieved from <String> and <SP> tags. Lower level glyphs are ignored. -->
+								...
+							</String>
+							<SP.../>
+						</TextLine>
+					</TextBlock>
+				</ComposedBlockType>
+				<TextBlock ID="textblock_M"><!-- for textlines not part of a region -->
+				...
+				</TextBlock>
+			</PrintSpace>
+		</Page>
+	</Layout>
+</alto>
+
+
+

Importantly, the parser only works with measurements in the pixel domain, i.e. +an unset MeasurementUnit or one with an element value of pixel. In +addition, as the minimal version required for ingestion is quite new it is +likely that most existing ALTO documents will not contain sufficient +information to be used with kraken out of the box.

+
+
+

PAGE XML

+

PAGE XML is parsed and produced according to the 2019-07-15 version of the +schema, although the parser is not strict and works with non-conformant output +from a variety of tools. As with ALTO, PAGE XML files can be used to train +segmentation, reading order, and recognition models.

+
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
+	<Metadata>...</Metadata>
+	<Page imageFilename="filename.jpg"...><!-- relative path to an image file from the location of the XML document -->
+		<TextRegion id="block_N"
+			    custom="structure {type:region_type;}"><!-- region type is a free text field-->
+			<Coords points="10,20 500,20 400,200, 500,300, 10,300 5,80"/><!-- polygon for region boundary -->
+			<TextLine id="line_K">
+				<Baseline points="80,200 100,210, 400,198"/><!-- required for baseline segmentation training -->
+				<TextEquiv><Unicode>text text text</Unicode></TextEquiv><!-- only TextEquiv tags immediately below the TextLine tag are parsed for recognition training -->
+				<Word>
+				...
+			</TextLine>
+			....
+		</TextRegion>
+		<TextRegion id="textblock_M"><!-- for lines not contained in any region. TextRegions without a type are automatically assigned the 'text' type which can be filtered out for training. -->
+			<Coords points="0,0 0,{{ page.size[1] }} {{ page.size[0] }},{{ page.size[1] }} {{ page.size[0] }},0"/>
+			<TextLine>...</TextLine><!-- same as above -->
+			....
+                </TextRegion>
+	</Page>
+</PcGts>
+
+
+
+
+

Binary Datasets

+

In addition to training recognition models directly from XML and image files, a +binary dataset format offering a couple of advantages is supported for +recognition training. Binary datasets drastically improve loading performance +allowing the saturation of most GPUs with minimal computational overhead while +also allowing training with datasets that are larger than the system’s main +memory. A minor drawback is a ~30% increase in dataset size in comparison to +the raw images + XML approach.

+

To realize this speedup the dataset has to be compiled first:

+
$ ketos compile -f xml -o dataset.arrow file_1.xml file_2.xml ...
+
+
+

If there are a lot of individual files containing many lines this process can +take a long time. It can easily be parallelized by specifying the number of +separate parsing workers with the --workers option:

+
$ ketos compile --workers 8 -f xml ...
+
+
+

In addition, binary datasets can contain fixed splits which allow +reproducibility and comparability between training and evaluation runs. +Training, validation, and test splits can be pre-defined from multiple sources. +Per default they are sourced from tags defined in the source XML files unless +the option telling kraken to ignore them is set:

+
$ ketos compile --ignore-splits -f xml ...
+
+
+

Alternatively fixed-proportion random splits can be created ad-hoc during +compile time:

+
$ ketos compile --random-split 0.8 0.1 0.1 ...
+
+
+

The above invocation assigns 80% of the source lines to the training set, 10% +to the validation set, and 10% to the test set. The training and validation +sets in the dataset file are used automatically by ketos train (unless told +otherwise) while the remaining 10% (the test set) is selected by ketos test.

+
+

Warning

+

Fixed splits in datasets are ignored during training and testing per +default as they require loading the entire dataset into main memory at +once, drastically increasing memory consumption and causing initial delays. +Use the --fixed-splits option in ketos train and ketos test to +respect fixed splits.

+
+
+
+
+

Recognition training

+

The training utility allows training of VGSL specified models +both from scratch and from existing models. Here are its most important command line options:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

option

action

-o, --output

Output model file prefix. Defaults to model.

-s, --spec

VGSL spec of the network to train. CTC layer +will be added automatically. default: +[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 +Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do]

-a, --append

Removes layers before argument and then +appends spec. Only works when loading an +existing model

-i, --load

Load existing file to continue training

-F, --savefreq

Model save frequency in epochs during +training

-q, --quit

Stop condition for training. Set to early +for early stopping (default) or fixed for fixed +number of epochs.

-N, --epochs

Number of epochs to train for.

--min-epochs

Minimum number of epochs to train for when using early stopping.

--lag

Number of epochs to wait before stopping +training without improvement. Only used when using early stopping.

-d, --device

Select device to use (cpu, cuda:0, cuda:1,…). GPU acceleration requires CUDA.

--optimizer

Select optimizer (Adam, SGD, RMSprop).

-r, --lrate

Learning rate [default: 0.001]

-m, --momentum

Momentum used with SGD optimizer. Ignored otherwise.

-w, --weight-decay

Weight decay.

--schedule

Sets the learning rate scheduler. May be either constant, 1cycle, exponential, cosine, step, or +reduceonplateau. For 1cycle the cycle length is determined by the –epoch option.

-p, --partition

Ground truth data partition ratio between train/validation set

-u, --normalization

Ground truth Unicode normalization. One of NFC, NFKC, NFD, NFKD.

-c, --codec

Load a codec JSON definition (invalid if loading existing model)

--resize

Codec/output layer resizing option. If set +to union code points will be added, new +will set the layer to match exactly the +training data, fail will abort if training +data and model codec do not match. Only valid when refining an existing model.

-n, --reorder / --no-reorder

Reordering of code points to display order.

-t, --training-files

File(s) with additional paths to training data. Used to +enforce an explicit train/validation set split and deal with +training sets with more lines than the command line can process. Can be used more than once.

-e, --evaluation-files

File(s) with paths to evaluation data. Overrides the -p parameter.

-f, --format-type

Sets the training and evaluation data format. +Valid choices are ‘path’, ‘xml’ (default), ‘alto’, ‘page’, or binary. +In alto, page, and xml mode all data is extracted from XML files +containing both baselines and a link to source images. +In path mode arguments are image files sharing a prefix up to the last +extension with JSON .path files containing the baseline information. +In binary mode arguments are precompiled binary dataset files.

--augment / --no-augment

Enables/disables data augmentation.

--workers

Number of OpenMP threads and workers used to perform neural network passes and load samples from the dataset.

+
+

From Scratch

+

The absolute minimal example to train a new recognition model from a number of +ALTO or PAGE XML documents is similar to the segmentation training:

+
$ ketos train -f xml training_data/*.xml
+
+
+

Training will continue until the error does not improve anymore and the best +model (among intermediate results) will be saved in the current directory; this +approach is called early stopping.

+

In some cases changing the network architecture might be useful. One such +example would be material that is not well recognized in the grayscale domain, +as the default architecture definition converts images into grayscale. The +input definition can be changed quite easily to train on color data (RGB) instead:

+
$ ketos train -f page -s '[1,120,0,3 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do0.1,2 Lbx200 Do]' syr/*.xml
+
+
+

Complete documentation for the network description language can be found on the +VGSL page.

+

Sometimes the early stopping default parameters might produce suboptimal +results such as stopping training too soon. Adjusting the lag can be useful:

+
$ ketos train --lag 10 syr/*.png
+
+
+

To switch optimizers from Adam to SGD or RMSprop just set the option:

+
$ ketos train --optimizer SGD syr/*.png
+
+
+

It is possible to resume training from a previously saved model:

+
$ ketos train -i model_25.mlmodel syr/*.png
+
+
+

A good configuration for a small precompiled print dataset and GPU acceleration +would be:

+
$ ketos train -d cuda -f binary dataset.arrow
+
+
+

A better configuration for large and complicated datasets such as handwritten texts:

+
$ ketos train --augment --workers 4 -d cuda -f binary --min-epochs 20 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' -r 0.0001 dataset_large.arrow
+
+
+

This configuration is slower to train and often requires a couple of epochs to +output any sensible text at all. Therefore we tell ketos to train for at least +20 epochs so the early stopping algorithm doesn’t prematurely interrupt the +training process.

+
+
+

Fine Tuning

+

Fine tuning an existing model for another typeface or new characters is also +possible with the same syntax as resuming regular training:

+
$ ketos train -f page -i model_best.mlmodel syr/*.xml
+
+
+

The caveat is that the alphabet of the base model and training data have to be +an exact match. Otherwise an error will be raised:

+
$ ketos train -i model_5.mlmodel kamil/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8616] alphabet mismatch {'~', '»', '8', '9', 'ـ'}
+Network codec not compatible with training set
+[0.8620] Training data and model codec alphabets mismatch: {'ٓ', '؟', '!', 'ص', '،', 'ذ', 'ة', 'ي', 'و', 'ب', 'ز', 'ح', 'غ', '~', 'ف', ')', 'د', 'خ', 'م', '»', 'ع', 'ى', 'ق', 'ش', 'ا', 'ه', 'ك', 'ج', 'ث', '(', 'ت', 'ظ', 'ض', 'ل', 'ط', '؛', 'ر', 'س', 'ن', 'ء', 'ٔ', '«', 'ـ', 'ٕ'}
+
+
+

There are two modes dealing with mismatching alphabets, union and new. +union resizes the output layer and codec of the loaded model to include all +characters in the new training set without removing any characters. new +will make the resulting model an exact match with the new training set by both +removing unused characters from the model and adding new ones.

+
$ ketos -v train --resize union -i model_5.mlmodel syr/*.png
+...
+[0.7943] Training set 788 lines, validation set 88 lines, alphabet 50 symbols
+...
+[0.8337] Resizing codec to include 3 new code points
+[0.8374] Resizing last layer in network to 52 outputs
+...
+
+
+

In this example 3 characters were added for a network that is able to +recognize 52 different characters after sufficient additional training.

+
$ ketos -v train --resize new -i model_5.mlmodel syr/*.png
+...
+[0.7593] Training set 788 lines, validation set 88 lines, alphabet 49 symbols
+...
+[0.7857] Resizing network or given codec to 49 code sequences
+[0.8344] Deleting 2 output classes from network (46 retained)
+...
+
+
+

In new mode 2 of the original characters were removed and 3 new ones were added.

+
+
+

Slicing

+

Refining on mismatched alphabets has its limits. If the alphabets are highly +different, the modification of the final linear layer to add/remove characters +will destroy the inference capabilities of the network. In those cases it is +faster to slice off the last few layers of the network and only train those +instead of a complete network from scratch.

+

Taking the default network definition as printed in the debug log we can see +the layer indices of the model:

+
[0.8760] Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 48 outputs
+[0.8762] layer          type    params
+[0.8790] 0              conv    kernel 3 x 3 filters 32 activation r
+[0.8795] 1              dropout probability 0.1 dims 2
+[0.8797] 2              maxpool kernel 2 x 2 stride 2 x 2
+[0.8802] 3              conv    kernel 3 x 3 filters 64 activation r
+[0.8804] 4              dropout probability 0.1 dims 2
+[0.8806] 5              maxpool kernel 2 x 2 stride 2 x 2
+[0.8813] 6              reshape from 1 1 x 12 to 1/3
+[0.8876] 7              rnn     direction b transposed False summarize False out 100 legacy None
+[0.8878] 8              dropout probability 0.5 dims 1
+[0.8883] 9              linear  augmented False out 48
+
+
+

To remove everything after the initial convolutional stack and add untrained +layers we define a network stub and index for appending:

+
$ ketos train -i model_1.mlmodel --append 7 -s '[Lbx256 Do]' syr/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[0.8014] alphabet mismatch {'8', '3', '9', '7', '܇', '݀', '݂', '4', ':', '0'}
+Slicing and dicing model ✓
+
+
+

The resulting model will behave exactly like one trained from scratch, except it will +potentially train a lot faster.

+
+
+

Text Normalization and Unicode

+

Text can be encoded in multiple different ways when using Unicode. For many +scripts characters with diacritics can be encoded either as a single code point +or as a base character plus combining diacritic; different types of whitespace exist; and mixed bidirectional text +can be written differently depending on the base line direction.

+

Ketos provides options to normalize input into standard forms that make +processing of data from multiple sources possible. Principally, two +options are available: one for Unicode normalization and one for whitespace normalization. The +Unicode normalization (disabled per default) switch allows one to select one of +the 4 normalization forms:

+
$ ketos train --normalization NFD -f xml training_data/*.xml
+$ ketos train --normalization NFC -f xml training_data/*.xml
+$ ketos train --normalization NFKD -f xml training_data/*.xml
+$ ketos train --normalization NFKC -f xml training_data/*.xml
+
+
+

Whitespace normalization is enabled per default and converts all Unicode +whitespace characters into a simple space. It is highly recommended to leave +this function enabled as the variation of space width, resulting either from +text justification or the irregularity of handwriting, is difficult for a +recognition model to accurately model and map onto the different space code +points. Nevertheless it can be disabled through:

+
$ ketos train --no-normalize-whitespace -f xml training_data/*.xml
+
+
+

Further the behavior of the BiDi algorithm can be influenced through two options. The +configuration of the algorithm is important as the recognition network is +trained to output characters (or rather labels which are mapped to code points +by a codec) in the order a line is fed into the network, i.e. +left-to-right also called display order. Unicode text is encoded as a stream of +code points in logical order, i.e. the order the characters in a line are read +in by a human reader, for example (mostly) right-to-left for a text in Hebrew. +The BiDi algorithm resolves this logical order to the display order expected by +the network and vice versa. The primary parameter of the algorithm is the base +direction which is just the default direction of the input fields of the user +when the ground truth was initially transcribed. Base direction will be +automatically determined by kraken when using PAGE XML or ALTO files that +contain it, otherwise it will have to be supplied if it differs from the +default when training a model:

+
$ ketos train --base-dir R -f xml rtl_training_data/*.xml
+
+
+

It is also possible to disable BiDi processing completely, e.g. when the text +has been brought into display order already:

+
$ ketos train --no-reorder -f xml rtl_display_data/*.xml
+
+
+
+
+

Codecs

+

Codecs map between the label decoded from the raw network output and Unicode +code points (see this diagram for the precise steps +involved in text line recognition). Codecs are attached to a recognition model +and are usually defined once at initial training time, although they can be +adapted either explicitly (with the API) or implicitly through domain adaptation.

+

The default behavior of kraken is to auto-infer this mapping from all the +characters in the training set and map each code point to one separate label. +This is usually sufficient for alphabetic scripts, abjads, and abugidas apart +from very specialised use cases. Logographic writing systems with a very large +number of different graphemes, such as all the variants of Han characters or +Cuneiform, can be more problematic as their large inventory makes recognition +both slow and error-prone. In such cases it can be advantageous to decompose +each code point into multiple labels to reduce the output dimensionality of the +network. During decoding valid sequences of labels will be mapped to their +respective code points as usual.

+

There are multiple approaches one could follow constructing a custom codec: +randomized block codes, i.e. producing random fixed-length labels for each code +point, Huffman coding, i.e. variable length label sequences depending on the +frequency of each code point in some text (not necessarily the training set), +or structural decomposition, i.e. describing each code point through a +sequence of labels that describe the shape of the grapheme similar to how some +input systems for Chinese characters function.

+

While the system is functional it is not well-tested in practice and it is +unclear which approach works best for which kinds of inputs.

+

Custom codecs can be supplied as simple JSON files that contain a dictionary +mapping between strings and integer sequences, e.g.:

+
$ ketos train -c sample.codec -f xml training_data/*.xml
+
+
+

with sample.codec containing:

+
{"S": [50, 53, 74, 23],
+ "A": [95, 60, 19, 95],
+ "B": [2, 96, 28, 29],
+ "\u1f05": [91, 14, 95, 90]}
+
+
+
+
+
+

Unsupervised recognition pretraining

+

Text recognition models can be pretrained in an unsupervised fashion from text +line images, both in bounding box and baseline format. The pretraining is +performed through a contrastive surrogate task aiming to distinguish in-painted +parts of the input image features from randomly sampled distractor slices.

+

All data sources accepted by the supervised trainer are valid for pretraining +but for performance reasons it is recommended to use pre-compiled binary +datasets. One thing to keep in mind is that compilation filters out empty +(non-transcribed) text lines per default which is undesirable for pretraining. +With the --keep-empty-lines option all valid lines will be written to the +dataset file:

+
$ ketos compile --keep-empty-lines -f xml -o foo.arrow *.xml
+
+
+

The basic pretraining call is very similar to a training one:

+
$ ketos pretrain -f binary foo.arrow
+
+
+

There are a couple of hyperparameters that are specific to pretraining: the +mask width (at the subsampling level of the last convolutional layer), the +probability of a particular position being the start position of a mask, and +the number of negative distractor samples.

+
$ ketos pretrain -o pretrain --mask-width 4 --mask-probability 0.2 --num-negatives 3 -f binary foo.arrow
+
+
+

Once a model has been pretrained it has to be adapted to perform actual +recognition with a standard labelled dataset, although training data +requirements will usually be much reduced:

+
$ ketos train -i pretrain_best.mlmodel --warmup 5000 --freeze-backbone 1000 -f binary labelled.arrow
+
+
+

It is necessary to use learning rate warmup (warmup) for at least a couple of +epochs in addition to freezing the backbone (all but the last fully connected +layer performing the classification) to have the model converge during +fine-tuning. Fine-tuning models from pre-trained weights is quite a bit less +stable than training from scratch or fine-tuning an existing model. As such it +can be necessary to run a couple of trials with different hyperparameters +(principally learning rate) to find workable ones. It is entirely possible that +pretrained models do not converge at all even with reasonable hyperparameter +configurations.

+
+
+

Segmentation training

+

Training a segmentation model is very similar to training models for text +recognition. The basic invocation is:

+
$ ketos segtrain -f xml training_data/*.xml
+
+
+

This takes all text lines and regions encoded in the XML files and trains a +model to recognize them.

+

Most other options available in transcription training are also available in +segmentation training. CUDA acceleration:

+
$ ketos segtrain -d cuda -f xml training_data/*.xml
+
+
+

Defining custom architectures:

+
$ ketos segtrain -d cuda -s '[1,1200,0,3 Cr7,7,64,2,2 Gn32 Cr3,3,128,2,2 Gn32 Cr3,3,128 Gn32 Cr3,3,256 Gn32]' -f xml training_data/*.xml
+
+
+

Fine tuning/transfer learning with last layer adaptation and slicing:

+
$ ketos segtrain --resize new -i segmodel_best.mlmodel training_data/*.xml
+$ ketos segtrain -i segmodel_best.mlmodel --append 7 -s '[Cr3,3,64 Do0.1]' training_data/*.xml
+
+
+

In addition there are a couple of specific options that allow filtering of +baseline and region types. Datasets are often annotated to a level that is too +detailed or contains undesirable types, e.g. when combining segmentation data +from different sources. The most basic option is the suppression of all of +either baseline or region data contained in the dataset:

+
$ ketos segtrain --suppress-baselines -f xml training_data/*.xml
+Training line types:
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+...
+$ ketos segtrain --suppress-regions -f xml training-data/*.xml
+Training line types:
+  default 2     53980
+  foo     8     134
+...
+
+
+

It is also possible to filter out baselines/regions selectively:

+
$ ketos segtrain -f xml --valid-baselines default training_data/*.xml
+Training line types:
+  default 2     53980
+Training region types:
+  graphic       3       135
+  text  4       1128
+  separator     5       5431
+  paragraph     6       10218
+  table 7       16
+$ ketos segtrain -f xml --valid-regions graphic --valid-regions paragraph training_data/*.xml
+Training line types:
+  default 2     53980
+ Training region types:
+  graphic       3       135
+  paragraph     6       10218
+
+
+

Finally, we can merge baselines and regions into each other:

+
$ ketos segtrain -f xml --merge-baselines default:foo training_data/*.xml
+Training line types:
+  default 2     54114
+...
+$ ketos segtrain -f xml --merge-regions text:paragraph --merge-regions graphic:table training_data/*.xml
+...
+Training region types:
+  graphic       3       151
+  text  4       11346
+  separator     5       5431
+...
+
+
+

These options are combinable to massage the dataset into any typology you want. +Tags containing the separator character : can be specified by escaping them +with backslash.

+

Then there are some options that set metadata fields controlling the +postprocessing. When computing the bounding polygons the recognized baselines +are offset slightly to ensure overlap with the line corpus. This offset is per +default upwards for baselines but as it is possible to annotate toplines (for +scripts like Hebrew) and centerlines (for baseline-free scripts like Chinese) +the appropriate offset can be selected with an option:

+
$ ketos segtrain --topline -f xml hebrew_training_data/*.xml
+$ ketos segtrain --centerline -f xml chinese_training_data/*.xml
+$ ketos segtrain --baseline -f xml latin_training_data/*.xml
+
+
+

Lastly, there are some regions that are absolute boundaries for text line +content. When these regions are marked as such the polygonization can sometimes +be improved:

+
$ ketos segtrain --bounding-regions paragraph -f xml training_data/*.xml
+...
+
+
+
+
+

Reading order training

+

Reading order models work slightly differently from segmentation and recognition +models. They are closely linked to the typology used in the dataset they +were trained on as they use type information on lines and regions to make +ordering decisions. As the same typology was probably used to train a specific +segmentation model, reading order models are trained separately but bundled +with their segmentation model in a subsequent step. The general sequence is +therefore:

+
$ ketos segtrain -o fr_manu_seg.mlmodel -f xml french/*.xml
+...
+$ ketos rotrain -o fr_manu_ro.mlmodel -f xml french/*.xml
+...
+$ ketos roadd -o fr_manu_seg_with_ro.mlmodel -i fr_manu_seg_best.mlmodel  -r fr_manu_ro_best.mlmodel
+
+
+

Only the fr_manu_seg_with_ro.mlmodel file will contain the trained reading +order model. Segmentation models can exist with or without reading order +models. If one is added, the neural reading order will be computed in +addition to the one produced by the default heuristic during segmentation and +serialized in the final XML output (in ALTO/PAGE XML).

+
+

Note

+

Reading order models work purely on the typology and geometric features +of the lines and regions. They construct an approximate ordering matrix +by feeding feature vectors of two lines (or regions) into the network +to decide which of those two lines precedes the other.

+

These feature vectors are quite simple; just the lines’ types, and +their start, center, and end points. Therefore they can not reliably +learn any ordering relying on graphical features of the input page such +as: line color, typeface, or writing system.

+
+

Reading order models are extremely simple and do not require a lot of memory or +computational power to train. In fact, the default parameters are extremely +conservative and it is recommended to increase the batch size for improved +training speed. Large batch sizes above 128k are easily possible with +sufficiently large training datasets:

+
$ ketos rotrain -o fr_manu_ro.mlmodel -B 128000 -f french/*.xml
+Training RO on following baselines types:
+  DefaultLine   1
+  DropCapitalLine       2
+  HeadingLine   3
+  InterlinearLine       4
+GPU available: False, used: False
+TPU available: False, using: 0 TPU cores
+IPU available: False, using: 0 IPUs
+HPU available: False, using: 0 HPUs
+┏━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
+┃   ┃ Name        ┃ Type              ┃ Params ┃
+┡━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
+│ 0 │ criterion   │ BCEWithLogitsLoss │      0 │
+│ 1 │ ro_net      │ MLP               │  1.1 K │
+│ 2 │ ro_net.fc1  │ Linear            │  1.0 K │
+│ 3 │ ro_net.relu │ ReLU              │      0 │
+│ 4 │ ro_net.fc2  │ Linear            │     45 │
+└───┴─────────────┴───────────────────┴────────┘
+Trainable params: 1.1 K
+Non-trainable params: 0
+Total params: 1.1 K
+Total estimated model params size (MB): 0
+stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/35 0:00:00 • -:--:-- 0.00it/s val_spearman: 0.912 val_loss: 0.701 early_stopping: 0/300 inf
+
+
+

During validation a metric called Spearman’s footrule is computed. To calculate +Spearman’s footrule, the ranks of the lines of text in the ground truth reading +order and the predicted reading order are compared. The footrule is then +calculated as the sum of the absolute differences between the ranks of pairs of +lines. The score increases by 1 for each line between the correct and predicted +positions of a line.

+

A lower footrule score indicates a better alignment between the two orders. A +score of 0 implies perfect alignment of line ranks.

+
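
As a hand-rolled illustration outside of ketos itself, the footrule computed over two hypothetical reading orders of four lines:

+
# Hypothetical ground-truth and predicted reading orders.
+gt_order   = ['l1', 'l2', 'l3', 'l4']
+pred_order = ['l2', 'l1', 'l3', 'l4']
+
+gt_rank   = {line: i for i, line in enumerate(gt_order)}
+pred_rank = {line: i for i, line in enumerate(pred_order)}
+
+footrule = sum(abs(gt_rank[l] - pred_rank[l]) for l in gt_order)
+print(footrule)   # 2: two lines are each one position away from their true rank
+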
+
+

Recognition testing

+

Picking a particular model from a pool or getting a more detailed look on the +recognition accuracy can be done with the test command. It uses transcribed +lines, the test set, in the same format as the train command, recognizes the +line images with one or more models, and creates a detailed report of the +differences from the ground truth for each of them.

====================== ======
option                 action
====================== ======
-f, --format-type      Sets the test set data format. Valid choices are 'path', 'xml' (default), 'alto', 'page', or 'binary'. In alto, page, and xml mode all data is extracted from XML files containing both baselines and a link to source images. In path mode arguments are image files sharing a prefix up to the last extension with JSON .path files containing the baseline information. In binary mode arguments are precompiled binary dataset files.
-m, --model            Model(s) to evaluate.
-e, --evaluation-files File(s) with paths to evaluation data.
-d, --device           Select device to use.
--pad                  Left and right padding around lines.
====================== ======

+

Transcriptions are handed to the command in the same way as for the train command, either through a manifest with -e/--evaluation-files or by just adding a number of image files as the final argument:

+
$ ketos test -m $model -e test.txt test/*.png
+Evaluating $model
+Evaluating  [####################################]  100%
+=== report test_model.mlmodel ===
+
+7012 Characters
+6022 Errors
+14.12%       Accuracy
+
+5226 Insertions
+2    Deletions
+794  Substitutions
+
+Count Missed   %Right
+1567  575    63.31%  Common
+5230  5230   0.00%   Arabic
+215   215    0.00%   Inherited
+
+Errors       Correct-Generated
+773  { ا } - {  }
+536  { ل } - {  }
+328  { و } - {  }
+274  { ي } - {  }
+266  { م } - {  }
+256  { ب } - {  }
+246  { ن } - {  }
+241  { SPACE } - {  }
+207  { ر } - {  }
+199  { ف } - {  }
+192  { ه } - {  }
+174  { ع } - {  }
+172  { ARABIC HAMZA ABOVE } - {  }
+144  { ت } - {  }
+136  { ق } - {  }
+122  { س } - {  }
+108  { ، } - {  }
+106  { د } - {  }
+82   { ك } - {  }
+81   { ح } - {  }
+71   { ج } - {  }
+66   { خ } - {  }
+62   { ة } - {  }
+60   { ص } - {  }
+39   { ، } - { - }
+38   { ش } - {  }
+30   { ا } - { - }
+30   { ن } - { - }
+29   { ى } - {  }
+28   { ذ } - {  }
+27   { ه } - { - }
+27   { ARABIC HAMZA BELOW } - {  }
+25   { ز } - {  }
+23   { ث } - {  }
+22   { غ } - {  }
+20   { م } - { - }
+20   { ي } - { - }
+20   { ) } - {  }
+19   { : } - {  }
+19   { ط } - {  }
+19   { ل } - { - }
+18   { ، } - { . }
+17   { ة } - { - }
+16   { ض } - {  }
+...
+Average accuracy: 14.12%, (stddev: 0.00)
+
+
+

Each report contains the character accuracy measured per script and a detailed list of confusions. When evaluating multiple models, the last line of the output will contain the average accuracy and the standard deviation across all of them.
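For the example above, the reported accuracy corresponds to one minus the ratio of edit errors (insertions, deletions, and substitutions) to ground truth characters. The snippet below is only a back-of-the-envelope check of that relationship, not kraken's reporting code:

.. code-block:: python

   # Character accuracy from the error counts in the report above.
   characters = 7012
   insertions, deletions, substitutions = 5226, 2, 794

   errors = insertions + deletions + substitutions   # 6022
   accuracy = 1 - errors / characters
   print(f'{accuracy:.2%}')                          # 14.12%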

+
+
+ + + + + + + \ No newline at end of file diff --git a/main/models.html b/main/models.html new file mode 100644 index 000000000..4ebb97839 --- /dev/null +++ b/main/models.html @@ -0,0 +1,126 @@ + + + + + + + + Models — kraken documentation + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +
+

Models

+

There are currently three kinds of models containing the recurrent neural networks doing all the character recognition supported by kraken: pronn files serializing old pickled pyrnn models as protobuf, clstm's native serialization, and versatile Core ML models.

+
+

CoreML

+

Core ML allows arbitrary network architectures in a compact serialization with metadata. This is the default format in pytorch-based kraken.
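For orientation, a model file can be loaded through kraken's Python API with kraken.lib.models.load_any (see the API reference), which handles the supported serializations and wraps the network for inference. The snippet below is a minimal sketch; model.mlmodel is a hypothetical local path:

.. code-block:: python

   # Minimal sketch: load a recognition model and inspect its input spec.
   from kraken.lib import models

   net = models.load_any('model.mlmodel')  # hypothetical local model file
   print(net.nn.input)                     # input spec of the wrapped VGSL model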

+
+
+

Segmentation Models

+
+
+

Recognition Models

+
+
+ + + + + + + \ No newline at end of file diff --git a/main/objects.inv b/main/objects.inv new file mode 100644 index 000000000..1b1edb173 Binary files /dev/null and b/main/objects.inv differ diff --git a/main/search.html b/main/search.html new file mode 100644 index 000000000..9ec3ebaa1 --- /dev/null +++ b/main/search.html @@ -0,0 +1,113 @@ + + + + + + + Search — kraken documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + +
+ +

Search

+ + + + +

+ Searching for multiple words only shows matches that contain + all words. +

+
+ + + + + + + \ No newline at end of file diff --git a/main/searchindex.js b/main/searchindex.js new file mode 100644 index 000000000..887a0f2be --- /dev/null +++ b/main/searchindex.js @@ -0,0 +1 @@ +Search.setIndex({"alltitles": {"ABBYY XML": [[2, "abbyy-xml"]], "ALTO": [[5, "alto"]], "ALTO 4.4": [[2, "alto-4-4"]], "API Quickstart": [[1, null]], "API Reference": [[2, null]], "Advanced Usage": [[0, null]], "Annotation and transcription": [[7, "annotation-and-transcription"]], "Baseline Segmentation": [[0, "baseline-segmentation"]], "Baseline segmentation": [[1, "baseline-segmentation"]], "Basic Concepts": [[1, "basic-concepts"]], "Basics": [[8, "basics"]], "Best practices": [[5, "best-practices"]], "Binarization": [[0, "binarization"]], "Binary Datasets": [[5, "binary-datasets"]], "Codecs": [[5, "codecs"]], "Containers and Helpers": [[2, "containers-and-helpers"]], "Convolutional Layers": [[8, "convolutional-layers"]], "CoreML": [[6, "coreml"]], "Dataset Compilation": [[7, "dataset-compilation"]], "Default templates": [[2, "default-templates"]], "Dropout": [[8, "dropout"]], "Evaluation and Validation": [[7, "evaluation-and-validation"]], "Examples": [[8, "examples"]], "Features": [[4, "features"]], "Finding Recognition Models": [[4, "finding-recognition-models"]], "Fine Tuning": [[5, "fine-tuning"]], "From Scratch": [[5, "from-scratch"]], "Funding": [[4, "funding"]], "GPU Acceleration": [[3, null]], "Group Normalization": [[8, "group-normalization"]], "Helper and Plumbing Layers": [[8, "helper-and-plumbing-layers"]], "Helpers": [[2, "helpers"]], "Image acquisition and preprocessing": [[7, "image-acquisition-and-preprocessing"]], "Input and Outputs": [[0, "input-and-outputs"]], "Installation": [[4, "installation"]], "Installation using Conda": [[4, "installation-using-conda"]], "Installation using Pip": [[4, "installation-using-pip"]], "Installing kraken": [[7, "installing-kraken"]], "Legacy Box Segmentation": [[0, "legacy-box-segmentation"]], "Legacy modules": [[2, "legacy-modules"]], "Legacy segmentation": [[1, "legacy-segmentation"]], "License": [[4, "license"]], "Loss and Evaluation Functions": [[2, "loss-and-evaluation-functions"]], "Masking": [[0, "masking"]], "Max Pool": [[8, "max-pool"]], "Model Repository": [[0, "model-repository"]], "Models": [[6, null]], "Output formats": [[0, "output-formats"]], "PAGE XML": [[5, "page-xml"]], "Page Segmentation": [[0, "page-segmentation"]], "PageXML": [[2, "pagexml"]], "Preprocessing and Segmentation": [[1, "preprocessing-and-segmentation"]], "Principal Text Direction": [[0, "principal-text-direction"]], "Publishing": [[0, "publishing"]], "Querying and Model Retrieval": [[0, "querying-and-model-retrieval"]], "Quickstart": [[4, "quickstart"]], "Reading order datasets": [[2, "reading-order-datasets"]], "Reading order training": [[5, "reading-order-training"]], "Recognition": [[0, "recognition"], [1, "recognition"], [2, "recognition"], [7, "recognition"]], "Recognition Models": [[6, "recognition-models"]], "Recognition datasets": [[2, "recognition-datasets"]], "Recognition model training": [[5, "recognition-model-training"]], "Recognition testing": [[5, "recognition-testing"]], "Recognition training": [[5, "recognition-training"]], "Recurrent Layers": [[8, "recurrent-layers"]], "Regularization Layers": [[8, "regularization-layers"]], "Related Software": [[4, "related-software"]], "Reshape": [[8, "reshape"]], "Segmentation": [[2, "segmentation"]], "Segmentation Models": [[6, "segmentation-models"]], "Segmentation datasets": [[2, 
"segmentation-datasets"]], "Segmentation model training": [[5, "segmentation-model-training"]], "Segmentation training": [[5, "segmentation-training"]], "Serialization": [[1, "serialization"], [2, "serialization"]], "Slicing": [[5, "slicing"]], "Text Normalization and Unicode": [[5, "text-normalization-and-unicode"]], "Trainer": [[2, "trainer"]], "Training": [[1, "training"], [2, "training"], [5, null], [7, "compilation"]], "Training Tutorial": [[4, "training-tutorial"]], "Training data formats": [[5, "training-data-formats"]], "Training kraken": [[7, null]], "Unsupervised recognition pretraining": [[5, "unsupervised-recognition-pretraining"]], "VGSL network specification": [[8, null]], "XML Parsing": [[1, "xml-parsing"]], "hOCR": [[2, "hocr"]], "kraken": [[4, null]], "kraken.binarization module": [[2, "kraken-binarization-module"]], "kraken.blla module": [[2, "kraken-blla-module"]], "kraken.containers module": [[2, "kraken-containers-module"]], "kraken.lib.codec module": [[2, "kraken-lib-codec-module"]], "kraken.lib.ctc_decoder": [[2, "kraken-lib-ctc-decoder"]], "kraken.lib.dataset module": [[2, "kraken-lib-dataset-module"]], "kraken.lib.exceptions": [[2, "kraken-lib-exceptions"]], "kraken.lib.models module": [[2, "kraken-lib-models-module"]], "kraken.lib.segmentation module": [[2, "kraken-lib-segmentation-module"]], "kraken.lib.train module": [[2, "kraken-lib-train-module"]], "kraken.lib.vgsl module": [[2, "kraken-lib-vgsl-module"]], "kraken.lib.xml module": [[2, "kraken-lib-xml-module"]], "kraken.linegen module": [[2, "kraken-linegen-module"]], "kraken.pageseg module": [[2, "kraken-pageseg-module"]], "kraken.rpred module": [[2, "kraken-rpred-module"]], "kraken.serialization module": [[2, "kraken-serialization-module"]], "kraken.transcribe module": [[2, "kraken-transcribe-module"]]}, "docnames": ["advanced", "api", "api_docs", "gpu", "index", "ketos", "models", "training", "vgsl"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["advanced.rst", "api.rst", "api_docs.rst", "gpu.rst", "index.rst", "ketos.rst", "models.rst", "training.rst", "vgsl.rst"], "indexentries": {"add() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.add", false]], "add() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.add", false]], "add() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add", false]], "add() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add", false]], "add_codec() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.add_codec", false]], "add_labels() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.add_labels", false]], "add_line() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add_line", false]], "add_line() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add_line", false]], "add_page() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.add_page", false]], "add_page() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.add_page", false]], 
"add_page() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.add_page", false]], "alphabet (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.alphabet", false]], "alphabet (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.alphabet", false]], "alphabet (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.alphabet", false]], "append() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.append", false]], "arrow_table (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.arrow_table", false]], "arrowipcrecognitiondataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset", false]], "aug (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.aug", false]], "aug (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.aug", false]], "aug (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.aug", false]], "aug (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.aug", false]], "automatic_optimization (kraken.lib.train.krakentrainer attribute)": [[2, "kraken.lib.train.KrakenTrainer.automatic_optimization", false]], "aux_layers (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.aux_layers", false]], "base_dir (kraken.containers.baselineline attribute)": [[2, "id7", false], [2, "kraken.containers.BaselineLine.base_dir", false]], "base_dir (kraken.containers.baselineocrrecord attribute)": [[2, "id29", false], [2, "kraken.containers.BaselineOCRRecord.base_dir", false]], "base_dir (kraken.containers.bboxline attribute)": [[2, "id16", false], [2, "kraken.containers.BBoxLine.base_dir", false]], "base_dir (kraken.containers.bboxocrrecord attribute)": [[2, "id33", false], [2, "kraken.containers.BBoxOCRRecord.base_dir", false]], "base_dir (kraken.containers.ocr_record attribute)": [[2, "kraken.containers.ocr_record.base_dir", false]], "baseline (kraken.containers.baselineline attribute)": [[2, "id8", false], [2, "kraken.containers.BaselineLine.baseline", false]], "baselineline (class in kraken.containers)": [[2, "kraken.containers.BaselineLine", false]], "baselineocrrecord (class in kraken.containers)": [[2, "kraken.containers.BaselineOCRRecord", false]], "baselineset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.BaselineSet", false]], "batch (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.batch", false]], "bbox (kraken.containers.bboxline attribute)": [[2, "id17", false], [2, "kraken.containers.BBoxLine.bbox", false]], "bboxline (class in kraken.containers)": [[2, "kraken.containers.BBoxLine", false]], "bboxocrrecord (class in kraken.containers)": [[2, "kraken.containers.BBoxOCRRecord", false]], "beam_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.beam_decoder", false]], "bidi_reordering (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bidi_reordering", false]], "blank_threshold_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.blank_threshold_decoder", false]], "blocks (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.blocks", false]], "boundary 
(kraken.containers.baselineline attribute)": [[2, "id9", false], [2, "kraken.containers.BaselineLine.boundary", false]], "boundary (kraken.containers.region attribute)": [[2, "id25", false], [2, "kraken.containers.Region.boundary", false]], "bounds (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.bounds", false]], "build_addition() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_addition", false]], "build_conv() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_conv", false]], "build_dropout() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_dropout", false]], "build_groupnorm() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_groupnorm", false]], "build_identity() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_identity", false]], "build_maxpool() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_maxpool", false]], "build_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_output", false]], "build_parallel() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_parallel", false]], "build_reshape() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_reshape", false]], "build_rnn() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_rnn", false]], "build_ro() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_ro", false]], "build_series() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_series", false]], "build_wav2vec2() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.build_wav2vec2", false]], "c_sorted (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.c_sorted", false]], "calculate_polygonal_environment() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.calculate_polygonal_environment", false]], "category (kraken.containers.processingstep attribute)": [[2, "id36", false], [2, "kraken.containers.ProcessingStep.category", false]], "centerline_norm (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.centerline_norm", false]], "channels (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.channels", false]], "class_mapping (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_mapping", false]], "class_stats (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.class_stats", false]], "codec (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.codec", false]], "codec (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.codec", false]], "codec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.codec", false]], "collate_sequences() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.collate_sequences", false]], "compute_confusions() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.compute_confusions", false]], "compute_polygon_section() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.compute_polygon_section", false]], "confidences (kraken.containers.baselineocrrecord 
attribute)": [[2, "kraken.containers.BaselineOCRRecord.confidences", false]], "confidences (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.confidences", false]], "confidences (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.confidences", false]], "criterion (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id43", false], [2, "kraken.lib.vgsl.TorchVGSLModel.criterion", false]], "cuts (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.cuts", false]], "cuts (kraken.containers.baselineocrrecord property)": [[2, "id30", false]], "cuts (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.cuts", false]], "cuts (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.cuts", false]], "data (kraken.lib.dataset.pagewiseroset attribute)": [[2, "kraken.lib.dataset.PageWiseROSet.data", false]], "data (kraken.lib.dataset.pairwiseroset attribute)": [[2, "kraken.lib.dataset.PairWiseROSet.data", false]], "decode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.decode", false]], "decoder (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.decoder", false]], "description (kraken.containers.processingstep attribute)": [[2, "id37", false], [2, "kraken.containers.ProcessingStep.description", false]], "device (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.device", false]], "display_order (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.display_order", false]], "display_order (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.display_order", false]], "display_order() (kraken.containers.baselineocrrecord method)": [[2, "id31", false]], "display_order() (kraken.containers.bboxocrrecord method)": [[2, "id34", false]], "display_order() (kraken.containers.ocr_record method)": [[2, "kraken.containers.ocr_record.display_order", false]], "encode() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.encode", false]], "encode() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.encode", false]], "encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.encode", false]], "encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.encode", false]], "env (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.env", false]], "eval() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.eval", false]], "extract_polygons() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.extract_polygons", false]], "failed_samples (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.failed_samples", false]], "failed_samples (kraken.lib.dataset.pagewiseroset attribute)": [[2, "kraken.lib.dataset.PageWiseROSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.pairwiseroset attribute)": [[2, 
"kraken.lib.dataset.PairWiseROSet.failed_samples", false]], "failed_samples (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.failed_samples", false]], "fit() (kraken.lib.train.krakentrainer method)": [[2, "kraken.lib.train.KrakenTrainer.fit", false]], "font (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.font", false]], "force_binarization (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.force_binarization", false]], "forward() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.forward", false]], "get_feature_dim() (kraken.lib.dataset.pagewiseroset method)": [[2, "kraken.lib.dataset.PageWiseROSet.get_feature_dim", false]], "get_feature_dim() (kraken.lib.dataset.pairwiseroset method)": [[2, "kraken.lib.dataset.PairWiseROSet.get_feature_dim", false]], "global_align() (in module kraken.lib.dataset)": [[2, "kraken.lib.dataset.global_align", false]], "greedy_decoder() (in module kraken.lib.ctc_decoder)": [[2, "kraken.lib.ctc_decoder.greedy_decoder", false]], "groundtruthdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.GroundTruthDataset", false]], "height (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.height", false]], "height (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id40", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.height", false]], "hyper_params (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.hyper_params", false]], "id (kraken.containers.baselineline attribute)": [[2, "id10", false], [2, "kraken.containers.BaselineLine.id", false]], "id (kraken.containers.bboxline attribute)": [[2, "id18", false], [2, "kraken.containers.BBoxLine.id", false]], "id (kraken.containers.processingstep attribute)": [[2, "id38", false], [2, "kraken.containers.ProcessingStep.id", false]], "id (kraken.containers.region attribute)": [[2, "id26", false], [2, "kraken.containers.Region.id", false]], "idx (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.idx", false]], "im (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.im", false]], "im_mode (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.im_mode", false]], "im_mode (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.im_mode", false]], "im_mode (kraken.lib.dataset.groundtruthdataset property)": [[2, "kraken.lib.dataset.GroundTruthDataset.im_mode", false]], "im_mode (kraken.lib.dataset.polygongtdataset property)": [[2, "kraken.lib.dataset.PolygonGTDataset.im_mode", false]], "imageinputtransforms (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.ImageInputTransforms", false]], "imagename (kraken.containers.baselineline attribute)": [[2, "id11", false], [2, "kraken.containers.BaselineLine.imagename", false]], "imagename (kraken.containers.bboxline attribute)": [[2, "id19", false], [2, "kraken.containers.BBoxLine.imagename", false]], "imagename (kraken.containers.region attribute)": [[2, "id27", false], [2, "kraken.containers.Region.imagename", false]], "imagename (kraken.containers.segmentation attribute)": [[2, "id0", false], [2, "kraken.containers.Segmentation.imagename", false]], "imgs (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.imgs", false]], 
"init_weights() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.init_weights", false]], "input (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id44", false], [2, "kraken.lib.vgsl.TorchVGSLModel.input", false]], "is_valid (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.is_valid", false]], "kind (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.kind", false]], "krakencairosurfaceexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCairoSurfaceException", false]], "krakencodecexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenCodecException", false]], "krakenencodeexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenEncodeException", false]], "krakeninputexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInputException", false]], "krakeninvalidmodelexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenInvalidModelException", false]], "krakenrecordexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRecordException", false]], "krakenrepoexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenRepoException", false]], "krakenstoptrainingexception (class in kraken.lib.exceptions)": [[2, "kraken.lib.exceptions.KrakenStopTrainingException", false]], "krakentrainer (class in kraken.lib.train)": [[2, "kraken.lib.train.KrakenTrainer", false]], "l2c (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c", false]], "l2c_single (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.l2c_single", false]], "legacy_polygons (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.legacy_polygons", false]], "legacy_polygons_status (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.legacy_polygons_status", false]], "len (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.len", false]], "line_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.line_idx", false]], "line_iter (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.line_iter", false]], "line_orders (kraken.containers.segmentation attribute)": [[2, "id1", false], [2, "kraken.containers.Segmentation.line_orders", false]], "line_width (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.line_width", false]], "lines (kraken.containers.segmentation attribute)": [[2, "id2", false], [2, "kraken.containers.Segmentation.lines", false]], "load_any() (in module kraken.lib.models)": [[2, "kraken.lib.models.load_any", false]], "load_model() (kraken.lib.vgsl.torchvgslmodel class method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.load_model", false]], "logical_order() (kraken.containers.baselineocrrecord method)": [[2, "kraken.containers.BaselineOCRRecord.logical_order", false]], "logical_order() (kraken.containers.bboxocrrecord method)": [[2, "kraken.containers.BBoxOCRRecord.logical_order", false]], "logical_order() (kraken.containers.ocr_record method)": [[2, "kraken.containers.ocr_record.logical_order", false]], "m (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.m", false]], "max_label (kraken.lib.codec.pytorchcodec property)": [[2, "kraken.lib.codec.PytorchCodec.max_label", false]], "mbl_dict 
(kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mbl_dict", false]], "merge() (kraken.lib.codec.pytorchcodec method)": [[2, "kraken.lib.codec.PytorchCodec.merge", false]], "message (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id41", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.message", false]], "mm_rpred (class in kraken.rpred)": [[2, "kraken.rpred.mm_rpred", false]], "mode (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.mode", false]], "model_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.model_type", false]], "mreg_dict (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.mreg_dict", false]], "named_spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.named_spec", false]], "nets (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.nets", false]], "neural_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.neural_reading_order", false]], "nlbin() (in module kraken.binarization)": [[2, "kraken.binarization.nlbin", false]], "nn (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.nn", false]], "nn (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id45", false], [2, "kraken.lib.vgsl.TorchVGSLModel.nn", false]], "no_encode() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.groundtruthdataset method)": [[2, "kraken.lib.dataset.GroundTruthDataset.no_encode", false]], "no_encode() (kraken.lib.dataset.polygongtdataset method)": [[2, "kraken.lib.dataset.PolygonGTDataset.no_encode", false]], "no_legacy_polygons (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.no_legacy_polygons", false]], "num_classes (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.num_classes", false]], "ocr_record (class in kraken.containers)": [[2, "kraken.containers.ocr_record", false]], "one_channel_mode (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.one_channel_mode", false]], "one_channel_mode (kraken.lib.vgsl.torchvgslmodel property)": [[2, "id46", false]], "one_channel_modes (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.one_channel_modes", false]], "ops (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.ops", false]], "pad (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.pad", false]], "pad (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.pad", false]], "pad (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.pad", false]], "page_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.page_idx", false]], "pages (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.pages", false]], "pagewiseroset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PageWiseROSet", false]], "pairwiseroset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PairWiseROSet", false]], "pattern (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, 
"kraken.lib.vgsl.TorchVGSLModel.pattern", false]], "polygonal_reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.polygonal_reading_order", false]], "polygongtdataset (class in kraken.lib.dataset)": [[2, "kraken.lib.dataset.PolygonGTDataset", false]], "predict() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict", false]], "predict_labels() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_labels", false]], "predict_string() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.predict_string", false]], "prediction (kraken.containers.baselineocrrecord attribute)": [[2, "kraken.containers.BaselineOCRRecord.prediction", false]], "prediction (kraken.containers.bboxocrrecord attribute)": [[2, "kraken.containers.BBoxOCRRecord.prediction", false]], "prediction (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.prediction", false]], "processingstep (class in kraken.containers)": [[2, "kraken.containers.ProcessingStep", false]], "pytorchcodec (class in kraken.lib.codec)": [[2, "kraken.lib.codec.PytorchCodec", false]], "reading_order() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.reading_order", false]], "rebuild_alphabet() (kraken.lib.dataset.arrowipcrecognitiondataset method)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.rebuild_alphabet", false]], "region (class in kraken.containers)": [[2, "kraken.containers.Region", false]], "regions (kraken.containers.baselineline attribute)": [[2, "id12", false], [2, "kraken.containers.BaselineLine.regions", false]], "regions (kraken.containers.bboxline attribute)": [[2, "id20", false], [2, "kraken.containers.BBoxLine.regions", false]], "regions (kraken.containers.segmentation attribute)": [[2, "id3", false], [2, "kraken.containers.Segmentation.regions", false]], "render_report() (in module kraken.serialization)": [[2, "kraken.serialization.render_report", false]], "resize_output() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.resize_output", false]], "rpred() (in module kraken.rpred)": [[2, "kraken.rpred.rpred", false]], "save_model() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.save_model", false]], "scale (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.scale", false]], "scale_polygonal_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_polygonal_lines", false]], "scale_regions() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.scale_regions", false]], "script_detection (kraken.containers.segmentation attribute)": [[2, "id4", false], [2, "kraken.containers.Segmentation.script_detection", false]], "seg_idx (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.seg_idx", false]], "seg_type (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.seg_type", false]], "seg_type (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.seg_type", false]], "seg_type (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.seg_type", false]], "seg_type (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.seg_type", false]], "seg_type (kraken.lib.models.torchseqrecognizer 
attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.seg_type", false]], "seg_type (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.seg_type", false]], "seg_types (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.seg_types", false]], "segment() (in module kraken.blla)": [[2, "kraken.blla.segment", false]], "segment() (in module kraken.pageseg)": [[2, "kraken.pageseg.segment", false]], "segmentation (class in kraken.containers)": [[2, "kraken.containers.Segmentation", false]], "serialize() (in module kraken.serialization)": [[2, "kraken.serialization.serialize", false]], "set_num_threads() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.set_num_threads", false]], "settings (kraken.containers.processingstep attribute)": [[2, "id39", false], [2, "kraken.containers.ProcessingStep.settings", false]], "skip_empty_lines (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.skip_empty_lines", false]], "skip_empty_lines (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.skip_empty_lines", false]], "spec (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "kraken.lib.vgsl.TorchVGSLModel.spec", false]], "split (kraken.containers.baselineline attribute)": [[2, "id13", false], [2, "kraken.containers.BaselineLine.split", false]], "split (kraken.containers.bboxline attribute)": [[2, "id21", false], [2, "kraken.containers.BBoxLine.split", false]], "strict (kraken.lib.codec.pytorchcodec attribute)": [[2, "kraken.lib.codec.PytorchCodec.strict", false]], "tags (kraken.containers.baselineline attribute)": [[2, "id14", false], [2, "kraken.containers.BaselineLine.tags", false]], "tags (kraken.containers.bboxline attribute)": [[2, "id22", false], [2, "kraken.containers.BBoxLine.tags", false]], "tags (kraken.containers.region attribute)": [[2, "id28", false], [2, "kraken.containers.Region.tags", false]], "tags_ignore (kraken.rpred.mm_rpred attribute)": [[2, "kraken.rpred.mm_rpred.tags_ignore", false]], "targets (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.targets", false]], "text (kraken.containers.baselineline attribute)": [[2, "id15", false], [2, "kraken.containers.BaselineLine.text", false]], "text (kraken.containers.bboxline attribute)": [[2, "id23", false], [2, "kraken.containers.BBoxLine.text", false]], "text_direction (kraken.containers.bboxline attribute)": [[2, "id24", false], [2, "kraken.containers.BBoxLine.text_direction", false]], "text_direction (kraken.containers.segmentation attribute)": [[2, "id5", false], [2, "kraken.containers.Segmentation.text_direction", false]], "text_direction (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.text_direction", false]], "text_transforms (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.text_transforms", false]], "text_transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.text_transforms", false]], "tmpl (kraken.transcribe.transcriptioninterface attribute)": [[2, "kraken.transcribe.TranscriptionInterface.tmpl", 
false]], "to() (kraken.lib.models.torchseqrecognizer method)": [[2, "kraken.lib.models.TorchSeqRecognizer.to", false]], "to() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.to", false]], "torchseqrecognizer (class in kraken.lib.models)": [[2, "kraken.lib.models.TorchSeqRecognizer", false]], "torchvgslmodel (class in kraken.lib.vgsl)": [[2, "kraken.lib.vgsl.TorchVGSLModel", false]], "train (kraken.lib.models.torchseqrecognizer attribute)": [[2, "kraken.lib.models.TorchSeqRecognizer.train", false]], "train() (kraken.lib.vgsl.torchvgslmodel method)": [[2, "kraken.lib.vgsl.TorchVGSLModel.train", false]], "transcriptioninterface (class in kraken.transcribe)": [[2, "kraken.transcribe.TranscriptionInterface", false]], "transform() (kraken.lib.dataset.baselineset method)": [[2, "kraken.lib.dataset.BaselineSet.transform", false]], "transforms (kraken.lib.dataset.arrowipcrecognitiondataset attribute)": [[2, "kraken.lib.dataset.ArrowIPCRecognitionDataset.transforms", false]], "transforms (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.transforms", false]], "transforms (kraken.lib.dataset.groundtruthdataset attribute)": [[2, "kraken.lib.dataset.GroundTruthDataset.transforms", false]], "transforms (kraken.lib.dataset.polygongtdataset attribute)": [[2, "kraken.lib.dataset.PolygonGTDataset.transforms", false]], "type (kraken.containers.baselineline attribute)": [[2, "kraken.containers.BaselineLine.type", false]], "type (kraken.containers.baselineocrrecord attribute)": [[2, "id32", false], [2, "kraken.containers.BaselineOCRRecord.type", false]], "type (kraken.containers.bboxline attribute)": [[2, "kraken.containers.BBoxLine.type", false]], "type (kraken.containers.bboxocrrecord attribute)": [[2, "id35", false], [2, "kraken.containers.BBoxOCRRecord.type", false]], "type (kraken.containers.ocr_record property)": [[2, "kraken.containers.ocr_record.type", false]], "type (kraken.containers.segmentation attribute)": [[2, "id6", false], [2, "kraken.containers.Segmentation.type", false]], "use_legacy_polygons (kraken.lib.vgsl.torchvgslmodel property)": [[2, "kraken.lib.vgsl.TorchVGSLModel.use_legacy_polygons", false]], "user_metadata (kraken.lib.vgsl.torchvgslmodel attribute)": [[2, "id47", false], [2, "kraken.lib.vgsl.TorchVGSLModel.user_metadata", false]], "valid_baselines (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_baselines", false]], "valid_norm (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.valid_norm", false]], "valid_regions (kraken.lib.dataset.baselineset attribute)": [[2, "kraken.lib.dataset.BaselineSet.valid_regions", false]], "vectorize_lines() (in module kraken.lib.segmentation)": [[2, "kraken.lib.segmentation.vectorize_lines", false]], "width (kraken.lib.dataset.imageinputtransforms property)": [[2, "kraken.lib.dataset.ImageInputTransforms.width", false]], "width (kraken.lib.exceptions.krakencairosurfaceexception attribute)": [[2, "id42", false], [2, "kraken.lib.exceptions.KrakenCairoSurfaceException.width", false]], "write() (kraken.transcribe.transcriptioninterface method)": [[2, "kraken.transcribe.TranscriptionInterface.write", false]], "xmlpage (class in kraken.lib.xml)": [[2, "kraken.lib.xml.XMLPage", false]]}, "objects": {"kraken.binarization": [[2, 0, 1, "", "nlbin"]], "kraken.blla": [[2, 0, 1, "", "segment"]], "kraken.containers": [[2, 1, 1, "", "BBoxLine"], [2, 1, 1, "", "BBoxOCRRecord"], [2, 1, 1, "", "BaselineLine"], [2, 1, 1, 
"", "BaselineOCRRecord"], [2, 1, 1, "", "ProcessingStep"], [2, 1, 1, "", "Region"], [2, 1, 1, "", "Segmentation"], [2, 1, 1, "", "ocr_record"]], "kraken.containers.BBoxLine": [[2, 2, 1, "id16", "base_dir"], [2, 2, 1, "id17", "bbox"], [2, 2, 1, "id18", "id"], [2, 2, 1, "id19", "imagename"], [2, 2, 1, "id20", "regions"], [2, 2, 1, "id21", "split"], [2, 2, 1, "id22", "tags"], [2, 2, 1, "id23", "text"], [2, 2, 1, "id24", "text_direction"], [2, 2, 1, "", "type"]], "kraken.containers.BBoxOCRRecord": [[2, 2, 1, "id33", "base_dir"], [2, 2, 1, "", "confidences"], [2, 2, 1, "", "cuts"], [2, 3, 1, "id34", "display_order"], [2, 3, 1, "", "logical_order"], [2, 2, 1, "", "prediction"], [2, 2, 1, "id35", "type"]], "kraken.containers.BaselineLine": [[2, 2, 1, "id7", "base_dir"], [2, 2, 1, "id8", "baseline"], [2, 2, 1, "id9", "boundary"], [2, 2, 1, "id10", "id"], [2, 2, 1, "id11", "imagename"], [2, 2, 1, "id12", "regions"], [2, 2, 1, "id13", "split"], [2, 2, 1, "id14", "tags"], [2, 2, 1, "id15", "text"], [2, 2, 1, "", "type"]], "kraken.containers.BaselineOCRRecord": [[2, 2, 1, "id29", "base_dir"], [2, 2, 1, "", "confidences"], [2, 4, 1, "id30", "cuts"], [2, 3, 1, "id31", "display_order"], [2, 3, 1, "", "logical_order"], [2, 2, 1, "", "prediction"], [2, 2, 1, "id32", "type"]], "kraken.containers.ProcessingStep": [[2, 2, 1, "id36", "category"], [2, 2, 1, "id37", "description"], [2, 2, 1, "id38", "id"], [2, 2, 1, "id39", "settings"]], "kraken.containers.Region": [[2, 2, 1, "id25", "boundary"], [2, 2, 1, "id26", "id"], [2, 2, 1, "id27", "imagename"], [2, 2, 1, "id28", "tags"]], "kraken.containers.Segmentation": [[2, 2, 1, "id0", "imagename"], [2, 2, 1, "id1", "line_orders"], [2, 2, 1, "id2", "lines"], [2, 2, 1, "id3", "regions"], [2, 2, 1, "id4", "script_detection"], [2, 2, 1, "id5", "text_direction"], [2, 2, 1, "id6", "type"]], "kraken.containers.ocr_record": [[2, 2, 1, "", "base_dir"], [2, 4, 1, "", "confidences"], [2, 4, 1, "", "cuts"], [2, 3, 1, "", "display_order"], [2, 3, 1, "", "logical_order"], [2, 4, 1, "", "prediction"], [2, 4, 1, "", "type"]], "kraken.lib.codec": [[2, 1, 1, "", "PytorchCodec"]], "kraken.lib.codec.PytorchCodec": [[2, 3, 1, "", "add_labels"], [2, 2, 1, "", "c_sorted"], [2, 3, 1, "", "decode"], [2, 3, 1, "", "encode"], [2, 4, 1, "", "is_valid"], [2, 2, 1, "", "l2c"], [2, 2, 1, "", "l2c_single"], [2, 4, 1, "", "max_label"], [2, 3, 1, "", "merge"], [2, 2, 1, "", "strict"]], "kraken.lib.ctc_decoder": [[2, 0, 1, "", "beam_decoder"], [2, 0, 1, "", "blank_threshold_decoder"], [2, 0, 1, "", "greedy_decoder"]], "kraken.lib.dataset": [[2, 1, 1, "", "ArrowIPCRecognitionDataset"], [2, 1, 1, "", "BaselineSet"], [2, 1, 1, "", "GroundTruthDataset"], [2, 1, 1, "", "ImageInputTransforms"], [2, 1, 1, "", "PageWiseROSet"], [2, 1, 1, "", "PairWiseROSet"], [2, 1, 1, "", "PolygonGTDataset"], [2, 0, 1, "", "collate_sequences"], [2, 0, 1, "", "compute_confusions"], [2, 0, 1, "", "global_align"]], "kraken.lib.dataset.ArrowIPCRecognitionDataset": [[2, 3, 1, "", "add"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "arrow_table"], [2, 2, 1, "", "aug"], [2, 2, 1, "", "codec"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "legacy_polygons_status"], [2, 3, 1, "", "no_encode"], [2, 3, 1, "", "rebuild_alphabet"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.dataset.BaselineSet": [[2, 3, 1, "", "add"], [2, 2, 1, "", "aug"], [2, 2, 1, "", "class_mapping"], [2, 2, 1, "", 
"class_stats"], [2, 2, 1, "", "failed_samples"], [2, 2, 1, "", "im_mode"], [2, 2, 1, "", "imgs"], [2, 2, 1, "", "line_width"], [2, 2, 1, "", "mbl_dict"], [2, 2, 1, "", "mreg_dict"], [2, 2, 1, "", "num_classes"], [2, 2, 1, "", "pad"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "targets"], [2, 3, 1, "", "transform"], [2, 2, 1, "", "transforms"], [2, 2, 1, "", "valid_baselines"], [2, 2, 1, "", "valid_regions"]], "kraken.lib.dataset.GroundTruthDataset": [[2, 3, 1, "", "add"], [2, 3, 1, "", "add_line"], [2, 3, 1, "", "add_page"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "aug"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 4, 1, "", "im_mode"], [2, 3, 1, "", "no_encode"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.dataset.ImageInputTransforms": [[2, 4, 1, "", "batch"], [2, 4, 1, "", "centerline_norm"], [2, 4, 1, "", "channels"], [2, 4, 1, "", "force_binarization"], [2, 4, 1, "", "height"], [2, 4, 1, "", "mode"], [2, 4, 1, "", "pad"], [2, 4, 1, "", "scale"], [2, 4, 1, "", "valid_norm"], [2, 4, 1, "", "width"]], "kraken.lib.dataset.PageWiseROSet": [[2, 2, 1, "", "data"], [2, 2, 1, "", "failed_samples"], [2, 3, 1, "", "get_feature_dim"]], "kraken.lib.dataset.PairWiseROSet": [[2, 2, 1, "", "data"], [2, 2, 1, "", "failed_samples"], [2, 3, 1, "", "get_feature_dim"]], "kraken.lib.dataset.PolygonGTDataset": [[2, 3, 1, "", "add"], [2, 3, 1, "", "add_line"], [2, 3, 1, "", "add_page"], [2, 2, 1, "", "alphabet"], [2, 2, 1, "", "aug"], [2, 3, 1, "", "encode"], [2, 2, 1, "", "failed_samples"], [2, 4, 1, "", "im_mode"], [2, 2, 1, "", "legacy_polygons"], [2, 3, 1, "", "no_encode"], [2, 2, 1, "", "seg_type"], [2, 2, 1, "", "skip_empty_lines"], [2, 2, 1, "", "text_transforms"], [2, 2, 1, "", "transforms"]], "kraken.lib.exceptions": [[2, 1, 1, "", "KrakenCairoSurfaceException"], [2, 1, 1, "", "KrakenCodecException"], [2, 1, 1, "", "KrakenEncodeException"], [2, 1, 1, "", "KrakenInputException"], [2, 1, 1, "", "KrakenInvalidModelException"], [2, 1, 1, "", "KrakenRecordException"], [2, 1, 1, "", "KrakenRepoException"], [2, 1, 1, "", "KrakenStopTrainingException"]], "kraken.lib.exceptions.KrakenCairoSurfaceException": [[2, 2, 1, "id40", "height"], [2, 2, 1, "id41", "message"], [2, 2, 1, "id42", "width"]], "kraken.lib.models": [[2, 1, 1, "", "TorchSeqRecognizer"], [2, 0, 1, "", "load_any"]], "kraken.lib.models.TorchSeqRecognizer": [[2, 2, 1, "", "codec"], [2, 2, 1, "", "decoder"], [2, 2, 1, "", "device"], [2, 3, 1, "", "forward"], [2, 2, 1, "", "kind"], [2, 2, 1, "", "nn"], [2, 2, 1, "", "one_channel_mode"], [2, 3, 1, "", "predict"], [2, 3, 1, "", "predict_labels"], [2, 3, 1, "", "predict_string"], [2, 2, 1, "", "seg_type"], [2, 3, 1, "", "to"], [2, 2, 1, "", "train"]], "kraken.lib.segmentation": [[2, 0, 1, "", "calculate_polygonal_environment"], [2, 0, 1, "", "compute_polygon_section"], [2, 0, 1, "", "extract_polygons"], [2, 0, 1, "", "neural_reading_order"], [2, 0, 1, "", "polygonal_reading_order"], [2, 0, 1, "", "reading_order"], [2, 0, 1, "", "scale_polygonal_lines"], [2, 0, 1, "", "scale_regions"], [2, 0, 1, "", "vectorize_lines"]], "kraken.lib.train": [[2, 1, 1, "", "KrakenTrainer"]], "kraken.lib.train.KrakenTrainer": [[2, 2, 1, "", "automatic_optimization"], [2, 3, 1, "", "fit"]], "kraken.lib.vgsl": [[2, 1, 1, "", "TorchVGSLModel"]], "kraken.lib.vgsl.TorchVGSLModel": [[2, 3, 1, "", "add_codec"], [2, 3, 1, "", "append"], [2, 4, 1, "", "aux_layers"], [2, 2, 1, "", "blocks"], [2, 3, 1, "", 
"build_addition"], [2, 3, 1, "", "build_conv"], [2, 3, 1, "", "build_dropout"], [2, 3, 1, "", "build_groupnorm"], [2, 3, 1, "", "build_identity"], [2, 3, 1, "", "build_maxpool"], [2, 3, 1, "", "build_output"], [2, 3, 1, "", "build_parallel"], [2, 3, 1, "", "build_reshape"], [2, 3, 1, "", "build_rnn"], [2, 3, 1, "", "build_ro"], [2, 3, 1, "", "build_series"], [2, 3, 1, "", "build_wav2vec2"], [2, 2, 1, "", "codec"], [2, 2, 1, "id43", "criterion"], [2, 3, 1, "", "eval"], [2, 4, 1, "", "hyper_params"], [2, 2, 1, "", "idx"], [2, 3, 1, "", "init_weights"], [2, 2, 1, "id44", "input"], [2, 3, 1, "", "load_model"], [2, 2, 1, "", "m"], [2, 4, 1, "", "model_type"], [2, 2, 1, "", "named_spec"], [2, 2, 1, "id45", "nn"], [2, 4, 1, "id46", "one_channel_mode"], [2, 2, 1, "", "ops"], [2, 2, 1, "", "pattern"], [2, 3, 1, "", "resize_output"], [2, 3, 1, "", "save_model"], [2, 4, 1, "", "seg_type"], [2, 3, 1, "", "set_num_threads"], [2, 2, 1, "", "spec"], [2, 3, 1, "", "to"], [2, 3, 1, "", "train"], [2, 4, 1, "", "use_legacy_polygons"], [2, 2, 1, "id47", "user_metadata"]], "kraken.lib.xml": [[2, 1, 1, "", "XMLPage"]], "kraken.pageseg": [[2, 0, 1, "", "segment"]], "kraken.rpred": [[2, 1, 1, "", "mm_rpred"], [2, 0, 1, "", "rpred"]], "kraken.rpred.mm_rpred": [[2, 2, 1, "", "bidi_reordering"], [2, 2, 1, "", "bounds"], [2, 2, 1, "", "im"], [2, 2, 1, "", "len"], [2, 2, 1, "", "line_iter"], [2, 2, 1, "", "nets"], [2, 2, 1, "", "no_legacy_polygons"], [2, 2, 1, "", "one_channel_modes"], [2, 2, 1, "", "pad"], [2, 2, 1, "", "seg_types"], [2, 2, 1, "", "tags_ignore"]], "kraken.serialization": [[2, 0, 1, "", "render_report"], [2, 0, 1, "", "serialize"]], "kraken.transcribe": [[2, 1, 1, "", "TranscriptionInterface"]], "kraken.transcribe.TranscriptionInterface": [[2, 3, 1, "", "add_page"], [2, 2, 1, "", "env"], [2, 2, 1, "", "font"], [2, 2, 1, "", "line_idx"], [2, 2, 1, "", "page_idx"], [2, 2, 1, "", "pages"], [2, 2, 1, "", "seg_idx"], [2, 2, 1, "", "text_direction"], [2, 2, 1, "", "tmpl"], [2, 3, 1, "", "write"]]}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "attribute", "Python attribute"], "3": ["py", "method", "Python method"], "4": ["py", "property", "Python property"]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:attribute", "3": "py:method", "4": "py:property"}, "terms": {"": [0, 1, 2, 4, 5, 6, 7, 8], "0": [0, 2, 4, 5, 7, 8], "00": [0, 5, 7], "0001": 5, "0005": 4, "001": [5, 7], "00it": 5, "01": 4, "0123456789": [0, 4, 7], "0178e411df69": 1, "01c59": 8, "0245": 7, "04": 7, "06": [0, 7], "07": [0, 5], "09": [0, 7], "0ce11ad6": 1, "0d": 7, "0xe682": 4, "0xe68b": 4, "0xe8bf": 4, "0xe8e5": 0, "0xf038": 0, "0xf128": 0, "0xf1a7": 4, "1": [0, 1, 2, 5, 7, 8], "10": [0, 1, 4, 5, 7], "100": [0, 2, 5, 7, 8], "1000": 5, "101": 1, "1020": 8, "10218": 5, "1024": 8, "10592716": 4, "106": [1, 5], "107": 1, "108": 5, "11": 7, "1128": 5, "11346": 5, "1184": 7, "12": [5, 7, 8], "120": 5, "1200": 5, "122": 5, "125": 5, "128": [5, 8], "128000": 5, "128k": 5, "13": [5, 7], "132": 7, "1339": 7, "134": 5, "135": 5, "1359": 7, "136": [1, 5], "14": [0, 5], "1408": 2, "1416": 7, "143": 7, "144": 5, "145": 1, "15": [1, 5, 7], "151": 5, "1558": 7, "1567": 5, "157": 7, "16": [0, 2, 5, 8], "161": 7, "1623": 7, "1681": 7, "1697": 7, "16th": 4, "17": [2, 5], "171": 1, "172": 5, "1724": 7, "174": 5, "1754": 7, "176": 7, "18": [5, 7], "1800": 8, "182": 1, "1873": 1, "19": 5, "192": 5, "195": 1, "198": 5, "199": 5, "1996": 7, "1bpp": 0, "1cycl": 5, "1d": 8, "1e": 5, "1f3b": 1, 
"1st": 7, "1x0": 5, "1x12": [5, 8], "1x16": 8, "1x48": 8, "2": [0, 2, 4, 5, 7, 8], "20": [2, 5, 8], "200": 5, "2001": [2, 5], "2006": 2, "2014": 2, "2019": 5, "2020": 4, "2021": 0, "2024": 4, "203": 1, "204": 7, "2053": 1, "207": 5, "2096": 7, "21": 4, "210": 5, "215": 5, "2182": 1, "21st": 4, "22": [0, 5, 7], "2236": 1, "224": 1, "2243": 1, "2256": 1, "226": 1, "2264": 1, "22fee3d1": 1, "23": [0, 5], "231": 1, "2324": 1, "2326": 1, "2334": 7, "2336": 1, "2337": 1, "2339": 1, "2344": 1, "2364": 7, "237": 1, "239": 1, "2397": 1, "2398": 1, "23rd": 2, "24": [0, 7], "2404": 1, "241": 5, "2420": 1, "2421": 1, "2422": 1, "2428": 1, "2436": 1, "2437": 1, "244": 1, "2446": 1, "245": 1, "246": 5, "2477": 1, "25": [5, 7, 8], "250": 1, "2500": 7, "2523": 1, "2539": 1, "2542": 1, "256": [5, 7, 8], "2574": 1, "258": 1, "2581": 1, "259": [1, 7], "26": 7, "266": 5, "269": 1, "27": 5, "270": 7, "27046": 7, "274": [1, 5], "277": 1, "28": 5, "2873": 2, "29": [0, 5], "294": 1, "2d": [2, 8], "3": [2, 5, 7, 8], "30": [4, 5, 7], "300": 5, "300dpi": 7, "304": 1, "307": 7, "309": 1, "31": 5, "32": [5, 8], "328": 5, "336": 7, "3418": 7, "345": 1, "35": 5, "35000": 7, "3504": 7, "3519": 7, "35619": 7, "365": 7, "3680": 7, "3748": 1, "3772": 1, "377e": 1, "38": 5, "384": 8, "39": [1, 5], "4": [4, 5, 7, 8], "40": 7, "400": 5, "4000": 5, "4130": 1, "428": 7, "431": 7, "45": 5, "46": 5, "469": 1, "47": 7, "48": [5, 7, 8], "488": 7, "49": [0, 5, 7], "4a35": 1, "4bba": 1, "4d": 2, "4eea": 1, "4f7d": 1, "5": [2, 5, 7, 8], "50": [5, 7], "500": 5, "5000": 5, "50bb": 1, "512": 8, "52": [5, 7], "5226": 5, "523": 5, "5230": 5, "5234": 5, "5258": 7, "5281": [0, 4], "53": 5, "536": 5, "53980": 5, "539eadc": 1, "54": 1, "54114": 5, "5431": 5, "545": 7, "5468665": 0, "56": [0, 4, 7], "5617734": 0, "5617783": 0, "575": 5, "577": 7, "59": [7, 8], "5951": 7, "5983a0c50ce8": 1, "599": 7, "6": [5, 7, 8], "60": [5, 7], "6022": 5, "61": 1, "62": 5, "63": 5, "64": [5, 8], "646": 7, "6542744": 0, "66": [5, 7], "6731": 1, "675": 1, "687": 1, "688": 1, "7": [5, 7, 8], "701": 5, "7012": 5, "7015": 7, "71": [1, 5], "7272": 7, "7281": 7, "738": 1, "74": [1, 5], "758": 1, "7593": 5, "773": 5, "7857": 5, "788": [5, 7], "789": 1, "790": 1, "794": 5, "7943": 5, "8": [0, 2, 5, 7, 8], "80": [2, 5], "800": 7, "8014": 5, "81": [5, 7], "811": 7, "82": 5, "824": 7, "8337": 5, "8344": 5, "8374": 5, "84": 7, "8445": 7, "8479": 7, "8481": 7, "8482": 7, "8484": 7, "8485": 7, "8486": 7, "8487": 7, "8488": 7, "8489": 7, "8490": 7, "8491": 7, "8492": 7, "8493": 7, "8494": 7, "8495": 7, "8496": 7, "8497": 7, "8498": 7, "8499": 7, "8500": 7, "8501": 7, "8502": 7, "8503": 7, "8504": 7, "8505": 7, "8506": 7, "8507": 7, "8508": 7, "8509": 7, "8510": 7, "8511": 7, "8512": 7, "8616": 5, "8620": 5, "876": 7, "8760": 5, "8762": 5, "8790": 5, "8795": 5, "8797": 5, "88": [5, 7], "8802": 5, "8804": 5, "8806": 5, "8813": 5, "8876": 5, "8878": 5, "8883": 5, "889": 7, "9": [2, 5, 7, 8], "90": [2, 5], "906": 8, "906x32": 8, "91": 5, "912": 5, "92": 1, "93": 1, "9315": 7, "9318": 7, "9350": 7, "9361": 7, "9381": 7, "95": [0, 5], "9541": 7, "9550": 7, "96": [5, 7], "97": 7, "98": [4, 7], "99": 7, "9918": 7, "9920": 7, "9924": 7, "A": [0, 1, 2, 4, 5, 7, 8], "As": [0, 1, 2, 5], "BY": 0, "By": 7, "For": [0, 1, 2, 5, 7, 8], "If": [0, 2, 4, 5, 7, 8], "In": [0, 1, 2, 4, 5, 7], "It": [0, 1, 5, 7, 8], "Its": 0, "NO": 7, "One": [2, 5], "The": [0, 1, 2, 3, 4, 5, 7, 8], "Then": 5, "There": [0, 1, 4, 5, 6, 7], "These": [0, 1, 2, 4, 5, 7], "To": [0, 1, 2, 4, 5, 7], "Will": 2, "With": [0, 
5], "_abcdefghijklmnopqrstuvwxyz": 4, "a287": 1, "a785": 1, "a8c8": 1, "aaebv2": 0, "abbrevi": 4, "abbyyxml": [0, 4], "abcdefghijklmnopqrstuvwxyz": 4, "abcdefghijklmnopqrstuvxabcdefghijklmnopqrstuvwxyz": 0, "abjad": 5, "abl": [0, 2, 5, 7], "abort": [5, 7], "about": 7, "abov": [0, 1, 4, 5, 7], "absolut": [2, 5], "abstract": 2, "abugida": 5, "acceler": [4, 5, 7], "accent": [0, 4], "accept": [0, 1, 2, 5], "access": [0, 1, 2], "access_token": 0, "accord": [0, 2, 5], "accordingli": 2, "account": [0, 7], "accur": 5, "accuraci": [0, 1, 2, 4, 5, 7], "achiev": 7, "acm": 2, "across": [2, 5], "action": [0, 5], "activ": [0, 5, 7, 8], "actual": [2, 4, 5, 7], "acut": [0, 4], "ad": [2, 5, 7], "adam": 5, "adapt": 5, "add": [0, 2, 4, 5, 8], "add_codec": 2, "add_label": 2, "add_lin": 2, "add_pag": 2, "addit": [0, 1, 2, 4, 5], "addition": 2, "adjust": [5, 7, 8], "administr": 0, "advantag": 5, "advis": 7, "affect": 7, "after": [0, 1, 2, 5, 7, 8], "afterward": [0, 1], "again": [4, 7], "agenc": 4, "aggreg": 2, "ah": 7, "aid": [2, 4], "aim": 5, "aku": 7, "al": [2, 7], "alam": 7, "albeit": 7, "aletheia": 7, "alex": 2, "algn1": 2, "algn2": 2, "algorithm": [0, 1, 2, 5], "align": [2, 5], "align1": 2, "align2": 2, "all": [0, 1, 2, 4, 5, 6, 7], "allographet": 4, "allow": [2, 5, 6, 7], "almost": [0, 1], "along": [2, 8], "alphabet": [0, 2, 4, 5, 7, 8], "alreadi": 5, "also": [0, 1, 2, 4, 5, 7], "altern": [2, 5, 8], "although": [0, 1, 5, 7], "alto": [0, 1, 4, 7], "alto_doc": 1, "alto_seg_onli": 1, "alwai": [0, 2, 4], "amiss": 7, "among": 5, "amount": [0, 2, 5, 7], "amp": 5, "an": [0, 1, 2, 4, 5, 7, 8], "anaconda": 4, "analogu": 0, "analysi": [0, 4, 7], "ani": [0, 1, 2, 5], "annot": [0, 4, 5], "anoth": [0, 2, 5, 7, 8], "anr": 4, "antiqua": 0, "anymor": [0, 5, 7], "anyth": 2, "apach": 4, "apart": [0, 3, 5], "api": 5, "appear": 2, "append": [0, 2, 5, 7, 8], "appli": [0, 1, 2, 4, 7, 8], "applic": [1, 7], "approach": [4, 5, 7], "appropri": [0, 2, 4, 5, 7, 8], "approv": 0, "approxim": [1, 5], "ar": [0, 1, 2, 4, 5, 6, 7, 8], "arab": [0, 5, 7], "arbitrari": [1, 6, 7, 8], "architectur": [4, 5, 6, 8], "archiv": [1, 5, 7], "area": [0, 2], "aren": [2, 5], "arg": 2, "argument": [1, 5], "arian": 0, "arm": 4, "around": [0, 1, 2, 5, 7], "arrai": [1, 2], "arrow": [2, 5], "arrow_t": 2, "arrowipcrecognitiondataset": 2, "arxiv": 2, "ask": 0, "aspect": 2, "assign": [2, 5, 7], "associ": [1, 2], "assum": 2, "attach": [1, 5], "attribut": [1, 2, 5], "au": 4, "aug": 2, "augment": [1, 2, 5, 7, 8], "author": [0, 4], "authorship": 0, "auto": [1, 2, 5], "autocast": 2, "automat": [0, 1, 2, 5, 7, 8], "automatic_optim": 2, "aux_lay": 2, "auxiliari": [0, 1], "avail": [0, 1, 4, 5, 7], "avenir": 4, "averag": [0, 2, 5, 7], "awar": 1, "awesom": 0, "awni": 2, "axi": [2, 8], "b": [0, 1, 2, 5, 7, 8], "b247": 1, "b9e5": 1, "back": [2, 8], "backbon": 5, "backend": 3, "background": [0, 2, 5], "backslash": 5, "base": [1, 2, 5, 6, 7, 8], "base_dir": [1, 2], "baselin": [2, 4, 5, 7], "baseline_seg": 1, "baselinelin": [1, 2], "baselineocrrecord": [1, 2], "baselineset": 2, "basic": [0, 5, 7], "batch": [0, 2, 5, 7, 8], "batch_siz": 5, "bayr\u016bt": 7, "bbox": [1, 2], "bboxlin": [1, 2], "bboxocrrecord": [1, 2], "bcewithlogitsloss": 5, "beam": 2, "beam_decod": 2, "beam_siz": 2, "becaus": [1, 4, 5, 7], "becom": 0, "been": [0, 2, 4, 5, 7], "befor": [2, 5, 7, 8], "beforehand": 7, "behav": [5, 8], "behavior": [2, 5], "being": [1, 2, 5, 8], "below": [0, 5, 7], "best": [0, 2, 7], "better": 5, "between": [0, 2, 5, 7], "bi": [2, 8], "biblissima": 4, "bidi": [0, 2, 4, 5], 
"bidi_reord": 2, "bidirect": [2, 5], "bidirection": 8, "bien": 1, "binar": [1, 7], "binari": [0, 1, 2], "bind": 0, "bit": [1, 5], "biton": 2, "bl": [0, 4], "black": [0, 1, 2, 7], "black_colsep": 2, "blank": 2, "blank_threshold_decod": 2, "blla": 1, "blob": 2, "block": [0, 1, 2, 5, 8], "block_i": 5, "block_n": 5, "blocktyp": 2, "board": 4, "boilerpl": 1, "book": 0, "bookhand": 0, "bool": 2, "border": [0, 2], "both": [0, 1, 2, 3, 4, 5, 7], "bottom": [0, 1, 2, 4], "bound": [0, 1, 2, 4, 5], "boundari": [0, 1, 2, 5], "box": [1, 2, 4, 5], "branch": 8, "break": 7, "brought": 5, "build": [2, 5, 7], "build_addit": 2, "build_conv": 2, "build_dropout": 2, "build_groupnorm": 2, "build_ident": 2, "build_maxpool": 2, "build_output": 2, "build_parallel": 2, "build_reshap": 2, "build_rnn": 2, "build_ro": 2, "build_seri": 2, "build_wav2vec2": 2, "buld\u0101n": 7, "bundl": 5, "bw": [0, 4], "bw_im": 1, "bw_imag": 7, "b\u00e9n\u00e9fici\u00e9": 4, "c": [0, 1, 2, 4, 5, 8], "c1": 2, "c2": 2, "c4a751dc": 1, "c7767d10c407": 1, "c_sort": 2, "cach": 2, "cairo": 2, "calcul": [1, 2, 5], "calculate_polygonal_environ": 2, "call": [1, 2, 5, 7], "callabl": 2, "callback": 1, "can": [0, 1, 2, 3, 4, 5, 7, 8], "cannot": [0, 1], "capabl": [0, 5], "case": [0, 1, 2, 5, 7], "cat": 0, "catalan": 4, "categori": 2, "catmu": 4, "caus": [1, 2, 5], "caveat": 5, "cb910c0aaf2b": 1, "cc": [0, 4], "cd": 4, "ce": [4, 7], "cedilla": 4, "cell": 8, "cent": 7, "center": 5, "centerlin": [2, 5], "centerline_norm": 2, "central": [4, 7], "centuri": 4, "certain": [0, 2, 7], "chain": [0, 4, 7], "chanc": 2, "chang": [0, 1, 2, 5], "channel": [2, 4, 8], "char": 2, "char_": 2, "char_accuraci": 2, "char_confus": 2, "charact": [0, 1, 2, 4, 5, 6, 7], "charconfid": 2, "charparam": 2, "charset": 2, "check": 0, "chines": [0, 5], "chinese_training_data": 5, "choic": 5, "chosen": 1, "circumflex": 4, "circumst": 7, "class": [0, 1, 2, 5, 7], "class_map": 2, "class_stat": 2, "classic": 7, "classif": [2, 5, 7, 8], "classifi": [0, 1, 8], "classmethod": 2, "claus": 7, "cli": 1, "clone": 4, "close": [4, 5], "closer": 1, "clstm": [2, 6], "cl\u00e9rice": 4, "code": [0, 1, 2, 4, 5, 7], "codec": 1, "coher": 0, "collabor": 4, "collate_sequ": 2, "collect": [2, 7], "color": [0, 1, 5, 7, 8], "colsep": 0, "column": [0, 1, 2], "com": [2, 4, 7], "combin": [0, 1, 2, 4, 5, 7, 8], "come": [2, 5, 8], "comma": 4, "command": [0, 1, 4, 5, 7], "commenc": 1, "common": [2, 5, 7], "commoni": 5, "commun": 0, "compact": [0, 6], "compar": 5, "comparison": 5, "compat": [2, 3, 4, 5], "compil": 5, "complet": [1, 5, 7], "complex": [1, 7], "complic": 5, "compon": 5, "compos": 2, "composedblocktyp": 5, "composit": 0, "compound": 2, "compress": 7, "compris": 7, "comput": [0, 2, 3, 4, 5, 7], "computation": 7, "compute_confus": 2, "compute_polygon_sect": 2, "con": 1, "concaten": 8, "conda": 7, "condit": [4, 5], "confer": 2, "confid": [0, 1, 2], "configur": [1, 2, 5, 8], "confluenc": 8, "conform": 5, "confus": [0, 2, 5], "conjunct": 5, "connect": [2, 5, 7], "connectionist": 2, "conserv": 5, "consid": [0, 2], "consist": [0, 1, 4, 7, 8], "consolid": 4, "constant": 5, "construct": [1, 5, 7], "consumpt": 5, "contain": [0, 1, 4, 5, 6, 7], "contemporari": 0, "content": [1, 2, 5], "contentgener": 2, "continu": [0, 1, 2, 5, 7], "contrast": [5, 7], "contrib": 1, "contribut": 4, "control": 5, "conv": [5, 8], "converg": 5, "convers": [1, 7], "convert": [0, 1, 2, 5, 7], "convolut": [2, 5], "coord": 5, "coordin": [2, 4], "core": [5, 6], "coreml": 2, "corpu": 5, "correct": [0, 1, 2, 5, 7], "correctli": 8, 
"correspond": [0, 1, 2], "corsican": 4, "cosin": 5, "cost": 7, "could": [2, 5], "couldn": 2, "count": [2, 5, 7], "counter": 2, "coupl": [0, 5, 7], "cover": 0, "coverag": 7, "cpu": [1, 2, 5, 7], "cr3": [5, 8], "cr7": 5, "creat": [0, 1, 2, 4, 5, 7, 8], "creation": 0, "cremma": 0, "cremma_medieval_bicerin": 0, "criterion": [2, 5], "css": 0, "ctc": [1, 2, 5], "ctc_decod": 1, "ctr3": 8, "cuda": [3, 4, 5], "cudnn": 3, "cumbersom": 0, "cuneiform": 5, "curat": 0, "current": [0, 2, 4, 5, 6], "curv": 0, "custom": [0, 1, 2, 5], "cut": [1, 2, 4], "cycl": 5, "d": [0, 4, 5, 7, 8], "d4b57683f5b0": 1, "dai": 4, "data": [0, 1, 2, 4, 7, 8], "dataclass": 1, "dataset": 1, "dataset_larg": 5, "date": [0, 4], "de": [1, 2, 4, 7], "deal": [0, 4, 5], "debug": [1, 5, 7], "decai": 5, "decent": 5, "decid": [0, 5], "decis": 5, "decod": [1, 2, 5], "decompos": 5, "decomposit": 5, "decreas": 7, "def": 1, "default": [0, 1, 4, 5, 6, 7, 8], "defaultlin": 5, "defin": [0, 1, 2, 4, 5, 8], "definit": [0, 5, 8], "degrad": 1, "degre": 7, "del": 2, "del_indic": 2, "delai": 5, "delet": [0, 2, 5, 7], "denot": 0, "depend": [0, 1, 2, 4, 5, 7], "deposit": 0, "deprec": [0, 2], "depth": [5, 7, 8], "deriv": 1, "describ": [2, 5], "descript": [0, 1, 2, 5], "descriptor": 2, "deseri": 2, "desir": [1, 2, 8], "desktop": 7, "destin": 2, "destroi": 5, "detail": [0, 2, 5, 7], "detect": [0, 2], "determin": [0, 2, 5], "develop": [2, 4], "deviat": 5, "devic": [1, 2, 5, 7], "diachron": 4, "diacrit": [4, 5], "diaeres": 7, "diaeresi": [4, 7], "diagram": 5, "dialect": 8, "dice": 5, "dict": 2, "dictionari": [1, 2, 5], "differ": [0, 1, 4, 5, 7, 8], "difficult": 5, "digit": 4, "dilat": 8, "dilation_i": 8, "dilation_x": 8, "dim": [5, 7, 8], "dimens": [2, 8], "dimension": 5, "dir": 5, "direct": [1, 2, 4, 5, 7, 8], "direction": 0, "directli": [0, 1, 5, 8], "directori": [1, 2, 4, 5, 7], "disabl": [0, 2, 5, 7], "disallow": 2, "discover": 0, "disk": 7, "displai": [2, 5], "display_ord": 2, "dissimilar": 5, "dist1": 2, "dist2": 2, "distanc": 2, "distinguish": 5, "distractor": 5, "distribut": 8, "dnn": 2, "do": [0, 1, 2, 4, 5, 6, 7, 8], "do0": [5, 8], "doc": [0, 2], "document": [0, 1, 2, 4, 5, 7], "doe": [0, 1, 2, 5, 7], "doesn": [2, 5, 7], "doi": 0, "domain": [1, 5], "don": 5, "done": [0, 4, 5, 7, 8], "dot": [4, 7], "down": [5, 7, 8], "download": [0, 4, 7], "downward": 2, "drastic": 5, "drawback": [0, 5], "driver": 1, "drop": [1, 8], "dropcapitallin": 5, "dropout": [2, 5, 7], "du": 4, "duplic": 2, "dure": [2, 5, 7], "e": [0, 1, 2, 5, 7, 8], "each": [0, 1, 2, 4, 5, 7, 8], "earli": [5, 7], "earlier": 2, "early_stop": 5, "easi": 2, "easiest": 7, "easili": [5, 7], "ecod": 2, "edg": 2, "edit": 7, "editor": 7, "edu": 7, "effect": 0, "either": [0, 1, 2, 5, 7, 8], "element": [1, 5], "elementref": 2, "els": 2, "emit": 2, "emploi": [0, 7], "empti": [2, 5], "enabl": [1, 2, 3, 5, 7, 8], "enable_progress_bar": [1, 2], "enable_summari": 2, "encapsul": 1, "encod": [2, 5, 7], "end": [1, 2, 5], "end_separ": 2, "endfor": 2, "endif": 2, "endmacro": 2, "endpoint": 2, "energi": 2, "enforc": [0, 5], "engin": 1, "english": 4, "enough": 7, "ensur": 5, "entir": 5, "entiti": 2, "entri": 2, "env": [2, 4, 7], "environ": [2, 4, 7], "environment_cuda": 4, "epoch": [5, 7], "equal": [1, 7, 8], "equival": 8, "erron": 7, "error": [0, 2, 5, 7], "escal": [0, 2], "escap": 5, "escripta": 4, "escriptorium": [4, 7], "especi": 0, "esr": 4, "essenti": 5, "estim": [0, 2, 5, 7], "et": 2, "etc": 0, "european": 4, "eval": 2, "evalu": 5, "evaluation_data": 1, "evaluation_fil": 1, "even": [0, 2, 5, 7], "everi": 
[0, 1], "everyth": 5, "evolv": 4, "exact": [5, 7], "exactli": [1, 5], "exampl": [0, 1, 5, 7], "except": [1, 4, 5], "exchang": 0, "execut": [0, 7, 8], "exhaust": 7, "exist": [0, 1, 4, 5, 7], "exit": 2, "expand": 0, "expect": [2, 5, 7, 8], "experi": [4, 5, 7], "experiment": 7, "explain": 0, "explic": 0, "explicit": [1, 5], "explicitli": [1, 5, 7], "exponenti": 5, "express": 0, "extend": [2, 8], "extens": [0, 5], "extent": 7, "extern": 1, "extra": [2, 4], "extract": [0, 1, 2, 4, 5, 7], "extract_polygon": 2, "extrapol": 2, "extrem": 5, "f": [0, 4, 5, 7, 8], "f17d03e0": 1, "f795": 1, "fact": 5, "factor": [0, 2], "fail": 5, "failed_sampl": 2, "faint": 0, "fairli": [5, 7], "fallback": 0, "fals": [1, 2, 5, 7, 8], "fame": 0, "fancy_model": 0, "faq\u012bh": 7, "fashion": 5, "faster": [5, 7, 8], "fc1": 5, "fc2": 5, "fd": 2, "featur": [1, 2, 5, 7, 8], "fed": [0, 1, 2, 5, 8], "feed": [0, 1, 5], "feminin": 7, "fetch": 7, "few": [0, 5, 7], "field": [2, 5], "file": [0, 1, 2, 4, 5, 6, 7], "file_1": 5, "file_2": 5, "filenam": [1, 2, 5], "filenotfounderror": 2, "filetyp": [1, 2], "fill": 2, "filter": [1, 2, 5, 8], "final": [0, 2, 4, 5, 7, 8], "find": [0, 5, 7], "fine": [1, 7], "finereader10": 2, "finereader_xml": 2, "finetun": 5, "finish": 7, "first": [0, 1, 2, 4, 5, 7, 8], "fit": [1, 2, 7], "fix": [0, 5, 7, 8], "flag": [1, 2, 4, 5], "float": [0, 2], "flow": [0, 5], "flush": 2, "fname": 2, "follow": [0, 2, 4, 5, 8], "fondu": 4, "font": 2, "font_styl": 2, "foo": [1, 5], "footrul": 5, "forbid": 2, "forc": [0, 2], "force_binar": 2, "foreground": 0, "forg": 4, "form": [0, 2, 5], "format": [1, 2, 6, 7], "format_typ": 1, "formul": 8, "forward": [2, 8], "found": [0, 1, 2, 5, 7], "four": 0, "fp": 1, "fr_manu_ro": 5, "fr_manu_ro_best": 5, "fr_manu_seg": 5, "fr_manu_seg_best": 5, "fr_manu_seg_with_ro": 5, "framework": [1, 4], "free": [2, 5], "freeli": [0, 7], "freez": 5, "freeze_backbon": 2, "french": [0, 4, 5], "frequenc": [5, 7], "friendli": [4, 7], "from": [0, 1, 2, 3, 4, 7, 8], "full": 7, "fulli": [2, 4, 5], "function": [1, 5], "fundament": [0, 1], "further": [0, 1, 2, 4, 5], "g": [0, 2, 5, 7, 8], "gabai": 4, "gain": 1, "garantue": 2, "gaussian_filt": 2, "gc": 2, "gener": [0, 1, 2, 5, 7], "geneva": 4, "gentl": 5, "geometr": 5, "geometri": 2, "german": 4, "get": [0, 1, 4, 5, 7], "get_feature_dim": 2, "get_sorted_lin": 1, "git": 4, "github": 4, "githubusercont": 7, "gitter": 4, "give": 8, "given": [1, 2, 5, 8], "glob": [0, 1], "global": 2, "global_align": 2, "glori": 0, "glyph": [2, 5, 7], "gn": 8, "gn32": 5, "gn8": 8, "go": 7, "good": 5, "gov": [2, 5], "gpu": [1, 5], "gradient": 2, "grain": [1, 7], "graph": [2, 8], "graphem": [2, 5, 7], "graphemat": 4, "graphic": 5, "grave": [2, 4], "grayscal": [0, 1, 2, 5, 7, 8], "greedi": 2, "greedili": 2, "greedy_decod": [1, 2], "greek": [0, 4, 7], "grei": 0, "grek": 0, "ground": [5, 7], "ground_truth": 1, "groundtruthdataset": 2, "group": [4, 7], "groupnorm": 8, "gru": [2, 8], "gt": [2, 5], "guarante": 1, "guid": 7, "guidelin": 4, "g\u00e9r\u00e9e": 4, "h": [0, 2, 7], "ha": [0, 1, 2, 5, 7, 8], "hamza": [5, 7], "han": 5, "hand": [5, 7], "handl": 1, "handwrit": 5, "handwritten": [0, 5], "hannun": 2, "happen": 1, "happili": 0, "hard": [2, 7], "hardwar": 4, "haut": 4, "have": [0, 1, 2, 3, 4, 5, 7], "headinglin": 5, "heatmap": [0, 1, 8], "hebrew": [0, 5, 7], "hebrew_training_data": 5, "height": [0, 2, 5, 8], "held": 7, "hellip": 4, "help": [4, 7], "henc": 8, "here": [0, 5], "heurist": [0, 5], "high": [0, 1, 2, 7, 8], "higher": 8, "highli": [2, 5, 7], "hijo": 1, "histor": 4, "hline": 
0, "hoc": 5, "hocr": [0, 4, 7], "honor": 0, "horizon": 4, "horizont": [0, 1, 2], "hour": 7, "how": [4, 5, 7], "howev": 8, "hpo": [2, 5], "hpu": 5, "html": 2, "htr": 4, "http": [2, 4, 5, 7], "huffmann": 5, "human": [2, 5], "hundr": 7, "hyper_param": 2, "hyperparamet": 5, "h\u0101d\u012b": 7, "i": [0, 1, 2, 4, 5, 6, 7, 8], "ibn": 7, "id": [1, 2, 5], "ident": [1, 2, 8], "identifi": [0, 2], "idx": 2, "ignor": [0, 2, 5], "illustr": 2, "im": [1, 2], "im_feat": 2, "im_mod": 2, "im_siz": 2, "im_transform": 2, "imag": [0, 1, 2, 4, 5, 8], "image_s": [1, 2], "imagefilenam": 5, "imageinputtransform": 2, "imagenam": [1, 2], "imaginari": [2, 7], "img": 2, "immedi": 5, "immut": 1, "implement": [0, 1, 8], "impli": 5, "implicit": 1, "implicitli": 5, "import": [0, 1, 5, 7], "importantli": [2, 5, 7], "improv": [0, 5, 7], "includ": [0, 1, 4, 5, 7], "inclus": 0, "incompat": 2, "inconsist": 4, "incorrect": 7, "increas": [5, 7], "independ": [0, 8], "index": [0, 2, 5], "indic": [2, 5, 7], "individu": [0, 2, 5], "individualis": 0, "inf": 5, "infer": [0, 2, 4, 5, 7], "influenc": 5, "inform": [0, 1, 2, 4, 5, 7], "ingest": 5, "inherit": [5, 7], "init": 1, "init_weight": 2, "initi": [0, 1, 2, 5, 7, 8], "inlin": 0, "innov": 4, "input": [1, 2, 5, 7, 8], "input_1": [0, 7], "input_2": [0, 7], "input_imag": 7, "inria": 4, "ins": 2, "insert": [1, 2, 5, 7, 8], "insid": 2, "insight": 1, "inspect": [5, 7], "instal": 3, "instanc": [0, 1, 2, 5], "instanti": 2, "instead": [2, 5, 7], "insuffici": 7, "int": 2, "integ": [0, 1, 2, 5, 7, 8], "integr": 7, "intend": 4, "intens": 7, "interact": 0, "interchang": 2, "interfac": [2, 4], "interlinearlin": 5, "intermedi": [1, 5, 7], "intern": [0, 1, 2, 7], "interoper": 2, "interrupt": 5, "introduct": 5, "inttensor": 2, "intuit": 8, "invalid": [2, 5], "inventori": [5, 7], "invers": 0, "investiss": 4, "invoc": 5, "invok": 7, "involv": [5, 7], "iou": 5, "ipc": 2, "ipu": 5, "irregular": 5, "is_tot": 1, "is_valid": 2, "isn": [1, 2, 7, 8], "italian": 4, "item": 2, "iter": [1, 2, 7], "its": [0, 1, 2, 5, 7, 8], "itself": 1, "j": [2, 4], "jinja": 0, "jinja2": [1, 2], "join": 2, "jpeg": [0, 7], "jpeg2000": [0, 4], "jpg": [0, 5], "json": [0, 2, 4, 5], "just": [0, 1, 4, 5, 7], "justif": 5, "k": [2, 5], "kamil": 5, "keep": [0, 5], "kei": [2, 4], "kernel": [5, 8], "kernel_s": 8, "keto": [0, 5, 7], "keyword": 0, "kind": [0, 2, 5, 6, 7], "kit\u0101b": 7, "know": 7, "known": [2, 7], "kraken": [0, 1, 3, 5, 6, 8], "krakencairosurfaceexcept": 2, "krakencodecexcept": 2, "krakenencodeexcept": 2, "krakeninputexcept": 2, "krakeninvalidmodelexcept": 2, "krakenrecordexcept": 2, "krakenrepoexcept": 2, "krakenstoptrainingexcept": 2, "krakentrain": [1, 2], "kutub": 7, "kwarg": 2, "l": [0, 2, 4, 7, 8], "l2c": [1, 2], "l2c_singl": 2, "la": 4, "label": [0, 1, 2, 5], "lack": 7, "lag": 5, "lang": 2, "languag": [2, 4, 5, 8], "larg": [0, 1, 2, 4, 5, 7], "larger": [2, 5, 7], "last": [0, 2, 5, 8], "lastli": 5, "later": [0, 7], "latest": [3, 4], "latin": [0, 4], "latin_training_data": 5, "latn": [0, 4], "latter": 1, "layer": [2, 5, 7], "layout": [0, 2, 4, 5, 7], "lbx100": [5, 7, 8], "lbx128": 8, "lbx200": 5, "lbx256": [5, 8], "learn": [1, 2, 5], "least": [1, 5, 7], "leav": [5, 8], "lectaurep": 0, "left": [0, 2, 4, 5, 7], "leftmost": 2, "leftward": 0, "legaci": [5, 7, 8], "legacy_polygon": 2, "legacy_polygons_statu": 2, "leipzig": 7, "len": 2, "length": [2, 5], "less": [5, 7], "let": 7, "letter": [0, 4], "level": [0, 1, 2, 5, 7], "lfx25": 8, "lfys20": 8, "lfys64": 8, "lib": 1, "libr": 4, "librari": 1, "licens": 0, "ligatur": 4, 
"light": 0, "lightn": [1, 2], "lightningmodul": 1, "like": [0, 1, 5, 7], "likewis": [1, 7], "limit": [0, 5], "line": [0, 1, 2, 4, 5, 7, 8], "line_0": 5, "line_1469098625593_463": 1, "line_1469098649515_464": 1, "line_1469099255968_508": 1, "line_idx": 2, "line_implicit": 1, "line_it": 2, "line_k": 5, "line_ord": [1, 2], "line_transkribu": 1, "line_typ": 2, "line_type_": 2, "line_width": 2, "linear": [2, 5, 7, 8], "link": [4, 5], "linux": [4, 7], "list": [0, 1, 2, 4, 5, 7], "liter": 2, "litteratur": 0, "ll": 4, "lo": 1, "load": [0, 1, 2, 4, 5, 7], "load_ani": [1, 2], "load_model": [1, 2], "loadabl": 2, "loader": 1, "loc": [2, 5], "local": 5, "locat": [1, 2, 5, 7], "log": [2, 5, 7], "log_dir": 2, "logger": [2, 5], "logic": [2, 5], "logical_ord": 2, "logograph": 5, "long": [0, 4, 5], "longest": 2, "look": [0, 1, 5, 7], "loop": 2, "loss": 5, "lossless": 7, "lot": [1, 5], "low": [0, 1, 2, 5], "lower": 5, "lr": [0, 1, 2, 7], "lrate": 5, "lstm": [2, 8], "ltr": 0, "m": [0, 2, 4, 5, 7, 8], "mac": [4, 7], "machin": 2, "macro": 2, "macron": [0, 4], "maddah": 7, "made": 7, "mai": [0, 1, 2, 5, 7], "main": [0, 4, 5, 7], "mainli": 1, "major": 1, "make": [0, 5], "mandatori": 1, "mani": [2, 5], "manifest": 5, "manual": [0, 1, 2, 5, 7], "manuscript": [0, 4, 7], "map": [0, 1, 2, 5], "mark": [5, 7], "markedli": 7, "mask": [1, 2, 5], "massag": 5, "match": [2, 5, 8], "materi": [0, 1, 4, 5, 7], "matric": 2, "matrix": [1, 5], "matter": 7, "max": 2, "max_epoch": 2, "max_label": 2, "maxcolsep": [0, 2], "maxim": 7, "maximum": [0, 2, 8], "maxpool": [2, 5, 8], "mb": [0, 5], "mbl_dict": 2, "mean": [1, 2, 5, 7], "measur": 5, "measurementunit": [2, 5], "mediev": [0, 4], "memori": [2, 5, 7], "merg": [2, 5], "merge_baselin": 2, "merge_region": 2, "messag": 2, "metadata": [0, 1, 2, 4, 5, 6, 7], "method": [0, 1, 2], "metric": 5, "might": [0, 4, 5, 7], "min": [2, 5], "min_epoch": 2, "min_length": 2, "mind": 5, "minim": [1, 2, 5], "minimum": 5, "minor": 5, "mismatch": [1, 5, 7], "misrecogn": 7, "miss": [0, 2, 5, 7], "mittagessen": [4, 7], "mix": [0, 2, 5], "ml": 6, "mlmodel": [0, 4, 5, 7], "mlp": 5, "mm_rpred": [1, 2], "mode": [0, 1, 2, 5], "model": [1, 7, 8], "model_1": 5, "model_25": 5, "model_5": 5, "model_best": 5, "model_fil": 7, "model_nam": 7, "model_name_best": 7, "model_path": 1, "model_typ": 2, "modern": [0, 7], "modest": 1, "modif": 5, "modul": 1, "momentum": [5, 7], "mono": 0, "more": [0, 1, 2, 4, 5, 7, 8], "most": [0, 1, 2, 5, 7], "mostli": [0, 1, 4, 5, 7, 8], "move": [2, 7, 8], "mp": 8, "mp2": [5, 8], "mp3": 8, "mreg_dict": 2, "much": [1, 2, 4, 5], "multi": [0, 1, 2, 4, 7], "multilabel": 2, "multipl": [0, 1, 2, 4, 5, 7], "my": 0, "myprintingcallback": 1, "n": [0, 2, 5, 8], "name": [0, 2, 4, 5, 7, 8], "named_spec": 2, "national": 4, "nativ": [0, 2, 6], "natur": [2, 7], "nchw": 2, "ndarrai": 2, "necessari": [0, 1, 2, 4, 5, 7], "necessarili": [2, 5], "need": [1, 2, 5, 7], "neg": 5, "nest": 2, "net": [1, 2, 7], "network": [1, 2, 4, 5, 6, 7], "neural": [1, 2, 5, 6, 7], "neural_reading_ord": 2, "never": 7, "nevertheless": [1, 5], "new": [0, 1, 2, 3, 5, 7, 8], "next": [1, 7], "nf": 5, "nfc": 5, "nfd": 5, "nfkc": 5, "nfkd": [4, 5], "nlbin": [0, 1, 2], "nn": 2, "no_encod": 2, "no_hlin": 2, "no_legacy_polygon": 2, "noisi": 7, "non": [0, 1, 2, 4, 5, 7, 8], "none": [0, 1, 2, 5, 7, 8], "nonlinear": 8, "nor": 1, "norm": 4, "normal": [2, 4], "notabl": 0, "note": 2, "notion": 1, "now": [1, 7], "np": 2, "num": [2, 5], "num_class": 2, "number": [0, 1, 2, 5, 7, 8], "numer": [1, 2, 7], "numpi": [1, 2], "nvidia": [3, 5], "o": [0, 1, 
2, 4, 5, 7], "o1c103": 8, "o2l8": 8, "o_": 2, "o_1530717944451": 1, "object": [0, 1, 2], "obtain": 7, "obvious": 7, "occur": 7, "occurr": 2, "ocr": [0, 1, 2, 4, 7], "ocr_": 2, "ocr_0": 2, "ocr_record": [1, 2], "ocropi": 2, "ocropu": [0, 2], "off": [5, 7], "offer": 5, "offset": [2, 5], "often": [0, 1, 5, 7], "ogonek": 4, "old": [0, 2, 6], "omit": 7, "on_init_end": 1, "on_init_start": 1, "on_train_end": 1, "onc": [0, 5], "one": [0, 1, 2, 5, 7, 8], "one_channel_mod": 2, "ones": [0, 1, 5], "onli": [0, 1, 2, 5, 7, 8], "onto": [2, 5], "op": 2, "open": 1, "openmp": [2, 5, 7], "oper": [1, 2, 8], "optic": [0, 7], "optim": [0, 4, 5, 7], "option": [0, 1, 2, 5, 8], "order": [0, 1, 4, 8], "orderedgroup": 2, "org": [2, 5], "orient": [0, 1, 2], "origin": [1, 2, 5, 8], "originalcoord": 2, "orthogon": 2, "other": [0, 2, 4, 5, 7, 8], "othertag": 2, "otherwis": [2, 5], "out": [0, 5, 7, 8], "output": [1, 2, 4, 5, 7, 8], "output_1": [0, 7], "output_2": [0, 7], "output_dir": 7, "output_fil": 7, "output_s": 2, "outsid": 2, "over": [2, 4], "overal": 5, "overfit": 7, "overhead": 5, "overlap": 5, "overrepres": 5, "overrid": [2, 5], "overwritten": 2, "own": 4, "p": [0, 4, 5], "pac": 1, "packag": [1, 2, 4, 7], "pacto": 1, "pad": [0, 2, 5], "padding_left": 2, "padding_right": 2, "pag": 1, "page": [1, 2, 4, 7], "page_0": 2, "page_idx": 2, "pagecont": 5, "pageseg": 1, "pagewiseroset": 2, "pagexml": [0, 1, 4, 7], "paint": 5, "pair": [0, 2, 5], "pairwiseroset": 2, "paper": [0, 4], "par": [1, 2, 4], "paradigm": 0, "paragraph": [2, 5], "parallel": [2, 5, 8], "param": [5, 7, 8], "paramet": [0, 1, 2, 4, 5, 7, 8], "parameterless": 0, "parametr": 2, "parchment": 0, "pari": 4, "pars": [2, 5], "parsed_doc": 1, "parser": [1, 2, 5], "part": [0, 1, 5, 7, 8], "parti": 1, "partial": [2, 4], "particular": [0, 1, 4, 5, 7, 8], "partit": 5, "pass": [2, 5, 7, 8], "path": [1, 2, 5], "pathlik": 2, "pattern": [2, 7], "pcgt": 5, "pdf": [0, 4, 7], "pdfimag": 7, "pdftocairo": 7, "peopl": 4, "per": [0, 1, 2, 5, 7], "perc": [0, 2], "percentag": 2, "percentil": 2, "perfect": 5, "perform": [1, 2, 4, 5, 7], "period": 7, "perispomeni": 4, "persist": 0, "person": 0, "physical_img_nr": 2, "pick": 5, "pickl": 6, "pil": [1, 2], "pillow": 1, "pinch": 0, "pinpoint": 7, "pipelin": [1, 2, 5], "pixel": [0, 1, 2, 5, 8], "pl_logger": 2, "pl_modul": 1, "place": [0, 4, 5, 7], "placement": 7, "plain": 0, "platform": 0, "pleas": 5, "plethora": 1, "png": [0, 1, 5, 7], "point": [0, 1, 2, 5, 7], "polygon": [0, 1, 2, 5, 7], "polygonal_reading_ord": 2, "polygongtdataset": 2, "polygonizaton": 2, "polylin": 2, "polyton": [0, 7], "pool": 5, "porson": 0, "portant": 4, "portion": 0, "posit": [2, 5], "possibl": [0, 1, 2, 4, 5, 7, 8], "postoper": 2, "postprocess": [1, 2, 5], "potenti": 5, "power": [5, 7], "practic": 1, "pratiqu": 4, "pre": [0, 5], "preced": 5, "precis": [2, 5], "precompil": [2, 5], "precomput": 2, "pred": 2, "pred_it": 1, "predict": [1, 2, 5], "predict_label": 2, "predict_str": 2, "prefer": [1, 7], "prefilt": 0, "prefix": [2, 5, 7], "prefix_epoch": 7, "preliminari": 0, "preload": 7, "prematur": 5, "preoper": 2, "prepar": 7, "prepend": 8, "preprint": 2, "preprocess": [2, 4], "prerequisit": 4, "present": 4, "preserv": [2, 4], "pretrain_best": 5, "prevent": [2, 7], "previou": [4, 5], "previous": [4, 5], "previtem": 2, "primaresearch": 5, "primari": [0, 1, 5], "primarili": 4, "princip": [1, 2, 5], "principl": 4, "print": [0, 1, 2, 4, 5, 7], "printspac": [2, 5], "privat": 0, "prob": [2, 8], "probabl": [2, 5, 7, 8], "problemat": 5, "proc_type_t": 2, "proce": 8, 
"proceed": 2, "process": [0, 1, 2, 4, 5, 7, 8], "processing_step": 2, "processingcategori": 2, "processingsoftwar": 2, "processingstep": 2, "processingstepdescript": 2, "processingstepset": 2, "produc": [0, 1, 2, 4, 5, 7], "programm": 4, "progress": [2, 7], "project": [4, 8], "prone": 5, "pronn": 6, "proper": 1, "properli": 7, "properti": [1, 2], "proport": 5, "proportion": 2, "protobuf": [2, 6], "prove": 7, "provid": [0, 1, 2, 4, 5, 7, 8], "psl": 4, "public": [0, 4], "publish": 4, "pull": 4, "pure": 5, "purpos": [0, 1, 2, 7, 8], "put": [2, 5, 7], "py": 1, "pypi": 4, "pyrnn": 6, "python": 4, "pytorch": [0, 1, 2, 3, 6], "pytorch_lightn": 1, "pytorchcodec": 2, "pyvip": 4, "q": 5, "qualiti": [0, 1, 7], "queryabl": 0, "quit": [1, 4, 5], "r": [0, 2, 5, 8], "rais": [1, 2, 5], "raise_on_error": 2, "ran": 4, "random": [2, 5, 7], "randomli": 5, "rang": [0, 2], "rank": 5, "rapidli": [5, 7], "rare": 5, "rate": [5, 7], "rather": [0, 5], "ratio": 5, "raw": [0, 1, 5, 7], "rb": 2, "reach": [5, 7], "read": [0, 1, 4], "reader": 5, "reading_ord": [1, 2], "reading_order_fn": 2, "readingord": 2, "real": 7, "realiz": 5, "reason": [0, 2, 5], "rebuild_alphabet": 2, "rec_model_path": 1, "recherch": 4, "recogn": [0, 1, 2, 4, 5, 7], "recognit": [3, 8], "recognitionmodel": 1, "recommend": [0, 1, 5, 7], "recomput": 2, "record": [1, 2, 4], "rectangl": 2, "rectangular": 0, "recurr": [2, 6], "reduc": [5, 8], "reduceonplateau": 5, "ref": 2, "refer": [0, 1, 5, 7], "refin": 5, "region": [0, 1, 2, 4, 5, 7], "region_1469098557906_461": 1, "region_1469098609000_462": 1, "region_implicit": 1, "region_ord": 1, "region_transkribu": 1, "region_typ": [2, 5], "region_type_": 2, "regular": 5, "reinstanti": 2, "rel": 5, "relat": [0, 1, 5, 7], "relax": 7, "reli": 5, "reliabl": [5, 7], "relu": [5, 8], "remain": [0, 5, 7], "remaind": 8, "remedi": 7, "remov": [0, 2, 5, 7, 8], "render": [1, 2], "render_lin": 2, "render_report": 2, "reorder": [2, 5, 7], "repeatedli": 7, "replac": 1, "repolygon": 1, "report": [2, 5, 7], "repositori": [4, 5, 7], "repres": 2, "represent": [2, 7], "reproduc": 5, "request": [0, 4, 8], "requir": [0, 1, 2, 4, 5, 7, 8], "requisit": 7, "rescal": 2, "research": 4, "reserv": 1, "reshap": [2, 5], "resili": 4, "resiz": [2, 5], "resize_output": 2, "resolut": 2, "resolv": [4, 5], "respect": [1, 2, 4, 5, 8], "result": [0, 1, 2, 4, 5, 7, 8], "resum": 5, "retain": [2, 5], "retrain": 7, "retriev": [4, 5, 7], "return": [0, 1, 2, 8], "reus": 2, "revers": [4, 8], "rgb": [1, 2, 5, 8], "right": [0, 2, 4, 5, 7], "ring": 4, "rl": [0, 2], "rmsprop": [5, 7], "rnn": [2, 4, 5, 7, 8], "ro": [1, 2, 5], "ro_": 2, "ro_0": 2, "ro_id": 2, "ro_net": 5, "roadd": 5, "robust": 5, "romanov": 7, "root": 5, "rotat": 0, "rotrain": 5, "rough": 7, "roughli": 0, "round": 2, "routin": 1, "rpred": 1, "rtl": 0, "rtl_display_data": 5, "rtl_training_data": 5, "rukkakha": 7, "rule": 7, "run": [1, 2, 3, 4, 5, 7, 8], "r\u00e9f\u00e9renc": 4, "s1": [5, 8], "sa": 0, "same": [0, 1, 2, 4, 5, 7, 8], "sampl": [2, 5, 7], "sarah": 7, "satur": 5, "savant": 7, "save": [2, 5, 7], "save_model": 2, "savefreq": [5, 7], "scale": [0, 2, 5, 8], "scale_polygonal_lin": 2, "scale_region": 2, "scan": 7, "scantailor": 7, "schedul": 5, "schema": [2, 5], "schemaloc": [2, 5], "scientif": 4, "score": 5, "scratch": [0, 1], "script": [0, 1, 2, 4, 5, 7], "script_detect": [1, 2], "scriptal": 1, "scroung": 4, "seamcarv": 2, "search": [0, 2], "second": [0, 2], "section": [1, 2, 7], "see": [0, 1, 2, 5, 7], "seen": [0, 1, 7], "seg": 1, "seg_idx": 2, "seg_typ": 2, "segment": [4, 7], "segment_": 
2, "segment_k": 5, "segmentation_output": 1, "segmentation_overlai": 1, "segmentationmodel": 1, "segmodel_best": 5, "segtrain": 5, "seldom": 7, "select": [0, 2, 5, 8], "selector": 2, "self": 1, "semant": [5, 7], "semi": [0, 7], "sensibl": [1, 5], "separ": [0, 1, 2, 4, 5, 7, 8], "sephardi": 0, "seq1": 2, "seq2": 2, "seqrecogn": 2, "sequenc": [1, 2, 5, 7, 8], "serial": [0, 4, 5, 6, 8], "set": [0, 1, 2, 4, 5, 7, 8], "set_num_thread": 2, "setup": 1, "sever": [1, 4, 7, 8], "sgd": 5, "shape": [2, 5, 8], "share": [0, 5], "shell": 7, "shini": 2, "ship": 2, "short": [0, 8], "should": [1, 2, 7], "show": [0, 4, 5, 7], "shown": [0, 7], "shuffl": 1, "side": 0, "sigmoid": 8, "signific": 5, "similar": [1, 5, 7], "simon": 4, "simpl": [0, 1, 5, 7, 8], "simpli": [2, 8], "simplifi": 0, "singl": [0, 1, 2, 5, 7, 8], "singular": 2, "size": [0, 1, 2, 5, 7, 8], "skew": [0, 7], "skip": 2, "skip_empty_lin": 2, "slice": 2, "slightli": [0, 4, 5, 7, 8], "slow": [2, 5], "slower": 5, "small": [0, 1, 2, 4, 5, 7, 8], "so": [0, 1, 2, 3, 5, 7, 8], "sobel": 2, "softmax": [1, 2, 8], "softwar": [0, 7], "softwarenam": 2, "softwarevers": 2, "some": [0, 1, 4, 5, 7], "someon": 0, "someth": [1, 7], "sometim": [1, 4, 5, 7], "somewhat": 7, "soon": [5, 7], "sort": [2, 4, 7], "sourc": [1, 2, 5, 7, 8], "sourceimageinform": [2, 5], "sp": [2, 5], "space": [0, 1, 2, 4, 5, 7], "spanish": 4, "spearman": 5, "spec": [2, 5], "special": [0, 1, 2], "specialis": 5, "specif": [0, 5, 7], "specifi": [0, 1, 5], "speckl": 7, "speech": 2, "speed": 5, "speedup": 5, "split": [1, 2, 5, 7, 8], "split_filt": 2, "spot": 4, "sqrt": 5, "squar": 5, "squash": [2, 8], "stabil": 2, "stabl": [1, 4, 5], "stack": [2, 5, 8], "stage": [0, 1, 5], "standard": [0, 1, 2, 4, 5, 7], "start": [0, 1, 2, 5, 7], "start_separ": 2, "startup": 5, "stddev": 5, "step": [0, 1, 2, 4, 5, 7, 8], "still": [0, 1, 2, 4], "stop": [5, 7], "storag": 5, "str": 2, "straightforward": 1, "stream": 5, "strength": 1, "strict": [2, 5], "strictli": 7, "stride": [5, 8], "stride_i": 8, "stride_x": 8, "string": [2, 5, 8], "strip": 8, "strong": 4, "structur": [1, 4, 5], "stub": 5, "style": 1, "su": 1, "sub": [1, 2], "subclass": 1, "subcommand": [0, 4], "subcommand_1": 0, "subcommand_2": 0, "subcommand_n": 0, "subimag": 2, "suboptim": 5, "subsampl": 5, "subsequ": [1, 2, 5], "subset": [1, 2], "substitut": [2, 5, 7], "suffer": 7, "suffici": [1, 5], "suffix": 0, "suggest": [0, 1], "suit": 7, "suitabl": [0, 7], "sum": [2, 5], "summar": [2, 5, 7, 8], "superflu": 7, "superscript": 4, "supervis": 5, "suppl_obj": 2, "suppli": [0, 1, 2, 5, 7], "support": [0, 1, 4, 5, 6], "suppos": 1, "suppress": [0, 5], "sure": [0, 5], "surfac": [0, 2], "surrog": 5, "switch": [0, 2, 5, 7], "symbol": [5, 7], "syntax": [0, 5, 8], "syr": [5, 7], "syriac": 7, "syriac_best": 7, "system": [0, 4, 5, 7, 8], "systemat": 7, "t": [0, 1, 2, 5, 7, 8], "tabl": [5, 7], "tag": [1, 2, 5], "tagref": 2, "tags_ignor": 2, "take": [1, 4, 5, 7, 8], "tanh": 8, "target": 2, "target_output_shap": 2, "task": [5, 7], "tb": 2, "technic": 4, "tei": 0, "tell": 5, "templat": [0, 1, 4], "template_sourc": 2, "tempor": 2, "tensor": [1, 2, 8], "tensorboard": 5, "tensorflow": 8, "term": 4, "tesseract": 8, "test": [2, 7], "test_model": 5, "text": [1, 2, 4, 7], "text_direct": [1, 2], "text_transform": 2, "textblock": [2, 5], "textblock_": 2, "textblock_m": 5, "textblock_n": 5, "textequiv": 5, "textlin": [2, 5], "textregion": 5, "textregion_1520586482298_193": 1, "textregion_1520586482298_194": 1, "textual": 1, "th": 2, "than": [2, 5, 7], "thei": [1, 2, 5, 7], "them": [0, 
1, 2, 5], "therefor": [0, 1, 5, 7], "therein": 7, "thi": [0, 1, 2, 4, 5, 6, 7, 8], "thibault": 4, "thing": 5, "third": 1, "those": [4, 5], "though": 1, "thousand": 7, "thread": [2, 5, 7], "three": [5, 6], "threshold": [0, 2], "through": [0, 1, 2, 4, 5, 7, 8], "thrown": 0, "thu": 1, "tif": [0, 4], "tiff": [0, 4, 7], "tightli": 7, "tild": [0, 4], "time": [0, 1, 2, 5, 7, 8], "tip": 1, "titl": 0, "titr": 4, "tmpl": [0, 2], "to_contain": 1, "todo": 1, "togeth": 8, "token": 0, "told": 5, "too": [5, 8], "tool": [1, 5, 7, 8], "top": [0, 1, 2, 4], "toplin": [2, 5], "topolog": 0, "torch": 2, "torchsegrecogn": 2, "torchseqrecogn": [1, 2], "torchvgslmodel": [1, 2], "total": [5, 7], "tpu": 5, "tr9": 2, "track": 5, "train": [0, 3, 8], "trainabl": [0, 1, 2, 4, 5], "trainer": [1, 5], "training_data": [1, 5], "training_fil": 1, "transcrib": [4, 5, 7], "transcript": [1, 2, 4, 5], "transcriptioninterfac": 2, "transfer": [1, 5], "transform": [1, 2, 4, 5, 8], "transkribu": 1, "translat": 2, "transpos": [5, 7, 8], "travail": 4, "treat": [2, 7, 8], "trial": 5, "true": [1, 2, 8], "truli": 0, "truth": [5, 7], "try": [2, 4, 8], "tupl": 2, "turn": 4, "tutori": [1, 5], "tweak": 0, "two": [0, 1, 2, 5, 8], "txt": [0, 4, 5], "type": [0, 1, 2, 5, 7, 8], "typefac": [5, 7], "typograph": 7, "typologi": 5, "u": [0, 1, 4, 5], "u1f05": 5, "uax": 2, "un": 4, "unclean": 7, "unclear": 5, "undecod": 1, "undegrad": 0, "under": [0, 4], "undesir": [5, 8], "unencod": 2, "uneven": 0, "uni": [0, 7], "unicod": [1, 2, 4, 7], "uniformli": 2, "union": [2, 4, 5], "uniqu": [0, 2, 7], "univers": [0, 4], "universit\u00e9": 4, "unknown": 2, "unlabel": 5, "unlearn": 5, "unless": 5, "unnecessarili": 1, "unord": 1, "unorderedgroup": 2, "unpredict": 7, "unrepres": 7, "unseg": [2, 7], "unset": 5, "until": 5, "untrain": 5, "unus": 5, "up": [1, 4, 5], "upcom": 4, "updat": 0, "upload": [0, 5], "upon": 0, "upward": [2, 5, 7], "ur": 0, "us": [0, 1, 2, 3, 5, 7, 8], "usabl": 1, "use_legacy_polygon": 2, "user": [0, 2, 4, 5, 7], "user_metadata": 2, "usual": [0, 1, 5, 7], "utf": [2, 5], "util": [1, 4, 5, 7], "v": [2, 4, 5, 7], "v1": 2, "v4": [2, 5], "val_loss": 5, "val_spearman": 5, "valid": [0, 2, 5], "valid_baselin": 2, "valid_norm": 2, "valid_region": 2, "valu": [0, 1, 2, 5, 8], "valueerror": 2, "variabl": [2, 4, 5, 8], "variant": [4, 5, 8], "variat": 5, "varieti": [4, 5], "variou": 0, "vast": 1, "vector": [0, 1, 2, 5], "vectorize_lin": 2, "verbos": [1, 7], "veri": 5, "versa": [0, 5], "versatil": 6, "version": [0, 2, 3, 4, 5], "vertic": [0, 2], "vgsl": [1, 5], "vice": [0, 5], "visual": [0, 5], "vocabulari": 2, "vocal": 7, "vpo": [2, 5], "vsgl": 2, "vv": 7, "w": [0, 1, 2, 5, 8], "w3": [2, 5], "w3c": 0, "wa": [2, 4, 5, 7], "wai": [0, 1, 5, 7], "wait": 5, "want": [4, 5, 7], "warmup": 5, "warn": [0, 1, 2, 7], "warp": 7, "wav2vec2": 2, "wc": 2, "we": [2, 5, 7], "weak": [1, 7], "websit": 7, "weight": [2, 5], "welcom": 4, "well": [0, 5, 7], "were": [2, 5], "west": 4, "western": 7, "wget": 7, "what": [1, 7], "when": [0, 1, 2, 5, 7, 8], "where": [0, 2, 5, 7, 8], "whether": 2, "which": [0, 1, 2, 3, 4, 5], "while": [0, 1, 2, 5, 7], "white": [0, 1, 2, 7], "whitespac": [2, 5], "whitespace_norm": 2, "whole": [2, 7], "wide": [4, 8], "wider": 0, "width": [1, 2, 5, 7, 8], "wildli": 7, "without": [0, 2, 5, 7], "won": 2, "word": [2, 4, 5], "word_accuraci": 2, "word_text": 5, "wordstart": 2, "work": [0, 1, 2, 5, 7, 8], "workabl": 5, "worker": 5, "world": [0, 7], "worsen": 0, "would": [0, 2, 5], "wrapper": [1, 2], "write": [0, 1, 2, 5], "writing_mod": 2, "written": [0, 5, 7], 
"www": [2, 5], "x": [0, 2, 4, 5, 7, 8], "x0": 2, "x01": 1, "x02": 1, "x03": 1, "x04": 1, "x05": 1, "x06": 1, "x07": 1, "x1": 2, "x64": 4, "x_n": 2, "x_stride": 8, "xa0": 7, "xdg_base_dir": 0, "xk": 2, "xm": 2, "xmax": 2, "xmin": 2, "xml": [0, 7], "xmln": [2, 5], "xmlpage": [1, 2], "xmlschema": [2, 5], "xn": 2, "xsd": [2, 5], "xsi": [2, 5], "xyz": 0, "y": [0, 2, 8], "y0": 2, "y1": 2, "y2": 2, "y_n": 2, "y_stride": 8, "year": 4, "yield": 2, "yk": 2, "ym": 2, "ymax": 2, "ymin": 2, "yml": [4, 7], "yn": 2, "you": [4, 5, 7], "your": 5, "ypogegrammeni": 4, "y\u016bsuf": 7, "zenodo": [0, 4], "zero": [2, 7, 8], "zigzag": 0, "zoom": [0, 2], "\u00e3\u00ed\u00f1\u00f5": 0, "\u00e6\u00df\u00e6\u0111\u0142\u0153\u0153\u0180\u01dd\u0247\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c2\u03c3\u03c4\u03c5\u03c6\u03c7\u03c9\u03db\u05d7\u05dc\u05e8\u1455\u15c5\u15de\u16a0\u00df": 4, "\u00e9cole": 4, "\u00e9tat": 4, "\u00e9tude": 4, "\u0127\u0129\u0142\u0169\u01ba\u1d49\u1ebd": 0, "\u02bf\u0101lam": 7, "\u0621": 5, "\u0621\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0627": 5, "\u0628": 5, "\u0629": 5, "\u062a": 5, "\u062b": 5, "\u062c": 5, "\u062d": 5, "\u062e": 5, "\u062f": 5, "\u0630": 5, "\u0631": 5, "\u0632": 5, "\u0633": 5, "\u0634": 5, "\u0635": 5, "\u0636": 5, "\u0637": 5, "\u0638": 5, "\u0639": 5, "\u063a": 5, "\u0640": 5, "\u0641": 5, "\u0642": 5, "\u0643": 5, "\u0644": 5, "\u0645": 5, "\u0646": 5, "\u0647": 5, "\u0648": 5, "\u0649": 5, "\u064a": 5, "\u0710": 7, "\u0712": 7, "\u0713": 7, "\u0715": 7, "\u0717": 7, "\u0718": 7, "\u0719": 7, "\u071a": 7, "\u071b": 7, "\u071d": 7, "\u071f": 7, "\u0720": 7, "\u0721": 7, "\u0722": 7, "\u0723": 7, "\u0725": 7, "\u0726": 7, "\u0728": 7, "\u0729": 7, "\u072a": 7, "\u072b": 7, "\u072c": 7, "\u2079\ua751\ua753\ua76f\ua770": 0, "\ua751\ua753\ua757\ua759\ua75f\ua76f\ua775": 4}, "titles": ["Advanced Usage", "API Quickstart", "API Reference", "GPU Acceleration", "kraken", "Training", "Models", "Training kraken", "VGSL network specification"], "titleterms": {"4": 2, "abbyi": 2, "acceler": 3, "acquisit": 7, "advanc": 0, "alto": [2, 5], "annot": 7, "api": [1, 2], "baselin": [0, 1], "basic": [1, 8], "best": 5, "binar": [0, 2], "binari": 5, "blla": 2, "box": 0, "codec": [2, 5], "compil": 7, "concept": 1, "conda": 4, "contain": 2, "convolut": 8, "coreml": 6, "ctc_decod": 2, "data": 5, "dataset": [2, 5, 7], "default": 2, "direct": 0, "dropout": 8, "evalu": [2, 7], "exampl": 8, "except": 2, "featur": 4, "find": 4, "fine": 5, "format": [0, 5], "from": 5, "function": 2, "fund": 4, "gpu": 3, "group": 8, "helper": [2, 8], "hocr": 2, "imag": 7, "input": 0, "instal": [4, 7], "kraken": [2, 4, 7], "layer": 8, "legaci": [0, 1, 2], "lib": 2, "licens": 4, "linegen": 2, "loss": 2, "mask": 0, "max": 8, "model": [0, 2, 4, 5, 6], "modul": 2, "network": 8, "normal": [5, 8], "order": [2, 5], "output": 0, "page": [0, 5], "pageseg": 2, "pagexml": 2, "pars": 1, "pip": 4, "plumb": 8, "pool": 8, "practic": 5, "preprocess": [1, 7], "pretrain": 5, "princip": 0, "publish": 0, "queri": 0, "quickstart": [1, 4], "read": [2, 5], "recognit": [0, 1, 2, 4, 5, 6, 7], "recurr": 8, "refer": 2, "regular": 8, "relat": 4, "repositori": 0, "reshap": 8, "retriev": 0, "rpred": 2, "scratch": 
5, "segment": [0, 1, 2, 5, 6], "serial": [1, 2], "slice": 5, "softwar": 4, "specif": 8, "templat": 2, "test": 5, "text": [0, 5], "train": [1, 2, 4, 5, 7], "trainer": 2, "transcrib": 2, "transcript": 7, "tune": 5, "tutori": 4, "unicod": 5, "unsupervis": 5, "us": 4, "usag": 0, "valid": 7, "vgsl": [2, 8], "xml": [1, 2, 5]}}) \ No newline at end of file diff --git a/main/training.html b/main/training.html new file mode 100644 index 000000000..76f866a78 --- /dev/null +++ b/main/training.html @@ -0,0 +1,509 @@ + + + + + + + + Training kraken — kraken documentation + + + + + + + + + + + + + + + + + + + + +

Training kraken

+

kraken is an optical character recognition package that can be trained fairly +easily for a large number of scripts. In contrast to other systems that require +segmentation down to glyph level before classification, it is uniquely suited +to the recognition of connected scripts, because the neural network is trained +to assign the correct characters to unsegmented training data.

+

Both segmentation, the process of finding lines and regions on a page image, and +recognition, the conversion of line images into text, can be trained in kraken. +To train models for either we require training data, i.e. examples of page +segmentations and transcriptions that are similar to what we want to be able to +recognize. For segmentation the examples are the locations of baselines, i.e. +the imaginary lines the text is written on, and the polygons of regions. For +recognition the example is the text contained in a line. There are multiple ways to +supply training data but the easiest is through PageXML or ALTO files.
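Such XML files can also be handed to the trainer directly instead of extracted line images. A minimal sketch, assuming the -f/--format-type switch of ketos train and PageXML files in a page/ directory (-f alto would be the analogous choice for ALTO files):

$ ketos train -f page page/*.xml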

+
+

Installing kraken

+

The easiest way to install and use kraken is through conda. kraken works both on Linux and Mac OS +X. After installing conda, download the environment file and create the +environment for kraken:

+
$ wget https://raw.githubusercontent.com/mittagessen/kraken/main/environment.yml
+$ conda env create -f environment.yml
+
+
+

Each time you want to use the kraken environment in a shell it has to be +activated first:

+
$ conda activate kraken
+
+
+
+
+

Image acquisition and preprocessing

+

First, a number of high quality scans, preferably color or grayscale and at +least 300dpi, are required. Scans should be in a lossless image format such as +TIFF or PNG; images in PDF files have to be extracted beforehand using a tool such +as pdftocairo or pdfimages. While each of these requirements can +be relaxed to a degree, the final accuracy will suffer to some extent. For +example, only slightly compressed JPEG scans are generally suitable for +training and recognition.
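For example, a PDF can be converted into 300dpi PNG page images with pdftocairo; the file names below are placeholders:

$ pdftocairo -png -r 300 scans.pdf page

This writes page-1.png, page-2.png, and so on into the current directory.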

+

Depending on the source of the scans, some preprocessing such as splitting scans +into pages, correcting skew and warp, and removing speckles can be advisable, +although it isn’t strictly necessary, as the segmenter can be trained to handle +noisy material with high accuracy. A fairly user-friendly tool for +semi-automatic batch processing of image scans is Scantailor, although most work can be done using a standard image +editor.

+

The total number of scans required depends on the kind of model to train +(segmentation or recognition), the complexity of the layout, and the nature of +the script to recognize. Only features that are found in the training data can +later be recognized, so it is important that the coverage of typographic +features is exhaustive. Training a small segmentation model for a particular +kind of material might require fewer than a few hundred samples, while a general +model can well go into the thousands of pages. Likewise, a specific recognition +model for a printed script with a small grapheme inventory such as Arabic or +Hebrew requires around 800 lines, while manuscripts, complex scripts (such as +polytonic Greek), and general models for multiple typefaces and hands need +more training data for the same accuracy.

+

There is no hard rule for the amount of training data, and it may be necessary to +retrain a model after the initial training data proves insufficient. Most +western texts contain between 25 and 40 lines per page, so the roughly 800 lines +needed for a printed recognition model work out to upward of 30 pages that have +to be preprocessed and later transcribed.

+
+
+

Annotation and transcription

+

kraken does not provide internal tools for the annotation and transcription of +baselines, regions, and text. There are a number of tools available that can +create ALTO and PageXML files containing the requisite information for either +segmentation or recognition training: escriptorium integrates kraken tightly, including +training and inference, while Aletheia is a powerful desktop +application that can create fine-grained annotations.

+
+
+

Dataset Compilation

+
+
+

Training

+

The training data obtained through annotation and transcription, e.g. a collection +of PAGE XML documents, may now be used to train segmentation and/or recognition +models.
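For segmentation training the annotated XML files are handed to the ketos segtrain command. A sketch with a placeholder output name, assuming the -o/--output switch documented for ketos segtrain:

$ ketos segtrain -o my_seg_model page/*.xml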

+

The training data in output_dir may now be used to train a new recognition model by +invoking the ketos train command. Just hand a list of line images to the command +such as:

+
$ ketos train output_dir/*.png
+
+
+

to start training.

+

A number of lines will be split off into a separate held-out set that is used +to estimate the actual recognition accuracy achieved in the real world. These +are never shown to the network during training but will be recognized +periodically to evaluate the accuracy of the model. By default the validation +set will comprise 10% of the training data.
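If a different split is desired, the ratio can be set explicitly. A sketch assuming the -p/--partition option of ketos train, here an 80/20 train/validation split:

$ ketos train -p 0.8 output_dir/*.png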

+

Basic model training is mostly automatic, although there are multiple parameters +that can be adjusted (a combined example invocation follows the list below):

+
+
--output
+

Sets the prefix for models generated during training. They will be saved as +prefix_epochs.mlmodel.

+
+
--report
+

How often evaluation passes are run on the validation set. It is an +integer equal to or larger than 1, with 1 meaning that a report is created each +time the complete training set has been seen by the network.

+
+
--savefreq
+

How often intermediate models are saved to disk. It is an integer with +the same semantics as --report.

+
+
--load
+

Continuing training is possible by loading an existing model file with +--load. To continue training from a base model with another +training set refer to the full ketos documentation.

+
+
--preload
+

Enables/disables preloading of the training set into memory for +accelerated training. The default setting preloads data sets with fewer +than 2500 lines; explicitly adding --preload will preload arbitrarily +sized sets. --no-preload disables preloading in all circumstances.

+
+
+
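Putting these options together, a typical invocation might look like the following; the model prefix and file names are placeholders:

$ ketos train --output mymodel --report 1 --savefreq 1 --load mymodel_10.mlmodel output_dir/*.png

This resumes training from the checkpoint mymodel_10.mlmodel, evaluates on the validation set after each pass through the training data, and writes an intermediate model at the same frequency.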

Training a network will take some time on a modern computer, even with the +default parameters. While the exact time required is unpredictable, as training +is a somewhat random process, a rough guide is that accuracy seldom improves +after 50 epochs, which are typically reached after between 8 and 24 hours of training.

+

When to stop training is a matter of experience; the default setting employs a +fairly reliable approach known as early stopping that stops training as soon as +the error rate on the validation set doesn’t improve anymore. This will +prevent overfitting, i.e. +fitting the model to recognize only the training data properly instead of the +general patterns contained therein.

+
$ ketos train output_dir/*.png
+Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+Initializing model ✓
+Accuracy report (0) -1.5951 3680 9550
+epoch 0/-1  [####################################]  788/788
+Accuracy report (1) 0.0245 3504 3418
+epoch 1/-1  [####################################]  788/788
+Accuracy report (2) 0.8445 3504 545
+epoch 2/-1  [####################################]  788/788
+Accuracy report (3) 0.9541 3504 161
+epoch 3/-1  [------------------------------------]  13/788  0d 00:22:09
+...
+
+
+

By now there should be a couple of models, model_name-1.mlmodel, +model_name-2.mlmodel, …, in the directory the command was executed in. Let’s +take a look at each part of the output.

+
Building training set  [####################################]  100%
+Building validation set  [####################################]  100%
+
+
+

shows the progress of loading the training and validation set into memory. This +might take a while as preprocessing the whole set and putting it into memory is +computationally intensive. Loading can be made faster by disabling preloading, at the +cost of performing preprocessing repeatedly during the training process.
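To trade memory for preprocessing time, preloading can be switched off explicitly with the --no-preload option described above:

$ ketos train --no-preload output_dir/*.png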

+
[270.2364] alphabet mismatch {'9', '8', '݂', '3', '݀', '4', '1', '7', '5', '\xa0'}
+
+
+

is a warning about missing characters in either the validation or training set, +i.e. that the alphabets of the sets are not equal. Increasing the size of the +validation set will often remedy this warning.

+
Accuracy report (2) 0.8445 3504 545
+
+
+

This line shows the results of the validation set evaluation. The error after 2 +epochs is 545 incorrect characters out of 3504 characters in the validation set, +i.e. a character accuracy of 84.4%. The error should decrease fairly rapidly. If +accuracy remains around 0.30 something is amiss, e.g. non-reordered +right-to-left text or wildly incorrect transcriptions. Abort training, correct the +error(s), and start again.

+

After training is finished the best model is saved as +model_name_best.mlmodel. It is highly recommended to also archive the +training log and data for later reference.
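One way to do this is to capture the console output with tee while training and to pack everything into an archive afterwards; the file names are only an illustration:

$ ketos -vv train output_dir/*.png | tee training.log
$ tar czf model_name_training.tar.gz model_name_best.mlmodel training.log output_dir/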

+

ketos can also produce more verbose output with training set and network +information by appending one or more -v to the command:

+
$ ketos -vv train syr/*.png
+[0.7272] Building ground truth set from 876 line images
+[0.7281] Taking 88 lines from training for evaluation
+...
+[0.8479] Training set 788 lines, validation set 88 lines, alphabet 48 symbols
+[0.8481] alphabet mismatch {'\xa0', '0', ':', '݀', '܇', '݂', '5'}
+[0.8482] grapheme       count
+[0.8484] SPACE  5258
+[0.8484]        ܐ       3519
+[0.8485]        ܘ       2334
+[0.8486]        ܝ       2096
+[0.8487]        ܠ       1754
+[0.8487]        ܢ       1724
+[0.8488]        ܕ       1697
+[0.8489]        ܗ       1681
+[0.8489]        ܡ       1623
+[0.8490]        ܪ       1359
+[0.8491]        ܬ       1339
+[0.8491]        ܒ       1184
+[0.8492]        ܥ       824
+[0.8492]        .       811
+[0.8493] COMBINING DOT BELOW    646
+[0.8493]        ܟ       599
+[0.8494]        ܫ       577
+[0.8495] COMBINING DIAERESIS    488
+[0.8495]        ܚ       431
+[0.8496]        ܦ       428
+[0.8496]        ܩ       307
+[0.8497] COMBINING DOT ABOVE    259
+[0.8497]        ܣ       256
+[0.8498]        ܛ       204
+[0.8498]        ܓ       176
+[0.8499]        ܀       132
+[0.8499]        ܙ       81
+[0.8500]        *       66
+[0.8501]        ܨ       59
+[0.8501]        ܆       40
+[0.8502]        [       40
+[0.8503]        ]       40
+[0.8503]        1       18
+[0.8504]        2       11
+[0.8504]        ܇       9
+[0.8505]        3       8
+[0.8505]                6
+[0.8506]        5       5
+[0.8506] NO-BREAK SPACE 4
+[0.8507]        0       4
+[0.8507]        6       4
+[0.8508]        :       4
+[0.8508]        8       4
+[0.8509]        9       3
+[0.8510]        7       3
+[0.8510]        4       3
+[0.8511] SYRIAC FEMININE DOT    1
+[0.8511] SYRIAC RUKKAKHA        1
+[0.8512] Encoding training set
+[0.9315] Creating new model [1,1,0,48 Lbx100 Do] with 49 outputs
+[0.9318] layer          type    params
+[0.9350] 0              rnn     direction b transposed False summarize False out 100 legacy None
+[0.9361] 1              dropout probability 0.5 dims 1
+[0.9381] 2              linear  augmented False out 49
+[0.9918] Constructing RMSprop optimizer (lr: 0.001, momentum: 0.9)
+[0.9920] Set OpenMP threads to 4
+[0.9920] Moving model to device cpu
+[0.9924] Starting evaluation run
+
+
+

indicates that the training is running on 788 transcribed lines and a +validation set of 88 lines. 49 different classes, i.e. Unicode code points, +were found in these 788 lines. These determine the output size of the network; +obviously only these 49 different classes/code points can later be output by +the network. Importantly, we can see that certain characters occur markedly +less often than others. Characters like the Syriac feminine dot and numerals +that occur fewer than 10 times will most likely not be recognized well by the +trained net.

+
+
+

Evaluation and Validation

+

While the output during training is detailed enough to know when to stop training, +one usually wants to know the specific kinds of errors to expect. More +in-depth error analysis also allows one to pinpoint weaknesses in the training +data, e.g. above-average error rates for numerals indicate either a lack of +representation of numerals in the training data or erroneous transcriptions in +the first place.

+

First the trained model has to be applied to some line transcriptions with the +ketos test command:

+
$ ketos test -m syriac_best.mlmodel lines/*.png
+Loading model syriac_best.mlmodel ✓
+Evaluating syriac_best.mlmodel
+Evaluating  [#-----------------------------------]    3%  00:04:56
+...
+
+
+

After all lines have been processed an evaluation report will be printed:

+
=== report  ===
+
+35619     Characters
+336       Errors
+99.06%    Accuracy
+
+157       Insertions
+81        Deletions
+98        Substitutions
+
+Count     Missed  %Right
+27046     143     99.47%  Syriac
+7015      52      99.26%  Common
+1558      60      96.15%  Inherited
+
+Errors    Correct-Generated
+25        {  } - { COMBINING DOT BELOW }
+25        { COMBINING DOT BELOW } - {  }
+15        { . } - {  }
+15        { COMBINING DIAERESIS } - {  }
+12        { ܢ } - {  }
+10        {  } - { . }
+8 { COMBINING DOT ABOVE } - {  }
+8 { ܝ } - {  }
+7 { ZERO WIDTH NO-BREAK SPACE } - {  }
+7 { ܆ } - {  }
+7 { SPACE } - {  }
+7 { ܣ } - {  }
+6 {  } - { ܝ }
+6 { COMBINING DOT ABOVE } - { COMBINING DIAERESIS }
+5 { ܙ } - {  }
+5 { ܬ } - {  }
+5 {  } - { ܢ }
+4 { NO-BREAK SPACE } - {  }
+4 { COMBINING DIAERESIS } - { COMBINING DOT ABOVE }
+4 {  } - { ܒ }
+4 {  } - { COMBINING DIAERESIS }
+4 { ܗ } - {  }
+4 {  } - { ܬ }
+4 {  } - { ܘ }
+4 { ܕ } - { ܢ }
+3 {  } - { ܕ }
+3 { ܐ } - {  }
+3 { ܗ } - { ܐ }
+3 { ܝ } - { ܢ }
+3 { ܀ } - { . }
+3 {  } - { ܗ }
+
+  .....
+
+
+

The first section of the report is a simple accounting of the number +of characters in the ground truth, the errors in the recognition output, and the +resulting accuracy in percent: here (35619 - 336) / 35619 ≈ 99.06%.

+

The next table lists the number of insertions (characters occurring in the +ground truth but not in the recognition output), substitutions (misrecognized +characters), and deletions (superfluous characters recognized by the model).

+

Next is a grouping of errors (insertions and substitutions) by Unicode script.

+

The final part of the report lists errors sorted by frequency together with a per-character +accuracy report. Importantly, most errors here are incorrect recognition of +combining marks such as dots and diaereses. These may have several sources: +different dot placement in the training and validation sets, incorrect transcription +such as non-systematic transcription, or unclean, speckled scans. Depending on +the error source, correction most often involves adding more training data and +fixing transcriptions. Sometimes it may even be advisable to remove +unrepresentative data from the training set.

+
+
+

Recognition

+

The kraken utility is employed for all non-training-related tasks. Optical +character recognition is a multi-step process consisting of binarization +(conversion of input images to black and white), page segmentation (extracting +lines from the image), and recognition (converting line images to character +sequences). All of these may be run in a single call like this:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m MODEL_FILE
+
+
+

producing a text file from the input image. There are also hOCR and ALTO output +formats available through the appropriate switches:

+
$ kraken -i ... ocr -h
+$ kraken -i ... ocr -a
+
+
+

For debugging purposes it is sometimes helpful to run each step manually and +inspect intermediate results:

+
$ kraken -i INPUT_IMAGE BW_IMAGE binarize
+$ kraken -i BW_IMAGE LINES segment
+$ kraken -i BW_IMAGE OUTPUT_FILE ocr -l LINES ...
+
+
+

It is also possible to recognize more than one file at a time by just chaining +-i ... ... clauses like this:

+
$ kraken -i input_1 output_1 -i input_2 output_2 ...
+
+
+

Finally, there is a central repository containing freely available models. +Getting a list of all available models:

+
$ kraken list
+
+
+

Retrieving model metadata for a particular model:

+
$ kraken show arabic-alam-al-kutub
+name: arabic-alam-al-kutub.mlmodel
+
+An experimental model for Classical Arabic texts.
+
+Network trained on 889 lines of [0] as a test case for a general Classical
+Arabic model. Ground truth was prepared by Sarah Savant
+<sarah.savant@aku.edu> and Maxim Romanov <maxim.romanov@uni-leipzig.de>.
+
+Vocalization was omitted in the ground truth. Training was stopped at ~35000
+iterations with an accuracy of 97%.
+
+[0] Ibn al-Faqīh (d. 365 AH). Kitāb al-buldān. Edited by Yūsuf al-Hādī, 1st
+edition. Bayrūt: ʿĀlam al-kutub, 1416 AH/1996 CE.
+alphabet:  !()-.0123456789:[] «»،؟ءابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ARABIC
+MADDAH ABOVE, ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW
+
+
+

and actually fetching the model:

+
$ kraken get arabic-alam-al-kutub
+
+
+

The downloaded model can then be used for recognition by the name shown in its metadata, e.g.:

+
$ kraken -i INPUT_IMAGE OUTPUT_FILE binarize segment ocr -m arabic-alam-al-kutub.mlmodel
+
+
+

For more documentation see the kraken website.

diff --git a/main/vgsl.html b/main/vgsl.html
new file mode 100644
index 000000000..b03e0cc5f
--- /dev/null
+++ b/main/vgsl.html
@@ -0,0 +1,320 @@

VGSL network specification

+

kraken implements a dialect of the Variable-size Graph Specification Language +(VGSL), enabling the specification of different network architectures for image +processing purposes using a short definition string.

+
+

Basics

+

A VGSL specification consists of an input block, one or more layers, and an +output block. For example:

+
[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
+
+
+

The first block defines the input in order of [batch, height, width, channels] +with zero-valued dimensions being variable. Integer valued height or width +input specifications will result in the input images being automatically scaled +in either dimension.

+

When channels are set to 1, grayscale or B/W inputs are expected; 3 expects RGB +color images. Higher values in combination with a height of 1 result in the +network being fed 1-pixel-wide grayscale strips scaled to the size of the +channel dimension.

+

After the input, a number of layers are defined. Layers operate on the channel +dimension; this is intuitive for convolutional layers but a recurrent layer +doing sequence classification along the width axis on an image of a particular +height requires the height dimension to be moved to the channel dimension, +e.g.:

+
[1,48,0,1 S1(1x48)1,3 Lbx100 O1c103]
+
+
+

or using the alternative slightly faster formulation:

+
[1,1,0,48 Lbx100 O1c103]
+
+
+

Finally, an output definition is appended. When training sequence classification +networks with the provided tools, the appropriate output definition is +automatically appended to the network based on the alphabet of the training +data.
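When training with ketos such a definition string can be supplied directly; a sketch assuming the -s/--spec switch of ketos train, using the example spec from above without the output block (which ketos appends automatically):

$ ketos train -s '[1,48,0,1 Cr3,3,32 Mp2,2 Cr3,3,64 Mp2,2 S1(1x12)1,3 Lbx100 Do]' output_dir/*.png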

+
+
+

Examples

+
[1,1,0,48 Lbx100 Do 01c59]
+
+Creating new model [1,1,0,48 Lbx100 Do] with 59 outputs
+layer           type    params
+0               rnn     direction b transposed False summarize False out 100 legacy None
+1               dropout probability 0.5 dims 1
+2               linear  augmented False out 59
+
+
+

A simple recurrent recognition model with a single LSTM layer classifying lines +normalized to 48 pixels in height.

+
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do 01c59]
+
+Creating new model [1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               dropout probability 0.1 dims 2
+2               maxpool kernel 2 x 2 stride 2 x 2
+3               conv    kernel 3 x 3 filters 64 activation r
+4               dropout probability 0.1 dims 2
+5               maxpool kernel 2 x 2 stride 2 x 2
+6               reshape from 1 1 x 12 to 1/3
+7               rnn     direction b transposed False summarize False out 100 legacy None
+8               dropout probability 0.5 dims 1
+9               linear  augmented False out 59
+
+
+

A model with a small convolutional stack before a recurrent LSTM layer. The +extended dropout layer syntax is used to reduce the drop probability on the depth +dimension, as the default is too high for convolutional layers. The remainder of +the height dimension (12) is reshaped into the depth dimension before +applying the final recurrent and linear layers.

+
[1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do 01c59]
+
+Creating new model [1,0,0,3 Cr3,3,16 Mp3,3 Lfys64 Lbx128 Lbx256 Do] with 59 outputs
+layer           type    params
+0               conv    kernel 3 x 3 filters 16 activation r
+1               maxpool kernel 3 x 3 stride 3 x 3
+2               rnn     direction f transposed True summarize True out 64 legacy None
+3               rnn     direction b transposed False summarize False out 128 legacy None
+4               rnn     direction b transposed False summarize False out 256 legacy None
+5               dropout probability 0.5 dims 1
+6               linear  augmented False out 59
+
+
+

A model with arbitrarily sized color image input, an initial summarizing recurrent layer to squash the height to 64, followed by 2 bidirectional recurrent layers and a linear projection.

+
[1,1800,0,3 Cr3,3,32 Gn8 (I [Cr3,3,64,2,2 Gn8 CTr3,3,32,2,2]) Cr3,3,32 O2l8]
+
+layer           type    params
+0               conv    kernel 3 x 3 filters 32 activation r
+1               groupnorm       8 groups
+2               parallel        execute 2.0 and 2.1 in parallel
+2.0             identity
+2.1             serial  execute 2.1.0 to 2.1.2 in sequence
+2.1.0           conv    kernel 3 x 3 stride 2 x 2 filters 64 activation r
+2.1.1           groupnorm       8 groups
+2.1.2           transposed convolution  kernel 3 x 3 stride 2 x 2 filters 2 activation r
+3               conv    kernel 3 x 3 stride 1 x 1 filters 32 activation r
+4               linear  activation sigmoid
+
+
+

A model that outputs heatmaps with 8 feature dimensions, taking color images with +height normalized to 1800 pixels as its input. It uses a strided convolution +to first scale the image down, and then a transposed convolution to transform +the image back to its original size. This is done in a parallel block, where the +other branch simply passes through the output of the first convolution layer. +The input of the last convolutional layer is then the output of the two branches +of the parallel block concatenated, i.e. the output of the first +convolutional layer together with the output of the transposed convolutional layer, +giving 32 + 32 = 64 feature dimensions.
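The channel arithmetic of the parallel block can be illustrated with plain PyTorch (a sketch of the concatenation behaviour described above, not kraken's implementation; the padding and output_padding values are assumptions chosen so the branch returns to the input size, and the spatial size is kept small to stay cheap):

import torch
import torch.nn as nn

# x stands for the output of the first Cr3,3,32 layer, in NCHW order.
x = torch.randn(1, 32, 360, 240)

down = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)          # roughly Cr3,3,64,2,2
up = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1,
                        output_padding=1)                             # roughly CTr3,3,32,2,2

branch = up(down(x))                       # downsample, then upsample back to the input size
merged = torch.cat([x, branch], dim=1)     # identity branch + processed branch
print(merged.shape)                        # torch.Size([1, 64, 360, 240]): 32 + 32 channels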

+
+
+

Convolutional Layers

+
C[T][{name}](s|t|r|l|m)[{name}]<y>,<x>,<d>[,<stride_y>,<stride_x>][,<dilation_y>,<dilation_x>]
+s = sigmoid
+t = tanh
+r = relu
+l = linear
+m = softmax
+
+
+

Adds a 2D convolution with kernel size (y, x) and d output channels, applying the selected nonlinearity. Stride and dilation can be adjusted with the optional last two parameters. T gives a transposed convolution. For transposed convolutions, several output sizes are possible for the same configuration; the system will try to match the output size of the different branches of parallel blocks. However, this will only work if the transposed convolution directly precedes the confluence of the parallel branches, and if the branches with fixed output size come first in the definition of the parallel block. Hence, out of (I [Cr3,3,8,2,2 CTr3,3,8,2,2]), ([Cr3,3,8,2,2 CTr3,3,8,2,2] I) and (I [Cr3,3,8,2,2 CTr3,3,8,2,2 Gn8]) only the first variant will behave correctly.
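As a rough PyTorch analogue (a sketch only; kraken's own layer implementation handles padding and shape propagation internally, and the 'same' padding here is an assumption), Cr3,3,32 on a single-channel input corresponds approximately to:

import torch.nn as nn

# Approximate torch equivalent of Cr3,3,32: 3x3 kernel, 32 filters, ReLU.
conv = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
    nn.ReLU(),
)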

+
+
+

Recurrent Layers

+
L[{name}](f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+G[{name}](f|r|b)(x|y)[s][{name}]<n> GRU cell with n outputs.
+f runs the RNN forward only.
+r runs the RNN reversed only.
+b runs the RNN bidirectionally.
+s (optional) summarizes the output in the requested dimension, returning only the last step.
+
+
+

Adds either an LSTM or GRU recurrent layer to the network, using either the x (width) or y (height) dimension as the time axis. Input features are the channel dimension and the non-time-axis dimension (height/width) is treated as another batch dimension. For example, a Lfx25 layer on a 1, 16, 906, 32 input will execute 16 independent forward passes on 906x32 tensors, resulting in an output of shape 1, 16, 906, 25. If this isn't desired, either run a summarizing layer in the other direction, e.g. Lfys20 for an input 1, 1, 906, 20, or prepend a reshape layer S1(1x16)1,3 combining the height and channel dimension for a 1, 1, 906, 512 input to the recurrent layer.
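The batching behaviour described above can be mimicked in plain PyTorch (a sketch of the semantics, not kraken's implementation; dimensions are given in the spec's batch, height, width, channels order):

import torch
import torch.nn as nn

# Lfx25 on a (1, 16, 906, 32) input: each of the 16 rows is treated as an
# independent sequence of length 906 with 32 features.
x = torch.randn(1, 16, 906, 32)

rows = x.flatten(0, 1)                                   # (16, 906, 32): height folded into batch
lstm = nn.LSTM(input_size=32, hidden_size=25, batch_first=True)
out, _ = lstm(rows)                                      # (16, 906, 25)
out = out.reshape(1, 16, 906, 25)                        # back to (1, 16, 906, 25)
print(out.shape)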

+
+
+

Helper and Plumbing Layers

+
+

Max Pool

+
Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>]
+
+
+

Adds a max pooling layer with kernel size (y, x) and stride (y_stride, x_stride).
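In PyTorch terms, Mp2,2 corresponds roughly to the following sketch; as the example layer tables above show, the kernel size is used as the stride when no explicit stride is given:

import torch.nn as nn

# Mp2,2: 2x2 max pooling with 2x2 stride, halving height and width.
pool = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))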

+
+
+

Reshape

+
S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+        dimension.
+
+
+

The S layer reshapes a source dimension d into a,b and distributes a into dimension e and b into dimension f. Either e or f has to be equal to d. So S1(1x48)1,3 on a 1, 48, 1020, 8 input will first reshape into 1, 1, 48, 1020, 8, leave the 1 part in the height dimension, and distribute the 48-sized part into the channel dimension, resulting in a 1, 1, 1020, 48*8=384 sized output. S layers are mostly used to remove an undesirable non-1 height before a recurrent layer.
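The shape arithmetic can be checked with a small PyTorch sketch (illustrative only; the exact memory layout inside kraken may differ):

import torch

# S1(1x48)1,3 on a (batch, height, width, channels) = (1, 48, 1020, 8) input:
# the 48-sized height is folded into the channel dimension.
x = torch.randn(1, 48, 1020, 8)

out = x.permute(0, 2, 1, 3).reshape(1, 1, 1020, 48 * 8)
print(out.shape)   # torch.Size([1, 1, 1020, 384])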

+
+

Note

+

This S layer is equivalent to the one implemented in the TensorFlow implementation of VGSL, i.e. it behaves differently from the one in Tesseract.

+
+
+
+
+

Regularization Layers

+
+

Dropout

+
Do[{name}][<prob>],[<dim>] Insert a 1D or 2D dropout layer
+
+
+

Adds a 1D or 2D dropout layer with a given probability. Defaults to a drop probability of 0.5 and 1D dropout. Set dim to 2 after convolutional layers.
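A rough PyTorch analogue (a sketch) of the two dropout variants:

import torch.nn as nn

drop_1d = nn.Dropout(p=0.5)      # Do       - default: p=0.5, 1D dropout
drop_2d = nn.Dropout2d(p=0.1)    # Do0.1,2  - per-channel dropout, used after conv layers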

+
+
+

Group Normalization

+
Gn<groups> Inserts a group normalization layer
+
+
+

Adds a group normalization layer separating the input into <groups> groups, +normalizing each separately.
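In PyTorch terms, Gn8 on a 32-channel feature map corresponds roughly to the following sketch (the channel count comes from the preceding layer and must be divisible by the group count):

import torch.nn as nn

# Gn8 after a Cr3,3,32 layer: normalize 32 channels over 8 groups.
gn = nn.GroupNorm(num_groups=8, num_channels=32)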

+
+
+
+ + +
+ +
+
+ +
+
+ + + + + + + \ No newline at end of file