diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
index 3061e15de8..1bd82ee792 100644
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -36,7 +36,7 @@ IN/OUT ARGUMENTS
   The basename of the output file (to which the appropriate extension
   will be appended). By default the output will be a text file
   with `.txt` added to the basename unless there are one or more
-  'configfile' options which explicitly specify the desired output.
+  parameters set which explicitly specify the desired output.
 
 'stdout'::
   Instruction to send output data to standard output.
@@ -54,7 +54,7 @@ OPTIONS
   Specify the location of user patterns file.
 
 '-c configvar=value'::
-  Set value for control parameter. Multiple -c arguments are allowed.
+  Set value for parameter 'configvar'. Multiple -c arguments are allowed.
 
 '-l lang'::
   The language to use. If none is specified, English is assumed.
@@ -86,20 +86,21 @@ OPTIONS
     3 = Default, based on what is available.
 
 'configfile'::
-  The name of a config to use. A config is a plaintext file which
-  contains a list of variables and their values, one per line, with a
-  space separating variable from value. Interesting config files
-  include: +
-  * `alto` - Output in ALTO format (file extension `.xml`).
-  * `hocr` - Output in hOCR format (file extension `.hocr`).
-  * `pdf` - Output PDF (file extension `.pdf`).
-  * `tsv` - Output TSV (file extension `.tsv`).
-  * `txt` - Output plain text (file extension `.txt`).
-  * `get.images` - Write images.
-  * `logfile` - Write debug file `tesseract.log`.
-  * `lstm.train` - Used for LSTM training.
-  * `makebox` - Output box file.
-  * `quiet` - Write debug file to /dev/null.
+  The name of a config to use. A config is a plain text file which
+  contains a list of parameters and their values, one per line,
+  with a space separating parameter from value.
+  Interesting config files include:
+
+  * `alto` - Output in ALTO format ('outputbase'`.xml`).
+  * `hocr` - Output in hOCR format ('outputbase'`.hocr`).
+  * `pdf` - Output PDF ('outputbase'`.pdf`).
+  * `tsv` - Output TSV ('outputbase'`.tsv`).
+  * `txt` - Output plain text ('outputbase'`.txt`).
+  * `get.images` - Write processed input images to file (`tessinput.tif`).
+  * `logfile` - Redirect debug messages to file (`tesseract.log`).
+  * `lstm.train` - Output files used by LSTM training ('outputbase'`.lstmf`).
+  * `makebox` - Write box file ('outputbase'`.box`).
+  * `quiet` - Redirect debug messages to /dev/null.
 
   It is possible to select several config files, for example
   `tesseract image.png demo hocr pdf txt` will create three output files
@@ -334,14 +335,14 @@ Tesseract 4 LSTM OCR engine.
 
 CONFIG FILES AND AUGMENTING WITH USER DATA
 ------------------------------------------
-Tesseract config files consist of lines with variable-value pairs (space
-separated). The variables are documented as flags in the source code like
+Tesseract config files consist of lines with parameter-value pairs (space
+separated). The parameters are documented as flags in the source code like
 the following one in tesseractclass.h:
 
   STRING_VAR_H(tessedit_char_blacklist, "",
                "Blacklist of chars not to recognize");
 
-These variables may enable or disable various features of the engine, and
+These parameters may enable or disable various features of the engine, and
 may cause it to load (or not load) various data. For instance, let's suppose
 you want to OCR in English, but suppress the normal dictionary and load
 an alternative word list and an alternative list of patterns -- these two files
@@ -371,8 +372,8 @@ load_freq_dawg F
 user_words_suffix user-words
 user_patterns_suffix user-patterns
 
-Now, if you pass the word 'bazaar' as a trailing command line parameter
-to Tesseract, Tesseract will not bother loading the system dictionary nor
+Now, if you pass the word 'bazaar' as a 'configfile' to Tesseract,
+Tesseract will not bother loading the system dictionary nor
 the dictionary of frequent words and will load and use the eng.user-words
 and eng.user-patterns files you provided. The former is a simple word list, one
 per line. The format of the latter is documented in dict/trie.h