tidy tesseract(1) adding missing options
Together with:
- fix "C\++"
- align executable --print-parameters message
cjmayo committed Mar 23, 2017
1 parent 6c3d8fa commit b231aee
Showing 5 changed files with 157 additions and 22 deletions.
2 changes: 1 addition & 1 deletion api/tesseractmain.cpp
@@ -169,7 +169,7 @@ void PrintHelpMessage(const char* program) {
" --help-oem Show OCR Engine modes.\n"
" -v, --version Show version information.\n"
" --list-langs List available languages for tesseract engine.\n"
" --print-parameters Print tesseract parameters to stdout.\n";
" --print-parameters Print tesseract parameters.\n";

printf("\n%s", single_options);
}
51 changes: 43 additions & 8 deletions doc/tesseract.1
@@ -2,12 +2,12 @@
.\" Title: tesseract
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/>
.\" Date: 06/28/2015
.\" Date: 03/23/2017
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "TESSERACT" "1" "06/28/2015" "\ \&" "\ \&"
.TH "TESSERACT" "1" "03/23/2017" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
@@ -84,7 +84,7 @@ Set value for control parameter\&. Multiple \-c arguments are allowed\&.
The language to use\&. If none is specified, English is assumed\&. Multiple languages may be specified, separated by plus characters\&. Tesseract uses 3\-character ISO 639\-2 language codes\&. (See LANGUAGES)
.RE
.PP
\fI\--psm N\fR
\fI\-\-psm N\fR
.RS 4
Set Tesseract to only run a subset of layout analysis and assume a certain form of image\&. The options for
\fBN\fR
@@ -111,6 +111,26 @@ are:
.\}
.RE
.PP
\fI\-\-oem N\fR
.RS 4
Specify OCR Engine mode\&. The options for
\fBN\fR
are:
.sp
.if n \{\
.RS 4
.\}
.nf
0 = Original Tesseract only\&.
1 = Neural nets LSTM only\&.
2 = Tesseract + LSTM\&.
3 = Default, based on what is available\&.
.fi
.if n \{\
.RE
.\}
.RE
.PP
\fIconfigfile\fR
.RS 4
The name of a config to use\&. A config is a plaintext file which contains a list of variables and their values, one per line, with a space separating variable from value\&. Interesting config files include:
@@ -139,22 +159,37 @@ pdf \- Output in pdf instead of a text file\&.
.RE
.RE
.sp
\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\--psm N\fR must occur before any \fIconfigfile\fR\&.
\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\-\-psm N\fR must occur before any \fIconfigfile\fR\&.
.SH "SINGLE OPTIONS"
.PP
\fI\-v\fR
\fI\-h, \-\-help\fR
.RS 4
Show help message\&.
.RE
.PP
\fI\-\-help\-psm\fR
.RS 4
Show page segmentation modes\&.
.RE
.PP
\fI\-\-help\-oem\fR
.RS 4
Show OCR Engine modes\&.
.RE
.PP
\fI\-v, \-\-version\fR
.RS 4
Returns the current version of the tesseract(1) executable\&.
.RE
.PP
\fI\-\-list\-langs\fR
.RS 4
list available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
List available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
.RE
.PP
\fI\-\-print\-parameters\fR
.RS 4
print tesseract parameters to the stdout\&.
Print tesseract parameters\&.
.RE
.SH "LANGUAGES"
.sp
@@ -220,7 +255,7 @@ user_patterns_suffix user\-patterns
Now, if you pass the word \fIbazaar\fR as a trailing command line parameter to Tesseract, Tesseract will not bother loading the system dictionary nor the dictionary of frequent words and will load and use the eng\&.user\-words and eng\&.user\-patterns files you provided\&. The former is a simple word list, one per line\&. The format of the latter is documented in dict/trie\&.h on read_pattern_list()\&.
.SH "HISTORY"
.sp
The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C\e++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
.sp
Version 2\&.00 brought Unicode (UTF\-8) support, six languages, and the ability to train Tesseract\&.
.sp
25 changes: 21 additions & 4 deletions doc/tesseract.1.asc
@@ -70,6 +70,14 @@ OPTIONS
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

'--oem N'::
Specify OCR Engine mode. The options for *N* are:

0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.

'configfile'::
The name of a config to use. A config is a plaintext file which
contains a list of variables and their values, one per line, with a
@@ -84,14 +92,23 @@ before any 'configfile'.

SINGLE OPTIONS
--------------
'-v'::
'-h, --help'::
Show help message.
'--help-psm'::
Show page segmentation modes.
'--help-oem'::
Show OCR Engine modes.
'-v, --version'::
Returns the current version of the tesseract(1) executable.
'--list-langs'::
list available languages for tesseract engine. Can be used with --tessdata-dir.
List available languages for tesseract engine. Can be used with --tessdata-dir.
'--print-parameters'::
print tesseract parameters to the stdout.
Print tesseract parameters.
@@ -268,7 +285,7 @@ The engine was developed at Hewlett Packard Laboratories Bristol and at
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some C\+\+izing in 1998. A
lot of the code was written in C, and then some more was written in C\+\+.
The C\+\+ code makes heavy use of a list system using macros. This predates
The C++ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.
49 changes: 44 additions & 5 deletions doc/tesseract.1.html
@@ -870,6 +870,21 @@ <h2 id="_options">OPTIONS</h2>
</div></div>
</dd>
<dt class="hdlist1">
<em>--oem N</em>
</dt>
<dd>
<p>
Specify OCR Engine mode. The options for <strong>N</strong> are:
</p>
<div class="literalblock">
<div class="content">
<pre><code>0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.</code></pre>
</div></div>
</dd>
<dt class="hdlist1">
<em>configfile</em>
</dt>
<dd>
@@ -902,7 +917,31 @@ <h2 id="_single_options">SINGLE OPTIONS</h2>
<div class="sectionbody">
<div class="dlist"><dl>
<dt class="hdlist1">
<em>-v</em>
<em>-h, --help</em>
</dt>
<dd>
<p>
Show help message.
</p>
</dd>
<dt class="hdlist1">
<em>--help-psm</em>
</dt>
<dd>
<p>
Show page segmentation modes.
</p>
</dd>
<dt class="hdlist1">
<em>--help-oem</em>
</dt>
<dd>
<p>
Show OCR Engine modes.
</p>
</dd>
<dt class="hdlist1">
<em>-v, --version</em>
</dt>
<dd>
<p>
@@ -914,15 +953,15 @@ <h2 id="_single_options">SINGLE OPTIONS</h2>
</dt>
<dd>
<p>
list available languages for tesseract engine. Can be used with --tessdata-dir.
List available languages for tesseract engine. Can be used with --tessdata-dir.
</p>
</dd>
<dt class="hdlist1">
<em>--print-parameters</em>
</dt>
<dd>
<p>
print tesseract parameters to the stdout.
Print tesseract parameters.
</p>
</dd>
</dl></div>
@@ -1099,7 +1138,7 @@ <h2 id="_history">HISTORY</h2>
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some C++izing in 1998. A
lot of the code was written in C, and then some more was written in C++.
The C\++ code makes heavy use of a list system using macros. This predates
The C++ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.</p></div>
@@ -1156,7 +1195,7 @@ <h2 id="_copying">COPYING</h2>
<div id="footnotes"><hr /></div>
<div id="footer">
<div id="footer-text">
Last updated 2015-06-28 22:23:47 CEST
Last updated 2017-03-23 19:56:19 GMT
</div>
</div>
</body>
52 changes: 48 additions & 4 deletions doc/tesseract.1.xml
@@ -152,6 +152,20 @@ at Google since then.</simpara>
</varlistentry>
<varlistentry>
<term>
<emphasis>--oem N</emphasis>
</term>
<listitem>
<simpara>
Specify OCR Engine mode. The options for <emphasis role="strong">N</emphasis> are:
</simpara>
<literallayout class="monospaced">0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.</literallayout>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>configfile</emphasis>
</term>
<listitem>
@@ -184,7 +198,37 @@ before any <emphasis>configfile</emphasis>.</simpara>
<variablelist>
<varlistentry>
<term>
<emphasis>-v</emphasis>
<emphasis>-h, --help</emphasis>
</term>
<listitem>
<simpara>
Show help message.
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>--help-psm</emphasis>
</term>
<listitem>
<simpara>
Show page segmentation modes.
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>--help-oem</emphasis>
</term>
<listitem>
<simpara>
Show OCR Engine modes.
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>-v, --version</emphasis>
</term>
<listitem>
<simpara>
@@ -198,7 +242,7 @@ before any <emphasis>configfile</emphasis>.</simpara>
</term>
<listitem>
<simpara>
list available languages for tesseract engine. Can be used with --tessdata-dir.
List available languages for tesseract engine. Can be used with --tessdata-dir.
</simpara>
</listitem>
</varlistentry>
@@ -208,7 +252,7 @@ before any <emphasis>configfile</emphasis>.</simpara>
</term>
<listitem>
<simpara>
print tesseract parameters to the stdout.
Print tesseract parameters.
</simpara>
</listitem>
</varlistentry>
@@ -377,7 +421,7 @@ on read_pattern_list().</simpara>
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some C++izing in 1998. A
lot of the code was written in C, and then some more was written in C++.
The C\++ code makes heavy use of a list system using macros. This predates
The C++ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.</simpara>