set unlv_tilde_crunching to false; fixes #1449 #948
zdenop committed Oct 23, 2018
1 parent 5a68b7f commit 3d508a6
Showing 4 changed files with 4 additions and 2 deletions.
1 change: 1 addition & 0 deletions src/api/tesseractmain.cpp
@@ -456,6 +456,7 @@ static void PreloadRenderers(

api->GetBoolVariable("tessedit_write_unlv", &b);
if (b) {
+ api->SetVariable("unlv_tilde_crunching", "true");

stweil Oct 23, 2018

Member

Is this line of code needed? tessdata/configs/unlv normally already sets the variable. I suggest removing the line.

amitdo Oct 23, 2018

Collaborator

I think it is needed.

The user can do -c tessedit_write_unlv=1, and not use the config file.

zdenop Oct 23, 2018

Author Contributor

@stweil: it would make sense to remove it from the config file, because it is set here... but this way at least unlv users can recognize that something has changed...

stweil Oct 23, 2018

Member

Are there use cases where writing a UNLV file is required, but without tilde crunching? Are they still possible?

The latest code changes the behaviour for -c tessedit_write_unlv=1, so it must be mentioned in the release notes at least.

zdenop Oct 24, 2018

Author Contributor

@stweil: I am not aware of anybody using unlv ;-) And my small test in #1449 indicates these problems are related to 4.00 traineddata but not 3.05...

I added a remark to the release notes, but I do not understand what you mean by "The latest code changes the behaviour..." If somebody used -c tessedit_write_unlv=1 before this change, the setting will be the same as today, because unlv_tilde_crunching is set to true, as it was before the change.

tesseract::TessUnlvRenderer* renderer =
new tesseract::TessUnlvRenderer(outputbase);
if (renderer->happy()) {
2 changes: 1 addition & 1 deletion src/ccmain/tesseractclass.cpp
@@ -271,7 +271,7 @@ Tesseract::Tesseract()
this->params()),
double_MEMBER(quality_rowrej_pc, 1.1,
"good_quality_doc gte good char limit", this->params()),
- BOOL_MEMBER(unlv_tilde_crunching, true,
+ BOOL_MEMBER(unlv_tilde_crunching, false,
"Mark v.bad words for tilde crunch", this->params()),
BOOL_MEMBER(hocr_font_info, false, "Add font info to hocr output",
this->params()),
2 changes: 1 addition & 1 deletion src/ccmain/tesseractclass.h
@@ -963,7 +963,7 @@ class Tesseract : public Wordrec {
BOOL_VAR_H(bland_unrej, false, "unrej potential with no checks");
double_VAR_H(quality_rowrej_pc, 1.1,
"good_quality_doc gte good char limit");
- BOOL_VAR_H(unlv_tilde_crunching, true,
+ BOOL_VAR_H(unlv_tilde_crunching, false,
"Mark v.bad words for tilde crunch");
BOOL_VAR_H(hocr_font_info, false,
"Add font info to hocr output");
1 change: 1 addition & 0 deletions tessdata/configs/unlv
@@ -1 +1,2 @@
tessedit_write_unlv 1
+ unlv_tilde_crunching T
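
Editorial sketch (not part of the commit): the review thread turns on how the compiled-in default, the user's settings (config file or -c), and the forced SetVariable call in PreloadRenderers combine. The toy model below is only an illustration of that reasoning. Params, Apply, PreloadRenderersModel and Resolve are hypothetical names, not Tesseract's API, and the ordering (user settings first, renderer setup afterwards) is an assumption based on the discussion above.

```cpp
#include <map>
#include <string>

// Toy model only: these types and functions are NOT part of Tesseract.
struct Params {
  bool tessedit_write_unlv = false;
  bool unlv_tilde_crunching = false;
};

// Applies name/value pairs the way a config file or -c option would.
void Apply(Params& p, const std::map<std::string, bool>& settings) {
  for (const auto& kv : settings) {
    if (kv.first == "tessedit_write_unlv") p.tessedit_write_unlv = kv.second;
    if (kv.first == "unlv_tilde_crunching") p.unlv_tilde_crunching = kv.second;
  }
}

// Models the line added to PreloadRenderers: requesting UNLV output
// switches tilde crunching on, whatever was set before.
void PreloadRenderersModel(Params& p) {
  if (p.tessedit_write_unlv) p.unlv_tilde_crunching = true;
}

// default_crunching is the compiled-in default:
// true before this commit, false after it.
Params Resolve(bool default_crunching,
               const std::map<std::string, bool>& settings) {
  Params p;
  p.unlv_tilde_crunching = default_crunching;
  Apply(p, settings);        // assumed to run first (config file / -c)
  PreloadRenderersModel(p);  // assumed to run afterwards
  return p;
}
```

Under these assumptions, Resolve(true, {{"tessedit_write_unlv", true}}) and Resolve(false, {{"tessedit_write_unlv", true}}) both end with crunching enabled, which matches zdenop's point that -c tessedit_write_unlv=1 behaves as before. Without UNLV output the result follows the default, so it flips from true to false with this commit. In the model, UNLV output without tilde crunching is not reachable, which is the case stweil asks about.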

0 comments on commit 3d508a6
