More fixes for issue #3925 #3977

Merged
merged 10 commits into from
Dec 13, 2022

@stweil stweil commented Dec 11, 2022

No description provided.

stweil commented Dec 11, 2022

@SpaceView, @zdenop, could you please check whether you still see an issue after applying this pull request? Here is my test result:

+ tesseract num.ocra.exp0.png num.ocra.exp0 nobatch box.train
APPLY_BOXES:
   Boxes read from boxfile:      10
   Found 10 good blobs.
Generated training data for 1 words
+ unicharset_extractor num.ocra.exp0.box
Extracting unicharset from box file num.ocra.exp0.box
Wrote unicharset file unicharset
+ set_unicharset_properties -U unicharset -O num.unicharset --script_dir=/home/stweil/src/github/tesseract-ocr/langdata/
Loaded unicharset of size 13 from file unicharset
Setting unichar properties
Setting script properties
Writing unicharset to file num.unicharset
+ shapeclustering -F font_properties -U num.unicharset num.ocra.exp0.tr
Reading num.ocra.exp0.tr ...
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8 9
Stopped with 0 merged, min dist 0.256098
Master shape_table:Number of shapes = 10 max unichars = 1 number with multiple unichars = 0
+ mftraining -F font_properties -U num.unicharset -O num.unicharset num.ocra.exp0.tr
Read shape table shapetable of 10 shapes
Reading num.ocra.exp0.tr ...
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!
+ cntraining num.ocra.exp0.tr
Reading num.ocra.exp0.tr ...
Clustering ...

Writing normproto ...
+ mv inttemp num.inttemp
+ mv pffmtable num.pffmtable
+ mv normproto num.normproto
+ mv shapetable num.shapetable
+ combine_tessdata num.
Version:5.2.0-108-gf77c
1:unicharset:size=837, offset=192
3:inttemp:size=134553, offset=1029
4:pffmtable:size=110, offset=135582
5:normproto:size=1382, offset=135692
13:shapetable:size=184, offset=137074
23:version:size=15, offset=137258
Combining tessdata files
Output num.traineddata created successfully.
+ mkdir tessdata
+ mv num.traineddata tessdata
+ tesseract num.ocra.exp0.png - --psm 7 -l num --tessdata-dir tessdata
0123456789

This allows removing a reinterpret_cast and fixes a runtime error
with sanitizers:

runtime error: call to function
tesseract::MakePotentialClusters(tesseract::ClusteringContext*, tesseract::CLUSTER*, int)
through pointer to incorrect function type 'void (*)(...)'

Signed-off-by: Stefan Weil <[email protected]>
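
The sanitizer message above concerns a function being called through a pointer of a different function type. A minimal sketch of the type-safe alternative follows, assuming hypothetical stand-in names (`Context`, `Node`, `WalkAction`, `Walk`, `RecordNode`) that are not the actual tesseract declarations: the callback parameter carries the exact signature, so no `reinterpret_cast` is needed and the call is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the tesseract types.
struct Context {
  std::vector<int> visited;
};
struct Node {
  int id;
};

// The callback type spells out the exact signature of the functions
// that will be passed, instead of a generic 'void (*)(...)'.
using WalkAction = void (*)(Context *, Node *, int);

// Calls 'action' once per node, passing an increasing level.
void Walk(Context *ctx, std::vector<Node> &nodes, WalkAction action) {
  int level = 0;
  for (auto &node : nodes) {
    action(ctx, &node, level++);
  }
}

// A callback whose type matches WalkAction exactly.
void RecordNode(Context *ctx, Node *node, int level) {
  ctx->visited.push_back(node->id * 10 + level);
}

// Runs the walk and returns how many nodes were visited.
std::size_t RunWalkExample() {
  Context ctx;
  std::vector<Node> nodes{{1}, {2}, {3}};
  Walk(&ctx, nodes, RecordNode);  // no cast required
  assert(ctx.visited.size() == nodes.size());
  return ctx.visited.size();
}
```

With a matching pointer type, the compiler rejects a callback with a different signature at build time instead of leaving the mismatch to be found at run time.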
…ract-ocr#3925)

It is required for mftraining which otherwise writes a wrong shapetable.

Signed-off-by: Stefan Weil <[email protected]>
The old code did not work correctly if FClass->font_set.size() was 0.
It created the FontSet fs with size 1 instead of 0.

Signed-off-by: Stefan Weil <[email protected]>
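
The commit message does not say how the extra element arose. One common C++ pitfall that produces exactly this symptom (size 1 where size 0 was intended) is brace initialization of a vector from a count. The sketch below assumes the font set behaves like `std::vector<int>`; `FontSetSketch`, `MakeWithBraces` and `MakeWithParens` are hypothetical names used only for illustration.

```cpp
#include <vector>

// Assumption: a font set that behaves like std::vector<int>.
using FontSetSketch = std::vector<int>;

// Braces select the initializer_list constructor: the result always
// holds ONE element whose value is n, even when n is 0.
FontSetSketch MakeWithBraces(int n) {
  return FontSetSketch{n};
}

// Parentheses select the size constructor: n value-initialized
// elements, so n == 0 yields an empty container.
FontSetSketch MakeWithParens(int n) {
  return FontSetSketch(n);
}
```

For example, `MakeWithBraces(0).size()` is 1 while `MakeWithParens(0).size()` is 0.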
It was triggered by mftraining.

Signed-off-by: Stefan Weil <[email protected]>
mftraining crashed if the search did not find anything.

Signed-off-by: Stefan Weil <[email protected]>
mftraining crashed because the returned value was 1 instead of 0
for the first call of UnicityTable::push_back.

Signed-off-by: Stefan Weil <[email protected]>
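
A minimal sketch of the off-by-one described here, using a hypothetical `IndexTable` class rather than the real `UnicityTable` implementation: a `push_back` that returns the index of the new element must read the size before appending, otherwise the first call returns 1 and callers index one past the end.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal table for illustration; not tesseract's UnicityTable.
template <typename T>
class IndexTable {
public:
  // Appends a value and returns its index (0 for the first element).
  int push_back(const T &value) {
    const int index = static_cast<int>(items_.size());  // size before growth
    items_.push_back(value);
    return index;
  }

  // Faulty variant kept for comparison: returns the size after growth,
  // which is index + 1, so the first call returns 1.
  int push_back_off_by_one(const T &value) {
    items_.push_back(value);
    return static_cast<int>(items_.size());
  }

  // Bounds-checked element access.
  const T &operator[](int index) const {
    return items_.at(static_cast<std::size_t>(index));
  }

private:
  std::vector<T> items_;
};
```

Using the value returned by `push_back_off_by_one` as an index into a one-element table is out of range, which matches the kind of crash the commit message describes.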
It crashed when running mftraining with fs.size() == 0.

Signed-off-by: Stefan Weil <[email protected]>
It crashed when running mftraining because unicharset_size in file
"inttemp" was written with 8 bytes instead of 4 bytes.

Signed-off-by: Stefan Weil <[email protected]>
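
The 8-versus-4-byte mismatch can be sketched as follows, assuming (this is not stated in the commit message) that the value was held in a `size_t` and written with its native width. The helper names `AppendSizeT`, `AppendUint32` and `ReadUint32` are hypothetical and the buffer stands in for the file; converting to a fixed-width type before writing keeps the on-disk field at 4 bytes on every platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends the raw bytes of a size_t: 8 bytes on typical 64-bit platforms,
// which does not match a reader that expects a 4-byte field.
void AppendSizeT(std::vector<unsigned char> &out, std::size_t value) {
  unsigned char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));
  out.insert(out.end(), bytes, bytes + sizeof(value));
}

// Converts to a fixed-width type first, so exactly 4 bytes are written.
void AppendUint32(std::vector<unsigned char> &out, std::size_t value) {
  const std::uint32_t fixed = static_cast<std::uint32_t>(value);
  unsigned char bytes[sizeof(fixed)];
  std::memcpy(bytes, &fixed, sizeof(fixed));
  out.insert(out.end(), bytes, bytes + sizeof(fixed));
}

// Reads a 4-byte field back (same byte order as the writer).
std::uint32_t ReadUint32(const std::vector<unsigned char> &in,
                         std::size_t offset) {
  std::uint32_t value = 0;
  std::memcpy(&value, in.data() + offset, sizeof(value));
  return value;
}
```

A reader that expects a 4-byte size followed by further fields will misinterpret everything after the size if the writer emitted 8 bytes instead.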
This fixes duplicate delete when running cntraining.

Signed-off-by: Stefan Weil <[email protected]>
…plates

UnicityTable did not provide the [] operator, so add it for this change.

Suggested-by: Egor Pugin <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
@stweil stweil merged commit 369b811 into tesseract-ocr:main Dec 13, 2022
@stweil stweil deleted the fix3925 branch December 13, 2022 07:04
4 participants