Add Google code for more unittests #1863

Merged
merged 11 commits into from
Aug 25, 2018

stweil
Member

@stweil stweil commented Aug 25, 2018

Some additional changes to the Tesseract code base make it easier to integrate the new tests.
As an example, bitvector_test and several other tests have now been added to the Tesseract test set.

@Shreeshrii
Collaborator

Welcome addition!!
Thank you @jbreiden and @stweil .

@stweil
Member Author

stweil commented Aug 25, 2018

Some of the new test files which are currently not activated use the Abseil API (https://abseil.io/, https://github.com/abseil/abseil-cpp). We have to decide whether we want to replace those calls, add a minimal implementation to Tesseract or require a full Abseil installation (maybe as a Git submodule, about 10 MB).

These tests are affected:

baseapi_test – done
baseapi_thread_test – done
fileio_test – done
imagedata_test – done
lang_model_test – done
mastertrainer_test – done
pango_font_info_test – done
paragraphs_test – done
stringrenderer_test – done
unicharcompress_test – done

@stweil
Member Author

stweil commented Aug 25, 2018

These tests depend neither on the Abseil API nor on unavailable TIFF images, so they might be the next candidates to get fixed and added to our test set:

commandlineflags_test – done
dawg_test – done
heap_test – done
indexmapbidi_test – done
intfeaturemap_test – done
ligature_table_test – done
linlsq_test – done
lstm_recode_test – done
lstm_squashed_test – done
networkio_test – done
normstrngs_test – done
nthitem_test – done
params_model_test – done
qrsequence_test – done
recodebeam_test – done
rect_test – done
scanutils_test – done
shapetable_test – done
stats_test – done
stridemap_test – done
tablefind_test – done
tablerecog_test – done
tabvector_test – done
tfile_test – done
unicharset_test – done
unichar_test – done
validate_grapheme_test – done
validate_indic_test – done
validate_khmer_test – done
validate_myanmar_test – done
validator_test – done

Still missing (2019-07-06):

pagesegmode_test – needs missing TIFF files
tatweel_test – needs files ara.* (from ara.traineddata?)

stweil added 7 commits August 25, 2018 18:16
This allows using the class for unittests, too.

Signed-off-by: Stefan Weil <[email protected]>
They were provided by Jeff Breidenbach <[email protected]>.

Signed-off-by: Stefan Weil <[email protected]>
@Shreeshrii
Collaborator

> a full Abseil installation (maybe as a Git submodule, about 10 MB).

Abseil is an open-source collection of C++ code (compliant with C++11) designed to augment the C++ standard library.

We are already using Googletest and test (testdata) as submodules that are required only for testing, so I think it will be OK to add Abseil as a submodule too.

@egorpugin egorpugin merged commit 4620674 into tesseract-ocr:master Aug 25, 2018
@stweil stweil deleted the unittest branch August 25, 2018 20:40
@amitdo
Collaborator

amitdo commented Oct 11, 2018

> qrsequence_test – needs unknown data type CycleTimer

It is mentioned here:

https://github.com/abseil/abseil-cpp/blob/87a4c07856e7dc69958019d47b2f02ae47746ec0/absl/base/internal/unscaledcycleclock.h

Also check
https://www.google.com/search?q=%22CycleTimer%22+%22google%22+%22apache%22+%22github%22

> params_model_test – needs file eng.params_model

Try to unpack an eng.traineddata file that contains the data for the legacy engine.
eng.params_model should be among its components.

https://github.com/tesseract-ocr/tesseract/blob/5fdaa479da2c/doc/combine_tessdata.1.asc

> needs unknown data type Array2D (class?)

Try to replace with GENERIC_2D_ARRAY

> shapetable_test – needs unknown function StringPrintf

https://www.google.com/search?q=%22StringPrintf%22+%22google%22+%22apache%22

> tatweel_test – needs util/utf8/public/unicodetext.h

https://github.com/tensorflow/models/search?q=%22unicodetext.h%22

> dawg_test – needs util/process/subprocess.h

https://www.google.com/search?q=%22subprocess.h%22+%22google%22+%22apache%22

@stweil
Member Author

stweil commented Oct 11, 2018

I'm currently preparing baseapi_test. It is the first test case that uses the Abseil API, and it also needs CycleTimer and LOG(). I am still struggling with integrating Abseil into the build process, but I hope to finish that soon. After that, adding more tests with similar requirements should be possible.
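
For illustration only, a stopwatch-style stand-in for CycleTimer could be built on std::chrono. This is a sketch, not the original Google class: the method names (Restart, Start, Stop, GetInMs) are assumptions about what the tests call and would need to be checked against the test sources.

```cpp
#include <chrono>

// Hypothetical replacement for the CycleTimer type used by the imported tests.
// The interface below is an assumption, not the original Google API.
class CycleTimer {
 public:
  using Clock = std::chrono::steady_clock;

  // Discards any accumulated time and starts measuring from now.
  void Restart() {
    elapsed_ = Clock::duration::zero();
    start_ = Clock::now();
    running_ = true;
  }

  // Resumes measuring; has no effect if the timer is already running.
  void Start() {
    if (!running_) {
      start_ = Clock::now();
      running_ = true;
    }
  }

  // Pauses measuring and adds the finished interval to the total.
  void Stop() {
    if (running_) {
      elapsed_ += Clock::now() - start_;
      running_ = false;
    }
  }

  // Returns the accumulated time in whole milliseconds.
  long long GetInMs() const {
    Clock::duration total = elapsed_;
    if (running_) {
      total += Clock::now() - start_;
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(total).count();
  }

 private:
  Clock::time_point start_{};
  Clock::duration elapsed_{Clock::duration::zero()};
  bool running_ = false;
};
```

steady_clock is used here because it never jumps backwards, which matters more for timing a test than matching wall-clock time.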

@jbreiden
Contributor

absl::StrFormat is the replacement for StringPrintf. Tests should work with the LOG lines removed, but the logging library is probably https://github.com/google/glog
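
If a test should build without Abseil, a small printf-style helper can fill in for StringPrintf. The sketch below is an assumption about the signature the tests expect (a format string plus variadic arguments, returning std::string); it is not the original Google function.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for StringPrintf: formats like printf, returns a string.
std::string StringPrintf(const char* format, ...) {
  va_list args;
  va_start(args, format);
  va_list args_copy;
  va_copy(args_copy, args);  // a va_list cannot be reused after one pass

  // First pass: ask vsnprintf how many characters are needed (without NUL).
  const int length = std::vsnprintf(nullptr, 0, format, args);
  va_end(args);

  std::string result;
  if (length > 0) {
    // Second pass: format into a buffer with room for the terminating NUL.
    std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args_copy);
    result.assign(buffer.data(), static_cast<std::size_t>(length));
  }
  va_end(args_copy);
  return result;
}
```

With Abseil available, a call such as StringPrintf("%s_%d", "page", 7) would instead be written as absl::StrFormat("%s_%d", "page", 7), which also checks the format string at compile time.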

@stweil
Member Author

stweil commented Oct 12, 2018

Yes, the code is obviously using glog. Pull request #1980 includes a rudimentary implementation for LOG, so the Tesseract code does not need glog.
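
To show the general idea, a rudimentary LOG can be a macro that creates a temporary object, collects the streamed text, and writes one line to stderr when the temporary is destroyed. This is a sketch under that assumption and not necessarily what pull request #1980 does; the LogMessage class and the severity enum are hypothetical names.

```cpp
#include <iostream>
#include <sstream>

// Severity names modeled on glog. In this sketch they only select a prefix;
// unlike glog, FATAL does not abort the program.
enum LogSeverity { INFO, WARNING, ERROR, FATAL };

// Hypothetical helper: buffers one log line and emits it on destruction.
class LogMessage {
 public:
  explicit LogMessage(LogSeverity severity) {
    static const char* const kNames[] = {"INFO", "WARNING", "ERROR", "FATAL"};
    stream_ << kNames[severity] << ": ";
  }

  // Runs at the end of the full LOG(...) << ... expression.
  ~LogMessage() { std::cerr << stream_.str() << std::endl; }

  std::ostream& stream() { return stream_; }

 private:
  std::ostringstream stream_;
};

// LOG(INFO) << "text" << value; writes "INFO: text<value>" to stderr.
#define LOG(severity) LogMessage(severity).stream()
```

Buffering in an ostringstream keeps each message on one line even if several threads log at the same time, although the sketch makes no further thread-safety guarantees.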

@stweil
Member Author

stweil commented Jan 23, 2019

The latest Tesseract now has 40 unit tests (with lots of subtests), all of which pass:

============================================================================
Testsuite summary for tesseract 4.0.0-225-gdaf6
============================================================================
# TOTAL: 40
# PASS:  40
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

@amitdo
Collaborator

amitdo commented Jan 23, 2019

Good work. @stweil and @Shreeshrii, thank you!

How many tests are not activated?
