box.train segfault #57
Not enough input. In short, box.train needs both an image and a box file, and from those it creates training data. For a more complete explanation, see the wiki: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract#run-tesseract-for-training |
Tesseract should return an error when there is insufficient input, not segfault. |
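To illustrate what "return an error when there is insufficient input" could look like, here is a minimal sketch of an up-front check. It assumes the box file sits next to the image and shares its name with a .box extension. The helpers BoxFileNameFor, FileIsReadable and CheckTrainingInputs are hypothetical names made up for this example, not functions from the Tesseract code base.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: derive "page.box" from "page.tif" by swapping the
// extension. Assumes the box file lives next to the image.
std::string BoxFileNameFor(const std::string& image_path) {
  const std::string::size_type slash = image_path.find_last_of("/\\");
  const std::string::size_type dot = image_path.find_last_of('.');
  // Only treat the dot as an extension separator if it belongs to the
  // final path component (so "dir.v2/page" keeps its directory name).
  const bool has_ext = dot != std::string::npos &&
                       (slash == std::string::npos || dot > slash);
  return (has_ext ? image_path.substr(0, dot) : image_path) + ".box";
}

// Hypothetical helper: true if the file can be opened for reading.
bool FileIsReadable(const std::string& path) {
  std::ifstream in(path.c_str(), std::ios::binary);
  return in.good();
}

// Hypothetical check: returns an empty string when both inputs are present,
// otherwise a human-readable error message the caller can print.
std::string CheckTrainingInputs(const std::string& image_path) {
  if (!FileIsReadable(image_path)) {
    return "cannot read image: " + image_path;
  }
  const std::string box_path = BoxFileNameFor(image_path);
  if (!FileIsReadable(box_path)) {
    return "box.train needs a box file, but cannot read: " + box_path;
  }
  return std::string();
}
```

A caller would run CheckTrainingInputs before starting recognition and exit with a non-zero status if the returned message is not empty.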
Re-opening, as requested. |
Something like this? (Using the variable also supports box.train.stderr)
|
Pretty good! But it is even more robust to locate the lower-level function that is crashing. |
I think it's a matter for broader discussion. On the one hand, it's The Right Thing, and I've already done The Wrong Thing by closing an issue that mentions a segfault... but it's an exceptional case. One, because it overloads what is normally the output file position to be a secondary input, and two, because it's not a frequent use case. |
I tried it on openSUSE 13.2 64-bit and it did not crash:
Warning in pixReadMemTiff: tiff page 1 not found
Just OCR to stdout worked as expected:
The quick brown dog jumped over the Warning in pixReadMemTiff: tiff page 1 not found |
|
@jbreiden: In openSUSE 13.2 I do not have api/.libs/lt-tesseract, just api/.libs/tesseract. And you have two times, so output is testing/phototest.tif.tr And I got this:
For "stdout version" I got this:
|
@jbreiden, please test with latest commit. |
make
|
I can reproduce this. I reread this issue. Jim's explanation is still true.
|
I'd suggest something like this. I didn't check to see if we leak memory.
--- baseapi.cpp.orig 2016-02-04 01:09:07.790101916 +0000
+++ baseapi.cpp 2016-02-04 01:07:15.464620603 +0000
@@ -851,6 +851,9 @@
     page_res_ = new PAGE_RES(false,
                              block_list_, &tesseract_->prev_word_best_choice_);
   }
+  if (page_res_ == NULL) {
+    return -1;
+  }
   if (tesseract_->tessedit_make_boxes_from_boxes) {
     tesseract_->CorrectClassifyWords(page_res_);
     return 0;
|
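The guard in the patch above follows a common pattern: check the pointer right after the step that may fail, and return an error code instead of dereferencing it. The sketch below shows that pattern in isolation. PageResult, BuildPageResult and Recognize are stand-ins invented for this example, not the real PAGE_RES or TessBaseAPI code, and std::unique_ptr is used here (an assumption, not something the patch does) so the early return cannot leak.

```cpp
#include <memory>

// Stand-in for PAGE_RES; the real class is far larger.
struct PageResult {
  int words = 0;
};

// Stand-in for the step that builds the page results. Returns a null
// pointer when there is not enough input to build them.
std::unique_ptr<PageResult> BuildPageResult(bool have_input) {
  if (!have_input) {
    return nullptr;
  }
  return std::unique_ptr<PageResult>(new PageResult());
}

// Stand-in for the recognition entry point: 0 on success, -1 on failure.
int Recognize(bool have_input) {
  std::unique_ptr<PageResult> page_res = BuildPageResult(have_input);
  if (page_res == nullptr) {
    return -1;  // Report the problem instead of dereferencing a null pointer.
  }
  page_res->words += 1;  // Safe: page_res is known to be non-null here.
  return 0;
}
```

With this shape, Recognize(false) reports failure through its return value, and the caller decides how to surface the error to the user.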
Tested. It works - no segfault. |
I have no idea what the box.train config is supposed to do, or what
missing data it needs. I just don't like segfaults.