Issue 1: [ 1669644 ] Crash in letter_is_okay() with trigger #3
Comments
2007-03-07T22:23:34.000Z |
2007-03-07T22:24:44.000Z |
2007-03-07T22:47:04.000Z |
2008-11-14T03:24:22.000Z |
2008-12-24T01:04:46.000Z |
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259 int width = right - left;

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff665c231 in __GI_abort () at abort.c:79
#2 0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3 0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380) at ../../../src/textord/colpartitiongrid.cpp:1259
#4 0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5 0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1, input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250, diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6 0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328, diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7 0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:141
#8 0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9 0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <[email protected]>
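For illustration, here is a minimal sketch of one way to compute a difference such as `right - left` without tripping -ftrapv. It is not the change made in the commit above, whose diff is not shown here. The helper name `SaturatingDiff` is hypothetical, and the sketch assumes `int` is narrower than 64 bits.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of Tesseract: subtracts two ints without
// signed overflow by widening to 64 bits and clamping to the int range.
// Assumes int is narrower than std::int64_t.
inline int SaturatingDiff(int right, int left) {
  const std::int64_t diff =
      static_cast<std::int64_t>(right) - static_cast<std::int64_t>(left);
  if (diff > std::numeric_limits<int>::max()) {
    return std::numeric_limits<int>::max();
  }
  if (diff < std::numeric_limits<int>::min()) {
    return std::numeric_limits<int>::min();
  }
  return static_cast<int>(diff);
}
```

Clamping is only one possible policy. Depending on the caller, rejecting the out-of-range box coordinates earlier may be the better fix.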
Credit to OSS-Fuzz which reported this issue:

intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
#0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*) tesseract/src/classify/intmatcher.cpp:1163:17
#1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11
#2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
#3 0x5f35fd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
#4 0x5f201e in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index.

Signed-off-by: Stefan Weil <[email protected]>
Credit to OSS-Fuzz which reported this issue:

intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
#0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*, short) tesseract/src/classify/intmatcher.cpp:1121:17
#1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11
#2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
#3 0x5f2ccd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
#4 0x5f16ee in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads in release builds. Also add assertions for debug builds.

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818.

Signed-off-by: Stefan Weil <[email protected]>
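The pattern described in the two commit messages above (an assertion for debug builds plus a runtime check for release builds) could look roughly like the following sketch. The names `kEvidenceSize` and `SumFirstEntries` are hypothetical and the loop is not Tesseract's actual code. Only the array size of 24 is taken from the sanitizer report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Matches the 'uint8_t [24]' type from the report; the constant name is an
// assumption made for this sketch.
constexpr std::size_t kEvidenceSize = 24;

// Hypothetical helper: sums the first `length` entries of a fixed-size
// evidence array without ever reading past its end.
inline int SumFirstEntries(const std::uint8_t (&evidence)[kEvidenceSize],
                           std::size_t length) {
  // Debug builds: fail loudly so the producer of the bad length gets noticed.
  assert(length <= kEvidenceSize);
  // Release builds (NDEBUG): clamp instead of reading out of bounds.
  if (length > kEvidenceSize) {
    length = kEvidenceSize;
  }
  int sum = 0;
  for (std::size_t i = 0; i < length; ++i) {
    sum += evidence[i];
  }
  return sum;
}
```

As the first commit message notes, a guard like this only hides the symptom. The value that exceeds the allowed index still needs to be fixed where it is produced.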
Credit to OSS-Fuzz which reported this issue:

intmatcher.cpp:1231:62: runtime error: division by zero
#0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
#1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
#2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
#3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
#4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <[email protected]>
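A division-by-zero report like the one above is typically handled by checking the divisor before dividing. The sketch below shows that guard on a generic weighted mean. It is not the body of `ApplyCNCorrection`, whose formula does not appear in this report. The function name, the parameters, and the caller-supplied fallback value are all assumptions made for illustration.

```cpp
// Hypothetical helper, not Tesseract code: weighted mean of two values that
// returns a caller-chosen fallback when the combined weight is zero, instead
// of dividing by zero.
inline float GuardedWeightedMean(float value_a, int weight_a,
                                 float value_b, int weight_b,
                                 float fallback) {
  const int divisor = weight_a + weight_b;
  if (divisor == 0) {
    return fallback;  // avoids the division-by-zero runtime error
  }
  return (value_a * weight_a + value_b * weight_b) / divisor;
}
```

Which fallback is appropriate depends on how the caller interprets the result, so it is left as a parameter here.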
https://code.google.com/p/tesseract-ocr/issues/detail?id=1
Reported by tmbdev, Mar 7, 2007
Filip Gieszczykiewicz - filipg(sf)
Recognizing the attached TIF with v1.03 crashes as follows:
pppppspppppppppspppppppppsppppppppppppppppppppppppppp
Program received signal SIGSEGV, Segmentation fault.
0x080fb8a8 in letter_is_okay (dawg=0xb7f09008, node=0xbf815a04,
char_index=7, prevchar=0 '\0',
word=0xbf815bff "proto-ft", word_end=0) at dawg.cpp:49
49 if (edge_occupied (dawg, edge)) {
(gdb) bt
#0  0x080fb8a8 in letter_is_okay (dawg=0xb7f09008, node=0xbf815a04,
    char_index=7, prevchar=0 '\0',
    word=0xbf815bff "proto-ft", word_end=0) at dawg.cpp:49
#1  0x080f3b26 in append_next_choice (dawg=0xb7f09008, node=108107,
    permuter=5 '\005',
    word=0xbf815bff "proto-ft", choices=0x82e7ad0, char_index=7,
    this_choice=0x8260df0,
    prevchar=0 '\0', limit=0xbf815c28, rating=0, certainty=-1.15637732,
    rating_array=0xbf815ab4,
    certainty_array=0xbf815b58, word_ending=0, last_word=0,
    result=0xbf815a58) at permdawg.cpp:202
#2  0x080f3f03 in dawg_permute (dawg=0xb7f09008, node=108107,
    permuter=5 '\005',
    choices=0x82e7ad0, char_index=7, limit=0xbf815c28,
    word=0xbf815bff "proto-ft", rating=0,
    certainty=0, rating_array=0xbf815ab4, certainty_array=0xbf815b58,
    last_word=0) at permdawg.cpp:273
#3  0x080f40b3 in dawg_permute_and_select (string=0x814f9fc "system words:",
    dawg=0xb7f09008,
    permuter=5 '\005', character_choices=0x82e7ad0, best_choice=0x8260d40,
    system_words=1) at permdawg.cpp:334
#4  0x080f5640 in permute_words (char_choices=0x82e7ad0, rating_limit=1000)
    at permute.cpp:1611
#5  0x080f6549 in permute_all (char_choices=0x82e7ad0, rating_limit=1000,
    raw_choice=0xbf815dc8) at permute.cpp:1092
#6  0x080f6952 in permute_characters (char_choices=0x82e7ad0, limit=1000,
    best_choice=0xbf815dd8,
    raw_choice=0xbf815dc8) at permute.cpp:1146
#7  0x080d1ef6 in chop_word_main (word=0x826f830, fx=1,
    best_choice=0xbf815dd8,
    raw_choice=0xbf815dc8, tester=0 '\0', trainer=0 '\0') at chopper.cpp:476
#8  0x080cf426 in cc_recog (tessword=0x826f830, best_choice=0xbf815dd8,
    best_raw_choice=0xbf815dc8, tester=0 '\0', trainer=0 '\0') at tface.cpp:247
#9  0x08069a94 in recog_word_recursive (word=0x826e9f0, denorm=0x826be54,
    matcher=0x80684a0 <tess_default_matcher(PBLOB*, PBLOB*, PBLOB*, WERD*,
    DENORM*, BLOB_CHOICE_LIST&)>, tester=0, trainer=,
    testing=0 '\0', raw_choice=@0x826be7c,
    blob_choices=0xbf8162b8, outword=@0x826be50) at tfacepp.cpp:191
#10 0x0806a380 in recog_word (word=0x826e9f0, denorm=0x826be54,
    matcher=0x80684a0 <tess_default_matcher(PBLOB*, PBLOB*, PBLOB*, WERD*,
    DENORM*, BLOB_CHOICE_LIST&)>, tester=0, trainer=0, testing=0 '\0',
    raw_choice=@0x826be7c, blob_choices=0xbf8162b8,
    outword=@0x826be50) at tfacepp.cpp:90
I don't think it's related to issue 1546972.
It depends on the specific image: recreating a new TIF with pbmtext
of the contained text does not crash. Also, scaling the image -2.0 or +2.0
does not crash; only this one does.
Argh, image too big for sf.net - see
http://tesseract-ocr.repairfaq.org/downloads/b37by2.tif