While processing, division by zero causes an arithmetic exception #3498
Comments
The exception happens in Leptonica code (function |
Leptonica prints a hint:
So the JPEG image is invalid. With the latest Leptonica code, Tesseract no longer crashes but delivers an empty text result (which is fine for this image). |
I am closing this issue because it only happens with an invalid JPEG image, and it is fixed in the latest Leptonica. |
This is referenced in CVE-2022-38266. Do we know which Leptonica patch fixes this? |
First of all: provide an image for replicating the problem. |
Citing the CVE: "An issue in the Leptonica linked library (v1.79.0) in Tesseract v5.0.0 allows attackers to cause an arithmetic exception leading to a Denial of Service (DoS) via a crafted JPEG file." This sounds rather strange. Attackers who are able to provide a JPEG for Tesseract OCR can simply run a lot of Tesseract processes with normal JPEG files. That is also some kind of DoS. |
I found the JPEG in 2021-05-06-03_01_41_0x7b7d0fd6_0xb1c1261c.zip.
>tesseract 2021-05-06-03_01_41_0x7b7d0fd6_0xb1c1261c.jpeg -
Corrupt JPEG data: 2 extraneous bytes before marker 0xd9
Estimating resolution as 1625
>tesseract -v
tesseract 5.2.0-8-ge589b
leptonica-1.83.0 (Aug 1 2022, 13:11:36) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 2019
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
Found libcurl/7.75.0 zlib/1.2.12 libssh2/1.10.1_DEV
So I do not see what we can do about it in Tesseract. |
I want to make the CVE clearer. There's obviously an issue in Leptonica, but the CVE doesn't contain any remediation information other than "use latest Leptonica". That's not an option for certain distributions, so it's important to identify the patches that fix the issue so that those distributions can apply them to their packages. Leptonica also has many CVEs. If this issue has already been assigned a CVE, then CVE-2022-38266 is a duplicate and should be rejected. That is very hard to tell without knowing what the fix is. |
Leptonica > 1.80.0 should be fine. Older releases require patch DanBloomberg/leptonica@f062b42. |
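For anyone who wants to check an installation against that threshold, here is a minimal sketch. It assumes the version string has the `leptonica-X.Y.Z` form printed by `tesseract -v` in this thread; the function names are hypothetical, and treating 1.80.0 itself as needing review is my conservative reading of "> 1.80.0", not something confirmed above.

```python
import re


def parse_leptonica_version(version_line):
    """Extract (major, minor, patch) from a line such as 'leptonica-1.79.0'."""
    match = re.search(r"leptonica-(\d+)\.(\d+)\.(\d+)", version_line)
    if match is None:
        raise ValueError(f"no Leptonica version found in: {version_line!r}")
    return tuple(int(part) for part in match.groups())


def needs_review(version_line):
    """Return True if the release is not newer than 1.80.0.

    Assumption: 1.80.0 itself is flagged, because the thread only states
    that releases newer than 1.80.0 should be fine.
    """
    return parse_leptonica_version(version_line) <= (1, 80, 0)


if __name__ == "__main__":
    # The two version lines quoted in this thread.
    for line in ("leptonica-1.79.0",
                 "leptonica-1.83.0 (Aug 1 2022, 13:11:36)"):
        print(line, "->", "review/patch" if needs_review(line) else "should be fine")
```

A distribution package may already carry the backported patch while keeping an old version number, so a flagged version is only a prompt to check the package changelog. |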
Many thanks to the developers of Tesseract for their hard work. I am part of the team that reported this issue. When I found it, I was using only the library provided by default in the Ubuntu environment mentioned above (after apt-get update), so the bug was found even though it had already been fixed in the latest Leptonica. As stweil said, with this bug in mind, it might be better to specify the library version required for a secure run. Whether an outdated library can cause problems in the product that uses it is a recent and frequent debate. We know that this imposes a lot of burden on developers. Thanks again for the action. |
Users with Linux distributions which still have that problem can report it as an issue to their distribution. It can be fixed in old releases with the patch mentioned above. |
Thanks. I'll request that MITRE update the CVE accordingly. |
System Configuration
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE
Found OpenMP 201511
Found libcurl/7.68.0 GnuTLS/3.6.13 zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Program received signal SIGFPE, Arithmetic exception.
#0 0x00007ffff7dbe24a in pixBlockconvGray () from /lib/x86_64-linux-gnu/liblept.so.5
#1 0x00007ffff7dbeadd in pixBlockconv () from /lib/x86_64-linux-gnu/liblept.so.5
#2 0x00005555556b2d9b in tesseract::TextlineProjection::ConstructProjection (this=0x55555586e230, input_block=input_block@entry=0x55555587e110, rotation=..., nontext_map=...) at ./src/ccstruct/image.h:34
#3 0x00005555556980a7 in tesseract::StrokeWidth::GradeBlobsIntoPartitions (this=0x55555586ac20, pageseg_mode=pageseg_mode@entry=tesseract::PSM_AUTO, rerotation=..., block=block@entry=0x55555587e110, nontext_pix=...,
denorm=, cjk_script=0x0, projection=0x55555586e230, diacritic_blobs=0x7fffffffd018, part_grid=0x55555586e1d8, big_parts=0x55555586e208) at src/textord/strokewidth.cpp:371
#4 0x000055555566b4e5 in tesseract::ColumnFinder::FindBlocks (this=this@entry=0x55555586e0a0, pageseg_mode=pageseg_mode@entry=tesseract::PSM_AUTO, scaled_color=..., scaled_factor=,
input_block=input_block@entry=0x55555587e110, photo_mask_pix=..., thresholds_pix=..., grey_pix=..., pixa_debug=0x7ffff624cbd0, blocks=0x7fffffffcf78, diacritic_blobs=0x7fffffffd018, to_blocks=0x7fffffffd020)
at src/textord/colfind.cpp:295
#5 0x00005555555b1c8f in tesseract::Tesseract::AutoPageSeg (this=0x7ffff6229010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555556f3f830, to_blocks=0x7fffffffd020, diacritic_blobs=0x7fffffffd018, osd_tess=,
osr=0x7fffffffd3d0) at src/ccmain/pagesegmain.cpp:226
#6 0x00005555555b214d in tesseract::Tesseract::SegmentPage (this=0x7ffff6229010, input_file=, blocks=0x555556f3f830, osd_tess=osd_tess@entry=0x0, osr=osr@entry=0x7fffffffd3d0) at ./src/ccutil/params.h:202
#7 0x0000555555580e17 in tesseract::TessBaseAPI::FindLines (this=0x7fffffffe100) at /usr/include/c++/9/bits/basic_string.h:2300
#8 0x0000555555583608 in tesseract::TessBaseAPI::Recognize (this=0x7fffffffe100, monitor=0x0) at src/api/baseapi.cpp:838
#9 0x0000555555583c0a in tesseract::TessBaseAPI::ProcessPage (this=this@entry=0x7fffffffe100, pix=0x55555587a110, page_index=page_index@entry=0x0,
filename=filename@entry=0x7fffffffe77a "/home/ubuntu/Aws-Results/orcheFuzz-newbug/output_tesseract_of/initial_crashes/2021-05-06-03:01:41_0x7b7d0fd6_0xb1c1261c", retry_config=retry_config@entry=0x0,
timeout_millisec=timeout_millisec@entry=0x0, renderer=0x55555586e710) at src/api/baseapi.cpp:1259
#10 0x0000555555584888 in tesseract::TessBaseAPI::ProcessPagesInternal (this=0x7fffffffe100, filename=, retry_config=0x0, timeout_millisec=0x0, renderer=0x55555586e710) at /usr/include/c++/9/bits/basic_string.h:2300
#11 0x0000555555584e33 in tesseract::TessBaseAPI::ProcessPages (this=0x7fffffffe100, filename=, retry_config=, timeout_millisec=, renderer=) at src/api/baseapi.cpp:1071
#12 0x0000555555575ba5 in main (argc=argc@entry=0x3, argv=argv@entry=0x7fffffffe528) at /usr/include/c++/9/bits/unique_ptr.h:360
#13 0x00007ffff771f0b3 in __libc_start_main (main=0x555555574ee0 <main(int, char**)>, argc=0x3, argv=0x7fffffffe528, init=, fini=, rtld_fini=, stack_end=0x7fffffffe518)
at ../csu/libc-start.c:308
#14 0x000055555557d1be in _start () at /usr/include/x86_64-linux-gnu/bits/stdio2.h:100
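Because the SIGFPE in the backtrace above kills the whole process, a caller that feeds untrusted images to an unpatched build may prefer to run the `tesseract` command in a child process and inspect how it exited. The following is a rough sketch, not part of Tesseract: `classify_exit` and `run_ocr` are hypothetical helper names, it assumes a POSIX system where Python reports a signal death as a negative return code, and it assumes the `tesseract` binary is on PATH and accepts `-` as the output base, as in the command shown in this thread.

```python
import signal
import subprocess


def classify_exit(returncode):
    """Describe a child's exit status; negative values mean death by signal (POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit code {returncode}"


def run_ocr(image_path, timeout_s=60):
    """Run the tesseract CLI on one image and return (status, text).

    Assumption: `tesseract <image> -` writes the recognized text to stdout.
    A crash in the child does not take down the calling process.
    """
    proc = subprocess.run(
        ["tesseract", image_path, "-"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return classify_exit(proc.returncode), proc.stdout
```

With the crashing build described here, `run_ocr` on the attached file would be expected to report a signal death rather than raise in the caller; I have not verified that against this exact setup.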
I've attached the file. Please download and check the file.
2021-05-06-03_01_41_0x7b7d0fd6_0xb1c1261c.zip