[Android] PdfRenderer can't include JPG files to PDFs produced on Android (PNG files works) #3317
Comments
It is a bad idea:
So if I understand it right: you take a JPEG as input and you wish to store it in a PNG-like image format in the PDF. If this is the desired workaround for you, you can do it in your app instead of modifying tesseract.
Of course, but in this case I just used code that had existed in this repository for some time before it was changed by commit f794571. The reason for that commit was to fix an issue named "jpg input files result in much bigger pdf". So I concluded that probably the worst thing that could happen by reverting it (only for Android) is producing bigger PDF files, but at least it would work again (on Android). That's why I mentioned it in this issue as a possible solution/work-around that I have found. I just don't know if it's the correct one.
I want to take a JPEG file as input and produce a PDF. But currently the produced PDF doesn't contain the JPEG image at all (and shows an error when the file is opened in a PDF reader).
I made this comment on the commit from 2 years ago -- it should have been here: This is a leptonica issue only to the extent that leptonica does not allow a direct to memory jpeg encoding of pix raster images, but instead requires encoding into a temporary file. Because that implementation is not likely to happen anytime soon, and as fmemopen() is now available on android, there is not much incentive. The simplest work-around without changing code is, as @zdenop mentioned, to convert the images to png before making the pdf. You can do this in leptonica, using either:
However::: This leaves me with one problem: in pixcompFastConvertToPdfData(), we avoid transcoding because I have implemented the change, and it is now pushed. Dan
Dan, I see you use _open(). https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?view=msvc-160
@DanBloomberg Thank you! Your change fixes the issue.
See discussion at tesseract-ocr/tesseract#3317
I've got a report from a user (adaptech-cz/Tesseract4Android#31) that the Tesseract4Android library (which uses Tesseract 4.x/5.x) produces corrupted PDFs for JPG files (PNG files work), while the tess-two library (which uses Tesseract 3.x.x) handles JPG files correctly.
After some debugging I found that the difference is probably caused by commit f794571.
If I let the code always go through the first branch
sad = pixGenerateCIData(pix, L_FLATE_ENCODE, 0, 0, &cid);
for both PNG and JPG, it produces PDF files correctly.
But if the code goes through the second branch with
sad = l_generateCIDataForPdf(filename, pix, kJpegQuality, &cid);
then that call always fails.
The reason the second branch fails is that Leptonica tries to load the file via fmemopen, but that is not available on Android, so as a work-around it first tries to write the data to a temporary file via tmpfile() and then process that file. But the /tmp directory is not available on Android, so every attempt to use tmpfile() fails. As a result it can never process such a file and produces a PDF without the image data.
(Note: fmemopen support was added to the Android NDK starting with API 23, but I'd like to keep supporting older APIs, as the original libraries do, for as long as possible.)
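The tmpfile() failure described above can be illustrated with a small plain-C sketch. This is not Leptonica's actual code: open_scratch_stream and its fallback_dir parameter are hypothetical names, and the sketch assumes the app can supply a writable directory (on Android, for example, the path returned by Context.getCacheDir()). It tries tmpfile() first and, only if that returns NULL, creates an already-unlinked file with mkstemp() in the supplied directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of Leptonica or Tesseract.
 * Returns a read/write scratch stream, or NULL if no strategy works. */
FILE *open_scratch_stream(const char *fallback_dir)
{
    /* First choice: the standard C temporary file. On a system without
     * a usable temp directory this returns NULL. */
    FILE *fp = tmpfile();
    if (fp != NULL)
        return fp;

    /* Fallback: a uniquely named file in a caller-supplied directory. */
    if (fallback_dir == NULL)
        return NULL;

    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/scratch_XXXXXX", fallback_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return NULL;

    int fd = mkstemp(path);   /* creates and opens the file */
    if (fd == -1)
        return NULL;

    /* Remove the name right away; the data lives until the stream closes. */
    unlink(path);

    fp = fdopen(fd, "w+b");
    if (fp == NULL)
        close(fd);
    return fp;
}
```

A caller would write the image bytes to the returned stream, rewind() it, and hand it to whatever reader expects a FILE*. Whether such a fallback belongs in the library or in the app is a separate design question that this sketch does not settle.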
A quick fix would be to always use the first branch (in the code mentioned above) on Android via a preprocessor macro. But I'm not familiar with the PDF format and I don't know exactly what the L_FLATE_ENCODE method does.
Should I provide a PR for this change, or is such a work-around a bad idea?
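For illustration, the proposed preprocessor switch could look roughly like the sketch below. It is an assumption-laden stand-in, not Tesseract's code: pdf_encode_path, FORCE_FLATE_PATH and choose_encode_path are hypothetical names, and it assumes the first branch is normally taken only for PNG input, as the description above suggests. The API level check relies on the note that fmemopen is available from API 23. Flate is the lossless zlib/deflate compression used in PDF, so a JPEG stored this way stays valid but is usually larger, which matches the reason given for commit f794571.

```c
/* Hypothetical stand-ins for the two code paths discussed above. */
typedef enum {
    ENCODE_FLATE,     /* like pixGenerateCIData(pix, L_FLATE_ENCODE, ...) */
    ENCODE_FROM_FILE  /* like l_generateCIDataForPdf(filename, pix, ...) */
} pdf_encode_path;

/* __ANDROID__ and __ANDROID_API__ are predefined by the NDK toolchain.
 * Assumption: below API 23 there is no fmemopen(), so the file-based
 * path cannot work and the flate path is forced. */
#if defined(__ANDROID__) && defined(__ANDROID_API__) && (__ANDROID_API__ < 23)
#define FORCE_FLATE_PATH 1
#else
#define FORCE_FLATE_PATH 0
#endif

/* Picks the encoding path for one input image. */
pdf_encode_path choose_encode_path(int input_is_png)
{
    if (FORCE_FLATE_PATH || input_is_png)
        return ENCODE_FLATE;
    return ENCODE_FROM_FILE;
}
```

On any platform other than old Android API levels the behaviour is unchanged, so the cost of the work-around (larger PDFs for JPEG input) is limited to the builds that would otherwise produce broken files.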