PDF output mangling image for TIFF input #535
Emergency workaround while I go hunt down the root cause.
--- tesseract/api/pdfrenderer.cpp 2016-11-21 08:45:47.000000000 -0800
+++ tesseract/api/pdfrenderer.cpp 2016-12-05 14:15:42.000000000 -0800
@@ -841,8 +841,8 @@
bool TessPDFRenderer::AddImageHandler(TessBaseAPI* api) {
size_t n;
char buf[kBasicBufSize];
- Pix *pix = api->GetInputImage();
char *filename = (char *)api->GetInputName();
+ Pix *pix = pixRead(filename);
int ppi = api->GetSourceYResolution();
if (!pix || ppi <= 0)
return false;
This change also fixes it, at the cost of memory, and it probably leaks.
--- tesseract/api/baseapi.cpp 2016-12-05 08:51:32.000000000 -0800
+++ tesseract/api/baseapi.cpp 2016-12-05 14:47:16.000000000 -0800
@@ -523,7 +523,7 @@
if (InternalSetImage()) {
thresholder_->SetImage(imagedata, width, height,
bytes_per_pixel, bytes_per_line);
- SetInputImage(thresholder_->GetPixRect());
+ SetInputImage(pixCopy(NULL, thresholder_->GetPixRect()));
}
}
@@ -545,7 +545,7 @@
void TessBaseAPI::SetImage(Pix* pix) {
if (InternalSetImage()) {
thresholder_->SetImage(pix);
- SetInputImage(thresholder_->GetPixRect());
+ SetInputImage(pixCopy(NULL, thresholder_->GetPixRect()));
}
}
This one is probably best.
--- tesseract/ccmain/thresholder.cpp 2016-03-11 14:29:36.000000000 -0800
+++ tesseract/ccmain/thresholder.cpp 2016-12-05 15:00:46.000000000 -0800
@@ -225,7 +225,7 @@
Pix* ImageThresholder::GetPixRect() {
if (IsFullImage()) {
// Just clone the whole thing.
- return pixClone(pix_);
+ return pixCopy(NULL, pix_);
} else {
// Crop to the given rectangle.
Box* box = boxCreate(rect_left_, rect_top_, rect_width_, rect_height_);
@@ -322,4 +322,3 @@
}
} // namespace tesseract.
-
This bug happens when:
For example, this image is TIFF G4. Converting to an identical-looking TIFF LZW
Ray found the exact spot. This is the final answer.
--- tesseract/ccmain/thresholder.cpp 2016-03-11 14:29:36.000000000 -0800
+++ tesseract/ccmain/thresholder.cpp 2016-12-05 15:27:45.000000000 -0800
@@ -181,8 +181,9 @@
// Caller must use pixDestroy to free the created Pix.
void ImageThresholder::ThresholdToPix(PageSegMode pageseg_mode, Pix** pix) {
if (pix_channels_ == 0) {
- // We have a binary image, so it just has to be cloned.
- *pix = GetPixRect();
+ // We have a binary image, so it just has to be copied.
+ // Don't clone or you'll mess up api->GetInputImage()
+ *pix = pixCopy(NULL, GetPixRect());
} else {
OtsuThresholdRectToPix(pix_, pix);
}
@@ -322,4 +323,3 @@
}
} // namespace tesseract.
-
Note that this bug affects all versions of Tesseract capable of producing PDF output, both 3.0.x and 4.x.
... And the code above is leaky. Ray is doing the final final final version right now.
This means api->GetInputImage() is giving us a processed image.
test.tif.zip
test.pdf