Tesseract not working fine with arabic #791
Closed
gcc report:

opencl_device_selection.h: In function 'ds_status getNumDeviceWithEmptyScore(ds_profile*, unsigned int*)':
opencl_device_selection.h:589:13: warning: value computed is not used [-Wunused-value]
*num++;
^

This is caused by a buggy implementation which increases the value of num instead of *num.

Signed-off-by: Stefan Weil <[email protected]>
gcc report:

In file included from /usr/include/leptonica/alltypes.h:36:0,
from /usr/include/leptonica/allheaders.h:34,
from openclwrapper.h:2,
from openclwrapper.cpp:11:
openclwrapper.cpp: In static member function 'static PIX* OpenclDevice::pixReadMemTiffCl(const l_uint8*, size_t, l_int32)':
/usr/include/leptonica/environ.h:442:68: warning: format '%d' expects a matching 'int' argument [-Wformat=]
(void)fprintf(stderr, "Warning in %s: " a, __VA_ARGS__), \
^
/usr/include/leptonica/environ.h:427:61: note: in definition of macro 'IF_SEV'
((l) >= MINIMUM_SEVERITY && (l) >= LeptMsgSeverity ? (t) : (f))
^
opencl/openclwrapper.cpp:1162:3: note: in expansion of macro 'L_WARNING'
L_WARNING("tiff page %d not found", procName);
^

Signed-off-by: Stefan Weil <[email protected]>
This fixes compiler warnings. Signed-off-by: Stefan Weil <[email protected]>
Commit b1f03cb added a call of function FreeFeatureSet to fix a memory leak, but introduced a new bug because the local variable FloatFeatures was not always assigned a value. Now FloatFeatures is always assigned a value, and we only need a single place where FreeFeatureSet is called. Signed-off-by: Stefan Weil <[email protected]>
CL_CONTEXT_NUM_DEVICES expects a cl_uint. Passing size_t results in a wrong value for numDevices on hosts where sizeof(cl_uint) != sizeof(size_t).

This results in errors like these:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
OpenCL error code is -44 at when clCreateKernel kernel_HistogramRectAllChannels .
OpenCL error code is -44 at when clCreateKernel kernel_HistogramRectAllChannelsReduction .
OpenCL error code is -48 at when clSetKernelArg imageBuffer .
...

Signed-off-by: Stefan Weil <[email protected]>
CL_PROGRAM_NUM_DEVICES expects a cl_uint. Passing size_t results in a wrong value for numDevices on hosts where sizeof(cl_uint) != sizeof(size_t). Signed-off-by: Stefan Weil <[email protected]>
… lossless-encoded at different bpp
training/commontraining.cpp:824:3: warning: 'register' storage class specifier is deprecated and incompatible with C++1z [-Wdeprecated-register] ... Signed-off-by: Stefan Weil <[email protected]>
C++ needs to be escaped as C\+\+ in the AsciiDoc source code.
Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Stefan Weil <[email protected]> Conflicts: opencl/openclwrapper.cpp
The compare method is called very often, so even small improvements are important. The new code avoids one comparison in each loop iteration. This results in smaller code (60 bytes for x86_64, gcc). Signed-off-by: Stefan Weil <[email protected]>
gcc report: ccstruct/blamer.cpp:343:65: warning: 'truth_x' may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: Stefan Weil <[email protected]>
... and Jeff
and fix typo
Signed-off-by: Stefan Weil <[email protected]>
The indentation is wrong since commit fd0683f and results in a gcc warning:

api/baseapi.cpp: In member function 'bool tesseract::TessBaseAPI::ProcessPagesMultipageTiff(const l_uint8*, size_t, const char*, const char*, int, tesseract::TessResultRenderer*, int)':
api/baseapi.cpp:986:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
if (tessedit_page_number >= 0)
^~
api/baseapi.cpp:988:7: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
pix = (data) ? pixReadMemFromMultipageTiff(data, size, &offset)
^~~

Signed-off-by: Stefan Weil <[email protected]>
…olize` away into `$LIBTOOLIZE`. Increase portability by insulating `autogen.sh` from platform variance.
This is with respect to the comment preceding the `libtoolize`/`glibtoolize` existence check I introduced.
…ror message. Explicitly mention the latter variant of the tool inside said error message.
Use the `$LIBTOOLIZE` variable inside the message to abstract over the two possible variants of the tool which can be invoked.
Signed-off-by: Stefan Weil <[email protected]>
Add item to ChangeLog for options writing to stdout instead of stderr
This is not a correct PR.
So please tell me whom I can ask for guidance on this issue.
How did he make all those old comments from me, zdenko and others reappear here? Really weird.
Hello,
I installed Tesseract and tested it; it gives correct results for English.
But I have to extract Arabic text, for which I downloaded ara.traineddata and its related files from here:
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-400
Then I tried a JPEG image and wrote its output to a text file. It retrieves Arabic text, but the result is not correct:
جهمة # ك سو-ة
. ظرسظة عهود
١صي سعد إلاء . سلعة-ا
. سدلعدسسلعلىوس
وا{ قللا{ ٧تلا ٣تاتع٨ اق با لإت«اح» . سا مي ضجة دةسءع عظك
«قلم«ة٧حلا و و«تعهاق بي ئت»حه لة
None of these words appear in the image.
When I used the -l ara+eng parameter, the output said "no best words !!" and Tesseract stopped working.
What is the problem here?