Tesseract not working fine with arabic #791

Closed
wants to merge 138 commits into from

@waleedraza786 waleedraza786 commented Mar 26, 2017

Hello,
I installed Tesseract and tested it; it gives correct results for English.
However, I need to extract Arabic text, so I downloaded ara.traineddata and its related files from here:
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-400
I then ran it on a JPEG image and wrote the output to a text file. It returns Arabic text, but the result is not correct:

جهمة # ك سو-ة

. ظرسظة عهود

١صي سعد إلاء . سلعة-ا
. سدلعدسسلعلىوس
وا{ قللا{ ٧تلا ٣تاتع٨ اق با لإت«اح» . سا مي ضجة دةسءع عظك
«قلم«ة٧حلا و و«تعهاق بي ئت»حه لة

None of these words appear in the image.

image

When I used the -l ara+eng parameter, the output says "no best words!!" and Tesseract stopped working.
What is the problem here?
image

stweil and others added 30 commits November 24, 2016 13:52
gcc report:

opencl_device_selection.h: In function 'ds_status getNumDeviceWithEmptyScore(ds_profile*, unsigned int*)':
opencl_device_selection.h:589:13: warning: value computed is not used [-Wunused-value]
       *num++;
             ^

This is caused by a buggy implementation which increases the value of num
instead of *num.

Signed-off-by: Stefan Weil <[email protected]>
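
The precedence issue behind this warning can be shown in isolation. The following is a minimal sketch, not the code from opencl_device_selection.h; the function names are hypothetical. Postfix `++` binds tighter than unary `*`, so `*num++` advances the local pointer copy and discards the dereferenced value, while `(*num)++` increments the pointed-to value.

```cpp
// Hypothetical helpers illustrating the pitfall; not Tesseract code.

// Buggy form: parsed as *(num++). The pointer copy is advanced and the
// value read through it is thrown away, so the caller's counter is unchanged.
void count_buggy(unsigned int* num) {
  *num++;  // gcc: value computed is not used [-Wunused-value]
}

// Fixed form: the parentheses make ++ apply to the pointed-to value.
void count_fixed(unsigned int* num) {
  (*num)++;
}

// Small wrappers that start from zero and report the resulting count.
unsigned int run_buggy() {
  unsigned int n = 0;
  count_buggy(&n);
  return n;
}

unsigned int run_fixed() {
  unsigned int n = 0;
  count_fixed(&n);
  return n;
}
```

Calling `run_buggy()` leaves the counter at 0, whereas `run_fixed()` returns 1.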
gcc report:

In file included from /usr/include/leptonica/alltypes.h:36:0,
                 from /usr/include/leptonica/allheaders.h:34,
                 from openclwrapper.h:2,
                 from openclwrapper.cpp:11:
openclwrapper.cpp: In static member function 'static PIX* OpenclDevice::pixReadMemTiffCl(const l_uint8*, size_t, l_int32)':
/usr/include/leptonica/environ.h:442:68: warning: format '%d' expects a matching 'int' argument [-Wformat=]
              (void)fprintf(stderr, "Warning in %s: " a, __VA_ARGS__), \
                                                                    ^
/usr/include/leptonica/environ.h:427:61: note: in definition of macro 'IF_SEV'
       ((l) >= MINIMUM_SEVERITY && (l) >= LeptMsgSeverity ? (t) : (f))
                                                             ^
opencl/openclwrapper.cpp:1162:3: note: in expansion of macro 'L_WARNING'
   L_WARNING("tiff page %d not found", procName);
   ^

Signed-off-by: Stefan Weil <[email protected]>
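
Going by the expansion shown in the gcc note, the macro prepends `"Warning in %s: "` to the format string, so `procName` is consumed by `%s` and nothing is left for `%d`. The sketch below reproduces that shape with a hypothetical `DEMO_WARNING` macro that writes into a buffer instead of stderr. Passing the page number as an extra argument is an assumption about what a fix might look like, not necessarily what this commit did.

```cpp
#include <cstdio>
#include <string>

// Hypothetical macro modelled on the expansion in the gcc note above.
// It is not Leptonica's L_WARNING; it formats into a caller-supplied array.
#define DEMO_WARNING(buf, a, ...) \
  std::snprintf(buf, sizeof(buf), "Warning in %s: " a, __VA_ARGS__)

// Supplies both arguments: procName for the implicit %s, page for the %d.
std::string warn_page_not_found(int page) {
  char buf[128];
  const char* procName = "pixReadMemTiffCl";
  DEMO_WARNING(buf, "tiff page %d not found", procName, page);
  return buf;
}
```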
This fixes compiler warnings.

Signed-off-by: Stefan Weil <[email protected]>
Commit b1f03cb added a call of function
FreeFeatureSet to fix a memory leak, but introduced a new bug because the
local variable FloatFeatures was not always assigned a value.

Now FloatFeatures is always assigned a value, and we only need a single
place where FreeFeatureSet is called.

Signed-off-by: Stefan Weil <[email protected]>
CL_CONTEXT_NUM_DEVICES expects a cl_uint.

Passing size_t results in a wrong value for numDevices on hosts where
sizeof(cl_uint) != sizeof(size_t). This results in errors like these:

  Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
  OpenCL error code is -44 at   when clCreateKernel kernel_HistogramRectAllChannels .
  OpenCL error code is -44 at   when clCreateKernel kernel_HistogramRectAllChannelsReduction .
  OpenCL error code is -48 at   when clSetKernelArg imageBuffer .
  ...

Signed-off-by: Stefan Weil <[email protected]>
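
The size mismatch can be illustrated without OpenCL. In this sketch, `fake_get_info` is a hypothetical stand-in for a query that writes a 32-bit count through a `void*` (`cl_uint` is a 32-bit unsigned type). The garbage fill models an uninitialized variable, which is an assumption about the original code. When the destination is a wider `size_t`, only part of it is overwritten.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an OpenCL-style info query: it writes exactly
// four bytes to the destination, regardless of the caller's variable type.
void fake_get_info(void* out, std::uint32_t value) {
  std::memcpy(out, &value, sizeof(value));
}

// Wrong: the destination is a size_t. Where size_t is 8 bytes, half of the
// variable keeps whatever it held before (simulated here with 0xFF bytes).
std::size_t query_into_size_t() {
  std::size_t n;
  std::memset(&n, 0xFF, sizeof(n));
  fake_get_info(&n, 2);
  return n;
}

// Right: the destination has the same width as the value being written.
std::uint32_t query_into_uint32() {
  std::uint32_t n = 0;
  fake_get_info(&n, 2);
  return n;
}
```

On a host where `size_t` is wider than 32 bits, `query_into_size_t()` does not return 2, while `query_into_uint32()` does.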
CL_PROGRAM_NUM_DEVICES expects a cl_uint.

Passing size_t results in a wrong value for numDevices on hosts where
sizeof(cl_uint) != sizeof(size_t).

Signed-off-by: Stefan Weil <[email protected]>
training/commontraining.cpp:824:3: warning:
 'register' storage class specifier is deprecated and incompatible with C++1z [-Wdeprecated-register]
...

Signed-off-by: Stefan Weil <[email protected]>
C++ needs to be escaped as C\+\+ in the AsciiDoc source code.
Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>

Conflicts:
	opencl/openclwrapper.cpp
The compare method is called very often, so even small improvements
are important.

The new code avoids one comparison in each loop iteration.
This results in smaller code (60 bytes for x86_64, gcc).

Signed-off-by: Stefan Weil <[email protected]>
gcc report:

ccstruct/blamer.cpp:343:65: warning:
 'truth_x' may be used uninitialized in this function [-Wmaybe-uninitialized]

Signed-off-by: Stefan Weil <[email protected]>
zdenop and others added 25 commits February 19, 2017 13:49
The indentation is wrong since commit
fd0683f and results in a gcc warning:

api/baseapi.cpp: In member function 'bool tesseract::TessBaseAPI::ProcessPagesMultipageTiff(const l_uint8*, size_t, const char*, const char*, int, tesseract::TessResultRenderer*, int)':
api/baseapi.cpp:986:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
     if (tessedit_page_number >= 0)
     ^~
api/baseapi.cpp:988:7: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
       pix = (data) ? pixReadMemFromMultipageTiff(data, size, &offset)
       ^~~

Signed-off-by: Stefan Weil <[email protected]>
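
For readers unfamiliar with this warning, here is a minimal sketch of the pattern with hypothetical function names. It is not the baseapi.cpp code. An unbraced `if` guards only the next statement, so a second statement at the same indentation runs unconditionally even though it looks guarded.

```cpp
// Hypothetical example of -Wmisleading-indentation; not Tesseract code.
int pick_misleading(int page_number) {
  int offset = 0;
  if (page_number >= 0)
    offset = 1;
    offset += 10;  // looks guarded by the 'if', but always executes
  return offset;
}

// Same behaviour with the layout made explicit: only the first statement
// is conditional, and the second is visibly outside the 'if'.
int pick_clear(int page_number) {
  int offset = 0;
  if (page_number >= 0) {
    offset = 1;
  }
  offset += 10;
  return offset;
}
```

Both functions return 10 for a negative page number and 11 otherwise. Only the first triggers the warning.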
…olize` away into `$LIBTOOLIZE`.

Increase portability by insulating `autogen.sh` from platform variance.
This is with respect to the comment preceding the `libtoolize`/`glibtoolize` existence check I introduced.
…ror message.

Explicitly mention the latter variant of the tool inside said error message.
Use the `$LIBTOOLIZE` variable inside the message to abstract over the two possible variants of the tool which
can be invoked.
Add item to ChangeLog for options writing to stdout instead of stderr
@zdenop
Contributor

zdenop commented Mar 26, 2017

This is not a correct PR.

@zdenop zdenop closed this Mar 26, 2017
@waleedraza786
Author

So please tell me, whom can I ask for guidance on this issue?

@amitdo
Collaborator

amitdo commented Mar 26, 2017

How did he make all those old comments from me, zdenko and others reappear here?

Really weird.

@Shreeshrii
Collaborator

Shreeshrii commented Mar 26, 2017 via email
