Tesseract 4.0 Support? #196
Thanks. I definitely want to update to support Tesseract 4.0 for the reasons you point to. I'll need help to do it for sure, and I appreciate all the contributions from you and everyone else! There are two things that contributors can help with right now that will help toward supporting Tesseract 4:
The crash is reproducible on emulators, so having a 64-bit device isn't a requirement for looking into this.
Update: I've pushed code to the master branch that runs Tesseract 3.05.00. The problems I had been having with an earlier version of the Tesseract code have been resolved. I plan to make a release on Bintray/JCenter with these new changes soon.
Update: The Tesseract 3.05.00 code has been released in tess-two 6.3.0. I have pushed a branch called
Tesseract 4.0's LSTM is "much more memory-intensive" according to the doc on accuracy and performance. I can't find the specs of the test machine, but it is possible that the memory constraints of most mobile devices will slow down the engine. I did read somewhere that the plan is to mark the original Tesseract engine as obsolete, so I hope that LSTM can really perform better on devices with 1 to 2 GB of RAM.
Hi all, any updates on this issue?
Can we use tess-two with Tesseract 4.0?
The Tesseract 4 docs mention that "AVX" and "SSE" can't be used on Android. As far as I know, no Android CPUs support them (Intel doesn't allow it). Or am I wrong about this?
Have you tried compiling and building the recent Tesseract 4.0 version from https://github.com/tesseract-ocr/tesseract?
Hi,
When will tess-two support Tesseract 4?
Any news on Tesseract 4?
What about Tesseract 4?
I found that Tesseract OCR iOS, a framework for iOS 7+ that is also compiled for armv7s and arm64, has been updated to Tesseract 4.00.00alpha at https://github.com/chaoskyme/Tesseract-OCR-iOS. Will this help figure out the compile issues for Android?
Sounds interesting, but I don't think it helps, because the major challenge is the JNI interface, which exists only on the Android side.
Dear @rmtheis,
I'd like to contribute too, but this is my first time and my first post here, and I don't know how to do that.
Has anyone managed to use tess-two with Tesseract 4 so far? Is there any way to get Tesseract 4.0 to work on Android?
Maybe the owner has left the project?
Hi all, I think we may have asked too much of the project contributors. Optimizing LSTM/RNN inference performance and resource usage on mobile/embedded platforms is not as easy as one might suppose. For those who wish to contribute, my suggestion is to first get the latest stable release (Tesseract v3.05) running with pure JNI/C++ code on Android. This project by @rmtheis and others already provides enough how-to information. They have no duty to answer all the questions since
I don't know when I'll have time to work on updating this project to use the Tesseract 4 beta. If anyone wants to take this task on, please have at it! One smaller (but still pretty big) task that would help toward that effort would be to make a pull request that gets Travis CI working on this project. What I have in mind is a Travis configuration that builds the project and then runs the instrumented tests on emulators for armv7, armv8, x86, and x86-64. |
I checked out the tesseract4 branch (from the tess-two repository) and successfully ran './gradlew assemble', and the tests passed with an accuracy of 89%. Can I use tess-two with Tesseract 4 support now?
@rmtheis, can you please answer my question?
Hi, I think we need a list of the remaining tasks to fully integrate Tesseract 4 into this library. @rmtheis, what can I do to contribute? Regards,
Currently the tesseract4 branch builds successfully with NDK r16b, and the legacy OEM mode
@rmtheis, what's your testing environment? If it's running on Android/ARM rather than an x86 emulator, I suspect there are some issues in the project's build settings: the stack trace shows it's running some x86 code.
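One quick way to confirm which ABI a native build really targets (for example, whether a test run is actually executing x86 code on an emulator) is to compile a tiny probe into the JNI library. The sketch below is an illustration, not part of tess-two: `TargetAbi` and `SimdFeatures` are hypothetical helpers, and they rely on the GCC/Clang predefined macros (`__aarch64__`, `__arm__`, `__x86_64__`, `__i386__`, `__ARM_NEON`, `__SSE4_1__`, `__AVX__`). They report what the compiler targeted, not what the CPU supports at runtime.

```cpp
#include <cassert>
#include <string>

// Hypothetical probe: reports the ABI this translation unit was compiled
// for, based on GCC/Clang predefined macros (the NDK toolchain is Clang).
std::string TargetAbi() {
#if defined(__aarch64__)
  return "arm64-v8a";
#elif defined(__arm__)
  return "armeabi-v7a";  // any 32-bit ARM target is reported this way here
#elif defined(__x86_64__)
  return "x86_64";
#elif defined(__i386__)
  return "x86";
#else
  return "unknown";
#endif
}

// SIMD extensions enabled at compile time (not a runtime CPU check).
std::string SimdFeatures() {
  std::string features;
#if defined(__ARM_NEON)
  features += "neon ";
#endif
#if defined(__SSE4_1__)
  features += "sse4.1 ";
#endif
#if defined(__AVX__)
  features += "avx ";
#endif
  return features.empty() ? "none" : features;
}
```

Logging these two strings from a JNI entry point (for instance with `__android_log_print` from `<android/log.h>`) on the device or emulator would show whether x86 SSE/AVX code could have been compiled into the library at all.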
Hi all, since many of us are interested in the 4.0 work, why not try to build and run it and report issues here? The steps might look like this:
If we can just get the LSTM engine running on an Android phone, even without any architecture-specific optimization (e.g. hand-written NEON code, the ARM counterpart of SSE/AVX on x86), it would be a great leap ahead. Comments?
The ANDROID_BUILD macro in tess-two/jni/com_googlecode_tesseract_android/src/ looks problematic. The current tess-two build (branch tesseract4.0) doesn't define this macro, so the LSTM code is enabled in the real Android build. Luckily, there is some defensive coding in the tesseract/arch sources that replaces the x86 SSE/AVX optimizations with abort() stubs at compile time:

    // from dotproductsse.cpp
    #if !defined(__SSE4_1__)
    #include "dotproductsse.h"
    namespace tesseract {
    #else  // !defined(__SSE4_1__)

I'm not sure whether this call to abort() is what people observed at runtime when trying to launch tess-two with the LSTM engine on Android.
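To make the guard pattern concrete, here is a minimal self-contained sketch of the same idea: an SSE4.1 routine compiled only when the compiler defines `__SSE4_1__`, an `abort()` stub otherwise, and a portable fallback. This is not Tesseract's actual code; `DotProductGeneric`, `DotProductSSE`, `DotProduct` and `kHasSSE41` are illustrative names for this example. On an ARM build `__SSE4_1__` is never defined, so any code path that still calls the SSE routine would reach the stub and abort at runtime instead of failing to compile.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#if defined(__SSE4_1__)
#include <smmintrin.h>  // SSE4.1 intrinsics (also provides the SSE2 ones)
#endif

// Portable fallback: a plain loop that compiles for any ABI (ARM or x86).
double DotProductGeneric(const double* u, const double* v, int n) {
  assert(n >= 0);
  double total = 0.0;
  for (int k = 0; k < n; ++k) total += u[k] * v[k];
  return total;
}

#if defined(__SSE4_1__)
// Only compiled when targeting SSE4.1 (x86/x86-64 with e.g. -msse4.1).
double DotProductSSE(const double* u, const double* v, int n) {
  __m128d sum = _mm_setzero_pd();
  int k = 0;
  for (; k + 2 <= n; k += 2) {
    __m128d a = _mm_loadu_pd(u + k);
    __m128d b = _mm_loadu_pd(v + k);
    // _mm_dp_pd with mask 0x31: multiply both lanes, write the sum to lane 0.
    sum = _mm_add_pd(sum, _mm_dp_pd(a, b, 0x31));
  }
  double total = _mm_cvtsd_f64(sum);
  for (; k < n; ++k) total += u[k] * v[k];  // odd trailing element
  return total;
}
constexpr bool kHasSSE41 = true;
#else
// Stub in the spirit of the guard above: fail loudly if ever called on a
// target without SSE4.1, such as an ARM Android build.
double DotProductSSE(const double*, const double*, int) {
  std::fprintf(stderr, "DotProductSSE called without SSE4.1 support\n");
  std::abort();
}
constexpr bool kHasSSE41 = false;
#endif

// Compile-time dispatch, so a non-SSE build never reaches the abort stub.
double DotProduct(const double* u, const double* v, int n) {
  return kHasSSE41 ? DotProductSSE(u, v, n) : DotProductGeneric(u, v, n);
}
```

For example, with `u = {1, 2, 3}` and `v = {4, 5, 6}`, `DotProduct(u, v, 3)` gives 32 on either path. The abort only happens when something calls the SSE entry point directly on a build that lacks SSE4.1, which is the situation the comment above suspects.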
@hejin, does this mean the LSTM feature was intended to be disabled on Android?
Yep, it looks like the Tesseract 4.0 authors didn't want to enable the LSTM feature on Android too early because of potential resource exhaustion, so they use the ANDROID_BUILD macro to disable it temporarily. However, the tess-two JNI build doesn't seem to follow this rule and define ANDROID_BUILD (please correct me if I'm wrong, @rmtheis), so the LSTM feature is enabled in the tess-two tesseract4.0 branch. As a defensive measure against wrongly executing x86 AVX/SSE instructions on ARM platforms, the developers of the LSTM operator optimizations replaced the optimized subroutines with a call to abort() for when this unexpected case does happen!
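The effect of a build-time switch like ANDROID_BUILD can be sketched as follows. This is a hypothetical illustration rather than Tesseract's source: only the macro name comes from the discussion, while the `EngineMode` enum and the `LstmCompiledIn`/`EffectiveEngineMode` helpers are invented for this example. In this sketch, defining the macro compiles the LSTM path out and any request for it falls back to the legacy engine; leaving it undefined, as the tess-two tesseract4.0 branch reportedly does, compiles LSTM in.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical engine modes, loosely modeled on legacy vs. LSTM OCR engines.
enum class EngineMode { kLegacyOnly, kLstmOnly, kLegacyAndLstm };

// True when the LSTM code path is compiled into this build.
constexpr bool LstmCompiledIn() {
#ifdef ANDROID_BUILD
  return false;  // macro defined: LSTM code is excluded from the build
#else
  return true;   // macro not defined: LSTM code is compiled in
#endif
}

// Maps the requested mode onto what this build can actually run.
EngineMode EffectiveEngineMode(EngineMode requested) {
  if (!LstmCompiledIn() && requested != EngineMode::kLegacyOnly) {
    std::fprintf(stderr, "LSTM not compiled in; using the legacy engine\n");
    return EngineMode::kLegacyOnly;
  }
  return requested;
}
```

The design point is that the decision is made by the build configuration, not at runtime, which is why a JNI build that never passes the macro silently enables the LSTM code.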
The final version of Tesseract 4.0 was released a few weeks ago. Is there any new progress, or an expected timeline for when it will be integrated into tess-two? EDIT: Someone said here that they were able to compile Tesseract for Android (without tess-two): https://groups.google.com/d/msg/tesseract-ocr/zuZYuz12oQc/VCavzreVCQAJ
@Robyer I won't have time to update tess-two for Tesseract 4.0 anytime soon. This project is in need of someone familiar with C++ to take this task on! I'm happy to review and test proposed changes. Please don't hesitate to contribute yourself if you're at all inclined to do so -- your past contributions have been hugely helpful. I'm not sure what to make of the linked comment about the cmake build. Please share your results if you end up looking into that approach.
@rmtheis, will you have time to help me understand the current build configuration that you use for native code? I tried to rework the native build to use the standard ndkBuild in Gradle (I wanted proper native code completion and debugging in Android Studio) by removing your custom tasks, specifying
but there were some errors with references to liblept. It seems both tess-two and eyes-two depend on Leptonica, but tess-two also depends on eyes-two. The problem is that eyes-two can't compile Leptonica itself; it expects the prebuilt Leptonica library that tess-two compiles. So it's a kind of circular reference that works only with your manual compilation. I think we should separate Leptonica into its own module and then make the tess-two and eyes-two modules depend directly on it. But I don't understand the Android.mk files and the sources well enough to do that easily. Perhaps you can help with that? So far I've prepared PR #256 to make the project work properly in the latest Android Studio. If you then look at Robyer@572c2f1, you will see the changes to use ndkBuild in Gradle, but the Android.mk/Application.mk files need to be modified to make it compile. It doesn't know how to compile
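The proposed module split can be reasoned about as a dependency graph: a build order exists only if the graph has no cycles. The sketch below checks this with Kahn's topological sort. The module names come from the thread, but the edges are a model of the comment above (in particular, treating eyes-two's need for a liblept prebuilt by tess-two as a dependency on tess-two, which @rmtheis disputes below), and `BuildOrder` is a hypothetical helper, not part of Gradle or the NDK tooling.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Maps each module to the modules it depends on (built before it).
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Returns a build order (dependencies first) using Kahn's algorithm,
// or an empty vector if the graph contains a cycle.
std::vector<std::string> BuildOrder(const DepGraph& deps) {
  std::map<std::string, int> unbuilt_deps;  // remaining deps per module
  std::map<std::string, std::vector<std::string>> dependents;  // reverse edges
  for (const auto& [module, module_deps] : deps) {
    unbuilt_deps[module];  // default-inserts 0 if not yet present
    for (const auto& dep : module_deps) {
      unbuilt_deps[dep];
      ++unbuilt_deps[module];
      dependents[dep].push_back(module);
    }
  }
  std::queue<std::string> ready;
  for (const auto& [module, count] : unbuilt_deps) {
    if (count == 0) ready.push(module);
  }
  std::vector<std::string> order;
  while (!ready.empty()) {
    const std::string module = ready.front();
    ready.pop();
    order.push_back(module);
    for (const auto& next : dependents[module]) {
      if (--unbuilt_deps[next] == 0) ready.push(next);
    }
  }
  // Any module never built is part of, or blocked by, a cycle.
  if (order.size() != unbuilt_deps.size()) return {};
  return order;
}
```

With Leptonica as its own module (`tess-two -> leptonica`, `eyes-two -> leptonica`), `BuildOrder` puts leptonica first; with `tess-two -> eyes-two` and `eyes-two -> tess-two`, it returns an empty order, which is the situation that only a manual build sequence can paper over.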
@Robyer Agreed that using I'm not aware of anywhere that the tess-two module depends on the eyes-two module, and the intent is to not have that type of circular dependency. I agree that it would be a better design to have Leptonica as a separate module, but the overall legacy project structure is so time-consuming for me to rearrange that I'd be reluctant to take that project on. Like you mention, it probably would require substantial changes to the Android.mk files and so on. When I try building your ndkBuildGradle branch, I see the issue you mentioned with libhydrogen and liblept. I'm not sure how to resolve that issue. When I remove the eyes-two module from the project and try again, it starts building but then fails with the mystery error
I've used tess-two in the past, but since going native and basically only needing the .so files, I've switched to a more direct way of building Tesseract, using just the SDK/NDK. I'm not sure if this information is directly transferable to the tess-two build issues, but just in case, I've got a working build chain for Tesseract 4.0.0 that might help as an example: https://github.com/rhardih/bad/blob/master/tesseract/tesseract-4.0.0.Dockerfile It obviously depends on Leptonica as well, which is also included: https://github.com/rhardih/bad/blob/master/leptonica/leptonica.Dockerfile If this is completely unhelpful, please disregard. :)
Hi @ALL,
@zsmartercn Hi, is it intentional that you squashed all your changes into a single initial commit? That makes it impossible to cherry-pick individual fixes or changes back into the tess-two repository. Perhaps you could make pull requests with the important changes that would benefit tess-two users?
Success! I created a new Android Studio project from scratch so I could use the default directory structure and configure CMake instead of ndkBuild, and after various changes I'm finally able to compile and use Tesseract 4.0, even with LSTM (it seems). Debugging, code completion and other features also work nicely in Android Studio 3.3. Because of the completely reworked project structure, I won't be able to provide a PR for tess-two, though. After I clean up my code and changes, I will publish it as a separate repository.
Excellent! Thanks, @zsmartercn and @Robyer, for your contributions to open source. I'm looking forward to trying out your projects, and I'll plan to merge your changes for Tesseract 4 support back into this project when I have some time.
@Robyer, when will you update the latest code with the CMake build? Please provide some details on how to convert the current @zsmartercn repo to a CMake-based build.
Here it is! https://github.com/adaptech-cz/Tesseract4Android 🎉 Note that eyes-two is not included yet. Monitor changes from tess-two are not implemented either - it should be reworked to use @rmtheis, why is there an "Add hack to handle log2" commit in your tesseract4 branch? What does it do?
@Robyer Thanks, man, it works great.
@Robyer
@rmtheis I see, that explains why I didn't experience the missing log2 problem myself. Thanks.
@zsmartercn @Robyer Thanks for the effort. It works!!
I fixed this issue. My Android.mk for Tesseract is in Android.zip. EXPLICIT_SRC_EXCLUDES should include fileio.cpp (used only for training) to remove the dependency on glob.c; alternatively, download a local copy of glob.c. When building on Windows, the maximum path length must be less than 251 characters. The APK runs correctly on phones from API 19 to API 23 (armeabi-v7a).
|
Hi, great work. A small suggestion: it would be nice to put a warning on the front page/README of this project to let people know that a different repo with Tesseract 4.1.0 is available. I wasted many hours today because I got different results from this project and from the command line, until I finally realized that the versions are different. Best regards
I'm wrapping up the maintenance on this repo and I don't plan on making updates in the future. Note that updates to support Tesseract 4.0 have been made on other forks of this repo such as https://github.com/alexcohn/tess-two/tree/4.1. Thanks everyone, for your interest and support!
First, I love tess-two...really :). I was just reading through the tesseract-ocr wiki (https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance) and noticed there are some major performance gains with 4.0. Is there anything I can do to help update tess-two to support 4.0 as well?
Thanks!