AnalyseLayout() for tesseract.js #656
Comments
I have no opposition to adding this, although I probably won't have time personally in the near future. It will likely require an interested user to develop an interface. As you note, the necessary API functions do appear to be exposed already (in the glue file), so it would presumably just require building a JavaScript interface around them.
@Balearica Thank you for the reply. Could you briefly describe what should be modified or added, and where? I will see if I can do this, but I would need some direction.
@mattiaCanevascini I have never used this particular function; however, I can speak to development more broadly. The first step in exposing a new feature is cloning Tesseract.js-core and familiarizing yourself with the examples. For example, this is a basic recognition example in Tesseract.js-core. In contrast to the recognition example in Tesseract.js, you'll note that it uses lower-level functions (calls to methods of Virtually every Tesseract API function is already included in Tesseract.js-core (including, as you note,
@Balearica Thank you for the description. It was what I was looking for. I will try :)
I added the ability to run layout analysis but not recognition to the master branch. It is included in releases starting at Running only layout analysis requires setting the
The
Is your feature request related to a problem? Please describe.
Currently it is not possible to perform a fast document layout analysis.
Describe the solution you'd like
The function AnalyseLayout() is present in the Tesseract C++ API, and it appears to be exposed already in tesseract.js-core, inside the glue.js file:
https://github.com/naptha/tesseract.js-core/blob/82c349860e5d0cd81449761077d0d113fdf04c1b/javascript/glue.js#L1481
The AnalyseLayout() function performs a very fast analysis of the document and returns the document segmented into boxes.
Describe alternatives you've considered
It is possible to perform layout analysis with the regular worker.recognize() function by working with the TSV output, but this requires a full recognition pass, whereas AnalyseLayout() uses a different, much faster method and can define the zones on which to later perform a worker.recognize().
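To illustrate the TSV workaround described above, here is a minimal sketch of pulling block-level boxes out of a TSV string. It assumes the standard Tesseract TSV column order (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), where rows with level 2 describe blocks. The function name `parseTsvBlocks` and the sample data are hypothetical, made up for illustration, and are not part of Tesseract.js.

```javascript
// Sketch only: extract block-level bounding boxes from Tesseract TSV text.
// Assumes the standard Tesseract TSV column order; level 2 rows are blocks.
function parseTsvBlocks(tsv) {
  const boxes = [];
  for (const line of tsv.split(/\r?\n/)) {
    if (!line.trim()) continue;            // skip empty lines
    const cols = line.split("\t");
    if (cols[0] === "level") continue;     // skip a header row, if one is present
    if (cols.length < 11) continue;        // skip malformed rows
    if (Number(cols[0]) !== 2) continue;   // keep block-level rows only
    boxes.push({
      block: Number(cols[2]),
      left: Number(cols[6]),
      top: Number(cols[7]),
      width: Number(cols[8]),
      height: Number(cols[9]),
    });
  }
  return boxes;
}

// Hypothetical sample data (not real OCR output): one page, two blocks, one word.
const sampleTsv = [
  "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
  "1\t1\t0\t0\t0\t0\t0\t0\t800\t600\t-1\t",
  "2\t1\t1\t0\t0\t0\t40\t30\t700\t120\t-1\t",
  "5\t1\t1\t1\t1\t1\t40\t30\t90\t25\t96.1\tHello",
  "2\t1\t2\t0\t0\t0\t40\t200\t700\t300\t-1\t",
].join("\n");

console.log(parseTsvBlocks(sampleTsv));
```

Each resulting box could then be used as a zone for a later worker.recognize() call, for example through a region option such as `rectangle`, if the installed Tesseract.js version supports it. The drawback remains the one noted above: the TSV itself only exists after a full recognition pass.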
Additional context
Using gImageReader with tesseract