Tesseract issue (again) #495
Comments
In the logs of the most recent travis build of Also, in the same logs, for fscrawler 2.5-SNAPSHOT, here are the related tika dependencies
Btw, maybe updating from 2.3 to 2.5 will fix your problem?
FSCrawler 2.5 should work with Elasticsearch 5.x. Only the tests are not working well, AFAIK.
This is strange. I think I should add an option to set the exact path to Tesseract instead of relying only on PATH.
Thanks a lot for the responses. Setting the path explicitly should help. I seem to remember reading somewhere about changes to PATH behaviour in recent Tesseract releases.
## OCR Path

If your Tesseract application is not available in default system PATH, you can define the path to use by setting `fs.ocr.path` property in your `~/.fscrawler/test/_settings.json` file:

```json
{
  "name" : "test",
  "fs" : {
    "url" : "/path/to/data/dir",
    "ocr" : {
      "path": "/path/to/tesseract/executable"
    }
  }
}
```

When you set it, it's highly recommended to [set the data path for Tesseract](#ocr-data-path).

## OCR Data Path

Set the path to the 'tessdata' folder, which contains language files and config files if Tesseract can not be automatically detected. You can define the path to use by setting `fs.ocr.data_path` property in your `~/.fscrawler/test/_settings.json` file:

```json
{
  "name" : "test",
  "fs" : {
    "url" : "/path/to/data/dir",
    "ocr" : {
      "path": "/path/to/tesseract/executable",
      "data_path": "/path/to/tesseract/tessdata"
    }
  }
}
```

Closes #495.
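Before starting FSCrawler, it may help to sanity-check the two OCR settings described above. The following is a minimal sketch in Python, not part of FSCrawler: the helper names `check_ocr_settings` and `load_settings` are hypothetical, and the checks only confirm that the configured paths exist on disk, not that FSCrawler or Tika will accept them.

```python
import json
import os


def load_settings(settings_file):
    """Parse a job settings file such as ~/.fscrawler/test/_settings.json."""
    with open(os.path.expanduser(settings_file), encoding="utf-8") as handle:
        return json.load(handle)


def check_ocr_settings(settings):
    """Return a list of problems found in the fs.ocr block (hypothetical helper).

    An empty list means the configured paths exist; it does not prove
    that OCR will actually run.
    """
    problems = []
    ocr = settings.get("fs", {}).get("ocr", {})
    path = ocr.get("path")
    data_path = ocr.get("data_path")

    if path is None:
        # Without fs.ocr.path, Tesseract has to be discoverable on PATH.
        problems.append("fs.ocr.path is not set; Tesseract must be found on PATH")
    elif not (os.path.isfile(path) and os.access(path, os.X_OK)):
        problems.append("fs.ocr.path is not an executable file: " + path)

    if data_path is not None and not os.path.isdir(data_path):
        problems.append("fs.ocr.data_path is not a directory: " + data_path)

    return problems
```

A possible use is `check_ocr_settings(load_settings("~/.fscrawler/test/_settings.json"))`, printing each returned message. On Windows, `os.access(path, os.X_OK)` is a weak check, so a clean result there mainly shows that the file exists.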
Just wanted to report that I have continued to face the same issue even with the latest snapshot of FSCrawler 2.5 on a Windows machine. I have set the path to the Tesseract executable in the FSCrawler settings, but FSCrawler still gives the message "But Tesseract is not installed so we won't run OCR". Ref. @shadiakiki1986 @dadoonet
@Ramon-zaro Could you open a new issue and include your exact configuration file in it?
Did Tesseract work on Windows with FSCrawler? I tried FSCrawler 2.5 and 2.6 with ES 6.5, but I get the same issue. Any ideas?
Hi David,
As you might remember, I am running FSCrawler 2.3 on a Windows machine to index a large number of document files. It is working like a dream. Can't thank you enough for this tool!
However, I could not get FSCrawler to recognize Tesseract as the release notes say it should. Tesseract runs well enough on its own from the command line.
Now, I know you have mentioned earlier that you are not familiar with how Tika works under the hood and that you can't help on this issue. Is that still the case?
If so, could you just tell me which version of Tika you are using and how, so that I can ask the right questions to the Tika people?