Docker build #7

Open
avpicov opened this issue Apr 19, 2019 · 1 comment

avpicov commented Apr 19, 2019

First, thank you for making this code available. I am having problems with the Docker build; I suspect it has not been used in a while. My first change was updating the base image from ubuntu:wily to ubuntu:xenial:

    #FROM ubuntu:wily
    FROM ubuntu:xenial

I then made the following change:

    #RUN locale-gen en_US.UTF-8
    RUN apt-get clean && apt-get update && apt-get install -y locales && locale-gen en_US.UTF-8

The latest issue I have run into is the following:

`Step 22/28 : RUN wget -O tesseract-3.04.00/training/tesstrain_utils.sh 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh'
---> Running in 6368cfd8f443
--2019-04-19 19:57:08-- https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-04-19 19:57:08 ERROR 404: Not Found.

The command '/bin/sh -c wget -O tesseract-3.04.00/training/tesstrain_utils.sh 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh'' returned a non-zero code: 8
`
I'm not sure how active the project is at this point, but I wanted to reach out to see if you might know what the issue is here.

Thanks
-Will

ryanfb self-assigned this Apr 20, 2019

ryanfb (Owner) commented Jul 19, 2019

You're correct that this has been dormant for a while. Part of this is due to the work I put into getting a version of the Latin-specific OCR training into Tesseract core:

But since that uses the one-size-fits-all Tesseract training process, some things were lost from this Latin-specific process. So I'll try to take a look at fixing this build.
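
In the meantime, one possible workaround for the 404 in Step 22, offered only as a sketch: the `master` branch no longer seems to serve `training/tesstrain_utils.sh`, so the download could probe a few candidate locations and use the first one that responds. The `3.04.00` tag URL and the `src/training/` path below are assumptions and have not been checked against this Dockerfile. The helper names `candidate_urls` and `first_reachable` and the `URL_CHECK` variable are hypothetical. A script taken from current `master` may also not be compatible with Tesseract 3.04.00.

```shell
#!/bin/sh
# Sketch: find a reachable location for tesstrain_utils.sh.
# Only the first URL comes from the Dockerfile; the other two are assumptions.

BASE='https://raw.githubusercontent.com/tesseract-ocr/tesseract'

# Command used to probe a URL without downloading it.
# Override URL_CHECK to exercise the logic offline.
: "${URL_CHECK:=wget -q --spider}"

candidate_urls() {
    # 1. Path currently used by the Dockerfile (returns 404 in the log above).
    echo "$BASE/master/training/tesstrain_utils.sh"
    # 2. Assumed: the same path pinned to the 3.04.00 release tag.
    echo "$BASE/3.04.00/training/tesstrain_utils.sh"
    # 3. Assumed: the relocated path on master.
    echo "$BASE/master/src/training/tesstrain_utils.sh"
}

first_reachable() {
    # Print the first candidate URL for which the probe succeeds, if any.
    candidate_urls | while read -r url; do
        if $URL_CHECK "$url"; then
            echo "$url"
            break
        fi
    done
}
```

If one of the candidates resolves, the failing step could become something like `wget -O tesseract-3.04.00/training/tesstrain_utils.sh "$(first_reachable)"`. Pinning to a tag rather than `master` would also keep the build from breaking the next time upstream moves files.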
