This repository was archived by the owner on Jan 13, 2023. It is now read-only.

Feat tesseract 4 #65

Merged
merged 7 commits into from
Nov 23, 2018

@lrog lrog commented Nov 22, 2018

This PR moves CogStack Pipeline to the new version of Tesseract (4.x). The changes include:

  • CogStack Pipeline Dockerfile -- the image is based on OpenJDK 11, which is in turn based on the newest Debian, so the new version of Tesseract is installed automatically.
  • TravisCI build configuration -- Travis builds can only run on Ubuntu Trusty or Xenial, which provide Tesseract only in 3.x versions, so a custom PPA repository is added to install the newest version.

CogStack Pipeline can already use the new version of Tesseract, since the Tika dependencies have been updated to the newest version.
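
Since the Docker image and the Travis build obtain Tesseract from different sources, a build could check which major version it actually received. The following is a minimal shell sketch and not part of this PR: the helper name `tesseract_major` is hypothetical, and it assumes the first line of `tesseract --version` has the form `tesseract <major>.<minor>...`.

```shell
#!/bin/sh
# Hypothetical helper (not from this PR): extract the major version number
# from the first line of `tesseract --version` output.
# Assumption: that line looks like "tesseract 4.0.0" or "tesseract 3.04.01".
tesseract_major() {
    # $1: the version line to parse
    printf '%s\n' "$1" | sed -n 's/^tesseract[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Possible use in a CI step (commented out so the helper can be sourced alone):
#   version_line=$(tesseract --version 2>&1 | head -n 1)
#   [ "$(tesseract_major "$version_line")" -ge 4 ] || exit 1
```

The `2>&1` in the commented usage is there in case a given build prints its version banner to stderr rather than stdout.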

@lrog lrog requested a review from afolarin November 22, 2018 16:27

@afolarin afolarin left a comment

LGTM

Dockerfile Outdated

RUN apt-get update && \
# apt-get dist-upgrade -y && \
# apt-get install -y tesseract-ocr && \
apt-get install -y tesseract-ocr-osd=3.04.00-1 tesseract-ocr-eng=3.04.00-1 tesseract-ocr=3.04.01-5 && \
apt-get install -y tesseract-ocr && \

It may be better practice to always pin the versions of installed packages, so that the container builds consistently across different `docker build` runs.

Good idea, thx. I've just added the versions to tesseract and imagemagick.
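
The pinning discussed in this review could be enforced with a small guard that refuses any package spec lacking an explicit version. This is a rough shell sketch under stated assumptions, not the code that was merged: the helper name `pinned_install_cmd` is hypothetical, and the version in the usage line is taken from the outdated Dockerfile line above purely for illustration (the versions actually pinned after this review are not shown in the thread).

```shell
#!/bin/sh
# Hypothetical helper (not from this PR): build an `apt-get install` command
# from "name=version" specs, failing if any package is left unpinned.
pinned_install_cmd() {
    cmd="apt-get install -y"
    for spec in "$@"; do
        case "$spec" in
            # Accept only specs with a non-empty version after "="
            *=?*) cmd="$cmd $spec" ;;
            *)    echo "unpinned package: $spec" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$cmd"
}

# Illustrative use (version string copied from the outdated line, not the merged one):
#   pinned_install_cmd tesseract-ocr=3.04.01-5
```

The helper only prints the command rather than running it, so the check can run anywhere, including on machines without apt.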

@lrog lrog merged commit 8bcd883 into dev Nov 23, 2018
@lrog lrog deleted the feat-tesseract-4 branch November 23, 2018 17:33
This was referenced Nov 23, 2018
vladd-bit pushed a commit that referenced this pull request Nov 10, 2021