Tesseract.js can't load language files on deployed server #618

Closed
danisss9 opened this issue Jun 24, 2022 · 5 comments

@danisss9

Describe the bug
Tesseract.js throws an error when the app runs on a deployed VM (e.g. "https://vmdev-01:1000/DocScanner"), but works when running on localhost (e.g. "https://localhost/DocScanner").

The code and deployment are the same. Both servers run Windows 10, IIS, and ASP.NET Core 3.1, and the frontend uses Angular 12.

To Reproduce
Steps to reproduce the behavior:

  1. Set up Tesseract.js to run in the browser (see code below)
  2. Deploy the web app on a VPS or in Docker
  3. An error is shown when loading languages

Expected behavior
It should work the same as on localhost.

Screenshots
Error Message:
image

"assets/scripts/tesseract" folder:
image

"assets/scripts/tesseract/lang-data" folder:
image

Package.json tesseract versions:
image

Desktop:

  • OS: Windows 10
  • Browser: Chrome
  • Version: 102.0.5005.115 (Official Build) (64-bit)

Additional context

Tesseract.js setup code:

import { createWorker } from 'tesseract.js';

private async OcrImage(img: string) {
  // All three paths point at self-hosted copies of the Tesseract.js assets.
  const worker = createWorker({
    corePath:
      (this.baseUrl ?? '/') +
      'assets/scripts/tesseract/tesseract-core.wasm.js',
    workerPath:
      (this.baseUrl ?? '/') + 'assets/scripts/tesseract/worker.min.js',
    // Unlike corePath and workerPath, langPath is prefixed with the page origin.
    langPath:
      location.origin +
      (this.baseUrl ?? '/') +
      'assets/scripts/tesseract/lang-data',
    // Do not cache the downloaded language data.
    cacheMethod: 'none'
  });

  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  // Restrict recognition to digits.
  await worker.setParameters({
    tessedit_char_whitelist: '0123456789',
  });

  const data = await worker.recognize(img);
  await worker.terminate();
  return data;
}
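
In the setup above, langPath is built as an absolute URL while corePath and workerPath are not. One way to rule out path-resolution differences between localhost and the deployed VM is to build all three the same way. The following is only a sketch: `resolveAssetUrl` is a hypothetical helper, not part of Tesseract.js, and it assumes `baseUrl` is the app's base path (for example "/DocScanner/").

```typescript
// Hypothetical helper (not part of Tesseract.js): join the page origin,
// an optional app base path, and an asset path into one absolute URL.
function resolveAssetUrl(
  origin: string,
  baseUrl: string | undefined,
  assetPath: string
): string {
  // Strip slashes at both ends of the base, then re-add exactly one on each side.
  const trimmedBase = (baseUrl ?? '').replace(/^\/+|\/+$/g, '');
  const base = trimmedBase === '' ? '/' : '/' + trimmedBase + '/';
  // Avoid a double slash between origin, base, and asset path.
  return origin.replace(/\/+$/, '') + base + assetPath.replace(/^\/+/, '');
}

// Example with the URLs from this report; in the browser, `origin`
// would be location.origin and `baseUrl` would be this.baseUrl.
const tesseractDir = 'assets/scripts/tesseract/';
const workerOptions = {
  corePath: resolveAssetUrl('https://vmdev-01:1000', '/DocScanner/', tesseractDir + 'tesseract-core.wasm.js'),
  workerPath: resolveAssetUrl('https://vmdev-01:1000', '/DocScanner/', tesseractDir + 'worker.min.js'),
  langPath: resolveAssetUrl('https://vmdev-01:1000', '/DocScanner/', tesseractDir + 'lang-data'),
};
```

Logging `workerOptions` on both machines and opening each URL directly in the browser shows quickly whether the deployed server actually serves the files.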
@zerosdev

zerosdev commented Nov 2, 2022

I have this problem too. I have two VPSes running Ubuntu 20.04: one works fine, and the other sometimes gets this error even though eng.traineddata already exists. I don't know what happened. Both servers have the same code.

@Balearica
Member

@zerosdev As your issue only occurs "sometimes", it is probably related to the caching error discussed in #666, which should be fixed or significantly mitigated in version 4. In short, it looks like eng.traineddata can become corrupted. When this happens, either delete the file or disable caching entirely (by setting cacheMethod to "none"), although disabling caching may increase network use.
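
To tell a missing file, a corrupted file, and a server error page apart, it can help to look at the first bytes of what was actually downloaded. This is a rough sketch, not part of the Tesseract.js API: the helper names are hypothetical, and it assumes the worker requests `<langPath>/<lang>.traineddata.gz` (the gzip naming is an assumption about the default configuration).

```typescript
// Hypothetical diagnostic helpers; none of these are Tesseract.js functions.

// Build the URL that the worker is assumed to request for a language.
function trainedDataUrl(langPath: string, lang: string, gzip = true): string {
  return langPath.replace(/\/+$/, '') + '/' + lang + '.traineddata' + (gzip ? '.gz' : '');
}

// Every gzip stream starts with the two magic bytes 0x1f 0x8b.
// Caveat: if the server sends "Content-Encoding: gzip", the browser
// decompresses transparently and these bytes will not be visible.
function looksLikeGzip(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// A server error page (for example a 404 body) usually starts with HTML markup.
function looksLikeHtml(bytes: Uint8Array): boolean {
  let head = '';
  for (let i = 0; i < Math.min(bytes.length, 64); i++) {
    head += String.fromCharCode(bytes[i]);
  }
  head = head.replace(/^\s+/, '').toLowerCase();
  return head.startsWith('<!doctype') || head.startsWith('<html');
}

// In the browser, the bytes could be obtained with something like:
//   const res = await fetch(trainedDataUrl(langPath, 'eng'));
//   const bytes = new Uint8Array(await res.arrayBuffer());
// and then checked with res.ok, looksLikeHtml(bytes) and looksLikeGzip(bytes).
```

An HTML body or a non-OK status points at the server or the path, while a body that is neither HTML nor gzip suggests a damaged file.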

@314513535

I have the same issue. I guess this component only works when running on Node.js, not in plain browser JavaScript.

@Balearica
Member

@314513535 It is entirely possible to load custom traineddata files using JavaScript in the browser. If you are having trouble doing so and want support, please provide a reproducible example of your issue in the form of a code snippet (one that can be run without additional dependencies) or a repository.

Here is an example of a program using custom training data in browser: https://github.com/scribeocr/scribeocr/blob/master/main.js#L758

@danisss9
Author

This seems to be fixed in version 4.
