Execution worker.recognize repeatedly causes "Out of Memory" error in JSFiddle #920

Closed
horihiro opened this issue Apr 27, 2024 · 5 comments

Comments

@horihiro

horihiro commented Apr 27, 2024

If any part of this description is inappropriate, please point it out.

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
I believe the latest 5.x version is used, because Tesseract.js is loaded with the following script tag:

<script src='https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.min.js'></script>

Describe the bug
When worker.recognize is executed repeatedly, an "Out of Memory" error occurs.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'https://jsfiddle.net/pgr36sct/5/'
    On this page, worker.recognize is executed repeatedly using requestAnimationFrame, and each result is output to the console.
  2. Wait until the number of output lines in the console in the bottom window reaches 30.
    [image]
  3. In many cases, an Out of Memory error appears, as in the following screenshot, before the number reaches 30.
    [image]

Expected behavior
I expected that the error would not occur and that worker.recognize could keep running repeatedly.

Device Version:

  • OS + Version: Windows 11
  • Browser: 124.0.6367.92 (Official Build) (64-bit)


@Balearica
Member

Balearica commented Apr 27, 2024

Edit: This explanation is not the root cause for this user; however, it may be useful for other users experiencing an 'out of memory' error. See comments below.

Copying the code from the JSFiddle below for the benefit of other users, as opening it will indeed freeze/crash the page.

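let worker;
let i = 0;
const x = 50; // run OCR once every x animation frames
const image = document.querySelector('img');
async function OCRImageByTesseract() {
  i++;
  if (i % x == 0) {
    // Create the worker lazily on first use, then reuse it.
    worker = worker || await Tesseract.createWorker('eng');
    const result = await worker.recognize(image.src);
    console.log(i / x, result);
  }
  // The next frame is requested only after any awaited recognize call has resolved.
  requestAnimationFrame(OCRImageByTesseract);
}
// loop start;
requestAnimationFrame(OCRImageByTesseract);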

Short answer: I believe this would be resolved by switching to using a scheduler rather than using worker.recognize. The basic syntax for schedulers is explained here, and there is a scheduler example in the examples directory.

Longer answer: I believe this issue occurs because this code sends new jobs to the worker before the previous job has completed. Workers have no mechanism for queuing jobs; they were written with the assumption that a new worker.recognize call would not be made until the previous call to worker.recognize had completed. As a result, Tesseract.js behaves in unexpected and undesirable ways when that assumption does not hold. Support for running jobs asynchronously and/or in parallel was added later, with the addition of schedulers. This was recently discussed in #875.
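
For readers who do need overlapping jobs, the scheduler approach mentioned above might look roughly like the sketch below. It is a minimal illustration, not code from this thread: `recognizeAll` is a hypothetical helper name, the `Tesseract` object is passed in as a parameter (in the browser it would be the global provided by the script tag), and the worker count of 2 is an arbitrary assumption.

```javascript
// Sketch only: `recognizeAll` is a hypothetical helper, not part of Tesseract.js.
// `Tesseract` is injected so the global from the CDN script tag can be passed in.
async function recognizeAll(Tesseract, images, workerCount = 2) {
  const scheduler = Tesseract.createScheduler();
  try {
    for (let n = 0; n < workerCount; n++) {
      scheduler.addWorker(await Tesseract.createWorker('eng'));
    }
    // The scheduler queues jobs and hands each one to a worker once it is idle,
    // so no single worker receives a new job before its previous one finishes.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}
```

In a page that loads Tesseract.js via the script tag, this could be called as `recognizeAll(Tesseract, [image.src])`. Creating and terminating workers on every call is wasteful for a long-running loop; a real page would more likely keep one scheduler alive and reuse it.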

@horihiro
Author

Thank you @Balearica
Let me confirm one thing.

Doesn't the code below, which uses await, wait until worker.recognize finishes before the return value is assigned to result?

const result = await worker.recognize(image.src);

All I want to do is execute worker.recognize repeatedly, not in parallel.

@Balearica
Member

You're right, my original explanation was incorrect. I was unfamiliar with the requestAnimationFrame function; however, it looks like calling that function is the equivalent of just calling OCRImageByTesseract once. Therefore, this snippet is waiting for worker.recognize to finish before running it again.

I do not know why this code is causing the page to crash in JSFiddle; however, I now suspect the issue is with JSFiddle rather than Tesseract.js. I was unable to replicate this issue outside of JSFiddle, even when copy/pasting the exact code from the JSFiddle that crashes.
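
The point that the loop never has more than one recognize call in flight can be checked without Tesseract.js at all. The sketch below is an illustration under assumptions: `createFakeRecognizer` and `runLoop` are hypothetical names, the fake recognizer simply waits on a timer, and `setTimeout` stands in for requestAnimationFrame so the code also runs outside a browser.

```javascript
// Hypothetical stand-in for worker.recognize: resolves after `ms` milliseconds
// and records how many calls were running at the same time.
function createFakeRecognizer(ms) {
  const stats = { calls: 0, inFlight: 0, maxInFlight: 0 };
  async function recognize() {
    stats.calls++;
    stats.inFlight++;
    stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
    await new Promise((resolve) => setTimeout(resolve, ms));
    stats.inFlight--;
  }
  return { recognize, stats };
}

// Same shape as the loop in the issue: the next tick is scheduled only after
// the awaited call has finished. `schedule` stands in for requestAnimationFrame.
function runLoop(recognize, ticks, schedule = (fn) => setTimeout(fn, 0)) {
  return new Promise((resolve) => {
    let i = 0;
    async function tick() {
      i++;
      await recognize();
      if (i < ticks) schedule(tick);
      else resolve();
    }
    schedule(tick);
  });
}
```

After `await runLoop(recognize, 5)`, `stats.maxInFlight` stays at 1, whereas starting several calls without awaiting each one (for example from setInterval) would let it grow. That overlapping case is the one the scheduler is meant for.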

If you are able to replicate this problem using a standard web server, please create a repo with a reproducible example, or alternatively paste an HTML snippet that can be run as a single-file site, and I can look into it further. If the issue cannot be replicated anywhere outside of JSFiddle, then the issue should be raised with that project.

@horihiro
Author

Thank you @Balearica!
I will check whether the issue can be reproduced outside of JSFiddle.

@horihiro
Author

I checked the same code on CodePen, but the issue could not be reproduced there.
So this might depend on JSFiddle, as you suspected.

@Balearica changed the title from 'Execution worker.recognize repeatedly causes "Out of Memory" error' to 'Execution worker.recognize repeatedly causes "Out of Memory" error in JSFiddle' on Apr 27, 2024