Worker stuck on "loading language traineddata" #901

Closed
laurent22 opened this issue Mar 12, 2024 · 4 comments


@laurent22

laurent22 commented Mar 12, 2024

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)

5.0.4

Describe the bug

This is the same issue as #414, which should have been addressed by the errorHandler property, but apparently not in all cases. I'm using Tesseract.js with Electron and it gets stuck at the message { workerId: "Worker-0-ac418", status: "loading language traineddata", progress: 0 }

I set the errorHandler property but it's never triggered.

Using the "lazy fox" default image.

The fix mentioned in the other issue, setting cacheMethod: 'none', works, but I'd rather keep the cache enabled, since downloading 10 MB every time wouldn't make sense.
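A minimal sketch of the two configurations being compared here, for anyone trying to reproduce this. errorHandler and cacheMethod are the options named in this issue; the logger option is assumed to be what emits the status messages, and buildWorkerOptions is a hypothetical helper, not part of Tesseract.js. The createWorker call is left commented out because it needs tesseract.js installed.

```javascript
// Hypothetical helper: builds the options object passed to createWorker.
// errorHandler and cacheMethod are the options discussed in this issue;
// logger is assumed to be the option that receives the status messages.
function buildWorkerOptions({ disableCache = false } = {}) {
  const options = {
    logger: (m) => console.log(m),
    errorHandler: (err) => console.error('worker error:', err),
  };
  if (disableCache) {
    // Workaround from #414: skip the cache and fetch traineddata every time.
    options.cacheMethod = 'none';
  }
  return options;
}

// Usage (requires tesseract.js, so left commented out):
// const { createWorker } = require('tesseract.js');
// const worker = await createWorker('eng', 1, buildWorkerOptions({ disableCache: true }));
```

With disableCache left at false, cacheMethod is not set at all, so the library's default caching behavior applies.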

Edit:

I've just discovered that Tesseract.js has a second way to log, Tesseract.setLogging, so I set that to true, but it didn't help. It just prints [Worker-0-e9fc5]: Start Job-1-4ae93, action=loadLanguage followed by the dreaded loading language traineddata message.
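Since neither errorHandler nor the logging output reports anything when the worker hangs, one way to at least surface the hang is a timeout around the call. This is a generic sketch, not something provided by Tesseract.js: withTimeout is a hypothetical helper and the 60-second limit in the usage comment is an arbitrary assumption.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
// Hypothetical helper; it does not stop the underlying work, it only
// turns a silent hang into a visible error.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (requires tesseract.js, so left commented out):
// const { createWorker } = require('tesseract.js');
// const worker = await withTimeout(createWorker('eng'), 60000, 'createWorker');
```

Note that this only reports the problem; the stuck worker itself would still need to be cleaned up separately.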

Device Version:

  • macOS 12.5
  • Electron 26.5
@Balearica
Member

Was this a one-time thing that was resolved once you deleted/refreshed the cache data, or can it be replicated? If it can be replicated, please provide a reproducible example.

@laurent22
Author

I couldn't find where the cache is stored, and setting langPath didn't seem to have any effect. Where can I find the cache data? For now I have disabled the cache, but if I enable it again I think the problem will happen again, and then I can share the cached files so that the bug can be replicated.

@Balearica
Member

Files are cached at ${cachePath}/${lang}.traineddata, where cachePath is set by the cachePath argument (. by default). In the browser version of Tesseract.js the file is cached in IndexedDB, and in the Node.js version it is cached on the local file system.
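For the Node.js case, a minimal sketch of checking for and removing the cached file, assuming the ${cachePath}/${lang}.traineddata layout described above. inspectCachedLanguage and clearCachedLanguage are hypothetical helper names, not part of Tesseract.js. In an Electron app, whether the file system or the browser storage applies presumably depends on which process Tesseract.js runs in.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the path and size of a cached traineddata file, or null if absent.
// Assumes the Node.js layout described above: <cachePath>/<lang>.traineddata.
function inspectCachedLanguage(lang, cachePath = '.') {
  const file = path.join(cachePath, `${lang}.traineddata`);
  if (!fs.existsSync(file)) return null;
  return { file, bytes: fs.statSync(file).size };
}

// Deletes the cached file (if any) so the next run downloads it again.
function clearCachedLanguage(lang, cachePath = '.') {
  fs.rmSync(path.join(cachePath, `${lang}.traineddata`), { force: true });
}

// Usage: console.log(inspectCachedLanguage('eng'));
```

An unexpectedly small or zero-byte file might point to an incomplete download, though that is only a guess at what could go wrong here.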

For example, the following snippet reads eng.traineddata from IndexedDB in the browser and saves it as a file download. It must be run from the devtools console on a website that has previously saved eng.traineddata to the cache.

(async () => {
    // Open a connection to the database
    const openRequest = indexedDB.open('keyval-store');

    const db = await new Promise((resolve, reject) => {
        openRequest.onerror = () => reject(openRequest.error);
        openRequest.onsuccess = () => resolve(openRequest.result);
    });

    // Start a read-only transaction and get the object store
    const transaction = db.transaction(['keyval'], 'readonly');
    const store = transaction.objectStore('keyval');

    // Look up the cached file by its key (${cachePath}/${lang}.traineddata)
    const getRequest = store.get('./eng.traineddata');

    const data = await new Promise((resolve, reject) => {
        getRequest.onerror = () => reject(getRequest.error);
        getRequest.onsuccess = () => resolve(getRequest.result);
    });
    db.close();

    // get() resolves with undefined when the key is not in the store
    if (data === undefined) {
        throw new Error('./eng.traineddata was not found in the cache');
    }

    // Wrap the cached bytes in a Blob
    const blob = new Blob([data], {type: 'application/octet-stream'});

    // Create a URL for the blob
    const url = URL.createObjectURL(blob);

    // Create a temporary anchor element to trigger download
    const a = document.createElement('a');
    a.href = url;
    a.download = 'eng.traineddata';
    document.body.appendChild(a);
    a.click();
    document.body.removeChild(a);

    // Revoke the blob URL after download
    URL.revokeObjectURL(url);
})();

@Balearica
Member

@laurent22 To follow up, were you ever able to replicate this issue in a reproducible way and/or figure out what you think the root cause is?
