Import pdfjs-dist not working correctly #58313

Closed
1 task done
Luluno01 opened this issue Nov 10, 2023 · 54 comments
Labels: bug (Issue was opened via the bug report template.), locked, Module Resolution (CJS / ESM, module resolving)

Comments

@Luluno01
Contributor

Link to the code that reproduces this issue

https://github.com/Luluno01/pdfjs-dist-import-reproducer

To Reproduce

  1. Start the application in development mode (next dev)
  2. Open home page (/)
  3. The dev server console shows the error "Attempted import error: 'getDocument' is not exported from 'pdfjs-dist' (imported as 'pdfjs').", and getDocument is undefined.

Current vs. Expected behavior

The ESM package pdfjs-dist should be imported correctly. Instead, nothing is imported: every exported object, including the default export, is undefined.

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
  Platform: win32
  Arch: x64
  Version: Windows 11 Pro
Binaries:
  Node: 21.1.0
  npm: N/A
  Yarn: N/A
  pnpm: N/A
Relevant Packages:
  next: 14.0.3-canary.1
  eslint-config-next: N/A
  react: 18.2.0
  react-dom: 18.2.0
  typescript: 5.1.3
Next.js Config:
  output: N/A

Which area(s) are affected? (Select all that apply)

App Router, TypeScript (plugin, built-in types)

Additional context

Same problem with version "13.5.4" and version "13.0.0".

@Luluno01 Luluno01 added the bug Issue was opened via the bug report template. label Nov 10, 2023
@github-actions github-actions bot added the TypeScript Related to types with Next.js. label Nov 10, 2023
@Luluno01
Contributor Author

Note that importing pdfjs-dist directly in a plain JS file without any bundler works without a problem.

https://github.com/Luluno01/pdfjs-dist-import-reproducer/blob/main/expected.js

@Luluno01
Contributor Author

Looks like it has something to do with webpack. I found a temporary workaround: a dynamic import() with the magic comment /*webpackIgnore: true*/. Not sure whether this is good practice, but at least it works locally.
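A minimal sketch of that workaround, assuming the code only ever runs on the server in Node.js. The magic comment asks webpack to leave the import() untouched, so Node.js resolves the specifier from node_modules at runtime. `importUnbundled` is a hypothetical helper name, not part of pdfjs-dist or Next.js.

```typescript
// Hypothetical helper: load a package at runtime instead of letting webpack bundle it.
// With the webpackIgnore magic comment, webpack leaves this import() as-is, so the
// package must be present in node_modules wherever the code actually runs.
export async function importUnbundled<T = Record<string, unknown>>(
  specifier: string,
): Promise<T> {
  const mod: unknown = await import(/* webpackIgnore: true */ specifier);
  return mod as T;
}

// Usage inside a server-side route handler (assumes pdfjs-dist is installed):
// const pdfjs = await importUnbundled<typeof import("pdfjs-dist")>("pdfjs-dist");
// const doc = await pdfjs.getDocument(url).promise;
```

Because resolution is deferred to runtime, this only works where node_modules is actually present after deployment.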

@AChangXD

AChangXD commented Nov 12, 2023

@Luluno01 I ran into this issue just now, and yeah, the imports show up as undefined for me as well (I'm on version 14)

@AChangXD

AChangXD commented Nov 12, 2023

@Luluno01 I tried using import() and am running into the same error. Can you post a snippet showing how you are importing/calling the function? (Note: I'm using it in an API route.)

My code:

const pdfJs = await import('pdfjs-dist');

export async function POST(req: Request) {
  console.log(typeof pdfJs);             // "object"
  console.log(typeof pdfJs.getDocument); // "undefined"
}

@AChangXD

@Luluno01 Even though typeof reports that it's an object, printing Object.keys(pdfJs) shows an empty array.
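A small sketch for catching this symptom early, assuming you would rather get a clear error than undefined functions further down. `assertExports` is a hypothetical helper; it only inspects the namespace object it is given.

```typescript
// Hypothetical diagnostic: fail fast when a bundler hands back an empty or incomplete
// module namespace (typeof says "object", yet Object.keys() returns []).
export function assertExports(
  ns: Record<string, unknown>,
  required: readonly string[],
  moduleName: string,
): void {
  if (Object.keys(ns).length === 0) {
    throw new Error(`${moduleName}: module namespace is empty; check how it was bundled`);
  }
  const missing = required.filter((name) => ns[name] === undefined);
  if (missing.length > 0) {
    throw new Error(`${moduleName}: missing exports ${missing.join(", ")}`);
  }
}

// Usage (assumes pdfjs-dist is installed):
// const pdfJs = await import("pdfjs-dist");
// assertExports(pdfJs as unknown as Record<string, unknown>, ["getDocument"], "pdfjs-dist");
```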

@AChangXD

AChangXD commented Nov 12, 2023

I even added https://github.com/mozilla/pdfjs-dist to my project manually; same error. It's definitely something to do with the imports.

@Luluno01
Contributor Author

> @Luluno01 I tried using import() and is running into the same error. Can you post a snippet on how you are importing/calling the function? (Note I'm using it in an API request)

I did the same as you and got the same result. Then I added the magic comment /* webpackIgnore: true */ inside the import() call to stop webpack from following the import and producing an empty module. It turns out, however, that forcing webpack to ignore the dynamic import stops working after deploying to Vercel, because the deployment doesn't ship node_modules at all.

@Luluno01
Contributor Author

> I even added https://github.com/mozilla/pdfjs-dist manually into my project, same error, something to do with the imports for sure

Found this. Not a big fan of webpack but I tried to follow the settings in the example provided. Still no luck.

@AChangXD


I created a brand new node project and everything works, so this is an issue with how next.js/webpack bundle the different modules.

@AChangXD

Interesting thing is I think everything works in the pages router

@Luluno01
Contributor Author


You mean so far it ONLY works in pages router?

@malikiz

malikiz commented Nov 12, 2023

Try importing like this:

import * as PDFJS from 'pdfjs-dist/build/pdf.min.mjs'

@Luluno01
Contributor Author

Luluno01 commented Nov 12, 2023


Interesting, it does make a difference, but it results in another error, the same one I get when installing the CommonJS version directly from the repo and importing it. The exports are no longer empty, but the library complains:

Error: Setting up fake worker failed: "Cannot find module './pdf.worker.mjs'".

According to the official example, we should add pdf.worker as a webpack entry so that it is split into a separate chunk. Unfortunately, I ran into a webpack error, "Error: Entry pdf.worker depends on main, but this entry was not found", after adding the pdf.worker entry. Not sure why it depends on "main" or what "main" is supposed to be. Would you mind sharing a minimal working example of next.config.js?

@AChangXD

I got the worker error as well. I think import {getDocument} from 'pdfjs-dist' is the officially recommended way? Re webpack splitting, I don't have the slightest clue lol, never really messed around with it before. Really hate to split this PDF processing into its own microservice lol

@Luluno01
Contributor Author

> Interesting thing is I think everything works in the pages router

Interesting. I'm pretty sure it has everything to do with webpack. But I'm not familiar with webpack stuff... Still struggling to figure out how to configure webpack to make it work with app router.

@Luluno01
Contributor Author

> Really hate to split this pdf processing into its own microservice lol

Yeah, me too. I ended up reimplementing the PDF processing API endpoint with Cloud Functions, which doesn't use a bundler but runs directly the compiled code of TypeScript (or your JS code as-is). Really ugly workaround.

@Luluno01
Contributor Author

Luluno01 commented Nov 12, 2023

> I think 'import {getDocument} from 'pdfjs-dist'' is the official recommended way?

If I remember my experiments correctly, import { getDocument } from '...' results in undefined regardless of whether you import from 'pdfjs-dist' or 'pdfjs-dist/build/pdf.min.mjs'. Only import * as pdfjs from '...' has a chance of working.

@AChangXD

Also tried raw-loader, as suggested by some.

> Yeah, me too. I ended up reimplementing the PDF processing API endpoint with Cloud Functions, which doesn't use a bundler but runs directly the compiled code of TypeScript (or your JS code as-is). Really ugly workaround.

I'm going to have to do the same thing. I think the team at Vercel should also look at other libraries that depend on pdfjs-dist; I was using pdf-to-png-converter. I saw a suggestion to use raw-loader, but it didn't seem to do anything. Here's my webpack config:
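/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    // Allow ESM packages from node_modules to be handled as externals.
    esmExternals: true,
  },
  webpack: (config) => {
    // Load native .node binaries via raw-loader (suggested elsewhere; did not seem to help).
    config.module.rules.push({
      test: /\.node$/,
      use: 'raw-loader',
    });
    // Stub out optional native/Node-only dependencies such as canvas.
    config.resolve.alias.canvas = false;
    config.resolve.alias.encoding = false;
    return config;
  },
};

export default nextConfig;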

Also could you link the doc where pdf.worker needs to be split into its own chunk?

@AChangXD

I think it's also important to clarify that pdfjs-dist could be used in BOTH React and any API routes, not sure if that causes any difference in behavior.

@Luluno01
Contributor Author

> Also could you link the doc where pdf.worker needs to be split into its own chunk?

I just found that pdf.worker actually doesn't need to be split into a separate chunk. I looked into the webpack.config.js of the official example, which declares an entry pointing to the worker source file. That's why I thought the file path would be passed to a real Worker constructor: since Worker expects a path to a real file, the worker source would have to be bundled as a separate chunk.

Later I inspected the source code: https://github.com/mozilla/pdfjs-dist/blob/master/build/pdf.js#L2031. In the npm distributed version it is const worker = await import(/* webpackIgnore: true */ this.workerSrc);, while the minified pdf.min.mjs has await import(this.workerSrc) (without the magic comment). So it seems that in a Node.js environment the worker is imported into the main thread with a dynamic import instead of being started as a worker thread.
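To make that mechanism concrete, here is a simplified sketch of a "fake worker" loader in the spirit of the code quoted above. It is not pdf.js's actual implementation: `loadFakeWorker` is hypothetical, and the default export name `WorkerMessageHandler` is an assumption about what pdf.worker.mjs exposes.

```typescript
// Simplified sketch (not pdf.js source): in Node.js, instead of spawning a worker
// thread, the worker module is imported into the main thread and one of its exports
// is used directly. If the module cannot be resolved, setup fails.
export async function loadFakeWorker(
  workerSrc: string,
  exportName = "WorkerMessageHandler", // assumed export name of pdf.worker.mjs
): Promise<unknown> {
  try {
    // webpackIgnore keeps webpack from rewriting this import(), so resolution
    // happens at runtime against whatever files were actually deployed.
    const mod = (await import(/* webpackIgnore: true */ workerSrc)) as Record<string, unknown>;
    if (!(exportName in mod)) {
      throw new Error(`export "${exportName}" not found in ${workerSrc}`);
    }
    return mod[exportName];
  } catch (err) {
    // Wrap the error in the same shape as the messages reported in this thread.
    throw new Error(`Setting up fake worker failed: "${(err as Error).message}".`);
  }
}
```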

@Luluno01
Copy link
Contributor Author

> I think it's also important to clarify that pdfjs-dist could be used in BOTH React and any API routes, not sure if that causes any difference in behavior.

Yes, you are right. And my use case is server-side PDF file processing.

@Luluno01
Contributor Author

Okay, I managed to get it to work by adding an ugly hint for webpack: await import('pdfjs-dist/build/pdf.worker.mjs') after importing with import * as pdfjs from 'pdfjs-dist/build/pdf.min.mjs'. Confirmed to work by deploying on Vercel. I'm adding a new branch to the reproducer...

@AChangXD

@AChangXD


Interesting, I can't get 'pdfjs-dist/build/pdf.min.mjs' to import without TS complaining. With //@ts-ignore, I get Attempted import error: 'getDocument' is not exported from 'pdfjs-dist/build/pdf.mjs' (imported as 'pdfjs').

@AChangXD


I'll try your workaround when you add the new branch, in the meantime I'm going to see if it works in create-t3-app and trpc

@Luluno01
Contributor Author

> Interesting, I can't get 'pdfjs-dist/build/pdf.min.mjs' to import without TS complaining. With //@ts-ignore, I get Attempted import error: 'getDocument' is not exported from 'pdfjs-dist/build/pdf.mjs' (imported as 'pdfjs').

Just add declare module 'pdfjs-dist/build/pdf.min.mjs' { export * from 'pdfjs-dist' } to get TypeScript working again.
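Spelled out as a declaration file, that suggestion might look like this. The file name is a hypothetical example; it has to be a script-style .d.ts (no top-level import or export) that your tsconfig.json includes. The second declaration is an extra assumption covering the worker hint import, which may not ship typings of its own.

```typescript
// pdfjs-dist-shims.d.ts (hypothetical file name)
// Reuse the package's own typings for the deep import of the minified build.
declare module "pdfjs-dist/build/pdf.min.mjs" {
  export * from "pdfjs-dist";
}

// Shorthand ambient module for the side-effect worker import; its exports are typed as any.
declare module "pdfjs-dist/build/pdf.worker.min.mjs";
```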

@Luluno01
Contributor Author

> I'll try your workaround when you add the new branch, in the meantime I'm going to see if it works in create-t3-app and trpc

Here you are: Luluno01/pdfjs-dist-import-reproducer@82c4439

@AChangXD


OMG you are a genius!! I added an API endpoint and it also works:

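import { NextResponse } from 'next/server';
// Namespace import of the minified ESM build; named imports came back undefined.
import * as pdfjs from 'pdfjs-dist/build/pdf.min.mjs';
// Hint for webpack: makes it bundle the worker module so pdf.js can find it at runtime.
await import('pdfjs-dist/build/pdf.worker.min.mjs');

export async function POST(req: Request) {
  // Load a sample PDF by URL and extract the text content of its first page.
  const pdf = await pdfjs.getDocument(
    'https://www.africau.edu/images/default/sample.pdf'
  ).promise;
  const page = await pdf.getPage(1);
  const textContent = await page.getTextContent();
  return NextResponse.json({ message: textContent }, { status: 200 });
}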

On my end it does give me a warning about a font issue; not sure if it's import-related, but I'm getting results! Warning: fetchStandardFontData: failed to fetch file "LiberationSans-Regular.ttf" with "UnknownErrorException: The standard font "baseUrl" parameter must be specified, ensure that the "standardFontDataUrl" API parameter is provided.".
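The warning points at the standardFontDataUrl parameter of getDocument(). A minimal sketch of supplying it, assuming the standard_fonts folder that ships with pdfjs-dist is reachable at runtime at the location you pass; `withStandardFonts` is a hypothetical helper. The pdf.js API documentation describes this parameter as the URL where the standard font files are located and asks for a trailing slash, which the helper enforces.

```typescript
// Hypothetical helper: build getDocument() parameters that include standardFontDataUrl.
export interface PdfSourceParams {
  url: string;
  standardFontDataUrl: string;
}

export function withStandardFonts(url: string, fontDir: string): PdfSourceParams {
  // pdf.js expects the font location to end with a slash; append one if missing.
  const standardFontDataUrl = fontDir.endsWith("/") ? fontDir : `${fontDir}/`;
  return { url, standardFontDataUrl };
}

// Usage (assumes node_modules/pdfjs-dist/standard_fonts/ exists in the deployment):
// const pdf = await pdfjs.getDocument(
//   withStandardFonts(pdfUrl, "node_modules/pdfjs-dist/standard_fonts"),
// ).promise;
```

On Vercel the same caveat as for the worker applies: the font files must actually be included in the deployed function.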

Thanks a lot!

@AChangXD

Also, for future folks who may stumble on this error when using another package that depends on pdfjs-dist: { message: 'The API version "3.11.174" does not match the Worker version "4.0.189".', name: 'UnknownErrorException', details: 'Error: The API version "3.11.174" does not match the Worker version "4.0.189".' } In this case you have to uninstall pdfjs-dist and install the matching version (3.11.174).
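A tiny guard in that spirit, assuming you can get hold of both version strings: pdfjs-dist exports a `version` string for the copy you import, and the expected version is whatever the dependent package was built against. pdf.js already raises the error quoted above on its own; the point of this hypothetical `assertSameVersion` is only to fail at startup with an actionable message rather than on the first request.

```typescript
// Hypothetical guard: compare the pdf.js API version with the worker version before
// parsing anything, so a mismatch fails fast with an actionable message.
export function assertSameVersion(apiVersion: string, workerVersion: string): void {
  if (apiVersion !== workerVersion) {
    throw new Error(
      `pdf.js API version "${apiVersion}" does not match worker version "${workerVersion}"; ` +
        "install the pdfjs-dist version the dependent package was built against.",
    );
  }
}

// Usage (assumes pdfjs-dist is installed and the expected version is known):
// import { version } from "pdfjs-dist";
// assertSameVersion(version, "3.11.174");
```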

@Luluno01
Contributor Author

> @Luluno01 So building locally works perfectly, building on Vercel gives me this:

> Build error occurred
13:14:03.947  Error: Collecting page data for undefined is still timing out after 2 attempts. See more info here https://nextjs.org/docs/messages/page-data-collection-timeout
13:14:03.954      at onRestart (/vercel/path0/node_modules/next/dist/build/index.js:762:39)
13:14:03.954      at Worker.isPageStatic (/vercel/path0/node_modules/next/dist/lib/worker.js:95:40)
13:14:03.954      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
13:14:03.954      at async Span.traceAsyncFn (/vercel/path0/node_modules/next/dist/trace/trace.js:140:20)
13:14:03.954      at async /vercel/path0/node_modules/next/dist/build/index.js:959:56
13:14:03.954      at async Span.traceAsyncFn (/vercel/path0/node_modules/next/dist/trace/trace.js:140:20)
13:14:03.954      at async Promise.all (index 4)
13:14:03.955      at async /vercel/path0/node_modules/next/dist/build/index.js:892:17
13:14:03.955      at async Span.traceAsyncFn (/vercel/path0/node_modules/next/dist/trace/trace.js:140:20)
13:14:03.955      at async /vercel/path0/node_modules/next/dist/build/index.js:829:124
13:14:04.001  Error: Command "npm run build" exited with 1

Same with my reproducer. That's why I moved it to /api/.... Static page generation somehow times out.
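If the timeout comes from Next.js trying to prerender the page at build time, one thing worth trying (untested against this reproducer) is opting the route out of static generation with route segment config. The file path is illustrative. Note that Next.js still loads the module during the "Collecting page data" step, so this may not help if merely importing pdfjs-dist is what hangs.

```typescript
// app/page.tsx (illustrative placement; also valid in an app/**/route.ts handler)
// Route segment config: render on each request instead of prerendering at build time,
// and keep the route on the Node.js runtime rather than Edge.
export const dynamic = "force-dynamic";
export const runtime = "nodejs";
```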

@Luluno01
Contributor Author

I installed in the reproducer the exact versions of next, pdfjs-dist and canvas used by my other project, the one that magically works. It doesn't help rule out the segmentation fault, though. Also, the bundle of the endpoint that uses pdfjs-dist has already grown to 50 MB, which is close to Vercel's size limit. Since I will very likely add more functionality to the endpoint, I guess I had better stay with the good old solution: Google Cloud Functions.

@Luluno01
Contributor Author

> Yeah I'm also getting some bizarre warnings. I guess this workaround is unstable and not recommended. While it works in my other project after deploying, it fails in the deployment of the exact workaround branch. And the error is even more bizarre: a segmentation fault that happens only in the deployment, with no stack trace.

> Running a build on Vercel right now, will see if it fails on my end too. Do you happen to also use tesseract.js for OCR? That import is giving me hell as well :(

Not yet. But I might have to use it soon LOL

@AChangXD

> Considering it is very likely that I will add more functionalities to the endpoint, I guess I had better stay with the good old solution - turning to Google Cloud Functions.

Yeah, I'll have to as well, or at least host a Node.js backend on Vercel. There was a change in pdfjs-dist that ballooned the bundle size; I read about it somewhere yesterday but can't remember where.

@AChangXD

> Considering it is very likely that I will add more functionalities to the endpoint, I guess I had better stay with the good old solution - turning to Google Cloud Functions.

Also keep in mind that Cloud Functions has a 100MB limit as well, sadly. Why can't there be a semi-decent PDF parsing library out there... So frustrating

@Luluno01
Contributor Author

> there was a change that pdfjs-dist introduced that ballooned the bundle size

I guess it might have something to do with the transitive dependency canvas. Although it is an optional dependency of pdfjs-dist, webpack decides it needs that package and might be bundling canvas's huge binaries.

@Luluno01
Contributor Author

Luluno01 commented Nov 12, 2023

> Also keep in mind that cloud functions has a 100MB limit as well, sadly.

Cloud Functions has much more relaxed limits, as stated here:

  • 1st gen max deployment size: 100MB (compressed) for sources, 500MB (uncompressed) for sources plus modules
  • 2nd gen max deployment size: N/A

@AChangXD

AChangXD commented Nov 13, 2023

@Luluno01 I downgraded next to 13.5.6 and at least langchain's PDFLoader is working? I'm guessing they bundle the PDFLoader in a specific way that the other libraries don't?

@Luluno01
Contributor Author


I was testing with getDocument and none of canary, 13.5.6 or 13.5.4 works in the reproducer. My other project which runs next.js 13.5.4, however, works magically. I don't think it's a good idea to use that library in an unstable hacky way.

@AChangXD

@Luluno01 So I deployed my Node/Express backend on Vercel and got this as well:

Unhandled Promise Rejection
  errorType: Runtime.UnhandledPromiseRejection
  errorMessage: Error: Setting up fake worker failed: "Cannot find module '/var/task/node_modules/pdfjs-dist/build/pdf.worker.mjs' imported from /var/task/node_modules/pdfjs-dist/build/pdf.mjs".
  reason stack:
    at file:///var/task/node_modules/pdfjs-dist/build/pdf.mjs:3720:36
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
  stack:
    at process.<anonymous> (file:///var/runtime/index.mjs:1276:17)
    at process.emit (node:events:526:35)
    at process.emit (/var/task/___vc/__launcher/__sourcemap_support.js:602:21)
    at emit (node:internal/process/promises:150:20)
    at processPromiseRejections (node:internal/process/promises:284:27)
    at processTicksAndRejections (node:internal/process/task_queues:96:32)
Unknown application error occurred
Runtime.Unknown

Works fine and dandy on localhost, think this one is related to ESM though

@Luluno01
Contributor Author


I think you might have to import the minified version, because pdf.mjs uses await import(/* webpackIgnore: true */ this.workerSrc) to load the worker module dynamically, which requires manual setup to make sure the worker module is bundled separately. The minified version, in contrast, has the magic comment /* webpackIgnore: true */ stripped but keeps the dynamic import, so Next.js's webpack can intercept it. As far as I can tell, that's most likely why my hacky workaround tricks webpack into bundling pdf.worker.mjs and registering an import path for it.
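Judging from the paths in the Vercel error above, the worker module seems to be looked up next to pdf.mjs, that is, resolved relative to the library file's own URL. The sketch below (a hypothetical `resolveWorkerSrc` built on standard `URL` resolution) shows where that lookup lands, and therefore why it fails when pdf.worker.mjs was never copied into the deployment.

```typescript
// Illustration: resolve a worker specifier the way a relative import() would,
// against the URL of the module that performs the import.
export function resolveWorkerSrc(libraryUrl: string, workerSrc: string): string {
  return new URL(workerSrc, libraryUrl).href;
}

// With the paths from the Vercel error:
// resolveWorkerSrc("file:///var/task/node_modules/pdfjs-dist/build/pdf.mjs", "./pdf.worker.mjs")
// → "file:///var/task/node_modules/pdfjs-dist/build/pdf.worker.mjs"
```

An absolute workerSrc (for example a full URL) is returned unchanged, since URL resolution ignores the base in that case.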

@AChangXD


Yep, you are right, that worked for me! It seems Vercel also has issues finding .wasm files:

Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/tesseract.js-core/tesseract-core-simd.wasm')
Uncaught Exception
  errorType: RuntimeError
  errorMessage: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/tesseract.js-core/tesseract-core-simd.wasm'). Build with -sASSERTIONS for more info.
  stack:
    at n (/var/task/node_modules/tesseract.js-core/tesseract-core-simd.js:13:225)
    at Ma (/var/task/node_modules/tesseract.js-core/tesseract-core-simd.js:14:143)
    at /var/task/node_modules/tesseract.js-core/tesseract-core-simd.js:14:491
Unknown application error occurred
Runtime.Unknown

Could this be webpack not bundling the .wasm file as well? I never thought it would be this much of a headache to get two packages running on Vercel...
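One way to surface this kind of problem at startup instead of mid-request is to check that the runtime assets actually exist in the deployed function. A minimal sketch using Node's fs.existsSync; `findMissingFiles` and the listed paths are illustrative, taken from the errors in this thread.

```typescript
import { existsSync } from "node:fs";

// Hypothetical start-up check: report runtime assets (worker modules, .wasm files)
// missing from the deployment. Relative paths resolve against process.cwd().
export function findMissingFiles(paths: readonly string[]): string[] {
  return paths.filter((p) => !existsSync(p));
}

// Usage:
// const missing = findMissingFiles([
//   "node_modules/tesseract.js-core/tesseract-core-simd.wasm",
//   "node_modules/pdfjs-dist/build/pdf.worker.mjs",
// ]);
// if (missing.length > 0) console.error("Missing runtime assets:", missing);
```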

@Luluno01
Contributor Author

@Luluno01 So I deployed my Node/Express backend on Vercel and got this as well:

Unhandled Promise Rejection {"errorType":"Runtime.UnhandledPromiseRejection","errorMessage":"Error: Setting up fake worker failed: \"Cannot find module '/var/task/node_modules/pdfjs-dist/build/pdf.worker.mjs' imported from /var/task/node_modules/pdfjs-dist/build/pdf.mjs\".","reason":{"errorType":"Error","errorMessage":"Setting up fake worker failed: \"Cannot find module '/var/task/node_modules/pdfjs-dist/build/pdf.worker.mjs' imported from /var/task/node_modules/pdfjs-dist/build/pdf.mjs\".","stack":["Error: Setting up fake worker failed: \"Cannot find module '/var/task/node_modules/pdfjs-dist/build/pdf.worker.mjs' imported from /var/task/node_modules/pdfjs-dist/build/pdf.mjs\"."," at file:///var/task/node_modules/pdfjs-dist/build/pdf.mjs:3720:36"," at processTicksAndRejections (node:internal/process/task_queues:95:5)"]},"promise":{},"stack":["Runtime.UnhandledPromiseRejection: Error: Setting up fake worker failed: \"Cannot find module '/var/task/node_modules/pdfjs-dist/build/pdf.worker.mjs' imported from /var/task/node_modules/pdfjs-dist/build/pdf.mjs\"."," at process.<anonymous> (file:///var/runtime/index.mjs:1276:17)"," at process.emit (node:events:526:35)"," at process.emit (/var/task/___vc/__launcher/__sourcemap_support.js:602:21)"," at emit (node:internal/process/promises:150:20)"," at processPromiseRejections (node:internal/process/promises:284:27)"," at processTicksAndRejections (node:internal/process/task_queues:96:32)"]} Unknown application error occurred Runtime.Unknown

Works fine and dandy on localhost; I think this one is related to ESM, though.

I think you might have to import the minified version. pdf.mjs loads the worker module with await import(/* webpackIgnore: true */ this.workerSrc), so webpack leaves that import alone, and you need some manual setup to make sure the worker module gets bundled separately. The minified version, in contrast, has the /* webpackIgnore: true */ magic comment stripped but keeps the dynamic import, which lets Next.js's webpack intercept it. As far as I know, that is very likely why my hacky workaround tricks webpack into bundling pdf.worker.mjs and registering an import path for it.

Yep you are right, that worked for me! Seems like Vercel also have issues finding .wasm files as well: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/tesseract.js-core/tesseract-core-simd.wasm') Uncaught Exception {"errorType":"RuntimeError","errorMessage":"Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/tesseract.js-core/tesseract-core-simd.wasm'). Build with -sASSERTIONS for more info.","stack":["RuntimeError: Aborted(Error: ENOENT: no such file or directory, open '/var/task/node_modules/tesseract.js-core/tesseract-core-simd.wasm'). Build with -sASSERTIONS for more info."," at n (/var/task/node_modules/tesseract.js-core/tesseract-core-simd.js:13:225)"," at Ma (/var/task/node_modules/tesseract.js-core/tesseract-core-simd.js:14:143)"," at /var/task/node_modules/tesseract.js-core/tesseract-core-simd.js:14:491"]} Unknown application error occurred Runtime.Unknown

This might be webpack not bundling the .wasm as well? I never thought it would be this much headache to get two packages running on Vercel...

Very likely. If you have to use tesseract.js on Vercel, another workaround is to bypass Next.js and register a separate folder as your function implementation (you will need to do your own vendoring/bundling/tree-shaking). See vercel.json for more details.

@malikiz

malikiz commented Jan 9, 2024

I decided to take a simpler path: I downloaded the stable release from the official website, put all of its files in the public folder, and then added this tag to my component:

<script src="/pdfjs/pdf.mjs" type="module" />

Then I added this code in a useEffect:

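  useEffect(() => {
    (async () => {
      // Relies on pdf.mjs (loaded by the <script> tag above) exposing window.pdfjsLib
      const pdfjs = (window as any).pdfjsLib as typeof import('pdfjs-dist/types/src/pdf')
      // workerSrc must be a URL string, not an imported module object. This path assumes
      // pdf.worker.mjs from the same download sits next to pdf.mjs in public/pdfjs/
      pdfjs.GlobalWorkerOptions.workerSrc = '/pdfjs/pdf.worker.mjs'

      // getDocument() returns a loading task; its .promise resolves to the loaded document
      const pdfDocument = await pdfjs.getDocument('http://localhost:3000/pdf-files/myFile.pdf').promise

      console.log('pdfDocument', pdfDocument);
    })()
  }, [])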
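A type="module" script is deferred, so window.pdfjsLib may not be defined yet when the effect first runs. A minimal sketch of one way to guard against that, assuming the library ends up on globalThis under the name pdfjsLib as in the code above, is to poll for the global with a timeout. waitForGlobal and its options are hypothetical names, not part of pdf.js or Next.js; the onLoad callback of Next.js's next/script component is another route worth trying.

```typescript
// Hypothetical helper: resolves once globalThis[key] is defined,
// or rejects after timeoutMs milliseconds.
function waitForGlobal<T>(
  key: string,
  { timeoutMs = 5000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const started = Date.now();
    const check = (): void => {
      const value = (globalThis as unknown as Record<string, unknown>)[key];
      if (value !== undefined) {
        resolve(value as T);
      } else if (Date.now() - started >= timeoutMs) {
        reject(new Error(`globalThis.${key} was not defined within ${timeoutMs} ms`));
      } else {
        // Not there yet: try again after a short delay.
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}
```

Inside the effect you would then obtain the library with const pdfjs = await waitForGlobal<typeof import('pdfjs-dist/types/src/pdf')>('pdfjsLib') before touching GlobalWorkerOptions.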

@huozhi huozhi added Module Resolution Module resolution (CJS / ESM, module resolving). and removed TypeScript Related to types with Next.js. labels Jan 9, 2024
@huozhi
Member

huozhi commented Jan 9, 2024

Hi, some bundling fixes have landed on canary (14.0.5-canary.45). I tested against the latest canary and it works well now: getDocument is a valid function.
Another thing to note: you don't need to remove .default to get the full module imports from await import('pdfjs-dist').
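For anyone stuck on an older version, a defensive sketch (assuming pdfjs-dist is loaded with a dynamic import) is to read the export from the namespace, fall back to default, where CommonJS interop sometimes puts everything, and fail loudly otherwise. pickExport is a hypothetical helper: it cannot repair the bundling bug, and in the original report even the default export was undefined, so there it would only turn a silent undefined into an explicit error.

```typescript
// Hypothetical helper: returns mod[name], falling back to mod.default[name] when a
// CommonJS-interop wrapper has placed the exports under `default`. Throws otherwise,
// so a missing export surfaces as a clear error instead of an undefined function.
function pickExport<T>(mod: object, name: string): T {
  const ns = mod as Record<string, unknown>;
  if (ns[name] !== undefined) {
    return ns[name] as T;
  }
  const fallback = ns["default"];
  if ((typeof fallback === "object" && fallback !== null) || typeof fallback === "function") {
    const value = (fallback as Record<string, unknown>)[name];
    if (value !== undefined) {
      return value as T;
    }
  }
  throw new Error(`"${name}" is not exported by this module (checked the namespace and its default export)`);
}
```

For example: const getDocument = pickExport<typeof import('pdfjs-dist').getDocument>(await import('pdfjs-dist'), 'getDocument').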

@huozhi huozhi closed this as completed Jan 9, 2024
@Luluno01
Contributor Author

Good to hear that! Could you please elaborate a bit on what the fix is and how it fixes the issue? Will that fix land on 13.x, or how can we cherry-pick it to 13.x? Thanks a lot.

@huozhi
Member

huozhi commented Jan 10, 2024

A few module-resolution-related bundling fixes were applied after 14.0.4 and are on canary now. Unfortunately, we're not going to backport them to 13.x.

@Luluno01
Contributor Author

Luluno01 commented Jan 10, 2024

Okayyyy... Thank you for your reply. Sounds like I'll have to upgrade to 14.0.5+ later to be able to use pdfjs with fewer workarounds.
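If you want to check programmatically whether a Next.js version should include these fixes (reported above as landing in 14.0.5-canary.45), a minimal sketch is a comparison that follows semver precedence, where a prerelease such as 14.0.5-canary.45 sorts before the 14.0.5 release. compareVersions, splitVersion and includesBundlingFix are hypothetical helpers; in a real project the semver package's gte function does the same comparison.

```typescript
// Hypothetical helpers following semver precedence (build metadata is ignored).
function splitVersion(version: string): [number[], string[]] {
  const clean = version.trim().replace(/^v/, "").split("+")[0];
  const dash = clean.indexOf("-");
  const core = dash === -1 ? clean : clean.slice(0, dash);
  const pre = dash === -1 ? "" : clean.slice(dash + 1);
  const nums = core.split(".").map(Number);
  return [[nums[0] ?? 0, nums[1] ?? 0, nums[2] ?? 0], pre === "" ? [] : pre.split(".")];
}

// Returns -1, 0 or 1 depending on whether a sorts before, equal to, or after b.
function compareVersions(a: string, b: string): number {
  const [coreA, preA] = splitVersion(a);
  const [coreB, preB] = splitVersion(b);
  for (let i = 0; i < 3; i++) {
    if (coreA[i] !== coreB[i]) return coreA[i] < coreB[i] ? -1 : 1;
  }
  // A release ranks above any prerelease of the same major.minor.patch.
  if (preA.length === 0 && preB.length === 0) return 0;
  if (preA.length === 0) return 1;
  if (preB.length === 0) return -1;
  for (let i = 0; i < Math.max(preA.length, preB.length); i++) {
    const x = preA[i];
    const y = preB[i];
    if (x === undefined) return -1; // fewer identifiers ranks lower
    if (y === undefined) return 1;
    if (x === y) continue;
    const xNum = /^\d+$/.test(x);
    const yNum = /^\d+$/.test(y);
    if (xNum && yNum) return Number(x) < Number(y) ? -1 : 1;
    if (xNum) return -1; // numeric identifiers rank below alphanumeric ones
    if (yNum) return 1;
    return x < y ? -1 : 1;
  }
  return 0;
}

// First version reported in this thread to contain the fixes.
const FIRST_FIXED_NEXT = "14.0.5-canary.45";

function includesBundlingFix(nextVersion: string): boolean {
  return compareVersions(nextVersion, FIRST_FIXED_NEXT) >= 0;
}
```

For example, you could pass it the version field from node_modules/next/package.json.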

@dhallX

dhallX commented Jan 22, 2024

Is there an updated solution for this? I'm facing the same issue: import trace for request module/Release/canvas.node

next version 14.0.5

@Luluno01
Contributor Author

No, I haven't found a new solution. But you can post the full context and error message here or in a new issue, since you are using 14.0.5, which they said includes the fix.

@dhallX

dhallX commented Jan 22, 2024

My use case is a file image generator in a hook:

import * as pdfjs from "pdfjs-dist";

export default function useFileImageGenerator() {
  // Renders page 1 of a PDF onto an off-screen canvas and resolves to a PNG File
  // (or undefined if anything fails).
  function getThumbnail(file: File): Promise<File | undefined> {
    const canvas = document.createElement("canvas");
    const context = canvas.getContext("2d");

    console.log("1", file, canvas, context);

    if (context === null) {
      return Promise.resolve(undefined);
    }

    // Return the promise chain so callers can await the thumbnail. getDocument()
    // takes a URL, typed array or parameter object, not a File, so read the bytes first.
    return file
      .arrayBuffer()
      .then(buffer => pdfjs.getDocument({ data: new Uint8Array(buffer) }).promise)
      .then(pdfDoc => pdfDoc.getPage(1))
      .then(page => {
        console.log("2", page);
        const viewport = page.getViewport({ scale: 1 });
        canvas.width = viewport.width;
        canvas.height = viewport.height;

        const renderContext = {
          canvasContext: context,
          viewport: viewport,
        };
        console.log("3", renderContext);
        return page.render(renderContext).promise;
      })
      .then(() => {
        const imageDataUrl = canvas.toDataURL("image/png");

        console.log("4", imageDataUrl);
        const blob = dataURLtoBlob(imageDataUrl);
        // Create a File object
        const fileName = `${file.name}_screenshot.png`;

        const thumbnailImage = new File([blob], fileName, { type: "image/png" });
        console.log("5", thumbnailImage);
        return thumbnailImage;
      })
      .catch(error => {
        console.error("unable to convert pdf to image:", error);
        return undefined;
      });
  }

  // Decodes a base64 data URL (as produced by canvas.toDataURL) into a Blob.
  const dataURLtoBlob = (dataURL: string) => {
    const arr = dataURL.split(",");
    const mimeMatch = arr[0].match(/:(.*?);/);
    const mime = mimeMatch ? mimeMatch[1] : "application/octet-stream";
    const bstr = window.atob(arr[1]);
    let n = bstr.length;
    const u8arr = new Uint8Array(n);

    while (n--) {
      u8arr[n] = bstr.charCodeAt(n);
    }

    return new Blob([u8arr], { type: mime });
  };

  return getThumbnail;
}

next 14.0.5
pdfjs-dist ^3.11.174
canvas 2.11.2
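The canvas.node import trace most likely means webpack is trying to bundle the native canvas addon, which pdfjs-dist 3.x references on its Node.js code path (for example while a client component is server-rendered). A commonly suggested workaround, for example in the react-pdf README, is to alias canvas to false in the webpack config so webpack substitutes an empty module. The sketch below assumes that approach applies to this setup, which is not confirmed in this thread; withCanvasDisabled and WebpackLikeConfig are hypothetical and model only the resolve.alias field.

```typescript
// Hypothetical minimal type: only the part of a webpack config this helper touches.
interface WebpackLikeConfig {
  resolve?: { alias?: Record<string, string | false | string[]> };
}

// Returns the same config with `canvas` aliased to false, so webpack resolves any
// require("canvas") to an empty module instead of bundling canvas.node.
// Existing object-form aliases are preserved.
function withCanvasDisabled(config: WebpackLikeConfig): WebpackLikeConfig {
  const resolve: NonNullable<WebpackLikeConfig["resolve"]> = config.resolve ?? {};
  resolve.alias = { ...(resolve.alias ?? {}), canvas: false };
  config.resolve = resolve;
  return config;
}
```

In next.config.js this would be wired up through the webpack hook, e.g. webpack: (config) => withCanvasDisabled(config), or by writing the same alias assignment inline there.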


@Luluno01
Contributor Author

@huozhi Any thoughts?

Contributor

github-actions bot commented Feb 6, 2024

This closed issue has been automatically locked because it had no new activity for 2 weeks. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@github-actions github-actions bot added the locked label Feb 6, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 6, 2024

5 participants