Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak with Next.js's global fetch. Tested against http module #64212

Closed
m-rphy opened this issue Apr 8, 2024 · 25 comments · Fixed by #64746
Closed

Memory Leak with Next.js's global fetch. Tested against http module #64212

m-rphy opened this issue Apr 8, 2024 · 25 comments · Fixed by #64746
Labels
bug Issue was opened via the bug report template. locked Pages Router Related to Pages Router.

Comments

@m-rphy
Copy link

m-rphy commented Apr 8, 2024

Link to the code that reproduces this issue

https://github.com/m-rphy/nextMemoryLeak

To Reproduce

  1. In one terminal and cd into the /express_server and run npm start

  2. In a sperate teminal cd in /next_app and run npm run inspect

  3. Using any browser go to either http://localhost:3000/start-fetch or http://localhost:3000/start-custom-fetch to begin the requests.

  4. Then open chrome inspect (chrome://inspect) or use any other debugging tools.

I believe this is ticket is also relevant - but this repo reproduces it with and avoids it -> #54708

Current vs. Expected behavior

Next.js's global fetch is holding onto performance metrics or some other data that is leading to heap growth after every requests.

The heap should not grow after the scope closes.

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:49 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6020
  Available memory (MB): 16384
  Available CPU cores: 10
Binaries:
  Node: 20.10.0
  npm: 10.2.3
  Yarn: N/A
  pnpm: N/A
Relevant Packages:
  next: 14.2.0-canary.62 // Latest available version is detected (14.2.0-canary.62).
  eslint-config-next: 14.1.4
  react: 18.2.0
  react-dom: 18.2.0
  typescript: 5.4.4
Next.js Config:
  output: N/A

Which area(s) are affected? (Select all that apply)

Not sure, Data fetching (gS(S)P, getInitialProps)

Which stage(s) are affected? (Select all that apply)

next dev (local), next build (local), next start (local), Other (Deployed)

Additional context

I've tested this repo against different canary releases (canary-32 and canary-62), as well as Next.js lts.
I haven't been able to find a version that works.

@m-rphy m-rphy added the bug Issue was opened via the bug report template. label Apr 8, 2024
@github-actions github-actions bot added the Pages Router Related to Pages Router. label Apr 8, 2024
@m-rphy
Copy link
Author

m-rphy commented Apr 8, 2024

Using next.js fetch
before_720

Using Axios (implements nodes http module)
after

@billnbell2

This comment has been minimized.

@stephanebachelier
Copy link

@m-rphy interesting! I've a question about the charts with next.js's global fetch. What is the cause of the halt in the memory consumption? You've stopped the process or it's the garbage collector?

@m-rphy
Copy link
Author

m-rphy commented Apr 9, 2024

We switched to using axios for request handling instead of the native fetch. That's it. These charts, produced by Grafana for our production environment, illustrate the severity of the memory leak issue we faced. Our application, which processes thousands of outgoing requests per second, suffered significantly due to this memory leak. This repository is capable of reproducing the memory leak, as well as bypassing it by simply changing the HTTP handler. While we use axios in our production app, in this repo's toy model, I implemented a simple "fetch" with Node.js' http module. However, the effect remains the same. If you by pass next.js fetch and use one that implement node's http module, the leak goes away.

@stephanebachelier
Copy link

@m-rphy Thanks for the explanations!! I'll switch then to axios.
The issue I see is running services on the edge runtime where you don't have access to HTTP module. I've only a few services so I'll see later and move them back to node runtime.
As I rely heavily on fetch for non edge services, by migrating away from fetch, I should definitely find a significant gain.

@billnbell2
Copy link

How did you switch them to Axios for Apollo Client in NextJS?

@billnbell2
Copy link

We are using cross-fetch and it works much better.

@GabenGar
Copy link
Contributor

@stephanebachelier

As I rely heavily on fetch for non edge services, by migrating away from fetch, I should definitely find a significant gain.

I don't see how are you going not to "use fetch" on a custom watered down javascript runtime, aka edge. Axios wraps either XMLHttpRequest (browser-only) or http/https (nodejs-only).
Unless you plan to write an http handler from scratch, but then it will bloat the size of every single edge function relying on it.

@stephanebachelier
Copy link

@GabenGar

I don't see how are you going not to "use fetch" on a custom watered down javascript runtime, aka edge. Axios wraps either XMLHttpRequest (browser-only) or http/https (nodejs-only).

You're absolutely right.

Maybe I'm wrong but I say for lambdas running on nodejs runtime I could replace Next.js's fetch for axios/cross-fetch/... to have a significant gain.

For edge lambda I will either stick to global fetch or move them to nodejs runtime with non global fetch.

@feedthejim
Copy link
Contributor

hey folks, we think we identified the source of the leak, we'll try to fix it by next week

@ElHawary-Ebutler

This comment has been minimized.

@feedthejim
Copy link
Contributor

We're still investigating!

@billnbell2
Copy link

Anyone have a work around besides using cross-fetch ?

@feedthejim
Copy link
Contributor

@m-rphy in your screenshot, are the graph drops due to a server restart? What I can see for now is that there's some fetch data being collected as part of the performance observer metrics, but I think those get flushed, and some others we're collecting that is being retained.

@m-rphy
Copy link
Author

m-rphy commented Apr 18, 2024

Yes. The server is running in kubernetes and the service is restarted when resources are starved.

@billnbell2
Copy link

billnbell2 commented Apr 18, 2024

If I use Cross-fetch in Apollo Client to response time (doing 10 calls) reduces from 1.5 seconds to .5 seconds consistently. Apollo Client I believe uses fetch() that is in NextJS right?

Like this:

import fetch from 'cross-fetch';

 new ApolloClient({
    link: createHttpLink({
            uri: process.env.NEXT_PUBLIC_GRAPHQL_URL,
            fetch
        })
})

ijjk added a commit that referenced this issue Apr 18, 2024
This ensures we only track fetch metrics in development mode as that's
the only time we report them currently, this also adds an upper limit on
how many metrics we store per-request as previously this was unbounded,
on top of that this ensures we don't keep tracking fetch metrics after
the request has ended as we only report on request end, and finally this
adds a clean-up to the reference we attach to the request object
containing the fetch metrics once we have used them.

Closes: #64212

Closes NEXT-3159
@ijjk
Copy link
Member

ijjk commented Apr 19, 2024

Hi, this has been updated in v14.3.0-canary.11 of Next.js, please update and give it a try!

@m-rphy
Copy link
Author

m-rphy commented Apr 19, 2024

Looks like it's working well! I've only tested in the toy app, but we'll port the fetching back to the native fetch and do some testing in the production application.

huozhi pushed a commit that referenced this issue Apr 23, 2024
This ensures we only track fetch metrics in development mode as that's
the only time we report them currently, this also adds an upper limit on
how many metrics we store per-request as previously this was unbounded,
on top of that this ensures we don't keep tracking fetch metrics after
the request has ended as we only report on request end, and finally this
adds a clean-up to the reference we attach to the request object
containing the fetch metrics once we have used them.

Closes: #64212

Closes NEXT-3159
@mdahiemstra
Copy link

This has also been released in v14.2.3. Seems to resolve the memory issue 🚀 , but unfortunately we still had to switch to cross-env because of #64956

@RashmikaFeefo
Copy link

I still face Reached heap limit Allocation failed - JavaScript heap out of memory even I've updated to Next 14.2.3. It only occurs in prod(Node 19). I have fetch calls as well. Any solution?

@ijjk
Copy link
Member

ijjk commented May 3, 2024

@RashmikaFeefo if you upgraded and are still facing the issue it's most likely unrelated to fetch at this point and you might have a user code memory leak. Investigating via node --inspect and taking heap snapshots could help surface your issue.

@RashmikaFeefo
Copy link

RashmikaFeefo commented May 4, 2024

@RashmikaFeefo if you upgraded and are still facing the issue it's most likely unrelated to fetch at this point and you might have a user code memory leak. Investigating via node --inspect and taking heap snapshots could help surface your issue.

I've used 14.2.3 and not v14.3.0-canary.11. And yeah I already tried node --inspect but didn't find any code issues. I've called several async APIs using fetch in a server component (api/[directory]/page.tsx) and facing this issue in prod.

@ijjk
Copy link
Member

ijjk commented May 6, 2024

@RashmikaFeefo please open a new issue with a minimal reproduction and we can take a closer look, although without further info there isn't much we can investigate on our end.

Copy link
Contributor

This closed issue has been automatically locked because it had no new activity for 2 weeks. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue was opened via the bug report template. locked Pages Router Related to Pages Router.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants