
introduce in-memory mutex to avoid race condition issues in parallel worker runs #819

Closed · wants to merge 3 commits

Conversation

dario-piotrowicz (Member)

fixes #805

changeset-bot (bot) commented Jun 29, 2024

🦋 Changeset detected

Latest commit: 0ba6498

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

| Name | Type |
| --- | --- |
| @cloudflare/next-on-pages | Patch |
| eslint-plugin-next-on-pages | Patch |


github-actions bot (Contributor) commented Jun 29, 2024

🧪 Prereleases are available for testing 🧪

@cloudflare/next-on-pages

You can install this latest build in your project with:

```sh
npm install --save-dev https://prerelease-registry.devprod.cloudflare.dev/next-on-pages/runs/9794913991/npm-package-next-on-pages-819
```

@cloudflare/eslint-plugin-next-on-pages

You can install this latest build in your project with:

```sh
npm install --save-dev https://prerelease-registry.devprod.cloudflare.dev/next-on-pages/runs/9794913991/npm-package-eslint-plugin-next-on-pages-819
```

ostenbom (Contributor) commented Jul 3, 2024

@dario-piotrowicz We haven't managed to get this pre-release in production for us because there seems to be a deadlock somewhere (on a small proportion of requests). Really struggling to create a reproducible case since the worker isn't flushing log lines until the request completes and the request never does complete 😅

dario-piotrowicz (Member, Author)

> @dario-piotrowicz We haven't managed to get this pre-release in production for us because there seems to be a deadlock somewhere (on a small proportion of requests). Really struggling to create a reproducible case since the worker isn't flushing log lines until the request completes and the request never does complete 😅

oh... ok that's not good 😓

let me have a think about it 😓

dario-piotrowicz (Member, Author)

@ostenbom I've added a `finally` block for releasing the mutex. I hope this is enough to solve your issue; could you please check whether you still get the issue on your end and let me know? 🙏

(if this doesn't fix it I have another idea in mind, but I hope that this should be enough)
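
For context, the idea behind that fix looks roughly like the following (a minimal sketch, not the PR's actual code; the `Mutex` class and `runExclusive` name are illustrative):

```ts
// Illustrative in-memory mutex: callers queue on a promise chain, and the
// lock is released in a `finally` block, so an exception thrown inside the
// critical section can no longer leave the mutex held forever.
class Mutex {
  private last: Promise<void> = Promise.resolve();

  async runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const previous = this.last;
    let release!: () => void;
    this.last = new Promise<void>((resolve) => (release = resolve));
    await previous; // wait for the previous holder to release
    try {
      return await fn();
    } finally {
      release(); // always runs, even if fn throws
    }
  }
}
```

Without the `finally`, a throwing critical section would skip the release and every later `runExclusive` call would wait forever, which matches the hanging-request symptom described above.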

ostenbom (Contributor) commented Jul 4, 2024

@dario-piotrowicz

Here are the very strange conditions under which I can get it to deadlock 🤪 :

- Added a middleware to do a `NextResponse.rewrite()`
- Deploy to CF Pages
- `curl -I https://my-parallel-app.pages.dev/ -H 'x-forwarded-host: 019c8a6b.my-parallel-app.pages.dev'`
- Deadlock; eventually we see 2 cancelled requests in the logs, to the two different domains of the Pages worker

[Screenshot, 2024-07-04 17:31: the two cancelled requests in the Pages logs]

The last `finally` commit didn't do the trick unfortunately. Could it have to do with the behaviour of `NextResponse.rewrite()`?
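
For reference, a middleware along the lines of the first step above might look like this (a hypothetical sketch; the `/rewritten` target path is an assumption, not the actual repro code):

```ts
// middleware.ts — minimal sketch of a middleware that rewrites every request.
// Combined with the x-forwarded-host header in the curl command above, the
// rewrite resolves against a different deployment domain of the same Pages app.
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const url = request.nextUrl.clone();
  url.pathname = '/rewritten'; // hypothetical rewrite target
  return NextResponse.rewrite(url);
}
```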

ostenbom (Contributor) commented Jul 9, 2024

@dario-piotrowicz anything more you can do here? Or is this a not-so-expected use case? Appreciate your help here! ⭐

```js
 * What is this file for?
 *
 * workerd isolates share the same modules registry meaning that it can happen for the same isolate to run
 * multiple instances of the same worker and those will share the modules' state.
```
A Member commented on the doc comment quoted above:


An isolate by definition runs only one "instance" of the Worker. However, one Worker instance may handle many requests, possibly at the same time, possibly from different users. Global state will be shared across all these requests.

Basically all servers work this way, including Node.js. We actually would like to change this in Workers, but it requires building new fundamental technology.
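
As a minimal illustration of that point (not code from this PR; the counter is purely hypothetical), any module-level variable in a Worker is shared by every request that the same instance handles, including concurrent ones:

```ts
// Module-level (global) state: lives for the lifetime of the Worker instance.
let counter = 0;

export default {
  async fetch(_request: Request): Promise<Response> {
    // Concurrent requests, possibly from different users, all read and
    // mutate the same `counter`.
    counter++;
    return new Response(`requests served by this instance: ${counter}`);
  },
};
```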

ostenbom (Contributor)

Sorry to keep pinging you @dario-piotrowicz, but this is affecting hundreds of our users every day and is our main source of instability on next-on-pages. Is there anything we can do to fix this?

dario-piotrowicz (Member, Author) commented Jul 17, 2024

@ostenbom sorry, I meant to reply to your older ping sooner 😓

I don't think that this is a not-so-expected use case and I do believe we need to fix it asap...

I simply haven't had time to dedicate to this (as I've been working on other, unrelated projects)

I don't think that there's anything specific you can do on your end 😓

anyways, I'm really sorry that this is affecting many of your users. I will try my best to prioritize this work, and I promise to either give you a solution or some concrete info by the end of this week/beginning of the next (Monday or Tuesday (😅) at the very latest) 🙏

dario-piotrowicz (Member, Author)

Hi @ostenbom 🙂

I've tried a different solution for this in #836
(since the in-memory mutex was just a first attempt, which would at best have been a sub-optimal solution)

I've been testing this with your reproduction and it seems to be working smoothly 😃 (hopefully I am not getting false positives for some reason 🤔)

I still want to polish it a bit and analyze this approach some more, but please, if you want, try the PR's prerelease and let me know if it works for you 🙂

ostenbom (Contributor)

Thanks @dario-piotrowicz! This one looks good! Both passing the reproduction and our internal tests, plus not deadlocking! 😍

dario-piotrowicz (Member, Author)

@ostenbom Awesome! thanks for confirming! 😄

I'll close this PR then and let's just move to the other one; I need to do some light polishing, and then I think it should be good to go 🙂

@dario-piotrowicz dario-piotrowicz deleted the in-memory-mutex branch July 24, 2024 15:18
Successfully merging this pull request may close these issues.

[🐛 Bug]: React contexts break when loading pages in parallel