Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow main thread to be supplanted by another #38454

Open
bmeck opened this issue Apr 28, 2021 · 13 comments
Open

Allow main thread to be supplanted by another #38454

bmeck opened this issue Apr 28, 2021 · 13 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. worker Issues and PRs related to Worker support.

Comments

@bmeck
Copy link
Member

bmeck commented Apr 28, 2021

Is your feature request related to a problem? Please describe.
Please describe the problem you are trying to solve.

ESM in Node requires command line flags in order to use --loader. This is because if we try to provide an API to customize the loader, the entire module graph for the application is already linked and it is too late to do so. Common tools like APMs run on environments like cloud functions that do not support customizing CLI flags but need to muck with the loading systems.

Describe the solution you'd like
Please describe the desired behavior.

When spinning up workers, we can already custom configure --loader since that module graph hasn't linked yet. I would like for applications to be able to do so in a way that makes the worker act exactly as the original main thread: no piping of I/O, no limitations on process, etc.

This likely would be something like:

// entrypoint
import init from 'apm'
init('path/to/app');
// APM
const loader = 'path/to/loader.mjs';
export function init(entry) {
  worker_threads.replaceSelf(entry);
}

Likely this feature needs to sort out file descriptor behavior in particular when replacing the main/current thread and what happens with potentially orphaned workers.

Describe alternatives you've considered
Please describe alternative solutions or features you have considered.

None given how ESM works and links eagerly this is the only solution I've found.

@bmeck bmeck added esm Issues and PRs related to the ECMAScript Modules implementation. worker Issues and PRs related to Worker support. labels Apr 28, 2021
@jasnell
Copy link
Member

jasnell commented Apr 28, 2021

This is definitely a difficult problem given the relationship of Environment to the thread and the way the main thread is bootstrapped. The only way I could imagine being able to tackle this would be to redefine the main thread as "Just Another Type of Worker Thread" that is spun off a top level root thread.

That is, if we think about the current structure as a DAG, what we currently have is:

 (root thread 1) | ---> (worker thread 2)
  Environment    |         Environment
    Isolate      |           Isolate
                 |
                 | ---> (worker thread 3)
                           Environment
                             Isolate

To do what you're suggesting, we would likely need to move to a model like:

  (root thread) | ---> (worker thread 1) | ---> (worker thread 2)
                         Environment     |          Environment
                           Isolate       |            Isolate
                                         | ---> (worker thread 3)
                                                    Environment
                                                      Isolate

From there, we could allow a parent worker to be replaced by one of the children, but as you point out it gets complicated when we ask what happens with the other child.

  (root thread) | ---> (worker thread 1) | ---> (worker thread 2)  (replace parent with this thread)
                         Environment     |         Environment
                           Isolate       |           Isolate
                                         | ---> (worker thread 3)
                                                   Environment
                                                     Isolate

  (root thread) | ---> (worker thread 2)
                |         Environment
                |           Isolate
                | ---> (worker thread 3)
                          Environment
                            Isolate

There are a couple of options. (a) We could terminate orphan workers automatically, (b) We could just leave them orphaned, (c) We could transfer responsibility for them to the new parent.

I think we can all agree that (b) is not a great choice. We can let the user code decide between (a) and (c).

For (c), the DAG would end up as:

  (root thread) | ---> (worker thread 2)  | ---> (worker thread 3)
                          Environment               Environment
                            Isolate                   Isolate

Obviously there are a ton of complexities wrapped up in that!

The key question I would have initially is: Is replacing threads really what we need? Alternatively, we could implement the ability for a Worker Thread to transfer ownership/responsibility for a child thread to a different thread.

For instance, the Node.js main process thread can load the apm implementation in a separate thread, which does its thing and spawns the instrumented application code thread. Once the application code thread is spun up, the APM thread transfers ownership of that back to the main thread then exits. This, however, obviously does not provide the requirement of the spun up worker acting like it's the main thread (no piped i/o, etc)

The other possibility is whether APM's could just be written to act as alternative launchers, or embedders, of the Node.js process so that they can inject the loaders the extensive code changes that would make replacing the thread possible. I know that has its own associated complexities, however.

@bmeck
Copy link
Member Author

bmeck commented Apr 28, 2021

The key question I would have initially is: Is replacing threads really what we need? Alternatively, we could implement the ability for a Worker Thread to transfer ownership/responsibility for a child thread to a different thread.

Well, the problem is only the main thread can act like a main thread given how worker_threads are implemented, we could also make this just be some form of allowing worker threads to look and act exactly like the main thread? I didn't want to approach it that way since things can break if running in worker threads and they don't have the proper APIs due to limitations on worker_threads.

For instance, the Node.js main process thread can load the apm implementation in a separate thread, which does its thing and spawns the instrumented application code thread. Once the application code thread is spun up, the APM thread transfers ownership of that back to the main thread then exits.

This would seem fine to me, but still has the problem of the application living in a worker_thread environment which doesn't exactly match the main thread environment. So, this doesn't seem viable and more than other approaches.

The other possibility is whether APM's could just be written to act as alternative launchers, or embedders, of the Node.js process so that they can inject the loaders the extensive code changes that would make replacing the thread possible. I know that has its own associated complexities, however.

How would this be any more able to solve the real world issues than the existing --loader where they do not completely control the env they are running on? Also, doing this as custom embedders seems even harder to compose multiple APMs than --loader.

@devsnek
Copy link
Member

devsnek commented Apr 28, 2021

None given how ESM works and links eagerly this is the only solution I've found.

Theoretically couldn't you use a cjs entrypoint that calls import()?

@bmeck
Copy link
Member Author

bmeck commented Apr 28, 2021

@devsnek if the loader hasn't been used yet maybe (--require can call out to import)? but it also wouldn't solve for ESM entrypoints.

@jasnell
Copy link
Member

jasnell commented Apr 28, 2021

How would this be any more able to solve the real world issues than the existing --loader where they do not completely control the env they are running on? Also, doing this as custom embedders seems even harder to compose multiple APMs than --loader.

As I said, it has it's own complexities ;-) ... there's not going to be an easy and simple solution to the problem, unfortunately. At this point I'm just saying we need to consider alternatives and rule them out to try to narrow in on the least complicated and difficult approach. I'm nowhere near convinced that the work necessary to make threads replaceable is it.

@bmeck
Copy link
Member Author

bmeck commented Apr 28, 2021

As I said, it has it's own complexities ;-) ... there's not going to be an easy and simple solution to the problem, unfortunately. At this point I'm just saying we need to consider alternatives and rule them out to try to narrow in on the least complicated and difficult approach.

I'd note that solving for "least complicated and difficult" varies wildly on what you are trying to optimize for. I think we should optimize for some usability beyond custom builds of node since this space is a fairly generic one.

I'm nowhere near convinced that the work necessary to make threads replaceable is it.

As I've stated above, allowing worker threads to act exactly like the main thread would be sufficient as well. Though it would mean effectively leave a dead thread if the original isn't destroyed I'd assume. Also, I would not I am not talking about "native" threads here but conceptual ones. I could easily imagine the replace behavior spinning up an Isolate in the same native thread.

@jasnell
Copy link
Member

jasnell commented Apr 28, 2021

I could easily imagine the replace behavior spinning up an Isolate in the same native thread.

This is an interesting approach also, and one that I've been stewing over... essentially, allowing multiple Environment/Isolates to exist within the same thread and sharing the same event loop. I was hoping to get around to playing with this soon.

@addaleax
Copy link
Member

I could easily imagine the replace behavior spinning up an Isolate in the same native thread.

This is an interesting approach also, and one that I've been stewing over... essentially, allowing multiple Environment/Isolates to exist within the same thread and sharing the same event loop. I was hoping to get around to playing with this soon.

For multiple Environments in the same thread and event loop, this exists: https://github.com/addaleax/synchronous-worker

@jasnell
Copy link
Member

jasnell commented Apr 30, 2021

Oh very nice. I had missed that!

@bmeck bmeck added the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label May 6, 2021
@cspotcode
Copy link

Theoretically couldn't you use a cjs entrypoint that calls import()?

if the loader hasn't been used yet maybe (--require can call out to import)? but it also wouldn't solve for ESM entrypoints.

What about dynamic import in an ESM entrypoint?

@bmeck
Copy link
Member Author

bmeck commented May 24, 2021

@cspotcode this leads to a similar situation to how Loaders work today. Loaders actually have a separate Module Map (Registry if you like that term) from application code due to some modules being loaded in the instrumentation and having side effects. To avoid side effects like prototype pollution and/or leaking via singletons of the instrumentation code (not the globals, use --frozen-primordials for now) a separate module instance is loaded for the same URL for Loaders different from the one loaded for Application code. Overall, the problem with stating the entrypoint should just use dynamic import is similar to what requiring a CJS entrypoint is since you can use dynamic import also in CJS.

@cspotcode
Copy link

I'm thinking that the application entrypoint can be something like this:

// entrypoint.mjs
await installInstrumentation();
import('./rest-of-the-application');

And the instrumentation -- whatever it may be -- will apply to the application's entire module graph, with the exception of entrypoint.mjs and whatever it loads to installInstrumentation()

It seems like the above should work if the instrumentation is bundled (via bundleDependencies or other solution) to avoid sharing modules with application code.

Is that the issue, that instrumentation might share dependencies with the application, and thus instrumentation-loaded dependencies will not be instrumented?

@bmeck
Copy link
Member Author

bmeck commented May 24, 2021

Is that the issue, that instrumentation might share dependencies with the application, and thus instrumentation-loaded dependencies will not be instrumented?

Half of that.

Is that the issue, that instrumentation might share dependencies with the application

Yes

thus instrumentation-loaded dependencies will not be instrumented?

This is just one reason, other reasons are things like adding state in observable ways through instrumentation (e.g. how eventEmitter._events is visible and/or doing things like prepend / removing of instrumentation via APIs), or instrumentation code using singleton modules where state is stored.

Honestly, in order to force things to be instrumented for application code, it isn't terrible, just make all app code marked w/ a special query parameter that you strip out of import.meta.url on each module loaded. That however, is a terrible solution due to a variety of reasons like mutating source code and/or some not having query parameter equivalents. This kind of terrible solution was done to prove service workers could in theory emulate the loader API in https://github.com/bmeck/node-sw-compat-loader-test and it is kind of a mess as can be seen.

@GeoffreyBooth GeoffreyBooth removed the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. worker Issues and PRs related to Worker support.
Projects
None yet
Development

No branches or pull requests

6 participants