Allow main thread to be supplanted by another #38454

bmeck · 2021-04-28T14:26:54Z

Is your feature request related to a problem? Please describe.
Please describe the problem you are trying to solve.

ESM in Node requires command line flags in order to use --loader. This is because if we try to provide an API to customize the loader, the entire module graph for the application is already linked and it is too late to do so. Common tools like APMs run on environments like cloud functions that do not support customizing CLI flags but need to muck with the loading systems.

Describe the solution you'd like
Please describe the desired behavior.

When spinning up workers, we can already custom configure --loader since that module graph hasn't linked yet. I would like for applications to be able to do so in a way that makes the worker act exactly as the original main thread: no piping of I/O, no limitations on process, etc.

This likely would be something like:

// entrypoint
import init from 'apm'
init('path/to/app');

// APM
const loader = 'path/to/loader.mjs';
export function init(entry) {
  worker_threads.replaceSelf(entry);
}

Likely this feature needs to sort out file descriptor behavior in particular when replacing the main/current thread and what happens with potentially orphaned workers.

Describe alternatives you've considered
Please describe alternative solutions or features you have considered.

None given how ESM works and links eagerly this is the only solution I've found.

The text was updated successfully, but these errors were encountered:

jasnell · 2021-04-28T14:57:48Z

This is definitely a difficult problem given the relationship of Environment to the thread and the way the main thread is bootstrapped. The only way I could imagine being able to tackle this would be to redefine the main thread as "Just Another Type of Worker Thread" that is spun off a top level root thread.

That is, if we think about the current structure as a DAG, what we currently have is:

 (root thread 1) | ---> (worker thread 2)
  Environment    |         Environment
    Isolate      |           Isolate
                 |
                 | ---> (worker thread 3)
                           Environment
                             Isolate

To do what you're suggesting, we would likely need to move to a model like:

  (root thread) | ---> (worker thread 1) | ---> (worker thread 2)
                         Environment     |          Environment
                           Isolate       |            Isolate
                                         | ---> (worker thread 3)
                                                    Environment
                                                      Isolate

From there, we could allow a parent worker to be replaced by one of the children, but as you point out it gets complicated when we ask what happens with the other child.

  (root thread) | ---> (worker thread 1) | ---> (worker thread 2)  (replace parent with this thread)
                         Environment     |         Environment
                           Isolate       |           Isolate
                                         | ---> (worker thread 3)
                                                   Environment
                                                     Isolate

  (root thread) | ---> (worker thread 2)
                |         Environment
                |           Isolate
                | ---> (worker thread 3)
                          Environment
                            Isolate

There are a couple of options. (a) We could terminate orphan workers automatically, (b) We could just leave them orphaned, (c) We could transfer responsibility for them to the new parent.

I think we can all agree that (b) is not a great choice. We can let the user code decide between (a) and (c).

For (c), the DAG would end up as:

  (root thread) | ---> (worker thread 2)  | ---> (worker thread 3)
                          Environment               Environment
                            Isolate                   Isolate

Obviously there are a ton of complexities wrapped up in that!

The key question I would have initially is: Is replacing threads really what we need? Alternatively, we could implement the ability for a Worker Thread to transfer ownership/responsibility for a child thread to a different thread.

For instance, the Node.js main process thread can load the apm implementation in a separate thread, which does its thing and spawns the instrumented application code thread. Once the application code thread is spun up, the APM thread transfers ownership of that back to the main thread then exits. This, however, obviously does not provide the requirement of the spun up worker acting like it's the main thread (no piped i/o, etc)

The other possibility is whether APM's could just be written to act as alternative launchers, or embedders, of the Node.js process so that they can inject the loaders the extensive code changes that would make replacing the thread possible. I know that has its own associated complexities, however.

bmeck · 2021-04-28T15:04:06Z

The key question I would have initially is: Is replacing threads really what we need? Alternatively, we could implement the ability for a Worker Thread to transfer ownership/responsibility for a child thread to a different thread.

Well, the problem is only the main thread can act like a main thread given how worker_threads are implemented, we could also make this just be some form of allowing worker threads to look and act exactly like the main thread? I didn't want to approach it that way since things can break if running in worker threads and they don't have the proper APIs due to limitations on worker_threads.

For instance, the Node.js main process thread can load the apm implementation in a separate thread, which does its thing and spawns the instrumented application code thread. Once the application code thread is spun up, the APM thread transfers ownership of that back to the main thread then exits.

This would seem fine to me, but still has the problem of the application living in a worker_thread environment which doesn't exactly match the main thread environment. So, this doesn't seem viable and more than other approaches.

The other possibility is whether APM's could just be written to act as alternative launchers, or embedders, of the Node.js process so that they can inject the loaders the extensive code changes that would make replacing the thread possible. I know that has its own associated complexities, however.

How would this be any more able to solve the real world issues than the existing --loader where they do not completely control the env they are running on? Also, doing this as custom embedders seems even harder to compose multiple APMs than --loader.

devsnek · 2021-04-28T15:07:16Z

None given how ESM works and links eagerly this is the only solution I've found.

Theoretically couldn't you use a cjs entrypoint that calls import()?

bmeck · 2021-04-28T15:10:35Z

@devsnek if the loader hasn't been used yet maybe (--require can call out to import)? but it also wouldn't solve for ESM entrypoints.

jasnell · 2021-04-28T15:10:48Z

How would this be any more able to solve the real world issues than the existing --loader where they do not completely control the env they are running on? Also, doing this as custom embedders seems even harder to compose multiple APMs than --loader.

As I said, it has it's own complexities ;-) ... there's not going to be an easy and simple solution to the problem, unfortunately. At this point I'm just saying we need to consider alternatives and rule them out to try to narrow in on the least complicated and difficult approach. I'm nowhere near convinced that the work necessary to make threads replaceable is it.

bmeck · 2021-04-28T15:15:38Z

As I said, it has it's own complexities ;-) ... there's not going to be an easy and simple solution to the problem, unfortunately. At this point I'm just saying we need to consider alternatives and rule them out to try to narrow in on the least complicated and difficult approach.

I'd note that solving for "least complicated and difficult" varies wildly on what you are trying to optimize for. I think we should optimize for some usability beyond custom builds of node since this space is a fairly generic one.

I'm nowhere near convinced that the work necessary to make threads replaceable is it.

As I've stated above, allowing worker threads to act exactly like the main thread would be sufficient as well. Though it would mean effectively leave a dead thread if the original isn't destroyed I'd assume. Also, I would not I am not talking about "native" threads here but conceptual ones. I could easily imagine the replace behavior spinning up an Isolate in the same native thread.

jasnell · 2021-04-28T15:17:55Z

I could easily imagine the replace behavior spinning up an Isolate in the same native thread.

This is an interesting approach also, and one that I've been stewing over... essentially, allowing multiple Environment/Isolates to exist within the same thread and sharing the same event loop. I was hoping to get around to playing with this soon.

addaleax · 2021-04-30T14:48:27Z

I could easily imagine the replace behavior spinning up an Isolate in the same native thread.

This is an interesting approach also, and one that I've been stewing over... essentially, allowing multiple Environment/Isolates to exist within the same thread and sharing the same event loop. I was hoping to get around to playing with this soon.

For multiple Environments in the same thread and event loop, this exists: https://github.com/addaleax/synchronous-worker

jasnell · 2021-04-30T15:04:32Z

Oh very nice. I had missed that!

cspotcode · 2021-05-24T07:09:15Z

Theoretically couldn't you use a cjs entrypoint that calls import()?

if the loader hasn't been used yet maybe (--require can call out to import)? but it also wouldn't solve for ESM entrypoints.

What about dynamic import in an ESM entrypoint?

bmeck · 2021-05-24T13:13:16Z

@cspotcode this leads to a similar situation to how Loaders work today. Loaders actually have a separate Module Map (Registry if you like that term) from application code due to some modules being loaded in the instrumentation and having side effects. To avoid side effects like prototype pollution and/or leaking via singletons of the instrumentation code (not the globals, use --frozen-primordials for now) a separate module instance is loaded for the same URL for Loaders different from the one loaded for Application code. Overall, the problem with stating the entrypoint should just use dynamic import is similar to what requiring a CJS entrypoint is since you can use dynamic import also in CJS.

cspotcode · 2021-05-24T16:31:22Z

I'm thinking that the application entrypoint can be something like this:

// entrypoint.mjs
await installInstrumentation();
import('./rest-of-the-application');

And the instrumentation -- whatever it may be -- will apply to the application's entire module graph, with the exception of entrypoint.mjs and whatever it loads to installInstrumentation()

It seems like the above should work if the instrumentation is bundled (via bundleDependencies or other solution) to avoid sharing modules with application code.

Is that the issue, that instrumentation might share dependencies with the application, and thus instrumentation-loaded dependencies will not be instrumented?

bmeck · 2021-05-24T17:08:16Z

Is that the issue, that instrumentation might share dependencies with the application, and thus instrumentation-loaded dependencies will not be instrumented?

Half of that.

Is that the issue, that instrumentation might share dependencies with the application

Yes

thus instrumentation-loaded dependencies will not be instrumented?

This is just one reason, other reasons are things like adding state in observable ways through instrumentation (e.g. how eventEmitter._events is visible and/or doing things like prepend / removing of instrumentation via APIs), or instrumentation code using singleton modules where state is stored.

Honestly, in order to force things to be instrumented for application code, it isn't terrible, just make all app code marked w/ a special query parameter that you strip out of import.meta.url on each module loaded. That however, is a terrible solution due to a variety of reasons like mutating source code and/or some not having query parameter equivalents. This kind of terrible solution was done to prove service workers could in theory emulate the loader API in https://github.com/bmeck/node-sw-compat-loader-test and it is kind of a mess as can be seen.

bmeck added esm Issues and PRs related to the ECMAScript Modules implementation. worker Issues and PRs related to Worker support. labels Apr 28, 2021

bmeck added the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label May 6, 2021

mhdawson mentioned this issue May 10, 2021

Node.js Loaders Team Meeting 2021-05-14 nodejs/loaders#2

Closed

mhdawson mentioned this issue May 24, 2021

Node.js Loaders Team Meeting 2021-05-28 nodejs/loaders#6

Closed

bmeck mentioned this issue May 27, 2021

Tool config transpilation for webpack, gulp, etc nodejs/loaders#7

Open

GeoffreyBooth removed the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label May 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow main thread to be supplanted by another #38454

Allow main thread to be supplanted by another #38454

bmeck commented Apr 28, 2021

jasnell commented Apr 28, 2021 •

edited

Loading

bmeck commented Apr 28, 2021

devsnek commented Apr 28, 2021

bmeck commented Apr 28, 2021

jasnell commented Apr 28, 2021

bmeck commented Apr 28, 2021

jasnell commented Apr 28, 2021

addaleax commented Apr 30, 2021

jasnell commented Apr 30, 2021

cspotcode commented May 24, 2021

bmeck commented May 24, 2021

cspotcode commented May 24, 2021

bmeck commented May 24, 2021

Allow main thread to be supplanted by another #38454

Allow main thread to be supplanted by another #38454

Comments

bmeck commented Apr 28, 2021

jasnell commented Apr 28, 2021 • edited Loading

bmeck commented Apr 28, 2021

devsnek commented Apr 28, 2021

bmeck commented Apr 28, 2021

jasnell commented Apr 28, 2021

bmeck commented Apr 28, 2021

jasnell commented Apr 28, 2021

addaleax commented Apr 30, 2021

jasnell commented Apr 30, 2021

cspotcode commented May 24, 2021

bmeck commented May 24, 2021

cspotcode commented May 24, 2021

bmeck commented May 24, 2021

jasnell commented Apr 28, 2021 •

edited

Loading