Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: t31135461 / Refine jest-worker API #6676

Closed
wants to merge 19 commits into from

Conversation

Kureev
Copy link

@Kureev Kureev commented Jul 11, 2018

Summary

Node 10 shipped with a "threading API" that uses SharedBuffers to communicate between the main process and its child threads. Being jest-worker a parallelization library, we can take advantage of this API when available. At the same time, we'll decouple our scheduling logic from our communication logic for better re-usability.

Here are some key things:

  • Decouple the scheduling logic (i.e. everything that's not specifically to send and receive jobs)
  • Add the option to use "threads" into the library. If they are present, we'll favor them.

Performance

➜ node --experimental-worker --expose-gc test.js empty 10000
---------------------------------------------------------------------------
jest-worker: { globalTime: 272, processingTime: 217 }
worker-farm: { globalTime: 521, processingTime: 462 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 244, processingTime: 210 }
worker-farm: { globalTime: 502, processingTime: 451 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 290, processingTime: 253 }
worker-farm: { globalTime: 455, processingTime: 433 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 262, processingTime: 229 }
worker-farm: { globalTime: 668, processingTime: 634 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 280, processingTime: 222 }
worker-farm: { globalTime: 600, processingTime: 548 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 222, processingTime: 181 }
worker-farm: { globalTime: 490, processingTime: 459 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 225, processingTime: 181 }
worker-farm: { globalTime: 469, processingTime: 428 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 220, processingTime: 180 }
worker-farm: { globalTime: 437, processingTime: 407 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 214, processingTime: 176 }
worker-farm: { globalTime: 489, processingTime: 444 }
---------------------------------------------------------------------------
jest-worker: { globalTime: 241, processingTime: 191 }
worker-farm: { globalTime: 440, processingTime: 412 }
---------------------------------------------------------------------------
total worker-farm: { wFGT: 5071, wFPT: 4678 }
total jest-worker: { jWGT: 2470, jWPT: 2040 }
---------------------------------------------------------------------------
% improvement over 10000 calls (global time): 51.29165845000986
% improvement over 10000 calls (processing time): 56.39162035057717

Test plan

TBA

@Kureev Kureev self-assigned this Jul 11, 2018
@Kureev Kureev requested review from rafeca and mjesun July 11, 2018 10:11
@Kureev
Copy link
Author

Kureev commented Jul 11, 2018

Please, take into account that this is a WIP PR. Currently, there are a few parts missing:

  • Retry in case of a failure
  • Tests
  • Usage docs

this._locks = [];
this._offset = 0;

// If we exceeded the amount of retries, we will emulate an error reply
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part will be rewritten and removed from here. This a part of missing functionality I've mentioned in the first comment to the PR

const numOfWorkers = this._workerPool.getWorkers().length;
for (let i = 0; i < numOfWorkers; i++) {
const workerIdx = (this._offset + i) % numOfWorkers;
this.enqueue(task, workerIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate the initial part of the function into another one, to avoid recursivity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use enqueue(QueueChildMessage, number) for scheduling to a specific worker and push(QueueChildMessage) to schedule a job in all workers. wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is to move the for loop into the parent implementation, where you will call enqueue always with a workerId. This way workerId won't be optional anymore. This will also allow you to remove getWorkers, because you won't need it to get the length anymore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the design perspective I'd say Farm shouldn't be concerned how WorkerPool spread tasks among workers. It is fine that Farm can pin job to the worker if it has a reference to it, however it isn't cool if Farm contains queue distribution logic in it 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 Let's separate it into a different method as discussed offline.

_options: FarmOptions;
_stderr: Readable;
_stdout: Readable;
_workers: Array<WorkerInterface>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface shouldn't have any of the private variables.

send(request: ChildMessage, onProcessStart: OnStart, onProcessEnd: OnEnd) {
onProcessStart(this);
this._onProcessEnd = onProcessEnd;
// $FlowFixMe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¿
this._child.send(request) doesn't expect to receive a request type

]);
}

function execMethod(method: string, args: $ReadOnlyArray<any>): void {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a common file within the workers/ folder that abstract this functionality? It looks like child and threadChild are 90% the same.

@@ -32,14 +34,16 @@ let file = null;
* If an invalid message is detected, the child will exit (by throwing) with a
* non-zero exit code.
*/
process.on('message', (request: any /* Should be ChildMessage */) => {
process.on('message', (request: any) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this file to processChild as opposed to threadChild.

const workerOptions: WorkerOptions = {
forkOptions: options.forkOptions || {},
maxRetries: options.maxRetries || 3,
useNodeWorkersIfPossible: options.useNodeWorkersIfPossible,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this and make it true always.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense as an option, doesn't it? Seems like an opt-in thing (beyond using a version of node where it's unflagged)

Copy link
Author

@Kureev Kureev Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no real reason why you'd like to use child_process instead of worker_threads when the second one is available. According to our benchmarks, implementation using workers works ~ twice as fast

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API stability, there might be behavioural differences. Upgrading my app from node 10 to 11 (or whenever they remove the flag) shouldn't change the underlying implementation IMO. The docs for the module says it uses multiple processes, and just silently not doing that, depending on node version, seems risky to me

FWIW I could get behind setting it to true by default so it's possible to opt-out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrading my app from node 10 to 11 (or whenever they remove the flag) shouldn't change the underlying implementation IMO

I think that underlying implementation is not something that should concern an average library user. Public interface stays completely the same (which is crucial for all library users), while "engine" is getting faster. Same happens with the language and other libraries like React (Fiber rewrite) etc.

The docs for the module says it uses multiple processes, and just silently not doing that, depending on node version, seems risky to me

I'm completely on the same page, but we're going to cover all that by specs so there should be no issues with unexpected regression bugs.

I think @mjesun can give you more insights.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's re-write the docs as "we'll choose the most-performant implementation that is possible". People can enforce one or another by explicitly passing the corresponding worker pool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

return;
}
const task = {onEnd, onStart, request};
const workerId = worker ? worker.getWorkerId() : undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's unify the interface. Either we work with worker instances or with worker ids, but I see a mix in the API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. We use workerIds everywhere up to the pool


if (job.owner) {
this._queue[workerId] = job.next ? job.next : null;
return this.run(workerId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove recursivity here and use a while loop to find the first job available. Don't lock by worker, lock by job.

@SimenB
Copy link
Member

SimenB commented Jul 11, 2018

Ooooooh, nice!

Copy link
Member

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also missing docs, in addition to the WIP points highlighted in a comment 🙂

Excited about this!

@@ -0,0 +1,33 @@
/**
* Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
Copy link
Member

@SimenB SimenB Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

18? I have to admit I don't know what these should say

@Kureev
Copy link
Author

Kureev commented Jul 11, 2018

@SimenB you haven't seen the performance part yet 😉

P.S. Added "Usage docs" as a missing part. I think it should come along with this PR to avoid inconsistency

if (workerId != null) {
if (this._queue[workerId]) {
this._last[workerId].next = task;
this._last[workerId] = task;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is common.


this._workerPool.send(worker, job.request, job.onStart, onEnd);

job.owner = worker;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert owner to a boolean.

const job = this._queue[workerId];
const worker = this._workerPool.getWorkers()[workerId];

if (!job) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


this.lock(workerId);

this._workerPool.send(worker, job.request, job.onStart, onEnd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an example of API unification: let's use workerId.

// $FlowFixMe: This has to be a dynamic require.
const child = require(workerPath);
this._cacheKeys = Object.create(null);
this._options = Object.assign({}, defaultFarmOptions, options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to make all configuration non-optional from here down the line (e.g. your input says numWorkers: ?number, but the _options object passed here should not be optional (i.e. numWorkers: number).

PARENT_MESSAGE_ERROR,
CHILD_MESSAGE_INITIALIZE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical order


const onEnd = (error: ?Error, result: mixed, worker: WorkerInterface) => {
this.unlock(workerId);
job.onEnd(error, result, worker);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we swap this with the previous line? Conceptually sounds better to keep the worker locked while finishing, just in case onEnd decides to go wild.

}

getNextJob(workerId: number): ?QueueChildMessage {
if (!this._queue[workerId]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache the entire queue in const queue = this._queue[workerId].

break;
}
this._queue[workerId] = this._queue[workerId].next;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const numOfWorkers = this._workerPool.getWorkers().length;
for (let i = 0; i < numOfWorkers; i++) {
const workerIdx = (this._offset + i) % numOfWorkers;
this.enqueue(task, workerIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is to move the for loop into the parent implementation, where you will call enqueue always with a workerId. This way workerId won't be optional anymore. This will also allow you to remove getWorkers, because you won't need it to get the length anymore.


this._workers[i] = worker;

// $FlowFixMe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add explanation to all FlowFixMes.

You can get rid of this ones by redefining at the top the result, as a variable (class methods are covariant in Flow). For instance, putting at the top: getStderr: () => BaseWorkerPool; would do the trick.

Anyway, I'm not sure why do we need to bind them all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't need it anymore. Will remove them accordingly

@Kureev Kureev force-pushed the feature/T31135461 branch from 1ad9aa5 to 29e04d2 Compare July 12, 2018 11:01
@SimenB
Copy link
Member

SimenB commented Aug 15, 2018

Any news on this one? I'm excited!

@SimenB
Copy link
Member

SimenB commented Sep 15, 2018

@mjesun @Kureev ping 🙂 Would be awesome to land this. Currently landing some breaking changes, so this should be fine to land as part of that if needed

@Kureev
Copy link
Author

Kureev commented Sep 24, 2018

Hi @SimenB! :) Sorry for being silent. I'm working in a different team now, might be someone would like to take it over? IIRC, it's only about tests now. cc @mjesun

@SimenB SimenB added this to the Jest 24 milestone Oct 16, 2018
@SimenB
Copy link
Member

SimenB commented Oct 17, 2018

@mjesun @rubennorte anyone at FB working on this? If not, anyone able to say what works and what doesn't with this PR so we can land it?

@rubennorte
Copy link
Contributor

@SimenB AFAIK no one is working on this right now. I might take a look in a few weeks but I don't think it's going to be ready for the next major. Feel free to take over from here if you want to work on this (or anyone else).

@SimenB
Copy link
Member

SimenB commented Oct 17, 2018

Might look into it if you could say some about what's missing. Just getting CI green? More tests?

@rickhanlonii
Copy link
Member

I merged in master and wrote tests for this on #7408

Separated the PRs to get a clean review 👍

@SimenB
Copy link
Member

SimenB commented Dec 5, 2018

Landed

@SimenB SimenB closed this Dec 5, 2018
@SimenB SimenB mentioned this pull request Jan 22, 2019
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants