TODO: progress towards v0.3.0 #9
Yeah, it's basically feature complete. But unfortunately it has a serious flaw: for the sake of performance we need deterministic scopes (arrayfire/arrayfire-dotnet#8 (comment)). But that implies that we call destructors synchronously, and because destructors are synchronization points in ArrayFire, ArrayFire.js is synchronous despite my best effort to make it asynchronous. I'm thinking about a new approach, but that's gonna make every call and value access asynchronous for sure, which is kinda ugly and hurts performance. So it works, but in its current state it blocks the event loop. EDIT: TODO list of new version:
|
@unbornchikken Thanks for the quick reply. |
@pavanky can give you more detail, but the destructor shouldn't be a blocking call. It should be managing the reference counts for all arrays, and an array should be marked for deletion once all of the work is done. Those objects will be deleted at a later time (if memory of that size is not needed or the garbage collector is called). That event needs to be blocking because the GPU drivers perform a synchronization on the device, but we try to avoid that whenever possible. |
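The deferred-deletion scheme described above can be sketched in plain JavaScript. This is a simplified illustration only; `RefCountedBuffer` and `BufferPool` are hypothetical names, not the actual ArrayFire memory manager:

```javascript
// Simplified sketch of deferred deletion via reference counting.
// A buffer is only marked for deletion when its count reaches zero;
// the pool may then reuse it later instead of freeing it synchronously
// (a synchronous free would force a device synchronization).
class RefCountedBuffer {
  constructor(size, pool) {
    this.size = size;
    this.pool = pool;
    this.refs = 1;
  }
  retain() { this.refs++; return this; }
  release() {
    if (--this.refs === 0) {
      // Not freed immediately: returned to a pool for later reuse.
      this.pool.markForDeletion(this);
    }
  }
}

class BufferPool {
  constructor() { this.free = []; }
  markForDeletion(buf) { this.free.push(buf); }
  acquire(size) {
    // Reuse a previously released buffer of the same size if possible.
    const i = this.free.findIndex(b => b.size === size);
    if (i >= 0) {
      const buf = this.free.splice(i, 1)[0];
      buf.refs = 1;
      return buf;
    }
    return new RefCountedBuffer(size, this);
  }
}
```

The point of the sketch is only that "delete" becomes a cheap bookkeeping operation, and actual device memory gets recycled lazily.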
Ok, I'm gonna work out a simple repro case with a code flow that - in theory - shouldn't block at all, but according to v8 performance data it does. |
@mehdi-cit as you can see, this project is still active, but life events prevented me from making significant progress on it in the last few months. But since my pet ML project still has, and will have, a dependency on ArrayFire.js (and on CMake.js), you can expect me to put my focus back on those eventually. |
Not sure if this can help with the issue at hand, but it could be a good alternative when it comes to "integrating" JavaScript and C++ code: |
@mehdi-cit unfortunately it's not as simple as nbind makes it look. In ArrayFire there are a bunch of operations that act as synchronization points: constructors, complex operators, memory copies, etc. Which means, if you wrap them naively as-is, then you're gonna block the main loop at that point until all of the previously enqueued AF operations get completed. In Node.js you should never block the event loop. Let's say nbind supports asynchronous operations the standard way, I mean by using nan's async workers: https://github.com/nodejs/nan/blob/master/doc/asyncworker.md (note: almost all native library wrappers are doing this). But AsyncWorker uses libuv worker threads, so you gotta synchronize AF calls somehow. In the current version you gotta use manual locks, but eventually thread safe ArrayFire will land and make that unnecessary. If you lock libuv workers then you'll serialize them whenever there is more than one AF operation executing in parallel. Which means you'll kill libuv entirely and make Node.js totally synchronous, which is really, really bad. The only viable option is to launch a separate libuv loop for AF and make your binding a proxy for that. Well, this is where things get really complicated, especially if you're interacting with V8 in C++, because of the verbosity and complexity of V8/nan. That's why I'm creating fastcall. It will offer about the same performance that you could get with C++ based bindings (dyncall really is that fast; according to my benchmarks, its overhead is negligible, 5%-ish), with the above mentioned separate libuv loop support. Once fastcall stabilizes, I'll be back to this project. However I gotta invent something for proper RAII in JS, because of arrayfire/arrayfire-dotnet#8 (comment) |
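The "separate loop acting as a proxy" idea above can be approximated in plain JavaScript as a promise chain that serializes all native calls onto one queue, so ordering is preserved without ever locking libuv's shared worker threads. This is a conceptual sketch only; `callNative` is a hypothetical stand-in for the real binding:

```javascript
// Conceptual sketch: serialize all ArrayFire calls through one queue so
// only a single dedicated "loop" ever touches the (non-thread-safe)
// library, instead of locking libuv's shared worker threads.
function makeSerializedCaller(callNative) {
  let tail = Promise.resolve();
  return function enqueue(op, ...args) {
    const result = tail.then(() => callNative(op, ...args));
    // Later calls wait for this one, preserving AF's ordering,
    // but the JS event loop itself is never blocked.
    tail = result.catch(() => {});
    return result;
  };
}
```

Each caller gets back a Promise immediately; the actual native work happens strictly one-at-a-time in submission order.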
I've been watching arrayfire.js for a few months now, and am in love with it. I'm working on https://github.com/harthur-org/brain.js & its recurrent neural net, and want to connect it to arrayfire.js at some future point in time when things line up. I've spent a great deal of time researching before coding anything, and started here (amongst many research papers, how-to articles, and many other libraries) for the most part: After reviewing each of the major methods that are associated with the overall mathematical procedures, I found something that concerned me:
The reason this concerned me was, first, those are (from past experience) memory leaks, and after reviewing the recurrent neural net in arrayfire.js, I see a similar approach (please note, I am still very naive about arrayfire.js, and this isn't a slam against the library, but rather rethinking the semi-normal). It got me thinking: how could we greatly speed this up? Or rather, how can we use fewer resources to do the same thing? One of the biggest bottlenecks of multi-threading is, of course, moving memory to and from the device.

Tinkering with the idea, I came up with a few pseudo code sessions to try and wrap my head around what I was aiming for eventually, and yes, I'll say it was a yak shave at best. This was how I saw the operation on the first go around ([outlined here](https://github.com/BrainJS/brain.js/issues/24)); I thought: really what we are trying to do is build a state tree, just like in parsing, so you'd have something like:
This would repeat over and over again, depending on the complexity of your math.
In this scenario, rather than doing just-in-time operations on math, we'd actually set up a math equation on the gpu that could be fed a set of data and would at some future point in time return the answer to it. In the original neural net, the setup doesn't really exist, other than some tricky:

```js
var h0 = this.multiply(hiddenMatrix.weight, inputVector, this);
var h1 = this.multiply(hiddenMatrix.transition, hiddenPrev, this);
var hiddenD = this.relu(this.add(this.add(h0, h1, this), hiddenMatrix.bias, this), this);
```

What I'm proposing is that with this new thinking, we'd set up a math problem that could be used, similar to:

```js
var eq = new Equation();
return eq.relu(
  eq.add(
    eq.add(
      eq.multiply(
        hiddenModel.weight,
        input
      ),
      eq.multiply(
        hiddenModel.transition,
        previousInput
      )
    ),
    hiddenModel.bias
  )
);
```

Which instantiates an equation that can be used like:
|
I accidentally hit enter before I was done. So my question is: would this answer the problem we are having of a blocking synchronous thread? By giving the gpu the whole problem, with minimal in and out, would that address or even help, so that we could have a full-fledged multi-threaded approach? |
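The deferred-equation idea proposed above can be sketched minimally. The `Equation` class here is hypothetical and operates on plain numbers purely for illustration of the build-once/run-many shape; a real implementation would keep operands on the GPU and only transfer the final result:

```javascript
// Hypothetical sketch: build the expression tree once, run it many times.
// Nodes are plain objects; nothing is computed until run() is called.
class Equation {
  input(name) { return { kind: 'input', name }; }
  multiply(a, b) { return { kind: 'mul', a, b }; }
  add(a, b) { return { kind: 'add', a, b }; }
  relu(a) { return { kind: 'relu', a }; }
  run(root, inputs) {
    const ev = n => {
      switch (n.kind) {
        case 'input': return inputs[n.name];
        case 'mul': return ev(n.a) * ev(n.b);
        case 'add': return ev(n.a) + ev(n.b);
        case 'relu': return Math.max(0, ev(n.a));
      }
    };
    return ev(root);
  }
}
```

The same tree can then be fed different inputs on each `run`, which is the property that would let a GPU backend keep the whole computation resident on the device.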
@UniqueFool, curious your thoughts here. |
@robertleeplummerjr Just wait for the new fastcall based bindings to come out before taking any serious dependency on AF.js, please. In this version I'm working on the fully asynchronous, declarative approach that you proposed, with just one huge exception: I wanna have control flow too, not just expressions! Like:

```js
const result = yield raii.scope(() => {
    const arr1 = af.randu(42);
    const arr2 = af.constant(0, 42);
    for (let i = 0; i < 10; i++) {
        arr2.set(Math.random() * 42, arr1.get(Math.random() * 42));
    }
    return arr1.host();
});
```

You'll get yer plain old JavaScript, but that doesn't get executed right away. It gets enqueued in a separate libuv loop, and you'll get a Promise that resolves asynchronously once all of the operations get completed on the device. And there will be an asynchronous RAII mechanism that will do exactly the same automatic RAM and VRAM resource management that the C++ bindings have. |
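The asynchronous RAII mechanism mentioned above can be approximated with a scope that tracks resources created inside it and disposes them once the scope's promise settles. This is only an illustrative sketch under assumed names (`makeScope`, `track`, `dispose`), not the actual ArrayFire.js API:

```javascript
// Sketch of an asynchronous RAII scope: resources registered during the
// scope body are disposed automatically when the scope's promise settles,
// whether it resolved or rejected.
function makeScope() {
  const tracked = [];
  return {
    track(resource) { tracked.push(resource); return resource; },
    run(fn) {
      return Promise.resolve()
        .then(fn)
        .finally(() => tracked.forEach(r => r.dispose()));
    }
  };
}
```

The caller never frees anything by hand; escaping the scope (normally or via an error) is what triggers cleanup, mirroring C++ destructor semantics.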
What is the eta? No rush on perfection :) |
That above is just the trailer. Kinda No Man's Sky. :) ETA: when it's done. ;) I'll keep you posted in this thread about my progress. |
Anything I can do to help? |
Unfortunately nothing at this stage. Once I'm starting to add some actual methods to the new binding, you can help to add the others. |
I love feedback, and brainstorming. I'll be here to assist in the meantime. |
Any updates? |
On it. I've just reached the second milestone with fastcall, one major feature remains: callback support. Few weeks ahead. |
Saweet! As of this evening I got rnn, lstm, and gru networks up and running with unit tests!!! Your audience is standing by to watch the master at work. |
Bragging rights: BrainJS/brain.js#29 |
Your code looks fantastic, by the way! |
Work on the new version has started: https://github.com/arrayfire/arrayfire-js/tree/fastcall Sorry for the delay, I had to reinvent the wheel to make an efficient ArrayFire binding possible on Node. |
How ironic, I just landed the rnn, lstm, and gru last night! Ty for your hard work! |
nearly ready? |
It depends on what you mean by nearly. :) I'm working on the array class. Once it's ready, only the function wrapping grind remains. Which is a lot of work, but repetitive and easy to do. That's where I'm hoping for a bunch of PRs, though. |
I would like to break down what will happen once this is ready, to better understand. Here is an example I posted above of how the neural net equation is composed:
This will give us, not processed numbers, but rather all the binary (think of it like a parser tree) steps to achieve processed numbers at a later point in time. Sometime later we do:

```js
eq.run();
```

and then to run the equation backward (needed for backpropagation) we run:

```js
eq.runBackpropagate();
```

The standard model is to perform equations on the gpu, and send them to the cpu, and the cpu sends them back to the gpu, and then again to the cpu. What you end up with are tons of copies of arrays that ultimately (arguably) are not needed. So my question is this: Is there any way to keep the values on the gpu, and completely processed there, so there is one (or much fewer) in(s), and much less copying in and out? To illustrate the standard model:

```js
relu(
  add(
    add(
      multiply(
        hiddenModel.weight,
        input
      ), // send to gpu, process, receive on cpu
      multiply(
        hiddenModel.transition,
        previousInput
      ) // send to gpu, process, receive on cpu
    ), // send to gpu, process, receive on cpu
    hiddenModel.bias
  ) // send to gpu, process, receive on cpu
); // send to gpu, process, receive on cpu
```

Will we get something like this with your work?

```js
relu(
  add(
    add(
      multiply(
        hiddenModel.weight,
        input
      ),
      multiply(
        hiddenModel.transition,
        previousInput
      )
    ),
    hiddenModel.bias
  )
); // send to gpu, process, receive on cpu
```
|
You'll get something like that, but it's from the work of @pavanky and co., not me. ArrayFire's JIT technology merges a bunch of operations into a single kernel and launches them at once. It works on most operations, however there is much to be done for better coverage: https://github.com/arrayfire/arrayfire/milestone/16 (search for "JIT" there). |
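The effect of that kind of fusion can be illustrated with a toy lazy-evaluation chain: elementwise operations are composed without executing, and evaluation makes a single pass over the data instead of one pass per operation. This is purely illustrative; ArrayFire's real JIT generates and compiles device kernels:

```javascript
// Toy illustration of kernel fusion: lazy elementwise ops are composed
// into one function, so evaluation makes a single pass over the array
// (one "kernel launch") instead of one pass per operation.
let launches = 0;
const lazy = f => ({
  map: g => lazy(x => g(f(x))),                      // compose, don't execute
  eval: data => { launches++; return data.map(f); }  // one fused pass
});
const identity = lazy(x => x);
```

Chaining `.map` calls builds up the fused function; only `.eval` touches the data.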
I've added a TODO list on top of this thread. |
The good news is that next week is a holiday for me, so I can work on this module ... if my kids and my wife leave me some breathing room. :)
Same exact situation here. |
Unfortunately I couldn't come up with a solution in Node.js that is better (faster) than what we have in the current version. ArrayFire's design does not fit very well with the nature of Node's event queue, so it turned out my first approach is the best I can come up with. The good news is that it's already near feature complete, just check out the examples folder. |
It's all in the question's title!
I am of the opinion that this project, if carried out correctly, would give a huge boost to arrayfire (using arrayfire from node, that is).