TODO: progress towards v0.3.0 #9
Yeah, it's basically feature complete. But unfortunately it has a serious flaw: for the sake of performance we need deterministic scopes (arrayfire/arrayfire-dotnet#8 (comment)). But that implies that we call destructors synchronously, and because destructors are synchronization points in ArrayFire, ArrayFire.js is synchronous despite my best effort to make it asynchronous. I'm thinking about a new approach, but that's gonna make every call and value access asynchronous for sure, which is kinda ugly and hurts performance. So it works, but in its current state it blocks the event loop. EDIT: TODO list of new version:
|
@unbornchikken Thanks for the quick reply. |
@pavanky can give you more detail, but the destructor shouldn't be a blocking call. It should be managing the reference counts for all arrays, and an array should be marked for deletion once all of the work is done. Those objects will be deleted at a later time (if memory of that size is not needed or the garbage collector is called). That event needs to be blocking because the GPU drivers perform a synchronization on the device, but we try to avoid that whenever possible. |
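The deferred-deletion scheme described above can be sketched in plain JavaScript. This is a simplified illustration only; `RefCountedBuffer` and `BufferPool` are hypothetical names, not the actual ArrayFire memory manager:

```javascript
// Simplified sketch of deferred deletion via reference counting.
// A buffer is only marked for deletion when its count reaches zero;
// the pool may then reuse it later instead of freeing it synchronously
// (a synchronous free would force a device synchronization).
class RefCountedBuffer {
  constructor(size, pool) {
    this.size = size;
    this.pool = pool;
    this.refs = 1;
  }
  retain() { this.refs++; return this; }
  release() {
    if (--this.refs === 0) {
      // Not freed immediately: returned to a pool for later reuse.
      this.pool.markForDeletion(this);
    }
  }
}

class BufferPool {
  constructor() { this.free = []; }
  markForDeletion(buf) { this.free.push(buf); }
  acquire(size) {
    // Reuse a previously released buffer of the same size if possible.
    const i = this.free.findIndex(b => b.size === size);
    if (i >= 0) {
      const buf = this.free.splice(i, 1)[0];
      buf.refs = 1;
      return buf;
    }
    return new RefCountedBuffer(size, this);
  }
}
```

The point of the sketch is only that "delete" becomes a cheap bookkeeping operation, and actual device memory gets recycled lazily.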
Ok, I'm gonna work out a simple repro case with a code flow that - in theory - shouldn't block at all, but according to v8 performance data it does. |
@mehdi-cit as you can see, this project is still active, but life events prevented me from making significant progress on it in the last few months. But since my pet ML project still has, and will have, a dependency on ArrayFire.js (and on CMake.js), you can expect me to put my focus back on those eventually. |
Not sure if this can help with the issue at hand, but it could be a good alternative when it comes to "integrating" JavaScript and C++ code: |
@mehdi-cit unfortunately it's not as simple as nbind makes it look. In ArrayFire there are a bunch of operations that act as synchronization points: constructors, complex operators, memory copies, etc. Which means, if you wrap them naively as-is, then you're gonna block the main loop at that point until all of the previously enqueued AF operations get completed. In Node.js you should never block the event loop. Let's say nbind supports asynchronous operations the standard way, I mean by using nan's async workers: https://github.com/nodejs/nan/blob/master/doc/asyncworker.md (note: almost all native library wrappers are doing this). But AsyncWorker uses libuv worker threads, so you gotta synchronize AF calls somehow. In the current version you gotta use manual locks, but eventually thread safe ArrayFire will land and make that unnecessary. If you lock libuv workers then you'll serialize them whenever there is more than one AF operation executing in parallel. Which means you'll kill libuv entirely and make Node.js totally synchronous, which is really, really bad. The only viable option is to launch a separate libuv loop for AF and make your binding a proxy for that. Well, this is where things get really complicated, especially if you're interacting with V8 in C++, because of the verbosity and complexity of V8/nan. That's why I'm creating fastcall. It will offer about the same performance that you could get with C++ based bindings (dyncall really is that fast; according to my benchmarks, its overhead is negligible, 5%-ish), with the above mentioned separate libuv loop support. Once fastcall stabilizes, I'll be back to this project. However I gotta invent something for proper RAII in JS, because of arrayfire/arrayfire-dotnet#8 (comment) |
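The "separate loop acting as a proxy" idea above can be approximated in plain JavaScript as a promise chain that serializes all native calls onto one queue, so ordering is preserved without ever locking libuv's shared worker threads. This is a conceptual sketch only; `callNative` is a hypothetical stand-in for the real binding:

```javascript
// Conceptual sketch: serialize all ArrayFire calls through one queue so
// only a single dedicated "loop" ever touches the (non-thread-safe)
// library, instead of locking libuv's shared worker threads.
function makeSerializedCaller(callNative) {
  let tail = Promise.resolve();
  return function enqueue(op, ...args) {
    const result = tail.then(() => callNative(op, ...args));
    // Later calls wait for this one, preserving AF's ordering,
    // but the JS event loop itself is never blocked.
    tail = result.catch(() => {});
    return result;
  };
}
```

Each caller gets back a Promise immediately; the actual native work happens strictly one-at-a-time in submission order.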
I've been watching arrayfire.js for a few months now, and am in love with it. I'm working on https://github.com/harthur-org/brain.js & its recurrent neural net, and want to connect it to arrayfire.js at some future point in time when things line up. I've spent a great deal of time researching before coding anything, and started here (amongst many research papers, how-to articles, and many other libraries) for the most part: After reviewing each of the major methods that are associated with the overall mathematical procedures, I found something that concerned me:
The reason this concerned me was, first, those are (from past experience) memory leaks, and after reviewing the recurrent neural net in arrayfire.js, I see a similar approach (please note, I am still very naive about arrayfire.js, and this isn't a slam against the library, but rather rethinking the semi-normal). It got me thinking: how could we greatly speed this up? Or rather, how can we use fewer resources to do the same thing? One of the biggest bottlenecks of multi-threading is, of course, moving memory to and from the device.

Tinkering with the idea, I came up with a few pseudo code sessions to try and wrap my head around what I was aiming for eventually, and yes, I'll say it was a yak shave at best. This was how I saw the operation on the first go around ([outlined here](https://github.com/BrainJS/brain.js/issues/24)); I thought: really what we are trying to do is build a state tree, just like in parsing, so you'd have something like:
This would repeat over and over again, depending on the complexity of your math.
In this scenario, rather than doing just-in-time operations on math, we'd actually set up a math equation on the gpu that could be fed a set of data and would at some future point in time return the answer to it. In the original neural net, the setup doesn't really exist, other than some tricky:

```js
var h0 = this.multiply(hiddenMatrix.weight, inputVector, this);
var h1 = this.multiply(hiddenMatrix.transition, hiddenPrev, this);
var hiddenD = this.relu(this.add(this.add(h0, h1, this), hiddenMatrix.bias, this), this);
```

What I'm proposing is that with this new thinking, we'd set up a math problem that could be used, similar to:

```js
var eq = new Equation();
return eq.relu(
  eq.add(
    eq.add(
      eq.multiply(
        hiddenModel.weight,
        input
      ),
      eq.multiply(
        hiddenModel.transition,
        previousInput
      )
    ),
    hiddenModel.bias
  )
);
```

Which instantiates an equation that can be used like:
|
I accidentally hit enter before I was done. So my question is: would this answer the problem we are having of a blocking synchronous thread? By giving the gpu the whole problem, with minimal in and out, would that address or even help, so that we could have a full-fledged multi-threaded approach? |
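The deferred-equation idea proposed above can be sketched minimally. The `Equation` class here is hypothetical and operates on plain numbers purely for illustration of the build-once/run-many shape; a real implementation would keep operands on the GPU and only transfer the final result:

```javascript
// Hypothetical sketch: build the expression tree once, run it many times.
// Nodes are plain objects; nothing is computed until run() is called.
class Equation {
  input(name) { return { kind: 'input', name }; }
  multiply(a, b) { return { kind: 'mul', a, b }; }
  add(a, b) { return { kind: 'add', a, b }; }
  relu(a) { return { kind: 'relu', a }; }
  run(root, inputs) {
    const ev = n => {
      switch (n.kind) {
        case 'input': return inputs[n.name];
        case 'mul': return ev(n.a) * ev(n.b);
        case 'add': return ev(n.a) + ev(n.b);
        case 'relu': return Math.max(0, ev(n.a));
      }
    };
    return ev(root);
  }
}
```

The same tree can then be fed different inputs on each `run`, which is the property that would let a GPU backend keep the whole computation resident on the device.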
@UniqueFool, curious your thoughts here. |
@robertleeplummerjr Just wait for the new fastcall based bindings to come out before taking any serious dependency on AF.js, please. In this version I'm working on the fully asynchronous, declarative approach that you proposed, with just one huge exception: I wanna have control flow too, not just expressions! Like:

```js
const result = yield raii.scope(() => {
    const arr1 = af.randu(42);
    const arr2 = af.constant(0, 42);
    for (let i = 0; i < 10; i++) {
        arr2.set(Math.random() * 42, arr1.get(Math.random() * 42));
    }
    return arr1.host();
});
```

You'll get yer plain old JavaScript, but that doesn't get executed right away. It gets enqueued in a separate libuv loop, and you'll get a Promise that resolves asynchronously once all of the operations get completed on the device. And there will be an asynchronous RAII mechanism that will do exactly the same automatic RAM and VRAM resource management that the C++ bindings have. |
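The asynchronous RAII mechanism mentioned above can be approximated with a scope that tracks resources created inside it and disposes them once the scope's promise settles. This is only an illustrative sketch under assumed names (`makeScope`, `track`, `dispose`), not the actual ArrayFire.js API:

```javascript
// Sketch of an asynchronous RAII scope: resources registered during the
// scope body are disposed automatically when the scope's promise settles,
// whether it resolved or rejected.
function makeScope() {
  const tracked = [];
  return {
    track(resource) { tracked.push(resource); return resource; },
    run(fn) {
      return Promise.resolve()
        .then(fn)
        .finally(() => tracked.forEach(r => r.dispose()));
    }
  };
}
```

The caller never frees anything by hand; escaping the scope (normally or via an error) is what triggers cleanup, mirroring C++ destructor semantics.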
What is the eta? No rush on perfection :) |
That above is just the trailer. Kinda No Man's Sky. :) ETA: when it's done. ;) I'll keep you posted in this thread about my progress. |
Anything I can do to help? |
Unfortunately nothing at this stage. Once I'm starting to add some actual methods to the new binding, you can help to add the others. |
I love feedback, and brainstorming. I'll be here to assist in the meantime. |
Any updates? |
On it. I've just reached the second milestone with fastcall, one major feature remains: callback support. Few weeks ahead. |
Saweet! As of this evening I got rnn, lstm, and gru networks up and running with unit tests!!! Your audience is standing by to watch the master at work. |
Bragging rights: BrainJS/brain.js#29 |
Your code looks fantastic, by the way! |
Work on the new version has started: https://github.com/arrayfire/arrayfire-js/tree/fastcall Sorry for the delay, I had to reinvent the wheel to make an efficient ArrayFire binding possible on Node. |
How ironic, I just landed the rnn, lstm, and gru last night! Ty for your hard work! |
nearly ready? |
It depends on what you mean by nearly. :) I'm working on the array class. Once it's ready, only the function wrapping grind remains. Which is a lot of work, but repetitive and easy to do. That's where I'm hoping for a bunch of PRs, though. |
I would like to break down what will happen once this is ready, to better understand. Here is an example I posted above of how the neural net equation is composed:
This will give us, not processed numbers, but rather all the binary (think of it like a parser tree) steps to achieve processed numbers at a later point in time. Sometime later we do:

```js
eq.run();
```

and then to run the equation backward (needed for backpropagation) we run:

```js
eq.runBackpropagate();
```

The standard model is to perform equations on the gpu, and send them to the cpu, and the cpu sends them back to the gpu, and then again to the cpu. What you end up with are tons of copies of arrays that ultimately (arguably) are not needed. So my question is this: Is there any way to keep the values on the gpu, and completely processed there, so there is one (or much fewer) in(s), and much less copying in and out? To illustrate the standard model:

```js
relu(
  add(
    add(
      multiply(
        hiddenModel.weight,
        input
      ), // send to gpu, process, receive on cpu
      multiply(
        hiddenModel.transition,
        previousInput
      ) // send to gpu, process, receive on cpu
    ), // send to gpu, process, receive on cpu
    hiddenModel.bias
  ) // send to gpu, process, receive on cpu
); // send to gpu, process, receive on cpu
```

Will we get something like this with your work?

```js
relu(
  add(
    add(
      multiply(
        hiddenModel.weight,
        input
      ),
      multiply(
        hiddenModel.transition,
        previousInput
      )
    ),
    hiddenModel.bias
  )
); // send to gpu, process, receive on cpu
```
|
You'll get something like that, but it's from the work of @pavanky and co., not me. ArrayFire's JIT technology merges a bunch of operations into a single kernel and launches them at once. It works on most operations, however there is much to be done for better coverage: https://github.com/arrayfire/arrayfire/milestone/16 (search for "JIT" there). |
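The effect of that kind of fusion can be illustrated with a toy lazy-evaluation chain: elementwise operations are composed without executing, and evaluation makes a single pass over the data instead of one pass per operation. This is purely illustrative; ArrayFire's real JIT generates and compiles device kernels:

```javascript
// Toy illustration of kernel fusion: lazy elementwise ops are composed
// into one function, so evaluation makes a single pass over the array
// (one "kernel launch") instead of one pass per operation.
let launches = 0;
const lazy = f => ({
  map: g => lazy(x => g(f(x))),                      // compose, don't execute
  eval: data => { launches++; return data.map(f); }  // one fused pass
});
const identity = lazy(x => x);
```

Chaining `.map` calls builds up the fused function; only `.eval` touches the data.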
I've added a TODO list on top of this thread. |
The good news is that next week is a holiday for me, so I can work on this module ... if my kids and my wife leave me some breathing room. :)
Same exact situation here. |
Unfortunately I couldn't come up with a solution in Node.js that is better (faster) than what we have in the current version. ArrayFire's design does not fit very well with the nature of Node's event queue, so it turned out my first approach is the best I can come up with. The good news is that it's already near feature complete, just check out the examples folder. |
It's all in the question's title!
I am of the opinion that this project, if carried out correctly, would give a huge boost to arrayfire (using arrayfire from node, that is).