
Training is incredibly slow compared to BrainJS #70

Open
nemo opened this issue Jan 21, 2016 · 26 comments

@nemo

nemo commented Jan 21, 2016

Hey folks,

We have a relatively small dataset (<1000 samples) that trains in about 2-4 minutes with BrainJS.

However, with synaptic the same task takes about 45 minutes or so. We're using a one-layer Architect.Perceptron on Node 4.2.2. Changing the learning rate / error has given us some minimal speed improvements, but nothing close to the BrainJS version.

Any thoughts around this? What else should we try?
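
For illustration, here's a minimal sketch of the kind of setup described above (the layer sizes, trainer options, and XOR-style data are placeholders, not the actual code linked in this thread):

```js
const { Architect, Trainer } = require('synaptic');

// One hidden layer, as described above; the 2/3/1 sizes are placeholders only.
const network = new Architect.Perceptron(2, 3, 1);
const trainer = new Trainer(network);

// Stand-in for the real (<1000 sample) dataset.
const trainingSet = [
  { input: [0, 0], output: [0] },
  { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] },
  { input: [1, 1], output: [0] },
];

trainer.train(trainingSet, {
  rate: 0.1,         // the learning rate being tuned above
  error: 0.005,      // the target error being tuned above
  iterations: 20000,
  log: 1000,
});
```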

@ghost

ghost commented Jan 23, 2016

Are you sure that BrainJS is an LSTM-based network?

@menduz
Collaborator

menduz commented Jan 23, 2016

Hello nemo, can you post your test code so we can make a diagnosis?

@nemo
Author

nemo commented Jan 24, 2016

@Pummelchen – BrainJS is just a feed-forward network. And I'm not using synaptic's LSTM capabilities.

@menduz – here's the network, training is called from here with these options. Thanks for helping out!

@cazala
Owner

cazala commented Feb 26, 2016

Hey @nemo, sorry for the late response. I haven't run your code yet, but something I notice is that you are using cross-entropy as the cost function, while brainjs uses mean squared error. You could try replacing this line with Trainer.cost.MSE to have a fair comparison. It may not change much, but on certain tasks the cost function can hugely shorten the training. By the way, after both networks are trained (the brainjs one and the synaptic one), do you notice either of them performing better than the other when running your test sets?
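
In other words, something along these lines (the network size and the other trainer options are placeholders, not the values from the linked code):

```js
const { Architect, Trainer } = require('synaptic');

const network = new Architect.Perceptron(2, 3, 1);   // placeholder sizes
const trainer = new Trainer(network);
const trainingSet = [
  { input: [0, 0], output: [0] }, { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] }, { input: [1, 1], output: [0] },
];

trainer.train(trainingSet, {
  // was: cost: Trainer.cost.CROSS_ENTROPY
  cost: Trainer.cost.MSE,   // brainjs minimizes mean squared error, so this makes the comparison fairer
  rate: 0.1,
  error: 0.005,
  iterations: 20000,
});
```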

@nemo
Author

nemo commented Apr 7, 2016

@cazala haven't tested the performance of the activate function post-optimization yet.

Using MSE as the cost function doesn't improve the performance by much either, unfortunately.

@UniqueFool

Is BrainJS possibly using some form of threading (web workers), and what about synaptic's use of idle cores?

@UniqueFool

UniqueFool commented May 23, 2016

You may be interested in taking a look at this: https://github.com/arrayfire/arrayfire-js
They also have an ANN example: https://github.com/arrayfire/arrayfire-js/blob/master/examples/es6/machine-learning/neuralNetwork.js

For this to be feasible, synaptic would need to encapsulate all array handling, so that a different back-end can be provided/used for this.

@UniqueFool

@cazala: Would it be possible to get a few pointers regarding the feasibility of encapsulating array handling to make use of a library like arrayfire-js?

http://arrayfire.org/docs/index.htm

@UniqueFool

UniqueFool commented May 27, 2016

I am willing to take a look at this. As far as I can tell, the optimize() function may be a good starting point, because it already converts a whole network, including all neurons, into a hard-coded function, right?

https://github.com/cazala/synaptic/blob/master/src/network.js#L123

@robertleeplummerjr

I'm currently working on brain.js and am very interested in multi-threading. I will do more research, and we can help each other out.

@UniqueFool

UniqueFool commented May 29, 2016

Explicit multithreading probably doesn't scale too well. An arrayfire-js based approach would have the advantage that parallelization takes place implicitly, so that even OpenCL is supported (think GPUs, FPGAs). Such hardware is known for speeding up vectorized workloads by a factor of up to 250x.

In conjunction with some kind of clustering module on top of arrayfire-js, a neural network library like Brain.js or synaptic would have no hard-coded limits on the degree of parallelization it can use, which would even mean supporting heterogeneous hardware clusters.

For that to happen, all array handling and calculations would need to be moved to a helper object that can serve as the "driver" for different back-ends. Such a "driver" could then be shared with other ANN projects like Brain.js/synaptic, and could even become its own project at some point - think of it as an API for creating ANN frameworks.

@robertleeplummerjr

arrayfire seems like magic!

@UniqueFool

UniqueFool commented May 29, 2016

It doesn't need to be arrayfire - there are other solutions that allow code to be vectorized automatically to make use of different back-ends, no matter if that means OpenMP (SIMD) or OpenCL.

Note that OpenCL would work even on GPU-less systems, as long as an OpenCL runtime is installed that exposes the CPU as an OpenCL device (e.g. Intel/AMD).

@ghost

ghost commented May 29, 2016

So who here has the time and knowledge to implement that?

@robertleeplummerjr

I can offer some time. @UniqueFool do you have an example of your neural net with and without arrayfire? That would be terribly helpful.

@UniqueFool

UniqueFool commented May 29, 2016

I think I posted a comment a few days ago containing a link to the arrayfire example implementing a NN: https://github.com/arrayfire/arrayfire-js/blob/master/examples/es6/machine-learning/neuralNetwork.js

Note that this would require arrayfire and arrayfire-js to be installed first of all (different dependencies are required depending on the back-ends you want to support/use).

And notice that, for the time being, this is unrelated to synaptic - which is why I asked for pointers on refactoring the existing synaptic code to encapsulate array handling and any calculations that would benefit from vectorization.

But like I said, I would not necessarily make this specific to arrayfire. It was really just meant to make the point that OpenCL scales better than OpenMP-level parallelization: the latter can be used by the former on platforms without hardware-accelerated GPUs/FPGAs, while the opposite is not the case - which is to say that even heavily multi-threaded code cannot automatically benefit from dedicated vectorization hardware, unless that hardware happens to be a CPU.

My suggestion would be not to do any actual coding until @cazala has left a comment, and preferably a few pointers, here.

Personally, I would suggest supporting something like arrayfire as an optional back-end. For that, we would need to rework the existing code base so that its major computational workhorses are encapsulated into helper functions/classes that deal with arrays directly in a functional fashion (think map/reduce and filter), at which point it will be much more straightforward to map everything to a different back-end like arrayfire-js.

Obviously, it does not make sense to make such a back-end mandatory, because OpenCL/CUDA and even C are really only supported by node, whereas synaptic has to run in the browser, too - and with WebCL still being experimental, it is rather challenging to make that work in a portable fashion (e.g. see this).

@robertleeplummerjr

I should have referenced the above link and asked for a pure JS example. Speaking naively, what I meant was: do you have a version with no arrayfire? So we could see exactly the before-arrayfire and after-arrayfire implementations, for ease of reference.

@UniqueFool

UniqueFool commented May 29, 2016

If you are referring to pure synaptic vs. synaptic using arrayfire - no, I don't have any code doing that; that was the whole point of my original comment. However, looking at the code in question, what we basically need to do to use a different vectorization back-end (like arrayfire) is locate the calculation routines in synaptic and map those to the af.* calls that can be seen in the arrayfire NN example.

There's another example in the machine learning folder at: https://github.com/arrayfire/arrayfire-js/blob/master/examples/es6/machine-learning/ann.js

In general, the docs are pretty good actually: http://arrayfire.org/arrayfire-js/

Apart from that, I am not sure if there is a side-by-side comparison illustrating how to adopt the library - at least, I haven't seen one yet.

@robertleeplummerjr

One of the things that makes these libraries so powerful is that they run in the browser, in node, and elsewhere. I bet if we put some thought into it, we can achieve a model that checks for the existence of arrayfire and uses it; if not, it gracefully degrades to the status quo.
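
Something along these lines, perhaps (a rough sketch; the package name and the dispatch helpers are illustrative, not an existing API in either library):

```js
// Optional back-end detection: fall back to the pure-JS status quo when
// arrayfire isn't available.
let af = null;
try {
  af = require('arrayfire-js');   // assumed package name; only resolves if the native back-end is installed
} catch (e) {
  // not installed (e.g. running in the browser): keep the current pure-JS behavior
}

// The existing pure-JS math, i.e. the status quo.
function weightedSumJS(weights, inputs) {
  let sum = 0;
  for (let i = 0; i < weights.length; i++) sum += weights[i] * inputs[i];
  return sum;
}

// Hypothetical dispatch point: an arrayfire-backed implementation would be
// selected here when `af` is non-null; otherwise we degrade gracefully.
const weightedSum = af !== null && typeof weightedSumAF === 'function'
  ? weightedSumAF
  : weightedSumJS;
```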

@cazala
Owner

cazala commented May 31, 2016

I believe the only two parts of the code that need to be optimized are the activation and propagation methods of the Neuron, since they do all the math. All they do is perform simple additions and multiplications over values in arrays using for-loops (the comments point to the equations in the paper if you prefer the algebraic representation of the algorithm). It shouldn't be hard to encapsulate those two into a driver class and abstract the array operations to use methods like map/reduce. I can give it a try in a separate branch if I find some time, but that's probably not going to happen in the next couple of weeks; anyone is welcome to give it a shot and I'd be happy to help as much as I can. I already toyed in the past with the idea of using WebGL (also WebCL, but that's still too experimental and I didn't get too far), and I believe we can achieve a great performance boost for big networks (layers with thousands of neurons rather than dozens).
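
To make that concrete, here is a rough sketch of what such a driver could look like (the names are made up; this is not existing synaptic code):

```js
// Default driver: reproduces the current for-loop math as map/reduce-style
// array operations, so an arrayfire-backed driver could be swapped in later.
const DefaultDriver = {
  // weighted sum used during activation: sum of w[i] * x[i]
  weightedSum(weights, inputs) {
    return weights.reduce((sum, w, i) => sum + w * inputs[i], 0);
  },
  // element-wise update used during propagation: v[i] + rate * delta[i]
  accumulate(values, deltas, rate) {
    return values.map((v, i) => v + rate * deltas[i]);
  },
};

// Neuron.prototype.activate / .propagate would then call
// driver.weightedSum(...) and driver.accumulate(...) instead of inlining loops.
console.log(DefaultDriver.weightedSum([0.5, -0.25], [1, 2]));  // 0
```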

@UniqueFool

UniqueFool commented May 31, 2016

Thanks for getting back in touch. I posted a follow-up over at the arrayfire tracker (here), and they generally seem supportive of the idea - but mentioned that some key APIs are going to change due to async-related refactorings.

I do agree that a map/reduce approach would be useful to help generalize the existing code, at which point it will be easier to adopt a different back-end like arrayfire.

@UniqueFool

I believe the only two parts of the code that need to be optimized are the activation and propagation methods from the Neuron, since they do all math.

How about moving the current implementation to a different object, and letting the constructor accept a callback that directly deals with the corresponding arrays in a forEach fashion?

That way, we could override the default behavior using a different/hardware-accelerated back-end like arrayfire, just by providing a custom callback that implements the activation/propagation methods using whatever means is available.
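
As a rough sketch of that idea (a hypothetical constructor, not the current synaptic Neuron API):

```js
// A neuron-like object whose math can be overridden via a constructor callback.
class PluggableNeuron {
  constructor(options = {}) {
    // default: the existing pure-JS for-loop behavior
    this.compute = options.compute || ((weights, inputs) => {
      let sum = 0;
      for (let i = 0; i < weights.length; i++) sum += weights[i] * inputs[i];
      return sum;
    });
  }

  activate(weights, inputs) {
    return this.compute(weights, inputs);
  }
}

// Default behavior stays as it is today...
const plain = new PluggableNeuron();
plain.activate([0.5, 0.5], [1, 1]);   // 1

// ...while a hardware-accelerated back-end (e.g. built on arrayfire-js) could be
// plugged in simply by passing a different callback (placeholder shown here).
const accelerated = new PluggableNeuron({ compute: (w, x) => /* af-based math */ 0 });
```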

@robertleeplummerjr

I like it!

@UniqueFool

UniqueFool commented May 31, 2016

And that would in fact also work very well for all currently supported use cases, including the browser/non-arrayfire scenario, because the default behavior would be left "as is", whereas we could add some startup/runtime flag (or heuristics) to enable the arrayfire-based version of the activation/propagation functions.

The added benefit is that arrayfire-js optionally supports custom OpenCL kernels, which is to say that existing OpenCL kernels implementing CNNs, RNNs, etc. could be reused for a more aggressively optimized version, despite never having been written with JavaScript use in mind.

Equally, a map/reduce approach would make it much easier to make use of web workers (in the browser) or other parallelization schemes (think clustering) that may not even have access to OpenCL or the GPU in general.

@UniqueFool

I believe the only two parts of the code that need to be optimized are the activation and propagation methods from the Neuron, since they do all math.

Just for the sake of completeness, even the training/backpropagation part of the code could in theory make use of parallelization, and thus, speed up network training considerably.

Here's a short and very accessible 7-page PDF illustrating the basic concept, based on doing parallel backprop on 1 GHz dual-core systems (note: no GPU use at all): http://www.neuropro.ru/mypapers/lncs3606.pdf

This is something that people experimented with already in the early 90s: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1226&context=ecetr

And here's a more recent paper detailing how OpenCL can be used for parallelizing ANN training: https://bib.irb.hr/datoteka/584308.MIPRO_2011_Nenad.pdf
With the conclusion stating:

This paper describes two implementations of parallel neural network training algorithms with OpenCL. The programming model allows the implementations to be deployed on different platforms; although the maximum efficiency is obtained on a GPU device, the program can also be executed on a general purpose CPU. Training with parallel backpropagation proved to be efficient only with large networks and is not recommended for smaller networks and smaller number of samples.

@UniqueFool

UniqueFool commented Jun 2, 2016

Just for future reference, here are two github projects which apparently use OpenCL kernels for parallel backpropagation:

Besides, the arrayfire project is currently creating a machine-learning library on top of arrayfire, which will include ANN support.

For details, see: arrayfire/arrayfire-ml#3
