
Training is incredibly slow compared to BrainJS #70

Open
nemo opened this issue Jan 21, 2016 · 26 comments

@nemo

nemo commented Jan 21, 2016

Hey folks,

We have a relatively small dataset (<1000 samples) that trains in about 2-4 minutes with BrainJS.

However, with synaptic the same task takes about 45 minutes or so. We're using a one-layer Architect.Perceptron on Node 4.2.2. Changing the learning rate / error has given us some minimal speed improvements, but nothing close to the BrainJS version.

Any thoughts around this? What else should we try?
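
For illustration, here's a minimal sketch of the kind of setup described above (the layer sizes, trainer options, and XOR-style data are placeholders, not the actual code linked in this thread):

```js
const { Architect, Trainer } = require('synaptic');

// One hidden layer, as described above; the 2/3/1 sizes are placeholders only.
const network = new Architect.Perceptron(2, 3, 1);
const trainer = new Trainer(network);

// Stand-in for the real (<1000 sample) dataset.
const trainingSet = [
  { input: [0, 0], output: [0] },
  { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] },
  { input: [1, 1], output: [0] },
];

trainer.train(trainingSet, {
  rate: 0.1,         // the learning rate being tuned above
  error: 0.005,      // the target error being tuned above
  iterations: 20000,
  log: 1000,
});
```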

@ghost

ghost commented Jan 23, 2016

Are you sure that BrainJS is an LSTM-based network?

@menduz
Collaborator

menduz commented Jan 23, 2016

Hello nemo, can you post your test code so we can make a diagnosis?

@nemo
Author

nemo commented Jan 24, 2016

@Pummelchen – BrainJS is just a feed-forward network. And I'm not using synaptic's LSTM capabilities.

@menduz – here's the network, training is called from here with these options. Thanks for helping out!

@cazala
Owner

cazala commented Feb 26, 2016

Hey @nemo, sorry for the late response. I haven't run your code yet, but something I notice is that you are using cross-entropy as the cost function, while brainjs uses mean squared error. You could try replacing this line with Trainer.cost.MSE to have a fair comparison. It may not change much, but on certain tasks the cost function can hugely shorten the training. By the way, after both networks are trained (the brainjs one and the synaptic one), do you notice either of them performing better than the other when running your test sets?
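
In other words, something along these lines (the network size and the other trainer options are placeholders, not the values from the linked code):

```js
const { Architect, Trainer } = require('synaptic');

const network = new Architect.Perceptron(2, 3, 1);   // placeholder sizes
const trainer = new Trainer(network);
const trainingSet = [
  { input: [0, 0], output: [0] }, { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] }, { input: [1, 1], output: [0] },
];

trainer.train(trainingSet, {
  // was: cost: Trainer.cost.CROSS_ENTROPY
  cost: Trainer.cost.MSE,   // brainjs minimizes mean squared error, so this makes the comparison fairer
  rate: 0.1,
  error: 0.005,
  iterations: 20000,
});
```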

@nemo
Author

nemo commented Apr 7, 2016

@cazala haven't tested the performance of the activate function post-optimization yet.

Using MSE as the cost function doesn't improve the performance by much either, unfortunately.

@UniqueFool

Is BrainJS possibly using some form of threading (web workers), and what about synaptic's use of idle cores?

@UniqueFool

UniqueFool commented May 23, 2016

You may be interested in taking a look at this: https://github.com/arrayfire/arrayfire-js
They also have an ANN example: https://github.com/arrayfire/arrayfire-js/blob/master/examples/es6/machine-learning/neuralNetwork.js

For this to be feasible, synaptic would need to encapsulate all array handling, so that a different back-end can be provided/used for this.

@UniqueFool

@cazala: Would it be possible to get a few pointers regarding the feasibility of encapsulating array handling to make use of a library like arrayfire-js?

http://arrayfire.org/docs/index.htm

@UniqueFool

UniqueFool commented May 27, 2016

I am willing to take a look at this. As far as I can tell, the optimize() function may be a good starting point, because it already converts a whole network, including all neurons, into a hard-coded function, right?

https://github.com/cazala/synaptic/blob/master/src/network.js#L123

@robertleeplummerjr

I'm currently working on brain.js and am very interested in multi-threading. I will do more research, and we can help each other out.

@UniqueFool

UniqueFool commented May 29, 2016

Explicit multithreading probably doesn't scale too well. An arrayfire-js based approach would have the advantage that parallelization takes place implicitly, so that even OpenCL is supported (think GPUs, FPGAs). Such hardware is known for speeding up vectorized workloads by a factor of up to 250x.

In conjunction with some kind of clustering module on top of arrayfire-js, a neural network library like Brain.js or synaptic would have no hard-coded limits on the degree of parallelization it can use, which would even mean supporting heterogeneous hardware clusters.

For that to happen, all array handling and calculations would need to be moved to a helper object that can serve as the "driver" for different back-ends. Such a "driver" could then be shared with other ANN projects like Brain.js/synaptic, and could even become its own project at some point - think of it as an API for creating ANN frameworks.

@robertleeplummerjr

arrayfire seems like magic!

@UniqueFool

UniqueFool commented May 29, 2016

It doesn't need to be arrayfire - there are other solutions that allow code to be vectorized automatically to make use of different back-ends, no matter if that means OpenMP (SIMD) or OpenCL.

Note that OpenCL would work even on GPU-less systems, as long as an OpenCL runtime is installed that exposes the CPU as an OpenCL device (e.g. Intel/AMD).

@ghost

ghost commented May 29, 2016

So who here has the time and knowledge to implement that?

@robertleeplummerjr

I can offer some time. @UniqueFool do you have an example of your neural net with and without arrayfire? That would be terribly helpful.

@UniqueFool

UniqueFool commented May 29, 2016

I think I posted a comment a few days ago containing a link to the arrayfire example implementing a NN: https://github.com/arrayfire/arrayfire-js/blob/master/examples/es6/machine-learning/neuralNetwork.js

Note that this would require arrayfire and arrayfire-js to be installed first of all (different dependencies are required depending on the back-ends you want to support/use).

And notice that, for the time being, this is unrelated to synaptic - which is why I asked for pointers on refactoring the existing synaptic code to encapsulate array handling and any calculations that would benefit from vectorization.

But like I said, I would not necessarily make this specific to arrayfire. It was really just meant to make the point that OpenCL scales better than OpenMP-level parallelization: the latter can be used by the former on platforms without hardware-accelerated GPUs/FPGAs, while the opposite is not the case - which is to say that even heavily multi-threaded code cannot automatically benefit from dedicated vectorization hardware, unless that hardware happens to be a CPU.

My suggestion would be not to do any actual coding until @cazala has left a comment, and preferably a few pointers, here.

Personally, I would suggest supporting something like arrayfire as an optional back-end. For that, we would need to rework the existing code base so that its major computational workhorses are encapsulated into helper functions/classes that deal with arrays directly in a functional fashion (think map/reduce and filter), at which point it will be much more straightforward to map everything to a different back-end like arrayfire-js.

Obviously, it does not make sense to make such a back-end mandatory, because OpenCL/CUDA and even C are really only supported by node, whereas synaptic has to run in the browser, too - and with WebCL still being experimental, it is rather challenging to make that work in a portable fashion (e.g. see this).

@robertleeplummerjr

I should have referenced the above link and asked for a pure JS example. Speaking naively, what I meant was: do you have a version with no arrayfire? So we could see exactly the before-arrayfire and after-arrayfire implementations, for ease of reference.

@UniqueFool

UniqueFool commented May 29, 2016

If you are referring to pure synaptic vs. synaptic using arrayfire - no, I don't have any code doing that; that was the whole point of my original comment. However, looking at the code in question, what we basically need to do to use a different vectorization back-end (like arrayfire) is locate the calculation routines in synaptic and map those to the af.* calls that can be seen in the arrayfire NN example.

There's another example in the machine learning folder at: https://github.com/arrayfire/arrayfire-js/blob/master/examples/es6/machine-learning/ann.js

In general, the docs are pretty good actually: http://arrayfire.org/arrayfire-js/

Apart from that, I am not sure if there is a side-by-side comparison illustrating how to adopt the library - at least, I haven't seen one yet.

@robertleeplummerjr

One of the things that makes these libraries so powerful is that they run in the browser, in node, and elsewhere. I bet if we put some thought into it, we can achieve a model that checks for the existence of arrayfire and uses it; if not, it gracefully degrades to the status quo.
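
Something along these lines, perhaps (a rough sketch; the package name and the dispatch helpers are illustrative, not an existing API in either library):

```js
// Optional back-end detection: fall back to the pure-JS status quo when
// arrayfire isn't available.
let af = null;
try {
  af = require('arrayfire-js');   // assumed package name; only resolves if the native back-end is installed
} catch (e) {
  // not installed (e.g. running in the browser): keep the current pure-JS behavior
}

// The existing pure-JS math, i.e. the status quo.
function weightedSumJS(weights, inputs) {
  let sum = 0;
  for (let i = 0; i < weights.length; i++) sum += weights[i] * inputs[i];
  return sum;
}

// Hypothetical dispatch point: an arrayfire-backed implementation would be
// selected here when `af` is non-null; otherwise we degrade gracefully.
const weightedSum = af !== null && typeof weightedSumAF === 'function'
  ? weightedSumAF
  : weightedSumJS;
```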

@cazala
Owner

cazala commented May 31, 2016

I believe the only two parts of the code that need to be optimized are the activation and propagation methods of the Neuron, since they do all the math. All they do is perform simple additions and multiplications over values in arrays using for-loops (the comments point to the equations in the paper if you prefer the algebraic representation of the algorithm). It shouldn't be hard to encapsulate those two into a driver class and abstract the array operations to use methods like map/reduce. I can give it a try in a separate branch if I find some time, but that's probably not going to happen in the next couple of weeks; anyone is welcome to give it a shot and I'd be happy to help as much as I can. I already toyed in the past with the idea of using WebGL (also WebCL, but that's still too experimental and I didn't get too far), and I believe we can achieve a great performance boost for big networks (layers with thousands of neurons rather than dozens).
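
To make that concrete, here is a rough sketch of what such a driver could look like (the names are made up; this is not existing synaptic code):

```js
// Default driver: reproduces the current for-loop math as map/reduce-style
// array operations, so an arrayfire-backed driver could be swapped in later.
const DefaultDriver = {
  // weighted sum used during activation: sum of w[i] * x[i]
  weightedSum(weights, inputs) {
    return weights.reduce((sum, w, i) => sum + w * inputs[i], 0);
  },
  // element-wise update used during propagation: v[i] + rate * delta[i]
  accumulate(values, deltas, rate) {
    return values.map((v, i) => v + rate * deltas[i]);
  },
};

// Neuron.prototype.activate / .propagate would then call
// driver.weightedSum(...) and driver.accumulate(...) instead of inlining loops.
console.log(DefaultDriver.weightedSum([0.5, -0.25], [1, 2]));  // 0
```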

@UniqueFool

UniqueFool commented May 31, 2016

Thanks for getting back in touch. I posted a follow-up over at the arrayfire tracker (here), and they generally seem supportive of the idea - but mentioned that some key APIs are going to change due to async-related refactorings.

I do agree that a map/reduce approach would be useful to help generalize the existing code, at which point it will be easier to adopt a different back-end like arrayfire.

@UniqueFool

I believe the only two parts of the code that need to be optimized are the activation and propagation methods from the Neuron, since they do all math.

How about moving the current implementation to a different object, and letting the constructor accept a callback that directly deals with the corresponding arrays in a forEach fashion?

That way, we could override the default behavior using a different/hardware-accelerated back-end like arrayfire, just by providing a custom callback that implements the activation/propagation methods using whatever means is available.
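
As a rough sketch of that idea (a hypothetical constructor, not the current synaptic Neuron API):

```js
// A neuron-like object whose math can be overridden via a constructor callback.
class PluggableNeuron {
  constructor(options = {}) {
    // default: the existing pure-JS for-loop behavior
    this.compute = options.compute || ((weights, inputs) => {
      let sum = 0;
      for (let i = 0; i < weights.length; i++) sum += weights[i] * inputs[i];
      return sum;
    });
  }

  activate(weights, inputs) {
    return this.compute(weights, inputs);
  }
}

// Default behavior stays as it is today...
const plain = new PluggableNeuron();
plain.activate([0.5, 0.5], [1, 1]);   // 1

// ...while a hardware-accelerated back-end (e.g. built on arrayfire-js) could be
// plugged in simply by passing a different callback (placeholder shown here).
const accelerated = new PluggableNeuron({ compute: (w, x) => /* af-based math */ 0 });
```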

@robertleeplummerjr

I like it!

@UniqueFool

UniqueFool commented May 31, 2016

And that would in fact also work very well for all currently supported use cases, including the browser/non-arrayfire scenario, because the default behavior would be left "as is", whereas we could add some startup/runtime flag (or heuristics) to enable the arrayfire-based version of the activation/propagation functions.

The added benefit is that arrayfire-js optionally supports custom OpenCL kernels, which is to say that existing OpenCL kernels implementing CNNs, RNNs, etc. could be reused for a more aggressively optimized version, despite never having been written with JavaScript use in mind.

Equally, a map/reduce approach would make it much easier to make use of web workers (in the browser) or other parallelization schemes (think clustering) that may not even have access to OpenCL or the GPU in general.

@UniqueFool

I believe the only two parts of the code that need to be optimized are the activation and propagation methods from the Neuron, since they do all math.

Just for the sake of completeness, even the training/backpropagation part of the code could in theory make use of parallelization, and thus, speed up network training considerably.

Here's a short and very accessible 7-page PDF illustrating the basic concept, based on doing parallel backprop on 1 GHz dual-core systems (note: no GPU use at all): http://www.neuropro.ru/mypapers/lncs3606.pdf

This is something that people experimented with already in the early 90s: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1226&context=ecetr

And here's a more recent paper detailing how OpenCL can be used for parallelizing ANN training: https://bib.irb.hr/datoteka/584308.MIPRO_2011_Nenad.pdf
With the conclusion stating:

This paper describes two implementations of parallel neural network training algorithms with OpenCL. The programming model allows the implementations to be deployed on different platforms; although the maximum efficiency is obtained on a GPU device, the program can also be executed on a general purpose CPU. Training with parallel backpropagation proved to be efficient only with large networks and is not recommended for smaller networks and smaller number of samples.

@UniqueFool

UniqueFool commented Jun 2, 2016

Just for future reference, here are two github projects which apparently use OpenCL kernels for parallel backpropagation:

Besides, the arrayfire project is currently creating a machine-learning library on top of arrayfire, which will include ANN support.

For details, see: arrayfire/arrayfire-ml#3
