Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrix operations and performance #275

Closed
amergin opened this issue Feb 10, 2015 · 20 comments
Closed

Matrix operations and performance #275

amergin opened this issue Feb 10, 2015 · 20 comments

Comments

@amergin
Copy link

amergin commented Feb 10, 2015

The library seems to choke when computing larger matrices. For instance, consider this snippet:

//see https://en.wikipedia.org/wiki/Ordinary_least_squares#Estimation
//beta = (X^T X)^{-1} X^T y 
var beta = math.chain(xMatrixTransp)
.multiply(xMatrix)
.inv()
.multiply(xMatrixTransp)
.multiply(targetData)
.done();
return beta;

xMatrix contains some 1600 rows and 4 columns. Computing this in 4 web workers, math.js eats up 2GB of memory on Chrome and spends several minutes on the first few steps. I terminated the computation after around 10 minutes. To compare, numeric.js seems to be able to do the same calculation in less than a minute.

@josdejong
Copy link
Owner

Thanks. Yes, there is a lot of room for improving the performance of math.js, and you're welcome to contribute it you like :)

Right now for every operation math.js does validates the number, type, and size of the input values (numeric.js doesn't do that). That gives some overhead, which is relatively large for matrix operations. We can for example improve performance a lot when we know that a matrix just contains regular numbers, so there there is no need to do the type checking for every operation on every value in the matrix.

See also this discussion: #59. For v1 of the library we've focused on a consistent and user friendly API. Right now, I'm working on modularizing the library (see v2 branch). After that, performance and more (advanced) functions will get high priority. Once modularized, it will be quite easy to start implementing these kind of performance improvements (I'm curious to see the the performance of typed arrays in action...).

@rjbaucells
Copy link
Collaborator

I am working on a Sparse Matrix implementation, you can see it on my "matrix" branch. So far I have implemented Compressed Row Storage (CRS) and Compressed Column Storage (CCS) formats. It is a work in process that I plan to complete soon.

I am planning to change Matrix to support plugable storage implementations, possible candidates are CRS, CCS and Dense (current format). At creation time the consumer can specify a specific format or leave it to the default (Dense). Something like:

var m = new Matrix([[...]], 'crs');

Many of the current Matrix.prototype.operations will be delegated to the storage implementation to use "native" operations when available.

I think it would be good if we add support for specifying the format to some of the functions supporting matrix operations (if it is possible without breaking existing code):

var m = math.eye(2, 2, 'ccs');

The same way, these functions should use "native" implementations if available.

What do you think?

@josdejong
Copy link
Owner

That makes sense.

Right now the Matrix class stores data in nested arrays. I would like to rewrite this to a single, flat array (as discussed here #59 by @garrison). That will for example allow to utilize typed arrays. The Matrix implementation would then get a property like "typed" or "mixed" to denote whether the matrix contents are of a single type or contain mixed data. In the first case this would allow for a optimization as there is no need to type-check matrix values individually. In the same spirit, a Matrix can have a special mode to store data in a sparse form.

All functions creating matrices (math.matrix, math.eye, etc.) should support options to specify what type of matrix you want, and besides that, we can add some "smart" code to automatically select the best type for given data.

@rjbaucells
Copy link
Collaborator

Is this feature something that you are planning for v2 or it can be incorporated in v1.X?
If this is for v2 I can take advantage of the "typed-function" while implementing it.

@josdejong
Copy link
Owner

If it's non-breaking it could be added right now in v1. But if the implementation will be way easier using the typed-function functionality of v2, it maybe better to wait or start implementing it in the v2 branch, else we're wasting a lot of time.

@rjbaucells
Copy link
Collaborator

The CCS and CRS sparse matrix format implementations require the use of the "equal" function in order to test if a given matrix item value is zero. The "equal" function is sensitive to the current math instance configuration (math.epsilon), so it is a challenge on how to define the CCS and CRS protypes. None of the types currently defined in mathjs are config sensitive, those prototypes are defined only once independently of the "math" instance:

var math1 = math.create();
var math2 = math.create();

var m = new math1.type.Matrix();

assert.ok(m instanceof math2.type.Matrix); ==> OK

The easiest way to implement different storage formats is by having a registry of available formats and having Matrix implementation delegate to the selected format implementation:

Matrix.format = {
'ccs': require('path/to/CssFormat')(config), <== problem, config sensitive!
'crs': require('path/to/CrsFormat')(config), <== problem, config sensitive!
'dense': require('path/to/DenseFormat')
};

This presents the problem of Matrix definition shared among "math" instances.

Is it a requirement having a single prototype definition across "math" instances?

Do you have any recommendation on how to change the "math" instance creation to accomplish this requirement?

@josdejong
Copy link
Owner

Ok so if I understand it correctly the SparseMatrix needs the equal function, which in v1 is dependent on both math and config (and in v2 will only depend on config, which is nicer).

I'm not sure about the implications of changing the Matrix from being static to being instantiated as require('./type/Matrix')(math, config). It's used in a lot of files, which will give a cascading effect. It should be doable I think, but it would be even nicer if the matrix class could do without config. What sort of equality checks do you need in SparseMatrix? Do you really need math.equal, or would a simple, local equal function be ok too?

Note that we have a similar case already with BigNumber, which also needs to be constructed based on config.precision (see #279).

I think the release of v2 will still take some serious time, at least months. There are a lot of functions that need to be refactored (relatively simple work but takes time), and there are still a few hurdles to take with the new typed-function solution, memory usage and load time needs improvement.

@rjbaucells
Copy link
Collaborator

Yes, some of the matrix storage formats require the "equal" function.

The change to require('./type/Matrix')(math, config) is big but doable. I started doing it and stop to check if there was another solution. To avoid having different instances of Matrix inside the same "math" instance we will need to create the type and pass it to the different function implementations to avoid requiring the "Matrix" file again. For v1 the require call must be changed to require('module')(math, config) and get the Matrix instance from math.type.Matrix. For v2 the require calls must be changed to require('module')(config, type) and get the types from type.Matrix.

We need to compare the item value with zero in order to avoid storing it. Compressed formats (CCS and CRS) only store matrix elements different than zero. We need to use "equal" since a Matrix can store Number, Complex and BigNumber. and in any case the correct solution is "equal(item, 0)".

I will work on v1 to make all the changes. We can review it once it is complete.

@josdejong
Copy link
Owner

I will close this issue: thanks to @rjbaucells we now have support for sparse matrices. And and API for being able to use typed matrices is already implemented by @rjbaucells and just needs to be opened up via some API (which can automatically mark a new matrix as typed, if I'm right this isn't yet implemented)

@amergin
Copy link
Author

amergin commented Jul 31, 2015

Thank you for the sparse matrix implementation. I tried again with v.2.0.1 and unfortunately, the CPU and memory usage is still relatively high compared to what I've experienced with numeric.js.

@josdejong: you said earlier that "Right now for every operation math.js does validates the number, type, and size of the input values (numeric.js doesn't do that)." - Would it be possible to manually disable these validations in favor of performance?

Is there any performance difference in using math.chain() vs. calculating steps successively vs. forming an expression tree & compiling it & evaluating it?

@rjbaucells
Copy link
Collaborator

Sparse matrix operations can be improved, I just did one round of performance improvements. Can you provide a specific case where numeric.js is performing better than math.js?

@amergin
Copy link
Author

amergin commented Aug 1, 2015

@rjbaucells: I cannot produce a reduced test case for you but I can give you the code I'm running and example output of the input data: https://gist.github.com/amergin/0e88075d9d94141c3094

statDist.tdistr is not included in the gist, but it's the function found here.

mathJs function is called with new data repeatedly. In this example the larger dimension of the matrix is 4084 but it could be thousands more in other execution example. This function is called repeatedly with new variable data.

Tte variables are split equally to web worker instances (and WW instance count should be estimated to be the same as number of cores). So, on a computer that has 4 cores and 40 variables selected for the execution, and the variable is exists in 3 datasets, 4 web workers are created and each WW executes the mathJs function 10*3 times.

Any tips to increase the performance would be appreciated.

@rjbaucells
Copy link
Collaborator

I do not hace access to a computer now but it seems you are not using sparse matrices in your code. Change the calls to math.matrix() with math.sparse() and run the tests again.

@amergin
Copy link
Author

amergin commented Aug 1, 2015

The definition of sparce matrix says that most elements of the matrix are zero. In my case the input matrices are filled with mostly non-zero content. The code uses sparce matrix only when producing an identity matrix.

@rjbaucells
Copy link
Collaborator

Sorry, I thought the matrices were sparse. Another optimization you can use is when all elements in the matrix are of the same data type. There is a parameter you can pass at the time of matrix creation to specify the data type. Not all matrix operations have been optimized to use the data type.

@amergin
Copy link
Author

amergin commented Aug 1, 2015

Is the parameter for setting data type documented somewhere? I see math.matrix() takes three arguments of which the third seems to set the _datatype field. So if I were to init my matrices with math.matrix(whatEverNumContent, 'dense', 'number') there could be some performance implications vs. matrices that have _datatype undefined?

@rjbaucells
Copy link
Collaborator

Yes, that's the parameter. Setting the parameter to 'number' will make operations that have been optimized faster.

@amergin
Copy link
Author

amergin commented Aug 2, 2015

@rjbaucells: Okay, added the datatype definition to math.matrix statements and did some benchmarking (provided gist code):

16 variables included in 3 datasets that comprise of 10000 samples. Elapsed time measured with Chrome's performance.now(), 4 web workers.

Math.js, elapsed times of three executions: 104 seconds, 103 seconds, 105 seconds

Numeric.js, elapsed times of three executions: 17 seconds, 17 seconds, 18 seconds

Either I'm doing something horribly wrong or Math.js just does not scale to big data. Any ideas?

@rjbaucells
Copy link
Collaborator

I do not have computer access now. I will look at the data and operations you are working on once I get back in a few weeks.

@amergin
Copy link
Author

amergin commented Aug 26, 2015

@rjbaucells have you had time to look into this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants