-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matrix operations and performance #275
Comments
Thanks. Yes, there is a lot of room for improving the performance of math.js, and you're welcome to contribute it you like :) Right now for every operation math.js does validates the number, type, and size of the input values (numeric.js doesn't do that). That gives some overhead, which is relatively large for matrix operations. We can for example improve performance a lot when we know that a matrix just contains regular numbers, so there there is no need to do the type checking for every operation on every value in the matrix. See also this discussion: #59. For v1 of the library we've focused on a consistent and user friendly API. Right now, I'm working on modularizing the library (see v2 branch). After that, performance and more (advanced) functions will get high priority. Once modularized, it will be quite easy to start implementing these kind of performance improvements (I'm curious to see the the performance of typed arrays in action...). |
I am working on a Sparse Matrix implementation, you can see it on my "matrix" branch. So far I have implemented Compressed Row Storage (CRS) and Compressed Column Storage (CCS) formats. It is a work in process that I plan to complete soon. I am planning to change Matrix to support plugable storage implementations, possible candidates are CRS, CCS and Dense (current format). At creation time the consumer can specify a specific format or leave it to the default (Dense). Something like: var m = new Matrix([[...]], 'crs'); Many of the current Matrix.prototype.operations will be delegated to the storage implementation to use "native" operations when available. I think it would be good if we add support for specifying the format to some of the functions supporting matrix operations (if it is possible without breaking existing code): var m = math.eye(2, 2, 'ccs'); The same way, these functions should use "native" implementations if available. What do you think? |
That makes sense. Right now the Matrix class stores data in nested arrays. I would like to rewrite this to a single, flat array (as discussed here #59 by @garrison). That will for example allow to utilize typed arrays. The Matrix implementation would then get a property like "typed" or "mixed" to denote whether the matrix contents are of a single type or contain mixed data. In the first case this would allow for a optimization as there is no need to type-check matrix values individually. In the same spirit, a Matrix can have a special mode to store data in a sparse form. All functions creating matrices (math.matrix, math.eye, etc.) should support options to specify what type of matrix you want, and besides that, we can add some "smart" code to automatically select the best type for given data. |
Is this feature something that you are planning for v2 or it can be incorporated in v1.X? |
If it's non-breaking it could be added right now in v1. But if the implementation will be way easier using the |
The CCS and CRS sparse matrix format implementations require the use of the "equal" function in order to test if a given matrix item value is zero. The "equal" function is sensitive to the current math instance configuration (math.epsilon), so it is a challenge on how to define the CCS and CRS protypes. None of the types currently defined in mathjs are config sensitive, those prototypes are defined only once independently of the "math" instance: var math1 = math.create(); var m = new math1.type.Matrix(); assert.ok(m instanceof math2.type.Matrix); ==> OK The easiest way to implement different storage formats is by having a registry of available formats and having Matrix implementation delegate to the selected format implementation: Matrix.format = { This presents the problem of Matrix definition shared among "math" instances. Is it a requirement having a single prototype definition across "math" instances? Do you have any recommendation on how to change the "math" instance creation to accomplish this requirement? |
Ok so if I understand it correctly the SparseMatrix needs the I'm not sure about the implications of changing the Note that we have a similar case already with BigNumber, which also needs to be constructed based on I think the release of v2 will still take some serious time, at least months. There are a lot of functions that need to be refactored (relatively simple work but takes time), and there are still a few hurdles to take with the new |
Yes, some of the matrix storage formats require the "equal" function. The change to require('./type/Matrix')(math, config) is big but doable. I started doing it and stop to check if there was another solution. To avoid having different instances of Matrix inside the same "math" instance we will need to create the type and pass it to the different function implementations to avoid requiring the "Matrix" file again. For v1 the require call must be changed to require('module')(math, config) and get the Matrix instance from math.type.Matrix. For v2 the require calls must be changed to require('module')(config, type) and get the types from type.Matrix. We need to compare the item value with zero in order to avoid storing it. Compressed formats (CCS and CRS) only store matrix elements different than zero. We need to use "equal" since a Matrix can store Number, Complex and BigNumber. and in any case the correct solution is "equal(item, 0)". I will work on v1 to make all the changes. We can review it once it is complete. |
I will close this issue: thanks to @rjbaucells we now have support for sparse matrices. And and API for being able to use typed matrices is already implemented by @rjbaucells and just needs to be opened up via some API (which can automatically mark a new matrix as typed, if I'm right this isn't yet implemented) |
Thank you for the sparse matrix implementation. I tried again with @josdejong: you said earlier that "Right now for every operation math.js does validates the number, type, and size of the input values (numeric.js doesn't do that)." - Would it be possible to manually disable these validations in favor of performance? Is there any performance difference in using |
Sparse matrix operations can be improved, I just did one round of performance improvements. Can you provide a specific case where numeric.js is performing better than math.js? |
@rjbaucells: I cannot produce a reduced test case for you but I can give you the code I'm running and example output of the input data: https://gist.github.com/amergin/0e88075d9d94141c3094
Tte variables are split equally to web worker instances (and WW instance count should be estimated to be the same as number of cores). So, on a computer that has 4 cores and 40 variables selected for the execution, and the variable is exists in 3 datasets, 4 web workers are created and each WW executes the Any tips to increase the performance would be appreciated. |
I do not hace access to a computer now but it seems you are not using sparse matrices in your code. Change the calls to math.matrix() with math.sparse() and run the tests again. |
The definition of sparce matrix says that most elements of the matrix are zero. In my case the input matrices are filled with mostly non-zero content. The code uses sparce matrix only when producing an identity matrix. |
Sorry, I thought the matrices were sparse. Another optimization you can use is when all elements in the matrix are of the same data type. There is a parameter you can pass at the time of matrix creation to specify the data type. Not all matrix operations have been optimized to use the data type. |
Is the parameter for setting data type documented somewhere? I see |
Yes, that's the parameter. Setting the parameter to 'number' will make operations that have been optimized faster. |
@rjbaucells: Okay, added the datatype definition to 16 variables included in 3 datasets that comprise of 10000 samples. Elapsed time measured with Chrome's Math.js, elapsed times of three executions: 104 seconds, 103 seconds, 105 seconds Numeric.js, elapsed times of three executions: 17 seconds, 17 seconds, 18 seconds Either I'm doing something horribly wrong or Math.js just does not scale to big data. Any ideas? |
I do not have computer access now. I will look at the data and operations you are working on once I get back in a few weeks. |
@rjbaucells have you had time to look into this? |
The library seems to choke when computing larger matrices. For instance, consider this snippet:
xMatrix
contains some 1600 rows and 4 columns. Computing this in 4 web workers, math.js eats up 2GB of memory on Chrome and spends several minutes on the first few steps. I terminated the computation after around 10 minutes. To compare,numeric.js
seems to be able to do the same calculation in less than a minute.The text was updated successfully, but these errors were encountered: