This issue was moved to a discussion.


Improve performance of determinant #908

Closed
josdejong opened this issue Jul 31, 2017 · 28 comments

@josdejong
Owner

When running some simple benchmarks of mathjs:

node benchmark/matrix_operations.js

You see output like:

matrix operations mathjs A+A             x 85,375 ops/sec ±1.83% (89 runs sampled)
matrix operations mathjs A*A             x 2,843 ops/sec ±0.15% (96 runs sampled)
matrix operations mathjs A'              x 79,028 ops/sec ±1.73% (91 runs sampled)
matrix operations mathjs det(A)          x 105 ops/sec ±0.21% (75 runs sampled)
matrix operations sylvester A+A          x 69,318 ops/sec ±1.17% (92 runs sampled)
matrix operations sylvester A*A          x 5,659 ops/sec ±0.21% (96 runs sampled)
matrix operations sylvester A'           x 113,950 ops/sec ±1.79% (85 runs sampled)
matrix operations sylvester det(A)       x 9,778 ops/sec ±1.28% (88 runs sampled)
matrix operations numericjs A+A          x 426,161 ops/sec ±1.58% (81 runs sampled)
matrix operations numericjs A*A          x 33,622 ops/sec ±0.33% (95 runs sampled)
matrix operations numericjs A'           x 528,647 ops/sec ±1.30% (91 runs sampled)
matrix operations numericjs det(A)       x 81,438 ops/sec ±0.54% (89 runs sampled)
matrix operations ndarray A+A            x 342,996 ops/sec ±1.03% (92 runs sampled)
matrix operations ndarray A*A            x 25,012 ops/sec ±0.52% (91 runs sampled)
matrix operations ndarray A'             x 390,326 ops/sec ±1.68% (85 runs sampled)
matrix operations ndarray det(A)         x 400,209 ops/sec ±0.44% (95 runs sampled)

math.js is slower than the alternatives (up to one order of magnitude). This is probably because math.js deals with mixed types rather than just numbers, and there is room for improvement there, but that's a topic for another issue.

There is one extreme outlier above: calculating the determinant det(A), which is hundreds of times slower than in the other math libraries.

Who is interested in diving into this performance issue and trying to improve the performance of the determinant?

@firepick1
Collaborator

Determinants can result in big equations. Is the slowness due to (a) the creation of the equations, or (b) the evaluation of the created equation?

If (b) then my Equations module might help, since it only evaluates sub-expressions once.

@josdejong
Owner Author

There aren't really equations in play here; it's just numeric crunching of a matrix (see det.js).

@firepick1
Collaborator

Oh. I see. It's numeric, not symbolic. I had a hunch that a symbolic equation representing the determinant could be optimized. However, getting that symbolic equation would take some doing. 😶

@firepick1
Collaborator

I agree that mathjs operator polymorphism (e.g., add() vs '+') may be the magnitude killer here. It's the old "interpreted vs. compiled" order of magnitude. If this hypothesis holds, we might want to consider a sister package, "mathjs-double". Such a package would be tuned for performance but retain most of the mathjs API goodness. An alternative solution would be to go the C++ Node extension route, which is a lot of work 😩

@firepick1
Collaborator

firepick1 commented Aug 2, 2017

(deleted comment. sorry wrong thread)

@josdejong
Owner Author

For some Matrix operations there is already an optimization in place that checks whether the Matrix contains a single type (i.e. just numbers, or just BigNumbers); in that case it doesn't use the generic mixed-type math.add, but the fast, simple implementation of math.add just for numbers. Maybe we could apply that here too.

But at this point I don't know if that's actually the cause, or that the current algorithm isn't efficient enough and we should look for a different algorithm. We need to figure that out first.

@firepick1
Collaborator

As a thought exercise, if we expressed any particular determinant as an expanded equation (i.e., write out the det()) and compiled that expression to Javascript, it would be faster because we would have eliminated all the function calls and conditionals. Indeed, we know this works because other Javascript math packages are faster. An additional performance boost for matrices of size N>3 would be to memoize the sub-products that recur. Such memoization would precede the compilation into Javascript.

The above exercise leads one to consider an architectural approach to performance overall that would rely on the extensive use of compiled expression trees with fully expanded function nodes. Such an approach could yield blazingly fast evals of all compiled expressions. Some functions like sin() would still be atomic, but others like det() and derivative() would benefit. In other words, my hypothesis is that the symbolic processing of math expressions is the key to performance.

Symbolic processing of math expressions allows us to maintain polymorphism via compilation rather than having it burden each evaluation. In this manner, we would compile det([[a,b], [c,d]]) as a*d-b*c for doubles and compile it as subtract(multiply(a,d), multiply(b,c)) for all other number types.

@josdejong
Owner Author

Yes, so we can optimize by doing analysis of the data types beforehand, and optimize evaluation for that.

Note that math.js already uses compiling expressions and typed functions dynamically, that's why the expression parser is very fast once compiled, which allows you to evaluate an expression against a scope very fast.

Also, for example the function det already handles special cases of 2x2, 3x3 matrices, so the function is already optimized in that regard.

I don't think we necessarily need to compile dynamic JavaScript, but we do need switches which check for mixed-type vs. single-type contents of matrices, and then use either the generic or the type-specific implementation of an operation. Something like:

// Pseudo code for adding two matrices:
const A = math.matrix([...]);
const B = math.matrix([...]);
const operation = math.add;

// We need a new method which returns the distinct types of the values in the
// matrix: for example ['number'] for a single type, or ['number', 'BigNumber']
// in the case of mixed types.
const typesA = A.contentTypes();
const typesB = B.contentTypes();
if (typesA.length === 1 && typesB.length === 1 && typesA[0] === typesB[0]) {
  // Both matrices are single type and have the same type (for example 'number')
  // -> find the type-specific implementation of math.add for numbers:
  const signature = typesA[0] + ',' + typesB[0]; // 'number,number'
  const addSingleType = operation.signatures[signature];
  if (addSingleType) {
    // Evaluate with the fast, single-type implementation of math.add
    return algorithm13(A, B, addSingleType);
  }
}

// Evaluate with the generic, multi-type implementation of math.add
// (algorithm13 is one of a set of helper functions which apply a
// callback function to two matrices element-wise)
return algorithm13(A, B, operation);

@firepick1
Collaborator

That would be faster, yes. 😀
I guess I'm just idly wondering what we might do with functions such as detExpr() or multiplyExpr(), which would return the optimizable expression of the abstract mathematical function. For example, detExpr([[a, b], [c, d]]) would return 'a*d - b*c'. Although such functions are the insanely verbose antithesis of abstraction, they do permit fascinating optimization possibilities via the one-time computation of recurring sub-expressions. 🤔 (this is where you nod somberly and ignore me)

@ericman314
Collaborator

I tried replacing the polymorphic arithmetic in det.js and multiply.js with the native operators:

100x100
det(number): 3176ms
det_native(number): 869ms

Virtually all of the speed up was due to refactoring multiply.js to use native types, since most of the work done in det.js is multiplying matrices. But this doesn't come close to the benchmarks of other matrix libraries, so I wondered whether there was an algorithm that doesn't require multiplying matrices at all.

It turns out you can use the LU decomposition to compute the determinant, which is much faster:

100x100
det(number): 3212ms
det_native(number): 860ms
det_lu(number): 54ms

But the LU decomposition can be numerically unstable so the computed determinant starts to differ significantly between the two methods once you get larger than about 50x50.

Implementing a native version of lup.js gives even better results. Here are the two with a 500x500 matrix
(At this point I stopped benchmarking det and det_native):

500x500
det_lu(number): 1628ms
det_lu_native(number): 515ms
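[Editor's note] The LU-based approach Eric benchmarks above can be sketched in plain JavaScript. This is a minimal illustration of the technique (Gaussian elimination with partial pivoting; det(A) = ±product of the pivots, with the sign flipping on each row swap), not the actual mathjs implementation; `detViaLU` is a made-up name.

```javascript
// Sketch: compute a determinant via LU decomposition with partial pivoting.
// Row-reduce a copy of the matrix to upper-triangular form; the determinant
// is the product of the pivots, negated once per row swap.
function detViaLU(A) {
  const n = A.length;
  const U = A.map(row => row.slice()); // work on a copy
  let det = 1;
  for (let k = 0; k < n; k++) {
    // Partial pivoting: pick the row with the largest |pivot| in column k
    let p = k;
    for (let i = k + 1; i < n; i++) {
      if (Math.abs(U[i][k]) > Math.abs(U[p][k])) p = i;
    }
    if (U[p][k] === 0) return 0; // singular matrix
    if (p !== k) {
      [U[p], U[k]] = [U[k], U[p]];
      det = -det; // each row swap flips the sign
    }
    det *= U[k][k];
    // Eliminate everything below the pivot
    for (let i = k + 1; i < n; i++) {
      const factor = U[i][k] / U[k][k];
      for (let j = k; j < n; j++) U[i][j] -= factor * U[k][j];
    }
  }
  return det;
}

console.log(detViaLU([[1, 2], [3, 4]]));                   // close to -2
console.log(detViaLU([[2, 0, 0], [0, 3, 0], [0, 0, 4]])); // 24
```

This runs in O(n^3) versus the O(n!) cost of cofactor expansion, which is where the huge speed-up comes from; the instability Eric mentions is a floating-point rounding issue in the elimination steps, which partial pivoting mitigates but does not eliminate.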

@josdejong
Owner Author

> (this is where you nod somberly and ignore me)

ha ha, of course not. I like being able to think aloud and have a sort of a brainstorm, which can result in great ideas :)

@ericman314 thanks for doing these experiments. So optimizing for matrices with a single data type can yield an improvement on the order of a factor of 3 or 4, and a much bigger gain could come from replacing the algorithm, with LU decomposition for example. But unstable results for larger matrices are an issue, of course :(

@ericman314
Collaborator

ericman314 commented Aug 6, 2017 via email

@josdejong
Owner Author

So you mean to say that it's acceptable to have a solution that becomes unstable for large matrices? That can be a pragmatic (and very fast) approach. I would love to somehow give a warning or error then. Or automatically fall back to a slow but precise and stable algorithm for large matrices. What do you think?

> Maybe we should also be looking at benchmarks for calculating lots of small determinants as well as one big one?

Yes, that would be best. Calculating the determinant of a 2x2 or 3x3 matrix can and should be extremely fast :) Feel free to extend the benchmark folder with new tests where needed.

@ericman314
Collaborator

Are we still interested in this? I stashed my determinant testing code somewhere and now I can't find it. I can recreate it if we still want to use lup decomposition for calculating the determinant.

@josdejong
Owner Author

Yes, I think so! The current performance of det is not yet resolved and it would be great to have better performance. Maybe fall back to the slow algorithm where the fast algorithm isn't stable (large matrices, you said?).

@ericman314
Collaborator

I never did find that code I wrote, but I can do it again. Code always comes out better the second time around, anyway.

About falling back to the slow algorithm, I wouldn't recommend that. For matrices of the size where that would matter, the slow algorithm would be much too slow. I will do some research and see if there are any relaxation methods that can improve the accuracy of the LU decomp before evaluating the determinant.

@ericman314
Collaborator

Well, I don't know if my code was any better the second time around but it is still faster than the previous implementation. Here are the results of the benchmark before and after the changes in PR #1118. (It doesn't contain any relaxation methods, though.)

Before

matrix operations mathjs det(A)          x 70.35 ops/sec ±0.35% (72 runs sampled)
matrix operations sylvester det(A)       x 6,103 ops/sec ±0.68% (95 runs sampled)
matrix operations numericjs det(A)       x 89,893 ops/sec ±0.73% (94 runs sampled)
matrix operations ndarray det(A)         x 446,671 ops/sec ±0.57% (93 runs sampled)

matrix operations mathjs det(A)          avg duration per operation: 14214 microseconds
matrix operations sylvester det(A)       avg duration per operation: 164 microseconds
matrix operations numericjs det(A)       avg duration per operation: 11 microseconds
matrix operations ndarray det(A)         avg duration per operation: 2 microseconds

After

matrix operations mathjs det(A)          x 2,987 ops/sec ±1.18% (95 runs sampled)
matrix operations sylvester det(A)       x 6,114 ops/sec ±0.81% (93 runs sampled)
matrix operations numericjs det(A)       x 90,075 ops/sec ±0.84% (92 runs sampled)
matrix operations ndarray det(A)         x 424,066 ops/sec ±1.62% (88 runs sampled)

matrix operations mathjs det(A)          avg duration per operation: 335 microseconds
matrix operations sylvester det(A)       avg duration per operation: 164 microseconds
matrix operations numericjs det(A)       avg duration per operation: 11 microseconds
matrix operations ndarray det(A)         avg duration per operation: 2 microseconds

@josdejong
Owner Author

Wow!!! that's huge! Thanks a lot.

I would say the new performance is acceptable: it's now on par with other matrix operations like multiplication. Shall we close this issue, or do you want to keep it open for another improvement round?

@ericman314
Collaborator

Computing determinants isn't very important in linear algebra, but computing the LU decomposition is. If someone would like to profile that function and look for some optimizations I think it could be worth the time.

@josdejong
Owner Author

That makes sense. I will close this issue now; the improvements are released in v4.4.2.

@josdejong
Owner Author

josdejong commented Jul 8, 2018

@ericman314 I just discovered something awesome (triggered by #1154): the performance of matrix operations is only optimized for matrices containing a single datatype (like numbers) when you explicitly pass the datatype. I wrongly assumed there was code in place to automatically detect the datatype of matrix contents.

So:

const A1 = math.zeros(100, 100)
const A2 = math.matrix(A1, 'dense', 'number')

console.time('1'); 
const R1 = math.multiply(A1, A1); 
console.timeEnd('1');  // 147 ms on Firefox, 86 ms on Chrome

console.time('2'); 
const R2 = math.multiply(A2, A2); 
console.timeEnd('2');  // 11 ms on Firefox, 30 ms on Chrome

This means that mathjs is, or can be, much faster than we described in our paper: at worst only 2 times slower than the fastest JS libraries instead of 10 times (!). I've run the benchmarks again and added test cases for generic (mixed) vs. number matrices:

matrix operations mathjs (number) A+A    x 303,444 ops/sec ±0.63% (95 runs sampled)
matrix operations mathjs (number) A*A    x 27,310 ops/sec ±0.82% (92 runs sampled)
matrix operations mathjs (number) A'     x 338,112 ops/sec ±1.04% (92 runs sampled)
matrix operations mathjs (number) det(A) x 3,069 ops/sec ±0.69% (94 runs sampled)
matrix operations mathjs (generic) A+A   x 50,932 ops/sec ±1.45% (89 runs sampled)
matrix operations mathjs (generic) A*A   x 1,163 ops/sec ±0.36% (96 runs sampled)
matrix operations mathjs (generic) A'    x 307,134 ops/sec ±3.75% (85 runs sampled)
matrix operations mathjs (generic) det(A) x 3,032 ops/sec ±1.00% (93 runs sampled)
matrix operations sylvester A+A          x 127,192 ops/sec ±0.31% (92 runs sampled)
matrix operations sylvester A*A          x 7,102 ops/sec ±0.32% (97 runs sampled)
matrix operations sylvester A'           x 143,134 ops/sec ±0.28% (96 runs sampled)
matrix operations sylvester det(A)       x 7,107 ops/sec ±0.98% (92 runs sampled)
matrix operations numericjs A+A          x 675,836 ops/sec ±0.40% (91 runs sampled)
matrix operations numericjs A*A          x 52,502 ops/sec ±0.39% (97 runs sampled)
matrix operations numericjs A'           x 628,030 ops/sec ±0.33% (94 runs sampled)
matrix operations numericjs det(A)       x 101,374 ops/sec ±0.24% (95 runs sampled)
matrix operations ndarray A+A            x 409,924 ops/sec ±0.66% (94 runs sampled)
matrix operations ndarray A*A            x 29,841 ops/sec ±0.25% (96 runs sampled)
matrix operations ndarray A'             x 572,023 ops/sec ±0.85% (96 runs sampled)
matrix operations ndarray det(A)         x 626,702 ops/sec ±0.25% (95 runs sampled)

matrix operations mathjs (number) A+A    avg duration per operation: 3 microseconds
matrix operations mathjs (number) A*A    avg duration per operation: 37 microseconds
matrix operations mathjs (number) A'     avg duration per operation: 3 microseconds
matrix operations mathjs (number) det(A) avg duration per operation: 326 microseconds
matrix operations mathjs (generic) A+A   avg duration per operation: 20 microseconds
matrix operations mathjs (generic) A*A   avg duration per operation: 860 microseconds
matrix operations mathjs (generic) A'    avg duration per operation: 3 microseconds
matrix operations mathjs (generic) det(A) avg duration per operation: 330 microseconds
matrix operations sylvester A+A          avg duration per operation: 8 microseconds
matrix operations sylvester A*A          avg duration per operation: 141 microseconds
matrix operations sylvester A'           avg duration per operation: 7 microseconds
matrix operations sylvester det(A)       avg duration per operation: 141 microseconds
matrix operations numericjs A+A          avg duration per operation: 1 microseconds
matrix operations numericjs A*A          avg duration per operation: 19 microseconds
matrix operations numericjs A'           avg duration per operation: 2 microseconds
matrix operations numericjs det(A)       avg duration per operation: 10 microseconds
matrix operations ndarray A+A            avg duration per operation: 2 microseconds
matrix operations ndarray A*A            avg duration per operation: 34 microseconds
matrix operations ndarray A'             avg duration per operation: 2 microseconds
matrix operations ndarray det(A)         avg duration per operation: 2 microseconds

I came across a bug when trying to add two matrices having a datatype defined; fixed that in the develop branch, see b44ce14 (you have to run npm run build before executing the benchmarks).

From this I see two interesting action points:

  1. When creating a matrix, the datatype (like number) should be automatically determined, so that any calculations on this matrix are way faster.
  2. From the benchmarks, I don't see a difference in performance for det, whilst it should be much faster. I think the datatype information gets lost somewhere along the way, maybe inside lup. If we can fix that, it would mean another huge bump in the performance of det :)
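[Editor's note] Action point 1 (automatically determining the datatype) amounts to a single pass over the matrix contents. A minimal sketch, assuming mathjs's convention of type flags like isBigNumber on its value objects; `detectDataType` and `typeOfElement` are hypothetical names (mathjs later exposed getMatrixDataType for this purpose):

```javascript
// Classify a single element using mathjs-style type flags.
function typeOfElement(x) {
  if (typeof x === 'number') return 'number';
  if (x && x.isBigNumber === true) return 'BigNumber'; // mathjs type flag
  if (x && x.isFraction === true) return 'Fraction';
  return 'unknown';
}

// Scan a (possibly nested) array once; return the single shared datatype,
// or 'mixed' if the elements disagree. A single-type result lets matrix
// operations dispatch straight to the fast number-only code path.
function detectDataType(data) {
  let found = null;
  const visit = (x) => {
    if (Array.isArray(x)) { x.forEach(visit); return; }
    const t = typeOfElement(x);
    if (found === null) found = t;
    else if (found !== t) found = 'mixed';
  };
  visit(data);
  return found;
}

console.log(detectDataType([[1, 2], [3, 4]]));             // 'number'
console.log(detectDataType([[1, { isBigNumber: true }]])); // 'mixed'
```

The scan is O(n) in the number of elements, so doing it once at construction time is cheap compared to repeatedly paying polymorphic dispatch inside an O(n^3) operation like multiply or lup.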

@ericman314
Collaborator

Cool! And you're right, det should be much faster with the explicit typing. I'll take a look at lup and see what I can figure out. And keeping track of the data types is a good idea; I'll take a look at that too but I'm not as familiar with how or where matrices are actually created in math.js, to be honest.

@ericman314
Collaborator

Hmm, I'm actually not sure now about det. After looking closer, lup is the only function called in det that does any significant work. And in lup, there are no other matrix functions called. All the computation is done on individual elements. A long time ago I created a version of lup that used primitive JavaScript types, rather than the generic functions, and I think it was about two or three times faster.

And I found the place where matrices are constructed, but I'm afraid I'll mess something up if I try to figure out how to automatically guess the dataType when it's not supplied.

@josdejong
Owner Author

Thanks for checking it out. Looks like we have to dig deeper then to find out why lup doesn't utilize the matrix's explicit type.

A different solution would indeed be to have a separate implementation for the different numeric types, but I'm not sure if we should go there right now.

In #1154, @JasonShin is working on a new function getMatrixDataType and a corresponding matrix method getDataType. I think once we have that in place, it will be quite easy to use it instead of reading the matrix property ._datatype to determine whether we can use explicit types. See my comment here: #1154 (comment).

@bartekleon
Contributor

bartekleon commented Dec 31, 2019

There is an amazing algorithm for it which brings the complexity down to O(n^2.372):
https://people.csail.mit.edu/virgi/matrixmult-f.pdf
There are also others with O(n^3) (which, as people say, are more practical), but I still have to analyze the papers about them.

Meanwhile I'll take a look at the current implementation. There is probably some bottleneck or unnecessary condition which breaks something.

@josdejong
Owner Author

> Meanwhile I'll take a look at the current implementation. There is probably some bottleneck or unnecessary condition which breaks something.

Yes it could very well be an implementation issue and not the algorithm itself.

@bartekleon
Contributor

bartekleon commented Jan 6, 2020

@josdejong I have read the code (nothing caught my attention, though) and tested several different algorithms for determinants, even the ones from numericjs which were supposed to be the fastest. They are, but only for plain numbers (I got ~7 times increased speed). When I try using BigNumbers, though, performance drops so drastically that your algorithm is faster (400,000 ops/s for plain numbers, 60,000 ops/s, and 130 ops/s for BigNumbers).

Sadly I have to say that I won't be able to help with this issue. It is not my field of expertise and honestly I have no other idea how to approach it :/.
About the gamma function: I have been looking for some nice algorithms and papers covering it, but still haven't found anything interesting. I'm going to talk with my maths teacher about it. Maybe we'll find some solution.

@josdejong
Owner Author

Thanks for having a look Bartosz.

Maybe we should close this issue; a big step was already made by Eric a long time ago.

I've created a separate issue for the idea I explained in #1154 (comment), it's separate from improving determinant.

Repository owner locked and limited conversation to collaborators Aug 26, 2022
@josdejong josdejong converted this issue into discussion #2724 Aug 26, 2022
