
Speed up parent creation for multiplication of square matrices #8096

Closed
boothby opened this issue Jan 27, 2010 · 58 comments

Comments

@boothby

boothby commented Jan 27, 2010

Multiplication of small square matrices is ridiculously slow:

sage: for d in range(1, 100):
...    print d
...    A = random_matrix(GF(3), d)
...    B = random_matrix(GF(3), d)
...    timeit("C = A*B")
    

1
625 loops, best of 3: 93.8 µs per loop
2
625 loops, best of 3: 93.9 µs per loop
3
625 loops, best of 3: 94.2 µs per loop
4
625 loops, best of 3: 94.1 µs per loop
5
625 loops, best of 3: 94.7 µs per loop
6
625 loops, best of 3: 94.9 µs per loop
7
625 loops, best of 3: 95.2 µs per loop
8
625 loops, best of 3: 95.8 µs per loop
9
625 loops, best of 3: 96.8 µs per loop
10
625 loops, best of 3: 97.6 µs per loop
11
625 loops, best of 3: 98.1 µs per loop
12
625 loops, best of 3: 101 µs per loop
13
625 loops, best of 3: 101 µs per loop
14
625 loops, best of 3: 104 µs per loop
15
625 loops, best of 3: 104 µs per loop
16
625 loops, best of 3: 108 µs per loop
17
625 loops, best of 3: 108 µs per loop
18
625 loops, best of 3: 113 µs per loop
19
625 loops, best of 3: 112 µs per loop
20
625 loops, best of 3: 118 µs per loop
21
625 loops, best of 3: 117 µs per loop
22
625 loops, best of 3: 125 µs per loop
23
625 loops, best of 3: 123 µs per loop
24
625 loops, best of 3: 133 µs per loop
25
625 loops, best of 3: 129 µs per loop
26
625 loops, best of 3: 143 µs per loop
27
625 loops, best of 3: 137 µs per loop
28
625 loops, best of 3: 155 µs per loop
29
625 loops, best of 3: 147 µs per loop
30
625 loops, best of 3: 166 µs per loop
31
625 loops, best of 3: 157 µs per loop
32
625 loops, best of 3: 179 µs per loop
33
625 loops, best of 3: 168 µs per loop
34
625 loops, best of 3: 196 µs per loop
35
625 loops, best of 3: 182 µs per loop
36
625 loops, best of 3: 214 µs per loop
37
625 loops, best of 3: 198 µs per loop
38
625 loops, best of 3: 234 µs per loop
39
625 loops, best of 3: 213 µs per loop
40
625 loops, best of 3: 255 µs per loop
41
625 loops, best of 3: 231 µs per loop
42
625 loops, best of 3: 279 µs per loop
43
625 loops, best of 3: 251 µs per loop
44
625 loops, best of 3: 307 µs per loop
45
625 loops, best of 3: 271 µs per loop
46
625 loops, best of 3: 335 µs per loop
47
625 loops, best of 3: 298 µs per loop
48
625 loops, best of 3: 363 µs per loop
49
625 loops, best of 3: 319 µs per loop
50
625 loops, best of 3: 401 µs per loop
51
625 loops, best of 3: 346 µs per loop
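The timings above suggest a large constant cost per call: up to about d = 10 the time barely moves from ~94 µs, so the d³ arithmetic is not what dominates. A quick least-squares fit of t(d) = a + b·d³ to a few points transcribed from the table (plain Python, my own fit, not from the ticket) puts the fixed overhead near 95 µs:

```python
# Least-squares fit of t(d) = a + b*d**3 to timings transcribed from
# the benchmark above (matrix size d, µs per multiplication).
data = [(1, 93.8), (10, 97.6), (20, 118), (30, 166), (40, 255), (50, 401)]

n = len(data)
sx = sum(d**3 for d, _ in data)
sxx = sum(d**6 for d, _ in data)
sy = sum(t for _, t in data)
sxy = sum(d**3 * t for d, t in data)

b = (n * sxy - sx * sy) / (n * sxx - sx**2)  # cubic coefficient (µs per d**3)
a = (sy - b * sx) / n                        # fixed per-call overhead (µs)

print("fixed overhead: %.1f µs, cubic term: %.2f ns" % (a, b * 1e3))
```

On these points the fit gives a fixed overhead of roughly 95-100 µs, i.e. the actual arithmetic only becomes visible around d ≈ 30.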

Here's a profile of the 1x1 case:

625 loops, best of 3: 91.7 µs per loop
         810004 function calls in 5.980 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    40000    0.100    0.000    0.100    0.000 :0(IntegerMod)
    30000    0.080    0.000    0.080    0.000 :0(append)
    10000    0.030    0.000    0.030    0.000 :0(base_ring)
    10000    0.150    0.000    0.900    0.000 :0(category)
    40000    0.250    0.000    0.290    0.000 :0(has_key)
    10000    0.070    0.000    0.070    0.000 :0(hasattr)
    10000    0.030    0.000    0.030    0.000 :0(is_Matrix)
    80000    0.250    0.000    0.250    0.000 :0(isinstance)
    30000    0.070    0.000    0.070    0.000 :0(keys)
    30000    0.040    0.000    0.040    0.000 :0(len)
    30001    0.080    0.000    0.080    0.000 :0(range)
    10000    0.030    0.000    0.030    0.000 :0(setdefault)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
    30000    0.140    0.000    0.140    0.000 :0(sorted)
        1    0.380    0.380    5.980    5.980 <string>:1(<module>)
    30000    0.250    0.000    1.750    0.000 cachefunc.py:155(get_key)
    10000    0.060    0.000    0.850    0.000 cachefunc.py:220(__call__)
    10000    0.060    0.000    0.090    0.000 cachefunc.py:254(get_cache)
    10000    0.050    0.000    0.050    0.000 cachefunc.py:275(__get__)
    20000    0.270    0.000    1.550    0.000 cachefunc.py:76(__call__)
    20000    0.060    0.000    0.060    0.000 cachefunc.py:95(get_cache)
    10000    0.070    0.000    2.520    0.000 category.py:459(__contains__)
    10000    0.360    0.000    1.550    0.000 category.py:651(is_subcategory)
    20000    0.120    0.000    1.670    0.000 classcall_metaclass.py:64(__call__)
    20000    0.040    0.000    0.040    0.000 finite_field_prime_modn.py:121(characteristic)
    20000    0.110    0.000    0.110    0.000 finite_field_prime_modn.py:187(order)
    30000    1.010    0.000    1.500    0.000 function_mangling.py:205(fix_to_pos)
    30000    0.080    0.000    0.080    0.000 function_mangling.py:261(<genexpr>)
    40000    0.290    0.000    0.390    0.000 integer_mod_ring.py:733(__call__)
    10000    0.050    0.000    0.070    0.000 matrix_group_element.py:68(is_MatrixGroupElement)
    10000    0.450    0.000    0.710    0.000 matrix_space.py:1035(matrix)
    10000    0.140    0.000    4.000    0.000 matrix_space.py:1089(matrix_space)
    10000    0.200    0.000    3.830    0.000 matrix_space.py:110(MatrixSpace)
    10000    0.000    0.000    0.000    0.000 matrix_space.py:1112(ncols)
    10000    0.030    0.000    0.030    0.000 matrix_space.py:1124(nrows)
    10000    0.310    0.000    1.270    0.000 matrix_space.py:271(__call__)
    10000    0.000    0.000    0.000    0.000 misc.py:514(get_verbose)
        1    0.000    0.000    5.980    5.980 profile:0(for i in range(10000): C = A*B)
        0    0.000             0.000          profile:0(profiler)
    90000    0.270    0.000    0.270    0.000 unique_representation.py:454(__eq__)
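Reading the profile: most of the 5.98 s is spent under matrix_space/MatrixSpace (cumtime ~4.0 s), and a large slice of that is the cached-factory key handling (get_key, fix_to_pos). As a toy illustration of why the ticket targets parent creation, here is the cheapest possible form of such a cache, a tuple-keyed dict with no argument normalization (hypothetical names, not Sage's real factory):

```python
# Toy illustration (hypothetical names, not Sage's actual classes): cache
# the parent by (ring, nrows, ncols) so that repeated products turn the
# expensive construction seen in the profile into a single dict hit.
_space_cache = {}

def matrix_space(ring, nrows, ncols):
    key = (ring, nrows, ncols)
    space = _space_cache.get(key)
    if space is None:
        # Stand-in for the expensive parent construction.
        space = ("MatrixSpace", ring, nrows, ncols)
        _space_cache[key] = space
    return space

s1 = matrix_space("GF(3)", 1, 1)
s2 = matrix_space("GF(3)", 1, 1)
assert s1 is s2  # the second call never rebuilds the parent
```

Sage's real factory is already cached, but with far more bookkeeping per lookup; the profiler lines for cachefunc.py and function_mangling.py show that bookkeeping costing more than a third of the total runtime.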

Apply attachment: trac8096_speedup_matrix_parent.patch

CC: @robertwb @malb

Component: linear algebra

Author: Tom Boothby, Robert Bradshaw, Simon King

Reviewer: Simon King, David Loeffler

Merged: sage-5.0.beta9

Issue created by migration from https://trac.sagemath.org/ticket/8096

@boothby boothby assigned williamstein and boothby and unassigned williamstein Jan 27, 2010
@boothby boothby changed the title Coersion of square matrices is slow Speed up parent creation for multiplication of square matrices Jan 27, 2010
@boothby
Author

boothby commented Jan 28, 2010

Attachment: 8096-new_matrix.patch.gz

@boothby
Author

boothby commented Jan 28, 2010

comment:2

I'm not sure that I did this in the right place, but it cuts the time to multiply two 1x1 matrices down to 1/3 of the previous time -- which is still dog slow, but significantly better. As a reference,

sage: A = GF(3).random_element()
sage: B = GF(3).random_element()
sage: timeit("C = A*B")
625 loops, best of 3: 142 ns per loop

and with this patch, I'm getting:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 32.5 µs per loop

sage: import profile
sage: profile.run("for i in range(10000): C = A*B")
         250004 function calls in 1.840 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    40000    0.120    0.000    0.120    0.000 :0(IntegerMod)
    10000    0.070    0.000    0.070    0.000 :0(hasattr)
    10000    0.060    0.000    0.060    0.000 :0(is_Matrix)
    70000    0.190    0.000    0.190    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(range)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.400    0.400    1.840    1.840 <string>:1(<module>)
    20000    0.060    0.000    0.060    0.000 finite_field_prime_modn.py:121(characteristic)
    40000    0.230    0.000    0.350    0.000 integer_mod_ring.py:733(__call__)
    10000    0.060    0.000    0.070    0.000 matrix_group_element.py:68(is_MatrixGroupElement)
    10000    0.390    0.000    0.720    0.000 matrix_space.py:1035(matrix)
    10000    0.030    0.000    0.030    0.000 matrix_space.py:1112(ncols)
    10000    0.020    0.000    0.020    0.000 matrix_space.py:1124(nrows)
    10000    0.160    0.000    1.100    0.000 matrix_space.py:271(__call__)
    10000    0.050    0.000    0.050    0.000 misc.py:514(get_verbose)
        1    0.000    0.000    1.840    1.840 profile:0(for i in range(10000): C = A*B)
        0    0.000             0.000          profile:0(profiler)

@simon-king-jena
Member

comment:3

Replying to @boothby:

... As a reference,

sage: A = GF(3).random_element()
sage: B = GF(3).random_element()
sage: timeit("C = A*B")
625 loops, best of 3: 142 ns per loop

and with this patch, I'm getting:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 32.5 µs per loop

But 32.5 µs (with the patch) is much *slower* than 142 ns (without the patch)!

@boothby
Author

boothby commented Jan 31, 2010

comment:4

You'll note that the 32.5µs is matrix-by-matrix, whereas the 142ns is element-by-element. Before my patch, the matrix-by-matrix time was 101µs.

I'd like the 32.5µs to go away, but I don't know how much of that would be possible, at the moment.

@simon-king-jena
Member

comment:5

Replying to @boothby:

You'll note that the 32.5µs is matrix-by-matrix, whereas the 142ns is element-by-element. Before my patch, the matrix-by-matrix time was 101µs. I'd like the 32.5µs to go away, but I don't know how much of that would be possible, at the moment.

OK, then it is an improvement. I hope to be able to do some refereeing later today or tomorrow.

Anyway, I still wonder why basic matrix operations in Sage tend to be so slow. I mean, do any complicated operations with parents happen behind the scenes? By "slow", I mean "compared with MeatAxe matrices as provided by my cohomology spkg":

sage: from pGroupCohomology.mtx import MTX
sage: F = GF(3)
sage: A = random_matrix(F,3)
sage: B = random_matrix(F,3)
sage: a = MTX(3,[list(r) for r in A.rows()])
sage: b = MTX(3,[list(r) for r in B.rows()])
sage: timeit("C=A*B")
625 loops, best of 3: 99.6 µs per loop
sage: timeit("c=a*b")
625 loops, best of 3: 1.01 µs per loop
sage: a*b == MTX(3,[list(r) for r in C.rows()])
True
sage: A = random_matrix(F,100)
sage: B = random_matrix(F,100)
sage: a = MTX(3,[list(r) for r in A.rows()])
sage: b = MTX(3,[list(r) for r in B.rows()])
sage: timeit("C=A*B")
125 loops, best of 3: 2.43 ms per loop
sage: timeit("c=a*b")
625 loops, best of 3: 376 µs per loop
sage: a*b == MTX(3,[list(r) for r in C.rows()])
True

@robertwb
Contributor

comment:6

Replying to @simon-king-jena:

Anyway, I still wonder why basic matrix operations in Sage tend to be so slow. I mean, do any complicated operations with parents happen behind the scenes? By "slow", I mean "compared with MeatAxe matrices as provided by my cohomology spkg":

This is because they have only been optimized for large dimension and word-sized p. I bet MeatAxe is slower for 1000 x 1000 matrices.

@simon-king-jena
Member

comment:7

Replying to @robertwb:

This is because they have only been optimized for large dimension and word-sized p. I bet MeatAxe is slower for 1000 x 1000 matrices.

You just lost the bet.

sage: from pGroupCohomology.mtx import MTX
sage: F = GF(3)
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: a = MTX(3,[list(r) for r in A.rows()])
sage: b = MTX(3,[list(r) for r in B.rows()])
sage: timeit("C=A*B")
5 loops, best of 3: 12.6 s per loop
sage: timeit("c=a*b")
5 loops, best of 3: 2.01 s per loop
sage: C = A*B
sage: a*b == MTX(3,[list(r) for r in C.rows()])
True

When I did some benchmarks two years ago, I was often astonished by how much MeatAxe outperformed the usual Sage matrices in basic operations such as computing the hash, copying, getting the list of coefficients, changing a coefficient, and so on: hash and copy took seconds on a 10000x10000 matrix!

I complained, and things did improve: hash and copy are fine now. But with the above matrix, loads(dumps(A)) had not terminated after a minute, and ate 90% of my computer's memory.

@robertwb
Contributor

Attachment: 8096-zero-matrix.patch.gz

apply on top of previous

@robertwb
Contributor

comment:8

SimonKing: I'm impressed and depressed. Is MeatAxe's runtime a function of the characteristic? At least one can do

sage: A = random_matrix(GF(3), 2000)
sage: B = random_matrix(GF(3), 2000)
sage: %timeit A._multiply_linbox(B)
5 loops, best of 3: 1.9 s per loop

but I agree the current state of matrices is quite a mess. Granted, most of this code was written way back before Sage had the quality control it has now, just to get something up and running, and it hasn't been touched since. (They're pretty good over Z and Q, which is what I use...)

Tom: Your patch looks good, positive review to that. I've posted another patch which provides another 2x speedup. There's still a lot of cruft it goes through, and matrix_space could really stand to be cythonized (there are 3 mandatory Python calls on it every time a matrix is created), but I don't have time to dig through it now.

@simon-king-jena
Member

comment:9

Replying to @robertwb:

SimonKing: I'm impressed and depressed. Is MeatAxe's runtime a function of the characteristic?

Yes. Over a non-prime field:

sage: F = GF(8,'t')
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: from pGroupCohomology.mtx import MTX
sage: a = MTX(8,[list(r) for r in A.rows()])
sage: b = MTX(8,[list(r) for r in B.rows()])
sage: %time c = a*b
CPU times: user 5.19 s, sys: 0.00 s, total: 5.19 s
Wall time: 5.19 s
sage: %time C = A*B

The last line took several minutes, and it was not possible to interrupt with Ctrl-c. So, I had to kill the Python process.

And with a bigger prime, comparing against linbox:

sage: F = GF(37)
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: a = MTX(37,[list(r) for r in A.rows()])
sage: b = MTX(37,[list(r) for r in B.rows()])
sage: %time c = a*b
CPU times: user 11.83 s, sys: 0.00 s, total: 11.84 s
Wall time: 11.87 s
sage: %time C = A._multiply_linbox(B)
CPU times: user 1.82 s, sys: 0.00 s, total: 1.82 s
Wall time: 1.82 s
sage: %time C = A*B
CPU times: user 12.65 s, sys: 0.00 s, total: 12.65 s
Wall time: 12.70 s

In other words, for bigger fields, linbox is clearly better than MeatAxe, but the overhead kills it.

At least one can do

sage: A = random_matrix(GF(3), 2000)
sage: B = random_matrix(GF(3), 2000)
sage: %timeit A._multiply_linbox(B)
5 loops, best of 3: 1.9 s per loop

Again, there is a huge overhead. But at what point? I mean, the parents of A and B are the same, so, it can't be in the coercion model. And if two matrices have the same parent, then I'd expect that A*B would simply call A._multiply_linbox(B).
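Simon's expectation here, that identical parents should short-circuit coercion entirely, can be sketched as a parent-identity fast path in __mul__. This is a toy model with hypothetical names (ToyMatrix, _multiply_classical), not Sage's actual coercion machinery; _multiply_classical stands in for a specialized kernel such as _multiply_linbox, and GF(3) arithmetic is hard-coded as reduction mod 3:

```python
# Toy model of the same-parent fast path: when both operands carry the
# identical parent object, dispatch straight to the fast kernel and
# skip coercion altogether.
class ToyMatrix:
    def __init__(self, parent, rows):
        self.parent = parent  # e.g. ("MatrixSpace", "GF(3)", n, n)
        self.rows = rows

    def _multiply_classical(self, other):
        # Stand-in for a specialized kernel; entries reduced mod 3.
        n = len(self.rows)
        return ToyMatrix(self.parent, [
            [sum(self.rows[i][k] * other.rows[k][j] for k in range(n)) % 3
             for j in range(n)] for i in range(n)])

    def __mul__(self, other):
        if isinstance(other, ToyMatrix) and self.parent is other.parent:
            return self._multiply_classical(other)  # no coercion overhead
        return NotImplemented  # the (slow) coercion path would go here

P = ("MatrixSpace", "GF(3)", 2, 2)
A = ToyMatrix(P, [[1, 2], [0, 1]])
B = ToyMatrix(P, [[2, 0], [1, 1]])
C = A * B  # C.rows == [[1, 2], [1, 1]]
```

The profile in the ticket description suggests that in Sage the product does reach a fast kernel eventually, but only after re-deriving the parent of the result, which is where the overhead hides.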

Tom: Your patch looks good, positive review to that. I've posted another patch which provides another 2x speedup. There's still a lot of cruft it goes through, and matrix_space could really stand to be cythonized (there are 3 mandatory Python calls on it every time a matrix is created), but I don't have time to dig through it now.

Which means a third person should review both Tom's patch and yours, right?

@simon-king-jena
Member

Author: boothby, robertwb

@simon-king-jena
Member

comment:10

With the patch, I get:

sage: F = GF(3)
sage: A = random_matrix(F,3)
sage: B = random_matrix(F,3)
sage: %timeit C = A*B
625 loops, best of 3: 14.9 µs per loop # was: 99.6 µs
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: %timeit C = A*B
5 loops, best of 3: 12.4 s per loop # was: 12.6 s

So, there is only a small improvement. But it is an improvement.

I am currently running sage -testall. If it passes, I'll give this a positive review and suggest opening two new tickets: one for the overhead in multiplying matrices, the other for the problem that loads(dumps()) fails on big matrices.

@simon-king-jena
Member

Work Issues: various doc test errors

@simon-king-jena
Member

comment:11

There are doctest failures after installing the patches. See http://sage.math.washington.edu/home/SimonKing/logs/8096test.log

Some of these failures seem pretty severe. There are unexpected numerical values and recursion depth problems.

Do you have an idea how such an innocent-looking change can have such a big effect?

@boothby
Author

boothby commented Feb 1, 2010

comment:12

Looks to me like the second patch isn't compatible with matrix_double_dense -- that's an easy fix. I'm unsure about matrix_2x2, though. I'll check these out after I get some homework done.

@simon-king-jena
Member

comment:13

Hi Tom!

Hopefully your fix will work. This message is mainly to test whether it works with Firefox 3.0.6 on my desktop computer to commit a message starting in wysiwyg mode.

@simon-king-jena
Member

comment:15

I think it is worthwhile to revive this ticket!

There has been some work done in the past, e.g., on #10763 (already merged) and #11589. I therefore suggest starting by determining the status quo, performance-wise.

The following is with sage-4.7.rc2 (which contains #10763) and #11589. I provide the timings with sage-4.6.2 in square brackets:

sage: F = GF(3)
sage: A = random_matrix(F,3)
sage: B = random_matrix(F,3)
sage: %timeit C = A*B
625 loops, best of 3: 36.6 µs per loop
[625 loops, best of 3: 54.6 µs per loop]
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: %timeit C = A*B
5 loops, best of 3: 8.36 s per loop
[5 loops, best of 3: 8.27 s per loop]

So, part of the issue with small matrices is resolved. The second patch, attachment: 8096-new_matrix.patch, is actually obsoleted by #10763 and #11589, and the first patch needs to be rebased.

However, large matrices are still not nice. We have seen that linbox provides a much faster multiplication. Wouldn't it be a valid approach to use linbox more often? Perhaps make it the default over finite fields?

@simon-king-jena
Member

comment:16

For the record: Today I slightly improved MeatAxe matrix initialisation (my MeatAxe wrapper relies on a modified/improved version of MeatAxe-2.2.4, and today I added memset).

Now, multiplication of two random 2000x2000 matrices over GF(3) takes 1.72 seconds with MeatAxe (it used to be 11 seconds, without memset), compared to 1.65 seconds with _multiply_linbox and 8.34 seconds with plain Sage matrices.

I really wonder why linbox isn't used by default.
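The memset remark generalizes: zero-filling a freshly allocated buffer in one bulk C-level operation is far cheaper than touching entries one at a time. A rough Python analogue of the same effect (my own illustration, not from the MeatAxe wrapper; absolute timings depend on the machine):

```python
import timeit

n = 200_000  # number of entries to zero-initialize

def bulk_zero():
    return bytearray(n)          # one C-level, memset-style allocation

def loop_zero():
    buf = bytearray()
    for _ in range(n):           # one Python-level call per entry
        buf.append(0)
    return buf

t_bulk = timeit.timeit(bulk_zero, number=5)
t_loop = timeit.timeit(loop_zero, number=5)
print("bulk: %.5fs  loop: %.5fs" % (t_bulk, t_loop))
assert bulk_zero() == loop_zero()  # identical contents, very different cost
```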

@loefflerd loefflerd mannequin added the s: needs work label Mar 10, 2012
@simon-king-jena
Member

Changed work issues from various doc test errors to rebase

@simon-king-jena
Member

comment:39

Replying to @loefflerd:

These patches do not apply correctly on either the current release or the current beta (see patchbot logs).

No surprise that two-year-old patches don't apply anymore. I am rebasing them now, producing a single patch replacing the two. I am not totally sure, but I think the ideas of the second patch are already in Sage - need to verify it, though.

@simon-king-jena simon-king-jena added this to the sage-5.0 milestone Mar 10, 2012
@simon-king-jena
Member

comment:40

I have rebased the patches, combining them into one patch. In fact, a part (but not all) of the second patch is not needed anymore, since the creation of the zero matrix has been improved in another ticket.

Here is evidence that the patch still improves the timings for multiplication of small matrices -- even after the switch to linbox:

Without patch:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 40.5 µs per loop

With the patch:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 18.5 µs per loop

I still need to run the doctests, but this needs review.

Apply trac8096_speedup_matrix_parent.patch


@simon-king-jena
Member

comment:41

Here is the example from the ticket description.

Without the patch:

sage: for d in range(1, 70):
....:     print d,
....:     A = random_matrix(GF(3), d)
....:     B = random_matrix(GF(3), d)
....:     timeit("C = A*B")
....:     
1 625 loops, best of 3: 41 µs per loop
2 625 loops, best of 3: 41.1 µs per loop
3 625 loops, best of 3: 41 µs per loop
4 625 loops, best of 3: 41.4 µs per loop
5 625 loops, best of 3: 41.6 µs per loop
6 625 loops, best of 3: 42.4 µs per loop
7 625 loops, best of 3: 42.9 µs per loop
8 625 loops, best of 3: 43.3 µs per loop
9 625 loops, best of 3: 43.5 µs per loop
10 625 loops, best of 3: 44.1 µs per loop
11 625 loops, best of 3: 44.8 µs per loop
12 625 loops, best of 3: 45.5 µs per loop
13 625 loops, best of 3: 46.3 µs per loop
14 625 loops, best of 3: 47.6 µs per loop
15 625 loops, best of 3: 48.8 µs per loop
16 625 loops, best of 3: 50.4 µs per loop
17 625 loops, best of 3: 51.8 µs per loop
18 625 loops, best of 3: 53.4 µs per loop
19 625 loops, best of 3: 54.7 µs per loop
20 625 loops, best of 3: 56.5 µs per loop
21 625 loops, best of 3: 58.4 µs per loop
22 625 loops, best of 3: 60.8 µs per loop
23 625 loops, best of 3: 63.3 µs per loop
24 625 loops, best of 3: 61.7 µs per loop
25 625 loops, best of 3: 64.1 µs per loop
26 625 loops, best of 3: 66.3 µs per loop
27 625 loops, best of 3: 70.3 µs per loop
28 625 loops, best of 3: 72.7 µs per loop
29 625 loops, best of 3: 75.2 µs per loop
30 625 loops, best of 3: 79.4 µs per loop
31 625 loops, best of 3: 82 µs per loop
32 625 loops, best of 3: 86.5 µs per loop
33 625 loops, best of 3: 89.8 µs per loop
34 625 loops, best of 3: 94.3 µs per loop
35 625 loops, best of 3: 98.1 µs per loop
36 625 loops, best of 3: 92.1 µs per loop
37 625 loops, best of 3: 95.9 µs per loop
38 625 loops, best of 3: 100 µs per loop
39 625 loops, best of 3: 105 µs per loop
40 625 loops, best of 3: 109 µs per loop
41 625 loops, best of 3: 117 µs per loop
42 625 loops, best of 3: 123 µs per loop
43 625 loops, best of 3: 129 µs per loop
44 625 loops, best of 3: 136 µs per loop
45 625 loops, best of 3: 142 µs per loop
46 625 loops, best of 3: 149 µs per loop
47 625 loops, best of 3: 156 µs per loop
48 625 loops, best of 3: 146 µs per loop
49 625 loops, best of 3: 154 µs per loop
50 625 loops, best of 3: 161 µs per loop
51 625 loops, best of 3: 168 µs per loop
52 625 loops, best of 3: 177 µs per loop
53 625 loops, best of 3: 185 µs per loop
54 625 loops, best of 3: 194 µs per loop
55 625 loops, best of 3: 202 µs per loop
56 625 loops, best of 3: 213 µs per loop
57 625 loops, best of 3: 222 µs per loop
58 625 loops, best of 3: 234 µs per loop
59 625 loops, best of 3: 244 µs per loop
60 625 loops, best of 3: 225 µs per loop
61 625 loops, best of 3: 235 µs per loop
62 625 loops, best of 3: 248 µs per loop
63 625 loops, best of 3: 260 µs per loop
64 625 loops, best of 3: 271 µs per loop
65 625 loops, best of 3: 287 µs per loop
66 625 loops, best of 3: 297 µs per loop
67 625 loops, best of 3: 311 µs per loop
68 625 loops, best of 3: 324 µs per loop
69 625 loops, best of 3: 340 µs per loop

With the patch:

sage: for d in range(1, 70):
....:     print d,
....:     A = random_matrix(GF(3), d)
....:     B = random_matrix(GF(3), d)
....:     timeit("C = A*B")
....:     
1 625 loops, best of 3: 18.3 µs per loop
2 625 loops, best of 3: 18.4 µs per loop
3 625 loops, best of 3: 18.5 µs per loop
4 625 loops, best of 3: 18.8 µs per loop
5 625 loops, best of 3: 18.9 µs per loop
6 625 loops, best of 3: 19.5 µs per loop
7 625 loops, best of 3: 19.9 µs per loop
8 625 loops, best of 3: 20.3 µs per loop
9 625 loops, best of 3: 21 µs per loop
10 625 loops, best of 3: 21.4 µs per loop
11 625 loops, best of 3: 22 µs per loop
12 625 loops, best of 3: 22.4 µs per loop
13 625 loops, best of 3: 23.9 µs per loop
14 625 loops, best of 3: 24.9 µs per loop
15 625 loops, best of 3: 25.6 µs per loop
16 625 loops, best of 3: 27.2 µs per loop
17 625 loops, best of 3: 28.2 µs per loop
18 625 loops, best of 3: 29.8 µs per loop
19 625 loops, best of 3: 31.4 µs per loop
20 625 loops, best of 3: 33.1 µs per loop
21 625 loops, best of 3: 35 µs per loop
22 625 loops, best of 3: 37.2 µs per loop
23 625 loops, best of 3: 39.4 µs per loop
24 625 loops, best of 3: 38.3 µs per loop
25 625 loops, best of 3: 40.9 µs per loop
26 625 loops, best of 3: 43.2 µs per loop
27 625 loops, best of 3: 46 µs per loop
28 625 loops, best of 3: 49 µs per loop
29 625 loops, best of 3: 51.9 µs per loop
30 625 loops, best of 3: 55.2 µs per loop
31 625 loops, best of 3: 58.3 µs per loop
32 625 loops, best of 3: 62.8 µs per loop
33 625 loops, best of 3: 66.9 µs per loop
34 625 loops, best of 3: 71.1 µs per loop
35 625 loops, best of 3: 75.1 µs per loop
36 625 loops, best of 3: 68.1 µs per loop
37 625 loops, best of 3: 72.3 µs per loop
38 625 loops, best of 3: 76.9 µs per loop
39 625 loops, best of 3: 81.7 µs per loop
40 625 loops, best of 3: 85.8 µs per loop
41 625 loops, best of 3: 94.2 µs per loop
42 625 loops, best of 3: 99.8 µs per loop
43 625 loops, best of 3: 106 µs per loop
44 625 loops, best of 3: 112 µs per loop
45 625 loops, best of 3: 119 µs per loop
46 625 loops, best of 3: 126 µs per loop
47 625 loops, best of 3: 132 µs per loop
48 625 loops, best of 3: 123 µs per loop
49 625 loops, best of 3: 130 µs per loop
50 625 loops, best of 3: 137 µs per loop
51 625 loops, best of 3: 145 µs per loop
52 625 loops, best of 3: 153 µs per loop
53 625 loops, best of 3: 162 µs per loop
54 625 loops, best of 3: 170 µs per loop
55 625 loops, best of 3: 180 µs per loop
56 625 loops, best of 3: 190 µs per loop
57 625 loops, best of 3: 200 µs per loop
58 625 loops, best of 3: 210 µs per loop
59 625 loops, best of 3: 221 µs per loop
60 625 loops, best of 3: 202 µs per loop
61 625 loops, best of 3: 212 µs per loop
62 625 loops, best of 3: 224 µs per loop
63 625 loops, best of 3: 237 µs per loop
64 625 loops, best of 3: 248 µs per loop
65 625 loops, best of 3: 263 µs per loop
66 625 loops, best of 3: 276 µs per loop
67 625 loops, best of 3: 288 µs per loop
68 625 loops, best of 3: 305 µs per loop
69 625 loops, best of 3: 318 µs per loop

So, there is an improvement for bigger matrices as well.

@simon-king-jena
Member

Changed work issues from rebase to none

@simon-king-jena
Member

comment:43

The patchbot found no doctest errors, but complains that the old patch has introduced trailing whitespace.

Hence, I have updated the patch and hope it will be fine now.

Apply trac8096_speedup_matrix_parent.patch

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

Reviewer: PatchBot

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

Changed author from boothby, robertwb to Tom Boothby, Robert Bradshaw, Simon King

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

comment:44

Curiously, the latest version doesn't seem to build on 5.0.beta7: the patch applies, but the modified Cython file matrix_window.pyx won't build. (See patchbot logs; I also reproduced this separately by hand.)

@loefflerd loefflerd mannequin added s: needs work and removed s: needs review labels Mar 11, 2012
@simon-king-jena
Member

comment:45

Replying to @loefflerd:

Curiously the latest version doesn't seem to build on 5.0.beta7: the patch applies, but the modified cython file matrix_window.pyx won't build. (See patchbot logs; I also reproduced this separately by hand.)

Oops. Apparently, while removing whitespace, I also removed something else. Sorry, I'll update it in a minute.

@simon-king-jena
Member

comment:46

Sorry that I did not try to build Sage again after manually editing the patch.

I have updated the patch, and now it does build. I leave tests to the patchbot...

Apply trac8096_speedup_matrix_parent.patch

@simon-king-jena
Member

Replaces the previous patches, rebased rel sage-5.0.beta7

@simon-king-jena
Member

comment:47

Attachment: trac8096_speedup_matrix_parent.patch.gz

Arrgh! There was more trailing whitespace in the patch!

It should now be fixed.

By the way: am I entitled to review the patch, even though rebasing the two original patches has not been totally mechanical? I still think that Tom and Robert are the authors, so that I could be reviewer. Agreed?

@simon-king-jena
Member

comment:48

Patchbot...

Apply trac8096_speedup_matrix_parent.patch

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

comment:49

I think reviewing patches you have rebased is allowed, as long as there's no change in functionality.

@loefflerd
Mannequin

loefflerd mannequin commented Mar 13, 2012

Changed reviewer from PatchBot to Simon King, David Loeffler

@jdemeyer

Merged: sage-5.0.beta9
