
Speed up parent creation for multiplication of square matrices #8096

Closed
boothby opened this issue Jan 27, 2010 · 58 comments

Comments

@boothby

boothby commented Jan 27, 2010

Multiplication of small square matrices is ridiculously slow:

sage: for d in range(1, 100):
...    print d
...    A = random_matrix(GF(3), d)
...    B = random_matrix(GF(3), d)
...    timeit("C = A*B")
    

1
625 loops, best of 3: 93.8 µs per loop
2
625 loops, best of 3: 93.9 µs per loop
3
625 loops, best of 3: 94.2 µs per loop
4
625 loops, best of 3: 94.1 µs per loop
5
625 loops, best of 3: 94.7 µs per loop
6
625 loops, best of 3: 94.9 µs per loop
7
625 loops, best of 3: 95.2 µs per loop
8
625 loops, best of 3: 95.8 µs per loop
9
625 loops, best of 3: 96.8 µs per loop
10
625 loops, best of 3: 97.6 µs per loop
11
625 loops, best of 3: 98.1 µs per loop
12
625 loops, best of 3: 101 µs per loop
13
625 loops, best of 3: 101 µs per loop
14
625 loops, best of 3: 104 µs per loop
15
625 loops, best of 3: 104 µs per loop
16
625 loops, best of 3: 108 µs per loop
17
625 loops, best of 3: 108 µs per loop
18
625 loops, best of 3: 113 µs per loop
19
625 loops, best of 3: 112 µs per loop
20
625 loops, best of 3: 118 µs per loop
21
625 loops, best of 3: 117 µs per loop
22
625 loops, best of 3: 125 µs per loop
23
625 loops, best of 3: 123 µs per loop
24
625 loops, best of 3: 133 µs per loop
25
625 loops, best of 3: 129 µs per loop
26
625 loops, best of 3: 143 µs per loop
27
625 loops, best of 3: 137 µs per loop
28
625 loops, best of 3: 155 µs per loop
29
625 loops, best of 3: 147 µs per loop
30
625 loops, best of 3: 166 µs per loop
31
625 loops, best of 3: 157 µs per loop
32
625 loops, best of 3: 179 µs per loop
33
625 loops, best of 3: 168 µs per loop
34
625 loops, best of 3: 196 µs per loop
35
625 loops, best of 3: 182 µs per loop
36
625 loops, best of 3: 214 µs per loop
37
625 loops, best of 3: 198 µs per loop
38
625 loops, best of 3: 234 µs per loop
39
625 loops, best of 3: 213 µs per loop
40
625 loops, best of 3: 255 µs per loop
41
625 loops, best of 3: 231 µs per loop
42
625 loops, best of 3: 279 µs per loop
43
625 loops, best of 3: 251 µs per loop
44
625 loops, best of 3: 307 µs per loop
45
625 loops, best of 3: 271 µs per loop
46
625 loops, best of 3: 335 µs per loop
47
625 loops, best of 3: 298 µs per loop
48
625 loops, best of 3: 363 µs per loop
49
625 loops, best of 3: 319 µs per loop
50
625 loops, best of 3: 401 µs per loop
51
625 loops, best of 3: 346 µs per loop
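The timings above suggest a large constant cost per call: up to about d = 10 the time barely moves from ~94 µs, so the d³ arithmetic is not what dominates. A quick least-squares fit of t(d) = a + b·d³ to a few points transcribed from the table (plain Python, my own fit, not from the ticket) puts the fixed overhead near 95 µs:

```python
# Least-squares fit of t(d) = a + b*d**3 to timings transcribed from
# the benchmark above (matrix size d, µs per multiplication).
data = [(1, 93.8), (10, 97.6), (20, 118), (30, 166), (40, 255), (50, 401)]

n = len(data)
sx = sum(d**3 for d, _ in data)
sxx = sum(d**6 for d, _ in data)
sy = sum(t for _, t in data)
sxy = sum(d**3 * t for d, t in data)

b = (n * sxy - sx * sy) / (n * sxx - sx**2)  # cubic coefficient (µs per d**3)
a = (sy - b * sx) / n                        # fixed per-call overhead (µs)

print("fixed overhead: %.1f µs, cubic term: %.2f ns" % (a, b * 1e3))
```

On these points the fit gives a fixed overhead of roughly 95-100 µs, i.e. the actual arithmetic only becomes visible around d ≈ 30.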

Here's a profile of the 1x1 case:

625 loops, best of 3: 91.7 µs per loop
         810004 function calls in 5.980 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    40000    0.100    0.000    0.100    0.000 :0(IntegerMod)
    30000    0.080    0.000    0.080    0.000 :0(append)
    10000    0.030    0.000    0.030    0.000 :0(base_ring)
    10000    0.150    0.000    0.900    0.000 :0(category)
    40000    0.250    0.000    0.290    0.000 :0(has_key)
    10000    0.070    0.000    0.070    0.000 :0(hasattr)
    10000    0.030    0.000    0.030    0.000 :0(is_Matrix)
    80000    0.250    0.000    0.250    0.000 :0(isinstance)
    30000    0.070    0.000    0.070    0.000 :0(keys)
    30000    0.040    0.000    0.040    0.000 :0(len)
    30001    0.080    0.000    0.080    0.000 :0(range)
    10000    0.030    0.000    0.030    0.000 :0(setdefault)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
    30000    0.140    0.000    0.140    0.000 :0(sorted)
        1    0.380    0.380    5.980    5.980 <string>:1(<module>)
    30000    0.250    0.000    1.750    0.000 cachefunc.py:155(get_key)
    10000    0.060    0.000    0.850    0.000 cachefunc.py:220(__call__)
    10000    0.060    0.000    0.090    0.000 cachefunc.py:254(get_cache)
    10000    0.050    0.000    0.050    0.000 cachefunc.py:275(__get__)
    20000    0.270    0.000    1.550    0.000 cachefunc.py:76(__call__)
    20000    0.060    0.000    0.060    0.000 cachefunc.py:95(get_cache)
    10000    0.070    0.000    2.520    0.000 category.py:459(__contains__)
    10000    0.360    0.000    1.550    0.000 category.py:651(is_subcategory)
    20000    0.120    0.000    1.670    0.000 classcall_metaclass.py:64(__call__)
    20000    0.040    0.000    0.040    0.000 finite_field_prime_modn.py:121(characteristic)
    20000    0.110    0.000    0.110    0.000 finite_field_prime_modn.py:187(order)
    30000    1.010    0.000    1.500    0.000 function_mangling.py:205(fix_to_pos)
    30000    0.080    0.000    0.080    0.000 function_mangling.py:261(<genexpr>)
    40000    0.290    0.000    0.390    0.000 integer_mod_ring.py:733(__call__)
    10000    0.050    0.000    0.070    0.000 matrix_group_element.py:68(is_MatrixGroupElement)
    10000    0.450    0.000    0.710    0.000 matrix_space.py:1035(matrix)
    10000    0.140    0.000    4.000    0.000 matrix_space.py:1089(matrix_space)
    10000    0.200    0.000    3.830    0.000 matrix_space.py:110(MatrixSpace)
    10000    0.000    0.000    0.000    0.000 matrix_space.py:1112(ncols)
    10000    0.030    0.000    0.030    0.000 matrix_space.py:1124(nrows)
    10000    0.310    0.000    1.270    0.000 matrix_space.py:271(__call__)
    10000    0.000    0.000    0.000    0.000 misc.py:514(get_verbose)
        1    0.000    0.000    5.980    5.980 profile:0(for i in range(10000): C = A*B)
        0    0.000             0.000          profile:0(profiler)
    90000    0.270    0.000    0.270    0.000 unique_representation.py:454(__eq__)
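Reading the profile: most of the 5.98 s is spent under matrix_space/MatrixSpace (cumtime ~4.0 s), and a large slice of that is the cached-factory key handling (get_key, fix_to_pos). As a toy illustration of why the ticket targets parent creation, here is the cheapest possible form of such a cache, a tuple-keyed dict with no argument normalization (hypothetical names, not Sage's real factory):

```python
# Toy illustration (hypothetical names, not Sage's actual classes): cache
# the parent by (ring, nrows, ncols) so that repeated products turn the
# expensive construction seen in the profile into a single dict hit.
_space_cache = {}

def matrix_space(ring, nrows, ncols):
    key = (ring, nrows, ncols)
    space = _space_cache.get(key)
    if space is None:
        # Stand-in for the expensive parent construction.
        space = ("MatrixSpace", ring, nrows, ncols)
        _space_cache[key] = space
    return space

s1 = matrix_space("GF(3)", 1, 1)
s2 = matrix_space("GF(3)", 1, 1)
assert s1 is s2  # the second call never rebuilds the parent
```

Sage's real factory is already cached, but with far more bookkeeping per lookup; the profiler lines for cachefunc.py and function_mangling.py show that bookkeeping costing more than a third of the total runtime.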

Apply attachment: trac8096_speedup_matrix_parent.patch

CC: @robertwb @malb

Component: linear algebra

Author: Tom Boothby, Robert Bradshaw, Simon King

Reviewer: Simon King, David Loeffler

Merged: sage-5.0.beta9

Issue created by migration from https://trac.sagemath.org/ticket/8096

@boothby boothby assigned williamstein and boothby and unassigned williamstein Jan 27, 2010
@boothby boothby changed the title Coersion of square matrices is slow Speed up parent creation for multiplication of square matrices Jan 27, 2010
@boothby
Author

boothby commented Jan 28, 2010

Attachment: 8096-new_matrix.patch.gz

@boothby
Author

boothby commented Jan 28, 2010

comment:2

I'm not sure that I did this in the right place, but it cuts the time to multiply two 1x1 matrices down to 1/3 of the previous time -- which is still dog slow, but significantly better. As a reference,

sage: A = GF(3).random_element()
sage: B = GF(3).random_element()
sage: timeit("C = A*B")
625 loops, best of 3: 142 ns per loop

and with this patch, I'm getting:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 32.5 µs per loop

sage: import profile
sage: profile.run("for i in range(10000): C = A*B")
         250004 function calls in 1.840 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    40000    0.120    0.000    0.120    0.000 :0(IntegerMod)
    10000    0.070    0.000    0.070    0.000 :0(hasattr)
    10000    0.060    0.000    0.060    0.000 :0(is_Matrix)
    70000    0.190    0.000    0.190    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(range)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.400    0.400    1.840    1.840 <string>:1(<module>)
    20000    0.060    0.000    0.060    0.000 finite_field_prime_modn.py:121(characteristic)
    40000    0.230    0.000    0.350    0.000 integer_mod_ring.py:733(__call__)
    10000    0.060    0.000    0.070    0.000 matrix_group_element.py:68(is_MatrixGroupElement)
    10000    0.390    0.000    0.720    0.000 matrix_space.py:1035(matrix)
    10000    0.030    0.000    0.030    0.000 matrix_space.py:1112(ncols)
    10000    0.020    0.000    0.020    0.000 matrix_space.py:1124(nrows)
    10000    0.160    0.000    1.100    0.000 matrix_space.py:271(__call__)
    10000    0.050    0.000    0.050    0.000 misc.py:514(get_verbose)
        1    0.000    0.000    1.840    1.840 profile:0(for i in range(10000): C = A*B)
        0    0.000             0.000          profile:0(profiler)

@simon-king-jena
Member

comment:3

Replying to @boothby:

... As a reference,

sage: A = GF(3).random_element()
sage: B = GF(3).random_element()
sage: timeit("C = A*B")
625 loops, best of 3: 142 ns per loop

and with this patch, I'm getting:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 32.5 µs per loop

But 32.5 µs (with the patch) is much *slower* than 142 ns (without the patch)!

@boothby
Author

boothby commented Jan 31, 2010

comment:4

You'll note that the 32.5µs is matrix-by-matrix, whereas the 142ns is element-by-element. Before my patch, the matrix-by-matrix time was 101µs.

I'd like the 32.5µs to go away, but I don't know how much of that would be possible, at the moment.

@simon-king-jena
Member

comment:5

Replying to @boothby:

You'll note that the 32.5µs is matrix-by-matrix, whereas the 142ns is element-by-element. Before my patch, the matrix-by-matrix time was 101µs. I'd like the 32.5µs to go away, but I don't know how much of that would be possible, at the moment.

OK, then it is an improvement. I hope to be able to do some refereeing later today or tomorrow.

Anyway, I still wonder why basic matrix operations in Sage tend to be so slow. I mean, do any complicated operations with parents happen behind the scenes? By "slow", I mean "compared with MeatAxe matrices as provided by my cohomology spkg":

sage: from pGroupCohomology.mtx import MTX
sage: F = GF(3)
sage: A = random_matrix(F,3)
sage: B = random_matrix(F,3)
sage: a = MTX(3,[list(r) for r in A.rows()])
sage: b = MTX(3,[list(r) for r in B.rows()])
sage: timeit("C=A*B")
625 loops, best of 3: 99.6 µs per loop
sage: timeit("c=a*b")
625 loops, best of 3: 1.01 µs per loop
sage: a*b == MTX(3,[list(r) for r in C.rows()])
True
sage: A = random_matrix(F,100)
sage: B = random_matrix(F,100)
sage: a = MTX(3,[list(r) for r in A.rows()])
sage: b = MTX(3,[list(r) for r in B.rows()])
sage: timeit("C=A*B")
125 loops, best of 3: 2.43 ms per loop
sage: timeit("c=a*b")
625 loops, best of 3: 376 µs per loop
sage: a*b == MTX(3,[list(r) for r in C.rows()])
True

@robertwb
Contributor

comment:6

Replying to @simon-king-jena:

Anyway, I still wonder why basic matrix operations in Sage tend to be so slow. I mean, do any complicated operations with parents happen behind the scenes? By "slow", I mean "compared with MeatAxe matrices as provided by my cohomology spkg":

This is because they have only been optimized for large dimension and word-sized p. I bet MeatAxe is slower for 1000 x 1000 matrices.

@simon-king-jena
Member

comment:7

Replying to @robertwb:

This is because they have only been optimized for large dimension and word-sized p. I bet MeatAxe is slower for 1000 x 1000 matrices.

You just lost the bet.

sage: from pGroupCohomology.mtx import MTX
sage: F = GF(3)
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: a = MTX(3,[list(r) for r in A.rows()])
sage: b = MTX(3,[list(r) for r in B.rows()])
sage: timeit("C=A*B")
5 loops, best of 3: 12.6 s per loop
sage: timeit("c=a*b")
5 loops, best of 3: 2.01 s per loop
sage: C = A*B
sage: a*b == MTX(3,[list(r) for r in C.rows()])
True

When I did some benchmarks two years ago, I was often astonished by how much MeatAxe outperformed the usual Sage matrices in basic operations such as computing the hash, copying, getting the list of coefficients, changing a coefficient, and so on: hash and copy took seconds on a 10000x10000 matrix!

I complained, and things did improve: hash and copy are fine now. But with the above matrix, loads(dumps(A)) had not terminated after a minute, and ate 90% of my computer's memory.

@robertwb
Contributor

Attachment: 8096-zero-matrix.patch.gz

apply on top of previous

@robertwb
Contributor

comment:8

SimonKing: I'm impressed and depressed. Is MeatAxe's runtime a function of the characteristic? At least one can do

sage: A = random_matrix(GF(3), 2000)
sage: B = random_matrix(GF(3), 2000)
sage: %timeit A._multiply_linbox(B)
5 loops, best of 3: 1.9 s per loop

but I agree the current state of matrices is quite a mess. Granted, most of this code was written way back before Sage had the quality control it has now, just to get something up and running, and it hasn't been touched since. (They're pretty good over Z and Q, which is what I use...)

Tom: Your patch looks good, positive review to that. I've posted another patch which provides another 2x speedup. There's still a lot of cruft it goes through, and matrix_space could really stand to be cythonized (there are 3 mandatory Python calls on it every time a matrix is created), but I don't have time to dig through it now.

@simon-king-jena
Member

comment:9

Replying to @robertwb:

SimonKing: I'm impressed and depressed. Is MeatAxe's runtime a function of the characteristic?

Yes. Over a non-prime field:

sage: F = GF(8,'t')
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: from pGroupCohomology.mtx import MTX
sage: a = MTX(8,[list(r) for r in A.rows()])
sage: b = MTX(8,[list(r) for r in B.rows()])
sage: %time c = a*b
CPU times: user 5.19 s, sys: 0.00 s, total: 5.19 s
Wall time: 5.19 s
sage: %time C = A*B

The last line took several minutes, and it was not possible to interrupt with Ctrl-c. So, I had to kill the Python process.

And with a bigger prime, comparing against linbox:

sage: F = GF(37)
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: a = MTX(37,[list(r) for r in A.rows()])
sage: b = MTX(37,[list(r) for r in B.rows()])
sage: %time c = a*b
CPU times: user 11.83 s, sys: 0.00 s, total: 11.84 s
Wall time: 11.87 s
sage: %time C = A._multiply_linbox(B)
CPU times: user 1.82 s, sys: 0.00 s, total: 1.82 s
Wall time: 1.82 s
sage: %time C = A*B
CPU times: user 12.65 s, sys: 0.00 s, total: 12.65 s
Wall time: 12.70 s

In other words, for bigger fields, linbox is clearly better than MeatAxe, but the overhead kills it.

At least one can do

sage: A = random_matrix(GF(3), 2000)
sage: B = random_matrix(GF(3), 2000)
sage: %timeit A._multiply_linbox(B)
5 loops, best of 3: 1.9 s per loop

Again, there is a huge overhead. But at what point? I mean, the parents of A and B are the same, so, it can't be in the coercion model. And if two matrices have the same parent, then I'd expect that A*B would simply call A._multiply_linbox(B).
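Simon's expectation here, that identical parents should short-circuit coercion entirely, can be sketched as a parent-identity fast path in __mul__. This is a toy model with hypothetical names (ToyMatrix, _multiply_classical), not Sage's actual coercion machinery; _multiply_classical stands in for a specialized kernel such as _multiply_linbox, and GF(3) arithmetic is hard-coded as reduction mod 3:

```python
# Toy model of the same-parent fast path: when both operands carry the
# identical parent object, dispatch straight to the fast kernel and
# skip coercion altogether.
class ToyMatrix:
    def __init__(self, parent, rows):
        self.parent = parent  # e.g. ("MatrixSpace", "GF(3)", n, n)
        self.rows = rows

    def _multiply_classical(self, other):
        # Stand-in for a specialized kernel; entries reduced mod 3.
        n = len(self.rows)
        return ToyMatrix(self.parent, [
            [sum(self.rows[i][k] * other.rows[k][j] for k in range(n)) % 3
             for j in range(n)] for i in range(n)])

    def __mul__(self, other):
        if isinstance(other, ToyMatrix) and self.parent is other.parent:
            return self._multiply_classical(other)  # no coercion overhead
        return NotImplemented  # the (slow) coercion path would go here

P = ("MatrixSpace", "GF(3)", 2, 2)
A = ToyMatrix(P, [[1, 2], [0, 1]])
B = ToyMatrix(P, [[2, 0], [1, 1]])
C = A * B  # C.rows == [[1, 2], [1, 1]]
```

The profile in the ticket description suggests that in Sage the product does reach a fast kernel eventually, but only after re-deriving the parent of the result, which is where the overhead hides.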

Tom: Your patch looks good, positive review to that. I've posted another patch which provides another 2x speedup. There's still a lot of cruft it goes through, and matrix_space could really stand to be cythonized (there are 3 mandatory Python calls on it every time a matrix is created), but I don't have time to dig through it now.

Which means a third person should review both Tom's patch and yours, right?

@simon-king-jena
Member

Author: boothby, robertwb

@simon-king-jena
Member

comment:10

With the patch, I get:

sage: F = GF(3)
sage: A = random_matrix(F,3)
sage: B = random_matrix(F,3)
sage: %timeit C = A*B
625 loops, best of 3: 14.9 µs per loop # was: 99.6 µs
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: %timeit C = A*B
5 loops, best of 3: 12.4 s per loop # was: 12.6 s

So, there is only a small improvement. But it is an improvement.

I am currently running sage -testall. If it passes, I'll give this a positive review and suggest opening two new tickets: one for the overhead in multiplying matrices, the other for the problem that loads(dumps()) fails on big matrices.

@simon-king-jena
Member

Work Issues: various doc test errors

@simon-king-jena
Member

comment:11

There are doctest failures after installing the patches. See http://sage.math.washington.edu/home/SimonKing/logs/8096test.log

Some of these failures seem pretty severe. There are unexpected numerical values and recursion depth problems.

Do you have an idea how such an innocent-looking change can have such a big effect?

@boothby
Author

boothby commented Feb 1, 2010

comment:12

Looks to me like the second patch isn't compatible with matrix_double_dense -- that's an easy fix. I'm unsure about matrix_2x2, though. I'll check these out after I get some homework done.

@simon-king-jena
Member

comment:13

Hi Tom!

Hopefully your fix will work. This message is mainly to test whether it works with Firefox 3.0.6 on my desktop computer to commit a message starting in wysiwyg mode.

@simon-king-jena
Member

comment:15

I think it is worthwhile to revive this ticket!

There has been some work done in the past, e.g., on #10763 (already merged) and #11589. I therefore suggest starting by determining the status quo, performance-wise.

The following is with sage-4.7.rc2 (which contains #10763) and #11589. I provide the timings with sage-4.6.2 in square brackets:

sage: F = GF(3)
sage: A = random_matrix(F,3)
sage: B = random_matrix(F,3)
sage: %timeit C = A*B
625 loops, best of 3: 36.6 µs per loop
[625 loops, best of 3: 54.6 µs per loop]
sage: A = random_matrix(F,2000)
sage: B = random_matrix(F,2000)
sage: %timeit C = A*B
5 loops, best of 3: 8.36 s per loop
[5 loops, best of 3: 8.27 s per loop]

So, part of the issue with small matrices is resolved. The second patch, attachment: 8096-new_matrix.patch, is actually obsoleted by #10763 and #11589, and the first patch needs to be rebased.

However, large matrices are still not nice. We have seen that linbox provides a much faster multiplication. Wouldn't it be a valid approach to use linbox more often? Perhaps make it the default over finite fields?

@simon-king-jena
Member

comment:16

For the record: Today I slightly improved MeatAxe matrix initialisation (my MeatAxe wrapper relies on a modified/improved version of MeatAxe-2.2.4, and today I added memset).

Now, multiplication of two random 2000x2000 matrices over GF(3) takes 1.72 seconds with MeatAxe (it used to be 11 seconds, without memset), compared to 1.65 seconds with _multiply_linbox and 8.34 seconds with plain Sage matrices.

I really wonder why linbox isn't used by default.
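The memset remark generalizes: zero-filling a freshly allocated buffer in one bulk C-level operation is far cheaper than touching entries one at a time. A rough Python analogue of the same effect (my own illustration, not from the MeatAxe wrapper; absolute timings depend on the machine):

```python
import timeit

n = 200_000  # number of entries to zero-initialize

def bulk_zero():
    return bytearray(n)          # one C-level, memset-style allocation

def loop_zero():
    buf = bytearray()
    for _ in range(n):           # one Python-level call per entry
        buf.append(0)
    return buf

t_bulk = timeit.timeit(bulk_zero, number=5)
t_loop = timeit.timeit(loop_zero, number=5)
print("bulk: %.5fs  loop: %.5fs" % (t_bulk, t_loop))
assert bulk_zero() == loop_zero()  # identical contents, very different cost
```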

@loefflerd loefflerd mannequin added the s: needs work label Mar 10, 2012
@simon-king-jena
Member

Changed work issues from various doc test errors to rebase

@simon-king-jena
Member

comment:39

Replying to @loefflerd:

These patches do not apply correctly on either the current release or the current beta (see patchbot logs).

No surprise that two-year-old patches don't apply anymore. I am rebasing them now, producing a single patch replacing the two. I am not totally sure, but I think the ideas of the second patch are already in Sage - need to verify it, though.

@simon-king-jena simon-king-jena added this to the sage-5.0 milestone Mar 10, 2012
@simon-king-jena
Member

comment:40

I have rebased the patches, combining them into one patch. In fact, a part (but not all) of the second patch is not needed anymore, since the creation of the zero matrix has been improved in another ticket.

Here is evidence that the patch still improves the timings for multiplication of small matrices -- even after the switch to linbox:

Without patch:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 40.5 µs per loop

With the patch:

sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 18.5 µs per loop

I still need to run the doctests, but this needs review.

Apply trac8096_speedup_matrix_parent.patch


@simon-king-jena
Member

comment:41

Here is the example from the ticket description.

Without the patch:

sage: for d in range(1, 70):
....:     print d,
....:     A = random_matrix(GF(3), d)
....:     B = random_matrix(GF(3), d)
....:     timeit("C = A*B")
....:     
1 625 loops, best of 3: 41 µs per loop
2 625 loops, best of 3: 41.1 µs per loop
3 625 loops, best of 3: 41 µs per loop
4 625 loops, best of 3: 41.4 µs per loop
5 625 loops, best of 3: 41.6 µs per loop
6 625 loops, best of 3: 42.4 µs per loop
7 625 loops, best of 3: 42.9 µs per loop
8 625 loops, best of 3: 43.3 µs per loop
9 625 loops, best of 3: 43.5 µs per loop
10 625 loops, best of 3: 44.1 µs per loop
11 625 loops, best of 3: 44.8 µs per loop
12 625 loops, best of 3: 45.5 µs per loop
13 625 loops, best of 3: 46.3 µs per loop
14 625 loops, best of 3: 47.6 µs per loop
15 625 loops, best of 3: 48.8 µs per loop
16 625 loops, best of 3: 50.4 µs per loop
17 625 loops, best of 3: 51.8 µs per loop
18 625 loops, best of 3: 53.4 µs per loop
19 625 loops, best of 3: 54.7 µs per loop
20 625 loops, best of 3: 56.5 µs per loop
21 625 loops, best of 3: 58.4 µs per loop
22 625 loops, best of 3: 60.8 µs per loop
23 625 loops, best of 3: 63.3 µs per loop
24 625 loops, best of 3: 61.7 µs per loop
25 625 loops, best of 3: 64.1 µs per loop
26 625 loops, best of 3: 66.3 µs per loop
27 625 loops, best of 3: 70.3 µs per loop
28 625 loops, best of 3: 72.7 µs per loop
29 625 loops, best of 3: 75.2 µs per loop
30 625 loops, best of 3: 79.4 µs per loop
31 625 loops, best of 3: 82 µs per loop
32 625 loops, best of 3: 86.5 µs per loop
33 625 loops, best of 3: 89.8 µs per loop
34 625 loops, best of 3: 94.3 µs per loop
35 625 loops, best of 3: 98.1 µs per loop
36 625 loops, best of 3: 92.1 µs per loop
37 625 loops, best of 3: 95.9 µs per loop
38 625 loops, best of 3: 100 µs per loop
39 625 loops, best of 3: 105 µs per loop
40 625 loops, best of 3: 109 µs per loop
41 625 loops, best of 3: 117 µs per loop
42 625 loops, best of 3: 123 µs per loop
43 625 loops, best of 3: 129 µs per loop
44 625 loops, best of 3: 136 µs per loop
45 625 loops, best of 3: 142 µs per loop
46 625 loops, best of 3: 149 µs per loop
47 625 loops, best of 3: 156 µs per loop
48 625 loops, best of 3: 146 µs per loop
49 625 loops, best of 3: 154 µs per loop
50 625 loops, best of 3: 161 µs per loop
51 625 loops, best of 3: 168 µs per loop
52 625 loops, best of 3: 177 µs per loop
53 625 loops, best of 3: 185 µs per loop
54 625 loops, best of 3: 194 µs per loop
55 625 loops, best of 3: 202 µs per loop
56 625 loops, best of 3: 213 µs per loop
57 625 loops, best of 3: 222 µs per loop
58 625 loops, best of 3: 234 µs per loop
59 625 loops, best of 3: 244 µs per loop
60 625 loops, best of 3: 225 µs per loop
61 625 loops, best of 3: 235 µs per loop
62 625 loops, best of 3: 248 µs per loop
63 625 loops, best of 3: 260 µs per loop
64 625 loops, best of 3: 271 µs per loop
65 625 loops, best of 3: 287 µs per loop
66 625 loops, best of 3: 297 µs per loop
67 625 loops, best of 3: 311 µs per loop
68 625 loops, best of 3: 324 µs per loop
69 625 loops, best of 3: 340 µs per loop

With the patch:

sage: for d in range(1, 70):
....:     print d,
....:     A = random_matrix(GF(3), d)
....:     B = random_matrix(GF(3), d)
....:     timeit("C = A*B")
....:     
1 625 loops, best of 3: 18.3 µs per loop
2 625 loops, best of 3: 18.4 µs per loop
3 625 loops, best of 3: 18.5 µs per loop
4 625 loops, best of 3: 18.8 µs per loop
5 625 loops, best of 3: 18.9 µs per loop
6 625 loops, best of 3: 19.5 µs per loop
7 625 loops, best of 3: 19.9 µs per loop
8 625 loops, best of 3: 20.3 µs per loop
9 625 loops, best of 3: 21 µs per loop
10 625 loops, best of 3: 21.4 µs per loop
11 625 loops, best of 3: 22 µs per loop
12 625 loops, best of 3: 22.4 µs per loop
13 625 loops, best of 3: 23.9 µs per loop
14 625 loops, best of 3: 24.9 µs per loop
15 625 loops, best of 3: 25.6 µs per loop
16 625 loops, best of 3: 27.2 µs per loop
17 625 loops, best of 3: 28.2 µs per loop
18 625 loops, best of 3: 29.8 µs per loop
19 625 loops, best of 3: 31.4 µs per loop
20 625 loops, best of 3: 33.1 µs per loop
21 625 loops, best of 3: 35 µs per loop
22 625 loops, best of 3: 37.2 µs per loop
23 625 loops, best of 3: 39.4 µs per loop
24 625 loops, best of 3: 38.3 µs per loop
25 625 loops, best of 3: 40.9 µs per loop
26 625 loops, best of 3: 43.2 µs per loop
27 625 loops, best of 3: 46 µs per loop
28 625 loops, best of 3: 49 µs per loop
29 625 loops, best of 3: 51.9 µs per loop
30 625 loops, best of 3: 55.2 µs per loop
31 625 loops, best of 3: 58.3 µs per loop
32 625 loops, best of 3: 62.8 µs per loop
33 625 loops, best of 3: 66.9 µs per loop
34 625 loops, best of 3: 71.1 µs per loop
35 625 loops, best of 3: 75.1 µs per loop
36 625 loops, best of 3: 68.1 µs per loop
37 625 loops, best of 3: 72.3 µs per loop
38 625 loops, best of 3: 76.9 µs per loop
39 625 loops, best of 3: 81.7 µs per loop
40 625 loops, best of 3: 85.8 µs per loop
41 625 loops, best of 3: 94.2 µs per loop
42 625 loops, best of 3: 99.8 µs per loop
43 625 loops, best of 3: 106 µs per loop
44 625 loops, best of 3: 112 µs per loop
45 625 loops, best of 3: 119 µs per loop
46 625 loops, best of 3: 126 µs per loop
47 625 loops, best of 3: 132 µs per loop
48 625 loops, best of 3: 123 µs per loop
49 625 loops, best of 3: 130 µs per loop
50 625 loops, best of 3: 137 µs per loop
51 625 loops, best of 3: 145 µs per loop
52 625 loops, best of 3: 153 µs per loop
53 625 loops, best of 3: 162 µs per loop
54 625 loops, best of 3: 170 µs per loop
55 625 loops, best of 3: 180 µs per loop
56 625 loops, best of 3: 190 µs per loop
57 625 loops, best of 3: 200 µs per loop
58 625 loops, best of 3: 210 µs per loop
59 625 loops, best of 3: 221 µs per loop
60 625 loops, best of 3: 202 µs per loop
61 625 loops, best of 3: 212 µs per loop
62 625 loops, best of 3: 224 µs per loop
63 625 loops, best of 3: 237 µs per loop
64 625 loops, best of 3: 248 µs per loop
65 625 loops, best of 3: 263 µs per loop
66 625 loops, best of 3: 276 µs per loop
67 625 loops, best of 3: 288 µs per loop
68 625 loops, best of 3: 305 µs per loop
69 625 loops, best of 3: 318 µs per loop

So, there is an improvement for bigger matrices as well.

@simon-king-jena
Member

Changed work issues from rebase to none

@simon-king-jena
Member

comment:43

The patchbot found no doctest errors, but complains that the old patch has introduced trailing whitespace.

Hence, I have updated the patch and hope it will be fine now.

Apply trac8096_speedup_matrix_parent.patch

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

Reviewer: PatchBot

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

Changed author from boothby, robertwb to Tom Boothby, Robert Bradshaw, Simon King

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

comment:44

Curiously, the latest version doesn't seem to build on 5.0.beta7: the patch applies, but the modified Cython file matrix_window.pyx won't build. (See patchbot logs; I also reproduced this separately by hand.)

@loefflerd loefflerd mannequin added s: needs work and removed s: needs review labels Mar 11, 2012
@simon-king-jena
Member

comment:45

Replying to @loefflerd:

Curiously the latest version doesn't seem to build on 5.0.beta7: the patch applies, but the modified cython file matrix_window.pyx won't build. (See patchbot logs; I also reproduced this separately by hand.)

Oops. Apparently, while removing whitespace, I also removed something else. Sorry, I'll update it in a minute.

@simon-king-jena
Member

comment:46

Sorry that I did not try to build Sage again after manually editing the patch.

I have updated the patch, and now it does build. I leave tests to the patchbot...

Apply trac8096_speedup_matrix_parent.patch

@simon-king-jena
Member

Replaces the previous patches, rebased rel sage-5.0.beta7

@simon-king-jena
Member

comment:47

Attachment: trac8096_speedup_matrix_parent.patch.gz

Arrgh! There was more trailing whitespace in the patch!

It should now be fixed.

By the way: am I entitled to review the patch, even though rebasing the two original patches has not been totally mechanical? I still think that Tom and Robert are the authors, so that I could be reviewer. Agreed?

@simon-king-jena
Member

comment:48

Patchbot...

Apply trac8096_speedup_matrix_parent.patch

@loefflerd
Mannequin

loefflerd mannequin commented Mar 11, 2012

comment:49

I think reviewing patches you have rebased is allowed, as long as there's no change in functionality.

@loefflerd
Mannequin

loefflerd mannequin commented Mar 13, 2012

Changed reviewer from PatchBot to Simon King, David Loeffler

@jdemeyer

Merged: sage-5.0.beta9
