-
-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
matrix multiplication over ZZ sometimes gives incorrect results #11358
Comments
comment:1
This also fails on Sage 4.6.2 under Red Hat Enterprise Linux (binary install, x86_64 GNU/Linux) |
comment:2
Looking at the function mpz_crt_vec_tail() in sage/ext/multi_modular.pyx, it appears that three of the last five lines:
are redundant, because the variable zz is never used. (NB I am unfamiliar with both Cython and GMP, so I may be wrong here.) But why is this code there? Is it supposed to be doing something else? |
comment:3
From #sagemath on IRC:
|
comment:4
Replying to @rbeezer: I don't think that this is correct. By running the code repeatedly, I found examples without zero columns. |
comment:5
Replying to @sagetrac-tomc:
OK, just saw that in IRC and thought I'd capture it. I'm curious to see what the root cause is here! Rob |
comment:6
I think I've got it. This seems to happen iff a random prime happens to be chosen twice. For example, if you modify _extend_moduli_to_height_c to always reuse a prime if more than one is needed, this always happens. The same bug seems to exist in _extend_moduli_to_count as well. The problem seems to be that in _new_random_prime, there's a test which doesn't do what it's trying to do:
IIUC, the "for i in range(self.n)" loop is attempting to avoid the problem of repeated primes, but this only works if p is added to self.moduli immediately after it's returned. In the case of _extend_moduli_to_height_c, this isn't true; the addition is delayed until a later extend_with_primes call, so the above check decides that the prime isn't reused, we get repeated moduli, and the multiplication breaks. Or even more obviously:
ISTM either we can remove the check-for-duplicate code from _new_random_prime and do it externally, or we can add an argument and pass it the "accumulated" primes as we build them elsewhere so it can avoid duplicates there too. I prefer the first. Should I work up a patch? |
comment:8
Replying to @sagetrac-dsm: Good catch! I agree.
I prefer the first too. Please write a patch and then I will review it. |
comment:9
Okay, here's my first attempt. Because another function calls _new_random_prime assuming that it correctly avoids duplication I had to change my intended approach a bit, but it should work. Two other minor changes: (1) as mentioned above, mpz_crt_vec_tail() has what looks to me like vestigial code, and I've removed it. (2) the documentation for replace_prime_c claimed that it would replace a prime with a greater prime, but made no efforts to ensure the "greater than" part. I've changed the doc accordingly. |
comment:10
Thanks. I'll review this. |
comment:11
I can't get the patch to apply. When I apply the patch and then do
I get
Also, in the new implementation of _new_random_prime() there is the possibility of an infinite loop if self.moduli already contains all the primes between self._lbound and self._ubound. I suggest modifying _new_random_prime() so that it:
|
comment:12
Replying to @sagetrac-tomc:
I suggest raising an error instead of increasing |
comment:13
Replying to @burcin:
If we do raise an error then it should be caught before it propagates up to the user. For instance, it should not be that if A and B are integer matrices then evaluating AB can sometimes cause an exception because the multi-modular matrix multiplication algorithm ran out of primes: in this case we should catch the exception and evaluate AB using a different algorithm. Is it really the case that raising self._ubound would cause things to break? I thought from looking at the multi-modular code that it all used mpz_t types from GMP, and that these would automatically increase their allocated memory as necessary. The multi-modular code is used in five places: dense matrix multiplication over ZZ; dense matrix multiplication over cyclotomics; dense matrix multiplication over QQ (although only in the rational reconstruction routines _lift_crt_rr() and _lift_crt_rr_with_lcm(), which I think are dead code); echelon form calculations over cyclotomics; and characteristic polynomial calculations over cyclotomics. If we decide to raise an error if the multi-modular code runs out of primes then we will need to catch this error and handle it appropriately in all five places. |
comment:14
Replying to @sagetrac-tomc:
This is better than trying to handle it magically in the
Multi precision integers are needed for the lifting step. One of the reasons to use a multi-modular algorithm is to take advantage of fast arithmetic implemented in hardware. This requires that the modulus is small enough to fit in a word size. For linear algebra, the dimensions of the matrix also come into play. Ideally you should reduce only once for each row*column multiplication.
You cannot assume that the only users of the code are in the Sage library. This class is designed to be an abstraction that makes it easy to implement multi modular algorithms. Perhaps we can verify that this error will never occur in these places through some theoretical argument. It is highly unlikely that we will always keep choosing the same random prime. |
comment:15
Replying to @burcin:
I agree that things will slow down if we ever raise self._ubound to be larger than a machine word: my question was whether anything would actually break. But in any case, since the MultiModularBasis class is explicitly documented as:
we should not allow the code to raise self._ubound. Thus we should raise an error if we repeatedly fail to choose a new random prime. As discussed, we will need to handle this error in several other places in Sage. Note that the situation we are discussing is very unlikely to ever occur in practice, as the default values of _lbound and _ubound are repectively 210 and 215 and there are many primes in between 210 and 215. But in theory someone could do:
or something equally silly, and then the multi-modular code would quickly run out of primes. |
comment:16
Replying to @sagetrac-tomc:
Try it against one of the 4.7 release candidates. I should've mentioned what I was basing it on. As for the case of running out of primes -- which I considered the caller's fault, and so I simply warned about it in the docstring rather than handle it -- I'm fine with whatever. (Once cython 0.15 gets integrated and we have cythonic generators it may be worth collecting random prime functions into a fast module of their own.) Given that the moduli have to be below MAX_MODULUS, it wouldn't cost much to check whether there really are any primes left and raise an exception only if we really are done and not merely unlucky. If we're doing that, though, then we should probably raise the same exception if you ask for more primes than there are available in extend_moduli_to_count as well. |
comment:17
ping why is this ticket dead? It seems very, very important, but everybody forgot about it a year ago?! Replying to @sagetrac-tomc:
This is completely false. There are hardly any primes in that range:
The linear algebra in Sage is currently very broken in practice, perhaps due to various optimizations and re-implementations that have fundamentally broken things. For example, I'm constantly hitting a problem of "not enough primes" (see #10281), even for matrices over QQ. |
This comment has been minimized.
This comment has been minimized.
Stopgaps: #12710 |
this is a usable version of the test program in the ticket description |
comment:20
Attachment: 11358.sage.gz I've posted a patch trac_11358-wstein.patch that fixes the bug. The solution is definitely much better than what was proposed 10 years ago. You can run the attached 11358.sage to check that the exact bug cited here doesn't happen. But better is to see the new doctests which get to the heart of the matter. |
comment:21
The patch does seem to eliminate the original bug for me, but there's still some strangeness for different _extend methods:
So I think that two of the three _extend methods still don't give results that I'd expect. |
comment:22
I have fixed the patch. There was one case where I didn't update use of _new_random_prime correctly. |
comment:23
The new patch: (1) Applies against 5.0.beta8 with minor fuzz. Positive review. |
Reviewer: Douglas McNeil |
Author: William Stein |
Merged: sage-5.0.beta12 |
apply ONLY this. |
comment:26
Attachment: trac_11358-wstein.patch.gz Rebased patch by removing a hunk which just added a newline. |
Something is wrong with the multi-modular matrix multiplication code for matrices over ZZ. At random, and infrequently, it gives incorrect results. For example, the following code chooses random 3x2 and 2x10 integer matrices and multiplies them together using the multi-modular algorithm. It then does the same multiplication 100 more times, checks that the answer is always the same, and if not raises an exception:
This fails with very high probability (on Sage 4.6.2 under OS X, built from source) with output such as:
Note that the two candidates for the matrix product here are congruent modulo 1103, which is prime. If you rerun the code with verbose logging, using set_verbose(2), then every time it fails the two candidates for the matrix product are congruent modulo the prime being used in the multi-modular algorithm. Thus I suspect that the Chinese Remainder Theorem code in sage/ext/multi_modular.pyx is not handling a corner case properly.
SEE ALSO: #10281, which touches the sage/ext/multi_modular.pyx code in nontrivial ways.
CC: @burcin
Component: linear algebra
Keywords: matrix multiplication, multi-modular, integers, ZZ
Stopgaps: #12710
Author: William Stein
Reviewer: Douglas McNeil
Merged: sage-5.0.beta12
Issue created by migration from https://trac.sagemath.org/ticket/11358
The text was updated successfully, but these errors were encountered: